Like emo music, Bigfoot peaked in the mid-2000s
In case it wasn’t clear from the picture, I’m not actually a Bigfoot enthusiast. My original plan was to plot election data but that felt a little too heavy. I need to pace myself if I’m going to make it another six months to election day.
Fortunately, a lot of people out there are Bigfoot enthusiasts and they have more data than you’d ever hope to see. I’ll be using this dataset, which contains geolocated sightings compiled by the Bigfoot Field Research Organization (BFRO). According to the BFRO, they were founded in 1995 and are “widely considered the most credible and respected investigative network involved in the study of [Bigfoot].”
Good enough for me.
The plan is to make a scatter plot of all sightings since 1960 on a U.S. map, and a histogram below the map to display sightings per year. I want to go a little further and plot each year as its own image and turn the frames into an animated GIF. This will make it clear to viewers how Bigfoot grew in popularity, peaked in the mid-2000s, and has since gone out of style.
1. Prepare the data.
Begin by reading the dataset into a pandas DataFrame. Drop rows where date or location data is missing.
import pandas as pd import geopandas as gpd import matplotlib.pyplot as plt df = pd.read_csv("bfro_reports_geocoded.csv", parse_dates=["date"]) df.dropna(subset=["date", "latitude", "longitude"], inplace=True)
The goal is to group the data into annual bins so let’s create a year column. The most recent full year in the dataset is 2022 so that will be the final frame of our visualization.
df.loc[:, 'year'] = df['date'].dt.strftime("%Y").astype(int) df = df[df['year'] <= 2022]
It’s safe to eliminate most columns from the DataFrame. Redefine df
and take a closer look.
df = df[["year", "date", "latitude", "longitude"]] print(df.head()) print(df.shape[0])
The output is below. You can see we have 4,102 clean rows and latitude/longitude appear to be in the correct geographic region. In a moment we’ll turn this DataFrame into a scatter plot.
year date latitude longitude 1 2005 2005-12-03 37.58135 -81.29745 2 2005 2005-10-08 43.46540 -72.70510 3 1984 1984-04-08 37.22647 -81.09017 4 1996 1996-12-22 32.79430 -95.54250 7 1974 1974-09-20 41.45000 -71.50000 4102
First let’s worry about the histogram. Count the number of sightings per year and store that information in a new DataFrame.
Normally value_counts()
returns a pandas Series object, i.e. a simple list of values. By including reset_index()
we instead create a DataFrame with two data columns (year, count) and an index.
Sightings occurring after 2022 are already filtered but let’s also exclude rows before 1960, our start date.
df_histogram = df['year'].value_counts().reset_index() df_histogram = df_histogram[df_histogram['year'] >= 1960]
We’ll use the geopandas library to plot geospatial data. You can tell from the name it’s designed to work closely with pandas.
GeoDataFrames (gdf
below) are an analog to pandas DataFrames. You can manipulate columns the same way and even call a lot of the same methods. The important difference is that GeoDataFrames have a geometry column. This column is the key to translating numerical data into a map.
Geographic data is usually stored as a shapefile. In this case we’re using a U.S. map shapefile which I’ll link at the bottom of this post.
gdf = gpd.read_file("shapefile/cb_2018_us_state_20m.shp", epsg=4326)
Now we have everything we need to create the visualization:
- A DataFrame with Bigfoot sighting coordinates.
- A DataFrame with yearly counts for the histogram.
- A GeoDataFrame to create a U.S. map.
2. Plot the data.
The script will use a large for
loop to iterate through each year from 1960 to 2022. I’ll do my best to make things clear as I break the code into pieces but, as always, the full code can be found at the bottom of this post.
The first priority is to make a copy
of the original sightings DataFrame. This step allows us to silo each year from one another.
It’s tempting to write something like df2 = df[df['year'] == 1960]
. But creating multiple views of a DataFrame can cause serious problems on pandas’ back-end. It’s better to be safe and create a copy
when utilizing a loop like this.
for n in range(1960, 2023): df2 = df.copy()
I want to add another dimension to the visualization and show previous years’ sightings with reduced alpha
. In other words, dots from the previous year will be visible but they will begin to fade out. Sightings from five years earlier will be faintly visible and after six years they will be gone completely.
Filter the DataFrame so it contains data from the current year and the previous five years. Create a new alpha column which we can later pass into Matplotlib’s scatter()
. You can see from the lambda function that sightings from the current year will have alpha=1.0 while the faintest sightings from five years ago will have alpha=0.1.
for n in range(1960, 2023): [...] df2 = df2[(n - 5 <= df2['year']) & (df2['year'] <= n)] df2.loc[:, 'alpha'] = df2['year'].apply(lambda x: 1 - (n - x) * 0.18)
Now we can begin the familiar work of Matplotlib. You’ll see this figure has two subplots (ax0
and ax1
) of unequal size. The upper plot will be a U.S. map and it will be 4x larger than the histogram below it.
color_primary = "#E24A33" color_secondary = "#202020" for n in range(1960, 2023): [...] plt.style.use("bigfoot.mplstyle") fig, (ax0, ax1) = plt.subplots(2, 1, height_ratios=[4, 1])
First draw the U.S. map onto ax0
. Just like when using pandas, you can call .plot()
on a GeoDataFrame.
for n in range(1960, 2023): [...] gdf.plot(ax=ax0, facecolor="#fdf2d9", edgecolor="black", linewidth=0.5)
Then plot a scatter
of sightings onto the map. Pass the alpha column to the alpha
parameter.
Take care of setting x- and y-axis window limits, which are in units of longitude and latitude respectively.
Use text
to display the current year somewhere north of the Great Lakes. Each plot will be one frame of an animated GIF so it will help to clearly label each year.
for n in range(1960, 2023): [...] ax0.scatter(df2['longitude'], df2['latitude'], color=color_primary, s=120, edgecolor="#444444", linewidth=0.5, alpha=df2['alpha']) ax0.set_xlim(-125.2, -65.7) ax0.set_ylim(24, 50) ax0.text(-79, 47, n, size=32, ha="center")
That completes the upper subplot. Now we need to create a histogram on ax1
.
The plan is for most of the histogram’s bars to match the scatter, color_primary
, but the current year’s bar will be set apart with color_secondary
. Create a new column to hold color data and pass it when calling bar()
.
Again take care of window limits, this time in regular units.
for n in range(1960, 2023): [...] df_histogram.loc[:, 'color'] = df_histogram['year'].apply(lambda x: color_secondary if x == n else color_primary) ax1.bar(df_histogram['year'], df_histogram['count'], color=df_histogram['color']) ax1.set_xticks(range(1960, 2030, 5)) ax1.set_xlim(1958.5, 2026.5) ax1.set_yticks(range(0, 300, 50))
It’s usually best to call set_axis_off()
when plotting maps. It hides borders, ticks, etc. surrounding the geospatial data.
Save each year’s image with its own unique filename. Remember we’re going to turn these frames into an animated GIF.
for n in range(1960, 2023): [...] ax0.set_axis_off() plt.savefig(f"frames/{n}.png")
3. The output.
I should mention that Matplotlib can create animations natively but I find them to be a little unwieldy. And anyone who wanted to see the output would need their own Python environment, etc.
For image editing I like to use GIMP. It’s essentially a free but slightly less feature-rich version of Photoshop. GIMP is a great option if you’re like me and wouldn’t know what to do with 90% of Photoshop’s features anyway. I’ll skip the tutorial and get to the results:
Notice how sightings appear and then fade over time. And the histogram’s highlighted bar moves with each passing year.
The Pacific Northwest seems to have been Bigfoot’s preferred habitat from the start. But he also spends plenty of time in East Texas, Florida, and Appalachia.
The famous Patterson-Gimlin film was taken in 1967. I chose to start the plot at 1960 so you can clearly see the ensuing rise in Bigfoot sightings.
The Patterson–Gimlin film (1967).
Reports then tailed off for another 15 years. My guess is that the 90s rise was related to the early internet’s growth. Online communities allowed people to come together, share stories, and create a sort of echo chamber. Not to mention the constant drumbeat of paranormal investigator shows on the History Channel, etc.
Then, like emo music, Bigfoot sightings peaked in the mid-2000s. To be fair, I can only plot the data I have. Maybe folks today are seeing Bigfoot so often that they’ve gotten tired of filing reports.
“And you suspect what?” Scully asked. “Bigfoot maybe?”
“Not likely,” Mulder answered deadpan. “That’s a lot of flannel to choke down. Even for Bigfoot.”
Scully sighed. She should have known better than to joke about Bigfoot to Mulder. Bigfoot wasn’t a joke to him.
—Darkness Falls (The X Files, No. 2)
Download shapefile & mplstyle.
Full code:
import pandas as pd import geopandas as gpd import matplotlib.pyplot as plt df = pd.read_csv("bfro_reports_geocoded.csv", parse_dates=["date"]) df.dropna(subset=["date", "latitude", "longitude"], inplace=True) df.loc[:, 'year'] = df['date'].dt.strftime("%Y").astype(int) df = df[df['year'] <= 2022] df = df[["year", "date", "latitude", "longitude"]] df_histogram = df['year'].value_counts().reset_index() df_histogram = df_histogram[df_histogram['year'] >= 1960] gdf = gpd.read_file("shapefile/cb_2018_us_state_20m.shp", epsg=4326) color_primary = "#E24A33" color_secondary = "#202020" for n in range(1960, 2023): df2 = df.copy() df2 = df2[(n - 5 <= df2['year']) & (df2['year'] <= n)] df2.loc[:, 'alpha'] = df2['year'].apply(lambda x: 1 - (n - x) * 0.18) plt.style.use("bigfoot.mplstyle") fig, (ax0, ax1) = plt.subplots(2, 1, height_ratios=[4, 1]) gdf.plot(ax=ax0, facecolor="#fdf2d9", edgecolor="black", linewidth=0.5) ax0.scatter(df2['longitude'], df2['latitude'], color=color_primary, s=120, edgecolor="#444444", linewidth=0.5, alpha=df2['alpha']) ax0.set_xlim(-125.2, -65.7) ax0.set_ylim(24, 50) ax0.text(-79, 47, n, size=32, ha="center") df_histogram.loc[:, 'color'] = df_histogram['year'].apply(lambda x: color_secondary if x == n else color_primary) ax1.bar(df_histogram['year'], df_histogram['count'], color=df_histogram['color']) ax1.set_xticks(range(1960, 2030, 5)) ax1.set_xlim(1958.5, 2026.5) ax1.set_yticks(range(0, 300, 50)) ax0.set_axis_off() plt.savefig(f"frames/{n}.png")