Who was in the room on July 4, 1776?

With Independence Day coming this weekend I thought I’d take a closer look at the men who signed the Declaration of Independence. How old were they? Where were they from? What did they do? What did their families look like? I found a great dataset at archives.gov to help answer those questions.


1. The Signers’ ages.

Begin with the imports. As usual, we’ll primarily use pandas to process the data. Seaborn and GeoPandas generate the plots, and both use Matplotlib on the backend. One module I haven’t previously used on the blog is GeoPy. It can interface with Google’s Maps API and convert city-state-country information into latitude-longitude pairs.

import pandas as pd
from geopy.geocoders import GoogleV3
from numpy import timedelta64
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from collections import Counter

Read the dataset into a dataframe and convert dates to Timestamp objects. A few of the dates are approximations so we can’t use pd.to_datetime on whole columns. We’ll write our own function and assume those dates fall in the middle of the year.

def parse_circa_dates(text):
    if "c." in text:
        return pd.Timestamp(f'July 1 {text.strip("c.")}')
    else:
        return pd.Timestamp(text)


df = pd.read_csv("declaration_signers.csv")

# Plain column assignment lets pandas store the parsed values with a datetime dtype.
df["birth_date"] = df["birth_date"].apply(parse_circa_dates)
df["death_date"] = df["death_date"].apply(parse_circa_dates)
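
One caveat about str.strip("c."): it removes any leading or trailing "c" and "." characters rather than the literal prefix "c.". That is harmless for a string like "c.1730", but a regular expression makes the intent explicit. The sketch below is one possible variant, not the version used for this post. The name parse_circa_dates_re and the sample strings are made up for illustration and are not taken from the dataset.

```python
import re

import pandas as pd

# Matches an approximate year such as "c.1730" or "c. 1730".
CIRCA_PATTERN = re.compile(r"^\s*c\.\s*(\d{4})\s*$")


def parse_circa_dates_re(text):
    """Return a Timestamp; approximate ("c.") years are pinned to July 1."""
    match = CIRCA_PATTERN.match(text)
    if match:
        return pd.Timestamp(year=int(match.group(1)), month=7, day=1)
    return pd.Timestamp(text)


# Hypothetical inputs, for illustration only.
samples = ["c.1730", "c. 1716", "April 13, 1743"]
parsed = [parse_circa_dates_re(s) for s in samples]
print(parsed)
```

Anything that does not look like a circa year falls through to pd.Timestamp, so exact dates are parsed the same way as before.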

Now that birth and death columns are in datetime form, we can calculate everyone’s age at the time they signed the Declaration of Independence. I want the resulting histograms to have units of years on the x-axis so each value is divided by a numpy timedelta64. We’ll use the official Old Glory colors according to the US State Department’s style guide.

def plot_age_histogram(dframe, filename):
    sns.set(font="Ubuntu Condensed", font_scale=1.4)
    fig, ax = plt.subplots(1, 2, figsize=(15, 6))    # 1 row, 2 columns
    fig.subplots_adjust(left=0.053, right=0.978, bottom=0.13, top=0.935)

    year_hist = sns.histplot(x=dframe["birth_year"], bins=range(1710, 1755, 5),
                             color="#b31942", alpha=1.0, ax=ax[0])
    year_hist.set_xticks(range(1710, 1755, 5))
    year_hist.set_yticks(range(0, 14, 2))
    year_hist.set_ylim(0, 12.5)
    year_hist.set_title("Birth Year", size=20)
    year_hist.set_xlabel("Year", labelpad=9)
    year_hist.set_ylabel("Count", labelpad=8)

    age_hist = sns.histplot(x=dframe["age_at_signing"], bins=range(25, 80, 5),
                            color="#0a3161", alpha=1.0, ax=ax[1])
    age_hist.set_xticks(range(25, 80, 5))
    age_hist.set_yticks(range(0, 14, 2))
    age_hist.set_ylim(0, 12.5)
    age_hist.set_title("Age on July 4, 1776", size=20)
    age_hist.set_xlabel("Age (Years)", labelpad=9)
    age_hist.set_ylabel("Count", labelpad=8)

    plt.savefig(filename)
    return


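df.loc[:, "birth_year"] = df["birth_date"].apply(lambda x: x.year)
# "Y" is not a fixed-length unit and recent pandas versions may reject it, so
# count days and divide by the mean Gregorian year (365.2425 days).
df.loc[:, "age_at_signing"] = df["birth_date"].apply(
    lambda x: (pd.Timestamp("July 4 1776") - x) / timedelta64(1, "D") / 365.2425)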
plot_age_histogram(df, "age_histogram_1x2.png")

The output:


2. The Signers’ birth places.

Next I want to plot the signers’ birth places on a US map. To do that we’ll need to convert plain-English addresses into latitude-longitude data that GeoPandas can understand. This process is called “geocoding.” There are lots of APIs available to do this—some less expensive than others. Here I’ll interface with Google’s free-tier Maps API.

Note that if you want to recreate this plot you’ll have to register for the service and generate your own API key. Alternatively, you can geocode using other free services like GeoNames or Nominatim. You can find examples on GeoPy’s GitHub.

The dataset contains separate city and country columns so I’ll concatenate those columns and place the resulting lookup_address into a new column. This is the information that will be sent to Google’s API. It’s a little tricky because some rows in the city column are empty, so first replace any NaN values with empty strings.

def geocode(address):
    api_key = "XXXXXXXXXXXXXXXXX"
    geo = GoogleV3(api_key=api_key)
    loc = geo.geocode(address)
    lat, lon = loc.latitude, loc.longitude
    return (lat, lon)


df.loc[:, "lookup_address"] = df["birth_city"].fillna("").astype(str) + \
                              " " + df["birth_country"].fillna("").astype(str)
df.loc[:, "lat_lon"] = df["lookup_address"].apply(geocode)
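
Two things are worth guarding against in a lookup like this: GeoPy’s geocode() returns None when it finds no match, which would make loc.latitude raise an error, and repeated addresses trigger repeated API requests. The sketch below shows one way to handle both. To keep it runnable offline it uses a stand-in lookup function instead of a real geocoder, so fake_lookup, safe_geocode, the "USA" country value, and the approximate coordinates are all assumptions for illustration.

```python
from functools import lru_cache

import pandas as pd

# Stand-in for a real geocoding service so the example runs offline.
# Coordinates are approximate and only here for illustration.
FAKE_RESULTS = {
    "Boston USA": (42.36, -71.06),
    "Philadelphia USA": (39.95, -75.17),
}


def fake_lookup(address):
    """Hypothetical replacement for geo.geocode(); returns None on a miss."""
    return FAKE_RESULTS.get(address)


@lru_cache(maxsize=None)
def safe_geocode(address):
    """Geocode one address, caching results and tolerating misses."""
    result = fake_lookup(address)
    if result is None:
        return (None, None)
    return result


towns = pd.DataFrame({
    "birth_city": ["Boston", None, "Boston", "Philadelphia"],
    "birth_country": ["USA", "Atlantis", "USA", "USA"],
})
# Same concatenation as above; str.strip() drops the stray space left by an empty city.
towns["lookup_address"] = (towns["birth_city"].fillna("") + " "
                           + towns["birth_country"].fillna("")).str.strip()
towns["lat_lon"] = towns["lookup_address"].apply(safe_geocode)
print(towns[["lookup_address", "lat_lon"]])
print(safe_geocode.cache_info())
```

With a real service, the body of fake_lookup would be replaced by the geocoder call, and rows that come back as (None, None) would need to be dropped or fixed by hand before plotting.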

With a lat_lon column in hand it’s time to plot these locations on a US map shapefile. There’s a lot going on here and it might help to read my previous post about GeoPandas. In short, a shapefile is a format for storing geographic data. We can plot a birthplace scatter plot on top of a US map shapefile and get a nice look at the data.

The main real-world caveat is that state (colony) boundaries looked somewhat different in 1776. Vermont didn’t yet exist independently, for example. We’ll also limit this plot to the eastern US, which omits the 8 signers (out of 56) who were born in Europe.

def plot_birth_places(dframe, filename):
    # read_file() has no epsg argument; reproject to WGS 84 explicitly instead.
    gdf = gpd.read_file("shapefile/cb_2018_us_state_20m.shp").to_crs(epsg=4326)
    fig, ax = plt.subplots(figsize=(5.25, 7.5))
    fig.subplots_adjust(left=0.0, right=1.0, bottom=0.015, top=0.943)
    us_map = gdf.plot(ax=ax, color="#fdf2d9", edgecolor="black", linewidth=0.7)
    lat, lon = zip(*dframe["lat_lon"].tolist())
    us_map.scatter(lon, lat, color="#b31942", s=60, alpha=0.6)
    us_map.set_xlim(-83.8, -66.9)
    us_map.set_ylim(31.5, 48)
    us_map.set_title("D.O.I. Signers' Birth Places", size=22)
    us_map.annotate(text="Not Pictured:\nEngland (2)\nIreland (2)\nScotland (2)\nNorthern Ireland (1)\nWales (1)",
                    xy=(-73, 36), size=11)
    us_map.annotate(text="Source:  https://www.archives.gov/founding-docs/signers-factsheet",
                    xy=(-67, 31.5), size=8, ha="right")
    us_map.set_axis_off()
    plt.savefig(filename, facecolor="#c2efff", dpi=300)
    return


plot_birth_places(df, "signers_birth_places.png")

The output:


3. Discussion.

I want to shine some light on just a couple more corners of the data. I’ll resist the urge to create more histograms.

I wondered how many of the 56 signers didn’t live to see the end of the Revolutionary War. Remember that 7 full years passed between the Declaration of Independence and the Treaty of Paris, signed on September 3, 1783, in which Great Britain finally recognized American independence.

died = df[df["death_date"] < pd.Timestamp("September 3, 1783")].shape[0]
print(f"{died}/{df.shape[0]} signers died before the end of the Revolutionary War.\n")

# 9/56 signers died before the end of the Revolutionary War.

The above code creates a separate dataframe by filtering out all the signers who were still alive, then checks how many rows remain. Another approach would be to create a new column of boolean values that indicate whether the signer died by the cutoff date and count how many True values exist.
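
The boolean-column approach mentioned above could look something like this. It is a minimal sketch on a tiny made-up dataframe: the names and death dates are hypothetical, not rows from the dataset, and died_before_war_end is just a column name I picked.

```python
import pandas as pd

# Hypothetical death dates for illustration; not rows from the dataset.
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "death_date": pd.to_datetime(
        ["1777-05-19", "1790-04-17", "1781-02-28", "1826-07-04"]),
})

war_end = pd.Timestamp("September 3, 1783")

# True where the person died before the cutoff date.
demo["died_before_war_end"] = demo["death_date"] < war_end

# Summing a boolean column counts the True values.
died = demo["died_before_war_end"].sum()
print(f"{died}/{len(demo)} signers died before the end of the Revolutionary War.")
```

The advantage of keeping the column around is that it can be reused later, for example to compare the occupations of the two groups.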

Next let’s take inventory of the signers’ occupations. Many of them had multiple occupations so what’s the best way to dump all those comma-separated strings into a flat list? Let’s change out of our official Pandas Ambassador™ uniforms for a moment.

occupations = df["occupation"].str.split(",").tolist()
all_occupations = [job.strip() for item in occupations for job in item]

After this list comprehension we have a regular Python list, not a pandas Series. Instead of using the value_counts Series method we can accomplish the same thing using collections.Counter from the standard library. Its most_common method sorts values in descending order.

for item in Counter(all_occupations).most_common():
    print(f"{item[1]:>2} {item[0]}")

The output:

25 Lawyer
17 Merchant
14 Plantation Owner
 4 Physician
 3 Scientist
 2 Land Speculator
 2 Minister
 2 Farmer
 1 Surveyer
 1 Printer
 1 Land owner
 1 Musician
 1 Military Officer

As you can see there was never a shortage of lawyers in politics.
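
For comparison, the same tally can stay entirely inside pandas by chaining str.split, explode, and value_counts. This is a sketch on a few made-up occupation strings in the same comma-separated format, not the real column.

```python
import pandas as pd

# Hypothetical occupation strings in the same comma-separated format.
jobs = pd.Series(["Lawyer, Merchant", "Physician", "Lawyer", "Merchant, Lawyer"])

counts = (jobs.str.split(",")   # each row becomes a list of jobs
              .explode()        # one job per row
              .str.strip()      # drop the space left after each comma
              .value_counts())  # tally, sorted in descending order
print(counts)
```

Which version to prefer is mostly a matter of taste. Counter works on any iterable, while the pandas chain keeps the result as a Series that is easy to plot or merge.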

Finally, let’s check the number of kids, marriages, and the median signer lifespan. Pandas makes it very easy to calculate these column-wise descriptive statistics.

avg_kids = df["children"].mean()

avg_marriages = df["marriages"].mean()

# Count days, then divide by the mean Gregorian year (365.2425 days).
df.loc[:, "lifespan"] = (df["death_date"] - df["birth_date"]) / timedelta64(1, "D") / 365.2425
median_lifetime = df["lifespan"].median()

print(f"\nThe signers had an average of {avg_marriages:.2f} marriages and {avg_kids:.2f} kids.")
print(f"\nThe median lifetime was {median_lifetime:.2f} years.")

# The signers had an average of 1.27 marriages and 6.13 kids.
# The median lifetime was 65.27 years.
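
Because a calendar year is not a fixed number of days, it helps to be explicit about what “one year” means when converting a time difference to years. The sketch below assumes the mean Gregorian year of 365.2425 days and uses Benjamin Franklin’s commonly cited dates (born January 17, 1706, died April 17, 1790) purely as an example. The values are not read from the dataset, and ONE_YEAR is a name I introduced.

```python
import pandas as pd

# Assumption: one year is the mean Gregorian year of 365.2425 days.
ONE_YEAR = pd.Timedelta(days=365.2425)

# Benjamin Franklin's commonly cited dates, used here only as an example.
birth = pd.Timestamp("January 17, 1706")
death = pd.Timestamp("April 17, 1790")
signing = pd.Timestamp("July 4, 1776")

# Dividing one Timedelta by another gives a plain float.
age_at_signing = (signing - birth) / ONE_YEAR
lifespan = (death - birth) / ONE_YEAR

print(f"Age at signing: {age_at_signing:.2f} years")
print(f"Lifespan: {lifespan:.2f} years")
```

The same division works on a whole column of time differences, so it can replace a per-row calculation.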

I hope you learned something new about the signers. Happy 4th of July!


Source: www.archives.gov

Download the data.

Full code:

import pandas as pd
from geopy.geocoders import GoogleV3
from numpy import timedelta64
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
from collections import Counter


def parse_circa_dates(text):
    if "c." in text:
        return pd.Timestamp(f'July 1 {text.strip("c.")}')
    else:
        return pd.Timestamp(text)


def plot_age_histogram(dframe, filename):
    sns.set(font="Ubuntu Condensed", font_scale=1.4)
    fig, ax = plt.subplots(1, 2, figsize=(15, 6))    # 1 row, 2 columns
    fig.subplots_adjust(left=0.053, right=0.978, bottom=0.13, top=0.935)

    year_hist = sns.histplot(x=dframe["birth_year"], bins=range(1710, 1755, 5),
                             color="#b31942", alpha=1.0, ax=ax[0])
    year_hist.set_xticks(range(1710, 1755, 5))
    year_hist.set_yticks(range(0, 14, 2))
    year_hist.set_ylim(0, 12.5)
    year_hist.set_title("Birth Year", size=20)
    year_hist.set_xlabel("Year", labelpad=9)
    year_hist.set_ylabel("Count", labelpad=8)
    
    age_hist = sns.histplot(x=dframe["age_at_signing"], bins=range(25, 80, 5),
                            color="#0a3161", alpha=1.0, ax=ax[1])
    age_hist.set_xticks(range(25, 80, 5))
    age_hist.set_yticks(range(0, 14, 2))
    age_hist.set_ylim(0, 12.5)
    age_hist.set_title("Age on July 4, 1776", size=20)
    age_hist.set_xlabel("Age (Years)", labelpad=9)
    age_hist.set_ylabel("Count", labelpad=8)
    
    plt.savefig(filename)
    return


def geocode(address):
    api_key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    geo = GoogleV3(api_key=api_key)
    loc = geo.geocode(address)
    lat, lon = loc.latitude, loc.longitude
    return (lat, lon)


def plot_birth_places(dframe, filename):
    # read_file() has no epsg argument; reproject to WGS 84 explicitly instead.
    gdf = gpd.read_file("shapefile/cb_2018_us_state_20m.shp").to_crs(epsg=4326)
    fig, ax = plt.subplots(figsize=(5.25, 7.5))
    fig.subplots_adjust(left=0.0, right=1.0, bottom=0.015, top=0.943)
    us_map = gdf.plot(ax=ax, color="#fdf2d9", edgecolor="black", linewidth=0.7)
    lat, lon = zip(*dframe["lat_lon"].tolist())
    us_map.scatter(lon, lat, color="#b31942", s=60, alpha=0.6)
    us_map.set_xlim(-83.8, -66.9)
    us_map.set_ylim(31.5, 48)
    us_map.set_title("D.O.I. Signers' Birth Places", size=22)
    us_map.annotate(text="Not Pictured:\nEngland (2)\nIreland (2)\nScotland (2)\nNorthern Ireland (1)\nWales (1)",
                    xy=(-73, 36), size=11)
    us_map.annotate(text="Source:  https://www.archives.gov/founding-docs/signers-factsheet",
                    xy=(-67, 31.5), size=8, ha="right")
    us_map.set_axis_off()
    plt.savefig(filename, facecolor="#c2efff", dpi=300)
    return


df = pd.read_csv("declaration_signers.csv")

# Plain column assignment lets pandas store the parsed values with a datetime dtype.
df["birth_date"] = df["birth_date"].apply(parse_circa_dates)
df["death_date"] = df["death_date"].apply(parse_circa_dates)

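df.loc[:, "birth_year"] = df["birth_date"].apply(lambda x: x.year)
# "Y" is not a fixed-length unit and recent pandas versions may reject it, so
# count days and divide by the mean Gregorian year (365.2425 days).
df.loc[:, "age_at_signing"] = df["birth_date"].apply(
    lambda x: (pd.Timestamp("July 4 1776") - x) / timedelta64(1, "D") / 365.2425)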
plot_age_histogram(df, "age_histogram_1x2.png")

df.loc[:, "lookup_address"] = df["birth_city"].fillna("").astype(str) + " " + df["birth_country"].fillna("").astype(str)
df.loc[:, "lat_lon"] = df["lookup_address"].apply(geocode)
plot_birth_places(df, "signers_birth_places.png")

died = df[df["death_date"] < pd.Timestamp("September 3, 1783")].shape[0]
print(f"{died}/{df.shape[0]} signers died before the end of the Revolutionary War.\n")

occupations = df["occupation"].str.split(",").tolist()
all_occupations = [job.strip() for item in occupations for job in item]
for item in Counter(all_occupations).most_common():
    print(f"{item[1]:>2} {item[0]}")

avg_kids = df["children"].mean()
avg_marriages = df["marriages"].mean()
# Count days, then divide by the mean Gregorian year (365.2425 days).
df.loc[:, "lifespan"] = (df["death_date"] - df["birth_date"]) / timedelta64(1, "D") / 365.2425
median_lifetime = df["lifespan"].median()
print(f"\nThe signers had an average of {avg_marriages:.2f} marriages and {avg_kids:.2f} kids.")
print(f"\nThe median lifetime was {median_lifetime:.2f} years.")