
Wait, there’s a 91-year-old senator?

Yes, there is. In fact, when Republicans gained a majority in the chamber earlier this year, Sen. Chuck Grassley (R-Iowa) reclaimed his position as chairman of the Senate Judiciary Committee. When you spend a lot of time in Congress, you tend to grow into positions of leadership.

I’m not here to bash the elderly. I’m not even going to make an argument for term limits. But I think it’s a great time to take stock of Congress’ age and let the facts guide our path forward.

Grassley studying at the University of Northern Iowa. He would earn a master’s degree in 1956.

1. Prepare the data.

My plan is to calculate every Congress member’s age and create a “dual histogram.” We can combine both chambers, House and Senate, but Democrats and Republicans will be counted separately. One party’s histogram bars will extend upward and the other’s downward, making for easy comparison.

There’s an up-to-date, comprehensive dataset on GitHub that holds information about legislators dating back to 1789. Grab these two files and drop them into your project folder:

  • legislators-current.yaml
  • legislators-historical.yaml

A YAML file is very similar to a JSON file. Both are data serialization formats with nested key-value pairs. The entries are a little too long to copy and paste here, but open the files in a text editor and see for yourself. YAML is much easier to read than JSON.

We’re only interested in the current Congress, which began in January 2025, but we might as well load legislators-historical.yaml at the same time. Let’s write a function that reads a YAML file and returns a pandas DataFrame.

An efficient way to create a DataFrame is to pass a dictionary. We can step through each Congress member’s data, handle edge cases where birthday or party data is missing, and append values to the appropriate lists. When that’s completed, let’s create a dictionary and pass it to pd.DataFrame. Notice that every individual term of service becomes its own row. For example, John F. Kennedy served three terms in the House and two in the Senate, so our DataFrame will have five JFK rows. We repeat the process for both files and concat them into one large DataFrame. This is the data structure we’ll manipulate to create a histogram.
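Before the full function, here is a minimal sketch of the one-row-per-term idea on a single in-memory record. The legislator, dates, and party below are made up for illustration (they are not from the dataset); only the nesting of name, bio, and terms mirrors the YAML entries. I use .get() so a missing key becomes None, which plays the role of pd.NA here.

```python
# Hypothetical record shaped like one entry in the YAML files.
person = {
    "name": {"first": "Jane", "last": "Doe"},
    "bio": {"birthday": "1950-01-01"},
    "terms": [
        {"type": "rep", "start": "1991-01-03", "end": "1993-01-03", "party": "Democrat"},
        {"type": "rep", "start": "1993-01-05", "end": "1995-01-03", "party": "Democrat"},
        {"type": "rep", "start": "1995-01-04", "end": "1997-01-03", "party": "Democrat"},
        {"type": "sen", "start": "1997-01-07", "end": "2003-01-03", "party": "Democrat"},
        {"type": "sen", "start": "2003-01-07", "end": "2009-01-03"},  # party missing on purpose
    ],
}

# Person-level fields are repeated on every row; term-level fields vary.
rows = [
    {
        "last_name": person["name"]["last"],
        "birthday": person["bio"].get("birthday"),
        "chamber": term["type"],
        "party": term.get("party"),
    }
    for term in person["terms"]
]

print(len(rows))
print([row["chamber"] for row in rows])
```

Three House terms plus two Senate terms become five rows, and the last row carries None for party.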

import yaml
import pandas as pd


def yaml_to_df(filename):

    with open(filename, "r") as f:
        data = yaml.load(f, Loader=yaml.SafeLoader)

    last_name_column = []
    first_name_column = []
    birthday_column = []
    party_column = []
    chamber_column = []
    start_column = []
    end_column = []

    for person in data:

        last_name = person['name']['last']
        first_name = person['name']['first']

        if "birthday" in person['bio']:
            birthday = pd.Timestamp(person['bio']['birthday'])
        else:
            birthday = pd.NA

        for term in person['terms']:

            last_name_column.append(last_name)
            first_name_column.append(first_name)
            birthday_column.append(birthday)
            chamber_column.append(term['type'])
            start_column.append(pd.Timestamp(term['start']))
            end_column.append(pd.Timestamp(term['end']))

            if "party" in term:
                party_column.append(term['party'])
            else:
                party_column.append(pd.NA)

    return pd.DataFrame({'last_name': last_name_column,
                         'first_name': first_name_column,
                         'birthday': birthday_column,
                         'party': party_column,
                         'chamber': chamber_column,
                         'start': start_column,
                         'end': end_column})


df = pd.concat([yaml_to_df("legislators-historical.yaml"),
                yaml_to_df("legislators-current.yaml")]
               )

df.head() looks like this:

  last_name  first_name   birthday                party chamber      start        end
0   Bassett     Richard 1745-04-02  Anti-Administration     sen 1789-03-04 1793-03-03
1     Bland  Theodorick 1742-03-21                  NaN     rep 1789-03-04 1791-03-03
2     Burke     Aedanus 1743-06-16                  NaN     rep 1789-03-04 1791-03-03
3   Carroll      Daniel 1730-07-22                  NaN     rep 1789-03-04 1791-03-03
4    Clymer      George 1739-03-16                  NaN     rep 1789-03-04 1791-03-03

I wonder how popular the name “Aedanus” is these days…

You can see that many early elected officials had no party affiliation. That arrangement wouldn’t last long, of course. Since we’re only worried about the 2025 Congress, where everything is partisan and recordkeeping is more robust, we can safely drop all rows where party or birthday data is missing.

df = df.dropna(subset=['birthday', 'party'])

This takes us from 45,514 down to 43,931 rows, a loss of about 3.5%.
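If the subset argument is new to you, here is a small sketch on made-up rows. A row is dropped when either listed column is missing, and the last two lines redo the percentage arithmetic with the row counts quoted above.

```python
import pandas as pd

# Made-up rows; None marks a missing value.
toy = pd.DataFrame({
    "last_name": ["A", "B", "C", "D"],
    "birthday": [pd.Timestamp("1950-01-01"), None,
                 pd.Timestamp("1960-06-15"), pd.Timestamp("1970-03-09")],
    "party": ["Democrat", "Republican", None, "Republican"],
})

# B has no birthday and C has no party, so both rows go.
cleaned = toy.dropna(subset=["birthday", "party"])
print(list(cleaned["last_name"]))

# Share of rows lost in the real dataset, using the counts from the text.
loss = (45_514 - 43_931) / 45_514
print(f"{loss:.1%}")
```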

To identify currently serving members, we keep rows whose end date falls in the future. We can’t filter on start date because Senate terms span six years, so two-thirds of sitting senators began their current term before 2025.

today = pd.Timestamp("April 1 2025")

df = df[df['end'] > today].copy()  # .copy() avoids a SettingWithCopyWarning when we add columns below
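Here is a quick sketch of why the end date is the right filter. The three (start, end) pairs are simplified stand-ins for the three staggered Senate classes, not rows pulled from the dataset.

```python
from datetime import date

today = date(2025, 4, 1)

# Simplified (start, end) pairs, one per staggered Senate class.
senate_terms = [
    (date(2021, 1, 3), date(2027, 1, 3)),
    (date(2023, 1, 3), date(2029, 1, 3)),
    (date(2025, 1, 3), date(2031, 1, 3)),
]

serving_by_end = [t for t in senate_terms if t[1] > today]
serving_by_start = [t for t in senate_terms if t[0] >= date(2025, 1, 1)]

print(len(serving_by_end), len(serving_by_start))
```

Filtering on end date keeps all three classes. Filtering on a 2025 start date keeps only one.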

Let’s do a sanity check and see how many House and Senate members remain.

print(df['chamber'].value_counts())

The output:

chamber
rep    436
sen    100
Name: chamber, dtype: int64

This looks correct. There are 435 full House members but also a few non-voting delegates and vacant seats. We could drop delegate rows but I think it makes sense to include them.

If you aren’t familiar, a histogram divides the data range into a series of “bins” and counts how many values fall into each bin. For example, if a dozen members were between 70.0 and 70.99 years old, the 70 bar would be 12 units tall. The idea is to visualize a distribution of values.
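As a rough sketch of that counting step, here are a handful of made-up ages binned with nothing but the standard library:

```python
from collections import Counter
from math import floor

ages = [70.0, 70.4, 70.99, 71.2, 45.7]  # made-up ages in years

# Each age lands in the bin named after its whole-number part.
counts = Counter(floor(age) for age in ages)

print(counts[70], counts[71], counts[45])
```

Three of the five values fall between 70.0 and 70.99, so the 70 bar would be three units tall.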

Let’s create a column that measures each member’s age in years. We should use the dt accessor to avoid doing an apply, which would be less efficient.

df.loc[:, 'current_age'] = (today - df['birthday']).dt.days / 365.25
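Dividing by 365.25 is an approximation, so it’s worth knowing where it can slip. The sketch below compares it with an exact calendar age; both helper functions and both birthdays are my own illustration, not part of the original code. The two agree for a typical birthday, but a member whose birthday falls exactly on the reference date can land one bin low.

```python
from datetime import date


def approx_age(birthday, today):
    """Age in fractional years, the same arithmetic as the pandas one-liner."""
    return (today - birthday).days / 365.25


def exact_age(birthday, today):
    """Whole years completed, based on the calendar."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


today = date(2025, 4, 1)

typical = date(1958, 10, 13)
print(int(approx_age(typical, today)), exact_age(typical, today))

# Edge case: the 65th birthday is today, but days / 365.25 is just under 65.
on_birthday = date(1960, 4, 1)
print(int(approx_age(on_birthday, today)), exact_age(on_birthday, today))
```

For a plot with one-year bins this rarely matters, but it explains why a member can occasionally appear one year younger than expected.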

Then make another column that bins the values. The easiest approach is to use NumPy’s floor function, which rounds down to the nearest whole number (for positive values, that’s the same as truncating at the decimal point). Remember to add 0.5 because we want each bar centered between two whole numbers.

from numpy import floor

df.loc[:, 'bin'] = floor(df['current_age']) + 0.5

Check out the new columns with df.head():

        last_name first_name   birthday        party chamber      start        end  current_age   bin
42662    Cantwell      Maria 1958-10-13     Democrat     sen 2025-01-03 2031-01-03    66.472279  66.5
42666   Klobuchar        Amy 1960-05-25     Democrat     sen 2025-01-03 2031-01-03    64.856947  64.5
42678     Sanders    Bernard 1941-09-08  Independent     sen 2025-01-03 2031-01-03    83.567420  83.5
42682  Whitehouse    Sheldon 1955-10-20     Democrat     sen 2025-01-03 2031-01-03    69.453799  69.5
42686    Barrasso       John 1952-07-21   Republican     sen 2025-01-03 2031-01-03    72.700890  72.5

This looks correct to me. A value between 66.0 and 66.99 goes into the 66 bin, which is centered at x=66.5.

Before moving to Matplotlib, let’s grab average ages for both Democratic and Republican members. We’ll display the values on a legend. There are currently three Independent Congress members but all three caucus with Democrats, so we can simply lump them in. If we were plotting older data we would have to be more careful.

To calculate party averages, create two separate views of df that are filtered according to party. Then call the mean method.

mean_age_dem = df[df['party'].isin(['Democrat', 'Independent'])]['current_age'].mean()
mean_age_repub = df[df['party'] == 'Republican']['current_age'].mean()

2. Plot the data.

I’ll use a custom Matplotlib style that will be linked at the bottom of this post. Set the style and create an Axes instance for plotting.

import matplotlib.pyplot as plt

plt.style.use("wollen_congress.mplstyle")

fig, ax = plt.subplots()

In a lot of situations, the best strategy to create a histogram is to call df.hist(). But we want to create a dual histogram so we’ll have to do some additional work.

We have a few options but I think the best approach is to call value_counts() on the bin column, which tallies how many members fall into each bin. Then call reset_index() to convert the tally into a DataFrame. We’ll do this twice, first for Democrats and then for Republicans.

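# Name the columns explicitly. pandas 2.0 changed the default names produced by
# value_counts().reset_index(), so relying on them can raise a KeyError.
df_dem_histo = (df[df['party'].isin(['Democrat', 'Independent'])]['bin']
                .value_counts()
                .rename_axis('index')
                .reset_index(name='bin'))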

df_dem_histo.head() is shown below. The first row tells us that ten Democrats are between 70.0 and 70.99 years old.

This is all the information we need for plotting. index locates bars along the x-axis and bin tells us their height.

  index  bin
0  70.5   10
1  55.5    9
2  48.5    9
3  61.5    8
4  58.5    8
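One caveat: the column names that reset_index() hands back depend on your pandas version (pandas 2.0 changed them), so index and bin may not be what you see. Here is a sketch on made-up bin values that pins the names explicitly, which should give the same two columns either way.

```python
import pandas as pd

bins = pd.Series([70.5, 70.5, 55.5, 70.5, 55.5, 48.5], name="bin")  # made-up values

tally = (bins.value_counts()          # counts per bin, largest first
             .rename_axis("index")    # name the bin centers "index"
             .reset_index(name="bin"))  # name the counts "bin"

print(list(tally.columns))
print(tally.loc[0, "index"], tally.loc[0, "bin"])
```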

Now call bar to create the upper (Democratic) histogram. Pass columns from the DataFrame we just created. Specify color and size. zorder is set to 2 because later we’ll do some shading and we need to layer the plot correctly. This is where we use mean_age_dem from earlier. It will appear in a legend in the corner of the plot.

ax.bar(x=df_dem_histo['index'],
       height=df_dem_histo['bin'],
       color="#0270CF",
       width=0.8,
       zorder=2,
       label=f"Democrat • mean={mean_age_dem:.1f}"
       )

Repeat the process for Republicans. The only difference is that the height column is multiplied by -1. That’s because the Republican histogram is oriented downward.

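# Same explicit column naming as the Democratic tally, for pandas 2.0+ compatibility.
df_repub_histo = (df[df['party'] == 'Republican']['bin']
                  .value_counts()
                  .rename_axis('index')
                  .reset_index(name='bin'))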

ax.bar(x=df_repub_histo['index'],
       height=df_repub_histo['bin'] * -1,
       color="#E41B22",
       width=0.8,
       zorder=2,
       label=f"Republican • mean={mean_age_repub:.1f}"
       )
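If you want to try the up-and-down trick in isolation, here is a stripped-down sketch with made-up counts. The bin centers, counts, and tick range are invented for the example, and I select the Agg backend only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt

centers = [40.5, 41.5, 42.5]  # made-up bin centers
up_counts = [3, 5, 2]         # drawn above y=0
down_counts = [4, 1, 6]       # drawn below y=0

fig, ax = plt.subplots()
upper = ax.bar(centers, up_counts, width=0.8, color="#0270CF")
lower = ax.bar(centers, [-c for c in down_counts], width=0.8, color="#E41B22")

# Negative heights are only a drawing device, so label ticks with absolute values.
y_ticks = list(range(-6, 9, 3))
ax.set_yticks(y_ticks, labels=[abs(n) for n in y_ticks])

print([bar.get_height() for bar in lower])
```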

We’ll assign x- and y-axis values to variables so we can use them later to specify coordinates. y-tick labels are passed through the absolute value function (abs) because otherwise they would appear as negative numbers, which wouldn’t make sense. Republican bars are oriented downward but they still represent positive numbers.

x_ticks = range(25, 100, 5)
ax.set_xticks(x_ticks)
x_tick_range = x_ticks[-1] - x_ticks[0]
x_left, x_right = x_ticks[0] - x_tick_range * 0.02, x_ticks[-1] + x_tick_range * 0.02
ax.set_xlim(x_left, x_right)

y_ticks = range(-15, 20, 5)
ax.set_yticks(y_ticks, labels=[abs(n) for n in y_ticks])
y_tick_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0], y_ticks[-1] + y_tick_range * 0.005
ax.set_ylim(y_bottom, y_top)
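To make the padding arithmetic concrete, here are the same expressions evaluated on their own, without Matplotlib:

```python
x_ticks = range(25, 100, 5)                       # 25, 30, ..., 95
x_tick_range = x_ticks[-1] - x_ticks[0]           # 95 - 25 = 70
x_left = x_ticks[0] - x_tick_range * 0.02         # 2% padding on the left
x_right = x_ticks[-1] + x_tick_range * 0.02       # 2% padding on the right

y_ticks = range(-15, 20, 5)                       # -15, -10, ..., 15
y_tick_range = y_ticks[-1] - y_ticks[0]           # 15 - (-15) = 30
y_top = y_ticks[-1] + y_tick_range * 0.005        # a sliver of headroom
y_labels = [abs(n) for n in y_ticks]

print(round(x_left, 1), round(x_right, 1), round(y_top, 2))
print(y_labels)
```

So the x-axis runs from about 23.6 to 96.4, and the y tick labels read 15 down to 0 and back up to 15.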

Let’s plot a dark gray line at y=0. I think it helps to communicate the symmetry of the visualization. zorder is set to 3 so the line will appear on the top layer, above the bars.

ax.plot([x_left, x_right], [0, 0], color="#555", linewidth=1.2, zorder=3)

I’d like to label each generation on the plot, from Silent to Gen Z. We can use ax.fill_between to create a shading effect below the data. Labels and coordinates are stored as a list of tuples. We use a for loop to step through the list and draw alternating light- and dark-gray zones. A bool named toggle is switched back and forth to alternate color.

fill_between requires three spatial arguments: x, lower y, and upper y. Everything along the x-range and between the y series gets shaded. I like to use a low alpha (very transparent) so grid lines are still visible behind the color. We again specify zorder to ensure it’s drawn on the bottom layer.

text places the generation’s name at the center of the x-range, near the bottom of the plot. The optional bbox parameter aids readability by drawing a background behind the text.

generations_list = [("Gen Z", x_left, 30),
                    ("Millennial", 30, 45),
                    ("Gen X", 45, 60),
                    ("Boomer", 60, 79),
                    ("Silent", 79, x_right)]

toggle = True

for name, left, right in generations_list:

    ax.fill_between(x=[left, right],
                    y1=[y_bottom, y_bottom],
                    y2=[y_top, y_top],
                    facecolors={True: "#BBB", False: "#444"}[toggle],
                    alpha=0.2,
                    zorder=1
                    )

    ax.text(x=(left + right) / 2,
            y=y_bottom + y_tick_range * 0.04,
            s=name,
            size=11,
            ha="center",
            bbox={"boxstyle": "Round", "facecolor": "white", "edgecolor": "#333", "linewidth": 0.5, "pad": 0.2}
            )

    toggle = not toggle
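The toggle works fine, but if you’d rather not manage a bool by hand, itertools.cycle does the same alternation. This is only a sketch of that alternative: it computes each band’s label position and shade without drawing anything, and the outer edges are hard-coded to the approximate axis limits.

```python
from itertools import cycle

bands = [("Gen Z", 23.6, 30),
         ("Millennial", 30, 45),
         ("Gen X", 45, 60),
         ("Boomer", 60, 79),
         ("Silent", 79, 96.4)]

shades = cycle(["#BBB", "#444"])  # light, dark, light, ...

# zip stops when the shorter iterable (bands) runs out.
plan = [(name, (left + right) / 2, shade)
        for (name, left, right), shade in zip(bands, shades)]

for name, center, shade in plan:
    print(f"{name:<10} label at x={center:.1f}  shade={shade}")
```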

Call text one more time to cite the GitHub dataset. Place this text in the upper-left corner with a left/top alignment. This is why we saved x_tick_range and y_tick_range as variables. It makes it easy to specify coordinates without hard-coding values.

ax.text(x=x_ticks[0] + x_tick_range * 0.005,
        y=y_ticks[-1] - y_tick_range * 0.006,
        s="Data: https://github.com/unitedstates/congress-legislators",
        size=8,
        color="#0F0F0F",
        ha="left",
        va="top"
        )

Finally, place a legend in the upper-right corner, set axis labels, and save the figure.

ax.legend(loc="upper right")

ax.set_xlabel("Current Age")
ax.set_ylabel("Count")
ax.set_title("Current US House & Senate  •  April 1, 2025")

plt.savefig("congress_age.png", dpi=300)

3. The output.

Democrats have more 80+ members but Republicans can claim the oldest in Sen. Grassley. Democrats have significantly more under-40 members, including the only congressional Zoomer. The average ages are less than a year apart. In other words, it’s difficult to make this a strong partisan issue, which may be why it’s not weaponized more often in Washington.

I didn’t realize how many 70+ Congress members there were until making this plot. I’m eternally grateful to have two gracefully aging Baby Boomer parents, but I’m not sure how I’d feel if they were leading the country.

But like I said earlier, I’m not interested in arguing for term limits. People vote to be represented this way and it’s up to them to change the status quo. I hope this post helps to communicate the facts.


4. Bonus plot.

I’m not going to post the code but I couldn’t resist plotting a moving average beginning with the first Congress in 1789. For this, I separated House and Senate members and threw in a cheesy image of the US Capitol building.

So Congress is about 13 years older than it was 236 years ago. Some of that aging is reasonable given longer lifespans and improved health care, especially for the elderly. But the sharp increase beginning in the early 1980s is difficult to hand-wave away. Has it leveled off? Maybe, but we did recently elect the oldest presidential candidate in history, so the anti-youth wave hasn’t crashed yet.


Download the Matplotlib style.

Full code:

import yaml
import pandas as pd
from numpy import floor
import matplotlib.pyplot as plt


def yaml_to_df(filename):

    with open(filename, "r") as f:
        data = yaml.load(f, Loader=yaml.SafeLoader)

    last_name_column = []
    first_name_column = []
    birthday_column = []
    party_column = []
    chamber_column = []
    start_column = []
    end_column = []

    for person in data:

        last_name = person['name']['last']
        first_name = person['name']['first']

        if "birthday" in person['bio']:
            birthday = pd.Timestamp(person['bio']['birthday'])
        else:
            birthday = pd.NA

        for term in person['terms']:

            last_name_column.append(last_name)
            first_name_column.append(first_name)
            birthday_column.append(birthday)
            chamber_column.append(term['type'])
            start_column.append(pd.Timestamp(term['start']))
            end_column.append(pd.Timestamp(term['end']))

            if "party" in term:
                party_column.append(term['party'])
            else:
                party_column.append(pd.NA)

    return pd.DataFrame({'last_name': last_name_column,
                         'first_name': first_name_column,
                         'birthday': birthday_column,
                         'party': party_column,
                         'chamber': chamber_column,
                         'start': start_column,
                         'end': end_column})


df = pd.concat([yaml_to_df("legislators-historical.yaml"),
                yaml_to_df("legislators-current.yaml")]
               )

df = df.dropna(subset=['birthday', 'party'])

today = pd.Timestamp("April 1 2025")

df = df[df['end'] > today].copy()  # .copy() avoids a SettingWithCopyWarning when we add columns below

df.loc[:, 'current_age'] = (today - df['birthday']).dt.days / 365.25

df.loc[:, 'bin'] = floor(df['current_age']) + 0.5

mean_age_dem = df[df['party'].isin(['Democrat', 'Independent'])]['current_age'].mean()
mean_age_repub = df[df['party'] == 'Republican']['current_age'].mean()

plt.style.use("wollen_congress.mplstyle")

fig, ax = plt.subplots()

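# Name the columns explicitly. pandas 2.0 changed the default names produced by
# value_counts().reset_index(), so relying on them can raise a KeyError.
df_dem_histo = (df[df['party'].isin(['Democrat', 'Independent'])]['bin']
                .value_counts()
                .rename_axis('index')
                .reset_index(name='bin'))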

ax.bar(x=df_dem_histo['index'],
       height=df_dem_histo['bin'],
       color="#0270CF",
       width=0.8,
       zorder=2,
       label=f"Democrat • mean={mean_age_dem:.1f}"
       )

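# Same explicit column naming as the Democratic tally, for pandas 2.0+ compatibility.
df_repub_histo = (df[df['party'] == 'Republican']['bin']
                  .value_counts()
                  .rename_axis('index')
                  .reset_index(name='bin'))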

ax.bar(x=df_repub_histo['index'],
       height=df_repub_histo['bin'] * -1,
       color="#E41B22",
       width=0.8,
       zorder=2,
       label=f"Republican • mean={mean_age_repub:.1f}"
       )

x_ticks = range(25, 100, 5)
ax.set_xticks(x_ticks)
x_tick_range = x_ticks[-1] - x_ticks[0]
x_left, x_right = x_ticks[0] - x_tick_range * 0.02, x_ticks[-1] + x_tick_range * 0.02
ax.set_xlim(x_left, x_right)

y_ticks = range(-15, 20, 5)
ax.set_yticks(y_ticks, labels=[abs(n) for n in y_ticks])
y_tick_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0], y_ticks[-1] + y_tick_range * 0.005
ax.set_ylim(y_bottom, y_top)

ax.plot([x_left, x_right], [0, 0], color="#555", linewidth=1.2, zorder=3)

generations_list = [("Gen Z", x_left, 30),
                    ("Millennial", 30, 45),
                    ("Gen X", 45, 60),
                    ("Boomer", 60, 79),
                    ("Silent", 79, x_right)]

toggle = True

for name, left, right in generations_list:

    ax.fill_between(x=[left, right],
                    y1=[y_bottom, y_bottom],
                    y2=[y_top, y_top],
                    facecolors={True: "#BBB", False: "#444"}[toggle],
                    alpha=0.2,
                    zorder=1
                    )

    ax.text(x=(left + right) / 2,
            y=y_bottom + y_tick_range * 0.04,
            s=name,
            size=11,
            ha="center",
            bbox={"boxstyle": "Round", "facecolor": "white", "edgecolor": "#333", "linewidth": 0.5, "pad": 0.2}
            )

    toggle = not toggle

ax.text(x=x_ticks[0] + x_tick_range * 0.005,
        y=y_ticks[-1] - y_tick_range * 0.006,
        s="Data: https://github.com/unitedstates/congress-legislators",
        size=8,
        color="#0F0F0F",
        ha="left",
        va="top"
        )

ax.legend(loc="upper right")

ax.set_xlabel("Current Age")
ax.set_ylabel("Count")
ax.set_title("Current US House & Senate  •  April 1, 2025")

plt.savefig("congress_age.png", dpi=300)