Partisan growing pains
This post will dive a little deeper into American politics. I try to avoid it unless there’s something genuinely interesting to say about the data. In this case, I think there is!
We’ll look at the last decade of county-level population growth and how it correlates with 2024 election results. Do Democrats have a growth problem?
This analysis is interesting because it’s more forward-looking. We’ll run a regression on 2024 results and then think about how demographic trends could affect future elections. I’d like to register my thoughts before we get too far removed from 2024.
1. Prepare the data.
We have three spreadsheets—one for election results and two for census data. Everything is reported at the county level. We’re calculating population change from 2013 to 2023 (the most recently available year) so there are two Excel files from the Census Bureau. Everything is linked at the bottom of this post.
The plan is to create a column common to all three tables and then merge them into a single pandas DataFrame. From there, we’ll do a linear regression to see how strongly population growth is correlated with vote.
For the most part, county names are clean and conveniently match across the datasets. Election and map nerds tend to follow Census Bureau naming conventions. But we do have a few edge cases to deal with:
- The District of Columbia doesn’t have counties.
- Alaska doesn’t have counties either. They have boroughs but they report election results at the state legislative district level.
- Connecticut is complicated. They recently switched to “planning regions,” which don’t match the previously reported boundaries, so we can’t accurately measure change over time.
- Kalawao County, Hawaii exists in census files but the state doesn’t report election results for it.
We’ll simply filter out Alaska, Connecticut, D.C., and Kalawao County and move ahead.
We have to read two census Excel files so let’s write a function. There’s nothing complicated here so I won’t waste too many words going over it.
First exclude a few rows at the top and bottom of the spreadsheet and rename columns. Then we can parse county names, which are formatted as .[County], [State]. Our code separates county from state and then adds a new column, id, formatted [State]_[County]. We’ll create this column in all three files and use it to merge the data.
import pandas as pd


def get_census_df(filename, new_columns):
    df = pd.read_excel(filename)
    df = df[4:-6]
    df.columns = new_columns
    df[['county', 'state']] = df['census_name'].str.split(", ", expand=True)
    df.loc[:, 'county'] = df['county'].apply(lambda x: x[1:-7])
    df.loc[:, 'id'] = df['state'] + "_" + df['county']
    df = df[~df['state'].isin(["Alaska", "Connecticut", "District of Columbia"])]
    df = df[df['census_name'] != ".Kalawao County, Hawaii"]
    return df


df_2013 = get_census_df(filename="co-est2019-annres.xlsx",
                        new_columns=['census_name', 'census', 'base_estimate',
                                     'pop2010', 'pop2011', 'pop2012', 'pop2013',
                                     'pop2014', 'pop2015', 'pop2016', 'pop2017',
                                     'pop2018', 'pop2019'])
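To make the string slicing concrete, here is a minimal standalone sketch of the same parsing step applied to a single name. The helper name census_name_to_id is hypothetical and not part of the code above; it assumes every name starts with a period and ends in a seven-character suffix such as " County" or " Parish".

```python
def census_name_to_id(census_name):
    """Turn '.Autauga County, Alabama' into 'Alabama_Autauga'.

    Hypothetical helper for illustration; it mirrors the slicing used in
    get_census_df.
    """
    county, state = census_name.split(", ")
    # [1:-7] drops the leading "." and the last seven characters (" County").
    county = county[1:-7]
    return f"{state}_{county}"


print(census_name_to_id(".Autauga County, Alabama"))   # Alabama_Autauga
print(census_name_to_id(".Acadia Parish, Louisiana"))  # Louisiana_Acadia
```

Names with a shorter suffix get clipped by the fixed-width slice. The election file is sliced the same way, so the ids should still line up, but that is worth verifying rather than assuming.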
The Census Bureau files have yearly population estimates, despite canvassing once per decade. We only care about 2013 so we can filter out the extra columns.
df_2013 = df_2013[['id', 'pop2013']]
df_2013.head() looks like this:
                id   pop2013
4  Alabama_Autauga   54727.0
5  Alabama_Baldwin  194885.0
6  Alabama_Barbour   26937.0
7     Alabama_Bibb   22521.0
8   Alabama_Blount   57619.0
Repeat the process for the 2023 census file. The only differences are column names.
df_2023 = get_census_df(filename="co-est2023-pop.xlsx",
                        new_columns=['census_name', 'base_estimate', 'pop2020',
                                     'pop2021', 'pop2022', 'pop2023'])
df_2023 = df_2023[['id', 'pop2023']]
We have two DataFrames, one each for 2013 and 2023 census data. Now let’s read the 2024 election results file.
Like before, we need to filter out Alaska, Connecticut, and D.C. We also need to parse county names so they have an identical id column ([State]_[County]). The per_point_diff column measures percentage point difference between Republican and Democratic votes. It will eventually be our dependent variable along the y-axis.
df_elx = pd.read_csv("2024_US_County_Level_Presidential_Results.csv")
df_elx = df_elx[~df_elx['state_name'].isin(["Alaska", "Connecticut", "District of Columbia"])]
df_elx.loc[:, 'county'] = df_elx['county_name'].apply(lambda x: x[:-7])
df_elx.loc[:, 'id'] = df_elx['state_name'] + "_" + df_elx['county']
df_elx = df_elx[['id', 'per_point_diff']]
df_elx.head() is shown below. A positive per_point_diff indicates a Republican win.
                id  per_point_diff
0  Alabama_Autauga        0.462753
1  Alabama_Baldwin        0.581768
2  Alabama_Barbour        0.147274
3     Alabama_Bibb        0.644194
4   Alabama_Blount        0.810173
At this point, we have three DataFrames, each with 3,103 rows and identical counties in the id column. It’s time to merge them into a single DataFrame.
pandas.merge won’t accept a list of DataFrames, but you can easily chain together method calls. Specify that the common column is id and assign the output to a new variable, df.
df = df_2013.merge(df_2023, on="id").merge(df_elx, on="id")
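Before merging, it can be worth confirming that the id values really do match across tables. This is a small sketch using plain Python sets; the function name unmatched_ids and the toy id lists are made up for illustration.

```python
def unmatched_ids(*id_lists):
    """Return ids that are missing from at least one of the given lists."""
    id_sets = [set(ids) for ids in id_lists]
    in_any = set.union(*id_sets)
    in_all = set.intersection(*id_sets)
    return sorted(in_any - in_all)


# Toy ids for illustration only, not the real tables.
ids_2013 = ["Alabama_Autauga", "Alabama_Baldwin", "Alabama_Barbour"]
ids_2023 = ["Alabama_Autauga", "Alabama_Baldwin", "Alabama_Barbour"]
ids_elx = ["Alabama_Autauga", "Alabama_Baldwin", "Alabama_Bibb"]

print(unmatched_ids(ids_2013, ids_2023, ids_elx))
# ['Alabama_Barbour', 'Alabama_Bibb']
```

An empty list means the default inner merge keeps every row. With the DataFrames above, the call might look like unmatched_ids(df_2013['id'], df_2023['id'], df_elx['id']).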
df.head() looks like this:
                id   pop2013   pop2023  per_point_diff
0  Alabama_Autauga   54727.0   60342.0        0.462753
1  Alabama_Baldwin  194885.0  253507.0        0.581768
2  Alabama_Barbour   26937.0   24585.0        0.147274
3     Alabama_Bibb   22521.0   21868.0        0.644194
4   Alabama_Blount   57619.0   59816.0        0.810173
We can almost do a regression now. Our independent “predictor” variable is population growth so first we need to calculate percent change. Go ahead and re-order the columns so they’re easier to read.
df.loc[:, 'pct_change'] = (df['pop2023'] - df['pop2013']) / df['pop2013']
df = df[['id', 'pop2013', 'pop2023', 'pct_change', 'per_point_diff']]
Here’s one more look at df.head().
- x variable: pct_change of population
- y variable: per_point_diff of 2024 vote.
The first row is saying (roughly) that Autauga County, Alabama grew from 55K people to 60K people, which was a 10% change. They voted for Trump by a 46-point margin. The third and fourth rows show counties that declined in population.
                id   pop2013   pop2023  pct_change  per_point_diff
0  Alabama_Autauga   54727.0   60342.0    0.102600        0.462753
1  Alabama_Baldwin  194885.0  253507.0    0.300803        0.581768
2  Alabama_Barbour   26937.0   24585.0   -0.087315        0.147274
3     Alabama_Bibb   22521.0   21868.0   -0.028995        0.644194
4   Alabama_Blount   57619.0   59816.0    0.038130        0.810173
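As a quick sanity check, the percent-change arithmetic can be reproduced by hand for two of the rows above. The pct_change function here is just a standalone illustration of the column formula, using the Autauga and Barbour populations from the table.

```python
def pct_change(old, new):
    """Fractional change from old to new, e.g. 0.10 means +10%."""
    return (new - old) / old


autauga = pct_change(54727, 60342)
barbour = pct_change(26937, 24585)

print(f"{autauga:.6f}")  # 0.102600
print(f"{barbour:.6f}")  # -0.087315
```

A negative result simply means the 2023 estimate is below the 2013 estimate.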
We’re analyzing percent change of population so each county’s size, in absolute terms, doesn’t change the math. But it will improve our scatter plot to let dot size represent population. Larger dots mean larger counties. Let’s create a column for marker_size.
df.loc[:, 'marker_size'] = df['pop2023'] * 0.00015
We could go straight to the regression (and I would start there if this weren’t a blog post) but the effect is more interesting if we restrict our analysis to the top 10% of counties by population. Collectively, they account for two-thirds of the national vote. What happens there drives the overall trend.
We have 3,103 rows to start. Sort by 2023 population and truncate the DataFrame to 310 rows. That will leave us with the 10% most populous counties.
df = df.sort_values("pop2023", ascending=False)[:310]
Now we can pass the columns to scipy.stats.linregress.
from scipy.stats import linregress

slope, intercept, r_value, p_value, std_err = linregress(df['pct_change'], df['per_point_diff'])
We’ll want to plot the regression line on top of the scatter plot. slope and intercept correspond to m and b in the linear equation y=mx+b.
x_reg = [df['pct_change'].min(), df['pct_change'].max()]
y_reg = [n * slope + intercept for n in x_reg]
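For intuition about what the slope, intercept, and r value represent, here is a from-scratch sketch of ordinary least squares with one predictor. It is not a description of how SciPy computes them internally; the function name simple_linregress and the toy points are assumptions for illustration.

```python
from math import sqrt


def simple_linregress(xs, ys):
    """Least-squares slope, intercept, and Pearson r for paired samples."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    ss_yy = sum((y - y_mean) ** 2 for y in ys)
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = y_mean - slope * x_mean
    r_value = ss_xy / sqrt(ss_xx * ss_yy)
    return slope, intercept, r_value


# Toy points that lie exactly on y = 2x + 1.
slope, intercept, r_value = simple_linregress([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, r_value ** 2)  # 2.0 1.0 1.0
```

Because the toy points sit exactly on a line, R² comes out to 1. Real election data will land well below that.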
2. Plot the data.
I’ll use a custom Matplotlib style that will be linked at the bottom of this post. Start by creating an Axes instance for plotting (ax).
import matplotlib.pyplot as plt

plt.style.use("wollen_election.mplstyle")
fig, ax = plt.subplots()
Create a scatter plot of population growth and 2024 vote.
I’m using a purple color to avoid blue/red partisan association. The size of each dot is defined by the marker_size column we created above. I’m giving the dots some transparency (alpha) because there will be a lot of overlap. Set zorder to 2 because we’ll create multiple layers on the figure.
ax.scatter(x=df['pct_change'], y=df['per_point_diff'], color="#9885BF", s=df['marker_size'], edgecolor="#333", linewidth=0.5, alpha=0.7, zorder=2)
Next is the regression line. This is a simple call to ax.plot(). #333 is a dark gray color. zorder=3 puts this line directly on top of the scatter markers.
ax.plot(x_reg, y_reg, color="#333", zorder=3)
I like to include coordinate axes lines whenever possible. They make it easy to see what values are positive and negative. zorder=1 means the lines are drawn underneath the data.
ax.plot([-2, 2], [0, 0], color="#333", linewidth=0.5, zorder=1)
ax.plot([0, 0], [-2, 2], color="#333", linewidth=0.5, zorder=1)
Often I’ll use numpy.arange or numpy.linspace to get a list of evenly spaced float values. But due to round-off errors, they can add bugs when using the == operator. For example, arange might give me a tiny nonzero value (something like 5.6e-17) instead of 0.0, and Python will tell me the value doesn’t equal zero.
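The round-off problem is easy to reproduce without numpy. This is a minimal illustration using only the standard library; math.isclose is one common way to compare floats when exact equality is too strict.

```python
import math

# Three steps of 0.1 do not land exactly on 0.3 in binary floating point.
total = 0.1 * 3
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# Dividing integers by 10 keeps zero exact, so an == 0 check is safe.
ticks = [n / 10 for n in range(-2, 7)]
print(0.0 in ticks)              # True
```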
Along the horizontal axis, I want x-ticks to be a list of evenly spaced floats from -0.2 to 0.6. I’ll use a list comprehension instead of numpy. I want label strings to be the tick values multiplied by 100, so they’re displayed as signed percentages ranging from -20% to +60%.
Assign xlim values to variables because we’ll use them later to draw text on the plot.
x_ticks = [n / 10 for n in range(-2, 7)]
ax.set_xticks(x_ticks, labels=[f"{n * 100:+.0f}%" if n != 0 else "0" for n in x_ticks])
x_tick_range = x_ticks[-1] - x_ticks[0]
x_left, x_right = x_ticks[0] - x_tick_range * 0.03, x_ticks[-1] + x_tick_range * 0.03
ax.set_xlim(x_left, x_right)
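To see what those expressions produce without drawing anything, the tick labels and axis limits can be computed on their own. This sketch repeats the same list comprehension and padding arithmetic outside of Matplotlib and prints the results.

```python
x_ticks = [n / 10 for n in range(-2, 7)]

# "+" in the format spec forces a sign on positive numbers; zero is special-cased.
labels = [f"{n * 100:+.0f}%" if n != 0 else "0" for n in x_ticks]
print(labels)
# ['-20%', '-10%', '0', '+10%', '+20%', '+30%', '+40%', '+50%', '+60%']

# Pad each side by 3% of the tick span to leave room beyond the outermost ticks.
x_tick_range = x_ticks[-1] - x_ticks[0]
x_left = x_ticks[0] - x_tick_range * 0.03
x_right = x_ticks[-1] + x_tick_range * 0.03
print(round(x_left, 3), round(x_right, 3))  # -0.224 0.624
```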
The y-axis is similar but these ticks represent a partisan margin from D+80 to R+60. I’ve written a function to convert per_point_diff values to D-R margins.
def get_partisan_ticks(ticks):
    new_ticks = []
    for tick in ticks:
        if tick > 0:
            new_ticks.append(f"R+{tick * 100:.0f}")
        elif tick < 0:
            new_ticks.append(f"D+{abs(tick) * 100:.0f}")
        else:
            new_ticks.append("TIE")
    return new_ticks


y_ticks = [n / 10 for n in range(-8, 8, 2)]
ax.set_yticks(y_ticks, labels=get_partisan_ticks(y_ticks))
y_tick_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0] - y_tick_range * 0.01, y_ticks[-1] + y_tick_range * 0.01
ax.set_ylim(y_bottom, y_top)
Let’s make a note in the lower-left corner that Alaska, Connecticut, and D.C. are excluded from the data. I like to locate text just inside the outermost grid lines.
ax.text(x=x_ticks[0] + x_tick_range * 0.004, y=y_ticks[0] + y_tick_range * 0.002, s="AK, CT, DC excluded.", ha="left", va="bottom")
In the lower-right corner, cite the data sources.
ax.text(x=x_ticks[-1] - x_tick_range * 0.004, y=y_ticks[0] + y_tick_range * 0.002, s="Election data: github.com/tonmcg.\nPopulation data: US Census Bureau.", ha="right", va="bottom")
Let’s include the value of R² just above the regression line. R² measures how tightly the data points fit the line. (In politics, it’s usually not a high number.)
ax.text(x=x_reg[-1], y=y_reg[-1] + y_tick_range * 0.01, s=f"R² = {r_value ** 2:.2f}", size=11, ha="right", va="bottom")
Finally, set a title and save the figure. Variables are defined explicitly enough in the title that we can skip axes labels.
ax.set_title("Largest 10% of Counties • Population Change 2013 to 2023 • 2024 POTUS Results")
plt.savefig("county_population_reg.png", dpi=200)
3. The output.
It’s a questionable linear model! Residuals in the upper half of the range tend to form a J-shape, which is a sign of non-linearity. That’s okay. It tells us something about the data.
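One way to check for that kind of curvature is to look at the residuals directly. The sketch below uses synthetic points on a parabola, not the county data, along with a hand-rolled least-squares fit; every name in it is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for paired samples."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    return slope, y_mean - slope * x_mean


# Synthetic, deliberately curved data: y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
# Residual = observed value minus the value predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

Residuals that run positive, then negative, then positive again suggest a straight line is missing some curvature. With the real data, the inputs would be the pct_change and per_point_diff columns.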
Model aside, there is clearly some positive relationship between county population growth and partisan preference. Notice how many dots are in the upper-right compared to the lower-right. A lot of moderately large, fast-growing counties are very Republican. Remember, we’re plotting the top 10% of counties by population. Even the small dots represent large groups of people.
Of these 310 large counties, 34 grew at least 25%. Only two of the 34 voted for Harris. If I had to boil this post down to a single fact, that would be it.
The data strongly suggests that Republican voters are having more kids. And they are, according to this analysis of CDC fertility data. It’s a profound long-run advantage for Republicans because people very often inherit their political identity from parents.
To be clear, this approach doesn’t distinguish between natural population growth and inbound migration. Some people relocate, at least in part, for partisan reasons. That makes it more difficult to draw conclusions from the relationship modeled above. If a Pennsylvania Republican moves to a fast-growing, Trump-loving Texas county, it’s probably not helpful to the party overall.
Regardless, these trends should give Democrats pause. Reapportionment after the 2020 census weakened the Blue Wall strategy (Pennsylvania, Michigan, and Wisconsin). 2030 is set to take more electoral votes from blue states. And the Electoral College isn’t the only 2030 concern. Republicans will benefit in the House as well, thanks in large part to surging migration to affordable Sun Belt metros.
Democrats’ smaller margins in population centers aren’t necessarily cause for alarm. The party is increasingly targeting marginal battleground states in both policy and ad spending. Harris lost but she performed better relative to baseline in actively contested states. In other words, (1) campaigning works, and (2) Democratic votes are now more efficiently distributed.
Trump gains in high-population blue states almost completely erased the Republican Electoral College advantage in 2024. The delta between national popular vote and tipping-point state vote hasn’t been this small since the 1980s:
- 2016, R+2.8
- 2020, R+3.8
- 2024, R+0.2
Neither party should panic. The major party voting coalitions are always churning, recomposing, and evolving. Our modern presidential elections are extremely competitive by historical standards.
Republicans have done well to build a more multiracial coalition. They’ve made significant progress in deep blue states that previously felt unwinnable. And the fastest-growing population centers are solidly Republican, as we showed above. But so far, these gains have been inefficient with respect to the Electoral College.
Democrats have benefited from education polarization, especially in midterm and special elections. Their voters are more likely to show up. They’ve also made gains in suburban neighborhoods, which are the sweet spot for making your vote count. But those advantages are less helpful in presidential elections where turnout is higher across the board.
Democrats should be concerned about long-term trends but, by definition, there is plenty of time to adapt. Despite the headwinds of 20% cumulative inflation during Biden’s tenure, 2024 was a competitive election and Trump fell short of winning 50% of the vote.
Arguably, Democrats are better positioned to make gains in 2028 than Republicans. The current Republican coalition is built on activating low-propensity voters. It remains to be seen if they can hold it together without Trump leading the party.
Election results (github.com/tonmcg).
2013 population data (US Census Bureau).
2023 population data (US Census Bureau).
Download the Matplotlib style.
Full code:
import pandas as pd
from scipy.stats import linregress
import matplotlib.pyplot as plt


def get_census_df(filename, new_columns):
    df = pd.read_excel(filename)
    df = df[4:-6]
    df.columns = new_columns
    df[['county', 'state']] = df['census_name'].str.split(", ", expand=True)
    df.loc[:, 'county'] = df['county'].apply(lambda x: x[1:-7])
    df.loc[:, 'id'] = df['state'] + "_" + df['county']
    df = df[~df['state'].isin(["Alaska", "Connecticut", "District of Columbia"])]
    df = df[df['census_name'] != ".Kalawao County, Hawaii"]
    return df


def get_partisan_ticks(ticks):
    new_ticks = []
    for tick in ticks:
        if tick > 0:
            new_ticks.append(f"R+{tick * 100:.0f}")
        elif tick < 0:
            new_ticks.append(f"D+{abs(tick) * 100:.0f}")
        else:
            new_ticks.append("TIE")
    return new_ticks


pd.set_option("display.expand_frame_repr", False)

df_2013 = get_census_df(filename="co-est2019-annres.xlsx",
                        new_columns=['census_name', 'census', 'base_estimate',
                                     'pop2010', 'pop2011', 'pop2012', 'pop2013',
                                     'pop2014', 'pop2015', 'pop2016', 'pop2017',
                                     'pop2018', 'pop2019'])
df_2013 = df_2013[['id', 'pop2013']]

df_2023 = get_census_df(filename="co-est2023-pop.xlsx",
                        new_columns=['census_name', 'base_estimate', 'pop2020',
                                     'pop2021', 'pop2022', 'pop2023'])
df_2023 = df_2023[['id', 'pop2023']]

df_elx = pd.read_csv("2024_US_County_Level_Presidential_Results.csv")
df_elx = df_elx[~df_elx['state_name'].isin(["Alaska", "Connecticut", "District of Columbia"])]
df_elx.loc[:, 'county'] = df_elx['county_name'].apply(lambda x: x[:-7])
df_elx.loc[:, 'id'] = df_elx['state_name'] + "_" + df_elx['county']
df_elx = df_elx[['id', 'per_point_diff']]

df = df_2013.merge(df_2023, on="id").merge(df_elx, on="id")

df.loc[:, 'pct_change'] = (df['pop2023'] - df['pop2013']) / df['pop2013']
df = df[['id', 'pop2013', 'pop2023', 'pct_change', 'per_point_diff']]

df.loc[:, 'marker_size'] = df['pop2023'] * 0.00015

df = df.sort_values("pop2023", ascending=False)[:310]

slope, intercept, r_value, p_value, std_err = linregress(df['pct_change'], df['per_point_diff'])

x_reg = [df['pct_change'].min(), df['pct_change'].max()]
y_reg = [n * slope + intercept for n in x_reg]

plt.style.use("wollen_election.mplstyle")
fig, ax = plt.subplots()

ax.scatter(x=df['pct_change'], y=df['per_point_diff'],
           color="#9885BF", s=df['marker_size'],
           edgecolor="#333", linewidth=0.5, alpha=0.7, zorder=2)

ax.plot(x_reg, y_reg, color="#333", zorder=3)

ax.plot([-2, 2], [0, 0], color="#333", linewidth=0.5, zorder=1)
ax.plot([0, 0], [-2, 2], color="#333", linewidth=0.5, zorder=1)

x_ticks = [n / 10 for n in range(-2, 7)]
ax.set_xticks(x_ticks, labels=[f"{n * 100:+.0f}%" if n != 0 else "0" for n in x_ticks])
x_tick_range = x_ticks[-1] - x_ticks[0]
x_left, x_right = x_ticks[0] - x_tick_range * 0.03, x_ticks[-1] + x_tick_range * 0.03
ax.set_xlim(x_left, x_right)

y_ticks = [n / 10 for n in range(-8, 8, 2)]
ax.set_yticks(y_ticks, labels=get_partisan_ticks(y_ticks))
y_tick_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0] - y_tick_range * 0.01, y_ticks[-1] + y_tick_range * 0.01
ax.set_ylim(y_bottom, y_top)

ax.text(x=x_ticks[0] + x_tick_range * 0.004, y=y_ticks[0] + y_tick_range * 0.002,
        s="AK, CT, DC excluded.", ha="left", va="bottom")

ax.text(x=x_ticks[-1] - x_tick_range * 0.004, y=y_ticks[0] + y_tick_range * 0.002,
        s="Election data: github.com/tonmcg.\nPopulation data: US Census Bureau.",
        ha="right", va="bottom")

ax.text(x=x_reg[-1], y=y_reg[-1] + y_tick_range * 0.01,
        s=f"R² = {r_value ** 2:.2f}", size=11, ha="right", va="bottom")

ax.set_title("Largest 10% of Counties • Population Change 2013 to 2023 • 2024 POTUS Results")

plt.savefig("county_population_reg.png", dpi=200)