Working with the new 2020 Census data
Like a lot of data nerds I was excited to check out the new US Census apportionment data. It only comes once per decade after all. I thought it would be a good chance to work on something I really enjoy: plotting maps.
In this post we’ll take a simple statistic, state population growth from 2010 to 2020, and focus most of our energy on producing the plot. The full code will be available at the end of the post.
Data sources:
1. Prepare the data.
Start with the imports. To produce this plot we’ll use GeoPandas, which is a package that (not surprisingly) looks and feels a lot like pandas. It keeps our data in a nice GeoDataFrame object on which we can eventually call .plot()
.
import pandas as pd import geopandas as gpd import matplotlib.pyplot as plt from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes from matplotlib import cm
Since we’re plotting population change we’ll want the POPULATION2010
and POPULATION2020
columns in the same dataframe. Read both datasets and concatenate the dataframes together. Because axis=1
the dataframes are placed side-by-side rather than stacked one below the other. Fortunately our datasets have the same structure with state names in the same order, so we don’t have to worry about matching up rows.
df2010 = pd.read_csv("data/2010.csv") df2020 = pd.read_csv("data/2020.csv") df = pd.concat([df2010, df2020], axis=1) df.set_index("STATE2010", inplace=True)
Create a column containing population change, then rate of change. This is the statistic we’ll use to color states on the map.
df.loc[:, "CHANGE"] = df["POPULATION2020"] - df["POPULATION2010"] df.loc[:, "CHANGE_RATE"] = df["CHANGE"] / df["POPULATION2010"]
At this point df.head(3)
looks like this:
POPULATION2010 STATE2020 POPULATION2020 CHANGE CHANGE_RATE STATE2010 Alabama 4802982 Alabama 5030053 227071 0.047277 Alaska 721523 Alaska 736081 14558 0.020177 Arizona 6412700 Arizona 7158923 746223 0.116366
Since we’re plotting Alaska and Hawaii on separate inset axes we can’t use the built-in cmap
parameter. We’ll have to write our own function to generate colors. State colors will be placed in a COLOR
column that can be passed into plot()
. Higher population growth is represented by darker greens and if a population shrank from 2010 to 2020, the function returns a shade of red.
def get_color(rate, max_rate): if rate >= 0: greens_cmap = cm.get_cmap("Greens", 100) return greens_cmap(rate / max_rate) else: reds_cmap = cm.get_cmap("Reds", 100) return reds_cmap(abs(rate) / max_rate) df.loc[:, "COLOR"] = df["CHANGE_RATE"].apply(get_color, max_rate=df["CHANGE_RATE"].max())
Notice we haven’t yet created a GeoDataFrame. Our shapefile uses the EPSG:4326 coordinate system, i.e. normal latitude/longitude pairs. Although it isn’t necessary here, I’m in the habit of reprojecting my CRS to EPSG:3857. Remember to filter out any shapefile elements that aren’t in df
(Puerto Rico and D.C.).
shapefile_path = "shapefile/cb_2018_us_state_20m.shp" gdf = gpd.read_file(shapefile_path, epsg=4326) gdf = gdf.to_crs(epsg=3857) gdf = gdf[gdf["NAME"].isin(df["STATE2020"])
Next we need to merge
the pandas dataframe, which contains color information, with the GeoDataFrame. To accomplish this we’ll first rename a column so that GeoPandas knows which rows to match.
gdf.rename(columns={"NAME": "STATE2020"}, inplace=True) gdf = gdf.merge(df, on="STATE2020")
2. Plot the data on a map.
Now for the interesting part: calling gdf.plot()
. On this axis we need to set x- and y-limits and limit our map to the continental United States. Alaska and Hawaii often present a problem when plotting a US map. The extreme west (east!) Aleutian Islands actually extend into the eastern hemisphere. So if we didn’t limit the window, GeoPandas would plot a zoomed-out figure containing all 360° of longitude. But I don’t want to pick on them too much. We’ll add Alaska and Hawaii inset axes next.
fig, ax = plt.subplots(figsize=(12, 7.4)) fig.subplots_adjust(left=0.0, right=1.0, bottom=0.0, top=0.955) border_thickness = 0.4 border_color = "#0F0F0F" us_map = gdf.plot(ax=ax, color=gdf["COLOR"], categorical=True, edgecolor=border_color, linewidth=border_thickness) us_map.set_xlim(-14.1e6, -7.24e6) us_map.set_ylim(2.5e6, 6.47e6)
After a bit of tedious work the inset axes fall into place. This procedure is very similar to plotting the continental US above. We simply create two slices of gdf
restricted to Alaska and Hawaii respectively, then again call plot()
.
alaska_inset = zoomed_inset_axes(ax, zoom=0.21, bbox_to_anchor=(0.25, 0.29), bbox_transform=plt.gcf().transFigure) alaska_inset.set_xlim(-20.0e6, -14.36e6) alaska_inset.set_ylim(6.53e6, 11.75e6) gdf_alaska = gdf[gdf["STATE2020"] == "Alaska"] gdf_alaska.plot(ax=alaska_inset, color=gdf["COLOR"], categorical=True, edgecolor=border_color, linewidth=border_thickness) hawaii_inset = zoomed_inset_axes(ax, zoom=1.05, bbox_to_anchor=(0.36, 0.22), bbox_transform=plt.gcf().transFigure) hawaii_inset.set_xlim(-17.87e6, -17.20e6) hawaii_inset.set_ylim(2.09e6, 2.57e6) gdf_hawaii = gdf[gdf["STATE2020"] == "Hawaii"] gdf_hawaii.plot(ax=hawaii_inset, color=gdf["COLOR"], categorical=True, edgecolor=border_color, linewidth=border_thickness)
Finally add a title and attribution text and turn off any axis labels that would clutter the map.
us_map.set_title("State Population Change | 2010-2020", {"fontname": "Ubuntu Condensed", "fontsize": 16}) us_map.annotate(text="Data: US Census Bureau", xy=(-7.35e6, 2.54e6), size=11, family="Ubuntu Condensed", horizontalalignment="right", verticalalignment="bottom") us_map.set_axis_off() alaska_inset.set_axis_off() hawaii_inset.set_axis_off() plt.savefig("population_2010-2020.png", facecolor="#FEFEFE", dpi=300)
I usually suggest saving maps in vectorized SVG format rather than PNG or JPG whenever possible. This avoids blurriness when zooming in. If you do output a raster image consider increasing dpi
from the default of 100
The output:
TODO:
- Add a legend so the audience know what these colors mean!
- Label smaller states with
annotate()
orarrow()
. - Specify a few extreme values to help communicate scale.
Full code:
import pandas as pd import geopandas as gpd import matplotlib.pyplot as plt from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes from matplotlib import cm def get_color(rate, max_rate): if rate >= 0: greens_cmap = cm.get_cmap("Greens", 100) return greens_cmap(rate / max_rate) else: reds_cmap = cm.get_cmap("Reds", 100) return reds_cmap(abs(rate) / max_rate) df2010 = pd.read_csv("data/2010.csv") df2020 = pd.read_csv("data/2020.csv") df = pd.concat([df2010, df2020], axis=1) df.set_index("STATE2010", inplace=True) df.loc[:, "CHANGE"] = df["POPULATION2020"] - df["POPULATION2010"] df.loc[:, "CHANGE_RATE"] = df["CHANGE"] / df["POPULATION2010"] df.loc[:, "COLOR"] = df["CHANGE_RATE"].apply(get_color, max_rate=df["CHANGE_RATE"].max()) shapefile_path = "shapefile/cb_2018_us_state_20m.shp" gdf = gpd.read_file(shapefile_path, epsg=4326) gdf = gdf.to_crs(epsg=3857) gdf = gdf[gdf["NAME"].isin(df["STATE2020"])] gdf.rename(columns={"NAME": "STATE2020"}, inplace=True) gdf = gdf.merge(df, on="STATE2020") fig, ax = plt.subplots(figsize=(12, 7.4)) fig.subplots_adjust(left=0.0, right=1.0, bottom=0.0, top=0.955) border_thickness = 0.4 border_color = "#0F0F0F" us_map = gdf.plot(ax=ax, color=gdf["COLOR"], categorical=True, edgecolor=border_color, linewidth=border_thickness) us_map.set_xlim(-14.1e6, -7.24e6) us_map.set_ylim(2.5e6, 6.47e6) alaska_inset = zoomed_inset_axes(ax, zoom=0.21, bbox_to_anchor=(0.25, 0.29), bbox_transform=plt.gcf().transFigure) alaska_inset.set_xlim(-20.0e6, -14.36e6) alaska_inset.set_ylim(6.53e6, 11.75e6) gdf_alaska = gdf[gdf["STATE2020"] == "Alaska"] gdf_alaska.plot(ax=alaska_inset, color=gdf["COLOR"], categorical=True, edgecolor=border_color, linewidth=border_thickness) hawaii_inset = zoomed_inset_axes(ax, zoom=1.05, bbox_to_anchor=(0.36, 0.22), bbox_transform=plt.gcf().transFigure) hawaii_inset.set_xlim(-17.87e6, -17.20e6) hawaii_inset.set_ylim(2.09e6, 2.57e6) gdf_hawaii = gdf[gdf["STATE2020"] == "Hawaii"] gdf_hawaii.plot(ax=hawaii_inset, color=gdf["COLOR"], categorical=True, edgecolor=border_color, linewidth=border_thickness) us_map.set_title("State Population Change | 2010-2020", {"fontname": "Ubuntu Condensed", "fontsize": 16}) us_map.annotate(text="Data: US Census Bureau", xy=(-7.35e6, 2.54e6), size=11, family="Ubuntu Condensed", horizontalalignment="right", verticalalignment="bottom") us_map.set_axis_off() alaska_inset.set_axis_off() hawaii_inset.set_axis_off() plt.savefig("population_2010-2020.png", facecolor="#FEFEFE", dpi=300)