{"id":94,"date":"2021-04-30T12:19:45","date_gmt":"2021-04-30T17:19:45","guid":{"rendered":"https:\/\/wollen.org\/blog\/?p=94"},"modified":"2025-04-24T01:52:04","modified_gmt":"2025-04-24T06:52:04","slug":"working-with-the-new-2020-census-data","status":"publish","type":"post","link":"https:\/\/wollen.org\/blog\/2021\/04\/working-with-the-new-2020-census-data\/","title":{"rendered":"Working with the new 2020 Census data"},"content":{"rendered":"<p>Like a lot of data nerds I was excited to check out the new <a href=\"https:\/\/web.archive.org\/web\/20210428131326\/https:\/\/www.census.gov\/newsroom\/press-releases\/2021\/2020-census-apportionment-results.html\" target=\"_blank\" rel=\"noopener\">US Census apportionment data<\/a>. It only comes once per decade after all. I thought it would be a good chance to work on something I really enjoy: plotting maps.<\/p>\n<p>In this post we&#8217;ll take a simple statistic, state population growth from 2010 to 2020, and focus most of our energy on producing the plot. The full code will be available at the end of the post.<\/p>\n<p><strong>Data sources:<\/strong><\/p>\n<ol>\n<li><a href=\"https:\/\/www.census.gov\/data\/tables\/2010\/dec\/2010-apportionment-data.html\" target=\"_blank\" rel=\"noopener\">2010 Census<\/a><\/li>\n<li><a href=\"https:\/\/www.census.gov\/data\/tables\/2020\/dec\/2020-apportionment-data.html\" target=\"_blank\" rel=\"noopener\">2020 Census<\/a><\/li>\n<li><a href=\"https:\/\/www.census.gov\/geographies\/mapping-files\/time-series\/geo\/carto-boundary-file.html\" target=\"_blank\" rel=\"noopener\">Shapefile<\/a><\/li>\n<\/ol>\n<hr \/>\n<h4>1. Prepare the data.<\/h4>\n<p>Start with the imports. To produce this plot we&#8217;ll use <strong>GeoPandas<\/strong>, which is a package that (not surprisingly) looks and feels a lot like <strong>pandas<\/strong>. It keeps our data in a nice <em>GeoDataFrame<\/em> object on which we can eventually call <code>.plot()<\/code>.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\nimport geopandas as gpd\r\nimport matplotlib.pyplot as plt\r\nfrom mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes\r\nfrom matplotlib import cm<\/pre>\n<p>Since we&#8217;re plotting population change we&#8217;ll want the <code>POPULATION2010<\/code> and <code>POPULATION2020<\/code> columns in the same dataframe. Read both datasets and concatenate the dataframes together. Because <code>axis=1<\/code> the dataframes are placed side-by-side rather than stacked one below the other. Fortunately our datasets have the same structure with state names in the same order, so we don&#8217;t have to worry about matching up rows.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df2010 = pd.read_csv(\"data\/2010.csv\")\r\ndf2020 = pd.read_csv(\"data\/2020.csv\")\r\n\r\ndf = pd.concat([df2010, df2020], axis=1)\r\ndf.set_index(\"STATE2010\", inplace=True)<\/pre>\n<p>Create a column containing population change, then rate of change. This is the statistic we&#8217;ll use to color states on the map.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">df.loc[:, \"CHANGE\"] = df[\"POPULATION2020\"] - df[\"POPULATION2010\"]\r\ndf.loc[:, \"CHANGE_RATE\"] = df[\"CHANGE\"] \/ df[\"POPULATION2010\"]<\/pre>\n<p>At this point <code>df.head(3)<\/code> looks like this:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">           POPULATION2010 STATE2020  POPULATION2020  CHANGE  CHANGE_RATE\r\nSTATE2010                                                               \r\nAlabama           4802982   Alabama         5030053  227071     0.047277\r\nAlaska             721523    Alaska          736081   14558     0.020177\r\nArizona           6412700   Arizona         7158923  746223     0.116366\r\n<\/pre>\n<p>Since we&#8217;re plotting Alaska and Hawaii on separate inset axes we can&#8217;t use the built-in <code>cmap<\/code> parameter. We&#8217;ll have to write our own function to generate colors. State colors will be placed in a <code>COLOR<\/code> column that can be passed into <code>plot()<\/code>. Higher population growth is represented by darker greens and if a population shrank from 2010 to 2020, the function returns a shade of red.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">def get_color(rate, max_rate):\r\n    if rate &gt;= 0:\r\n        greens_cmap = cm.get_cmap(\"Greens\", 100)\r\n        return greens_cmap(rate \/ max_rate)\r\n    else:\r\n        reds_cmap = cm.get_cmap(\"Reds\", 100)\r\n        return reds_cmap(abs(rate) \/ max_rate)\r\n\r\n\r\ndf.loc[:, \"COLOR\"] = df[\"CHANGE_RATE\"].apply(get_color, max_rate=df[\"CHANGE_RATE\"].max())\r\n<\/pre>\n<p>Notice we haven&#8217;t yet created a <em>GeoDataFrame<\/em>. Our shapefile uses the <em>EPSG:4326<\/em> coordinate system, i.e. normal latitude\/longitude pairs. Although it isn&#8217;t necessary here, I&#8217;m in the habit of reprojecting my CRS to <em>EPSG:3857<\/em>. Remember to filter out any shapefile elements that aren&#8217;t in <code>df<\/code> (Puerto Rico and D.C.).<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">shapefile_path = \"shapefile\/cb_2018_us_state_20m.shp\"\r\ngdf = gpd.read_file(shapefile_path, epsg=4326)\r\ngdf = gdf.to_crs(epsg=3857)\r\ngdf = gdf[gdf[\"NAME\"].isin(df[\"STATE2020\"])<\/pre>\n<p>Next we need to <code>merge<\/code> the <strong>pandas<\/strong> dataframe, which contains color information, with the <em>GeoDataFrame<\/em>. To accomplish this we&#8217;ll first rename a column so that <strong>GeoPandas<\/strong> knows which rows to match.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">gdf.rename(columns={\"NAME\": \"STATE2020\"}, inplace=True)\r\ngdf = gdf.merge(df, on=\"STATE2020\")<\/pre>\n<hr \/>\n<h4>2. Plot the data on a map.<\/h4>\n<p>Now for the interesting part: calling <code>gdf.plot()<\/code>. On this axis we need to set x- and y-limits and limit our map to the continental United States. Alaska and Hawaii often present a problem when plotting a US map. The extreme west (east!) Aleutian Islands actually extend into the eastern hemisphere. So if we didn&#8217;t limit the window, <strong>GeoPandas<\/strong> would plot a zoomed-out figure containing all 360\u00b0 of longitude. But I don&#8217;t want to pick on them too much. We&#8217;ll add Alaska and Hawaii inset axes next.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">fig, ax = plt.subplots(figsize=(12, 7.4))\r\nfig.subplots_adjust(left=0.0, right=1.0, bottom=0.0, top=0.955)\r\n\r\nborder_thickness = 0.4\r\nborder_color = \"#0F0F0F\"\r\n\r\nus_map = gdf.plot(ax=ax, color=gdf[\"COLOR\"], categorical=True,\r\n                  edgecolor=border_color, linewidth=border_thickness)\r\n\r\nus_map.set_xlim(-14.1e6, -7.24e6)\r\nus_map.set_ylim(2.5e6, 6.47e6)<\/pre>\n<p>After a bit of tedious work the inset axes fall into place. This procedure is very similar to plotting the continental US above. We simply create two slices of <code>gdf<\/code> restricted to Alaska and Hawaii respectively, then again call <code>plot()<\/code>.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">alaska_inset = zoomed_inset_axes(ax, zoom=0.21, bbox_to_anchor=(0.25, 0.29), bbox_transform=plt.gcf().transFigure)\r\nalaska_inset.set_xlim(-20.0e6, -14.36e6)\r\nalaska_inset.set_ylim(6.53e6, 11.75e6)\r\ngdf_alaska = gdf[gdf[\"STATE2020\"] == \"Alaska\"]\r\ngdf_alaska.plot(ax=alaska_inset, color=gdf[\"COLOR\"], categorical=True,\r\n                edgecolor=border_color, linewidth=border_thickness)\r\n\r\nhawaii_inset = zoomed_inset_axes(ax, zoom=1.05, bbox_to_anchor=(0.36, 0.22), bbox_transform=plt.gcf().transFigure)\r\nhawaii_inset.set_xlim(-17.87e6, -17.20e6)\r\nhawaii_inset.set_ylim(2.09e6, 2.57e6)\r\ngdf_hawaii = gdf[gdf[\"STATE2020\"] == \"Hawaii\"]\r\ngdf_hawaii.plot(ax=hawaii_inset, color=gdf[\"COLOR\"], categorical=True,\r\n                edgecolor=border_color, linewidth=border_thickness)<\/pre>\n<p>Finally add a title and attribution text and turn off any axis labels that would clutter the map.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">us_map.set_title(\"State Population Change  |  2010-2020\", {\"fontname\": \"Ubuntu Condensed\", \"fontsize\": 16})\r\nus_map.annotate(text=\"Data:  US Census Bureau\", xy=(-7.35e6, 2.54e6), size=11, family=\"Ubuntu Condensed\",\r\n                horizontalalignment=\"right\", verticalalignment=\"bottom\")\r\n\r\nus_map.set_axis_off()\r\nalaska_inset.set_axis_off()\r\nhawaii_inset.set_axis_off()\r\nplt.savefig(\"population_2010-2020.png\", facecolor=\"#FEFEFE\", dpi=300)<\/pre>\n<p>I usually suggest saving maps in vectorized <em>SVG<\/em> format rather than <em>PNG<\/em> or <em>JPG<\/em> whenever possible. This avoids blurriness when zooming in. If you do output a raster image consider increasing <code>dpi<\/code> from the default of 100<\/p>\n<p><strong>The output:<\/strong><\/p>\n<p><a href=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1769 size-full\" src=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020.png\" alt=\"\" width=\"3600\" height=\"2220\" srcset=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020.png 3600w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020-300x185.png 300w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020-1024x631.png 1024w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020-768x474.png 768w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020-1536x947.png 1536w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/04\/population_2010-2020-2048x1263.png 2048w\" sizes=\"auto, (max-width: 3600px) 100vw, 3600px\" \/><\/a><\/p>\n<hr \/>\n<p><strong>TODO:<\/strong><\/p>\n<ul>\n<li>Add a legend so the audience know what these colors mean!<\/li>\n<li>Label smaller states with <code>annotate()<\/code> or <code>arrow()<\/code>.<\/li>\n<li>Specify a few extreme values to help communicate scale.<\/li>\n<\/ul>\n<hr \/>\n<p><strong><a href=\"https:\/\/wollen.org\/misc\/population_change_4-30-2021.zip\">Download the data<\/a><\/strong>.<\/p>\n<p><strong>Full code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\nimport geopandas as gpd\r\nimport matplotlib.pyplot as plt\r\nfrom mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes\r\nfrom matplotlib import cm\r\n\r\n\r\ndef get_color(rate, max_rate):\r\n    if rate &gt;= 0:\r\n        greens_cmap = cm.get_cmap(\"Greens\", 100)\r\n        return greens_cmap(rate \/ max_rate)\r\n    else:\r\n        reds_cmap = cm.get_cmap(\"Reds\", 100)\r\n        return reds_cmap(abs(rate) \/ max_rate)\r\n\r\n\r\ndf2010 = pd.read_csv(\"data\/2010.csv\")\r\ndf2020 = pd.read_csv(\"data\/2020.csv\")\r\n\r\ndf = pd.concat([df2010, df2020], axis=1)\r\ndf.set_index(\"STATE2010\", inplace=True)\r\n\r\ndf.loc[:, \"CHANGE\"] = df[\"POPULATION2020\"] - df[\"POPULATION2010\"]\r\ndf.loc[:, \"CHANGE_RATE\"] = df[\"CHANGE\"] \/ df[\"POPULATION2010\"]\r\ndf.loc[:, \"COLOR\"] = df[\"CHANGE_RATE\"].apply(get_color, max_rate=df[\"CHANGE_RATE\"].max())\r\n\r\nshapefile_path = \"shapefile\/cb_2018_us_state_20m.shp\"\r\ngdf = gpd.read_file(shapefile_path, epsg=4326)\r\ngdf = gdf.to_crs(epsg=3857)\r\ngdf = gdf[gdf[\"NAME\"].isin(df[\"STATE2020\"])]\r\ngdf.rename(columns={\"NAME\": \"STATE2020\"}, inplace=True)\r\ngdf = gdf.merge(df, on=\"STATE2020\")\r\n\r\nfig, ax = plt.subplots(figsize=(12, 7.4))\r\nfig.subplots_adjust(left=0.0, right=1.0, bottom=0.0, top=0.955)\r\n\r\nborder_thickness = 0.4\r\nborder_color = \"#0F0F0F\"\r\n\r\nus_map = gdf.plot(ax=ax, color=gdf[\"COLOR\"], categorical=True,\r\n                  edgecolor=border_color, linewidth=border_thickness)\r\n\r\nus_map.set_xlim(-14.1e6, -7.24e6)\r\nus_map.set_ylim(2.5e6, 6.47e6)\r\n\r\nalaska_inset = zoomed_inset_axes(ax, zoom=0.21, bbox_to_anchor=(0.25, 0.29), bbox_transform=plt.gcf().transFigure)\r\nalaska_inset.set_xlim(-20.0e6, -14.36e6)\r\nalaska_inset.set_ylim(6.53e6, 11.75e6)\r\ngdf_alaska = gdf[gdf[\"STATE2020\"] == \"Alaska\"]\r\ngdf_alaska.plot(ax=alaska_inset, color=gdf[\"COLOR\"], categorical=True,\r\n                edgecolor=border_color, linewidth=border_thickness)\r\n\r\nhawaii_inset = zoomed_inset_axes(ax, zoom=1.05, bbox_to_anchor=(0.36, 0.22), bbox_transform=plt.gcf().transFigure)\r\nhawaii_inset.set_xlim(-17.87e6, -17.20e6)\r\nhawaii_inset.set_ylim(2.09e6, 2.57e6)\r\ngdf_hawaii = gdf[gdf[\"STATE2020\"] == \"Hawaii\"]\r\ngdf_hawaii.plot(ax=hawaii_inset, color=gdf[\"COLOR\"], categorical=True,\r\n                edgecolor=border_color, linewidth=border_thickness)\r\n\r\nus_map.set_title(\"State Population Change  |  2010-2020\", {\"fontname\": \"Ubuntu Condensed\", \"fontsize\": 16})\r\nus_map.annotate(text=\"Data:  US Census Bureau\", xy=(-7.35e6, 2.54e6), size=11, family=\"Ubuntu Condensed\",\r\n                horizontalalignment=\"right\", verticalalignment=\"bottom\")\r\n\r\nus_map.set_axis_off()\r\nalaska_inset.set_axis_off()\r\nhawaii_inset.set_axis_off()\r\nplt.savefig(\"population_2010-2020.png\", facecolor=\"#FEFEFE\", dpi=300)\r\n<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Like a lot of data nerds I was excited to check out the new US Census apportionment data. It only comes once per decade after all. I thought it would be a good chance to work on something I really<\/p>\n","protected":false},"author":1,"featured_media":1824,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[27,23,22,32,21,28,29,26,24,30,31,25,87],"class_list":["post-94","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-maps","tag-alaska","tag-census","tag-data","tag-demographics","tag-geopandas","tag-hawaii","tag-inset","tag-maps","tag-matplotlib","tag-pandas","tag-population","tag-python","tag-zoomed_inset_axes"],"_links":{"self":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/94","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/comments?post=94"}],"version-history":[{"count":43,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/94\/revisions"}],"predecessor-version":[{"id":1770,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/94\/revisions\/1770"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media\/1824"}],"wp:attachment":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media?parent=94"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/categories?post=94"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/tags?post=94"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}