Violent crime in the United States

The FBI Uniform Crime Reporting Program (UCR) collects crime-related data from thousands of law enforcement agencies across the country. The FBI then compiles the data and publishes quarterly and annual reports. Because agency participation is voluntary it isn’t a perfect snapshot of crime, but it is the best picture we have.

For this post I’ll be digging into the most recent annual release, October 2023, which covers data through the end of 2022. You can download the spreadsheet I’m using by clicking over to the FBI’s Crime Data Explorer and finding the section shown below. I’ll also link the file directly at the bottom of this post.


1. Prepare the data.

You might notice the spreadsheet isn’t as clean as the CSV files I usually work with here. The federal government doesn’t have the softest touch. But in the spirit of imperfect data I want to leave the Excel file as it is and go through the process of cleaning it for presentation. Sure, it would be easier to copy-and-paste the target data into a clean file and manually remove any weird stuff, like footnotes or trailing white space. But data is rarely in a perfect form so let’s practice meeting it where it is.

After importing pandas and Matplotlib, set an option that will allow us to see the entire DataFrame’s width on screen. Because the spreadsheet has so many columns—and several with very long names—pandas will try to collapse the DataFrame whenever we print it to screen. It will replace the majority of the data with ellipses. Changing this setting will avoid truncation and make it easier to visualize our work.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

pd.set_option("display.expand_frame_repr", False)
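
If you want to see what the setting does before touching the real file, here is a small sketch on a made-up wide DataFrame. The column names, the 80-character display width, and the max_columns override are all my assumptions for the demo. It uses pd.option_context so the settings only apply inside each with block.

```python
import pandas as pd

# Made-up wide frame: 30 long column names standing in for the FBI table.
wide = pd.DataFrame({f"a_rather_long_column_name_{i}": [i, i + 1] for i in range(30)})

# Option on (the default), with an assumed 80-character display:
# pandas breaks the columns into several stacked blocks.
with pd.option_context("display.expand_frame_repr", True,
                       "display.width", 80,
                       "display.max_columns", None):
    wrapped = repr(wide)

# Option off: every row stays on one long line.
with pd.option_context("display.expand_frame_repr", False,
                       "display.max_columns", None):
    unwrapped = repr(wide)

print(len(wrapped.splitlines()), "lines when wrapped")
print(len(unwrapped.splitlines()), "lines when unwrapped")
```

Exactly how a wide frame gets squeezed (wrapped or cut down with ellipses) depends on where you run the script, so treat this as an illustration rather than a description of your terminal.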

Read the spreadsheet into a DataFrame and take a look.

df = pd.read_excel("Table_1_Crime_in_the_United_States_by_Volume_and_Rate_per_100000_Inhabitants_2003-2022.xlsx")

print(df.head())

The output is shown below. In this case I’m using ellipses because it would be too much to display in a blog post. The important thing is to see what a mess the DataFrame is.

                                             Table 1                    Unnamed: 2  [...]
0                         Crime in the United States          NaN              NaN  [...]
1  by Volume and Rate per 100,000 Inhabitants, 20...          NaN              NaN  [...]
2                                               Year  Population1  Violent\ncrime2  [...]
3                                               2003    290809777          1459416  [...]
4                                               2004    293655404          1428745  [...]

The first problem is that the column labels (Year, Population1, etc.) aren’t in the header. They sit in the third row of the DataFrame, at index 2.

To solve it we can grab that specific row using iloc and parse the strings into something more readable, then tell the DataFrame that those are its new column labels.

column_labels = [str(item).replace("\n", " ").replace("  ", " ").strip() for item in df.iloc[2]]
df.columns = column_labels
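
Here is the same cleanup on a tiny stand-in frame so you can see what each string method contributes. The frame and its numbers are made up for illustration. Only the label-cleaning expression comes from the script.

```python
import pandas as pd

# Made-up miniature of the raw sheet: title rows first, real labels in row index 2.
raw = pd.DataFrame({
    "Table 1": ["Crime in the United States", None, "Year", 2003, 2004],
    "Unnamed: 1": [None, None, "Population1", 1, 2],
    "Unnamed: 2": [None, None, "Violent\ncrime2", 3, 4],
    "Unnamed: 3": [None, None, "Violent \ncrime rate ", 5.0, 6.0],
})

# Newlines become spaces, doubled spaces collapse, and edge whitespace is trimmed.
labels = [str(item).replace("\n", " ").replace("  ", " ").strip() for item in raw.iloc[2]]
raw.columns = labels

print(labels)
```

Note that footnote digits such as the 1 in Population1 survive this step. That is fine here because none of the columns we plot carry one.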

The second problem is the extraneous rows at top and bottom of the spreadsheet. We’re only interested in rows that correspond to yearly data from 2003 to 2022.

We can simply redefine df to be a subset of its rows.

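# .copy() makes the subset independent of the original frame, which avoids
# a possible SettingWithCopyWarning when we assign columns below.
df = df[3:23].copy()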
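
Hard-coded positions work for this release, but they would break if the FBI added or removed a header line. As a hedged alternative, this sketch keeps any row whose first cell starts with a four-digit year. The miniature frame, its values, and the regular expression are my own assumptions, not part of the script.

```python
import pandas as pd

# Made-up miniature of the raw sheet: a label row, data rows, then footnote rows.
raw = pd.DataFrame({
    "Year": ["Year", 2003, 2004, "20215", 2022, "1 A made-up footnote.", None],
    "Violent crime rate": ["Violent crime rate", 1.0, 2.0, 3.0, 4.0, None, None],
})

# Keep rows whose first cell starts with a plausible year (19xx or 20xx).
# na=False treats missing cells as non-matches.
is_year_row = raw["Year"].astype(str).str.match(r"(19|20)\d{2}", na=False)
subset = raw[is_year_row].copy()

print(subset)
```

The footnoted value 20215 still passes the filter because it begins with 2021, so the Year column needs its own fix afterward.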

The last change we’ll need to make is in the Year column. There is a footnote in the 2021 row that changes its value. The tail of the column looks like this:

     Year
18   2018
19   2019
20   2020
21  20215
22   2022

But this is an easy fix. Rather than applying a function we can just redefine the column to be the numbers from 2003 to 2022.

df.loc[:, 'Year'] = range(2003, 2023)
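
If you would rather repair the values than overwrite them, one option is to keep only the first four characters of each entry. This is a sketch on a hand-typed copy of the column tail shown above. Whether pandas reads the footnoted cell as a number or as text is an assumption on my part, so the code converts everything to strings first.

```python
import pandas as pd

# Tail of the Year column as shown above; the footnote marker turned 2021 into 20215.
years = pd.Series([2018, 2019, 2020, 20215, 2022], index=range(18, 23), name="Year")

# Keep the first four characters of each value, then convert back to integers.
cleaned = years.astype(str).str[:4].astype(int)

print(cleaned.tolist())  # [2018, 2019, 2020, 2021, 2022]
```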

At this point we could begin plotting but I have an idea to make the data a little more informative.

I want to create two plots in the output image. On top will be the most often cited statistic, the overall violent crime rate. On the bottom will be each subcategory’s rate (murder, assault, robbery) expressed as a percentage of its 2003 level. In absolute terms, the robbery rate is much higher than the murder rate, but we can peg both at 100% in 2003 and see how they change relative to each other.

Create new columns to hold this data. Divide each cell by the topmost row in its column (2003) and multiply by 100 to express the value as a percentage.

df.loc[:, 'murder_rate_change'] = df['Murder and nonnegligent manslaughter rate'] / df['Murder and nonnegligent manslaughter rate'].iloc[0] * 100
df.loc[:, 'assault_rate_change'] = df['Aggravated assault rate'] / df['Aggravated assault rate'].iloc[0] * 100
df.loc[:, 'robbery_rate_change'] = df['Robbery rate'] / df['Robbery rate'].iloc[0] * 100
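
To make the arithmetic concrete, here is the same base-year indexing on a toy table. The column names and rates are invented, not FBI figures. Dividing a DataFrame by its first row lines the divisors up by column name, so several columns can be indexed in one step.

```python
import pandas as pd

# Invented rates for two categories on very different scales.
toy = pd.DataFrame({
    "Year": [2003, 2004, 2005],
    "rate_a": [5.0, 4.5, 6.0],
    "rate_b": [200.0, 150.0, 100.0],
})

rate_cols = ["rate_a", "rate_b"]

# Each column is divided by its own first-row value, then scaled to percent.
indexed = (toy[rate_cols] / toy[rate_cols].iloc[0] * 100).round(1)

print(indexed)
```

Both columns start at 100, which is what lets a small rate and a large rate share one y-axis.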

2. Plot the data.

This plot uses a custom Matplotlib style that I’ll link at the bottom of this post.

Specify the shape of the subplot array in plt.subplots()—2 rows, 1 column. In other words we’ll have two plots stacked in a vertical orientation. When I create multiple axes on the same figure I like to name them axs rather than the usual ax. Each can be addressed as axs[0] or axs[1].

plt.style.use("wollen_dark.mplstyle")
fig, axs = plt.subplots(2, 1, figsize=(14, 11))
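
A quick sketch of what plt.subplots() hands back for this layout. I select the Agg backend only so the snippet runs without opening a window, and the names top and bottom are just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 1, figsize=(14, 11))

# With 2 rows and 1 column, axs is a one-dimensional NumPy array of Axes.
print(type(axs).__name__, axs.shape)  # ndarray (2,)

# It can be indexed, or unpacked into named variables.
top, bottom = axs
top.set_title("top plot")
bottom.set_title("bottom plot")
```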

The violent crime rate plot is below. The Matplotlib code is fairly straightforward so I won’t bog down the post with too much commentary.

I’m overlaying an FBI seal onto the plot with low alpha to give it a watermark effect. The trickiest part is to correctly set box_alignment. (0, 0) means the lower-left corner of the image will be placed at the specified location. (0.5, 0.5) would center the image at that point.

axs[0].plot(df['Year'], df['Violent crime rate'], color="#B0E441", marker="o", markersize=6)

x_ticks = df['Year'].tolist()
axs[0].set_xticks(x_ticks)
x_range = x_ticks[-1] - x_ticks[0]
x_left, x_right = x_ticks[0] - x_range * 0.02, x_ticks[-1] + x_range * 0.02
axs[0].set_xlim(x_left, x_right)

y_ticks = range(360, 540, 20)
axs[0].set_yticks(y_ticks)
y_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0] - y_range * 0.01, y_ticks[-1] + y_range * 0.01
axs[0].set_ylim(y_bottom, y_top)
axs[0].set_ylabel("Rate per 100,000")

axs[0].set_title("United States Violent Crime Rate  •  2003–2022")

axs[0].text(x_right - x_range * 0.005, y_ticks[-1] - y_range * 0.01,
            "Data:  FBI Crime in the Nation, October 2023.",
            size=11, ha="right", va="top")

ab = AnnotationBbox(OffsetImage(plt.imread("fbi_seal.png"), zoom=0.2, alpha=0.05),
                    (x_ticks[0], y_ticks[0]), box_alignment=(0, 0), frameon=False)
axs[0].add_artist(ab)
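
Since box_alignment is the fiddly part, here is a stripped-down sketch that places the same image twice with different alignments. A plain white square generated with NumPy stands in for fbi_seal.png, and the coordinates are arbitrary. Both are assumptions for the demo.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

fig, ax = plt.subplots(figsize=(6, 4))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# White 40x40 RGB square as a stand-in for the seal image.
image = np.ones((40, 40, 3))

# (0, 0): the image's lower-left corner sits on the point (2, 2).
corner = AnnotationBbox(OffsetImage(image, zoom=1, alpha=0.5),
                        (2, 2), box_alignment=(0, 0), frameon=False)

# (0.5, 0.5): the image is centered on the point (8, 8).
centered = AnnotationBbox(OffsetImage(image, zoom=1, alpha=0.5),
                          (8, 8), box_alignment=(0.5, 0.5), frameon=False)

ax.add_artist(corner)
ax.add_artist(centered)

# Render once; call fig.savefig("demo.png") instead to inspect the result.
fig.canvas.draw()
```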

Next is the bottom plot, axs[1], which will display change in violent crime subcategories over the past 20 years.

This code is a little simpler because it doesn’t include an image or citation text. We can also reuse previous definitions for the x-axis.

Remember to include a legend so readers can identify each crime subcategory. The data series trend downward so lower-left is a nice spot for a legend.

axs[1].plot(df['Year'], df['murder_rate_change'], marker="o", markersize=6, label="Murder and Nonnegligent Manslaughter")
axs[1].plot(df['Year'], df['assault_rate_change'], marker="o", markersize=6, label="Aggravated Assault")
axs[1].plot(df['Year'], df['robbery_rate_change'], marker="o", markersize=6, label="Robbery")

axs[1].set_xticks(x_ticks)
axs[1].set_xlim(x_left, x_right)

y_ticks = range(0, 140, 20)
axs[1].set_yticks(y_ticks)
axs[1].set_yticklabels([f"{n}%" for n in y_ticks])
y_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0] - y_range * 0.01, y_ticks[-1] + y_range * 0.01
axs[1].set_ylim(y_bottom, y_top)

axs[1].set_title("Subcategories  •  Change Since 2003")

axs[1].legend(loc="lower left")
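
Both plots pad their axis limits the same way: take the span between the first and last tick, then extend each end by a small fraction of it. If the repetition bothers you, the pattern fits in a small helper. The function name padded_limits is hypothetical and not part of the script.

```python
def padded_limits(ticks, pad_fraction):
    """Return (low, high) limits that extend slightly past the outermost ticks."""
    tick_range = ticks[-1] - ticks[0]
    pad = tick_range * pad_fraction
    return ticks[0] - pad, ticks[-1] + pad


# Same inputs as the plots above: years on x, percent ticks on y.
x_left, x_right = padded_limits(list(range(2003, 2023)), 0.02)
y_bottom, y_top = padded_limits(range(0, 140, 20), 0.01)

print(x_left, x_right)
print(y_bottom, y_top)
```

The x-axis spans 19 years, so a 2% pad adds 0.38 of a year on each side. The y ticks run from 0 to 120, so a 1% pad adds 1.2 points.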

Finally, save the figure. I bump up dpi from its default 100 to aid readability.

plt.savefig("fbi_violent_crime.png", dpi=150)
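
The saved image’s pixel dimensions are the figure size in inches multiplied by dpi, so it is easy to work out what a given setting will produce. A minimal sketch, assuming the figure is saved without cropping options such as bbox_inches="tight"; the helper name pixel_size is hypothetical.

```python
def pixel_size(figsize, dpi):
    """Return (width, height) in pixels for a figure size given in inches."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


print(pixel_size((14, 11), 100))  # (1400, 1100)
print(pixel_size((14, 11), 150))  # (2100, 1650)
```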

3. The output.

Overall the country is much safer than it was 20 years ago. We saw an uptick in violent crime during the COVID-19 pandemic, especially within the murder category, but we’ve nearly returned to the lowest levels on record.

So far, quarterly data has pointed to another large decline in 2023. Experts say it could be the largest recorded year-over-year drop in the murder rate. We’ll learn more when the FBI releases full 2023 data this October.


Download the data.

Full code:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox


pd.set_option("display.expand_frame_repr", False)

df = pd.read_excel("Table_1_Crime_in_the_United_States_by_Volume_and_Rate_per_100000_Inhabitants_2003-2022.xlsx")

column_labels = [str(item).replace("\n", " ").replace("  ", " ").strip() for item in df.iloc[2]]
df.columns = column_labels

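# .copy() makes the subset independent of the original frame, which avoids
# a possible SettingWithCopyWarning when we assign columns below.
df = df[3:23].copy()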

df.loc[:, 'Year'] = range(2003, 2023)

df.loc[:, 'murder_rate_change'] = df['Murder and nonnegligent manslaughter rate'] / df['Murder and nonnegligent manslaughter rate'].iloc[0] * 100
df.loc[:, 'assault_rate_change'] = df['Aggravated assault rate'] / df['Aggravated assault rate'].iloc[0] * 100
df.loc[:, 'robbery_rate_change'] = df['Robbery rate'] / df['Robbery rate'].iloc[0] * 100

print(df.head(50))

plt.style.use("wollen_dark.mplstyle")
fig, axs = plt.subplots(2, 1, figsize=(14, 11))

axs[0].plot(df['Year'], df['Violent crime rate'], color="#B0E441", marker="o", markersize=6)

x_ticks = df['Year'].tolist()
axs[0].set_xticks(x_ticks)
x_range = x_ticks[-1] - x_ticks[0]
x_left, x_right = x_ticks[0] - x_range * 0.02, x_ticks[-1] + x_range * 0.02
axs[0].set_xlim(x_left, x_right)

y_ticks = range(360, 540, 20)
axs[0].set_yticks(y_ticks)
y_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0] - y_range * 0.01, y_ticks[-1] + y_range * 0.01
axs[0].set_ylim(y_bottom, y_top)
axs[0].set_ylabel("Rate per 100,000")

axs[0].set_title("United States Violent Crime Rate  •  2003–2022")

axs[0].text(x_right - x_range * 0.005, y_ticks[-1] - y_range * 0.01,
            "Data:  FBI Crime in the Nation, October 2023.",
            size=11, ha="right", va="top")

ab = AnnotationBbox(OffsetImage(plt.imread("fbi_seal.png"), zoom=0.2, alpha=0.05),
                    (x_ticks[0], y_ticks[0]), box_alignment=(0, 0), frameon=False)
axs[0].add_artist(ab)

axs[1].plot(df['Year'], df['murder_rate_change'], marker="o", markersize=6, label="Murder and Nonnegligent Manslaughter")
axs[1].plot(df['Year'], df['assault_rate_change'], marker="o", markersize=6, label="Aggravated Assault")
axs[1].plot(df['Year'], df['robbery_rate_change'], marker="o", markersize=6, label="Robbery")

axs[1].set_xticks(x_ticks)
axs[1].set_xlim(x_left, x_right)

y_ticks = range(0, 140, 20)
axs[1].set_yticks(y_ticks)
axs[1].set_yticklabels([f"{n}%" for n in y_ticks])
y_range = y_ticks[-1] - y_ticks[0]
y_bottom, y_top = y_ticks[0] - y_range * 0.01, y_ticks[-1] + y_range * 0.01
axs[1].set_ylim(y_bottom, y_top)

axs[1].set_title("Subcategories  •  Change Since 2003")

axs[1].legend(loc="lower left")

plt.savefig("fbi_violent_crime.png", dpi=150)