{"id":696,"date":"2021-09-10T07:00:26","date_gmt":"2021-09-10T12:00:26","guid":{"rendered":"https:\/\/wollen.org\/blog\/?p=696"},"modified":"2025-05-05T14:59:22","modified_gmt":"2025-05-05T19:59:22","slug":"how-important-are-turnovers-in-the-nfl","status":"publish","type":"post","link":"https:\/\/wollen.org\/blog\/2021\/09\/how-important-are-turnovers-in-the-nfl\/","title":{"rendered":"How important are turnovers in the NFL?"},"content":{"rendered":"<p>The 2021-22 NFL season kicked off last night and I&#8217;m in a football mood. Let&#8217;s continue the streak of sports posts!<\/p>\n<p>Football is often characterized as a game of inches. I&#8217;ve always taken that to mean that many seemingly small edges are actually crucial to the final score. And few advantages pack a punch like turnovers, which occur when one team takes the ball away from the other.<\/p>\n<p>Exactly how much do turnovers matter? We can find out by implementing a simple linear regression. We&#8217;ll calculate:<\/p>\n<ol>\n<li>How many points each turnover is worth, on average.<\/li>\n<li>How often teams win when they earn a turnover advantage.<\/li>\n<\/ol>\n<p>And of course we&#8217;ll plot the regression line.<\/p>\n<hr \/>\n<h4>1. Prepare the data.<\/h4>\n<p>Begin with the imports. We&#8217;ve worked with <em>pandas<\/em> and <em>Matplotlib<\/em> many times on the blog. Making a new appearance is <em>SciPy<\/em>, which we&#8217;ll use for the regression.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\nfrom scipy.stats import linregress\r\nimport matplotlib.pyplot as plt<\/pre>\n<p>The dataset can be downloaded from <a href=\"https:\/\/www.kaggle.com\/cviaxmiwnptr\/nfl-team-stats-20022019-espn\" target=\"_blank\" rel=\"noopener\">Kaggle<\/a>. 
It contains team stats from every NFL game going back to 2002.<\/p>\n<p>Get started by reading the dataset and converting its <code>date<\/code> column to datetime format.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df = pd.read_csv(\"nfl_team_stats_2002-2020.csv\", parse_dates=[\"date\"])<\/pre>\n<p>Since we&#8217;re interested in calculating how turnovers affect the final score, we&#8217;ll set up the regression like this:<\/p>\n<ul>\n<li>Independent variable: turnover margin\n<ul>\n<li>Away minus home.<\/li>\n<li>Positive is good for the home team.<\/li>\n<\/ul>\n<\/li>\n<li>Dependent variable: score margin\n<ul>\n<li>Home minus away.<\/li>\n<li>Positive is good for the home team.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Notice we&#8217;ve arbitrarily decided to work from the perspective of the home team. This helps us avoid double-counting games!<\/p>\n<p>Create two new columns to hold these variables.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df.loc[:, \"turnover_margin\"] = df[\"turnovers_away\"] - df[\"turnovers_home\"]\r\ndf.loc[:, \"score_margin\"] = df[\"score_home\"] - df[\"score_away\"]\r\n<\/pre>\n<p>After that, restrict the dataframe to the 10 most recent seasons (2011 through 2020). Football fans know how much the sport has changed throughout its history. Limiting the analysis to the most recent decade isn&#8217;t perfect, but it should better represent the modern game while still providing plenty of data points.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df = df[df[\"date\"] &gt; pd.Timestamp(\"July 1 2011\")]<\/pre>\n<hr \/>\n<h4>2. Linear regression.<\/h4>\n<p>Now it&#8217;s time to perform a simple linear regression. Doing regressions by hand is extremely tedious, but with Python it&#8217;s as easy as passing two iterables into <code>scipy.stats.linregress()<\/code>. 
This function returns five values, but we&#8217;re interested in three:<\/p>\n<ol>\n<li><code>slope<\/code>\n<ol>\n<li>The slope (rise over run) of the regression line.<\/li>\n<li>Coefficient \u03b2<sub>1<\/sub> in the equation below.<\/li>\n<\/ol>\n<\/li>\n<li><code>intercept<\/code>\n<ol>\n<li>The y-intercept of the regression line.<\/li>\n<li>Constant \u03b2<sub>0<\/sub> in the equation below.<\/li>\n<\/ol>\n<\/li>\n<li><code>r_value<\/code>\n<ol>\n<li>This is <em>r<\/em>, the correlation coefficient.<\/li>\n<li>We&#8217;ll square it to get <em>R\u00b2<\/em>, the share of the variance in score margin that the regression line explains. In other words, it tells us how tightly the data points fit the line.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p><a href=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/regression_equation.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-715 size-medium\" src=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/regression_equation-300x56.png\" alt=\"\" width=\"300\" height=\"56\" srcset=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/regression_equation-300x56.png 300w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/regression_equation.png 461w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Pass the appropriate dataframe columns into <code>scipy.stats.linregress()<\/code>:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">slope, intercept, r_value, p_value, std_err = linregress(df[\"turnover_margin\"],\r\n                                                         df[\"score_margin\"])<\/pre>\n<p>If you print the <code>slope<\/code> and <code>intercept<\/code> values, you&#8217;ll find:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">slope = 4.53\r\nintercept = 1.96<\/pre>\n<p><strong>This means each turnover is worth about <span style=\"text-decoration: underline;\">4.5 points<\/span> to the final score margin, on average.<\/strong><\/p>\n<p>Be careful not to interpret it as a causal 
relationship. Turnovers almost certainly do <em>cause<\/em> changes in the final score, but claiming causation would overstate what a simple regression can show. Strictly speaking, we&#8217;re only observing the variables&#8217; correlation.<\/p>\n<p>Also notice that <code>intercept<\/code> is clearly non-zero. That&#8217;s because we&#8217;re analyzing the data from the perspective of the home team. In the NFL, home field advantage is worth a couple of points. In a game between two equally matched teams with no turnover advantage, you&#8217;d expect the home team to be favored by approximately 2 points, which matches the intercept of 1.96.<\/p>\n<hr \/>\n<h4>3. Plot the data.<\/h4>\n<p>Next we&#8217;ll plot all data points along with the regression line. Start by creating a pair of lists to hold the regression line data. A straight line only needs two ordered pairs, so we use <code>min()<\/code> and <code>max()<\/code> to make the line span the same x-range as the scatter plot.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x_regression = [min(df[\"turnover_margin\"]), max(df[\"turnover_margin\"])]\r\ny_regression = [n * slope + intercept for n in x_regression]<\/pre>\n<p>The <em>Matplotlib<\/em> code is fairly straightforward. We create a single <em>Axes<\/em> and draw both the scatter plot and the regression line on it.<\/p>\n<p>I like to use a square window for regressions: <code>figsize=(8, 8)<\/code>. Whenever possible, I also center the window on (0, 0). In other words, since the x-axis extends to +7, it should extend to -7 as well.<\/p>\n<p>A few more notes about the figure:<\/p>\n<ol>\n<li>The regression equation is displayed in the legend.<\/li>\n<li>R\u00b2 is placed manually as text with a <em>bbox<\/em> to aid visibility. It&#8217;s calculated by squaring the correlation coefficient.<\/li>\n<li>You can pass <em>pandas<\/em> dataframe columns directly into Matplotlib axes. 
There&#8217;s no need to convert data types.<\/li>\n<li>I like to add <code>alpha<\/code> (transparency) to scatter plots when there are many overlapping points.<\/li>\n<li>I use my custom <code>wollen_dark<\/code> style. It will be linked at the bottom of this post.<\/li>\n<\/ol>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">plt.style.use(\"wollen_dark.mplstyle\")\r\n\r\nfig, ax = plt.subplots(figsize=(8, 8))\r\n\r\nax.scatter(df[\"turnover_margin\"], df[\"score_margin\"],\r\n           color=\"#FFF\", s=30, alpha=0.5)\r\n\r\nax.plot(x_regression, y_regression,\r\n        color=\"#D50A0A\", linewidth=3.0,\r\n        label=f\"y={slope:.2f}*x{intercept:+.2f}\")\r\n\r\nax.set(xticks=range(-7, 8), xlim=(-7.25, 7.25), xlabel=\"Turnover Margin\",\r\n       yticks=range(-60, 70, 10), ylim=(-62, 62), ylabel=\"Score Margin\")\r\n\r\nax.set_title(\"NFL Turnovers &amp; Final Score  |  2011\u20132020\")\r\n\r\nax.text(4.5, -36, f\"R\u00b2 = {r_value**2:.4f}\",\r\n        {\"fontname\": \"Ubuntu Condensed\", \"fontsize\": 12, \"color\": \"#000\"},\r\n        bbox={\"boxstyle\": \"round\", \"facecolor\": \"#FFF\", \"linewidth\": 0.25, \"alpha\": 0.9, \"pad\": 0.25})\r\n\r\nplt.legend(loc=\"upper left\")\r\n\r\nplt.show()<\/pre>\n<p><strong>The output:<\/strong><\/p>\n<p><a href=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/nfl_turnovers_scoring.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2816 size-full\" src=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/nfl_turnovers_scoring.png\" alt=\"\" width=\"800\" height=\"800\" srcset=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/nfl_turnovers_scoring.png 800w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/nfl_turnovers_scoring-300x300.png 300w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/nfl_turnovers_scoring-150x150.png 150w, 
https:\/\/wollen.org\/blog\/wp-content\/uploads\/2021\/09\/nfl_turnovers_scoring-768x768.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<hr \/>\n<h4>4. Analysis.<\/h4>\n<p>With the regression done, let&#8217;s check how often winning the turnover battle leads to victory.<\/p>\n<p>First, filter out all games where the turnover margin is zero, then check how often either of the following conditions is true:<\/p>\n<ul>\n<li><strong>Positive<\/strong> turnover margin and <strong>positive<\/strong> score margin.\n<ul>\n<li>Indicating the <strong>home<\/strong> team won both the turnover battle and the game.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Negative<\/strong> turnover margin and <strong>negative<\/strong> score margin.\n<ul>\n<li>Indicating the <strong>away<\/strong> team won both the turnover battle and the game.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Create filtered subsets of the dataframe with boolean masks and count their rows using the <code>shape<\/code> attribute. Dividing the number of games that meet either condition by the total number of remaining games gives the percentage.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df = df[df[\"turnover_margin\"] != 0]\r\n\r\ndf2 = df[(df[\"turnover_margin\"] &gt; 0) &amp; (df[\"score_margin\"] &gt; 0)]\r\ndf3 = df[(df[\"turnover_margin\"] &lt; 0) &amp; (df[\"score_margin\"] &lt; 0)]\r\n\r\nresult_follows_turnovers_percent = (df2.shape[0] + df3.shape[0]) \/ df.shape[0] * 100\r\nprint(f\"Teams with a positive turnover margin win {result_follows_turnovers_percent:.1f}% of the time.\")<\/pre>\n<hr \/>\n<div style=\"margin-left: 10%; margin-right: 10%;\">\n<p><strong>Note:<\/strong> We filtered the dataframe on multiple conditions by combining them with the <code>&amp;<\/code> operator. Each condition must be wrapped in parentheses because <code>&amp;<\/code> binds more tightly than comparisons like <code>&gt;<\/code>. Another option is to use <code>DataFrame.query<\/code>, which often results in more readable code. 
Lines 3-4 above could have been written this way:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df2 = df.query(\"turnover_margin &gt; 0 &amp; score_margin &gt; 0\")\r\ndf3 = df.query(\"turnover_margin &lt; 0 &amp; score_margin &lt; 0\")<\/pre>\n<\/div>\n<hr \/>\n<p>The output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">Teams with a positive turnover margin win 77.6% of the time.<\/pre>\n<p><strong>To answer the original question:<\/strong> turnovers are incredibly important in the NFL! In fact, if you adjust the above code to look only at games where one team finished exactly one turnover ahead, you&#8217;ll find that team wins 66.5% of the time. So being just one turnover ahead translates to roughly two wins for every loss. A +2 advantage pushes the rate well above 80%. There may be a team statistic more strongly correlated with the final result, but I&#8217;m not aware of one.<\/p>\n<p>Enjoy the upcoming NFL season and make sure your team wins the turnover battle!<\/p>\n<hr \/>\n<p><strong><a href=\"https:\/\/wollen.org\/misc\/nfl_turnovers_score_9-10-2021.zip\">Download the data<\/a>.<\/strong><\/p>\n<p><strong>Full code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\nfrom scipy.stats import linregress\r\nimport matplotlib.pyplot as plt\r\n\r\n\r\ndf = pd.read_csv(\"nfl_team_stats_2002-2020.csv\", parse_dates=[\"date\"])\r\n\r\ndf.loc[:, \"turnover_margin\"] = df[\"turnovers_away\"] - df[\"turnovers_home\"]\r\ndf.loc[:, \"score_margin\"] = df[\"score_home\"] - df[\"score_away\"]\r\n\r\ndf = df[df[\"date\"] &gt; pd.Timestamp(\"July 1 2011\")]\r\n\r\nslope, intercept, r_value, p_value, std_err = linregress(df[\"turnover_margin\"], df[\"score_margin\"])\r\n\r\nx_regression = [min(df[\"turnover_margin\"]), max(df[\"turnover_margin\"])]\r\ny_regression = [n * slope + intercept for n in x_regression]\r\n\r\nplt.style.use(\"wollen_dark.mplstyle\")\r\n\r\nfig, ax = plt.subplots(figsize=(8, 
8))\r\n\r\nax.scatter(df[\"turnover_margin\"], df[\"score_margin\"], color=\"#FFF\", s=30, alpha=0.5)\r\n\r\nax.plot(x_regression, y_regression, color=\"#D50A0A\", linewidth=3.0, label=f\"y={slope:.2f}*x{intercept:+.2f}\")\r\n\r\nax.set(xticks=range(-7, 8), xlim=(-7.25, 7.25), yticks=range(-60, 70, 10), ylim=(-62, 62),\r\n       xlabel=\"Turnover Margin\", ylabel=\"Score Margin\")\r\n\r\nax.set_title(\"NFL Turnovers &amp; Final Score  |  2011\u20132020\")\r\n\r\nax.text(4.5, -36, f\"R\u00b2 = {r_value**2:.4f}\", {\"fontname\": \"Ubuntu Condensed\", \"fontsize\": 12, \"color\": \"#000\"},\r\n        bbox={\"boxstyle\": \"round\", \"facecolor\": \"#FFF\", \"linewidth\": 0.25, \"alpha\": 0.9, \"pad\": 0.25})\r\n\r\nplt.legend(loc=\"upper left\")\r\n\r\nplt.show()\r\n\r\ndf = df[df[\"turnover_margin\"] != 0]\r\n\r\ndf2 = df[(df[\"turnover_margin\"] &gt; 0) &amp; (df[\"score_margin\"] &gt; 0)]\r\ndf3 = df[(df[\"turnover_margin\"] &lt; 0) &amp; (df[\"score_margin\"] &lt; 0)]\r\n\r\nresult_follows_turnovers_percent = (df2.shape[0] + df3.shape[0]) \/ df.shape[0] * 100\r\nprint(f\"Teams with a positive turnover margin win {result_follows_turnovers_percent:.1f}% of the time.\")<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The 2021-22 NFL season kicked off last night and I&#8217;m in a football mood. Let&#8217;s continue the streak of sports posts! Football is often characterized as a game of inches. 
I&#8217;ve always taken that to mean that many seemingly small<\/p>\n","protected":false},"author":1,"featured_media":2818,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[59,215,469],"tags":[39,22,122,53,120,44,123,47,24,119,30,46,25,117,118,60,63,116,121],"class_list":["post-696","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-sports","category-betting","category-stats","tag-code","tag-data","tag-dataset","tag-datetime","tag-football","tag-games","tag-kaggle","tag-math","tag-matplotlib","tag-nfl","tag-pandas","tag-plot","tag-python","tag-regression","tag-scipy","tag-sports","tag-statistics","tag-stats","tag-turnovers"],"_links":{"self":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/696","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/comments?post=696"}],"version-history":[{"count":52,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/696\/revisions"}],"predecessor-version":[{"id":2817,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/696\/revisions\/2817"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media\/2818"}],"wp:attachment":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media?parent=696"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/categories?post=696"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/tags?post=696"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}