{"id":1832,"date":"2024-12-05T07:00:53","date_gmt":"2024-12-05T13:00:53","guid":{"rendered":"https:\/\/wollen.org\/blog\/?p=1832"},"modified":"2025-03-27T08:02:37","modified_gmt":"2025-03-27T13:02:37","slug":"how-much-is-nba-home-court-advantage-worth","status":"publish","type":"post","link":"https:\/\/wollen.org\/blog\/2024\/12\/how-much-is-nba-home-court-advantage-worth\/","title":{"rendered":"How much is NBA home court advantage worth?"},"content":{"rendered":"<p>You&#8217;ll often hear people talk about <em>home field advantage<\/em> in football, which simply means that home crowd, lack of travel, etc. are worth a couple points to the final score. It&#8217;s discussed less often in basketball but the advantage is just as real.<\/p>\n<p>Let&#8217;s look at historical NBA data and measure both the expected advantage according to oddsmakers and the empirical value from scoring data. We&#8217;ll plot each season&#8217;s average and see how home court has changed over the years.<\/p>\n<hr \/>\n<h4>1. Prepare the data.<\/h4>\n<p>I&#8217;ll use the Kaggle dataset <a href=\"https:\/\/www.kaggle.com\/datasets\/cviaxmiwnptr\/nba-betting-data-october-2007-to-june-2024\" target=\"_blank\" rel=\"noopener\">here<\/a>. It contains both scoring and betting data beginning with the 2007-2008 season. 
Load the CSV into a pandas DataFrame and take a look at the relevant columns.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\n\r\ndf = pd.read_csv(\"nba_2008-2024.csv\")\r\n\r\nprint(df[['season', 'date', 'away', 'home', 'score_away', 'score_home', 'whos_favored', 'spread']].head())<\/pre>\n<p>The output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">   season        date  away home  score_away  score_home whos_favored  spread\r\n0    2008  2007-10-30   por   sa          97         106         home    13.0\r\n1    2008  2007-10-30  utah   gs         117          96         home     1.0\r\n2    2008  2007-10-30   hou  lal          95          93         away     5.0\r\n3    2008  2007-10-31   phi  tor          97         106         home     6.5\r\n4    2008  2007-10-31   wsh  ind         110         119         away     1.5<\/pre>\n<p>The <em>whos_favored<\/em> column is always either &#8220;home&#8221; or &#8220;away&#8221; and <em>spread<\/em> is always a positive number. Notice that <em>season<\/em> is encoded as the year the season ends, e.g. 2007-08 becomes 2008.<\/p>\n<p>We&#8217;re interested first in real-world home court advantage according to the final score. Create a new column to hold this data and later we&#8217;ll calculate each season&#8217;s average.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df.loc[:, 'home_score_margin'] = df['score_home'] - df['score_away']<\/pre>\n<p>We&#8217;re also interested in the expected advantage according to the point spread. 
As before, a positive value means the home team is favored and a negative value means the away team is favored.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df.loc[:, 'home_favored_by'] = df.apply(lambda row: row['spread'] if row['whos_favored'] == \"home\" else row['spread'] * -1, axis=1)<\/pre>\n<p>I&#8217;m certainly not a pandas expert but I&#8217;ve learned that <code>groupby<\/code>, while intimidating at first, is a powerful method to have in your arsenal. A few years ago it would have been tempting to iterate through each season, filter the DataFrame, and calculate averages individually. But that&#8217;s less efficient and (more importantly to me) a lot more work.<\/p>\n<p>It helps me to think about what I&#8217;m doing in plain English. We want to group the data into separate buckets according to the <em>season<\/em> column, so that&#8217;s the argument passed to <code>groupby<\/code>. We&#8217;re interested in the two recently created columns so those are referenced as a list. <code>agg<\/code> applies a function to each bucket of data.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df2 = df.groupby(\"season\")[['home_score_margin', 'home_favored_by']].agg(\"mean\")\r\n\r\nprint(df2.head())<\/pre>\n<p>Things become clearer when you look at the new DataFrame.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">        home_score_margin  home_favored_by\r\nseason                                    \r\n2008             3.712766         3.476064\r\n2009             3.309506         3.377567\r\n2010             2.842226         3.323933\r\n2011             3.194508         3.271930\r\n2012             2.965549         3.167132<\/pre>\n<p>The DataFrame&#8217;s index is <em>season<\/em> and it holds yearly averages for each column. That&#8217;s everything we need. Next we can plot the data.<\/p>\n<hr \/>\n<h4>2. 
Plot the data.<\/h4>\n<p>I&#8217;ll use a custom Matplotlib style I created to emulate FiveThirtyEight. It won&#8217;t actually look like a FiveThirtyEight plot because it will use NBA-themed colors, but it provides a good blank slate that&#8217;s less off-putting than default Matplotlib.<\/p>\n<p>Create an Axes instance and pass the appropriate <code>df2<\/code> columns to <code>scatter<\/code>. Remember that x-axis data, <em>season<\/em>, is the DataFrame&#8217;s index.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import matplotlib.pyplot as plt\r\n\r\nplt.style.use(\"wollen_538.mplstyle\")\r\nfig, ax = plt.subplots()\r\n\r\nax.scatter(df2.index, df2['home_score_margin'],\r\n           color=\"#DB132E\", marker=\"h\", s=110,\r\n           edgecolor=\"#555\", linewidth=1.0,\r\n           label=\"Score Margin\")\r\n\r\nax.scatter(df2.index, df2['home_favored_by'],\r\n           color=\"#00418D\", marker=\"D\", s=60,\r\n           edgecolor=\"#555\", linewidth=1.0,\r\n           label=\"Favored By\")<\/pre>\n<p>We could hard-code ticks and window limits and it would require fewer lines of code, but I generally try to avoid it. It will be easier to reuse this script in a year or two when I return with new data.<\/p>\n<p>NBA seasons span two calendar years so let&#8217;s communicate that along the x-axis. That means labels take up more space so let&#8217;s also rotate them 60 degrees.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">x_ticks = range(df2.index.min(), df2.index.max() + 1)\r\nax.set_xticks(x_ticks, labels=[f\"{n - 1}-{n - 2000:02}\" for n in x_ticks])\r\nplt.setp(ax.xaxis.get_majorticklabels(), rotation=60, ha=\"right\", rotation_mode=\"anchor\")\r\nx_tick_range = x_ticks[-1] - x_ticks[0]\r\nax.set_xlim(x_ticks[0] - x_tick_range * 0.03, x_ticks[-1] + x_tick_range * 0.02)<\/pre>\n<p>Identify the bottom and top y-ticks using a <code>while<\/code> loop. 
We can staple the two columns together with <code>concat<\/code> to make sure we consider the overall minimum and maximum values.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">from numpy import arange\r\n\r\nbottom_y_tick = 10.0\r\nwhile bottom_y_tick &gt; pd.concat([df2['home_score_margin'], df2['home_favored_by']]).min():\r\n    bottom_y_tick -= 0.5\r\ntop_y_tick = 0.0\r\nwhile top_y_tick &lt; pd.concat([df2['home_score_margin'], df2['home_favored_by']]).max():\r\n    top_y_tick += 0.5\r\ny_ticks = arange(bottom_y_tick, top_y_tick + 0.5, 0.5)\r\nax.set_yticks(y_ticks)\r\ny_tick_range = y_ticks[-1] - y_ticks[0]\r\nax.set_ylim(y_ticks[0] - y_tick_range * 0.03, y_ticks[-1] + y_tick_range * 0.005)<\/pre>\n<p>Finally, create a legend in the upper-right corner, set plot labels, and save the figure.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">ax.legend(loc=\"upper right\")\r\n\r\nax.set_ylabel(\"Points\")\r\nax.set_title(\"NBA  \u2022  Home Court Advantage\")\r\n\r\nplt.savefig(\"nba_hca.png\", dpi=200)<\/pre>\n<hr \/>\n<h4>3. 
The output.<\/h4>\n<p><a href=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2448 size-full\" src=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca.png\" alt=\"\" width=\"2600\" height=\"1400\" srcset=\"https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca.png 2600w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca-300x162.png 300w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca-1024x551.png 1024w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca-768x414.png 768w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca-1536x827.png 1536w, https:\/\/wollen.org\/blog\/wp-content\/uploads\/2024\/12\/nba_hca-2048x1103.png 2048w\" sizes=\"auto, (max-width: 2600px) 100vw, 2600px\" \/><\/a><\/p>\n<p><strong>To answer the original question, NBA home court advantage is worth about 2 to 2.5 points<\/strong>.<\/p>\n<p>What&#8217;s interesting to me is how clearly the advantage has trended down over the past 17 years. 2019-20 and 2020-21 were affected by the COVID-19 &#8220;bubble&#8221; and reduced crowd sizes. But even if you throw out those seasons, home court has lost a full point of value.<\/p>\n<p>I don&#8217;t think there&#8217;s any clear answer as to why this happened. Are home crowds really less rowdy than they were 20 years ago? I&#8217;m sure you could find grumpy fans who insist that people are too busy playing on their phones to be loud. I would look more toward innovation in travel methods. Teams have better optimized routines that help them arrive healthy and ready to perform. In addition, salaries have grown so players are more incentivized to take those routines seriously.<\/p>\n<p>Still, home court will always have some positive value. 
We&#8217;ll have to circle back in a few years to see where the trend has leveled off.<\/p>\n<hr \/>\n<p><a href=\"https:\/\/wollen.org\/misc\/nba_hca_2024.zip\"><strong>Download the Matplotlib style.<\/strong><\/a><\/p>\n<p><strong>Full code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\nimport matplotlib.pyplot as plt\r\nfrom numpy import arange\r\n\r\n\r\ndf = pd.read_csv(\"nba_2008-2024.csv\")\r\n\r\nprint(df[['season', 'date', 'away', 'home', 'score_away', 'score_home', 'whos_favored', 'spread']].head())\r\n\r\ndf.loc[:, 'home_score_margin'] = df['score_home'] - df['score_away']\r\n\r\ndf.loc[:, 'home_favored_by'] = df.apply(lambda row: row['spread'] if row['whos_favored'] == \"home\" else row['spread'] * -1, axis=1)\r\n\r\ndf2 = df.groupby(\"season\")[['home_score_margin', 'home_favored_by']].agg(\"mean\")\r\n\r\nprint(df2.head())\r\n\r\nplt.style.use(\"wollen_538.mplstyle\")\r\nfig, ax = plt.subplots()\r\n\r\nax.scatter(df2.index, df2['home_score_margin'],\r\n           color=\"#DB132E\", marker=\"h\", s=110,\r\n           edgecolor=\"#555\", linewidth=1.0,\r\n           label=\"Score Margin\")\r\n\r\nax.scatter(df2.index, df2['home_favored_by'],\r\n           color=\"#00418D\", marker=\"D\", s=60,\r\n           edgecolor=\"#555\", linewidth=1.0,\r\n           label=\"Favored By\")\r\n\r\nx_ticks = range(df2.index.min(), df2.index.max() + 1)\r\nax.set_xticks(x_ticks, labels=[f\"{n - 1}-{n - 2000:02}\" for n in x_ticks])\r\nplt.setp(ax.xaxis.get_majorticklabels(), rotation=60, ha=\"right\", rotation_mode=\"anchor\")\r\nx_tick_range = x_ticks[-1] - x_ticks[0]\r\nax.set_xlim(x_ticks[0] - x_tick_range * 0.03, x_ticks[-1] + x_tick_range * 0.02)\r\n\r\nbottom_y_tick = 10.0\r\nwhile bottom_y_tick &gt; pd.concat([df2['home_score_margin'], df2['home_favored_by']]).min():\r\n    bottom_y_tick -= 0.5\r\ntop_y_tick = 0.0\r\nwhile top_y_tick &lt; pd.concat([df2['home_score_margin'], 
df2['home_favored_by']]).max():\r\n    top_y_tick += 0.5\r\ny_ticks = arange(bottom_y_tick, top_y_tick + 0.5, 0.5)\r\nax.set_yticks(y_ticks)\r\ny_tick_range = y_ticks[-1] - y_ticks[0]\r\nax.set_ylim(y_ticks[0] - y_tick_range * 0.03, y_ticks[-1] + y_tick_range * 0.005)\r\n\r\nax.legend(loc=\"upper right\")\r\n\r\nax.set_ylabel(\"Points\")\r\nax.set_title(\"NBA  \u2022  Home Court Advantage\")\r\n\r\nplt.savefig(\"nba_hca.png\", dpi=200)<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You&#8217;ll often hear people talk about home field advantage in football, which simply means that home crowd, lack of travel, etc. are worth a couple points to the final score. It&#8217;s discussed less often in basketball but the advantage is<\/p>\n","protected":false},"author":1,"featured_media":1846,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[59,215],"tags":[361,359,207,135,22,122,173,360,363,353,362,352,24,126,351,30,46,356,355,25,60,61,357,354,358],"class_list":["post-1832","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-sports","category-betting","tag-basketball","tag-bet","tag-betting","tag-csv","tag-data","tag-dataset","tag-graph","tag-historical","tag-home-court","tag-home-court-advantage","tag-home-field","tag-home-field-advantage","tag-matplotlib","tag-mplstyle","tag-nba","tag-pandas","tag-plot","tag-point-spread","tag-points","tag-python","tag-sports","tag-sports-betting","tag-spread","tag-travel","tag-vegas"],"_links":{"self":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/1832","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/w
ollen.org\/blog\/wp-json\/wp\/v2\/comments?post=1832"}],"version-history":[{"count":18,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/1832\/revisions"}],"predecessor-version":[{"id":2449,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/1832\/revisions\/2449"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media\/1846"}],"wp:attachment":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media?parent=1832"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/categories?post=1832"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/tags?post=1832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}