How much is NBA home court advantage worth?
You’ll often hear people talk about home field advantage in football, which simply means that home crowd, lack of travel, etc. are worth a couple points to the final score. It’s discussed less often in basketball but the advantage is just as real.
Let’s look at historical NBA data and measure both the expected advantage according to oddsmakers and the empirical value from scoring data. We’ll plot each season’s average and see how home court has changed over the years.
1. Prepare the data.
I’ll use the Kaggle dataset here. It contains both scoring and betting data beginning with the 2007-2008 season. Load the CSV into a pandas DataFrame and take a look at the relevant columns.
import pandas as pd

df = pd.read_csv("nba_2008-2024.csv")

print(df[['season', 'date', 'away', 'home', 'score_away', 'score_home', 'whos_favored', 'spread']].head())
The output:
   season        date  away home  score_away  score_home whos_favored  spread
0    2008  2007-10-30   por   sa          97         106         home    13.0
1    2008  2007-10-30  utah   gs         117          96         home     1.0
2    2008  2007-10-30   hou  lal          95          93         away     5.0
3    2008  2007-10-31   phi  tor          97         106         home     6.5
4    2008  2007-10-31   wsh  ind         110         119         away     1.5
The whos_favored column is always either “home” or “away” and spread is always a positive number. Notice that season is encoded as the year the season ends, e.g. 2007-08 becomes 2008.
We’re interested first in real-world home court advantage according to the final score. Create a new column to hold this data and later we’ll calculate each season’s average.
df.loc[:, 'home_score_margin'] = df['score_home'] - df['score_away']
We’re also interested in the expected advantage according to the point spread. Just like before, we want a positive value to indicate the home team is favored and negative to be an away favorite.
df.loc[:, 'home_favored_by'] = df.apply(lambda row: row['spread'] if row['whos_favored'] == "home" else row['spread'] * -1, axis=1)
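The row-wise apply works fine on a dataset this size, but the same sign flip can also be written without a Python-level loop. Here is a minimal sketch using numpy.where on a small hand-built frame. The name sample and its three rows are my own illustration (copied from the first three games printed earlier), not part of the original script.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first three games from the printed output.
sample = pd.DataFrame({
    "whos_favored": ["home", "home", "away"],
    "spread": [13.0, 1.0, 5.0],
})

# Keep the spread as-is when the home team is favored, negate it otherwise.
sample["home_favored_by"] = np.where(
    sample["whos_favored"] == "home",
    sample["spread"],
    -sample["spread"],
)

print(sample)
```

Either approach should give the same column. The vectorized form mostly pays off when the DataFrame gets large.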
I’m certainly not a pandas expert but I’ve learned that groupby, while intimidating at first, is a powerful method to have in your arsenal. A few years ago it would have been tempting to iterate through each season, filter the DataFrame, and calculate averages individually. But that’s less efficient and (more importantly to me) a lot more work.
It helps me to think about what I’m doing in plain English. We want to group the data into separate buckets according to the season column, so that’s the argument passed to groupby. We’re interested in the two recently created columns so those are referenced as a list. agg applies a function to each bucket of data.
df2 = df.groupby("season")[['home_score_margin', 'home_favored_by']].agg("mean")

print(df2.head())
Things become clearer when you look at the new DataFrame.
        home_score_margin  home_favored_by
season
2008             3.712766         3.476064
2009             3.309506         3.377567
2010             2.842226         3.323933
2011             3.194508         3.271930
2012             2.965549         3.167132
The DataFrame’s index is season and it holds yearly averages for each column. That’s everything we need. Next we can plot the data.
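As an aside, if groupby still feels like a black box, it can help to compare it against the filter-and-average loop on a toy frame. This is only a sketch with made-up numbers. The names toy, grouped, and manual are hypothetical and do not come from the real dataset.

```python
import pandas as pd

# Made-up games: two per season, just enough to check the arithmetic by hand.
toy = pd.DataFrame({
    "season": [2008, 2008, 2009, 2009],
    "home_score_margin": [9, -21, 4, 2],
    "home_favored_by": [13.0, 1.0, -5.0, 6.5],
})

cols = ["home_score_margin", "home_favored_by"]

# One call: bucket by season, then average each bucket.
grouped = toy.groupby("season")[cols].agg("mean")

# The long way: filter one season at a time and average it.
manual = {
    season: toy.loc[toy["season"] == season, cols].mean()
    for season in toy["season"].unique()
}

print(grouped)
print(manual[2008])
```

Both routes should land on the same averages. groupby simply does the bucketing for you.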
2. Plot the data.
I’ll use a custom Matplotlib style I created to emulate FiveThirtyEight. It won’t actually look like a FiveThirtyEight plot because it will use NBA-themed colors, but it provides a good blank slate that’s less off-putting than default Matplotlib.
Create an Axes instance and pass the appropriate df2 columns to scatter. Remember that x-axis data, season, is the DataFrame’s index.
import matplotlib.pyplot as plt

plt.style.use("wollen_538.mplstyle")

fig, ax = plt.subplots()

ax.scatter(df2.index, df2['home_score_margin'],
           color="#DB132E", marker="h", s=110,
           edgecolor="#555", linewidth=1.0, label="Score Margin")

ax.scatter(df2.index, df2['home_favored_by'],
           color="#00418D", marker="D", s=60,
           edgecolor="#555", linewidth=1.0, label="Favored By")
We could hard-code ticks and window limits and it would require fewer lines of code, but I generally try to avoid it. It will be easier to reuse this script in a year or two when I return with new data.
NBA seasons span two calendar years so let’s communicate that along the x-axis. That means labels take up more space so let’s also rotate them 60 degrees.
x_ticks = range(df2.index.min(), df2.index.max() + 1)
ax.set_xticks(x_ticks, labels=[f"{n - 1}-{n - 2000:02}" for n in x_ticks])
plt.setp(ax.xaxis.get_majorticklabels(), rotation=60, ha="right", rotation_mode="anchor")

x_tick_range = x_ticks[-1] - x_ticks[0]
ax.set_xlim(x_ticks[0] - x_tick_range * 0.03, x_ticks[-1] + x_tick_range * 0.02)
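The label f-string is compact, so here it is on its own. This sketch uses a hypothetical three-season range instead of the DataFrame index. The :02 format spec pads the two-digit year with a leading zero.

```python
# Hypothetical range of season-ending years, standing in for df2.index.
x_ticks = range(2008, 2011)

# 2008 -> "2007-08": previous calendar year, then the zero-padded ending year.
labels = [f"{n - 1}-{n - 2000:02}" for n in x_ticks]

print(labels)  # ['2007-08', '2008-09', '2009-10']
```

Note that the n - 2000 trick assumes every season ends in 2000 or later, which holds for this dataset.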
Identify the bottom and top y-ticks using a while loop. We can staple the two columns together with concat to make sure we consider the overall minimum and maximum values.
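from numpy import arange  # needed for the half-point tick spacing below

# Step down from 10.0 until we reach the first half-point at or below the overall minimum.
bottom_y_tick = 10.0
while bottom_y_tick > pd.concat([df2['home_score_margin'], df2['home_favored_by']]).min():
    bottom_y_tick -= 0.5

# Step up from 0.0 until we reach the first half-point at or above the overall maximum.
top_y_tick = 0.0
while top_y_tick < pd.concat([df2['home_score_margin'], df2['home_favored_by']]).max():
    top_y_tick += 0.5

y_ticks = arange(bottom_y_tick, top_y_tick + 0.5, 0.5)
ax.set_yticks(y_ticks)

y_tick_range = y_ticks[-1] - y_ticks[0]
ax.set_ylim(y_ticks[0] - y_tick_range * 0.03, y_ticks[-1] + y_tick_range * 0.005)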
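The while loops assume every value sits somewhere between 0 and 10, which is true for this data. If that ever stops being the case, one alternative is to round directly with floor and ceil. This is a sketch of that idea, not what the script above does. The helper name half_point_bounds and the sample inputs are hypothetical.

```python
import math


def half_point_bounds(lowest, highest, step=0.5):
    """Return the nearest multiples of `step` that enclose [lowest, highest]."""
    bottom = math.floor(lowest / step) * step
    top = math.ceil(highest / step) * step
    return bottom, top


# Made-up extremes standing in for the min and max of the two columns.
print(half_point_bounds(1.62, 3.71))  # (1.5, 4.0)
```

For values inside the 0 to 10 range this should match the loop version, and it also handles negative margins without changing the starting points.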
Finally, create a legend in the upper-right corner, set plot labels, and save the figure.
ax.legend(loc="upper right")
ax.set_ylabel("Points")
ax.set_title("NBA • Home Court Advantage")

plt.savefig("nba_hca.png", dpi=200)
3. The output.
To answer the original question, NBA home court advantage is worth about 2 to 2.5 points.
What’s interesting to me is how clearly the advantage has trended down over the past 17 years. 2019-20 and 2020-21 were affected by the COVID-19 “bubble” and reduced crowd sizes. But even if you throw out those seasons, home court has lost a full point of value.
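If you want a number to go with the eyeball test, a straight-line fit gives a rough points-per-season slope. The sketch below uses numpy.polyfit on only the five home_score_margin averages printed by df2.head() earlier, so it says nothing about the full 17 seasons. To get the real figure you would pass df2.index and the full column instead.

```python
import numpy as np

# First five seasons only, copied from the df2.head() output.
seasons = np.array([2008, 2009, 2010, 2011, 2012])
margins = np.array([3.712766, 3.309506, 2.842226, 3.194508, 2.965549])

# Shift the x-values to start at zero to keep the fit well conditioned.
slope, intercept = np.polyfit(seasons - seasons[0], margins, 1)

# A negative slope means home margin shrank over these seasons (roughly -0.16 here).
print(f"{slope:.3f} points per season")
```

A linear fit is a blunt tool for data this noisy, so treat the slope as a summary of direction rather than a forecast.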
I don’t think there’s any clear answer as to why this happened. Are home crowds really less rowdy than they were 20 years ago? I’m sure you could find grumpy fans who insist that people are too busy playing on their phones to be loud. I would look more toward innovation in travel methods. Teams have better optimized routines that help them arrive healthy and ready to perform. In addition, salaries have grown so players are more incentivized to take those routines seriously.
Still, home court will always have some positive value. We’ll have to circle back in a few years to see where the trend has leveled off.
Download the Matplotlib style.
Full code:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import arange


df = pd.read_csv("nba_2008-2024.csv")

print(df[['season', 'date', 'away', 'home', 'score_away', 'score_home', 'whos_favored', 'spread']].head())

df.loc[:, 'home_score_margin'] = df['score_home'] - df['score_away']

df.loc[:, 'home_favored_by'] = df.apply(lambda row: row['spread'] if row['whos_favored'] == "home" else row['spread'] * -1, axis=1)

df2 = df.groupby("season")[['home_score_margin', 'home_favored_by']].agg("mean")

print(df2.head())

plt.style.use("wollen_538.mplstyle")

fig, ax = plt.subplots()

ax.scatter(df2.index, df2['home_score_margin'],
           color="#DB132E", marker="h", s=110,
           edgecolor="#555", linewidth=1.0, label="Score Margin")

ax.scatter(df2.index, df2['home_favored_by'],
           color="#00418D", marker="D", s=60,
           edgecolor="#555", linewidth=1.0, label="Favored By")

x_ticks = range(df2.index.min(), df2.index.max() + 1)
ax.set_xticks(x_ticks, labels=[f"{n - 1}-{n - 2000:02}" for n in x_ticks])
plt.setp(ax.xaxis.get_majorticklabels(), rotation=60, ha="right", rotation_mode="anchor")

x_tick_range = x_ticks[-1] - x_ticks[0]
ax.set_xlim(x_ticks[0] - x_tick_range * 0.03, x_ticks[-1] + x_tick_range * 0.02)

bottom_y_tick = 10.0
while bottom_y_tick > pd.concat([df2['home_score_margin'], df2['home_favored_by']]).min():
    bottom_y_tick -= 0.5

top_y_tick = 0.0
while top_y_tick < pd.concat([df2['home_score_margin'], df2['home_favored_by']]).max():
    top_y_tick += 0.5

y_ticks = arange(bottom_y_tick, top_y_tick + 0.5, 0.5)
ax.set_yticks(y_ticks)

y_tick_range = y_ticks[-1] - y_ticks[0]
ax.set_ylim(y_ticks[0] - y_tick_range * 0.03, y_ticks[-1] + y_tick_range * 0.005)

ax.legend(loc="upper right")
ax.set_ylabel("Points")
ax.set_title("NBA • Home Court Advantage")

plt.savefig("nba_hca.png", dpi=200)