
How much does a Best Picture award cost?

This year Oppenheimer took home the Academy Awards’ Oscar for Best Picture. It was a somewhat unusual winner because the movie also performed exceptionally well at the box office. In recent years, the Best Picture category has favored more niche films with narrower appeal and, most relevant to this post, lower budgets.

It’s an interesting narrative, but what does the data show? I did a quick scrape of the Best Picture Wikipedia page for title and budget information and managed to nail down the budget for all but one of the 96 winners: Going My Way (1945). I’ll link the dataset at the bottom of this post.

Of course, the data is in nominal terms, so we’ll have to adjust for inflation. To adjust prices we can use the U.S. Bureau of Labor Statistics’ Consumer Price Index (CPI). CPI tracks the average change over time in the prices consumers pay for a basket of goods and services, which tells us how much a dollar was worth at any point in time. We can use it to calculate that, for example, The Godfather’s $6.6 million budget in 1972 would be equivalent to nearly $50 million in 2023 dollars. Let’s adjust all Best Picture budgets for inflation and see how they compare to Oppenheimer.
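Under the hood, the adjustment is a ratio: multiply the nominal amount by the CPI for the target year divided by the CPI for the original year. Here’s a minimal sketch of that arithmetic for The Godfather, assuming approximate CPI-U annual averages of 41.8 for 1972 and 304.702 for 2023. Both figures are hard-coded for illustration, so check them against the BLS tables before reusing them.

```python
# Approximate CPI-U annual averages (assumed values, hard-coded for
# illustration; verify against BLS data before reuse).
CPI_1972 = 41.8
CPI_2023 = 304.702


def adjust_for_inflation(amount: float, cpi_from: float, cpi_to: float) -> float:
    """Scale a nominal dollar amount by the ratio of two CPI values."""
    return amount * (cpi_to / cpi_from)


godfather_budget_1972 = 6_600_000
godfather_budget_2023 = adjust_for_inflation(godfather_budget_1972, CPI_1972, CPI_2023)
print(f"${godfather_budget_2023 / 1e6:.1f}M")  # prints "$48.1M"
```

The ratio CPI_2023 / CPI_1972 is roughly 7.3, which is where "nearly $50 million" comes from.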


1. Prepare the data.

Start by installing the cpi module (pip install cpi). Then read the dataset into a pandas DataFrame.

import pandas as pd
from cpi import inflate
import matplotlib.pyplot as plt

df = pd.read_csv("best_picture_data.csv")

Columns include:

  • ceremony — Year of the Oscars Ceremony.
  • film — Title of the movie.
  • url — The Wikipedia URL.
  • budget — The movie’s nominal budget (not yet adjusted for inflation).
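For readers who want to poke at the file without pandas, here’s a plain-Python sketch of the same schema. It assumes the budget column holds plain numbers and that the one unknown budget is an empty cell; the Winner class and load_winners() are hypothetical names, and the sample rows are placeholders, not entries from the real dataset. pandas.read_csv() treats an empty cell the same way by default, reading it as NaN.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional, TextIO


@dataclass
class Winner:
    ceremony: int            # year of the Oscars ceremony
    film: str                # title of the movie
    url: str                 # Wikipedia URL
    budget: Optional[float]  # nominal budget in dollars; None if unknown


def load_winners(f: TextIO) -> list[Winner]:
    """Parse the CSV into typed records (hypothetical helper)."""
    rows = []
    for rec in csv.DictReader(f):
        raw = rec["budget"].strip()
        rows.append(Winner(
            ceremony=int(rec["ceremony"]),
            film=rec["film"],
            url=rec["url"],
            budget=float(raw) if raw else None,  # empty cell -> missing budget
        ))
    return rows


# Hypothetical rows for illustration only (not taken from the real dataset).
sample = io.StringIO(
    "ceremony,film,url,budget\n"
    "2001,Film A,https://example.org/a,1000000\n"
    "2002,Film B,https://example.org/b,\n"
)
winners = load_winners(sample)
missing = [w.film for w in winners if w.budget is None]
print(missing)  # ['Film B']
```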

Using cpi.inflate() is very straightforward. We can apply it row by row to create a new column, passing three arguments:

  1. Budget.
  2. Year of the ceremony minus one, because generally the Oscars consider movies from the previous year.
  3. The target year, 2023. We want to inflate dollars to match Oppenheimer’s release year.

df['budget_cpi'] = df.apply(lambda row: inflate(row['budget'], row['ceremony'] - 1, 2023), axis=1)
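The row-wise step boils down to two decisions: shift the ceremony year back by one to get the release year, and leave a missing budget missing rather than inflating it. Here’s a minimal sketch of that logic, with a hypothetical CPI_BY_YEAR table standing in for the cpi package (approximate CPI-U annual averages, hard-coded for illustration) and a hypothetical inflate_row() helper:

```python
from typing import Optional

# Hypothetical stand-in for the cpi package's data: approximate CPI-U
# annual averages, hard-coded for illustration only (verify before reuse).
CPI_BY_YEAR = {1972: 41.8, 2023: 304.702}
TARGET_YEAR = 2023


def inflate_row(budget: Optional[float], ceremony: int) -> Optional[float]:
    """Convert a nominal budget to TARGET_YEAR dollars, or None if unknown."""
    if budget is None:
        return None  # leave the missing budget missing instead of guessing
    release_year = ceremony - 1  # films generally compete the year after release
    return budget * CPI_BY_YEAR[TARGET_YEAR] / CPI_BY_YEAR[release_year]


# (ceremony, nominal budget) pairs: The Godfather won at the 1973 ceremony;
# the second pair is a hypothetical row with no known budget.
rows = [(1973, 6_600_000), (1973, None)]
adjusted = [inflate_row(budget, ceremony) for ceremony, budget in rows]
print([None if a is None else round(a / 1e6, 1) for a in adjusted])  # [48.1, None]
```

Because only the release year varies, you could also compute one factor per year and multiply the whole column at once; with about a hundred rows, though, the per-row apply() should be plenty fast.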

At this point we have everything we need to create a bar graph. But I’d also like to plot a five-year moving average to help smooth out the data.

pandas Series objects (i.e. DataFrame columns) have a rolling() method; chaining .mean() onto it produces a moving average. The window parameter will be 5 because at each point we want the average of the most recent five budgets. Since one row is missing data, a strict five-value window would return NaN for every window that includes that row, leaving a gap in the line. Setting min_periods to 4 lets pandas average just four values when necessary, which avoids the discontinuity.

df['budget_cpi_5year'] = df['budget_cpi'].rolling(window=5, min_periods=4).mean()
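To see why min_periods matters, here is a plain-Python sketch of a trailing moving average that skips missing values. It is intended to mirror how rolling(window, min_periods).mean() treats NaN, not to reproduce pandas’ implementation; rolling_mean() is a hypothetical helper and the numbers are placeholders.

```python
from typing import Optional, Sequence


def rolling_mean(values: Sequence[Optional[float]], window: int,
                 min_periods: int) -> list[Optional[float]]:
    """Trailing moving average that skips missing values (None).

    Each output averages the non-missing values among the last `window`
    positions, or is None if fewer than `min_periods` of them are present.
    """
    out: list[Optional[float]] = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        present = [v for v in recent if v is not None]
        out.append(sum(present) / len(present) if len(present) >= min_periods else None)
    return out


# Placeholder numbers with one missing value, like the dataset's gap.
series = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0]
print(rolling_mean(series, window=5, min_periods=4))
# [None, None, None, None, 3.0, 4.25, 5.5]
print(rolling_mean(series, window=5, min_periods=5))
# [None, None, None, None, None, None, None]

# Without gaps, min_periods=4 yields a first average at the fourth value.
full = rolling_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=5, min_periods=4)
print(full)  # [None, None, None, 2.5, 3.0]
```

With min_periods=5, every window that touches the gap comes back empty; with 4, the line continues. The last example also shows the side effect of a lower min_periods: an average appears one value before a full window exists.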

2. Plot the data.

I’ll use a custom mplstyle for this plot. It will be linked along with the dataset at the bottom of this post.

plt.style.use("oscars.mplstyle")
fig, ax = plt.subplots()

Plot the relevant columns—budget as a bar plot and moving average as a line.

One small detail in the moving average plot should be corrected: slice it to start with the fifth value ([4:]). Because we lowered min_periods above, pandas begins calculating a moving average with the fourth value, and it wouldn’t make sense for a five-year average to appear before the fifth value is known.

Pass zorder arguments to set the layer order so the line is drawn above the bars.

ax.bar(df['ceremony'], df['budget_cpi'], zorder=1)

ax.plot(df['ceremony'].iloc[4:], df['budget_cpi_5year'].iloc[4:], color="#DDDDDD", label="5-Year Average", zorder=2)

Next, configure the x-ticks and y-ticks. I like to define the axis limits explicitly so I can reuse those variables later to place text on the plot.

x_ticks = range(1925, 2030, 5)
ax.set_xticks(x_ticks)
x_left, x_right = 1923.5, 2026.5
ax.set_xlim(x_left, x_right)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=60, ha="right", rotation_mode="anchor")

y_ticks = range(0, 450000000, 50000000)
ax.set_yticks(y_ticks)
y_tick_labels = [f"${n / 1000000:.0f}M" for n in y_ticks]
ax.set_yticklabels(y_tick_labels)
y_bottom, y_top = 0, y_ticks[-1] * 1.01
ax.set_ylim(y_bottom, y_top)
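A quick check of what the tick code produces: range() excludes its stop value, so 450,000,000 is never a tick and the top label is $400M, while the 1% headroom in y_top keeps the top tick from sitting exactly on the upper edge of the plot. The sketch below only recomputes the values from the lines above.

```python
# Recompute the tick values, labels, and limits used above.
x_ticks = range(1925, 2030, 5)
print(x_ticks[0], x_ticks[-1])  # 1925 2025  (2030 is excluded)

y_ticks = range(0, 450_000_000, 50_000_000)
y_tick_labels = [f"${n / 1_000_000:.0f}M" for n in y_ticks]
y_top = y_ticks[-1] * 1.01  # 1% headroom above the top tick

print(y_tick_labels)
# ['$0M', '$50M', '$100M', '$150M', '$200M', '$250M', '$300M', '$350M', '$400M']
print(f"{y_top:,.0f}")  # 404,000,000
```

As an aside, matplotlib 3.5 and later also accept the labels directly, as in ax.set_yticks(y_ticks, labels=y_tick_labels).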

Create a legend to communicate what the moving average represents.

Set a title and any necessary axis labels. I usually don’t set an x-axis label when working with dates. Since the y-axis label is written sideways, I sometimes like to put an extra space between words to make it more readable.

It’s also good to note exactly how we adjusted for inflation, since CPI isn’t the only measure of inflation (the PCE price index is another common one). Notice how we use previously defined variables (e.g. x_right and x_span) to locate the text.

ax.set_ylabel("Budget  (2023 Dollars)")

ax.set_title("Academy Awards  •  Best Picture Winner  •  Budget  •  Inflation Adjusted")

ax.legend(loc="upper left", labelcolor="#EEEEEE")

x_span = x_right - x_left
y_span = y_top - y_bottom
ax.text(x_right - x_span * 0.005, y_ticks[-1] - y_span * 0.005,
        "Values adjusted using CPI inflation data.",
        size=9, ha="right", va="top")
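The note’s position is plain data-coordinate arithmetic: start at the right limit and the top tick, then step inward by 0.5% of each axis span, so the offset scales with the units on each axis. A sketch that recomputes it from the limits defined earlier:

```python
# Axis limits as defined earlier in the script.
x_left, x_right = 1923.5, 2026.5
y_ticks = range(0, 450_000_000, 50_000_000)
y_bottom, y_top = 0, y_ticks[-1] * 1.01

x_span = x_right - x_left   # 103.0 years
y_span = y_top - y_bottom   # about 404 million dollars

# Step 0.5% of each span inward from the top-right anchor point.
note_x = x_right - x_span * 0.005
note_y = y_ticks[-1] - y_span * 0.005
print(round(note_x, 3), round(note_y / 1e6, 2))  # 2025.985 397.98
```

An alternative is to skip the arithmetic and place text in axes-fraction coordinates, e.g. ax.text(0.995, 0.995, "Values adjusted using CPI inflation data.", transform=ax.transAxes, ha="right", va="top"), where (1, 1) is the top-right corner of the axes. The position would differ slightly, because the original anchors to the top tick rather than the top of the axes.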

I think when viewers see this plot they will immediately try to guess which bars represent which movies, so annotating the most noteworthy films makes the plot more interesting. I picked a few outliers, like Gone with the Wind (1940), and winners that signaled a changing trend, like Forrest Gump (1995).

highlight_coords is a list of tuples, each containing a film title, the x- and y-positions of its label (y in millions of dollars), and the label’s horizontal alignment. I use a nested for-loop to iterate through both this list and the DataFrame. At each match, I call annotate() to place the text and an arrow pointing to the top of the film’s bar. You might call this plot excessively annotated, and you wouldn’t necessarily be wrong. I still think it’s fun to see all the movie titles.

highlight_coords = [("Gone with the Wind", 1938, 125, "center"),
                    ("Ben-Hur", 1959, 195, "right"),
                    ("Lawrence of Arabia", 1963, 230, "center"),
                    ("My Fair Lady", 1965, 200, "left"),
                    ("Forrest Gump", 1993, 135, "right"),
                    ("Braveheart", 1995, 175, "right"),
                    ("Titanic", 1996, 388, "right"),
                    ("Gladiator", 2000, 240, "left"),
                    ("LOTR: Return of the King", 2003, 205, "left"),
                    ("The Departed", 2008, 170, "left"),
                    ("Argo", 2013, 95, "center"),
                    ("Oppenheimer", 2024, 140, "right")]

for text, x_pos, y_pos, align in highlight_coords:
    for r, row in df.iterrows():
        if row['film'] == text:
            ax.annotate(row['film'], ha=align, size=10,
                        xy=(row['ceremony'], row['budget_cpi'] + y_span * 0.008), xytext=(x_pos, y_pos * 1e6),
                        arrowprops={"arrowstyle": "wedge", "color": "#EEEEEE"})
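One quirk of the nested loop: a title in highlight_coords that doesn’t exactly match the film column (an abbreviation, say) is skipped without any warning. A dictionary keyed by title turns each lookup into a single step and surfaces the misses. Here’s a plain-Python sketch; Highlight and match_highlights() are hypothetical names, and the rows are placeholders rather than real data. In the script, each matched pair would feed ax.annotate() as before, and with pandas, df.set_index('film') could play the role of by_title.

```python
from typing import NamedTuple


class Highlight(NamedTuple):
    title: str
    x_pos: float   # label x position (ceremony-year axis)
    y_pos: float   # label y position, in millions of dollars
    align: str     # horizontal alignment of the label text


def match_highlights(rows: list[dict], highlights: list[Highlight]):
    """Pair each highlight with its row; report titles that match nothing."""
    by_title = {row["film"]: row for row in rows}  # one pass over the data
    matched, unmatched = [], []
    for h in highlights:
        row = by_title.get(h.title)
        if row is None:
            unmatched.append(h.title)  # the nested loop would skip these silently
        else:
            matched.append((h, row))
    return matched, unmatched


# Hypothetical rows and labels for illustration only.
rows = [{"film": "Film A", "ceremony": 2001, "budget_cpi": 1.0e8}]
highlights = [Highlight("Film A", 1999, 150, "left"),
              Highlight("Film A (abbreviated)", 2003, 120, "right")]
matched, unmatched = match_highlights(rows, highlights)
print(len(matched), unmatched)  # 1 ['Film A (abbreviated)']
```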

Finally, save the figure.

plt.savefig("best_picture_budget.png", dpi=150)

3. The output.

The data confirms that Oppenheimer was a major break from the norm. Its $100 million budget doesn’t reach the heights of the 1990s and 2000s, but it is the most expensive Best Picture winner in 17 years. It was a rare intersection of blockbuster filmmaking and critical praise.
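A claim like "the most expensive winner in 17 years" boils down to scanning backward from a film until an earlier winner with a larger adjusted budget turns up. Here’s a minimal sketch of that check; years_since_costlier() is a hypothetical helper and the values are placeholders, not the real data. Applied to the DataFrame, you would pass the ceremony and budget_cpi columns (e.g. via .tolist()); a NaN budget never compares as larger, so the missing row is simply passed over.

```python
from typing import Optional, Sequence


def years_since_costlier(years: Sequence[int], budgets: Sequence[Optional[float]],
                         i: int) -> Optional[int]:
    """Years back to the most recent earlier winner with a bigger budget.

    Returns None if no earlier winner (with a known budget) cost more.
    """
    for j in range(i - 1, -1, -1):
        if budgets[j] is not None and budgets[j] > budgets[i]:
            return years[i] - years[j]
    return None


# Placeholder ceremony years and inflation-adjusted budgets (not real data).
years = [2000, 2001, 2002, 2003, 2004]
budgets = [150.0, 40.0, None, 30.0, 100.0]
print(years_since_costlier(years, budgets, 4))  # 4
```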

I’m not sure what to make of the peak centered around Titanic. It could reflect a tendency toward excess in the high-growth, post-Cold War, “end of history” 1990s. I’m tempted to read the 2008–2023 small-budget era as a post-Great-Recession attitude shift. Maybe Academy voters wanted to project frugality and distance themselves from broadly popular spectacles like The Departed. Or maybe a new generation of voters who value more “serious” films replaced an older one. I would guess it’s a combination of factors, given how strong the trend has been.

We’ll have to wait and see if Oppenheimer ushers in a new era of higher-budget Best Picture success, or if it remains an exception to the rule.


Download the data.

Full code:

import pandas as pd
from cpi import inflate
import matplotlib.pyplot as plt


df = pd.read_csv("best_picture_data.csv")

df['budget_cpi'] = df.apply(lambda row: inflate(row['budget'], row['ceremony'] - 1, 2023), axis=1)

df['budget_cpi_5year'] = df['budget_cpi'].rolling(window=5, min_periods=4).mean()

plt.style.use("oscars.mplstyle")
fig, ax = plt.subplots()

ax.bar(df['ceremony'], df['budget_cpi'], zorder=1)
ax.plot(df['ceremony'].iloc[4:], df['budget_cpi_5year'].iloc[4:], color="#DDDDDD", label="5-Year Average", zorder=2)

x_ticks = range(1925, 2030, 5)
ax.set_xticks(x_ticks)
x_left, x_right = 1923.5, 2026.5
ax.set_xlim(x_left, x_right)
plt.setp(ax.xaxis.get_majorticklabels(), rotation=60, ha="right", rotation_mode="anchor")

y_ticks = range(0, 450000000, 50000000)
ax.set_yticks(y_ticks)
y_tick_labels = [f"${n / 1000000:.0f}M" for n in y_ticks]
ax.set_yticklabels(y_tick_labels)
y_bottom, y_top = 0, y_ticks[-1] * 1.01
ax.set_ylim(y_bottom, y_top)

ax.set_ylabel("Budget  (2023 Dollars)")

ax.set_title("Academy Awards  •  Best Picture Winner  •  Budget  •  Inflation Adjusted")

ax.legend(loc="upper left", labelcolor="#EEEEEE")

x_span = x_right - x_left
y_span = y_top - y_bottom
ax.text(x_right - x_span * 0.005, y_ticks[-1] - y_span * 0.005,
        "Values adjusted using CPI inflation data.",
        size=9, ha="right", va="top")

highlight_coords = [("Gone with the Wind", 1938, 125, "center"),
                    ("Ben-Hur", 1959, 195, "right"),
                    ("Lawrence of Arabia", 1963, 230, "center"),
                    ("My Fair Lady", 1965, 200, "left"),
                    ("Forrest Gump", 1993, 135, "right"),
                    ("Braveheart", 1995, 175, "right"),
                    ("Titanic", 1996, 388, "right"),
                    ("Gladiator", 2000, 240, "left"),
                    ("LOTR: Return of the King", 2003, 205, "left"),
                    ("The Departed", 2008, 170, "left"),
                    ("Argo", 2013, 95, "center"),
                    ("Oppenheimer", 2024, 140, "right")]

for text, x_pos, y_pos, align in highlight_coords:
    for r, row in df.iterrows():
        if row['film'] == text:
            ax.annotate(row['film'], ha=align, size=10,
                        xy=(row['ceremony'], row['budget_cpi'] + y_span * 0.008), xytext=(x_pos, y_pos * 1e6),
                        arrowprops={"arrowstyle": "wedge", "color": "#EEEEEE"})

plt.savefig("best_picture_budget.png", dpi=150)