
Is Christmas music becoming more popular?

I’m not one of those people who make hating Christmas music an important part of their personality, but I have to be in the right mood, and there quickly comes a time when I’ve had enough. I have a suspicion that Christmas music has crept further into mainstream culture. Let’s see what the data can tell us.

I found a great Christmas music dataset on Kaggle that merges a Billboard Hot 100 dataset with the Wikipedia page for popular Christmas singles. We can use it to visualize Christmas songs’ chart presence over time. Are they becoming more popular? The catch is that this dataset only covers 1958 through 2017, so we’ll miss the past few years, but it should be sufficient for our purposes.


1. Prepare the data.

Start by reading the dataset with pandas.

import pandas as pd

df = pd.read_csv("christmas_billboard_data.csv")

A full list of columns is below:

url
weekid
week_position
song
performer
songid
instance
previous_week_position
peak_position
weeks_on_chart
year
month
day
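
Before going further, it can be worth confirming that the columns we rely on later are really there. Here is a minimal sketch of that check. It reads a tiny made-up CSV from memory in place of christmas_billboard_data.csv, and the helper name missing_columns is my own invention for illustration, not part of pandas or the dataset.

```python
import io

import pandas as pd

# Made-up two-row sample standing in for christmas_billboard_data.csv.
sample_csv = io.StringIO(
    "song,performer,week_position,year\n"
    "Example Song A,Example Artist,12,1960\n"
    "Example Song B,Example Artist,87,1961\n"
)
sample = pd.read_csv(sample_csv)


def missing_columns(frame, required):
    """Return the required column names that are absent from frame."""
    return [col for col in required if col not in frame.columns]


# An empty list means every required column is present.
print(missing_columns(sample, ["week_position", "year"]))
# The made-up sample has no weekid column, so it is reported as missing.
print(missing_columns(sample, ["week_position", "year", "weekid"]))
```

With the real file you would pass df instead of sample and list whichever columns your analysis needs.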

Our goal is to measure the presence of Christmas songs over time. The dataset creator already did the hard work of removing non-Christmas music, so we won’t have to worry about filtering. Each row is a chart entry for a Christmas song, so we just need to aggregate rows.

The column we’re most interested in is week_position. This is a song’s ranking (1-100). I think grouping the data by year will be most appropriate to see trends over 60 years. The holiday season comes once a year, after all, so it makes sense to use the same period.

A lower ranking indicates higher popularity, so we should invert week_position by subtracting it from 101. Let’s call the new column popularity_index.

df.loc[:, "popularity_index"] = 101 - df.week_position

For example, topping the Hot 100 at #1 would be worth 100 points. Sneaking in at #100 would be worth just a point.
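
To see the inversion in isolation, here is a small sketch on a toy dataframe. The three chart positions are made up for illustration and are not rows from the real dataset.

```python
import pandas as pd

# Made-up chart positions: the top spot, the middle, and the bottom spot.
toy = pd.DataFrame({"week_position": [1, 50, 100]})

# Invert the ranking so that a better chart position earns more points.
toy["popularity_index"] = 101 - toy["week_position"]

print(toy)
```

The best possible position maps to 100 points and the worst to 1, so no chart entry ever contributes zero.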

Obviously Popularity Index uses an arbitrary scale. I think we should take a cue from Google and the way they handle Google Trends data. Regardless of the absolute popularity of any search term, Google transforms the data to use a relative 0-100 scale. Every data point is plotted as a proportion of the series’ maximum value. In other words, peak popularity is always 100 and the rest of the data follows accordingly.

As an example, here’s the search term “bitcoin” over the last five years:

 

Notice the y-axis goes from 0 to 100. This is always the case with Google Trends plots. We can normalize Popularity Index the same way.
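
As a rough sketch of that idea, the helper below rescales any series so that its peak becomes 100. The name scale_to_peak and the input numbers are my own assumptions for illustration. This is not how Google actually computes Trends data, only the "proportion of the maximum" step described above.

```python
import pandas as pd


def scale_to_peak(values):
    """Rescale a Series so that its largest value becomes 100 (hypothetical helper)."""
    return values / values.max() * 100


# Arbitrary made-up numbers on an arbitrary scale.
raw = pd.Series([50, 100, 200, 150])
scaled = scale_to_peak(raw)

print(scaled.tolist())
```

Note that only the maximum is pinned to 100. The smallest value lands wherever its proportion of the peak puts it, which is not necessarily 0.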

First use a pandas groupby to bundle together each year, then use a sum aggregate function on the popularity_index column.

df2 = df.groupby("year")["popularity_index"].sum().reset_index()

The new dataframe looks like this:

   year  popularity_index
0  1958               295
1  1959               442
2  1960              1332
3  1961              1459
4  1962              1662

But what do 295 or 442 mean? They represent the total presence of Christmas songs in a given year, but those values would look arbitrary on a plot.

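Now we should normalize the data using a 0-100 scale. We’ll call it a Popularity Percentile, although strictly speaking it is a percentage of the peak year rather than a statistical percentile. Each value is represented as a proportion of the largest value.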

df2.loc[:, "popularity_percentile"] = df2.popularity_index / df2.popularity_index.max() * 100
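
To make the arithmetic concrete, here is a sketch that applies the same formula to just the five yearly totals printed above. Keep in mind the assumption baked in: with only 1958 to 1962 in the frame, the maximum is 1662, so these percentages are relative to 1962 and will not match the values computed from the full dataset.

```python
import pandas as pd

# The five yearly totals shown earlier (1958 to 1962 only, not the full dataset).
head = pd.DataFrame({
    "year": [1958, 1959, 1960, 1961, 1962],
    "popularity_index": [295, 442, 1332, 1459, 1662],
})

# Same formula as the post: each value as a percentage of the largest value.
peak = head["popularity_index"].max()
head["popularity_percentile"] = head["popularity_index"] / peak * 100

print(head.round(1))
```

Whatever the inputs, the peak year always comes out at exactly 100 and every other year falls somewhere below it.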

2. Plot the data.

I created a Christmas-themed Matplotlib style for the plot. It will be linked at the bottom of this post.

A few quick notes about the Matplotlib code:

  • We can pass dataframe columns directly into plotting methods.
  • Use ax.set to cover several customizations with a single method call.
  • We can add small inset images so the plot will be overflowing with Christmas cheer. How you wield such power is up to you and your feelings about Christmas music. Be sure to set annotation_clip=False so Matplotlib can draw outside the axes limits.
  • wollen_christmas.mplstyle uses a particular typeface I found online. I don’t want to distribute it but I’ll link it at the bottom of this post.
  • It’s good to increase dpi when saving raster images, especially those with small inset images.

plt.style.use("wollen_christmas.mplstyle")

fig, ax = plt.subplots()

ax.bar(df2["year"], df2["popularity_percentile"], width=0.7)

y_ticks = range(0, 110, 10)

ax.set(xticks=range(1955, 2025, 5), xlim=(1954, 2021),
       yticks=y_ticks, ylim=(-1, 101), yticklabels=[f"{n}%" for n in y_ticks], ylabel="Annual  Popularity",
       title="Christmas Song Popularity  |  Billboard Hot 100  |  1958–2017")

inset_image_positions = [(1961.5, 105),
                         (1964.0, 105),
                         (2011.0, 105),
                         (2013.5, 105)]

for pos in inset_image_positions:
    ab = AnnotationBbox(OffsetImage(plt.imread("snowflake.png"), zoom=0.045),
                        pos, frameon=False, annotation_clip=False)
    ax.add_artist(ab)

plt.savefig("christmas_song_popularity.png", dpi=150)

The output:

To answer the original question: Yes, Christmas music has been gaining popularity in recent years—at least through 2017. It suffered through a 40-year arctic slumber but it has since regained some momentum.

My theory, for what it’s worth, is that Christmas music was genuinely more popular in the mid-20th century. There was less music in general to crowd it out. In the 21st century, significantly more music is being published. Technology makes it easier than ever for artists to get their songs onto listeners’ playlists, and audiences sort into smaller and smaller niches. Now the industry is so diverse that it takes music with exceptionally broad appeal, like Christmas music, to gain traction on the Billboard Hot 100. Something like Mariah Carey’s infamously irritating All I Want for Christmas Is You!

Or maybe modern Americans just love Christmas music as much as they do comic book movies. It’s hard to say.

Merry Christmas to those who celebrate! Happy Holidays to all! And to all a good night.


Santa’s Sleigh font.

Download the data.

Full Code:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox


df = pd.read_csv("christmas_billboard_data.csv")

df.loc[:, "popularity_index"] = 101 - df.week_position

df2 = df.groupby("year")["popularity_index"].sum().reset_index()

df2.loc[:, "popularity_percentile"] = df2.popularity_index / df2.popularity_index.max() * 100

plt.style.use("wollen_christmas.mplstyle")

fig, ax = plt.subplots()

ax.bar(df2["year"], df2["popularity_percentile"], width=0.7)

y_ticks = range(0, 110, 10)

ax.set(xticks=range(1955, 2025, 5), xlim=(1954, 2021),
       yticks=y_ticks, ylim=(-1, 101), yticklabels=[f"{n}%" for n in y_ticks], ylabel="Annual  Popularity",
       title="Christmas Song Popularity  |  Billboard Hot 100  |  1958–2017")

inset_image_positions = [(1961.5, 105),
                         (1964.0, 105),
                         (2011.0, 105),
                         (2013.5, 105)]

for pos in inset_image_positions:
    ab = AnnotationBbox(OffsetImage(plt.imread("snowflake.png"), zoom=0.045),
                        pos, frameon=False, annotation_clip=False)
    ax.add_artist(ab)

plt.savefig("christmas_song_popularity.png", dpi=150)