{"id":968,"date":"2022-11-08T10:35:15","date_gmt":"2022-11-08T16:35:15","guid":{"rendered":"https:\/\/wollen.org\/blog\/?p=968"},"modified":"2025-03-04T12:20:20","modified_gmt":"2025-03-04T18:20:20","slug":"predicting-the-2022-midterm-elections-cook","status":"publish","type":"post","link":"https:\/\/wollen.org\/blog\/2022\/11\/predicting-the-2022-midterm-elections-cook\/","title":{"rendered":"Predicting the 2022 Midterm Elections with Cook Race Ratings"},"content":{"rendered":"<p>Today is Election Day in the United States and I thought it would be fun to dip my toes into election forecasting. I&#8217;ll build a very simple model based on race ratings published by <a href=\"https:\/\/www.cookpolitical.com\/ratings\/house-race-ratings\" target=\"_blank\" rel=\"noopener\">The Cook Political Report<\/a>. I&#8217;ll check how accurate Cook&#8217;s ratings have been in the past and use that information to predict the 2022 Congressional elections.<\/p>\n<p>We can take all the ratings and boil them down to a single number\u2014the number everyone cares about\u2014how many seats each party will win.<\/p>\n<hr \/>\n<h4>1. Check Cook&#8217;s Historical Accuracy.<\/h4>\n<p>I have a dataset (<a href=\"https:\/\/www.kaggle.com\/datasets\/cviaxmiwnptr\/us-house-cook-ratings-election-results-20022018\" target=\"_blank\" rel=\"noopener\">available on Kaggle<\/a>) that will make things easy for us. 
It doesn&#8217;t cover every rating, but there are more than 2,000, which should provide a reasonable estimate of Cook&#8217;s performance.<\/p>\n<p>Start by reading the dataset with <em>pandas<\/em> and dropping any irrelevant bits.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df = pd.read_csv(\"2002-2018_house_election_ratings_results.csv\")\r\n\r\ndf = df.dropna(subset=[\"cook_rating\"])\r\ndf = df.drop(df[~df.winner.isin([\"d\", \"r\"])].index)\r\ndf = df.drop(df[df.cook_rating == \"tossup\"].index)<\/pre>\n<p>Ratings are absent pre-2008, so we can drop those rows. There are a couple of races involving Independent candidates, but not enough to significantly affect the measurement, so we drop those too. We&#8217;ll also ignore any races classified as &#8220;tossup,&#8221; which we&#8217;ll later divvy up 50-50.<\/p>\n<p>With our dataset now trimmed to the relevant 2,427 rows, we&#8217;ll create a column to represent whether each prediction was <em>True<\/em> or <em>False<\/em>.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">def check_prediction(row):\r\n    confidence, prediction = row.cook_rating.split(\"-\")\r\n    if row.winner == prediction:\r\n        return True\r\n    else:\r\n        return False\r\n\r\n\r\ndf.loc[:, \"correct\"] = df.apply(check_prediction, axis=1)<\/pre>\n<p>The ratings are coded as strings, e.g. &#8220;likely-r&#8221;, &#8220;lean-d&#8221;, etc. We&#8217;ll use an <code>apply<\/code> and pass each row through a function. 
This is generally an inefficient way of doing things in <em>pandas<\/em>, but with such a small dataset, readability can outweigh the performance loss.<\/p>\n<p>Finally, check how often each confidence level (<em>solid<\/em>, <em>likely<\/em>, and <em>lean<\/em>) was correct.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">df_solid = df[df.cook_rating.isin([\"solid-d\", \"solid-r\"])]\r\nsolid_accuracy = df_solid[df_solid.correct].shape[0] \/ df_solid.shape[0]\r\n\r\ndf_likely = df[df.cook_rating.isin([\"likely-d\", \"likely-r\"])]\r\nlikely_accuracy = df_likely[df_likely.correct].shape[0] \/ df_likely.shape[0]\r\n\r\ndf_lean = df[df.cook_rating.isin([\"lean-d\", \"lean-r\"])]\r\nlean_accuracy = df_lean[df_lean.correct].shape[0] \/ df_lean.shape[0]<\/pre>\n<p>We find:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">Solid:  99.95%\r\nLikely: 98.47%\r\nLean:   92.35%<\/pre>\n<p>Cook seems to know what they&#8217;re doing! Of course, it helps to simply toss all the close races into a &#8220;tossup&#8221; bin, but they do deserve some credit.<\/p>\n<h4>2. Simulate the Election.<\/h4>\n<p>Now we can put these figures into action. Let&#8217;s boil the race ratings down to a single numerical prediction.<\/p>\n<p>We&#8217;ll do this by generating a random number between 0 and 1 to represent each individual race. For example, &#8220;lean&#8221; ratings are correct about 92% of the time. So if the associated random number is 0.57, i.e. below 0.92, we call it a correct prediction. 
On the other hand, about 8% of the time the number will land above the threshold and we call it an incorrect prediction, which matches the accuracy calculated above.<\/p>\n<p>Repeat this process 10,000 times, starting each simulation with the 36 tossup races split 18-18, and see how it shakes out.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">def get_random_race_result(cook_prediction, thresholds_dict):\r\n    level, party = cook_prediction.split(\"-\")\r\n    if random() &lt; thresholds_dict[level]:\r\n        return party\r\n    else:\r\n        return {\"r\": \"d\", \"d\": \"r\"}[party]\r\n\r\n\r\naccuracy_dict = {\"solid\": solid_accuracy,\r\n                 \"likely\": likely_accuracy,\r\n                 \"lean\": lean_accuracy}\r\n\r\ncook_final_predictions = {\"solid-d\": 159,\r\n                          \"likely-d\": 13,\r\n                          \"lean-d\": 15,\r\n                          \"lean-r\": 13,\r\n                          \"likely-r\": 11,\r\n                          \"solid-r\": 188}\r\n\r\nsimulated_output = []\r\n\r\nfor _ in range(10000):\r\n    house_seats = [\"d\"] * 18 + [\"r\"] * 18\r\n\r\n    for item in cook_final_predictions:\r\n        for _ in range(cook_final_predictions[item]):\r\n            house_seats.append(get_random_race_result(item, accuracy_dict))\r\n\r\n    simulated_output.append(house_seats.count(\"d\"))<\/pre>\n<p>I want to mention an important caveat here, one that has gotten election forecasting into trouble in the past. We assume that every race is a random, independent event. But in reality the errors tend to be correlated. For example, if Republicans exceed expectations in Ohio, they&#8217;re likely to exceed them in Virginia as well. 
I&#8217;m happy to ignore this complexity and present my humble blog model for entertainment purposes only.<\/p>\n<p>With 10,000 simulations in hand, let&#8217;s check the average result and call it a day.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">predicted_d_seats = mean(simulated_output)\r\npredicted_r_seats = 435 - predicted_d_seats<\/pre>\n<p>The output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">D seats: 204.8\r\nR seats: 230.2<\/pre>\n<p>So the average outcome, based on Cook&#8217;s historical accuracy, is Republicans winning roughly a 230-205 majority. We say nothing about the margin of error on these predictions and that&#8217;s okay.<\/p>\n<p>Repeating the same process for Senate elections (using <a href=\"https:\/\/www.kaggle.com\/datasets\/cviaxmiwnptr\/us-senate-cook-rating-election-results-19762018\" target=\"_blank\" rel=\"noopener\">this dataset<\/a>), the model finds Republicans most likely to win a 51-49 majority.<\/p>\n<hr \/>\n<div style=\"margin-left: 15%; margin-right: 15%;\"><strong>Update (December 7th, 2022):<\/strong> After the Georgia Senate runoff election, we now have final results for both chambers. Republicans won a 222-213 majority in the House, and Democrats won 51-49 in the Senate. So our model did pretty well in the House by predicting a modest GOP victory. 
However it overestimated Republican gains in the Senate by two seats\u2014enough to swing majority control.<\/div>\n<hr \/>\n<p><strong>Full code:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">import pandas as pd\r\nfrom random import random\r\nfrom numpy import mean\r\n\r\n\r\ndef check_prediction(row):\r\n    confidence, prediction = row.cook_rating.split(\"-\")\r\n    if row.winner == prediction:\r\n        return True\r\n    else:\r\n        return False\r\n\r\n\r\ndef get_random_race_result(cook_prediction, thresholds_dict):\r\n    level, party = cook_prediction.split(\"-\")\r\n    if random() &lt; thresholds_dict[level]:\r\n        return party\r\n    else:\r\n        return {\"r\": \"d\", \"d\": \"r\"}[party]\r\n\r\n\r\ndf = pd.read_csv(\"2002-2018_house_election_ratings_results.csv\")\r\n\r\ndf = df.dropna(subset=[\"cook_rating\"])\r\ndf = df.drop(df[~df.winner.isin([\"d\", \"r\"])].index)\r\ndf = df.drop(df[df.cook_rating == \"tossup\"].index)\r\n\r\ndf.loc[:, \"correct\"] = df.apply(check_prediction, axis=1)\r\n\r\ndf_solid = df[df.cook_rating.isin([\"solid-d\", \"solid-r\"])]\r\nsolid_accuracy = df_solid[df_solid.correct].shape[0] \/ df_solid.shape[0]\r\n\r\ndf_likely = df[df.cook_rating.isin([\"likely-d\", \"likely-r\"])]\r\nlikely_accuracy = df_likely[df_likely.correct].shape[0] \/ df_likely.shape[0]\r\n\r\ndf_lean = df[df.cook_rating.isin([\"lean-d\", \"lean-r\"])]\r\nlean_accuracy = df_lean[df_lean.correct].shape[0] \/ df_lean.shape[0]\r\n\r\nprint(f\"Solid:  {solid_accuracy * 100:.2f}%\")\r\nprint(f\"Likely: {likely_accuracy * 100:.2f}%\")\r\nprint(f\"Lean:   {lean_accuracy * 100:.2f}%\")\r\n\r\naccuracy_dict = {\"solid\": solid_accuracy,\r\n                 \"likely\": likely_accuracy,\r\n                 \"lean\": lean_accuracy}\r\n\r\ncook_final_predictions = {\"solid-d\": 159,\r\n                          \"likely-d\": 13,\r\n                          \"lean-d\": 15,\r\n                          \"lean-r\": 
13,\r\n                          \"likely-r\": 11,\r\n                          \"solid-r\": 188}\r\n\r\nsimulated_output = []\r\n\r\nfor _ in range(10000):\r\n    house_seats = [\"d\"] * 18 + [\"r\"] * 18\r\n\r\n    for item in cook_final_predictions:\r\n        for _ in range(cook_final_predictions[item]):\r\n            house_seats.append(get_random_race_result(item, accuracy_dict))\r\n\r\n    simulated_output.append(house_seats.count(\"d\"))\r\n\r\npredicted_d_seats = mean(simulated_output)\r\npredicted_r_seats = 435 - predicted_d_seats\r\n\r\nprint(f\"D seats: {predicted_d_seats:.1f}\")\r\nprint(f\"R seats: {predicted_r_seats:.1f}\")<\/pre>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today is Election Day in the United States and I thought it would be fun to dip my toes into election forecasting. I&#8217;ll build a very simple model based on race ratings published by The Cook Political Report. I&#8217;ll check<\/p>\n","protected":false},"author":1,"featured_media":979,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[238,41],"tags":[91,191,193,22,189,185,188,92,123,186,30,187,25,192,42,190,93],"class_list":["post-968","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-government","category-probability","tag-congress","tag-cook","tag-cook-political-report","tag-data","tag-democrat","tag-election","tag-house","tag-house-of-representatives","tag-kaggle","tag-midterms","tag-pandas","tag-politics","tag-python","tag-race-ratings","tag-random","tag-republican","tag-senate"],"_links":{"self":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/use
rs\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/comments?post=968"}],"version-history":[{"count":17,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/968\/revisions"}],"predecessor-version":[{"id":2235,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/posts\/968\/revisions\/2235"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media\/979"}],"wp:attachment":[{"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/media?parent=968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/categories?post=968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wollen.org\/blog\/wp-json\/wp\/v2\/tags?post=968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}