{"id":459,"date":"2016-03-01T23:36:37","date_gmt":"2016-03-01T23:36:37","guid":{"rendered":"http:\/\/www.jaijuneja.com\/blog\/?p=459"},"modified":"2025-02-26T12:28:25","modified_gmt":"2025-02-26T12:28:25","slug":"predicting-primaries-us-election-story-told-data","status":"publish","type":"post","link":"https:\/\/www.jaijuneja.com\/blog\/2016\/03\/predicting-primaries-us-election-story-told-data\/","title":{"rendered":"Predicting the Primaries: The US Election Story As Told by Data"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" width=\"595\" height=\"335\" class=\"aligncenter size-full wp-image-519\" src=\"http:\/\/www.jaijuneja.com\/blog\/wp-content\/uploads\/2016\/03\/trump.jpg\" alt=\"Donald Trump\" srcset=\"https:\/\/www.jaijuneja.com\/blog\/wp-content\/uploads\/2016\/03\/trump.jpg 595w, https:\/\/www.jaijuneja.com\/blog\/wp-content\/uploads\/2016\/03\/trump-300x169.jpg 300w\" sizes=\"auto, (max-width: 595px) 100vw, 595px\" \/><\/p>\n<p>Everyone in media wants a piece of the US election pie. On both ends of the political spectrum, traditional party politics are being rocked by new personalities. Voters frustrated with the establishment see this as an opportunity for protest. On the left they&#8217;re Feeling the Bern and on the right they&#8217;re hoping to Make America Great Again, but in both cases voters have expressed a cocktail of emotions from anger to optimism.<\/p>\n<p>Front and centre of this circus act sits an orange caricature. To cynics, Donald Trump is a parody of himself, a charlatan who wins support by pandering to public fear. To supporters, he represents the anti-establishment and has the ability to &#8220;drain the swamp&#8221; from within. And while the media fumbles over his unexpected rise, the rest of the world is watching in shock, horror and awe as the prospect of President Trump becomes achingly real. 
On the surface, it&#8217;s easy to condemn the media for falling prey to his shallow tactics; but deep down I&#8217;m relishing every bite of this amuse-bouche under the naive assumption that it won&#8217;t last.<\/p>\n<p>However, I&#8217;m not here to push my opinion. Instead, I want to uncover what the numbers tell us: who are the likely candidates for each party? Are the pundits&#8217; claims backed by hard statistics? <strong>What insights can we gain from voting data and a simple predictive model?\u00a0<em>Let&#8217;s find out&#8230;<\/em><\/strong><!--more--><\/p>\n<h2>The Data<\/h2>\n<p>With\u00a0the primaries\/caucuses of the four carve-out states just passed (Iowa, New Hampshire, South Carolina and Nevada), we have a rich <a href=\"https:\/\/www.kaggle.com\/benhamner\/2016-us-election\">set of county-level voting data<\/a>\u00a0for each of the candidates. Using US census records, we can associate each of these counties with demographic features (such as race, education, income and homeownership) which we assume are adequate predictors of voting outcome. 
Ultimately, the results of our election predictor are only as good as this assumption.<\/p>\n<p>With that in mind, here&#8217;s a sample of what our data looks like:<\/p>\n<p>[easytable style=&#8221;white-space:nowrap;font-size:13px;&#8221;]<br \/>\nstate,county,party,candidate,votes,fraction_votes,income,black,hispanic,bilingual,senior,college,homeownership,firms<br \/>\nIowa,Adair,Republican,Donald Trump,104,0.256,47892,0.4,1.7,1.1,22.1,16.3,77.1,0.098<br \/>\nIowa,Adams,Republican,Donald Trump,68,0.249,45871,0.4,1.1,1.2,22,13.7,78.5,0.108<br \/>\nIowa,Allamakee,Republican,Donald Trump,193,0.281,48831,1.4,5.7,8,21.3,14.9,79.5,0.114<br \/>\nIowa,Appanoose,Republican,Donald Trump,292,0.348,39208,0.7,1.5,2.3,21.4,18.3,72.5,0.11<br \/>\nIowa,Audubon,Republican,Donald Trump,99,0.265,48313,0.4,1.1,1.2,24.3,16.6,80.4,0.074<br \/>\nIowa,Benton,Republican,Donald Trump,410,0.251,56669,0.6,1.3,2,16.9,18.8,80.4,0.091<br \/>\n[\/easytable]<\/p>\n<p>For each candidate we have the number and fraction of votes they received as well as\u00a0eight representative features per county:<\/p>\n<ol>\n<li>Median household income<\/li>\n<li>% black population<\/li>\n<li>% Hispanic population<\/li>\n<li>% population speaking more than one language at home<\/li>\n<li>% population over 65 years<\/li>\n<li>% population with a bachelor&#8217;s degree or higher<\/li>\n<li>Homeownership rate<\/li>\n<li># of firms per capita<\/li>\n<\/ol>\n<h2>Initial Findings<\/h2>\n<p>A quick run through the data reveals some of the\u00a0correlations that our model will rely on. If we examine the counties where different candidates have won, Marco Rubio and Bernie Sanders tend to win higher-income counties. Trump wins in\u00a0lower-income counties\u00a0overall, but seems to appeal to a broader range of voters. 
<em>[For those wondering how the chart works, the boxed area represents the interquartile range while the lines represent the min\/max range <a href=\"https:\/\/en.wikipedia.org\/wiki\/Box_plot\">#GCSEmaths<\/a>].<\/em><br \/>\n<div class=\"visualizer-front-container\" id=\"chart_wrapper_visualizer-500-591684198\"><style type=\"text\/css\" name=\"visualizer-custom-css\" id=\"customcss-visualizer-500\">.locker,.locker-loader{position:absolute;top:0;left:0;width:100%;height:100%}.locker{z-index:1000;opacity:.8;background-color:#fff;-ms-filter:\"progid:DXImageTransform.Microsoft.Alpha(Opacity=80)\";filter:alpha(opacity=80)}.locker-loader{z-index:1001;background:url(https:\/\/www.jaijuneja.com\/blog\/wp-content\/plugins\/visualizer\/images\/ajax-loader.gif) no-repeat center center}.dt-button{display:none!important}.visualizer-front-container.visualizer-lazy-render{content-visibility: auto;}.google-visualization-controls-categoryfilter label.google-visualization-controls-label {vertical-align: middle;}.google-visualization-controls-categoryfilter li.goog-inline-block {margin: 0 0.2em;}.google-visualization-controls-categoryfilter li {padding: 0 0.2em;}.visualizer-front-container .dataTables_scrollHeadInner{margin: 0 auto;}<\/style><div id=\"visualizer-500-591684198\" class=\"visualizer-front  visualizer-front-500\"><\/div><!-- Not showing structured data for chart 500 because description is empty --><\/div><\/p>\n<p>Diving a bit deeper, Rubio dominates in the few counties with the highest rates of college education (bachelor&#8217;s degree or higher), while Trump and Cruz jockey for position among the rest.<br \/>\n<div class=\"visualizer-front-container\" id=\"chart_wrapper_visualizer-498-477540178\"><style type=\"text\/css\" name=\"visualizer-custom-css\" 
id=\"customcss-visualizer-498\">.locker,.locker-loader{position:absolute;top:0;left:0;width:100%;height:100%}.locker{z-index:1000;opacity:.8;background-color:#fff;-ms-filter:\"progid:DXImageTransform.Microsoft.Alpha(Opacity=80)\";filter:alpha(opacity=80)}.locker-loader{z-index:1001;background:url(https:\/\/www.jaijuneja.com\/blog\/wp-content\/plugins\/visualizer\/images\/ajax-loader.gif) no-repeat center center}.dt-button{display:none!important}.visualizer-front-container.visualizer-lazy-render{content-visibility: auto;}.google-visualization-controls-categoryfilter label.google-visualization-controls-label {vertical-align: middle;}.google-visualization-controls-categoryfilter li.goog-inline-block {margin: 0 0.2em;}.google-visualization-controls-categoryfilter li {padding: 0 0.2em;}.visualizer-front-container .dataTables_scrollHeadInner{margin: 0 auto;}<\/style><div id=\"visualizer-498-477540178\" class=\"visualizer-front  visualizer-front-498\"><\/div><!-- Not showing structured data for chart 498 because description is empty --><\/div><\/p>\n<p>A look at race demographics further reveals Trump&#8217;s widespread dominance, even winning in counties with the largest black and Hispanic populations. It&#8217;s worth caveating that this does not necessarily imply that black and Hispanic people are voting for Trump, but rather that he has won counties where such populations exist. Given his political agenda, there may be other hidden dynamics at play &#8211; for example, the Republican voting population may be relatively small and predominantly white. 
This theory has some basis given that caucus and primary voting is often restricted\u00a0to registered party members, who in the case of Republicans are predominantly white.<br \/>\n<div class=\"visualizer-front-container\" id=\"chart_wrapper_visualizer-502-1745727302\"><style type=\"text\/css\" name=\"visualizer-custom-css\" id=\"customcss-visualizer-502\">.locker,.locker-loader{position:absolute;top:0;left:0;width:100%;height:100%}.locker{z-index:1000;opacity:.8;background-color:#fff;-ms-filter:\"progid:DXImageTransform.Microsoft.Alpha(Opacity=80)\";filter:alpha(opacity=80)}.locker-loader{z-index:1001;background:url(https:\/\/www.jaijuneja.com\/blog\/wp-content\/plugins\/visualizer\/images\/ajax-loader.gif) no-repeat center center}.dt-button{display:none!important}.visualizer-front-container.visualizer-lazy-render{content-visibility: auto;}.google-visualization-controls-categoryfilter label.google-visualization-controls-label {vertical-align: middle;}.google-visualization-controls-categoryfilter li.goog-inline-block {margin: 0 0.2em;}.google-visualization-controls-categoryfilter li {padding: 0 0.2em;}.visualizer-front-container .dataTables_scrollHeadInner{margin: 0 auto;}<\/style><div id=\"visualizer-502-1745727302\" class=\"visualizer-front  visualizer-front-502\"><\/div><!-- Not showing structured data for chart 502 because description is empty --><\/div><\/p>\n<p>Among the Democrats, Clinton won overwhelmingly in counties with large black populations. It&#8217;s a striking chart, but one that is corroborated by news sources, which cite Clinton as securing <a href=\"http:\/\/www.pbs.org\/newshour\/updates\/how-clinton-won-the-black-vote-in-south-carolina\/\">86% of African-American votes<\/a> vs. 
14% for Sanders in the latest South Carolina primary.<br \/>\n<div class=\"visualizer-front-container\" id=\"chart_wrapper_visualizer-495-1141029629\"><style type=\"text\/css\" name=\"visualizer-custom-css\" id=\"customcss-visualizer-495\">.locker,.locker-loader{position:absolute;top:0;left:0;width:100%;height:100%}.locker{z-index:1000;opacity:.8;background-color:#fff;-ms-filter:\"progid:DXImageTransform.Microsoft.Alpha(Opacity=80)\";filter:alpha(opacity=80)}.locker-loader{z-index:1001;background:url(https:\/\/www.jaijuneja.com\/blog\/wp-content\/plugins\/visualizer\/images\/ajax-loader.gif) no-repeat center center}.dt-button{display:none!important}.visualizer-front-container.visualizer-lazy-render{content-visibility: auto;}.google-visualization-controls-categoryfilter label.google-visualization-controls-label {vertical-align: middle;}.google-visualization-controls-categoryfilter li.goog-inline-block {margin: 0 0.2em;}.google-visualization-controls-categoryfilter li {padding: 0 0.2em;}.visualizer-front-container .dataTables_scrollHeadInner{margin: 0 auto;}<\/style><div id=\"visualizer-495-1141029629\" class=\"visualizer-front  visualizer-front-495\"><\/div><!-- Not showing structured data for chart 495 because description is empty --><\/div><\/p>\n<h2>Predicting Voting Outcomes<\/h2>\n<p>Our predictive model will use the implied insights above &#8211; such as Rubio&#8217;s relative popularity in high-income, college-educated counties &#8211; to project winners in each of the remaining states. We&#8217;ll use a <strong>random forest classifier<\/strong> to achieve this. In simple terms, a random forest classifier constructs a set of &#8220;decision trees&#8221; that relate each of our demographic features to a voting outcome (i.e. a candidate). You can think of each tree as a flow chart that asks a series of yes\/no questions about the features (e.g. 
is median income higher than X?), descending the appropriate branch of the tree after each question until it reaches the leaf (the winning candidate).<\/p>\n<p>Traditional decision tree-based learning algorithms tend to &#8220;overfit&#8221; to their training data. What this\u00a0means is that decision trees are often\u00a0grown very deep to fit to\u00a0irregular patterns in the data. This makes them good at matching to the original dataset, but bad at predicting outcomes with\u00a0new data. Consequently,\u00a0random forest classifiers\u00a0<em>randomly select<\/em>\u00a0a subset of data points and features to construct an ensemble of decision trees. When the prediction algorithm is run, it chooses the outcome that is most commonly output by the various decision trees (i.e. the <em>mode<\/em>). This has been found to reduce the problem of overfitting.<\/p>\n<h2>And The Winner Is&#8230;<\/h2>\n<p><strong>Hillary Clinton<\/strong> wins for the Democrats, with a notable East-West divide. Clinton wins 31 states including her home of New York versus 19 for Sanders.<\/p>\n<div style=\"text-align:center; position: relative; top: 10px; z-index: 10;\">\n<div style=\"display:inline-block;height:10px;width:10px;background-color:#1e73be;\"><\/div>\n<div style=\"display:inline-block;height:10px;width:50px;font-size:13px;\">Clinton<\/div>\n<div style=\"display:inline-block;height:10px;width:10px;background-color:#dd3333;\"><\/div>\n<div style=\"display:inline-block;height:10px;width:50px;font-size:13px;\">Sanders<\/div>\n<\/div>\n<div class=\"visualizer-front-container\" id=\"chart_wrapper_visualizer-531-819493719\"><style type=\"text\/css\" name=\"visualizer-custom-css\" 
id=\"customcss-visualizer-531\">.locker,.locker-loader{position:absolute;top:0;left:0;width:100%;height:100%}.locker{z-index:1000;opacity:.8;background-color:#fff;-ms-filter:\"progid:DXImageTransform.Microsoft.Alpha(Opacity=80)\";filter:alpha(opacity=80)}.locker-loader{z-index:1001;background:url(https:\/\/www.jaijuneja.com\/blog\/wp-content\/plugins\/visualizer\/images\/ajax-loader.gif) no-repeat center center}.dt-button{display:none!important}.visualizer-front-container.visualizer-lazy-render{content-visibility: auto;}.google-visualization-controls-categoryfilter label.google-visualization-controls-label {vertical-align: middle;}.google-visualization-controls-categoryfilter li.goog-inline-block {margin: 0 0.2em;}.google-visualization-controls-categoryfilter li {padding: 0 0.2em;}.visualizer-front-container .dataTables_scrollHeadInner{margin: 0 auto;}<\/style><div id=\"visualizer-531-819493719\" class=\"visualizer-front  visualizer-front-531\"><\/div><!-- Not showing structured data for chart 531 because title is empty --><\/div>\n<p style=\"margin-top: 15px;\">And a landslide victory for <strong>Donald Trump<\/strong> among the Republicans. A tad exaggerated, perhaps? 
Only time will tell!<\/p>\n<div style=\"text-align:center; position: relative; top: 10px; z-index: 10;\">\n<div style=\"display:inline-block;height:10px;width:10px;background-color:#1e73be;\"><\/div>\n<div style=\"display:inline-block;height:10px;width:50px;font-size:13px;\">Trump<\/div>\n<div style=\"display:inline-block;height:10px;width:10px;background-color:#dd3333;\"><\/div>\n<div style=\"display:inline-block;height:10px;width:50px;font-size:13px;\">Rubio<\/div>\n<div style=\"display:inline-block;height:10px;width:10px;background-color: #ff9900;\"><\/div>\n<div style=\"display:inline-block;height:10px;width:50px;font-size:13px;\">Cruz<\/div>\n<\/div>\n<div class=\"visualizer-front-container\" id=\"chart_wrapper_visualizer-534-958941352\"><style type=\"text\/css\" name=\"visualizer-custom-css\" id=\"customcss-visualizer-534\">.locker,.locker-loader{position:absolute;top:0;left:0;width:100%;height:100%}.locker{z-index:1000;opacity:.8;background-color:#fff;-ms-filter:\"progid:DXImageTransform.Microsoft.Alpha(Opacity=80)\";filter:alpha(opacity=80)}.locker-loader{z-index:1001;background:url(https:\/\/www.jaijuneja.com\/blog\/wp-content\/plugins\/visualizer\/images\/ajax-loader.gif) no-repeat center center}.dt-button{display:none!important}.visualizer-front-container.visualizer-lazy-render{content-visibility: auto;}.google-visualization-controls-categoryfilter label.google-visualization-controls-label {vertical-align: middle;}.google-visualization-controls-categoryfilter li.goog-inline-block {margin: 0 0.2em;}.google-visualization-controls-categoryfilter li {padding: 0 0.2em;}.visualizer-front-container .dataTables_scrollHeadInner{margin: 0 auto;}<\/style><div id=\"visualizer-534-958941352\" class=\"visualizer-front  visualizer-front-534\"><\/div><!-- Not showing structured data for chart 534 because title is empty --><\/div>\n<p style=\"margin-top: 15px;\"><strong>Some caveats:<\/strong> As mentioned earlier, these predictions are only as good as the assumptions 
upon which they are based. In this case, we assume that demographic features are universally strong predictors of voting outcome. We also simplify by treating the presidential nomination as decided by votes, when in reality\u00a0it&#8217;s decided by the number of delegates assigned to each candidate. Finally, we face the typical problem of limited data: in some instances (e.g. Rubio), the model had little information to go on given the small number of wins, while in others (e.g. Trump) it may be exaggerating future success.<\/p>\n<p>In any case, Super Tuesday is today, so we&#8217;ll see how the model fares!<\/p>\n<p><em>If you enjoyed this article then please like and share! \ud83d\ude42<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Everyone in media wants a piece of the US election pie. On both ends of the political spectrum, traditional party politics are being rocked by new personalities. Voters frustrated with the establishment see this as an opportunity for protest. On the left they&#8217;re Feeling the Bern and on the right they&#8217;re hoping to Make America [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"slim_seo":{"title":"Predicting the Primaries: The US Election Story As Told by Data - Jai&#039;s Awesome Blog!","description":"Everyone in media wants a piece of the US election pie. 
On both ends of the political spectrum, traditional party politics are being rocked by new personalities"},"footnotes":""},"categories":[6],"tags":[],"class_list":["post-459","post","type-post","status-publish","format-standard","hentry","category-programming"],"_links":{"self":[{"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/posts\/459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/comments?post=459"}],"version-history":[{"count":75,"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/posts\/459\/revisions"}],"predecessor-version":[{"id":554,"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/posts\/459\/revisions\/554"}],"wp:attachment":[{"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/media?parent=459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/categories?post=459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jaijuneja.com\/blog\/wp-json\/wp\/v2\/tags?post=459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}