How the Polls Failed

Presidential Election of 2016: A Statistical Quagmire

Professor Jeffrey Simonoff and Professor Aaron Tenenbein, NYU Stern IOMS Department

The presidential election of 2016 was one for the history books. It has been labelled as the greatest upset in election history. The Showtime cable program “The Circus: Inside the Greatest Political Show on Earth” documents the election campaign of 2016 from its start at the Iowa Caucuses in January of 2016 to the last three days of the campaign plus Election Day. In the last episode, the three seasoned political commentators (Mark Halperin, John Heilemann, and Mark McKinnon) document the events on election night. The contrast in the changing moods in the Trump and Clinton camps was compelling. The Trump campaign headquarters at the New York Hilton hotel began the evening with what the commentators likened to an osteopath’s convention and progressed to a party atmosphere of complete euphoria as the election returns were coming in. On the other hand, the Clinton campaign headquarters at the Javits Center started with a very festive party atmosphere and progressed to a mood of utter dismay and devastation.

Most of the pollsters got it wrong. Nate Silver, who was very successful in predicting the results of the presidential elections of 2008 and 2012, had said in late October that the probability of Trump winning the presidential election was about the same as the Chicago Cubs winning the World Series (at a time the Cubs were trailing in the best-of-seven series 3 games to 2, with the final two games about to be played in Cleveland). He had the odds against both of these events happening at 3-1. Both events defied the odds.

(From left to right) The final 2016 election predictions from the Huffington Post and the New York Times.

Why did the polls get it so wrong? Here are some salient points. First, we should note that the polls weren’t necessarily as wrong as it might seem – the national voting margin was predicted to be a 3-to-4 percentage point lead for Hillary Clinton, and it turns out that her margin was roughly 2 percentage points, well within any reasonable margin of error. Still, virtually no polls gave Donald Trump much of a chance to win the election.

In order to understand what went wrong, we first have to recognize that the structure of the Electoral College means that relatively small errors in polls can have disproportionately large effects on predictions of the outcome of a presidential election if they occur in crucial swing states, and that is exactly what happened this year. The three big states of California, Illinois, and New York went overwhelmingly to Clinton with a total of 104 electoral votes, leading to the popular opinion that Trump could not win because he would have to run the table, carrying the swing states of Florida, Iowa, Michigan, North Carolina, Ohio, Pennsylvania, and Wisconsin. Trump did win those states, giving him the 114 electoral votes that made the difference in the election, and in all of them the polls seriously underestimated his support. Indeed, in three (Michigan, Pennsylvania, and Wisconsin) the polls had Clinton ahead, and had she taken just those three states she would have won the election with 278 electoral votes.
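The electoral arithmetic above can be tallied directly; the per-state numbers below are the 2016 Electoral College allocations for the states discussed:

```python
# 2016 electoral-vote allocations for the states discussed above
big_three = {"California": 55, "Illinois": 20, "New York": 29}
swing = {"Florida": 29, "Iowa": 6, "Michigan": 16, "North Carolina": 15,
         "Ohio": 18, "Pennsylvania": 20, "Wisconsin": 10}

print(sum(big_three.values()))  # 104 electoral votes for Clinton
print(sum(swing.values()))      # 114 electoral votes that Trump carried

# Clinton carried states worth 232 electoral votes; adding Michigan,
# Pennsylvania, and Wisconsin alone would have given her
clinton = 232 + swing["Michigan"] + swing["Pennsylvania"] + swing["Wisconsin"]
print(clinton)  # 278, past the 270 needed to win
```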

How did Trump accomplish this feat? The evidence suggests that the Trump campaign used many of the elements of data science. Jared Kushner, Trump’s son-in-law, led this effort, with a group of 100 people doing polling in a nondescript building outside of San Antonio, Texas. Their analysis indicated that Trump had a chance to pull out upsets in states like Pennsylvania and Michigan. They advised Trump to make last-minute trips to these states, which he did, and polls conducted before those trips could not reflect their effects.

A similar situation occurred in the 1948 presidential election. Harry Truman was behind Thomas Dewey in the polls, and all the polls predicted a Dewey win. Truman pulled out an upset over Dewey by travelling to many cities starting in September of 1948. At that time his method of travel was the railroad, and his campaign was labelled the whistle-stop tour, during which he travelled over 30,000 miles. This election is remembered for a famous photograph, taken the day after Election Day, of a smiling Harry Truman holding up a copy of the Chicago Daily Tribune with the headline “Dewey Defeats Truman.”

What else went wrong with the polls, particularly in these swing states? We probably will not know for certain for a while, but evidence suggests a few possibilities.

It is important to recognize that polls can’t possibly answer the question we really care about (how will someone actually vote on Election Day?), since they are taken before Election Day. The best they can do is measure intent. Survey respondents who don’t actually vote don’t matter, so pollsters try to assess the likelihood of someone actually voting by asking, for example, whether they voted in the previous election. That turned out to be ineffective in this election: many people who supported and ultimately voted for Trump apparently didn’t vote in 2012 (since they liked neither President Obama nor Mitt Romney), and were thus down-weighted in the polls. Polls that down-weighted less based on 2012 voting did better in their predictions, as it turns out that the Democrats were less successful than the Republicans in turning out their voters in the swing states.
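A minimal sketch shows how this down-weighting mechanism can bias an estimate. The numbers here are purely illustrative, not from any actual 2016 survey: we assume a sample where the 2012 non-voters lean heavily toward Trump and then vary how much weight they receive.

```python
# Hypothetical sketch of likely-voter weighting.
# Numbers are illustrative, not from any actual 2016 survey.
respondents = [
    # (candidate, voted_in_2012)
    *[("Clinton", True)] * 48,
    *[("Trump",   True)] * 42,
    *[("Trump",   False)] * 10,  # 2012 non-voters, mostly Trump supporters
]

def trump_share(weight_for_nonvoters):
    """Weighted Trump share, with 2012 non-voters given a reduced weight."""
    weights = [1.0 if voted else weight_for_nonvoters
               for _, voted in respondents]
    trump = sum(w for (cand, _), w in zip(respondents, weights)
                if cand == "Trump")
    return trump / sum(weights)

# Down-weighting 2012 non-voters to 0.3 hides the new Trump voters...
print(round(trump_share(0.3), 3))  # 0.484 -- poll shows Clinton ahead
# ...while weighting them fully (as the ballot box effectively did)
# shows Trump ahead in this hypothetical sample.
print(round(trump_share(1.0), 3))  # 0.52
```

The poll's answer depends entirely on the turnout model: the same respondents produce a Clinton lead or a Trump lead depending on how the 2012 non-voters are weighted.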

Polls also often weight their results based on demographics, trying to correct for differences between the makeup of their samples and the actual population, but that requires knowing which demographic characteristics matter. In this election it appears that Trump’s strong support among non-college educated whites carried him in these swing states, something pollsters did not account for. We can also note that these same problems occurred in all of these states, so the fact that the poll errors were all in the same direction is not very surprising; election results in nearby locations are correlated with each other, so the odds of this same pattern happening in 7 different states are a lot higher than those of flipping a coin 7 times and having it come up heads every time. Pollsters and pundits treated a Trump victory as unlikely because he had to take virtually all of the crucial swing states to do it, but once it became clear he was doing better than expected in one or two of these states, the chances of him doing better than expected in all of them rose considerably.
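The coin-flipping comparison above can be made concrete with a small simulation. In this sketch each state's polling error is the sum of a shared national component and independent local noise (the sizes of the two components are assumed for illustration, giving a state-to-state correlation of 0.5):

```python
import random

random.seed(0)
TRIALS = 100_000

def same_direction(correlated):
    # Each "state's" polling error = a shared national component
    # (present only in the correlated case) plus independent local noise.
    shared = random.gauss(0, 1) if correlated else 0.0
    errors = [shared + random.gauss(0, 1) for _ in range(7)]
    return all(e > 0 for e in errors) or all(e < 0 for e in errors)

rate_independent = sum(same_direction(False) for _ in range(TRIALS)) / TRIALS
rate_correlated = sum(same_direction(True) for _ in range(TRIALS)) / TRIALS

# With independent errors, all 7 states break the same way
# only about 1/64 of the time (2 * (1/2)**7).
print(rate_independent)
# With a shared component this large, the chance rises to about 1/4.
print(rate_correlated)
```

Once errors share a common component, seeing all seven swing states miss in the same direction stops being a 1-in-64 coincidence and becomes quite likely.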

There is also some evidence that Trump supporters were less likely to honestly tell a telephone pollster who they were going to vote for. This response error or bias makes it virtually impossible to get an accurate assessment of voter beliefs, since the basic premise of polling is that people will be willing to tell anonymous strangers what they are thinking.

Another potential problem that has been emphasized for several years by Nate Silver is what he calls “herding,” the tendency of pollsters to suppress or manipulate results that are out of line with the consensus view. It will of course invalidate the attempts of poll aggregators like Silver’s to give accurate assessments of the uncertainty in poll results if more extreme results (particularly in one direction) are not shared. Silver has documented this problem in several elections in the past, and has speculated that it played a role here as well.
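The effect of herding on an aggregator can also be sketched in a few lines. In this hypothetical model (all numbers assumed for illustration), pollsters whose raw result lands more than two points from the consensus pull it halfway back before publishing:

```python
import random
import statistics

random.seed(1)
consensus = 4.0    # assumed prior consensus: Clinton +4
true_margin = 2.0  # assumed actual Election Day margin
sampling_sd = 3.0  # assumed per-poll sampling error

raw = [random.gauss(true_margin, sampling_sd) for _ in range(200)]

def herd(x):
    # "Herding": results more than 2 points from the consensus
    # are pulled halfway back toward it before being published.
    return x if abs(x - consensus) <= 2 else (x + consensus) / 2

reported = [herd(x) for x in raw]

# The published polls cluster more tightly around the consensus than
# the raw results do, so an aggregator working from the published
# numbers understates the true uncertainty.
print(statistics.stdev(raw), statistics.stdev(reported))
```

The aggregator never sees the suppressed spread, so its confidence intervals come out too narrow, which is precisely Silver's complaint.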

What can we glean from this situation? Is this a one-time event that will not occur again? We do not think so. Other recent polls have had similar problems; examples include the 2014 Scottish independence referendum, the 2014 U.S. midterm elections, the 2015 U.K. general election, the 2015 Greek referendum, the 2015 Argentine presidential election, and the 2016 Brexit vote, to name a few.

The natural question is: where do we go from here? Statisticians have known about and worried about biases in samples for decades, and there are methods that can help reduce them, but those methods tend to be more expensive and more difficult to implement than current practice. We hope this experience will spur new research into making such methods cheaper and more practical, or into developing alternative methodologies that are.
