This is the fifth piece in a series of op-eds about polling in the election. Read the previous piece here. John E. Newhagen is an associate professor emeritus at the University of Maryland’s Philip Merrill College of Journalism.
Post-election coverage is awash with commentary asking how public opinion polls could have gotten the outcome of the 2016 presidential election so wrong. But a closer look shows it was the journalists, aided and abetted by “expert” pollsters, who should take the fall for missing perhaps the biggest electoral upset in the history of the Union.
Pollsters claim they can offer accurate data about public opinion fast and cheap. That is, after all, what journalism is all about. That claim rests on the idea that a relatively small sample, if carefully selected, can be used to generalize to a much larger population. It’s a neat trick, if it works: talk to 1,200 voters on the telephone and predict what 129 million will actually do when they get to the polls.
But just like a carpenter’s rule, a survey has limits to its precision. This is the lesson that journalists reporting the polls, and many consultants providing them data, seem to have missed in 2016. Many Americans thought Hillary Clinton would win the election based on a diet of daily news claiming poll results showed her in the lead. The New York Times predicted the odds of a Clinton victory were above 70 percent throughout the election. The Washington Post similarly gave Clinton a clear edge in the Electoral College right up to election night. The 2016 election was so charged with emotion that when the unexpected results did come in, a fair number of citizens probably experienced an episode of post-traumatic stress.
Maybe it’s time for an intervention.
The way polling data was reported went beyond simple distraction and actually caused harm. That is just the opposite of what journalism schools teach. Unlike hideous advertising and public relations hacks, the press is supposed to defend and protect the public from harm.
But what happened? Even the Clinton campaign’s internal polls must have been off – and those efforts were ambitious. The Washington Post reported she had a big data server powered by a secret algorithm named Ada that churned out 4,000 solutions a day. But the question here is the performance of public polls, the ones reported in mainstream media and aggregated by realclearpolitics.com day after day.
A walk through the minefield
In the run-up to the election, MediaFile published four articles focusing on the Cardinal Sins of Poll Analysis that can lead to dangerously flawed reporting.
Looking closely at how a prototypical national public opinion survey is executed illuminates the catastrophe:
First, a sample is randomly drawn from a population defined as “eligible” or “registered” voters according to standard practice endorsed by the prominent professional organization, the American Association for Public Opinion Research (AAPOR).
Next, a computer dials the numbers on the list. If a real human picks up the phone, voice recognition software transfers the call to a trained interviewer, who administers the questionnaire. The computer keeps dialing until interviewers entice about 1,200 people to take part. Cardinal Sin #2 comes into play at this point because it may take more than 10 calls to generate one complete interview, even after business exchanges are eliminated. That generates a sample with a confidence interval (C.I.) of ±3%.
After data collection is complete, the pollsters face the scariest step in the process: filtering their data set of 1,200 eligible voters down to those they believe will be likely to vote. Questions specifically intended to isolate “likely voters” are used to filter respondents. That process usually leaves a subsample of about 650, but the C.I. then increases to ±4% or more. This last step is scary because it is normative; the decisions about whom to leave in and whom to filter out are not guided by statistics, and they have a palpable impact on results. Cardinal Sin #3 gets committed at this point.
A 1,200-respondent sample is widely used in national surveys because it strikes a balance among three elements, listed in their unfortunate order of importance to contemporary journalism: cost, timeliness, and accuracy. Increasing sample size improves accuracy, but it substantially increases cost and requires more time to execute, and vice versa.
The tradeoffs are conspicuous.
COST: The cost of generating a 1,200-respondent sample can run well into five figures and has increased as response rates have declined (see Cardinal Sin #2), according to fivethirtyeight.com. One interview will cost $25-$30 (depending on the caliber of the firm), according to quora.com. A national survey of 1,200 respondents that takes 15-20 minutes will cost roughly $30,000 to $35,000. That is a lot of money, especially if the news organization wants to refresh its data once every week or so.
TIME: Time becomes a factor because it takes five days to a week to turn around a survey, and timeliness is, after all, a hallmark of good journalism. That means most interviewing has to take place in a three-day time frame, which also constrains the number of completed questionnaires that even a state-of-the-art survey center can generate.
ACCURACY: Accuracy is a technical question bounded by the mathematics behind the confidence interval, which is the critical metric in determining how precise statistical data is. It turns out 1,200 is a magic number because of the nonlinear relationship between sample size and confidence intervals. True, reducing the size of the survey to 500 saves time and money, but it also generates a C.I. of nearly ±4.5%, which is generally too coarse to be useful. At the other end of the scale, generating a C.I. of ±2% or less pushes the sample size to about 2,400, which is prohibitively expensive and time consuming. Remember that the final step of filtering for “likely voters” cuts the sample down to about 650, with a C.I. of ±4%.
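The arithmetic behind that nonlinearity is the standard margin-of-error formula, MOE = z·√(p(1−p)/n). A minimal sketch, assuming a 95% confidence level (z = 1.96) and the worst-case 50/50 split of opinion:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, as a proportion, for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Halving the error requires roughly quadrupling the sample:
for n in (500, 650, 1200, 2400):
    print(f"n = {n:4d}: +/-{margin_of_error(n) * 100:.1f}%")
# n =  500: +/-4.4%
# n =  650: +/-3.8%
# n = 1200: +/-2.8%
# n = 2400: +/-2.0%
```

The printed figures match the numbers in the essay once rounded to whole percentage points, which is how polls typically report them.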
And now for the final irony: The polls really were close
The popular vote
It would have been technically impossible to predict the election’s outcome from the standard survey just described, and when journalists went ahead and did it anyway, they committed Cardinal Sin #1. According to The Cook Political Report, the vote tally as of Nov. 29 in the presidential election gave Hillary Clinton 48.2 percent of the popular vote and Donald Trump 46.4 percent.
The standard survey of 1,200 respondents, reduced to about 650 by likely-voter filters, would have a C.I. of ±4 percent. No legitimate prediction could have been made from that C.I. given the narrow margin between the candidates. Applying a ±4% C.I. to Clinton’s “true” score of 48.2% would have placed her strength somewhere between 44.2% and 52.2%. Similarly, a ±4% C.I. applied to Trump’s “true” score of 46.4% would have placed him between 42.4% and 50.4%. Notice the large overlap between the two C.I.s, from Clinton’s low of 44.2% to Trump’s high of 50.4%. That overlap means scenarios in which either candidate was ahead were equally plausible; a Clinton win could only have been legitimately called if the two C.I.s did not overlap. Thus, the only legitimate claim that pollsters could have made, and that journalists should have reported, was that the polls were not sufficiently accurate to make a prediction.
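The overlap test is simple enough to script. A minimal sketch, using the final vote shares reported above and the ±4% post-filter C.I.:

```python
def ci(share, moe=4.0):
    """Confidence interval, in percentage points, around a poll share."""
    return (share - moe, share + moe)

clinton_low, clinton_high = ci(48.2)  # (44.2, 52.2)
trump_low, trump_high = ci(46.4)      # (42.4, 50.4)

# The intervals overlap when each candidate's low end sits below the
# other's high end; here the overlap spans 44.2% to 50.4%.
overlap = clinton_low < trump_high and trump_low < clinton_high
print("Intervals overlap -> no legitimate call:", overlap)  # True
```

Any poll whose candidate intervals pass this overlap check supports only one honest headline: too close to call.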
To illustrate the problem, suppose the carpenter’s rule mentioned at the outset of this essay was marked in increments of ½ inch, but the carpenter takes on a job that demands a 3¼-inch cut. She would be stuck: she could mark the board at either 3 inches or 3½ inches, but could not rely on her measuring stick to show ¼-inch increments. Likewise, because only about 2% separated Clinton from Trump, it would have taken a poll with a C.I. of ±1% to detect a difference. That would have required a sample of about 9,600 respondents, which is not going to happen within the real-world constraints on pollsters and journalism enumerated above.
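Inverting the margin-of-error formula gives the sample size needed for a given precision. A sketch assuming the worst-case 50/50 split and a 95% confidence level, which lands within a couple of respondents of the figure cited above, depending on rounding:

```python
import math

def required_sample(moe_pct, z=1.96):
    """Sample size for a margin of error given in percentage points.
    Derived from n = (z * sqrt(p*(1-p)) / moe)**2 with worst-case p = 0.5,
    which simplifies to (z * 50 / moe_pct)**2."""
    return math.ceil((z * 50.0 / moe_pct) ** 2)

print(required_sample(1))  # 9604 respondents for +/-1%
print(required_sample(2))  # 2401 for +/-2%
print(required_sample(4))  # 601 for +/-4%
```

The quadratic cost of precision is the whole story: cutting the margin of error in half quadruples the interviews, and the bill along with them.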
But wait, the statistician in the back of the room points out that if a poll with a C.I. of ±4% were correctly interpreted to indicate that no prediction could be made, it would have been accurate! Alas, that is not what editors want to hear, and it is not what happened, is it?
The Electoral College: The Wisconsin example
Winning the popular vote can’t give Clinton much solace because she lost by a hair in Wisconsin, Pennsylvania, and Michigan, and that cost her the Presidency in the Electoral College. Wisconsin provides a good example of bad campaign strategy that may have been driven by flawed polls. Clinton, who posted at least a 6% lead in every statewide poll realclearpolitics.com reported, did not make a single campaign stop in Wisconsin. She won voters under 30 years old by just 4 points; Obama won them by 23 points four years earlier, according to the Detroit Free Press. The state voted Republican for the first time since 1984. “It is nothing short of malpractice that her campaign didn’t look at the electoral college and put substantial resources in states like Michigan and Wisconsin,” Democratic pollster Paul Maslin said.
Similar overestimates of Clinton’s performance in polls can be seen in all the other so-called battleground states.
So, what happened?
Clinton’s chickens never hatched
Pretty much all the election postmortems critiquing the polls point to bad decisions about likely voters that overestimated turnout among Clinton supporters. Remember, teasing out likely voters from a sample of registered or eligible voters is subjective. Polls frequently include a bank of questions explicitly intended to spot likely voters, such as: “How likely are you to vote in the upcoming elections?” or “Did you vote in the last election?” But the answers to those questions depend on the respondent telling the truth, and respondents frequently don’t. More than one postmortem pointed to a “social desirability bias” effect among millennials to explain overestimates of Clinton’s strength. That is, many of her young supporters said they were going to vote but never made it to the polls.
The Truth was out there
The scenarios for misinterpretation of the polls were outlined in the articles posted on MediaFile before the election, which contended that Trump should have been taken more seriously from around the time he began winning primaries.
Trump had a ceiling of somewhere between 40% and 45%, and it was unlikely his support would rise above that level. Most mainstream analysis agreed with that idea. What mainstream media and pollsters failed to factor in was the intensity of that support, which was extremely high, committing Cardinal Sin #4. The other side of the coin for Trump was that his supporters were going to turn out, so his basement would not be far below his 40%-45% ceiling. Texas Tech Prof. Erik Bucy, who specializes in political psychology, measured intensity of liking for the candidates in real time during the debates. The striking features of the analysis are the spike in intensity among Republicans when Trump spoke and the flat lines generated by Clinton’s presentations, even among Democrats.
Clinton, on the other hand, had a firm basement of support at about the same level as Trump’s ceiling, somewhere in the mid-to-upper 40s. Her peak, I surmised, was somewhere in the 50s, but it was very soft due to her low-intensity problem. An earlier essay warned that Trump could expect a high turnout, while Clinton was in danger of a low turnout driven by lack of intensity. That prediction was right.
Things to bring up at the intervention for journalism
If a poll’s results are not outside its confidence interval, don’t report them as if they were: If confidence intervals overlap, the poll gives you no guidance. Don’t pretend it does. The implications of this mistake are palpable: false expectations among both voters and candidates may have affected voter turnout and Clinton’s campaign decisions, costing her the election.
Don’t be afraid to play around with the data: The “likely voter” questions in most polls have been around for 50 years, and their limitations are well known. Ask “What if?” questions and run the data again. If someone had asked, “What if support among millennials is 5%-10% below the levels enjoyed by Barack Obama?” the analysis would have nailed the true outcome. Writing alternative scenarios can also make polling stories more interesting.
Look for alternative sources to support polling data: This goes to an age-old guideline in journalism: Don’t run one-source stories. It is not a new idea. There were other ways to report public opinion. How about the size and intensity of crowds at rallies? Both Trump and Sanders events were packed with energetic supporters; Clinton’s were not. The number of small donations could also have come into play: Sanders’s effort was fueled by small contributions and surpassed Clinton’s deep-pocket donations. Another approach is to look harder at actual electoral performance in the primaries. Trump beat back challenges from nine other rivals, despite predictions his movement would collapse at any moment. Sanders was winning primaries in the states where Clinton went on to lose the election, while Clinton’s nomination was based largely on wins in predominantly Republican states that she would go on to lose in the general election. As Sanders pointed out, the pool of so-called superdelegates that helped put her over the top was made up of party cronies and loaded against a grassroots candidate.
Just ask good questions: This is, again, just basic journalism. Two questions nagged the Clinton campaign:
If Trump was as dangerous and incompetent as many mainstream media reports proposed, why didn’t Clinton post a 20-point lead over him in the polls? Pennsylvania was a bellwether state; it was the one place Clinton could show a lead outside the polls’ C.I., usually around 8%. When that lead evaporated in the run-up to the election, any clear-headed analyst had to wonder what was going on. Clinton reportedly blamed her loss on FBI Director James Comey, who reopened the investigation into her emails a week before the election. But her lead in Pennsylvania and elsewhere began to inexplicably disappear in polls executed before that took place.
Why did Sanders keep beating Trump in head-to-head polls by double digits when Clinton could not get the margin in similar match-ups outside the C.I. during the primaries? Sanders won primaries in Midwestern states such as Wisconsin and Michigan where Clinton lost the election. These were the kinds of anomalies good journalists notice. The nearly universal assessment among political psychologists and professionals was that Clinton had no message and inspired no energy among her base. But while all the red flags were waving, only the National Enquirer seemed to call the election right.
Just how high are the stakes?
Journalism is supposed to tell people what is going on in the world, and that includes reporting public opinion polls accurately. This is especially true when “what is going on” is complex and scary, as the 2016 presidential election was. A generation ago, the profession embraced Philip Meyer’s 1973 book, Precision Journalism: A Reporter’s Introduction to Social Science Methods, and his challenge for journalists to understand and interpret public opinion polls. Yet a lapse in precision in reporting the polls may have affected the outcome of the 2016 election, with implications for a generation.
It’s time to hold an intervention for journalism. While reassessing the way polls are interpreted and reported is not the only part of the soul-searching journalism should be doing right now, it is an important part of it.