In just over a week, Ohio voters will decide on Issue 2, a referendum on whether to keep a recently enacted law that “limits collective bargaining for public employees in the state.” Recent polls show decidedly more public opposition to the law than support for it: a 57-32 pro-repeal split in a Quinnipiac poll and a similar 56-36 split in a PPP poll. But as Washington Post blogger Greg Sargent reports, labor groups opposing the law are circulating a once-internal memo questioning just how predictive those polls will turn out to be. The memo points out that the surveys do not include the actual language that will appear on the ballot, that turnout levels in off-year elections are uncertain, and that polling on prior Ohio ballot measures has been inaccurate.
The accuracy of telephone surveys on ballot measures is an empirical question, and one of interest well beyond Ohio’s Issue 2, so let’s look at the data. For elections between 2003 and 2010, we can track down 438 publicly available survey questions about support for upcoming state-level ballot measures. The first point to make is that the internal memo has some external validity: there is often a pronounced gap between the surveys and the election-day outcomes. The average absolute difference between the polls and the outcomes (the average polling error) is a striking 7.8 percentage points. And in 26.5% of the polls, the eventual election outcome differed from what the poll predicted by more than the swing needed to turn the Ohio PPP result into a dead heat on election day.
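To make those two summary statistics concrete, here is a minimal sketch of the calculation. It assumes a hypothetical file ballot_polls.csv with one row per poll and two columns, poll_support and outcome_support, each giving the measure’s “yes” share in percentage points; the 10-point threshold is my reading of the swing that would level the 56-36 PPP split, not a figure from the original analysis.

```python
# Minimal sketch of the polling-error summary, under the assumed
# ballot_polls.csv layout described above (hypothetical column names).
import pandas as pd

polls = pd.read_csv("ballot_polls.csv")

# Absolute gap between each poll's "yes" share and the election-day result.
abs_error = (polls["outcome_support"] - polls["poll_support"]).abs()
print(f"Average polling error: {abs_error.mean():.1f} points")

# Share of polls that missed by more than a 10-point swing -- the shift
# that would turn a 56-36 split into a 46-46 dead heat (my assumption).
swing = 10
print(f"Polls off by more than {swing} points: {(abs_error > swing).mean():.1%}")
```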
Yet those results tell us about the magnitude of the errors, not about their likely direction. So the question becomes: can we identify any systematic biases in the pre-election polls? Only a handful of the ballot measures in this data set deal with labor unions specifically, so there is not much to say there. But we can identify clusters of ballot measures on more common issues and then look for patterns in which issues produce larger or smaller gaps between the polls and election-day performance.
The figure just below shows the relevant regression coefficients, where the gap between election-day performance and the polls is predicted using each ballot measure’s baseline support in the poll, issue-specific indicator variables, and the year of the election. (A similar technique is at work in some prior research.) Each estimated coefficient is shown as a dot, with the thin line representing a 95% confidence interval. Positive coefficients indicate issues that receive more support at the ballot box than the polls would predict, with the baseline category being the hundreds of polls on all other issues.
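For readers who want to see the mechanics, here is a hedged sketch of the kind of regression the figure summarizes, using pandas and statsmodels. The column names, issue labels, and exact specification are illustrative assumptions rather than the actual model behind the figure.

```python
# Sketch of a coefficient-plot regression, under assumed column names:
# gap = outcome_support - poll_support (what we are predicting),
# poll_support = the measure's baseline support, year = election year,
# issue = a categorical label such as "same_sex_marriage" or "other".
import pandas as pd
import statsmodels.formula.api as smf

polls = pd.read_csv("ballot_polls.csv")
polls["gap"] = polls["outcome_support"] - polls["poll_support"]

# Issue-specific indicators enter via C(issue); treating "other" as the
# reference category makes each coefficient that issue's extra gap
# relative to the polls on all remaining ballot measures. Year is
# entered linearly here as a simplification.
model = smf.ols(
    "gap ~ poll_support + C(issue, Treatment('other')) + year",
    data=polls,
).fit()

# Point estimates with 95% confidence intervals: the dots and thin
# lines of a dot-and-whisker coefficient plot.
summary = pd.concat([model.params, model.conf_int()], axis=1)
summary.columns = ["coef", "ci_low", "ci_high"]
print(summary.filter(like="C(issue", axis=0))
```

In a setup like this, a positive and significant coefficient on an issue dummy would correspond to the figure’s finding that the electorate supports such measures more strongly than the polls suggest.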
The results? Surveys on ballot measures to ban same-sex marriage systematically and strongly understate support for the bans on election day, by an average of 8.3 percentage points. NYU’s Patrick Egan has documented this bias as well: voters tell pollsters they will oppose same-sex marriage bans at notably higher rates than they actually do. Polls on ballot measures to restrict immigration show a similar bias, also underestimating the electorate’s support. Social desirability gives us an off-the-shelf explanation for both results. And it gets added weight from the marijuana-related ballot measures, whose support is underestimated as well: people vote to expand access to marijuana at higher rates than they tell phone interviewers, a finding that Nate Silver has explored.
By contrast, most of the other issues don’t produce large biases in either direction, including the issues (like education and tax reduction) most closely related to Ohio’s Issue 2. These patterns are consistent with the claim that polling is especially difficult where social desirability comes into play, with respondents not wanting to appear homophobic, anti-immigrant, or pro-marijuana. In fact, the only result that doesn’t fit neatly into a social desirability account is the null result for ballot measures restricting gambling. So while the polling of ballot measures is prone to significant variability, it is the measures that touch on socially sensitive groups or topics that give rise to the most predictable errors.