There's a new study out purporting to show that Twitter mentions are just as good as polling at predicting elections. I'm skeptical, and regardless of the study's findings, the truth is that good survey research, whether for campaigns, news organizations, or academics, does far more than predict winners.
The study (preliminary version here), which was reported by co-author Fabio Rojas in a Washington Post op-ed, collected tweets about candidates for the House in the 2010 election cycle and found that the number of mentions was correlated with election outcomes: the higher the percentage of tweets that mentioned a candidate, the higher the percentage of the vote she received. It didn't matter whether the comments were positive or negative; it seemed that the candidate who got more attention just did better.
That's fine, and not particularly surprising. After all, while Rojas brags about hitting the winner of almost every race, the truth is that most House elections can be easily predicted just by going with the incumbent. I'm doubtful that he has more than that going for him. It's not clear that the results would translate to other elections in other years, and it's very possible that they hold only for the particular stage Twitter was at in 2010. It's even less clear that they would hold in other nations, contrary to Rojas's suggestion that his Twitter technique could easily be exported worldwide (for more, see critiques from Stu Rothenberg and Jason Linkins).
The real problem, however, is that the claims Rojas makes about the uses of this technique (that it could replace polling or substitute for it when polling isn't possible) don't seem to be based on what surveys are designed to do.
He suggests that prediction-by-Twitter could be great for democracy: "Polling favors the established candidates because it is relatively expensive." But campaigns don't use polling only to figure out whether they are winning or losing, or to predict the outcome. Instead, candidates need to know where and how to deploy campaign resources, something that a "horse race" top-line number doesn't help with at all. Indeed, to the extent that campaigns are already using sophisticated tools to extract useful information from social media, odds are that doing so requires exactly the same kind of expensive expertise that well-funded campaigns have traditionally purchased in order to get good survey information. No, Barack Obama's ballyhooed analytics department couldn't be replaced by an app.
News organizations also use horse-race polling quite a bit. If it turned out that Twitter analysis was a good substitute, they might consider using it, although some would argue that there's already too much emphasis on "who's ahead?" coverage. Even so, it's not clear how useful the technique might be. It again depends on how well it works in other contexts, other types of elections, and other years, but also on how it works within a cycle. How does social media compare with polling or other types of predictions six months out? Nine months out? A year?
Even news organizations use polling for far more than horse-race coverage. They use it to try to understand what the voters are thinking, and even why! They use it to put together information not just about what the electorate will do, but about who the voters are.
Academic researchers use survey research, such as the American National Election Studies, for a far more rigorous version of those "Who?" and "Why?" questions. Who votes? What demographic or political factors drive turnout? How important is party identification for vote choice: when it comes down to it, do we really vote the person, or the party? Given how important party identification is, what does it take to budge voters away? And thousands of other questions about voters, about campaigns and electioneering, about the effects of underlying conditions, and more. Being able to predict election outcomes is nice, but it doesn't do very much for our interest in understanding them.
Certainly, political scientists are looking to supplement the ANES and other traditional sources with data taken from social media, but no one thinks that collecting tweets can replace them.
Because even if this technique really holds up (and, again, I'm skeptical), it's just not clear what it gives us. Knowing that the candidate who generates the most publicity also wins the most votes? Yeah, we already know that. Being able to therefore predict the winner? Eh. With all due respect to poll aggregators such as Nate Silver and Mark Blumenthal, that part of what they do isn't very important at all. After all, sooner or later the election will happen, and then we'll all know who won; knowing the likely winner in advance doesn't really do much in the big scheme of things. What we get from polling, and for that matter from the data available from social media, is far more informative than that.