Can Social Media Data Save The Polls?


The first known opinion poll was a local straw poll in Pennsylvania in 1824, on the US presidential election. In 1916, The Literary Digest embarked on a national survey (partly as a circulation-raising exercise) and correctly predicted Woodrow Wilson's election as president. By mailing out millions of postcards and simply counting the returns, The Literary Digest went on to correctly predict the victories of Warren Harding in 1920, Calvin Coolidge in 1924, Herbert Hoover in 1928 and Franklin Roosevelt in 1932. Of course, polling has come a long way since those early days, but one inherent feature remains: human surveys, usually conducted over the telephone.

However, it's now becoming a familiar refrain: the polls got it wrong (again)! Recent examples include the 2015 UK general election, Brexit and, the most high-profile of them all, the 2016 US presidential election. Sites that used the most advanced aggregation and analytical modelling techniques available had a Clinton victory at silly odds: the New York Times put her chances of winning at 84% and the Princeton Election Consortium had her at 95-99%! Even Nate Silver, who has a pretty impressive track record, gave Clinton a 71% chance of winning on the eve of the election.

So what's going wrong? There are a number of trends driving the unreliability of election and other polling. The first is mobiles. Prior to mobiles, the ubiquity of landline telephones made finding reasonably random and representative samples easy, as pollsters could just pick random names out of phone books, call potential voters and talk them through interviews. This provided the kind of rich context and human understanding necessary for properly analysing their responses. That method also ensured reasonably high response rates and helped control nonresponse bias. But the rise of mobiles, and the demographic differences in their adoption, mean that random samples of landlines have become increasingly inadequate and unrepresentative of the population. The problem with moving to mobiles, or even attempting a hybrid approach, is that mobile numbers are not usually publicly listed, making it harder to find representative samples. Various online survey methods have been used to supplement or supplant more expensive and less expansive phone methods, but they often also suffer from bias and are generally considered of lower quality than other polls.

The second factor is the decline in the number of people willing to answer surveys. Telephone surveys in the US in the late 1970s achieved an 80% response rate. Enter voicemail, mobiles, the decline of landlines and people generally not answering, and by 1997 response rates were down to 36%, crashing to 8% by 2014.

You don't need a PhD in maths or statistics to know that sample size is important for accuracy. Put simply: poll more people and your errors go down. If you're struggling to get the numbers, errors will increase. Getting more respondents takes more time and hence costs more. The two trends above have made high-quality research much more expensive to do, so there is less of it. To top it off, a perennial election polling problem, how to identify "likely voters", has become even thornier. Consequently, election polling is in near crisis. If only there were an accessible place where people freely and willingly share their true feelings...
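To put rough numbers on "poll more people and your errors go down", here is a minimal sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96), which is exactly the ideal that falling response rates make hard to reach; the function name and the sample sizes are illustrative only.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Illustrative sample sizes, not figures from any real poll.
for n in (250, 1000, 4000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions the margin is about 6.2 points for 250 respondents, 3.1 for 1,000 and 1.5 for 4,000: you need four times as many respondents to halve the error, which is why shrinking response rates translate so directly into cost.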

Without a doubt, social media has brought about a revolution in communication. Since 2004, its growth has been near exponential. It's no longer the preserve of teenagers and millennials but firmly fixed in everyone's daily lives. So much so that headlines suggest social media won the US election, from Trump's extensive use of Twitter to push his campaign messages to the influence of 'fake' news and echo chambers. Like never before, people are freely broadcasting their views and sharing the opinions of others they support.

As with all data, though, it's what you do with it that counts. Simply measuring volume won't tell you much if you're hoping to predict the outcome of an election - or anything else for that matter. You need to combine volume with other elements, like emotion and the strength of that emotion, to get a clear picture of overall opinion. The ability to analyse this type of data across social channels in real time allows campaigners, broadcasters and others to sit above the echo chamber and get a more realistic view of what is going on.
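As a rough illustration of why volume alone can mislead, the sketch below compares a plain count of posts with a score that weights each post by its emotion and the strength of that emotion. Everything here is hypothetical: the `Post` fields, the scoring rule and the sample posts are assumptions made for the example, not a description of any real analysis pipeline, and in practice the polarity and strength values would have to come from some sentiment model.

```python
from dataclasses import dataclass


@dataclass
class Post:
    side: str        # which option the post is about, e.g. "leave" or "remain"
    polarity: float  # -1.0 (hostile) to +1.0 (supportive); assumed model output
    strength: float  # 0.0 (mild) to 1.0 (intense); assumed model output


def volume_share(posts):
    """Share of posts mentioning each side, ignoring what they actually say."""
    counts = {}
    for post in posts:
        counts[post.side] = counts.get(post.side, 0) + 1
    total = sum(counts.values())
    return {side: n / total for side, n in counts.items()}


def weighted_support(posts):
    """Sum of polarity * strength per side: a toy emotion-weighted score."""
    scores = {}
    for post in posts:
        scores[post.side] = scores.get(post.side, 0.0) + post.polarity * post.strength
    return scores


# Made-up posts: "remain" is mentioned most often, but mostly in hostile terms.
example = [
    Post("remain", -0.8, 0.9),
    Post("remain", -0.6, 0.5),
    Post("remain", 0.5, 0.4),
    Post("leave", 0.9, 0.8),
]

print(volume_share(example))
print(weighted_support(example))
```

In this made-up sample, "remain" accounts for three quarters of the volume yet ends up with a negative weighted score, while "leave" scores positively on a single post. That is the gap between counting mentions and reading opinion.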

For example, in 2014 we worked with LBC on the two live debates between Nigel Farage and Nick Clegg. Over the course of both debates, there were 100,000 relevant tweets available for analysis. Contrary to popular opinion in the green room, the analysis clearly showed, within seconds of each debate finishing, that Farage had stormed to victory in both. This was confirmed by the official polls hours later.

When it comes to political decisions, there are few that were talked about more on social media than the EU Referendum. In the six weeks leading up to the vote, there were 54 million tweets about Brexit. These tweets were collated in the Press Association's live referendum data hub. Contrary to what the polls, and the bookies, were telling us, data from social media showed 'Leave' was ahead throughout the campaign. Incredibly, at 1:00am on 23 June, the hub was predicting an outcome of Leave at 57.7% and Remain at 42.3%. At 2:00am the positions were Leave at 51.3% and Remain at 48.7%.

The advantage of analysing data from social media is that you can capture everything that's relevant and get a true reflection of opinion. Social data doesn't have a problem with sample sizes and response rates. People are freely and openly sharing their views - all you need to do is harvest them. Critics will complain that social isn't a representative sample of the population. We'd simply say: do you know anyone who doesn't use social?

In many respects, analysing data from social media is doing what The Literary Digest did back in the early 20th century. Only this time, we aren't mailing postcards to people. People are writing and mailing their own postcards in vast numbers, on a daily basis.