Tuesday, June 22, 2010

Blogs and tweets could predict the future

by Jim Giles

In the time it takes you to read this sentence, more than a thousand tweets will have been sent and dozens of blog posts published. Much of their content will be ephemeral fluff: personal gripes and tittle-tattle of interest to no one but the parties concerned. Yet it is possible to use that torrent of information to make predictions about social and economic trends that affect us all.
Interest in the idea of analysing web data to make predictions took off around a year ago, when researchers at Google used the frequency of certain search terms to forecast the sales of homes, cars and other products.
In their landmark study, Hal Varian, Google's chief economist, and his colleague Hyunyoung Choi showed how the volume of searches for certain products, such as types of car, rose and fell in line with monthly sales. Google keeps extensive records of what is being searched for, and that information is available almost instantaneously. That could make Varian and Choi's method a far quicker way of gauging purchasing behaviour than traditional sales forecasts, which are often made by looking back at purchasing patterns.
Other researchers have since analysed search terms to look at all manner of behaviours. In late 2009, economists at the Bank of Italy showed that the volume of searches for terms like "job search engine" is a good indicator of coming changes in the unemployment rate in the US. Researchers at the Ruhr University in Bochum, Germany, showed that tracking Google searches for consumer goods provided a better means of forecasting US retail sales than the traditional method of using surveys of consumer attitudes - the so-called Consumer Confidence Index.
Now other sources, such as blog posts and tweets, are being mined too, and the variety of subject matter they address might mean that phenomena other than purchasing patterns can be explored. "The possibilities are enormous," says Joseph Engelberg, a finance researcher at the University of North Carolina at Chapel Hill.
Tweets may prove useful to political pollsters, for example. Bryan Routledge and his colleagues at Carnegie Mellon University in Pittsburgh, Pennsylvania, ran a sentiment analysis on tweets about the candidates Barack Obama and John McCain posted in the run-up to the 2008 US presidential election. They used the results to try to assess voting intentions as the election neared.
The researchers found that this Twitter-based rating closely tracked more formal opinion polls. And while they were not able to improve on the accuracy of those polls, the work did show that Twitter could provide a cheaper, quicker alternative, says Routledge.
Blog posts can be used to predict stock market behaviour, according to Eric Gilbert and Karrie Karahalios at the University of Illinois at Urbana-Champaign, who presented their findings last month at the International Conference on Weblogs and Social Media in Washington DC.
They used over 20 million posts from the LiveJournal website to create an index of the US national mood, which they called the Anxiety Index. It is a measure of the frequency with which a range of words related to apprehension, such as "nervous", appear in the posts. Gilbert and Karahalios described how they have used the index to improve forecasts of the movement of the S&P 500, a stock market index of 500 large, publicly traded US companies.
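To make the idea of a word-frequency mood index concrete, here is a minimal Python sketch that follows the article's description: for each day, it counts what share of all words in that day's posts come from a list of apprehension words. The word list, the tokenising regex and the `anxiety_index` function are hypothetical illustrations, not the researchers' actual lexicon or method, which may well differ.

```python
import re
from collections import defaultdict

# Hypothetical lexicon of apprehension words (an assumption for illustration;
# the real Anxiety Index used its own range of words).
ANXIETY_WORDS = {"nervous", "anxious", "worried", "afraid", "fearful"}

# Simple lowercase word tokenizer (an assumption, not the researchers' one).
TOKEN_RE = re.compile(r"[a-z']+")


def anxiety_index(posts):
    """Return {day: share of tokens that are anxiety words} for (day, text) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, text in posts:
        tokens = TOKEN_RE.findall(text.lower())
        totals[day] += len(tokens)
        hits[day] += sum(1 for t in tokens if t in ANXIETY_WORDS)
    # Skip days with no words at all to avoid dividing by zero.
    return {day: hits[day] / totals[day] for day in totals if totals[day] > 0}


posts = [
    ("2008-10-01", "Feeling nervous about work"),
    ("2008-10-01", "Great lunch today"),
    ("2008-10-02", "So worried and anxious"),
]
idx = anxiety_index(posts)
```

Normalising by total word count, rather than using a raw count, keeps the index comparable between busy and quiet days.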
Movement of the S&P 500 can be predicted with some degree of accuracy using a model that extrapolates from the past three days' prices. Gilbert and Karahalios found that when the Anxiety Index rose sharply, the S&P 500 ended the day marginally lower than the three-day model predicted. This shows, the researchers say, that the index can be a useful bellwether of economic behaviour. "Blogs provide a sample of what is going on in society," says Gilbert.
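The comparison described above can be sketched in Python as follows, under loudly stated assumptions: the "three-day model" is taken to be a linear least-squares fit of today's close on the previous three closes plus a constant, and a "sharp rise" in anxiety is taken to be a day-on-day jump above a chosen threshold. The function names and the threshold are hypothetical, and this is not Gilbert and Karahalios' actual model. A negative average residual on spike days would correspond to the pattern they report, with prices ending lower than the model predicted.

```python
import numpy as np


def _lagged_design(p):
    """Rows for t >= 3: [1, p[t-1], p[t-2], p[t-3]]."""
    return np.column_stack([np.ones(len(p) - 3), p[2:-1], p[1:-2], p[:-3]])


def fit_three_day_model(prices):
    """Least-squares fit of p[t] = c + a1*p[t-1] + a2*p[t-2] + a3*p[t-3]."""
    p = np.asarray(prices, dtype=float)
    coef, *_ = np.linalg.lstsq(_lagged_design(p), p[3:], rcond=None)
    return coef


def residuals(prices, coef):
    """Actual minus predicted close for each day t >= 3."""
    p = np.asarray(prices, dtype=float)
    return p[3:] - _lagged_design(p) @ coef


def mean_residual_on_spikes(prices, anxiety, threshold):
    """Average in-sample model error on days when the anxiety index jumped by more than threshold."""
    coef = fit_three_day_model(prices)
    res = residuals(prices, coef)
    # np.diff(a)[t-1] is the change into day t; [2:] aligns it with days t >= 3.
    jump = np.diff(np.asarray(anxiety, dtype=float))[2:]
    mask = jump > threshold
    return res[mask].mean() if mask.any() else float("nan")
```

For simplicity the sketch fits and evaluates on the same data; a fairer test would fit on past days only and check the residual on days the model had not seen.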
Posts on Twitter may hold similar predictive power. Johan Bollen and his colleagues at Indiana University in Bloomington have created an anxiety rating based on an analysis of hundreds of millions of tweets by people in the US. Their paper has not yet been published, but Bollen says they too found that increases in anxiety on their scale correlated with lower-than-expected stock prices. "We're astounded," he says. "We didn't think it would be a predictive relationship."
That's because very few of the tweets were actually about stock trades. Instead it seems that the messages capture the "national mood", a collective feeling known to influence trading decisions.
Such knowledge of the national mood could be useful to stock traders. They will be less likely to take risks if they know consumers are pessimistic, for example, since consumer spending is a big part of economic growth.
Hedge funds, for which anything that offers an edge can be worth millions of dollars, are another group likely to seize on these kinds of predictive tools. Engelberg has been analysing search engine terms to predict market behaviour and was asked to present his results to the directors of a New York-based hedge fund earlier this month. "They were very familiar with the data," he says. "I got the sense they were [already] using it."
It is likely that the predictive power of these techniques will increase as researchers develop more sophisticated methods for gauging the emotional content of blogs and tweets. For example, it may be possible for Gilbert and Karahalios to fine-tune their Anxiety Index to look at a broader range of emotional cues.
Other researchers are sceptical about the reliability of blogs and tweets, however. Paul Tetlock at Columbia University in New York studies how stock markets are prone to being influenced in unexpected ways. In 2007, he showed that the sentiments expressed in a column in The Wall Street Journal can influence stock market behaviour.
The problem with using sentiment analysis from blogs and the like, Tetlock says, is that it is only indirectly linked to trading decisions. "A person talking about anxious feelings in a blog or tweet may or may not be more averse to taking trading risks," he says. "Moreover, the people on many of these sites are kids, whose general anxiety is probably only weakly correlated with their parents' investing behaviour."
Search terms, on the other hand, are a "particularly promising" means of predicting market behaviour, Tetlock says. They are a direct measure of what people are paying attention to, and are therefore likely to correlate with real-world behaviour.
Issue 2765 of New Scientist magazine