Here's a snapshot taken from our paper "Predicting and Characterising User Impact on Twitter" that will be presented at EACL '14.

In this work, we tried to predict the impact of Twitter accounts based on actions under their direct control (e.g. posting quantity and frequency, numbers of @-mentions or @-replies and so on), as well as textual features such as word or topic (word-cluster) frequencies. Given the decent inference performance, we then dug further into our models and qualitatively analysed their properties from a variety of angles, in an effort to discover the specific user behaviours that are most decisive for impact.
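To make the feature space more concrete, here is a minimal sketch of how such controllable behaviour features could be counted from an account's tweets. The function and field names are illustrative assumptions of mine; the paper's actual feature set is considerably larger and its extraction more careful.

```python
def account_features(tweets):
    """Count a few controllable behaviour features from a list of tweet texts.
    Feature names are illustrative; the paper uses a much richer set."""
    mentions = [w for t in tweets for w in t.split() if w.startswith("@")]
    return {
        "n_tweets": len(tweets),                       # total posting quantity
        "n_replies": sum(1 for t in tweets if t.startswith("@")),
        "n_links": sum(t.count("http") for t in tweets),
        "n_unique_mentions": len(set(mentions)),       # breadth of interaction
    }

tweets = ["@alice thanks!", "new paper out http://t.co/x", "@bob @alice see this"]
print(account_features(tweets))
# -> {'n_tweets': 3, 'n_replies': 2, 'n_links': 1, 'n_unique_mentions': 2}
```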

In this figure, for example, we have plotted the impact score distribution for accounts with low (L) and high (H) participation in certain behaviours, compared to the average impact score across all users in our sample (red line). The chosen user behaviours, i.e. the total number of tweets, the numbers of @-replies, links and unique @-mentions, and the number of days with nonzero tweets, were among the most relevant for predicting an account's impact score.

In a nutshell, based on our experimental process, we concluded that activity, interactivity (especially when interacting with a broader set of accounts rather than a specific clique) and engagement on a diverse set of topics (with the most prevalent themes of discussion being politics and showbiz) play a significant role in determining the impact of an account. For a more detailed analysis, however, you'll need to read the actual paper.

How to read the figure: The second subfigure from the top indicates that users with very few unique @-mentions (those who mention a very limited set of other accounts) have an impact score approximately equal to the average (red line), whereas users with a high number of unique @-mentions tend to be distributed across impact scores distinctly higher than the average.

Fig. 1(a): Pearson's r between Literary Misery (LM) and Economic Misery (EM) for various smoothing windows, using either a lagged version of EM or a moving average of EM over past years.

Fig. 1(b): Literary Misery (LM) versus a moving average of the Economic Misery (EM) using the past 11 years (t=11).

In previous works, we have investigated emotion signals in English books (PLOS ONE, 2013) as well as the robustness of such signals under various metrics and statistical tests (Big Data '13).

Extending our previous research, we now show how emotions in books can correlate with systemic factors, such as the state of an economy (PLOS ONE, 2014). In our main experiment, we use a composite economic index that represents unemployment and inflation through the years, termed Economic Misery (EM), and correlate it against a Literary Misery index (LM), which represents the composite emotion of Sadness minus Joy in books. We observe the best correlations when EM is averaged over the past decade (see Fig. 1(a) & 1(b)); correlations increase for the period from 1929 (the Great Depression) onwards. Interestingly, we get very similar results for books written in American English, British English and German when compared to their local EM indices (i.e. for the US, UK and Germany respectively). For more methodological details, a fuller presentation of the results and an interesting discussion, in which we argue that causation may lie behind this correlation, I have to point you to the actual paper.
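The core computation behind Fig. 1(a) can be sketched in a few lines: smooth EM with a moving average over the past t years, then take Pearson's r against LM. The series below are invented toy numbers, not the indices from the paper; they merely illustrate that smoothing a noisy index can strengthen the correlation.

```python
import statistics

def moving_average(series, t):
    # mean of the past t values, inclusive of the current year
    return [sum(series[max(0, i - t + 1): i + 1]) / min(i + 1, t)
            for i in range(len(series))]

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# toy yearly indices (NOT the paper's data): LM loosely tracks smoothed EM
em = [2.0, 3.0, 5.0, 4.0, 6.0, 7.0, 5.0, 6.0]
lm = [2.2, 2.6, 3.4, 3.9, 4.6, 5.6, 5.9, 6.1]
print(round(pearson_r(lm, em), 3))                      # raw correlation
print(round(pearson_r(lm, moving_average(em, 3)), 3))   # higher after smoothing
```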

Press Release: University of Bristol
Media Coverage: The Guardian, New York Times, Independent, The Conversation

Bentley A.R., Acerbi A., Ormerod P. and Lampos V. Books average previous decade of economic misery. PLOS ONE, 2014.

Many past research efforts have tried to exploit human-generated content posted on Social Media platforms to predict the result of an election [1,2] or of various sociopolitical polls, including those targeting voting intentions [3,4]. Most papers on the topic, however, received waves of criticism, as their methods failed to generalise when applied to different case studies. For example, one paper [5] showed that prior approaches [1,3] did not predict the result of the US congressional elections in 2009.

One may also spot various other shortcomings in these approaches. Firstly, the modelling procedure is often biased towards specific sentiment analysis tools [6,7]: the rich textual information is compressed into just a few features expressing different sentiment or affective types. Apart from their ambiguity and overall moderate performance, those tools are also language-dependent to a significant extent, and in most cases machine-translating them creates problematic outputs. Furthermore, Social Media content filtering is performed using handcrafted lists of task-related terms (e.g., names of politicians or political parties), despite the obvious fact that the keywords of interest will change as new political players or events come into play. Most importantly, methods mainly focus on the textual content alone, without making any particular effort to model individual users or to jointly account for both words and user impact. Finally, predictions are one-dimensional, meaning that a single variable (a party or a specific opinion poll variable) is modelled each time. However, in a scenario where political entities are competing with each other, it would make sense to use a multi-task learning approach that incorporates all political players in one shared model.

As part of our latest research, we propose a method for text regression tasks capable of modelling both word frequencies and user impact. Thus, we are now in a position to filter (select) and weigh not only task-related words, but also task-related users. The general setting is supervised learning performed via regularised regression that, in turn, favours sparsity: from tens of thousands of words and users in a Social Media data set, we select compact subsets that ideally are related to the task at hand.
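To illustrate how sparsity-inducing regularisation selects a compact feature subset, here is a minimal lasso (ℓ1-regularised least squares) solver via cyclic coordinate descent on toy data. This is a generic sketch of the sparsity mechanism only, not the paper's actual model, which jointly handles word and user features.

```python
def lasso_cd(X, y, lam, n_iter=100):
    """Minimise 0.5 * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # partial residual: y minus the fit of all features except j
            r = [y[i] - sum(w[k] * X[i][k] for k in range(d) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # soft-thresholding: weakly correlated features are set exactly to 0
            if rho > lam:
                w[j] = (rho - lam) / z
            elif rho < -lam:
                w[j] = (rho + lam) / z
            else:
                w[j] = 0.0
    return w

# toy design: the target depends only on feature 0; feature 1 is irrelevant
X = [[1, 0], [2, 1], [3, 0], [4, 1]]
y = [2, 4, 6, 8]  # y = 2 * x0 exactly
w = lasso_cd(X, y, lam=0.1)
print(w)  # feature 1's weight is exactly 0.0; feature 0's is close to 2
```

The same mechanism scales to tens of thousands of columns: irrelevant words (or users) receive a weight of exactly zero and drop out of the model.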

This drawing (HQ version) shows a result from our research published today in PLOS ONE. It is a simple plot depicting the difference between the emotions of Joy and Sadness in approximately 5 million books published in the 20th century (1900-2000). You may notice that the peak for Sadness, and equivalently the minimum level of Joy, occurred during the World War II period.
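For the curious, the essence of such a plot is: standardise the yearly frequencies of emotion-labelled words and take the difference of the two z-scored series. The frequencies below are invented toy numbers, not the book data, and the normalisation details in the paper are more involved than this sketch.

```python
import statistics

def zscores(xs):
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

# toy yearly relative frequencies of Joy and Sadness words (invented numbers)
joy = [0.012, 0.011, 0.010, 0.009, 0.011]
sadness = [0.008, 0.009, 0.011, 0.012, 0.009]
joy_minus_sadness = [j - s for j, s in zip(zscores(joy), zscores(sadness))]

# the gloomiest year is the one with the lowest Joy - Sadness score
print(joy_minus_sadness.index(min(joy_minus_sadness)))  # -> 3
```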

However, for people who despise simplicity, we have also included some more elaborate (and perhaps more interesting) results in our paper. I'm not going to repeat them here, of course; that's what the publication is for!

If you feel that reading an academic paper is a painful task (even for this easy-to-read, made-for-humans paper), then you might find the following press releases useful: PLoS ONE, University of Bristol or University of Sheffield, Nature.

Acerbi A., Lampos V., Garnett P., Bentley R.A. (2013) The Expression of Emotions in 20th Century Books. PLOS ONE 8(3): e59030. doi:10.1371/journal.pone.0059030