Blogs

Predicting and characterising user impact on Twitter


How to read this figure: the second subfigure from the top indicates that users with very few unique @-mentions (those who mention a very limited set of other accounts) have an impact score approximately equal to the average (red line), whereas users with a high number of unique @-mentions tend to be distributed across impact scores distinctly higher than the average.

Here's a snapshot taken from our paper "Predicting and Characterising User Impact on Twitter" that will be presented at EACL '14.

In this work, we tried to predict the impact of Twitter accounts based on actions under their direct control (e.g. posting quantity and frequency, numbers of @-mentions or @-replies and so on), as well as textual features such as word or topic (word-cluster) frequencies. Given the decent inference performance, we then dug further into our models and qualitatively analysed their properties from a variety of angles, in an effort to discover the specific user behaviours that are decisive for impact.

In this figure, for example, we have plotted the impact score distribution for accounts with low (L) and high (H) participation in certain behaviours, compared to the average impact score across all users in our sample (red line). The chosen user behaviours, i.e. the total number of tweets, the numbers of @-replies, links and unique @-mentions, and the number of days with nonzero tweets, were among the most relevant for predicting an account's impact score.
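
As a toy illustration of this kind of low-versus-high comparison (with made-up accounts and a hypothetical quartile split, not the paper's actual methodology), the following Java sketch partitions users by a behavioural feature and compares each group's mean impact score against the global mean, i.e. the "red line":

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy comparison of impact scores for low vs. high participation in a behaviour. */
public class ImpactByBehaviour {

    /** Hypothetical user record: a behavioural feature value and an impact score. */
    record User(double featureValue, double impactScore) {}

    public static void main(String[] args) {
        // Dummy data; in practice these would be real accounts with, e.g.,
        // their number of unique @-mentions and their measured impact score.
        List<User> users = new ArrayList<>(List.of(
            new User(2, 40.1), new User(3, 41.7), new User(5, 39.8),
            new User(120, 55.2), new User(300, 61.0), new User(450, 58.3)));

        users.sort(Comparator.comparingDouble(User::featureValue));
        int q = Math.max(users.size() / 4, 1);   // quartile boundary

        double globalMean = mean(users);         // average impact across all users
        double lowMean  = mean(users.subList(0, q));
        double highMean = mean(users.subList(users.size() - q, users.size()));

        System.out.printf("global mean impact: %.2f%n", globalMean);
        System.out.printf("low-participation mean: %.2f%n", lowMean);
        System.out.printf("high-participation mean: %.2f%n", highMean);
    }

    static double mean(List<User> xs) {
        return xs.stream().mapToDouble(User::impactScore).average().orElse(0.0);
    }
}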

In a nutshell, based on our experimental process, we concluded that activity, interactivity (especially when interacting with a broader set of accounts rather than a specific clique) and engagement on a diverse set of topics (with the most prevalent themes of discussion being politics and showbiz) play a significant role in determining the impact of an account. For a more detailed analysis, however, you'll need to read the actual paper.

Emotions in books reflect economic misery


Fig. 1(a): Pearson's r between Literary Misery (LM) and Economic Misery (EM) for various smoothing windows, using either a lagged version of EM or a moving average of EM over past years.

Fig. 1(b): Literary Misery (LM) versus a moving average of Economic Misery (EM) over the past 11 years (t = 11).

In previous work, we investigated emotion signals in English books (PLOS ONE, 2013), as well as the robustness of such signals under various metrics and statistical tests (Big Data '13).

Extending our previous research, we now show how emotions in books correlate with systemic factors, such as the state of an economy (PLOS ONE, 2014). In our main experiment, we use a composite economic index that represents unemployment and inflation through the years, termed Economic Misery (EM), and correlate it with a Literary Misery (LM) index, which represents the composite emotion of Sadness minus Joy in books. We observe the best correlations when EM is averaged over the past decade (see Fig. 1(a) & 1(b)); correlations increase for the period from 1929 (the Great Depression) onwards. Interestingly, we get very similar results for books written in American English, British English and German when compared to their local EM indices (i.e. for the US, the UK and Germany respectively). For more methodological details, a better presentation of all the results and an interesting discussion, where we argue that causation may be the reason behind this correlation, I have to point you to the actual paper.
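
To make the smoothing step concrete, here is a rough Java sketch (with made-up yearly series and an arbitrary window length t, not the paper's data) that correlates LM against a trailing moving average of EM:

/** Toy computation: Pearson's r between LM and a trailing moving average of EM. */
public class MiseryCorrelation {

    public static void main(String[] args) {
        // Made-up yearly indices; the paper uses real LM and EM series.
        double[] lm = {0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.7, 0.5, 0.8, 0.9, 1.0, 0.9};
        double[] em = {1.0, 1.2, 0.9, 1.5, 1.4, 1.7, 1.9, 1.6, 2.0, 2.2, 2.4, 2.1};
        int t = 5; // smoothing window (the best results reported use about a decade)

        // EM averaged over the past t years, aligned with LM at each year.
        int n = lm.length - t + 1;
        double[] lmAligned = new double[n];
        double[] emSmoothed = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = i; j < i + t; j++) sum += em[j];
            emSmoothed[i] = sum / t;
            lmAligned[i] = lm[i + t - 1]; // LM at the end of each window
        }
        System.out.printf("Pearson's r = %.3f%n", pearson(lmAligned, emSmoothed));
    }

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}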

Press Release: University of Bristol
Media Coverage: The Guardian, New York Times, Independent, The Conversation

Reference
Bentley R.A., Acerbi A., Ormerod P. and Lampos V. Books average previous decade of economic misery. PLOS ONE, 2014.

Predicting voting intention from Twitter

Many past research efforts have tried to exploit human-generated content posted on Social Media platforms to predict the result of an election [1,2] or of various sociopolitical polls, including ones targeting voting intentions [3,4]. Most papers on the topic, however, received waves of criticism as their methods failed to generalise when applied to different case studies. For example, one paper [5] showed that prior approaches [1,3] did not predict the result of the US congressional elections in 2009.

One may also spot various other shortcomings in these approaches. Firstly, the modelling procedure is often biased towards specific sentiment analysis tools [6,7], so the rich textual information is compressed into a small set of features expressing different sentiment or affective types. Apart from their ambiguity and overall moderate performance, those tools are also language-dependent to a significant extent, and in most cases machine-translating them creates problematic outputs. Furthermore, Social Media content filtering is performed using handcrafted lists of task-related terms (e.g., names of politicians or political parties), despite the obvious fact that the keywords of interest will change as new political players or events come into play. Most importantly, methods mainly focus on the textual content alone, without making any particular effort to model individual users or to jointly account for both words and user impact. Finally, predictions are one-dimensional, meaning that only one variable (a party or a specific opinion poll variable) is modelled at a time. However, in a scenario where political entities compete with each other, it would make sense to use a multi-task learning approach that incorporates all political players in one shared model.

As part of our latest research, we propose a method for text regression tasks capable of modelling both word frequencies and user impact. Thus, we are now in a position to filter (select) and weight not only task-related words, but also task-related users. The general setting is supervised learning performed via regularised regression that, in turn, favours sparsity: from tens of thousands of words and users in a Social Media data set, we select compact subsets that will ideally be related to the task at hand.
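
To give a flavour of what jointly weighting words and users can look like (a simplified sketch with made-up numbers; the paper's exact formulation, training procedure and regularisation are more involved), consider a bilinear prediction of the form u' X w, where X holds word frequencies per user, u holds user weights and w holds word weights, with most weights driven to zero by a sparsity-inducing regulariser:

/**
 * Minimal sketch of a bilinear text regression score: given a user-by-word
 * frequency matrix X, a sparse user weight vector u and a sparse word weight
 * vector w, the prediction is u' X w plus a bias term. Hypothetical example
 * for illustration only.
 */
public class BilinearScore {

    static double predict(double[][] x, double[] u, double[] w, double bias) {
        double score = bias;
        for (int i = 0; i < u.length; i++) {
            if (u[i] == 0.0) continue;        // sparsity: most users are ignored
            double rowDot = 0.0;
            for (int j = 0; j < w.length; j++) {
                if (w[j] != 0.0) rowDot += x[i][j] * w[j];
            }
            score += u[i] * rowDot;
        }
        return score;
    }

    public static void main(String[] args) {
        // 3 users x 4 words of (made-up) normalised term frequencies.
        double[][] x = {
            {0.2, 0.0, 0.1, 0.0},
            {0.0, 0.5, 0.0, 0.3},
            {0.4, 0.1, 0.0, 0.0}
        };
        double[] u = {0.8, 0.0, 1.2};       // sparse user weights (user 2 filtered out)
        double[] w = {1.5, 0.0, -0.7, 0.0}; // sparse word weights
        System.out.printf("predicted value: %.3f%n", predict(x, u, w, 0.1));
    }
}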

Emotions in English books

This drawing (HQ version) shows a result from our research published today in PLOS ONE. It is a simple plot depicting the difference between the emotions of Joy and Sadness in approx. 5 million books published in the 20th century (1900-2000). You may have noticed that the peak for Sadness, and equivalently the minimum level of Joy, occurred during the World War II period.
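
As a trivial illustration of the plotted quantity (made-up numbers, not the paper's data), the following sketch computes a yearly Joy minus Sadness difference and locates its minimum:

/** Toy sketch: find the year with the lowest Joy minus Sadness score. */
public class JoyMinusSadness {
    public static void main(String[] args) {
        int startYear = 1938;
        // Made-up yearly emotion scores; the paper derives such scores from
        // word frequencies in roughly 5 million digitised books.
        double[] joy     = {0.52, 0.50, 0.46, 0.44, 0.43, 0.45, 0.48};
        double[] sadness = {0.40, 0.43, 0.47, 0.50, 0.52, 0.49, 0.44};

        int minIndex = 0;
        double minDiff = Double.POSITIVE_INFINITY;
        for (int i = 0; i < joy.length; i++) {
            double diff = joy[i] - sadness[i];
            if (diff < minDiff) { minDiff = diff; minIndex = i; }
        }
        System.out.printf("lowest Joy - Sadness: %.2f in %d%n",
                          minDiff, startYear + minIndex);
    }
}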

However, for people who despise simplicity, we have also included some more elaborate (and perhaps more interesting) results in our paper. I'm not going to repeat them here, of course, as it makes no sense — that's the purpose of the publication!

If you feel that reading an academic paper is a painful task (even for this easy-to-read, made-for-humans paper), then you might find the following press releases useful: PLOS ONE, University of Bristol, University of Sheffield, Nature.

Reference
Acerbi A., Lampos V., Garnett P., Bentley R.A. (2013) The Expression of Emotions in 20th Century Books. PLOS ONE 8(3): e59030. doi:10.1371/journal.pone.0059030

Flu Detector in the news

Our latest work, where influenza-like illness rates are predicted from the content of Twitter, has been featured in mainstream media covering technology and science.

BBC Radio 4, Costing the Earth (March 21, 2012)
by Martin Poyntz-Roberts

Examiner, Brits study forecasting flu outbreaks using Twitter (December 27, 2011)
by Linda Chalmer Zemel

New Scientist, Engines of the future: The cyber crystal ball (November 17, 2010)
by Phil McKenna

MIT Technology Review, How Twitter Could Better Predict Disease Outbreaks (July 14, 2010)
by Christopher Mims

Press releases
University of Bristol News, Could social media be used to detect disease outbreaks? (November 1, 2011)
University of Bristol, Computer Science Department News, Predicting Flu from the content of Twitter (July 1, 2010)

References
Lampos and Cristianini. Nowcasting Events from the Social Web with Statistical Learning. ACM TIST, 2012. [ pdf ]
Lampos, De Bie and Cristianini. Flu Detector - Tracking Epidemics on Twitter. ECML/PKDD '10. [ pdf ]
Lampos and Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP'10. [ pdf ]

LaTeX beamer how-to

Why not produce a presentation using a WYSIWYG editor (like MS PowerPoint)? For the same reasons you won't write your publications or books in MS Word. Coding your presentation in LaTeX with the help of the beamer class makes the task easy, especially if you already have some experience with TeX scripting.

Beamer is a LaTeX class which provides many easy-to-deploy commands, tricks, templates and libraries for producing presentations. I was searching the web for an easy-to-follow application of it, but the process wasn't very straightforward; so, when I finally compiled a LaTeX presentation, I thought it might be useful to make the source code publicly available. The attached documents are therefore best suited to beamer beginners (they are, after all, the result of my first attempt at a LaTeX presentation). A minimal skeleton is sketched below.
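
For a quick taste, here is a minimal beamer skeleton (an indicative example, not the attached source) that compiles into a short two-slide presentation:

\documentclass{beamer}
\usetheme{Madrid}        % any standard beamer theme works here

\title{A Minimal Beamer Example}
\author{Your Name}
\date{\today}

\begin{document}

% Title slide
\begin{frame}
  \titlepage
\end{frame}

% A slide with a bulleted list revealed step by step (overlays)
\begin{frame}{Why beamer?}
  \begin{itemize}
    \item<1-> Same LaTeX source conventions as your papers
    \item<2-> Overlays for incremental reveals
    \item<3-> Ready-made themes and navigation
  \end{itemize}
\end{frame}

\end{document}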

How to sort a HashMap and maintain duplicate entries

HashMaps are not meant to be sorted. Still, here is how you could sort a HashMap by its values while keeping entries that share duplicate values.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public LinkedHashMap<String, Double> sortHashMapByValuesD(
        HashMap<String, Double> passedMap) {
    // Sort the entries (not just the values), so keys that share the same
    // value are all kept; ties between equal values are broken by key.
    List<Map.Entry<String, Double>> entries =
            new ArrayList<>(passedMap.entrySet());
    Comparator<Map.Entry<String, Double>> byValueThenKey =
            Map.Entry.<String, Double>comparingByValue()
                     .thenComparing(Map.Entry.comparingByKey());
    entries.sort(byValueThenKey);

    // A LinkedHashMap preserves insertion order, so inserting the sorted
    // entries yields a map that iterates in ascending value order. The
    // input map is not modified.
    LinkedHashMap<String, Double> sortedMap = new LinkedHashMap<>();
    for (Map.Entry<String, Double> entry : entries) {
        sortedMap.put(entry.getKey(), entry.getValue());
    }
    return sortedMap;
}
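
A quick usage example (map contents are made up):

HashMap<String, Double> scores = new HashMap<>();
scores.put("bob", 1.0);
scores.put("alice", 2.5);
scores.put("carol", 2.5);   // duplicate value; both keys survive

System.out.println(sortHashMapByValuesD(scores));
// prints: {bob=1.0, alice=2.5, carol=2.5}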