I am currently a Visiting Researcher at Google (UK) and a Research Associate at the Computer Science Department of UCL. My primary research focus is the analysis of human-generated content published on the Web, mainly on Social Media platforms. I am also interested in interdisciplinary research tasks that bring together Computer Science, Healthcare, Statistics and Social Sciences.

News snippets

Tags: Artificial Intelligence; Machine Learning; Natural Language Processing; Social Media; Twitter; Long Data

A bilinear model of the group ℓ1/ℓ2 regulariser for multi-task learning and multi-output text regression from Social Media

Modelling voting intention from Twitter content - Bilinear text regression (ACL '13)

Instead of just modelling word frequencies (or, in general, word characteristics) by learning a weight vector w, we also learn a weight for each user (uT). Thus, from a linear regression model, we now go bilinear. This idea was applied to predicting voting intention polls from tweets in two countries (Austria and the UK), but it is also applicable to various other NLP tasks, such as the extraction of socioeconomic patterns from the news. You may download a beta version of Bilinear Elastic Net (BEN) for MATLAB; relevant slides are available.
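To make the step from linear to bilinear concrete, here is a minimal sketch in Python/NumPy. It is not the BEN code: it assumes each time instance i is described by a users-by-words matrix X[i], models the target as u^T X[i] w + b, and swaps the elastic net penalty for a plain ridge penalty so that each alternating step has a closed form. The function names, the data layout and the default hyperparameters are all illustrative assumptions.

```python
import numpy as np


def ridge(A, y, lam):
    """Closed-form ridge regression: argmin_w ||A w - y||^2 + lam * ||w||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)


def fit_bilinear(X, y, lam_u=1e-3, lam_w=1e-3, n_iter=50):
    """Fit y_i ~ u^T X_i w + b by alternating between word and user weights.

    X has shape (n, p, m): n instances, p users, m word features (assumed layout).
    Ridge penalties stand in for the elastic net used by BEN (a simplification).
    """
    n, p, m = X.shape
    u = np.ones(p) / p          # start by weighting all users equally
    w = np.zeros(m)
    b = float(y.mean())
    for _ in range(n_iter):
        # Fix u: each instance collapses to an m-dimensional word vector.
        Z = np.einsum("ipm,p->im", X, u)
        w = ridge(Z, y - b, lam_w)
        # Fix w: each instance collapses to a p-dimensional user vector.
        V = np.einsum("ipm,m->ip", X, w)
        u = ridge(V, y - b, lam_u)
        # Refit the bias on the current residuals.
        b = float(np.mean(y - V @ u))
    return u, w, b


def predict_bilinear(X, u, w, b):
    """Bilinear prediction u^T X_i w + b for every instance i."""
    return np.einsum("p,ipm,m->i", u, X, w) + b
```

With one of the two weight vectors held fixed, the problem reduces to an ordinary regularised linear regression in the other, which is why the loop simply alternates between two ridge fits.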

Modelling user impact on Social Media with Gaussian Processes (EACL '14)

What are the most important factors for determining user impact on Social Media platforms? Can we identify user actions that have a significant effect on their impact? In this work, we propose a set of nonlinear models based on Gaussian Processes for inferring user impact on Twitter. Our modelling is based on actions under the direct control of a user, including textual features such as word or topic (word-cluster) frequencies. Given the strong inference performance, we then dig further into our models and qualitatively analyse their properties from a variety of angles in an effort to discover the specific user behaviours that are decisive impact-wise. A brief summary of this work is given in this blog post.
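As a rough illustration of what nonlinear regression with a Gaussian Process involves, the sketch below computes the standard GP posterior mean and variance in Python/NumPy. It assumes a zero-mean prior and a squared-exponential kernel with fixed, hand-picked hyperparameters; the kernel choice, the function names and the parameter values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(X_train, X_train, **kernel_args)
    K += noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test, **kernel_args)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(X_test, X_test, **kernel_args))
    var = prior_var - np.sum(v**2, axis=0)
    return mean, var
```

In a setting like the one described above, each row of X_train would hold the features of one user (for example word or topic frequencies) and y_train the corresponding impact score; in practice the kernel hyperparameters would be learned from the data rather than fixed.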

Differences of z-scores between Joy and Sadness in books from 1900 to 2000

Affective patterns in books

What happens if we quantify affective expression in millions of books? We can probably identify periods with dominant emotions, extract temporal emotion patterns through the century and come up with interesting scenarios that may explain them (PLOS ONE, 2013). Additionally, we could explain those patterns by comparing them with real-world indicators, such as indices of the economy, the main driving factor of the system we are living in (PLOS ONE, 2014).

An effort to assess the statistical robustness of the above findings, together with comparative figures across different emotion detection tools, is presented in this paper (Big Data '13).
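One simple way to compare two emotions over time is to standardise each yearly series and take the difference. The Python/NumPy sketch below assumes that each emotion is represented by one frequency value per year and that z-scores are computed over the whole period; both the input format and the function names are assumptions for illustration, not a description of the exact procedure in the papers.

```python
import numpy as np


def zscore(series):
    """Standardise a yearly series to zero mean and unit (sample) variance."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std(ddof=1)


def joy_minus_sadness(joy_freq, sadness_freq):
    """Per-year difference between the z-scores of two emotion series."""
    return zscore(joy_freq) - zscore(sadness_freq)
```

Positive values then mark years in which joy is expressed more than usual relative to sadness, and negative values mark the opposite.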

Press releases: University of Bristol (1), University of Bristol (2), University of Sheffield
Media coverage: Nature, The Guardian, New York Times (1), New York Times (2), Slate, BBC Radio 4, Die Welt

Word Cloud with automatically extracted n-grams used to track rainfall rates from Twitter content. Font size is proportional to an n-gram's importance weight and flipped words take negative weights.

Nowcasting events from the Social Web (ACM TIST, 2012)

Can we apply Machine Learning methods to text generated by Social Media (e.g. Twitter) users in order to quantify the magnitude of events, such as an infectious disease (e.g. flu) or even rainfall?

Press release: University of Bristol 
Media coverage: ScienceDaily, Natural Hazards Observer, BBC Radio 4
Distinctions: EPSRC research highlight (2011), Most notable computing publications (2012)

Flu Detector - Tracking Epidemics on Twitter
Tracking a flu epidemic by monitoring Twitter (CIP '10 & ECML/PKDD '10)

This is the first work showing that Social Media can be used to track the level of an infectious disease, such as influenza-like illness (ILI), in the population. To achieve that, we collected geolocated tweets from 54 UK cities and used them in a regularised regression model, which was trained and evaluated against ILI rates from the Health Protection Agency. Flu Detector is a demonstration that used (now stopped!) the content of Twitter for nowcasting the level of flu-like illness in several UK regions on a daily basis. We have recently come up with an improved visualisation of predicted flu rates from Twitter data, but it is still in its alpha version.

Press release: Computer Science Department, University of Bristol
Media coverage: MIT Technology Review, New Scientist

Mood of the Nation infers Mood and Affect scores for several regions in the UK based on Twitter content

Extracting collective mood patterns from Twitter on a daily (WWW '12) or an hourly (arXiv, 2013) basis

Mood of the Nation used (now stopped!) more than half a million geolocated tweets on a daily basis to detect mood and affect trends in the UK population, focusing on four categories of affect: joy, sadness, anger and fear. A simple assessment of those patterns reveals quite interesting results. Check this out for example!

Press release: University of Bristol 
Media coverage: Mashable, New Scientist, Dradio, BBC World News