Can user-generated Internet content be used to assess the impact of a health intervention? In a new paper published at Data Mining and Knowledge Discovery, we propose a method for estimating the impact of a vaccination program for influenza based on social media content (Twitter) and search query data (Bing). The work has been done in collaboration with Public Health England and Microsoft Research, was funded by the interdisciplinary project i-sense and will be presented at the journal track of ECML PKDD 2015 in September.
Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas, where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
Vasileios Lampos, Elad Yom-Tov, Richard Pebody and Ingemar J. Cox. Assessing the Impact of a Health Intervention via User-generated Internet Content. Data Mining and Knowledge Discovery, 2015. doi: 10.1007/s10618-015-0427-9
Paper | Supplementary Material