An analysis of the user occupational class through Twitter content

In our ACL '15 paper, "An analysis of the user occupational class through Twitter content," co-authored with Daniel Preoţiuc-Pietro and Nikolaos Aletras, we study how the content users post on social media can be used to infer their occupational class. We base our analysis on the Standard Occupational Classification (SOC) from the Office for National Statistics (ONS) in the UK, which groups occupations into 9 broad categories.

The methods we investigate exploit both the user's textual content and platform-oriented characteristics (interaction, impact, usage). The best-performing approach derives word clusters by applying spectral clustering to neural word embeddings and uses them as features for a Gaussian Process classifier. It achieves 52.7% accuracy in predicting a user's occupational class, a solid result for a 9-way classification task, where uniform random guessing would be right only about 11% of the time.
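The idea of turning word embeddings into clusters and then into user features can be illustrated with a small sketch. The code below assumes cosine similarity (clipped at zero) as the affinity between word embeddings and uses normalized spectral clustering in the style of Ng, Jordan and Weiss, followed by k-means on the row-normalized eigenvectors; each user is then represented by the fraction of their tokens that fall into each cluster. The function names, the affinity choice and the feature definition are illustrative assumptions rather than the paper's exact configuration, and NumPy is assumed to be available.

```python
import numpy as np


def spectral_clusters(emb, k, n_iter=50):
    """Cluster rows of `emb` (n_words x dim) into k groups (hypothetical helper)."""
    # Cosine-similarity affinity, clipped at zero (an assumed choice).
    norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    x = emb / norms
    a = np.clip(x @ x.T, 0.0, None)
    np.fill_diagonal(a, 0.0)

    # Symmetrically normalized affinity D^-1/2 A D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(a.sum(axis=1), 1e-12))
    m = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; keep the top-k eigenvectors.
    _, vecs = np.linalg.eigh(m)
    u = vecs[:, -k:]
    u = u / np.maximum(np.linalg.norm(u, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization for k-means.
    centers = [u[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((u - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(u[np.argmax(dist)])
    centers = np.array(centers)

    # Standard k-means iterations on the spectral embedding.
    for _ in range(n_iter):
        dists = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            u[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    dists = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def user_cluster_features(tokens, vocab_index, labels, k):
    """Fraction of a user's in-vocabulary tokens falling in each cluster (assumed feature)."""
    counts = np.zeros(k)
    for t in tokens:
        j = vocab_index.get(t)
        if j is not None:  # out-of-vocabulary tokens are ignored
            counts[labels[j]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts
```

In a full pipeline, the per-user vectors produced this way would be passed to a classifier; a Gaussian Process classifier such as scikit-learn's `GaussianProcessClassifier` is one readily available option, though the kernel and cluster count used in the paper are not reproduced here.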

Our qualitative analysis confirms the general hypothesis that occupational classes can be separated by their language use. This separation can stem from differences in topical focus (e.g. artists talk about art), but also from more general behaviours: users in lower-ranked occupational classes tend to use more elongated words (such as "sooo"), whereas users in higher-ranked occupations tend to discuss Politics or Education more often.
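A user-level elongation feature of the kind mentioned above could be computed roughly as follows. This is a hypothetical sketch: it treats a token as elongated when some letter repeats three or more times in a row, a threshold chosen here for illustration rather than taken from the paper.

```python
import re

# A letter (digits and underscores excluded) followed by at least two more
# copies of itself, e.g. "sooo" or "yesss". The threshold of 3 is an assumption.
_ELONGATED = re.compile(r"([^\W\d_])\1{2,}")


def elongated_word_rate(tokens):
    """Fraction of tokens containing a run of three or more identical letters."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if _ELONGATED.search(t))
    return hits / len(tokens)
```

For instance, `elongated_word_rate(["sooo", "good", "yesss", "ok"])` returns 0.5, since only "sooo" and "yesss" contain a run of three identical letters; "good" has only two.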

We are also releasing the data used in the paper (README).

D. Preoţiuc-Pietro, V. Lampos and N. Aletras. An analysis of the user occupational class through Twitter content. ACL '15, pp. 1754-1764, 2015.