Social Data Science Lab finds link between social media and crime patterns

25 August 2015

Social and computer scientists at the Social Data Science Lab at Cardiff University have analysed tweets to help predict crime patterns in London.

In 2013 the Lab was awarded an Economic and Social Research Council grant to examine whether ‘big data’ (data sets so large or complex that traditional data processing applications are inadequate) can predict crime. It took 12 months to collect 180 million geocoded tweets and close to 600,000 Metropolitan Police recorded crime incidents, and a further nine months to transform the data in order to build the predictive models.

The inter-disciplinary project team includes academics from Cardiff University’s School of Social Sciences and the School of Computer Science and Informatics.

The research developed new data fusion techniques and improved upon existing mathematical models that have used social media data to predict voting patterns, the spread of disease, the revenue of Hollywood movies, and the estimates of the centres of earthquakes.

Project leader and Lab Director Dr. Matthew Williams, who came up with the hypothesis that social media communications are related to offline crime patterns, said: “These studies illustrate how social media generates naturally occurring, socially relevant data that can be used to complement and augment conventional curated data to predict offline phenomena.

“In our project, we hypothesised that crime- and disorder-related tweets would be associated with actual crime rates. Our preliminary statistical results, which are driven by criminological theory, show that tweets about certain crime types and signatures of crime and disorder help estimate actual patterns of crime, often over and above conventional correlates such as unemployment and the proportion of young people in an area.”

The outcomes of the project will be of use to such organisations as the Metropolitan Police Service, the Home Office, the Association of Chief Police Officers, the College of Policing, Police and Crime Commissioners, the Office for National Statistics and various voluntary organisations. 

Dr. Luke Sloan, Deputy Director, said: “The potential value added by social media data is that it is user-generated in real-time in voluminous amounts. As such it can provide insight into the behaviour of populations on the move; the ‘pulse of the city’. This is in contrast to the necessarily retrospective snapshots of social trends and populations provided by conventional methods such as household surveys and officially recorded data.”

“We have employed advanced statistical analysis that takes into account variation in time and space, given that new forms of big data, like social media communications, occur in real-time, unlike conventional data that the police are used to using. These models allow us to re-test classic criminological theories, bringing their explanatory power into the 21st century,” said Dr. Williams.

He continued: “Recent claims have been made that big data make theory and scientific method obsolete. Yet high-profile failures of big data, such as the inability to predict the US housing bubble in 2008 and the spread of influenza across the US using Google search terms, have resulted in many questioning the power of these new forms of data.”

Dr. Pete Burnap, Director of the Lab and computational lead on the project, commented: “To date the default approach in big data research seems to have been wholly data-driven in the effort to predict. However, without theory-driven data collection, transformation and analysis we cannot answer the substantive questions about social processes and mechanisms that concern us. Purely data-driven approaches tend to produce models and algorithms that are overfit to the idiosyncrasies of a particular data set, leading to spurious results that often do not reflect reality. This is why we have put a series of strict checks and balances in place, such as augmenting big data with conventional sources and using theory to drive our analytical process.”

This work was made possible by a National Centre for Research Methods Methodological Innovations grant and was recently featured in their summer newsletter. Funds from various Economic and Social Research Council programmes, including Digital Social Research, Google Data Analytics, Global Uncertainties and National Centre for Research Methods, have also enabled the Lab researchers to detect online racial tension following sporting events, model the propagation of cyberhate following a terrorist attack, and detect the presence of counter-speech as a form of online community-based regulation.