150 years of British history

11 January 2017


What could be learnt about the world if you could read the news from over 100 local newspapers for a period of 150 years?

A team of researchers from Cardiff University and the University of Bristol set out to answer this question, using Artificial Intelligence (AI) to analyse 150 years of British regional newspapers.

The patterns that emerged from the automated analysis ranged from the detection of major events to subtle variations in gender bias across the decades. The study investigated transitions such as the uptake of new technologies and even new political ideas, using an approach that is closer to genomic studies than to traditional historical investigation.

The team of academics collaborated with the company findmypast, which is digitising historical newspapers from the British Library as part of its British Newspaper Archive project.

The main focus of the study was to establish whether major historical and cultural changes could be detected from the subtle statistical footprints left in the collective content of local newspapers. How many women were mentioned? In which year did electricity start being mentioned more than steam? Crucially, this work goes well beyond counting words: it deploys AI methods to identify people and their gender, or locations and their position on the map.

The landmark study drew on a huge collection of regional newspapers from the UK, which include geographical and time-based information that is not available in other textual data such as books. It used over 35 million articles and 28.6 billion words from the British Library’s newspaper collections, representing 14 per cent of all British regional outlets published between 1800 and 1950.

Nello Cristianini, Professor of Artificial Intelligence at the University of Bristol, who led the study, said: “The key aim of the study was to demonstrate an approach to understanding continuity and change in history, based on the distant reading of a vast body of news, which complements what is traditionally done by historians...”

“The research team showed that changes and continuities detected in newspaper content can reflect culture, biases in representation or actual real-world events. More detailed studies on the same data will be performed.”

Professor Nello Cristianini, University of Bristol

Simple content analysis allowed the researchers to detect specific key events, such as wars, epidemics, coronations or gatherings, with high accuracy, while more refined AI techniques enabled the research team to move beyond counting words by detecting references to named entities, such as individuals, companies and locations.

Some of the results were to be expected and served as a sanity check for the approach, while other outcomes were not so obvious at the start of the analysis.

In the areas of values, beliefs and UK politics, the researchers found that Gladstone was much more newsworthy than Disraeli in the 19th century; that Liberals were mentioned more often than Conservatives until the 1930s; and that references to British identity took off in the 20th century.

In technology and the economy, the research team tracked the steady decline of steam and the rise of electricity, with mentions of the two crossing over in 1898; trains overtook horses in popularity in 1902; and the four largest peaks in mentions of ‘panic’ coincided with negative market movements linked to banking crises in 1826, 1847, 1857 and 1866.

In social change and popular culture, the researchers showed that mentions of the Suffragette movement were concentrated in a clearly delimited period, from 1906 to 1918; that references to ‘actors’, ‘singers’ and ‘dancers’ began to increase in the 1890s and rose significantly from then on, while references to ‘politicians’, by contrast, gradually declined from the early 20th century; and that ‘football’ was more prominent than ‘cricket’ from 1909.

Replicating a previous study of book content, the researchers then linked famous people in the news to their professions, finding that politicians and writers are the most likely to achieve fame within their lifetimes, while scientists and mathematicians are less likely to become famous but their fame declines less sharply.

More importantly, the researchers found that men were systematically more present than women throughout the entire period studied, although the presence of women increased slowly after 1900; it is difficult to attribute this increase to any single factor. Interestingly, the level of gender bias in the news over the period studied is not very different from current levels.


“Our research shows the enormous potential of the use of artificial intelligence techniques for the analysis of huge media data sets.”

Professor Justin Lewis, Professor of Communication

Professor Justin Lewis, of Cardiff University’s School of Journalism, Media and Cultural Studies, who was involved in the study, said: “While this will never replace more qualitative forms of analysis, it does allow us to trace the broad content of media across numerous outlets over long time periods. In the future, this approach will enhance our ability to explore key issues, from political impartiality to social representation and influence.”

‘Content analysis of 150 years of British periodicals’ by Lansdall-Welfare et al. is published in the journal Proceedings of the National Academy of Sciences.
