Lunchtime Seminar: Information Extraction and Analytics from Social Media and Online Marketplaces for UK Illegal Plant Trade, Military Intelligence Analysis and Breaking News

Thursday, 21 February 2019
12:00-13:00

This event has ended.

Speaker: Stuart Middleton

Near-real-time information extraction from social media streams, online marketplaces and online forums is providing a new "virtual sensor" capability for Open Source Intelligence (OSINT). End users such as law enforcement agencies (e.g. UK Border Force, National Crime Agency), emergency response agencies (e.g. tsunami early warning centres, civil protection authorities) and news agencies (e.g. Deutsche Welle, BBC News) are all actively investigating how best to use this new data to support their day-to-day decision-making requirements.

Research challenges in the area of online information extraction include how to assess the veracity of content, how to respond to its dynamic nature and how to develop algorithms that can cope with the growing volumes of online data. Fake news is rife in the online world, so analysing the sentiment, stance and provenance of sources is very important. Automated fact extraction and checking is a challenging task and still very much in its infancy. However, techniques are emerging to support human verification processes: identifying contextual information around factual claims for cross-checking, and collating content from different viewpoints and sources to develop a balanced picture of what is going on.

Natural Language Processing (NLP) and Information Extraction (IE) approaches typically work with either large web-scale corpora of example posts, or small hand-crafted corpora with annotated language patterns and/or vocabularies. In domains like breaking news, the topic of interest changes every few hours, so compiling training data is not practical. In domains like cybercrime, information exchanges are often hard to obtain and fragmented, with discussion threads frequently switching between public forum exchanges and hidden private messaging. Unsupervised Open Information Extraction (OpenIE) approaches are able to work with little or no training data, and incremental self-learning strategies can be used to exploit relevance feedback and boost precision. Algorithm scalability is critical for near-real-time processing, so efficient indexing and/or naive parallelization are also becoming increasingly important.

In this seminar, Prof Middleton will chart a path through his research into information extraction over the last five years, starting with algorithms to help breaking news verification and leading on to supporting sensemaking from OSINT for military intelligence analysis and law enforcement agencies. He will explain the algorithms used and the results obtained, and suggest some lessons learnt along the way.