Special Interest Groups
Each of the Institute Ambassadors leads a special interest group (SIG).
This could be methodological and/or application focussed within the broad area of data science/innovation, and is operated through the ambassador’s existing research networks and interest groups. Amongst other activities, SIGs are supported by the Institute to identify and collaborate on bid submissions and to organise events such as workshops and seminars.
Contact: firstname.lastname@example.org or the relevant DIRI ambassador. We would be very happy to hear from you.
Chris Arnold, School of Law and Politics - Computational social sciences
The SIG focuses on using data-driven technologies to support social science research. Recent years have seen dramatic advances in our capability to gather and analyse data. Algorithms and computing power continue to improve at a rapid pace. Most importantly for the social sciences, data about individuals and societies are now collected on an unprecedented scale.
These innovations are changing the research questions social scientists will be able to answer. At the same time, these innovations alter the way in which we do social research.
This SIG will create a forum to critically advance the possibilities that stem from the combination of data science and social sciences. We exchange experiences, develop new strands of research and foster collaboration within the social sciences and with colleagues from other disciplines. Aware of the deep impact of these technology-driven innovations, we also examine their use and implications for society.
We understand social sciences in very broad terms and seek to include anybody who is interested in studying human behaviour at the meso- or macro-level, including disciplines such as business administration, economics, ethnography, geography, political science and sociology, among others.
DIRI provides an excellent infrastructure to implement research projects that use structured as well as unstructured data. Our research covers a broad portfolio of projects. Some of us focus on analysing visual data. For example, one project uses satellite data to predict the likelihood of voting fraud in remote rural areas of developing countries. Other projects study language and text, such as the spread of misinformation in online debates. Another group is working on data protection: one project explores how to generate synthetic micro-data to protect sensitive data sets.
The Computational Social Sciences SIG meets for lunch on the first Wednesday of each month in the cafeteria of the University’s main building. If you’d like to get in touch, please feel free to contact Chris Arnold.
Clarice Bleil de Souza, Welsh School of Architecture - Simulation and scientific visualisation
This group aims to connect people working with different types of simulations and their related scientific visualisations from the built environment to the biological and social domains.
The SIG will discuss and exchange ideas about the types of simulation used in different research areas and knowledge domains, opening doors for cross-fertilisation. It will also map commonalities and look for novel applications across topics shared by transdisciplinary research, such as:
- Common issues in the validation of results and the post-processing of large simulation datasets
- Uncertainty in simulations
- Connections between simulation and different AI methods (e.g. neural networks) to aid data analysis and potentially data input
Examples of simulations:
- building performance (e.g. thermal, daylighting, acoustics, life cycle assessment)
- emergency-related (e.g. evacuation, fire, disaster)
- urban-related (e.g. pedestrian flow, space syntax, urban heat islands, mobility)
- resilience and post-disaster relief
- transport-related
- animal behaviour and ecosystems
- natural cycles and processes (e.g. erosion, run-off)
Erminia Calabrese and Cosimo Inserra, School of Physics and Astronomy - Unveiling and predicting data patterns
The SIG focuses on the development and application of supervised and unsupervised learning methodologies on large datasets, with the further aim of improving performance when using cross-correlated datasets.
Astronomical datasets are the oldest big data collections we have and allow us to extract information on a broad and diverse range of physical phenomena.
As such, astronomy is the perfect test-bed for all kinds of data mining, manipulation, recognition and forecasting algorithms. Examples of areas falling within the remit of the SIG include analyses of time series, of sky images, and of astronomical objects.
Some of these analyses are combined in a deep learning environment in the form of recurrent (and convolutional) neural networks, which is Cosimo’s area of interest. Others rely on the application of Bayesian statistics and the development of new algorithms that improve the comparison between large datasets and complex theories, which is Erminia’s expertise.
Rhian Daniel, School of Medicine - Statistical Methodology
The Statistical Methodology SIG exists to promote the application and development of new statistical ideas. Novel methodology often arises directly from a desire to improve the depth or scope of a scientific investigation but may equally be motivated by simple curiosity.
Research in this field is distinguished from applied data science in seeking to use mathematical thinking to resolve an extended family of related problems. Examples of areas falling within the remit of the SIG include (but are of course not limited to!) time series analysis, classical biostatistics, experimental design, causal inference, machine learning, and extreme value theory.
Daniel Gartner, School of Mathematics - Healthcare analytics
The Healthcare Analytics SIG seeks to improve the understanding of the role that data innovation plays in health care systems around the globe.
Through this understanding the SIG will identify data science techniques that contribute to the efficient and effective delivery of quality care.
The SIG will foster collaborations in research and facilitate training and dialogue among key health care stakeholders, including researchers, health care providers, decision-makers, policy-makers, and patients.
Dawn Knight, School of English, Communication and Philosophy - Corpus, discourse and text analysis
This SIG focuses on the development, use and applications of corpus-based methodologies, and their integration with NLP approaches, including:
- Development and applications of NLP and corpus methodologies in real-world contexts
- New methods and techniques in corpus development, annotation, visualisation and analysis
- New tools and techniques developed in corpus-based computational linguistics and NLP
- Critical explorations of existing measures and methods in corpus linguistics, and their integration with NLP approaches
- Advances in quantitative techniques for corpus exploration
Of interest, Dawn is currently organising the International Corpus Linguistics conference. An event in which a range of different researchers ask specific research questions of a particular dataset and consider how it can be approached, providing multiple perspectives on the same stimulus, will be announced in due course.
Penny Lewis, School of Psychology - Computational modelling of human brain function
This SIG plans to approach the topic from all angles, from abstract connectionism to biologically plausible deep spiking networks. We are also exploring neural simulator frameworks that aim to strike a balance between these levels of explanation. By implementing these frameworks, we hope to provide an essential interface for researchers to experiment with novel machine-learning techniques through the lens of what is currently understood about human cognition. Thus, our main goal is to build models of the brain which facilitate our understanding of how it works. Ideally, these models will be informed by both behavioural and neural data, and will in turn be used to inform experimental design for further data collection.
Steve Schockaert, School of Computer Science and Informatics - Semantic representation learning
Representation learning is the task of converting data from the raw form in which it is given into a form that is more suitable for machine learning models.
Given the popularity of deep learning, the most common approach is to learn a mapping from data items onto a fixed-dimensional vector. Such vector representations are commonly used in image processing and, in particular, natural language processing. A key challenge, however, is to learn vector representations which are, in some sense, semantically meaningful. On the one hand, this is critical for enabling machine learning models which are interpretable, a property that is quickly becoming essential in many applications. On the other hand, semantically meaningful representations are needed to incorporate existing domain knowledge (e.g. provided by a domain expert or available from some knowledge base), and thus to inject knowledge into machine learning models.
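As a rough illustration of mapping data items onto fixed-dimensional vectors, the Python sketch below derives two-dimensional word vectors from co-occurrence counts in a toy corpus using a truncated SVD. The corpus, the function names and the choice of SVD are assumptions made purely for illustration. This is a classical count-based technique rather than deep learning, and it is not presented as a method used by the SIG.

```python
import numpy as np

# Hypothetical toy corpus: each sentence is a list of tokens.
corpus = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats"],
    ["mice", "eat", "cheese"],
    ["dogs", "eat", "meat"],
]


def learn_embeddings(sentences, dim=2):
    """Map every word to a dim-dimensional vector via co-occurrence counts + SVD."""
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}

    # Count how often two distinct words appear in the same sentence.
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for a in sent:
            for b in sent:
                if a != b:
                    counts[index[a], index[b]] += 1.0

    # Keep only the top `dim` singular directions as the representation.
    u, s, _ = np.linalg.svd(counts)
    vectors = u[:, :dim] * s[:dim]
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


vocab, vectors = learn_embeddings(corpus, dim=2)
for word, vec in zip(vocab, vectors):
    print(word, np.round(vec, 2))
```

Words that occur in similar contexts tend to receive vectors with a high cosine similarity. Whether the individual dimensions are semantically meaningful is a separate question, which is the challenge described above.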