Special Interest Groups
The Data Innovation Research Institute was open between 2015 and 2021. This page shows the Institute’s past work. It is not monitored or updated.
Our research can be applied to numerous problems in the fields of physics, astronomy and engineering.
Physics and astronomy
We will work collaboratively with the School of Physics and Astronomy, which has existing research expertise in large-scale data analysis. We will build on and contribute to their experience, specifically in:
- the analysis of gravitational and astronomical data from the Herschel Space Observatory, operated by the European Space Agency (ESA), which currently produces data more rapidly than the data can be physically transported off site
- running simulations of major astronomical events, from the birth of stars to the merging of black holes
- handling and managing data analysis from the Square Kilometre Array (SKA) radio telescope as it increases in size and scale
- handling the increasing complexity and size of geological, meteorological and climate data sets which, when they come from a variety of different sources, make it difficult to draw sound conclusions from the information.
Data forms the backbone of all digital manufacturing technologies. Our work with the School of Engineering will focus on finding solutions to problems which include:
- the analysis and reduction of data sets to support efficient manufacturing and sustainable, climate-friendly development
- providing secure manufacturing environments
- preventing digital and intellectual property crime.
Each of the Institute Ambassadors leads a special interest group (SIG).
A SIG may be methodological and/or application focussed within the broad area of data science/innovation, and is operated through the ambassador’s existing research networks and interest groups. Among other activities, SIGs are supported by the Institute to identify and collaborate on bid submissions and to organise events such as workshops and seminars.
Iskander Aliev, School of Mathematics - Discrete Compressed Sensing
We are interested in applications of discrete mathematics to the emerging problems in data science with a special focus on discrete compressed sensing.
Although classical real-valued compressed sensing is a well-established research field of data science, there is very little understanding of the role of discrete (e.g. finite or lattice-valued) signals in this paradigm. This is especially surprising given the large number of applications in which the signal is known to have discrete-valued entries. For instance, this scenario applies in MIMO channels, wireless communications and error-correcting codes. We aim to tackle research problems of discrete compressed sensing using modern methods of discrete and computational geometry.
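As a toy illustration of why discrete-valued signals behave differently from real-valued ones, the sketch below recovers a sparse binary signal from a single linear measurement by exhaustive search. It is not the SIG's method: the function names, the power-of-two measurement row and the brute-force search are all assumptions chosen to keep the example small.

```python
from itertools import combinations


def measure(A, x):
    """Return y = A x for an integer matrix A (list of rows) and vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def recover_sparse_binary(A, y, k):
    """Brute-force search for all 0/1 vectors with at most k ones that satisfy A x = y.

    Hypothetical helper for illustration only; the cost grows combinatorially in n and k.
    """
    n = len(A[0])
    solutions = []
    for size in range(k + 1):
        for support in combinations(range(n), size):
            x = [0] * n
            for i in support:
                x[i] = 1
            if measure(A, x) == y:
                solutions.append(x)
    return solutions


n = 8
# Assumed measurement matrix: one row of powers of two. Because binary
# expansions are unique, this single measurement identifies any 0/1 signal.
A = [[2 ** j for j in range(n)]]
x_true = [0, 1, 0, 0, 0, 1, 0, 0]  # 2-sparse binary signal
y = measure(A, x_true)
solutions = recover_sparse_binary(A, y, k=2)
print(y, solutions)
```

A single measurement could never determine a real-valued sparse signal of length 8, so the example shows how knowing the entries are discrete changes what can be recovered.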
Chris Arnold, School of Law and Politics - Computational social sciences
The SIG focuses on using data-driven technologies to support social science research. Recent years have seen dramatic advances in the capabilities to gather and analyse data. Algorithms and computing power continue to improve at a vast pace. Most importantly for social sciences, humanity collects data about individuals and societies at an unprecedented level.
These innovations are changing the research questions social scientists will be able to answer. At the same time, these innovations alter the way in which we do social research.
This SIG will create a forum to critically advance the possibilities that stem from the combination of data science and social sciences. We exchange experiences, develop new strands of research and foster collaboration within the social sciences and with colleagues from other disciplines. Aware of the deep impact of these technology-driven innovations, we also examine their use and implications for society.
We understand social sciences in very broad terms and seek to include anybody who is interested in studying human behaviour at the meso- or macro-level, including disciplines like business administration, economics, ethnography, geography, political science and sociology, among others.
DIRI provides an excellent infrastructure to implement research projects that use structured as well as unstructured data. Our research covers a broad portfolio of projects. Some of us focus on analysing visual data. For example, one project predicts the likelihood of voting fraud in remote rural areas in developing countries on the basis of satellite data. Other projects study language and text, such as the spread of misinformation in online debates. Another group is working on data protection: one project explores how to generate synthetic micro-data to protect sensitive data sets.
The Computational Social Sciences SIG meets regularly for lunch on the first Wednesday of each month in the cafeteria of the University’s main building. If you’d like to get in touch, please feel free to contact Chris Arnold.
Clarice Bleil de Souza, Welsh School of Architecture - Simulation and scientific visualisation
This group aims to connect people working with different types of simulations and their related scientific visualisations from the built environment to the biological and social domains.
The SIG will discuss and exchange ideas about the different types of simulations used in different research areas and knowledge domains, opening doors for cross-fertilisation. It will also map commonalities and look for novel applications across topics shared by transdisciplinary research, such as:
- Common issues in the validation of results and the post-processing of large datasets produced by simulations
- Uncertainty in simulations
- Connections between simulation and different AI methods to aid data analysis and potentially data input (neural networks)
Examples of simulations
- building performance (e.g. thermal, daylighting, acoustics, life cycle assessment)
- emergency related (e.g. evacuation, fire, disaster)
- urban related simulations (e.g. pedestrian flow, space syntax, urban heat islands, mobility)
- resilience and post disaster relief
- transport related simulations
- animal behaviour and ecosystems
- natural cycles and processes (e.g. erosion, run off)
Erminia Calabrese and Cosimo Inserra, School of Physics and Astronomy - Unveiling and predicting data patterns
The SIG focuses on the development and application of supervised and unsupervised learning methodologies on large datasets, aiming also to improve performance when using cross-correlated datasets.
Astronomical datasets are the oldest big data collection we have and allow us to extract information on a broad and diverse range of physical phenomena.
As such, astronomy is the perfect test-bed for any kind of data mining, manipulation, recognition and forecasting algorithms. Examples of areas falling within the remit of the SIG include analyses of time series, of sky images, and of astronomical objects.
Some of these analyses are combined in a deep learning environment in the form of recurrent (and convolutional) neural networks, which is Cosimo’s area of interest. Others rely on the application of Bayesian statistics and the development of new algorithms that improve the comparison between large datasets and complex theories, which is Erminia’s expertise.
Maggie Chen, School of Mathematics - FinTech and Smart Finance
The FinTech and Smart Finance Special Interest Group (SIG) is a team of passionate academic experts who believe in, and advocate, interdisciplinary research and academic-business collaborations worldwide. We have a wealth of expertise in mathematics/statistics, business/economics and computer science (such as machine learning). Using state-of-the-art technology and modelling frameworks, we contribute to academic literature and theory, as well as to practical challenges faced by the financial industry, markets and regulators. Our work spans a wide spectrum of finance research including, but not limited to:
- Time series and stochastic modelling for finance
- Financial pricing theory and modelling
- Portfolio management
- Asset management
- Behavioural finance
- Financial markets and regulation
- Fraud detection
- Financial stability and reliability
- Financial networks and visualisation
- Market microstructure
- Trading and hedging
Our research focuses on the scientific foundations and processes of these areas, and recognises the importance of their economic, social and policy impact on a wide range of user groups, from the general public to finance professionals.
We are dedicated to developing cutting-edge research in FinTech, with expertise in blockchain and cryptocurrency trading, AI and machine learning in finance, and social trading networks. We have a solid grounding in classic financial modelling and also lead financial applications of flexible stochastic models, namely Hawkes processes, which we use to tackle more complex problems such as contagion and financial jumps. We also examine these areas from the perspective of behavioural finance and decision theory. One example is modelling the dynamics between investment sentiment and complex financial markets through a new entropy-based modelling framework.
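To make the idea of contagion in a Hawkes process concrete, the sketch below evaluates the conditional intensity of a self-exciting process with an exponential kernel, a common textbook form. It is an illustration only, not the group's model: the function name, the kernel choice and the parameter values `mu`, `alpha` and `beta` are assumptions.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity mu + sum(alpha * exp(-beta * (t - s))) over past events s < t.

    mu is the baseline rate, alpha the jump in intensity caused by each event,
    and beta the rate at which that excitation decays (all assumed values).
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )


# Hypothetical event times, e.g. times of price jumps in arbitrary units.
events = [1.0, 1.5, 4.0]

baseline = hawkes_intensity(0.5, events, mu=0.2, alpha=0.8, beta=1.2)
after_cluster = hawkes_intensity(1.6, events, mu=0.2, alpha=0.8, beta=1.2)
long_after = hawkes_intensity(3.9, events, mu=0.2, alpha=0.8, beta=1.2)
print(baseline, after_cluster, long_after)
```

Before any event the intensity equals the baseline; just after a cluster of events it is elevated, which is how the model represents one jump making further jumps more likely, and it then decays back towards the baseline.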
The collective vision of the group is to become further established in the area of FinTech and to explore broader topics such as green finance, de-financing network, financial inclusion, knowledge management in the complex financial system and more. Our hope is to continue growing strong interdisciplinary research projects across computer science, business and management. We also aim to lead various innovative collaboration initiatives with world-renowned research institutes and organisations such as the NSF (National Science Foundation, USA), Innovate UK and the CFTC (Commodity Futures Trading Commission). Ultimately, we expect our research outcomes to benefit network organisations such as FinTech Wales, FinTech companies directly, and the general public.
Rhian Daniel, School of Medicine - Statistical Methodology
The Statistical Methodology SIG exists to promote the application and development of new statistical ideas. Novel methodology often arises directly from a desire to improve the depth or scope of a scientific investigation but may equally be motivated by simple curiosity.
Research in this field is distinguished from applied data science in seeking to use mathematical thinking to resolve an extended family of related problems. Examples of areas falling within the remit of the SIG include (but are of course not limited to!) time series analysis, classical biostatistics, experimental design, causal inference, machine learning, and extreme value theory.
Daniel Gartner, School of Mathematics - Healthcare analytics
The Healthcare Analytics SIG seeks to improve the understanding of the role that data innovation plays in health care systems around the globe.
Through this understanding the SIG will identify data science techniques that contribute to the efficient and effective delivery of quality care.
The SIG will foster collaborations in research and facilitate training and dialogue among key health care stakeholders, including researchers, health care providers, decision-makers, policy-makers, and patients.
Dawn Knight, School of English, Communication and Philosophy - Corpus, discourse and text analysis
This SIG focuses on the development, use and applications of corpus-based methodologies, and their integration with NLP approaches, including:
- Development and applications of NLP and corpus methodologies in real-world contexts
- New methods and techniques in corpus development, annotation, visualisation and analysis
- New tools and techniques developed in corpus-based computational linguistics and NLP
- Critical explorations of existing measures and methods in corpus linguistics, and their integration with NLP approaches
- Advances in quantitative techniques for corpus exploration
Penny Lewis, School of Psychology - Computational modelling of human brain function
This SIG plans to approach the topic from all angles, from abstract connectionism to biologically plausible deep spiking networks. We are also exploring neural simulator frameworks that aim to strike a balance between these levels of explanation. Through implementing these frameworks, we hope to provide an essential interface for researchers to experiment with novel machine-learning techniques through the lens of what is currently understood about human cognition. Thus, our main goal is to build models of the brain which facilitate our understanding of how it works. Ideally, these models will be informed by both behavioural and neural data - and will in turn be used to inform experimental design for further data collection.
Scott Orford, School of Geography and Planning - Spatial Analysis
The SIG focuses on the use of spatial analysis in understanding the world around us. The past decade has seen a substantial increase in the quantity, quality and availability of data encoded with a spatial reference, allowing the data to be analysed from a spatial perspective.
Often these data are associated with disciplines that do not have a tradition of spatial analysis, such as the humanities, introducing new and exciting opportunities for research. At the same time, new forms of spatial data are driving research innovations within disciplines with a long history of spatial analysis.
Hence this SIG is multi-disciplinary in nature and is concerned with both methodological innovation and substantive research in different application domains, including other SIGs in the Research Institute. Current research is focused in the following areas:
- Investigating new forms of data - such as mobile phone records, social media records, sensor network data records, Internet of Things - and their use in spatial analysis research, especially in relation to Smart Cities and urban analytics.
- Urban Morphometrics and the quantification and measurement of the built environment across different spatial scales. Research domain applications include public health, planning and urban design, land and housing economics, and land-use and transport planning.
- Conceptualising, quantifying and applying measures of accessibility to amenities and services in order to reduce inequalities in provision. It includes understanding how different approaches to measuring accessibility can affect analysis and interpretation in different application domains. Research includes accessibility to green/blue spaces in relation to health and wellbeing, sport and leisure facilities, financial services, and health care provision.
- The development of spatial data portals, dashboards and web-mapping software, especially in relation to spatial meta-data, spatial data linkage, APIs and automated and interactive mapping and map mash-ups. There is a focus on developing software that makes mapping on-line spatial datasets easier and facilitates the use of such data in policy and practice.
- The opportunities and barriers in undertaking spatial analysis using micro-data and linked data in safe and secure settings. This can be in relation to the quality of spatial data, spatial data linkage, issues of geo-privacy and disclosure, training and software provision, and ethics.
- Mapping and spatial analysis of qualitative data, building on the work of Qualitative and Mixed Methods Geographic Information Systems (GIS).
- Mobile methods as a way of collecting and contextualising qualitative data records about space and place.
Steve Schockaert, School of Computer Science and Informatics - Semantic representation learning
Representation learning is the task of converting data from the raw form in which it is given into a form that is more suitable for machine learning models.
Given the popularity of deep learning, the most common approach is to learn a mapping from data items onto a fixed-dimensional vector. Such vector representations are commonly used in image processing and natural language processing in particular. A key challenge, however, is to learn vector representations that are, in some sense, semantically meaningful. On the one hand, this is essential for enabling interpretable machine learning models, a property that is quickly becoming critical in many applications. On the other hand, semantically meaningful representations are needed to incorporate existing domain knowledge (e.g. provided by a domain expert or available from some knowledge base), and thus to inject knowledge into machine learning models.
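The sketch below illustrates what a fixed-dimensional, semantically meaningful vector can look like in the simplest possible case. It is hand-built rather than learned, so it is not representation learning in the deep learning sense: the vocabulary `VOCAB` and the functions `embed` and `cosine` are hypothetical, chosen so that each dimension corresponds to a named concept and can be read directly.

```python
import math

# Assumed vocabulary: each vector dimension corresponds to one named term,
# which is what makes this toy representation interpretable.
VOCAB = ["star", "galaxy", "market", "price"]


def embed(text):
    """Map a text onto a fixed-dimensional vector of term counts over VOCAB."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


astro_a = embed("star and galaxy survey")
astro_b = embed("galaxy star star catalogue")
finance = embed("market price and price jumps")

print(astro_a, astro_b, finance)
print(cosine(astro_a, astro_b), cosine(astro_a, finance))
```

Texts on related topics end up close together and unrelated texts are orthogonal. Learned representations aim to keep this kind of readable structure while being derived from data rather than from a hand-picked vocabulary.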