Statistics Research Group
The group is very active both in applications of statistical techniques and in theory.
The main areas of research within the current group are:
 time series analysis
 multivariate data analysis
 applications to market research
 search algorithms and stochastic global optimisation
 probabilistic number theory
 optimal experimental design
 stochastic processes and random fields with weak and strong dependence
 diffusion processes and PDE with random data
 anomalous diffusion
 Burgers and KPZ turbulence, fractional ODE and PDE, and statistical inference with higher-order information
 extreme value analysis.
Various topics in fisheries and medical statistics are also considered, such as errors-in-variables regression.
Collaborations
Statisticians within the School have been prominent in collaborating with researchers in other disciplines. There are strong links with:
 the School of Medicine, working on applications of multivariate statistics and time series analysis in bioinformatics
 the School of Engineering, in the areas of image processing and stochastic global optimisation of complex systems
 the Business School, in the field of analysis of economic time series.
Ongoing international collaborations exist with many universities including Columbia, Taiwan, Queensland, Aarhus, Roma, Cleveland, Pau, Hokkaido, Boston, Caen, Calabria, Maine, Trento, Nice, Bratislava, Linz, St. Petersburg, Troyes, Vilnius, Siegen, Mannheim, and Copenhagen.
Industrial sponsorship
Significant industrial sponsorship has been obtained from:
 Procter and Gamble (USA) working on statistical modelling in market research
 the Biometrics unit of SmithKline Beecham collaborating on different aspects of pharmaceutical statistics
 ACNielsen/BASES (USA) on applications of mixed Poisson models in studying marketing consumer behaviour
 General Electric HealthCare on environmental statistics.
In focus
Time series analysis
In recent years a powerful technique of time series analysis has been developed and applied to many practical problems. This technique is based on the singular value decomposition of the so-called trajectory matrix, obtained from the initial time series by the method of delays. It aims to expand the original time series into a sum of a small number of 'independent' and 'interpretable' components.
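The decomposition described above can be illustrated with a minimal sketch. It assumes NumPy, and the function name `ssa_components` and its parameters are hypothetical, chosen for illustration only; it is not the group's own implementation. The series is embedded into a trajectory matrix by the method of delays, the matrix is decomposed by SVD, and each rank-one term is mapped back to a series by averaging over anti-diagonals.

```python
import numpy as np

def ssa_components(series, window, n_components):
    """Return the first n_components elementary series of a basic
    singular-spectrum-style decomposition (illustrative sketch).

    n_components must not exceed min(window, len(series) - window + 1).
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory matrix by the method of delays: column j is x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # Singular value decomposition of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = []
    for i in range(n_components):
        # Rank-one matrix belonging to the i-th singular triple.
        elem = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: entry (r, c) belongs to time index r + c, so
        # average each anti-diagonal (a diagonal of the row-flipped matrix).
        flipped = elem[::-1]
        comp = np.array([flipped.diagonal(d).mean()
                         for d in range(-(window - 1), k)])
        components.append(comp)
    return np.array(components)
```

Summing all min(window, n - window + 1) components recovers the original series, so keeping only the leading few gives a low-rank approximation in which trend and oscillatory parts are often easier to interpret.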
Also, spatial analogues of the popular ARMA-type stochastic time series have been developed, based on fractional generalizations of the Laplacian with two fractal indices. These models describe important features of anomalous diffusion processes, such as strong dependence and/or intermittency.
Multivariate statistics
The objective is the development of a methodology for exploratory analysis of temporal-spatial data of complex structure, with the final aim of constructing suitable parametric models.
The applications include various medical, biological, engineering and economic data. Several market research projects in which the development of statistical models was a substantial part have taken place.
Stochastic global optimisation
Let ƒ be a function given on a d-dimensional compact set X and belonging to a suitable functional class F of multiextremal continuous functions.
We consider the problem of its minimization, that is, the approximation of a point x' such that ƒ(x') = min ƒ(x), using evaluations of ƒ at specially selected points.
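As a minimal sketch of the problem setting, the simplest stochastic method is pure random search: evaluate ƒ at points drawn uniformly from X and keep the best one. The example assumes NumPy and takes X to be a box; the names `pure_random_search` and `rastrigin` are illustrative assumptions, and the methods studied by the group select points far more carefully than this baseline.

```python
import numpy as np

def pure_random_search(f, lower, upper, n_evals, seed=0):
    """Approximate a minimiser of f on the box [lower, upper] using
    n_evals uniformly distributed evaluation points (baseline sketch)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # One row per evaluation point, one column per coordinate.
    points = rng.uniform(lower, upper, size=(n_evals, lower.size))
    values = np.array([f(p) for p in points])
    best = int(values.argmin())
    return points[best], values[best]

def rastrigin(x):
    """A standard multiextremal test function; its global minimum is 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

x_best, f_best = pure_random_search(rastrigin, [-5.12, -5.12], [5.12, 5.12], 2000)
```

The returned value is only an upper bound on min ƒ(x); how quickly it approaches the minimum as the number of evaluations grows depends strongly on the dimension d.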
Probabilistic methods in search and number theory
Several interesting features of the accuracy of diophantine approximations can be expressed in probabilistic terms.
Many diophantine approximation algorithms produce a sequence of sets F(n), indexed by n, of rational numbers p/q in [0,1]. Famous examples of F(n) are the Farey sequence (the collection of rationals p/q in [0,1] with q<=n) and the collection of all n-th continued fraction convergents.
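For concreteness, the Farey sequence of order n can be generated with the classical next-term recurrence. The sketch below uses only the Python standard library; the function name `farey` is an illustrative choice.

```python
from fractions import Fraction

def farey(n):
    """Return the Farey sequence of order n: all reduced fractions p/q
    in [0, 1] with q <= n, in increasing order."""
    # a/b and c/d are two consecutive terms; start from 0/1 and 1/n.
    a, b, c, d = 0, 1, 1, n
    seq = [Fraction(a, b)]
    while c <= n:
        # The term following a/b and c/d is (k*c - a)/(k*d - b).
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq
```

For example, `farey(3)` gives 0, 1/3, 1/2, 2/3, 1. The gaps between neighbouring terms are the kind of quantity whose distribution can be studied in probabilistic terms.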
Stochastic processes
New classes of stochastic processes with Student distributions and various types of dependence structure have been introduced and studied. A particular motivation is the modelling of risk assets with strong dependence through fractal activity time.
The asymptotic theory of estimation of parameters of stochastic processes and random fields has been developed using higher-order information (that is, information on the higher-order cumulant spectra). This theory allows analysis of non-linear and non-Gaussian models with both short- and long-range dependence.
Burgers turbulence problem
Explicit analytical solutions of the Burgers equation with quadratic potential have been derived and used to obtain scaling-law results for the Burgers turbulence problem with quadratic potential and random initial conditions of Ornstein-Uhlenbeck type driven by Lévy noise.
Results have considerable potential for stochastic modelling of observational series from a wide range of fields, such as turbulence or anomalous diffusion.
Topics in medical statistics
Topics in medical statistics currently researched in Cardiff include time-specific reference ranges and errors-in-variables regression. Current research focuses on the search for a unified methodology and approach to the errors-in-variables problem.
Extreme Value Analysis
Extreme value analysis is a branch of probability and statistics that provides nonparametric procedures for extrapolation beyond the range of the data. How well this can be done depends on the quality of the data, so knowing the limits of such extrapolation is also an important issue. Its methods are particularly relevant for institutions that are exposed to high risks, for instance financial services and insurance companies or environmental engineering institutions.
Group leader
Professor Anatoly Zhigljavsky
Chair in Statistics
 Email:
 zhigljavskyaa@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 5076
Academic staff
Dr Andreas Artemiou
Senior Lecturer in Statistics
 Email:
 artemioua@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 0616
Dr Jonathan Gillard
Reader in Statistics
Director of Admissions
 Email:
 gillardjw@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 0619
Dr Andrey Pepelyshev
Senior Lecturer
 Email:
 pepelyshevan@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 5530
All seminars will commence at 12:10pm in room M/0.34, The Mathematics Building, Cardiff University, Senghennydd Road (unless otherwise stated).
Please contact Dr Timm Oertel for more details regarding Operational Research/WIMCS lectures and Bertrand Gauthier and Kirstin Strokorb for more details regarding Statistics lectures.
Seminars
Date  Speaker  Seminar 

9 December 2019  Ruth Misener (Imperial College)  Scoring positive semidefinite cutting planes for quadratic optimization via trained neural networks. Semidefinite programming relaxations complement polyhedral relaxations for quadratic optimization, but global optimization solvers built on polyhedral relaxations cannot fully exploit this advantage. We develop linear outer-approximations of semidefinite constraints that can be effectively integrated into global solvers for nonconvex quadratic optimization. The difference from previous work is that our proposed cuts are (i) sparser with respect to the number of nonzeros in the row and (ii) explicitly selected to improve the objective. A neural network estimator is key to our cut selection strategy: ranking each cut based on objective improvement involves solving a semidefinite optimization problem, but this is an expensive proposition at each Branch&Cut node. The neural network estimator, trained a priori of any instance to solve, takes the most time-consuming computation offline by predicting the objective improvement for any cut. This is joint work with Radu Baltean-Lugojan, Pierre Bonami, and Andrea Tramontani.
11 November 2019  Enrica Pirozzi (University of Naples)  On a Fractional Ornstein-Uhlenbeck Process and its applications. The seminar is centred on a fractional Ornstein-Uhlenbeck process that is the solution of a linear stochastic differential equation driven by a fractional Brownian motion; it is also characterised by a stochastic forcing term in the drift. For such a process, mean and covariance functions will be specified, concentrating on their asymptotic behaviour. A form of short- or long-range dependence, under specified hypotheses on the covariance of the forcing process, will be shown. Applications of this process in neuronal modelling are discussed, providing an example of a stochastic forcing term as a linear combination of Heaviside functions with random centres. Simulation algorithms for the sample paths of this process are also given.
21 October 2019  Tri-Dung Nguyen (University of Southampton)  Game of Banks – Keeping Free ATMs Alive? The LINK ATM network is a fundamental part of the UK's payments infrastructure, with nearly 62,000 ATMs, and cash machines are by far the most popular channel for cash withdrawal in the UK, used by millions of consumers every week. The record daily highs in 2019 were 10.7 million ATM transactions (29 March) and over half a billion pounds paid out by ATMs (28 June). The UK's cash machine network is special in that most machines are currently free of charge. Underlying this key feature is the arrangement among the banks and cash machine operators to settle the fees among themselves instead of putting the burden on the consumers' shoulders. The ATM network in the UK has recently, however, been experiencing many issues as some members are not happy with the mechanism for interchange fee sharing. In this talk, we show how game theory, in particular a combination of mathematical models developed by John Nash and Lloyd Shapley, two Nobel laureates in Economics, can be used to resolve the current ATM crisis. We present a novel 'coopetition' game theoretic model for banks to optimally invest in the ATM network and to share the cost. This coopetition game includes both a cooperative game theory framework as the mechanism for interchange fee sharing and a non-cooperative counterpart to model the fact that banks also wish to maximise their utilities. We show that the current mechanism for sharing is unstable, which explains why some members are threatening to leave. We also show that, under some settings, the Shapley allocation belongs to the core and hence it is not only fair to all members but also leads to a stable ATM network. We prove the existence of a pure Nash equilibrium, which can be computed efficiently. In addition, we show that the Shapley value allocation dominates the current mechanism in terms of social welfare. Finally, we provide numerical analysis and managerial insights through a case study using real data on the complete UK ATM network.
14 October 2019  Ruth King (University of Edinburgh)  Challenges of quantity versus complexity for ecological data. Capture-recapture data are often collected on animal populations to obtain insight into the given species and/or ecosystem. Long-term datasets combined with new technology for observing individuals are producing larger capture-recapture datasets; for example, repeated observations on >10,000 individuals are becoming increasingly common. Simultaneously, increasingly complex models are being developed to more accurately represent the underlying biological processes, which permits a more intricate understanding of the system. However, fitting these more complex models to large datasets can become computationally very expensive. We propose a two-step Bayesian approach: (i) fit the given capture-recapture model to a smaller subsample of the data; and then (ii) “correct” the posterior obtained so that it is (approximately) from the posterior distribution of the complete sample. For a feasibility study we apply this two-step approach to data from a colony of guillemots where there are approximately 30,000 individuals observed within the capture-recapture study and investigate the performance of the algorithm.
7 October 2019  George Loho (LSE)  To be confirmed 
30 September 2019  Rajen Shah (University of Cambridge)  RSVP-graphs: Fast High-dimensional Covariance Matrix Estimation Under Latent Confounding. We consider the problem of estimating a high-dimensional p × p covariance matrix S, given n observations of confounded data with covariance S + GG^T, where G is an unknown p × q matrix of latent factor loadings. We propose a simple and scalable estimator based on the projection on to the right singular vectors of the observed data matrix, which we call RSVP. Our theoretical analysis of this method reveals that in contrast to approaches based on removal of principal components, RSVP is able to cope well with settings where the smallest eigenvalue of G^T G is relatively close to the largest eigenvalue of S, as well as when eigenvalues of G^T G are diverging fast. RSVP does not require knowledge or estimation of the number of latent factors q, but only recovers S up to an unknown positive scale factor. We argue this suffices in many applications, for example if an estimate of the correlation matrix is desired. We also show that by using subsampling, we can further improve the performance of the method. We demonstrate the favourable performance of RSVP through simulation experiments and an analysis of gene expression datasets collated by the GTEX consortium.
22 August 2019, Time: 11:10 to 12:00, Room M/2.06  Dr. Mofei Jia (Xi'an Jiaotong-Liverpool University, China)  Curbing the Consumption of Positional Goods: Behavioural Interventions versus Taxation. Little is known about whether behavioural techniques, such as nudges, can serve as effective policy tools to reduce the consumption of positional goods. We study a game in which individuals are embedded in a social network and compete for a positional advantage with their direct neighbours by purchasing a positional good. In a series of experiments, we test four policy interventions to curb the consumption of the positional good. We manipulate the type of the intervention (either a nudge or a tax) and the number of individuals exposed to the intervention (either the most central network node or the entire network). We illustrate that both the nudge and the tax can serve as effective policy instruments to combat positional consumption if the entire network is exposed to the intervention. Nevertheless, taxing or nudging the most central network node does not seem to be equally effective because of the absence of spillover effects from the centre to the other nodes. As for the mechanism through which the nudge operates, our findings are consistent with an explanation where nudging increases the psychological cost of the positional consumption.
18 July 2019, Time: 11:10 to 12:00, Room M/2.06  Nina Golyandina  Detecting signals by Monte Carlo singular spectrum analysis: the problem of multiple testing. The statistical approach to detection of a signal in noisy series is considered in the framework of Monte Carlo singular spectrum analysis. This approach contains a technique to control both type I and type II errors and also compare criteria. For simultaneous testing of multiple frequencies, a multiple version of MCSSA is suggested to control the family-wise error rate.
1 July 2019, Room M/0.40  Dr. Joni Virta (University of Aalto)  Statistical properties of second-order tensor decompositions. Two classical tensor decompositions are considered from a statistical viewpoint: the Tucker decomposition and the higher order singular value decomposition (HOSVD). Both decompositions are shown to be consistent estimators of the parameters of a certain noisy latent variable model. The decompositions' asymptotic properties allow comparisons between them. Inference for the true latent dimension is also discussed. The theory is illustrated with examples.
8 April 2019  Dr. Andreas Anastasiou (LSE)  Detecting multiple generalized change-points by isolating single ones. In this talk, we introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes in the trend, accompanied by discontinuities or not, in the piecewise-linear model. The method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID’s accuracy as it allows for detection in the presence of frequent changes of possibly small magnitudes. Thresholding and model selection through an information criterion are the two stopping rules described in the talk. A hybrid of both criteria leads to a general method with very good practical performance and minimal parameter choice. Applications of our method on simulated and real-life data sets show its very good performance in both accuracy and speed. The R package IDetect implementing the Isolate-Detect method is available from CRAN.
1 April 2019  Stephen Disney (Cardiff University)  When the Bullwhip Effect is an Increasing Function of the Lead Time. We study the relationship between lead times and the bullwhip effect produced by the order-up-to policy. The usual conclusion in the literature is that longer lead times increase the bullwhip effect; we show that this is not always the case. Indeed, it seems to be rather rare. We achieve this by first showing that a positive demand impulse response leads to a bullwhip effect that is always increasing in the lead time when the order-up-to policy is used to make supply chain inventory replenishment decisions. By using the zeros and poles of the z-transform of the demand process, we reveal when this demand impulse is positive. To make our approach concrete in a non-trivial example, we study the ARMA(2,2) demand process.
22 March 2019  Martina Testori (University of Southampton)  How group composition affects cooperation in fixed networks: can psychopathic traits influence group dynamics? Static networks have been shown to foster cooperation for specific cost-benefit ratios and numbers of connections across a series of interactions. At the same time, psychopathic traits have been discovered to predict defective behaviours in game theory scenarios. This experiment combines these two aspects to investigate how group cooperation can emerge when changing group compositions based on psychopathic traits. We implemented a modified version of the Prisoner’s Dilemma game which has been demonstrated theoretically and empirically to sustain a constant level of cooperation over rounds. A sample of 190 undergraduate students played in small groups where the percentage of psychopathic traits in each group was manipulated. Groups entirely composed of low psychopathic individuals were compared to communities with 50% high and 50% low psychopathic players, to observe the behavioural differences at the group level. Results showed a significant divergence of the mean cooperation of the two conditions, regardless of the small range of participants’ psychopathy scores. Groups with a large density of high psychopathic subjects cooperated significantly less than groups entirely composed of low psychopathic players, confirming our hypothesis that psychopathic traits affect not only individuals’ decisions but also the group behaviour. This experiment highlights how differences in group composition with respect to psychopathic traits can have a significant impact on group dynamics, and it emphasizes the importance of individual characteristics when investigating group behaviours.
18  Joe Paat  The proximity function for IPs. Proximity between an integer program (IP) and a linear program (LP) measures the distance between an optimal IP solution and the closest optimal LP solution. In this talk, we consider proximity as a function that depends on the right-hand side vector of the IP and LP. We analyze how this proximity function is distributed and create a spectrum of probabilistic-like results regarding its value. This work uses ideas from group theory and Ehrhart theory, and it improves upon a recent result of Eisenbrand and Weismantel in the average case. This is joint work with Timm Oertel and Robert Weismantel.
15 March 2019  Prof Philip Broadbridge (La Trobe University)  Shannon entropy as a diagnostic tool for PDEs in conservation form. After normalization, an evolving real nonnegative function may be viewed as a probability density. From this we may derive the corresponding evolution law for Shannon entropy. Parabolic equations, hyperbolic equations and fourth-order “diffusion” equations evolve information in quite different ways. Entropy and irreversibility can be introduced in a self-consistent manner and at an elementary level by reference to some simple evolution equations such as the linear heat equation. It is easily seen that the 2nd law of thermodynamics is equivalent to loss of Shannon information when temperature obeys a general nonlinear 2nd order diffusion equation. With the constraint of prescribed variance, this leads to the central limit theorem. With fourth order diffusion terms, new problems arise. We know from applications such as thin film flow and surface diffusion that fourth order diffusion terms may generate ripples and they do not satisfy the Second Law. Despite this, we can identify the class of fourth order quasilinear diffusion equations that increase the Shannon entropy.
4 March 2019  Dr. Emrah Demir (Cardiff Business School)  Creating Green Logistics Value through Operational Research. Green logistics is related to producing and dispatching goods in a sustainable way, while paying attention to environmental factors. In a green context, the objectives are not only based on economic considerations, but also aim at minimising other detrimental effects on society and on the environment. A conventional focus in planning the associated activities, particularly for freight transportation, is to reduce expenses and, consequently, increase profitability by considering internal transportation costs. With an ever-growing concern about the environment by governments, markets, and other private entities worldwide, organizations have started to realize the importance of the environmental and social impacts associated with transportation on other parties or on society. Efficient planning of freight transportation activities requires a comprehensive look at a wide range of factors in the operation and management of transportation to achieve safe, fast, and environmentally suitable movement of goods. Over the years, the minimization of the total travelled distance has been accepted as the most important objective in the field of vehicle routing and intermodal transportation. However, the interaction of operational research with mechanical and traffic engineering shows that there exist factors which are critical to explain fuel consumption. This triggered the birth of the green vehicle routing and green intermodal studies in operational research. In recent years, the number, quality and the flexibility of the models have increased considerably. This talk will discuss green vehicle routing and green intermodal transportation problems along with models and algorithms which truly represent the characteristics of green logistics.
25  Oded Lachish (Birkbeck, University of London)  Smart queries versus property independent queries. In the area of property testing, a central goal is to design algorithms, called tests, that decide, with high probability, whether a word over a finite alphabet is in a given property or far from the property. A property is a subset of all the possible words over the alphabet. For instance, the word can be a book, and the property can be the set of all the books that are written in English; a book is 0.1 far from being written in English if at least 0.1 of its words are not in English. The 0.1 is called the distance parameter and it can be any value in [0,1]. The input of a test is the distance parameter, the length of the input word and access to an oracle that answers queries of the sort: please give me the i'th letter in the word. The quality of a test is measured by its query complexity, which is the maximum number of queries it uses as a function of the input word length and the distance parameter; ideally this number does not depend on the input length. Tests that achieve this ideal have been discovered for numerous properties. In general, tests that achieve the ideal for different properties differ in the manner in which they select their queries. That is, the choice of queries depends on the property. In this talk, we will see that for the price of a significant increase in the number of queries it is possible to get rid of this dependency. We will also give scenarios in which this tradeoff is beneficial.
18 February 2019 (Time 13:10 to 14:00)  Prof. Giles Stupfler (University of Nottingham)  Asymmetric least squares techniques for extreme risk estimation. Financial and actuarial risk assessment is typically based on the computation of a single quantile (or Value-at-Risk). One drawback of quantiles is that they only take into account the frequency of an extreme event, and in particular do not give an idea of what the typical magnitude of such an event would be. Another issue is that they do not induce a coherent risk measure, which is a serious concern in actuarial and financial applications. In this talk, I will explain how, starting from the formulation of a quantile as the solution of an optimisation problem, one may come up with two alternative families of risk measures, called expectiles and extremiles. I will give a broad overview of their properties, as well as of their estimation at extreme levels in heavy-tailed models, and explain why they constitute sensible alternatives for risk assessment using some real data applications. This is based on joint work with Abdelaati Daouia, Irène Gijbels and Stéphane Girard.
21 January 2019  Stefano Coniglio (University of Southampton)  Bilevel programming and the computation of pessimistic single-leader-multi-follower equilibria in Stackelberg games. We give a very broad overview of bilevel programming problems and their relationship with Stackelberg games, with focus on two classical limitations of this paradigm: the presence of a single follower and the assumption of optimism.

11 December 2018  Anatoly Zhigljavsky (University of Cardiff)  Multivariate dispersion 
3 December 2018  Dr Ilaria Prosdocimi (University of Bath)  Detecting coherent changes in flood risk in Great Britain. Flooding is a natural hazard which has affected the UK throughout history, with significant costs for both the development and maintenance of flood protection schemes and for the recovery of the areas affected by flooding. The recent large repeated floods in Northern England and other parts of the country raise the question of whether the risk of flooding is changing, possibly as a result of climate change, so that different strategies would be needed for the effective management of flood risk. To assess whether any change in flood risk can be identified, one would typically investigate the presence of some changing patterns in peak flow records for each station across the country. Nevertheless, the coherent detection of any clear pattern in the data is hindered by the limited sample size of the peak flow records, which typically cover about 45 years. We investigate the use of multilevel hierarchical models to better use the information available at all stations in a unique model which can detect the presence of any sizeable change in the peak flow behaviour at a larger scale. Further, we also investigate the possibility of attributing any detected change to naturally varying climatological variables.
26  Prof Benjamin Gess (Max Planck Institute)  Random dynamical systems for stochastic PDE with nonlinear noise. In this talk we will revisit the problem of generation of random dynamical systems by solutions to stochastic PDE. Despite being at the heart of a dynamical system approach to stochastic dynamics in infinite dimensions, most known results are restricted to stochastic PDE driven by affine linear noise, which can be treated via transformation arguments. In contrast, in this talk we will address instances of stochastic PDE with nonlinear noise, with particular emphasis on porous media equations driven by conservative noise. This class of stochastic PDE arises in particular in the analysis of stochastic mean curvature motion and mean field games with common noise, and is linked to fluctuations in non-equilibrium statistical mechanics.