Statistics Research Group
The group is very active both in applications of statistical techniques and in theory.
The main areas of research within the current group are:
 time series analysis
 multivariate data analysis
 applications to market research
 search algorithms and stochastic global optimisation
 probabilistic number theory
 optimal experimental design
 stochastic processes and random fields with weak and strong dependence
 diffusion processes and PDE with random data
 anomalous diffusion
 Burgers and KPZ turbulence;fractional ordinary and PDE, and statistical inference with higherorder information
 extreme value analysis.
Various topics in fisheries and medical statistics are also considered, such as errors in variables regression.
Collaborations
Statisticians within the School have been prominent in collaborating with researchers in other disciplines. There are strong links with the School of Medicine working on applications of multivariate statistics and time series analysis in bioinformatics; with the School of Engineering in the areas of image processing and stochastic global optimisation of complex systems; and with the Business School in the field of analysis of economics time series.
Ongoing international collaborations exist with many Universities including Columbia, Taiwan, Queensland, Aarhus, Roma, Cleveland, Pau, Hokkaido, Boston, Caen, Calambria, Maine, Trento, Nice, Bratislava, Linz, St.Petersburg, Troyes, Vilnius, Siegen, Mannheim, and Copenhagen.
Industrial sponsorship
Significant industrial sponsorship has been obtained from:
 Procter and Gamble (USA) working on statistical modelling in market research
 the Biometrics unit of SmithKline Beecham collaborating on different aspects of pharmaceutical statistics
 ACNielsen/BASES (USA) on applications of mixed Poisson models in studying marketing consumer behaviour
 General Electric HealthCare on environmental statistics.
Our main areas of research within the current group are:
 time series analysis
 multivariate data analysis
 applications to market research
 search algorithms and stochastic global optimisation
 probabilistic number theory
 optimal experimental design
 stochastic processes and random fields with weak and strong dependence
 diffusion processes and PDE with random data
 anomalous diffusion
 Burgers and KPZ turbulence
 fractional ordinary and PDE, and statistical inference with higherorder information.
In focus
Time series analysis
In recent years a powerful technique of time series analysis has been developed and applied to many practical problems. This technique is based on the use of the Singularvalue decomposition of the socalled trajectory matrix obtained from the initial time series by the method of delays. It is aimed at an expansion of the original time series into a sum of a small number of 'independent' and 'interpretable' components.
Also, the spatial analogies of the popular ARMA type stochastic time series have been developed based on the fractional generalizations of the Laplacian with two fractal indices. These models describe important features of processes of anomalous diffusions such as strong dependence and/or intermittency.
Multivariate statistics
The objective is development of a methodology of exploratory analysis of temporalspatial data of complex structure with the final aim of construction of suitable parametric models.
The applications include various medical, biological, engineering and economical data. Several market research projects where the development of statistical models was a substantial part have taken place.
Stochastic global optimisation
Let ƒ be a function given on an ddimensional compact set X and belonging to a suitable functional class F of multiextremal continuous functions.
We consider the problem of its minimization, that is approximation of a point x' such that ƒ(x')=min ƒ(x), using evaluations of ƒ at specially selected points.
Probabilistic methods in search and number theory
Several interesting features of the accuracy of diophantine approximations can be expressed in probabilistic terms.
Many diophantine approximation algorithms produce a sequence of sets F(n), indexed by n, of rational numbers p/q in [0,1]. Famous examples of F(n) are the Farey sequence, the collection of rationals p/q in [0,1] with q<=n, and the collection of all nth continued fraction convergents.
Stochastic processes
New classes of stochastic processes with student distributions and various types of dependence structure have been introduced and studied. A particular motivation is the modelling of risk assets with strong dependence through fractal activity time.
The asymptotic theory of estimation of parameters of stochastic processes and random fields has been developed using higherorder information (that is, information on the higherorder cumulant spectra). This theory allows analysis of nonlinear and nonGaussian models with both short and longrange dependence.
Burgers turbulence problem
Explicit analytical solutions of Burgers equation with quadratic potential has been derived and used to handle scaling laws results for the Burgers turbulence problem with quadratic potential and random initial conditions of OrnsteinUhlenbeck type driven by Levy noise.
Results have considerable potential for stochastic modelling of observational series from a wide range of fields, such as turbulence or anomalous diffusion.
Topics in medical statistics
A number of topics that have been associated with medical statistics presently researched in Cardiff include timespecific reference ranges, and errors in variables regression. Current research focuses on the search for a unified methodology and approach to the errors in variables problem.
Extreme Value Analysis
Extreme value analysis is a branch of probability and statistics that provides nonparametric procedures for extrapolation beyond the range of data (as good as possible and depending on the quality of data, knowing the limits is also an important issue). Its methods are usually relevant for institutions that are exposed to high risks, for instance, financial services and insurance companies or environmental engineering institutions.
Group leader
Professor Anatoly Zhigljavsky
Chair in Statistics
 Email:
 zhigljavskyaa@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 5076
Academic staff
Dr Andreas Artemiou
Lecturer
 Email:
 artemioua@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 0616
Dr Bertrand Gauthier
Lecturer
 Email:
 gauthierb@cardiff.ac.uk
 Telephone:
 +44(0)29 2087 5544
Dr Jonathan Gillard
Senior Lecturer in Statistics / Director of Admissions
 Email:
 gillardjw@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 0619
Professor Nikolai Leonenko
Professor
 Email:
 leonenkon@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 5521
Dr Andrey Pepelyshev
Lecturer
 Email:
 pepelyshevan@cardiff.ac.uk
 Telephone:
 +44 (0)29 2087 5530
Dr Kirstin Strokorb
Lecturer
 Email:
 strokorbk@cardiff.ac.uk
 Telephone:
 02920 688833
All seminars will commence at 12:10pm in room M/0.34, The Mathematics Building, Cardiff University, Senghennydd Road (unless otherwise stated).
Please contact Dr Iskander Aliev for more details regarding Operational Research/WIMCS lectures and Dr Andrey Pepelyshev for more details regarding Statistics lectures.
Seminars
Date  Speaker  Seminar 

10 July 2017  Dr Kirstin Strokorb (Cardiff School of Mathematics)  Stability and Dependence in Extreme Value Theory Extreme value theory is a branch of probability and statistics that aims to provides theoretically sound procedures for extrapolation beyond the range of available data (and also to understand the limits of such procedures). In the first part of this talk I will explain: (i) why stability properties such as maxstability or thresholdstability are naturally considered as modelling assumptions to justify two nonparametric approaches to such problems In the second part I will present joint work with Ilya Molchanov, where we identify some connections between Extreme value theory, Random sets and Properties of risk measures enabling us to recover and extend several results from the literature from a unifying perspective. 
5 July 2017  Dr Tom Beach (Cardiff School of Engineering)  TBA 
14 June 2017  Dmitrii Silvestrov (Stockholm)  Asymptotic Expansions for Stationary and QuasiStationary Distributions of Nonlinearly Perturbed Markov Chains and semiMarkov Processes New algorithms for computing asymptotic expansions for stationary and quasistationary distributions of nonlinearly perturbed Markov chains and semiMarkov processes are presented. The algorithms are based on special techniques of sequential phase space reduction and some kind of “operational calculus” for Laurent asymptotic expansions applied to moments of hitting times for perturbed semiMarkov processes. These algorithms have a universal character. They can be applied to nonlinearly perturbed semiMarkov processes with an arbitrary asymptotic communicative structure of the phase space. Asymptotic expansions are given in two forms, without and with explicit bounds for remainders. The algorithms are computationally effective, due to a recurrent character of the corresponding computational procedures. The related references are [12]. References [1] Gyllenberg, M., Silvestrov, D.S. (2008). QuasiStationary Phenomena in Nonlinearly Perturbed Stochastic Systems. De Gruyter Expositions in Mathematics, 44, Walter de Gruyter, Berlin, ix+579 pp. [2] Silvestrov, D., Silvestrov, S. (2016). Asymptotic expansions for stationary distributions of perturbed semiMarkov processes. In: Silvestrov, S., Rancic, M. (Eds). Engineering Mathematics II. Algebraic, Stochastic and Analysis Structures for Networks, Data Classification and Optimization, Chapter 10. Springer Proceedings in Mathematics & Statistics 179, Springer, Cham, 151  222. 
12 June 2017  Nikita Zvonarev (St. Petersburg State University)  Fisher Scoring in LowRank Signal Estimation The problem of signal estimation is considered. We suggest an algorithm of construction of MLE/LS signal estimation. The algorithm is based on lowrank approximation and suggests a stable solution of the corresponding optimization problem. 
31 May 2017  Prof. Anatoly Zhigljavsky (Cardiff University)  Simplicial variances, distances and correlations with a view towards big data I will describe a brandnew approach for defining variances, distances and correlations for highdimensional data and argue that the new concepts may have serious advantages over the classical ones, especially if points of a multivariate sample lie close to a linear subspace. 
10 May 2017  Martin Kunc (Warwick)  TBA 
19 April 2017  Martin Lotz (University of Manchester)  Randomized dimension reduction in optimization and topological data analysis This talk reviews some recent and current work on extracting information from random projections of data. We first discuss a phase transition phenomenon in convex optimization: the probability that convex regularization succeeds in recovering a signal from few random measurements jumps sharply from zero to one as the number of measurements exceeds a certain threshold. After explaining the geometric and probabilistic ideas behind this phenomenon, we show how they can also be applied to problems in other fields such as topological data analysis. 
22 March 2017  James Woodcock (Cambridge)  Modelling active transport and health: state of the science and future directions The last decade have seen a growth of studies modelling how a shift to active transport can affect population health. Results have generally shown the potential for substantial benefits but uncertainties remain around impacts for different subgroups and on the generalisability of the results to lower and middle income settings. Correspondingly as the science matures there is a move to more rigorous methods and use of new data. In this talk I will consider how research is addressing these issues with a focus on how we model population physical activity exposures and risks, and how we estimate changes in injury risk with changes in active travel volume. 
Past seminars
Date  Speaker  Seminar 

15 February 2017  Alexander Stewart (UCS)  Evolution and the dynamics of social behavior Abstract: Understanding the dynamics of social behavior is very much a puzzle for our time. On the one hand, this puzzle is critical to understanding how human behavior is changing as we update and expand our social networks in unprecedented ways. On the other hand, these complex dynamics can (perhaps) be understood, due to recent expansion in empirical data and computational power. An evolutionary perspective is vital to this effort, providing a framework for understanding how and when behavioral change will spread through a population. However, in order to provide this perspective we must develop mechanistic models of behavioral changes at the individual level that explain the emergence of social phenomena, such as social norms or collective identities, at the level of groups and populations. I will discuss ongoing attempts to model some of the complex social behaviors found in human such as cooperation, using evolutionary game theory, and some insights this approach yields about the emergence and stability of social norms. I will then briefly discuss how these game theoretic models can be connected to experiments to test qualitative predictions about human social behaviour. 
30 November 2016  Ivan Papić (Osijek University)  Heavytailed fractional Pearson diffusions Heavytailed fractional Pearson diffusions are a class of subdiffusions with marginal heavytailed Pearson distributions: reciprocal gamma, FisherSnedecor and Student distributions. They are governed by the timefractional diffusion equations with polynomial coefficients depending on the parameters of the corresponding Pearson distribution. We present the spectral representation of transition densities of fractional FisherSnedecor and reciprocal  gamma diffusions, which depend heavily on the structure of the spectrum of the infinitesimal generator of the corresponding nonfractional Pearson diffusion. Also, we present the strong solutions of the Cauchy problems associated with heavytailed fractional Pearson diffusions and the correlation structure of these diffusions.

16 November 2016  Joerg Kalcsics (University of Edinburgh)  Structural Properties of Voronoi Diagrams in Facility Location Problems with Continuous Demand Authors: Igor Averbakh, Oded Berman, Jörg Kalcsics, Dmitry Krass In most facility location models customer demand is assumed to be discrete and aggregated to a relatively small number of points. However, in many urban applications the number of potential customers can be in the millions and representing every customer residence as a separate demand point is usually infeasible. Thus, it may be more accurate to represent customer demand as spatially distributed over some region. We consider the conditional market share problem where locations of n facilities are fixed and we seek to find the optimal location for the (n+1)st facility with the objective of maximizing its market share. We assume that demand is uniformly distributed over a convex polygon, facilities can be located anywhere in the polygon, and customers obtain service from the closest open facility under rectilinear distances. Once the locations of all facilities are specified, the market area of a facility is given by its Voronoi cell in the Voronoi diagram generated by the facilities. The main difficulty when optimizing the location of the new facility is that it is generally impossible to represent the objective function in closed form; in fact, the representation depends on the structure of the Voronoi diagram, i.e., the position and the geometry of the cell boundaries. Unfortunately, this structure can change drastically with the location of the new facility. In this talk we derive structural properties of Voronoi diagrams for the rectilinear distances and show how to use them to identify regions where the resulting Voronoi diagram is "structurally identical" for every point in a region. Given such regions, we can derive a parametric representation of the objective function which is valid for any location in the region. This, in turn, allows us to optimize the location of the new facility over this region using classical nonlinear programming techniques. While the optimization techniques are specific to the particular model being considered, the structural results we derive, as well as our general approach, are quite universal, and can be applied to many other location models as well. 
2 November 2016  Laszlo Vegh (LSE)  Rescaled coordinate descent methods for Linear Programming Simple coordinate descent methods such as von Neumann's algorithm or Perceptron, both developed in the 50s, can be used to solve linear programming feasibility problems. Their convergence rate depends on the condition measure of the problem at hand, and is typically not polynomial. Recent work of Chubanov (2012, 2014), related to prior work of Betke (2004), has gathered renewed interest in the application of these methods in order to obtain polynomial time algorithms for linear programming. We present two algorithms that fit into this line of research. Both our algorithms alternate between coordinate descent steps and rescaling steps, so that either the descent step leads to a substantial improvement in terms of the convergence, or we can infer that the problem is ill conditioned and rescale in order to improve the condition measure. In particular, both algorithms are based on the analysis of a geometrical invariant of the LP problem, used as a proxy for the condition measure, that appears to be novel in the literature. This is joint work with Daniel Dadush (CWI) and Giacomo Zambelli (LSE). 
19 October 2016  Oleg Klesov (National University of Ukraine)  Generalized renewal processes and Pseudoregularly varying functions. 
13 July 2016  Lenny Fukshansky (Claremont McKenna)  Lattices from Abelian groups, spherical designs, and packing density maxima Lattices are fundamentally important in discrete geometry, cryptography, discrete optimisation and computer science. The lattice sphere packing problem asks for a lattice that maximises the packing density function among all lattices in a given dimension. A systematic approach to this problem involves understanding what properties a lattice should have to be a strong candidate for such a maximum. I will discuss such properties, including a connection to spherical designs, which are certain highly symmetric point configurations on spheres. I will then exhibit a construction of a family of lattices from finite Abelian groups, which have interesting geometric properties and produce local maxima for the packing density function in infinitely many dimensions. This is joint work with A. Böttcher, S. R. Garcia and H. Maharaj. 
1 June 2016  Chenlei Leng (University of Warwick)  Distributed sparse regression by decorrelating features An attractive approach for downscaling a Big Data problem is to first partition the dataset into subsets and then fit using distributed algorithms. Feature space partitioning can be effective for analysing datasets with a large number of features, but suffers from the failure of not taking correlations into account if done naively. In this paper, we propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. The trick is to apply a simple decorrelation step before performing sparse regression on each subset. The theoretical and computational attractiveness of DECO will be illustrated. This is joint work with Xiangyu Wang and David Dunson. 
25 May 2016  Murad Taqqu (Boston University, USA)  A unified approach to selfnormalized block sampling The inference procedure for the mean of a stationary time series is usually quite different under various model assumptions because the partial sum process behaves differently depending on whether the time series is short or longrange dependent, or whether it has a light or heavytailed marginal distribution. We develop an asymptotic theory for the selfnormalized block sampling, and prove that the corresponding block sampling method can provide a unified inference approach for the aforementioned different situations in the sense that it does not require the a priori estimation of auxiliary parameters. This is joint work with Shuyang Bai and Ting Zhang. 
4 May 2016  Yuzhi Cai (Swansea)  Density forecasting with TGARCH models for financial returns This talk presents a novel density forecasting method based on a threshold GARCH (TGARCH) model and an associated value at risk (VaR) model for financial returns. Without estimating the TGARCH model directly, we proposed a Bayesian approach based on a working likelihood to the estimation of the VaR model. This estimation method allows a model that has multiple thresholds to be dealt with easily. As a result, the forecasting method allows us to obtain multiplestep ahead density forecasts for financial returns without using any information on the distribution of the innovation term of a TGARCH model. Since our method produces density forecasts, it also enables us to study the effect of the past financial returns on the entire distribution of the current return. Hence, any predictive quantity of interest can be obtained as byproducts of our method. Compared with other models, much improved density forecasts have been obtained from our method. 
13 April 2016  Matthias Ehrgott (Lancaster)  Generating deliverable treatment plans and beam angle optimisation in the presence of multiple objectives in radiotherapy treatment planning External radiation therapy is one of the major forms of treatment of cancer. Planning a radiation treatment for a patient involves making tradeoffs between the main goals of radiotherapy, namely to irradiate the tumour according to some prescription to affect tumour control and to avoid damage to surrounding healthy tissue. This conflict permeates all aspects of treatment planning from the selection of beam angles to the optimisation of fluence maps to the sequencing of the gantry for treatment delivery. In this talk I will first describe a matheuristic approach to incorporate multiobjective fluence optimisation in the beam angle optimisation problem. The second part of the talk will cover a column generation approach to the multiobjective fluence map optimisation problem, which achieves a reduction of the number of apertures and total fluence required to deliver a Paretooptimal treatment. 
16 March 2016  Mathias Henze (FU Berlin)  Tight asymptotic estimates for the quantitative discrete Hellynumber of the integral lattice Given a discrete subset $S$ of $\mathbb{R}^n$, let $c(S,k)$ be the smallest number $t$ such that for every finite system of linear inequalities that has exactly $k$ solutions in $S$, there exists a subsystem of at most $t$ inequalities that still has exactly $k$ solutions in $S$. This number was introduced for $S=\mathbb{Z}^n$ by Aliev et al.~(2014) in order to study a quantitative version of Doignon's integer variant of the classical Helly theorem. K. Clarkson was the fi rst to notice that using Doignon's theorem one can obtain a probabilistic algorithm for integer linear programming. Following this approach, the quantitative version has been applied to finding the l best integer feasible points of an integer program. The initial estimate $c(\mathbb{Z}^n,k)\in O(k)$ by Aliev et al.~was recently improved to a sublinear bound by Chestnut et al., who also determined the asymptotic growth of $c(\mathbb{Z}^2,k)$ to be $k^\frac13$. Via the theory of Hellynumbers of set families we show that $c(S,k)$ admits a variety of different interpretations, for example, by the maximum number of vertices of a polytope containing $k$ nonvertex points of $S$, or by the maximum number of facets of an inclusionmaximal polyhedron containing $k$ points of $S$ in its interior. Based on these interpretations, we show that for the integral lattice the asymptotic behavior of $c(\mathbb{Z}^n,k)$ is given by $\Theta(k^\frac{n1}{n+1})$, and we determine $c(\mathbb{Z}^n,k)$ for small values of $k$ exactly. For general discrete sets $S$ we derive an upper bound of $c(S,k)\in O(k\cdot h(S))$, where $h(S)$ is the Hellynumber of $S$. This is joint work with G. Averkov, B. Gonz\'{a}lez Merino, I. Paschke, and S. Weltge. 
2 March 2016  Paul Smith (Southampton)  Calibration estimators in official statistics Modelassisted estimation has been in use in surveys for a long time under different names. I will trace some examples showing its evolution, and give a summary of modern calibration estimation as used by National Statistical Institutes in the production of official statistics. I will consider the reasons for calibration and the properties of the resulting estimates from both theoretical and user points of view, and give a range of examples demonstrating particular challenges and issues, and some developments where calibration estimation may continue to improve official outputs. 
24 February 2016  Professor Indrajit Ray (Cardiff Business School)  InformationRevelation and Coordination using Cheap Talk: Theory and Experiment 
17 February 2016  Professor Theodore Turocy (East Anglia)  Twobidder allpay auctions with interdependent valuations: We present results from two related papers. In the first paper, In the second paper, we focus on the relationship between monotonic equilibrium and those "highly competitive" cases. Having a high assessment of the value of the prize is good news, but only if the other participants in the contest are not too likely to 
10 February 2016  Evangelos Evangelou (Bath)  The Value of Information for Correlated GLM In portfolio optimisation, one could potentially invest in several projects with an uncertain amount of revenue depending on the outcome of each project. Initially, the investor has a preliminary estimate about the outcome of each project and a prior value for the whole investment but is given the option to purchase some information (i.e. data) from some projects. When the projects are correlated, these data can be used to update the expected outcome of all projects and derive a posterior value for the investment. The information provided by the data has some value to the investor who must decide whether it is worth purchasing them. In this talk we will consider the case where the outcome of each project is modelled by an exponential family. When the distribution is nonGaussian, the value of information does not have a closed form expression. Using Laplace's approximation for intergrals we will derive an approximation to the value of information and examine the sensitivity of the approximation under different parameter settings and distributions. The method will be illustrated using a spatial decision problem. Joint work with Jo Eidsvik (NTNU) 