Machine Learning for Dimension Reduction in High‐Dimensional Datasets
This research project is in competition for funding with one or more projects available across the EPSRC Doctoral Training Partnership (DTP). Usually, the projects that attract the best applicants are awarded the funding. Find out more information about the DTP and how to apply.
Application deadline: 15 March 2019
Start date: 1 October 2019
This project will focus on improving existing methodology to obtain more accurate and computationally faster estimation algorithms for sufficient dimension reduction (SDR).
Analysing high-dimensional data requires sophisticated statistical methods, as classical statistical methodology was developed when datasets were much smaller. Sufficient dimension reduction (SDR) is a class of methods for feature extraction in regression and classification problems.
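To give a concrete flavour of what an SDR method does, here is a minimal sketch of sliced inverse regression (SIR), a classic SDR estimator. It is offered only as an illustration and is not necessarily the methodology this project would develop. The function name, the simulated data and the choice of ten slices are all assumptions made for the example.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Illustrative SIR sketch (hypothetical helper, not from the project)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardise the predictors: Z = (X - mean) @ Sigma^{-1/2}
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    inv_sqrt_cov = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    Z = (X - mean) @ inv_sqrt_cov

    # Slice the sorted response into groups of roughly equal size
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the within-slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        slice_mean = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)

    # Leading eigenvectors of M, mapped back to the original X scale
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    top = vecs[:, ::-1][:, :n_directions]
    directions = inv_sqrt_cov @ top
    return directions / np.linalg.norm(directions, axis=0)

# Simulated example (assumed setup): y depends on X only through X @ beta
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
index = X @ beta
y = index + 0.5 * index ** 3 + 0.2 * rng.normal(size=2000)

b_hat = sliced_inverse_regression(X, y)[:, 0]
# Absolute cosine between the estimated and true directions
alignment = abs(b_hat @ beta)
print(alignment)
```

In this simulated setting the six predictors are reduced to a single linear combination that carries the information about the response. The printed value measures how closely the estimated direction aligns with the true one, with values near 1 indicating good recovery.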
The project aims to improve existing methodology to give more accurate and computationally faster estimation algorithms for SDR. Among the most interesting suggestions in the literature is the use of machine learning algorithms such as Support Vector Machines (SVM).
Although powerful, the method can be improved in several directions. A few examples are:
- to derive new SDR methodology robust to outliers
- to derive sparse SDR methodology
- to derive SDR methodology when we have missing predictors
- to derive SDR methodology for functional data
- and many more.
Moreover, there are many modern applications (like text data analysis) where the data are very high-dimensional and not derived from a Gaussian distribution. In those cases, the literature is rather thin on computationally effective methods for efficient dimension reduction.
We are looking to develop computationally efficient and accurate methods for both supervised and unsupervised dimension reduction (such as non-Gaussian PCA and non-Gaussian CCA). You can look into a number of directions: sparse methodology, real-time algorithms or applications to real datasets.
Project aims and methods
The project will enable you to develop a theoretical understanding of statistical methodology, since you will be expected to develop methodology with sound theoretical foundations. It will also enable you to appreciate the computational challenges we face when handling high-dimensional datasets, which in turn will help you to develop the necessary computational skills and learn good programming principles. Finally, you will develop an appreciation for reproducibility in research.
You will be part of a research group that has six PhD students (three researching Maths, two researching Medical themes and one researching Biosciences). In addition, you will have the opportunity to meet collaborators of the supervisor in other sciences (Engineering, Medicine, Biosciences and Computer Science, among others), gain hands-on experience of analysing real datasets and understand the challenges faced by practitioners.
You will be expected to read general literature on the topic of dimension reduction provided by the supervisor and then to focus on one of the many directions a project in this influential topic can take. You will be expected to develop your skills in identifying gaps in the literature and proposing appropriate methodology to address them.