Machine Learning for Prediction in Alzheimer's Disease: Identifying Novel Biologically Valid Diagnostic Categories to Inform Precision Medicine
Employing machine learning (ML) algorithms is a promising way of exploring the complex architecture of big data in genetics.
Several ML methods exist for solving classification problems, such as distinguishing cases from controls or separating outcome groups. Support vector machines (SVMs), random forests and neural networks (NNs) can all account for non-linear effects, but each has different strengths and drawbacks.
You will investigate the ability of SVMs, NNs and other machine learning methods to improve biologically based classification of cases and controls in Alzheimer’s Disease (AD), and will make use of the rich phenotype information available in UK Biobank to improve predictions of AD-associated outcomes and other dementia-related phenotypes.
Current diagnostic categories do not map directly onto underlying biology and are at odds with the continuous nature of many disease phenotypes. If we are to relate pathology to underlying mechanisms, we need to move towards constructs that are biologically valid; these are likely to stratify within, and cut across, existing diagnostic categories.
There is evidence for shared genetic risk across neurodegenerative disorders and for genetic strata within disorders. These findings have been at the vanguard of challenging existing categorical diagnostic classifications and of re-conceptualising the relationships between disorders.
However, the findings do not yet point to clinical strata that are useful for predicting outcomes or treatments. Genetics explains only a portion of the risk of developing AD, so the accuracy achievable from genetics alone is limited. The set of features will therefore be expanded to include phenotype measures such as cognitive scores and lifestyle-related variables, and the best-performing regression models will be compared with other classification methods that frequently perform well, namely SVMs, random forests and neural networks.
The inclusion of phenotyping measures, such as cognitive scores, to improve the models is of high interest and the best final model will likely include a combination of genetic, genomic (gene-expression), epigenetic, clinical and environmental features.
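The argument for expanding the feature set can be illustrated with a minimal sketch, which is not the project's actual pipeline. It uses only the Python standard library and entirely synthetic data. The polygenic risk score, the cognitive score, the effect sizes and the helper names (`simulate`, `fit_logistic`, `auc`, `compare_feature_sets`) are all assumptions made for illustration. A plain logistic regression stands in for the regression baseline. In practice, SVMs, random forests and neural networks would be evaluated on the same held-out data with the same metric.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group tied scores and give each member the average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def simulate(n, rng):
    """Synthetic cohort; both features and effect sizes are hypothetical."""
    X, y = [], []
    for _ in range(n):
        prs = rng.gauss(0, 1)  # assumed standardised polygenic risk score
        cog = rng.gauss(0, 1)  # assumed standardised cognitive score
        logit = -1.0 + 0.8 * prs - 1.0 * cog
        p_case = 1.0 / (1.0 + math.exp(-logit))
        y.append(1 if rng.random() < p_case else 0)
        X.append([prs, cog])
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Linear predictor; sufficient for AUC, which depends only on ranking."""
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def compare_feature_sets(seed=0):
    """Held-out AUC for a genetics-only model versus genetics plus phenotype."""
    rng = random.Random(seed)
    X_train, y_train = simulate(1500, rng)
    X_test, y_test = simulate(1000, rng)
    results = {}
    for name, cols in (("genetic", [0]), ("combined", [0, 1])):
        train = [[x[c] for c in cols] for x in X_train]
        test = [[x[c] for c in cols] for x in X_test]
        model = fit_logistic(train, y_train)
        results[name] = auc(predict(model, test), y_test)
    return results["genetic"], results["combined"]


auc_genetic, auc_combined = compare_feature_sets()
print(f"AUC, genetics only:        {auc_genetic:.3f}")
print(f"AUC, genetics + phenotype: {auc_combined:.3f}")
```

AUC is used here because it depends only on how well a model ranks cases above controls, so models with differently scaled outputs can be compared directly. In this simulation the phenotype feature carries signal by construction, so the result shows the evaluation procedure and is not evidence about real AD data.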