

Dr Thomas Connor 

Population and Comparative Genomics

 Fig 1. Population structure of Salmonella bongori and its relationship to Salmonella enterica. Image from Fookes M, Schroeder GN, Langridge GC, Blondel CJ, Mammina C, Connor TR et al. (2011) PLoS Pathog 7 (8) e1002191.


Whole genome sequences provide us with the complete blueprint for the organisms that we are investigating. To understand our organisms of interest, we consider how their genomes vary between organisms (comparative genomics) and how they have changed/evolved over time (population genomics).

Unlike eukaryotic organisms, bacteria have highly variable genomes: they can gain and lose genes at a very high frequency, and members of the same named species may have fewer than half of their genes in common. This genomic plasticity is hugely important, because the genes that vary between strains are often those associated with characteristics of interest, such as virulence or antimicrobial resistance. Using whole genome sequence data, we perform comparative genomics to:

  • Work out how pathogens are related, in terms of the gene content they share
  • Work out how they vary in their gene content
  • Work out how their genetic variation relates to differences in their phenotype (essentially their behaviour, such as the severity of the disease that they cause)
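The first two comparisons in the list above come down to set operations on gene presence and absence. The following is a minimal sketch, assuming each genome has already been reduced to a set of gene identifiers; the function name `partition_gene_content`, the isolate names and the gene labels `g1` to `g7` are all hypothetical and chosen purely for illustration. Real analyses also need a step to decide which genes in different genomes count as "the same" gene, which is not shown here.

```python
from typing import Dict, Set, Tuple


def partition_gene_content(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Dict[str, Set[str]]]:
    """Split gene content into core, accessory and isolate-specific genes."""
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    # Core genes are present in every genome.
    core = set.intersection(*gene_sets)
    # The pan-genome is every gene seen in at least one genome.
    pan_genome = set().union(*gene_sets)
    # Accessory genes are present in some genomes but not all.
    accessory = pan_genome - core
    # Isolate-specific genes are found in exactly one genome.
    unique = {}
    for name, genes in genomes.items():
        others = [g for other, g in genomes.items() if other != name]
        unique[name] = genes - set().union(*others)
    return core, accessory, unique


# Hypothetical presence/absence data for three isolates (not real genes).
genomes = {
    "isolate_A": {"g1", "g2", "g3", "g4", "g5"},
    "isolate_B": {"g1", "g2", "g3", "g4", "g6"},
    "isolate_C": {"g1", "g2", "g3", "g7"},
}

core, accessory, unique = partition_gene_content(genomes)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("unique:", {name: sorted(genes) for name, genes in unique.items()})
```

In this toy example `g4` is accessory but not isolate-specific, because two of the three isolates share it. Linking such differences to phenotype, the third item in the list, requires additional experimental or statistical work that a set comparison alone cannot provide.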

We complement comparative genomics with phylogenetics, which enables us to determine the relationships between isolates. By integrating the results of these in silico analyses with phenotypic data from in vitro and in vivo experimental work, we are able to derive a better understanding of how and why our organisms of interest cause disease.


 Fig 2. Venn diagram showing the genes shared and not shared between three members of the same E. coli pathovar, emphasising that although all of the isolates cause the same sort of disease in the same host, the way in which they do so differs. Image from Dziva F, Hauser H*, Connor TR*, van Diemen PM, Prescott G, Langridge GC et al. (2013) Infect Immun 81 (3) 838-849.


While the comparative genomics work focuses on the similarities and differences between organisms, and how these relate to phenotype, we supplement it with population genetic analyses to identify structure within the population and to infer the recent evolutionary history of strains of interest. This work has been underpinned by a strong, longstanding collaboration with Professor Jukka Corander of the University of Helsinki, with whom I have developed a number of population genetic approaches for analysing bacterial genome-scale datasets (Cheng et al. 2011, Cheng et al. 2013, Marttinen et al. 2012).

I have developed considerable expertise in these approaches and have applied them to datasets including Vibrio cholerae (Mutreja et al. 2011), Salmonella Typhimurium (Mather et al. 2013, Okoro et al. 2012) and Clostridium difficile (He et al. 2013). In these cases, using a Bayesian phylogenetic framework called BEAST, we reconstructed the evolutionary history of these organisms not in abstract evolutionary time, but in human-understandable calendar units (years or days). Using these data, I have been able to contribute significantly to answering key questions about how and when outbreaks began, and to identify key events in the evolution of the pathogens of interest.
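The idea of placing a phylogeny on a calendar scale can be illustrated with something far simpler than BEAST: a root-to-tip regression. This is not the method used in the studies cited above, which relied on BEAST. It is only a rough sketch of why isolates sampled at known dates allow genetic change to be converted into years. The function name `root_to_tip_regression` and the numbers are hypothetical, and the sketch assumes a roughly constant substitution rate and that root-to-tip distances have already been read off a rooted tree.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(
    sampling_dates: Sequence[float],
    root_to_tip: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope estimates the substitution rate
    per unit time, and the x-intercept estimates the date of the root.
    """
    n = len(sampling_dates)
    if n < 2 or n != len(root_to_tip):
        raise ValueError("need at least two matched dates and distances")
    mean_t = sum(sampling_dates) / n
    mean_d = sum(root_to_tip) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(sampling_dates, root_to_tip)
    )
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in these data")
    intercept = mean_d - rate * mean_t
    # Distance is zero at the root, so solve 0 = intercept + rate * date.
    root_date = -intercept / rate
    return rate, root_date


# Made-up example: four isolates with sampling years and SNP distances
# from the root of a hypothetical tree.
years = [2000.0, 2004.0, 2008.0, 2012.0]
snps_from_root = [10.0, 14.0, 18.0, 22.0]

rate, root_date = root_to_tip_regression(years, snps_from_root)
print(f"rate: {rate:.2f} SNPs per year, estimated root date: {root_date:.0f}")
```

A full Bayesian analysis goes much further than this, for example by estimating uncertainty in dates and rates jointly with the tree, so the regression should be read only as an intuition aid.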


 Fig 3. BEAST tree from Okoro CK*, Kingsley RA*, Connor TR, Harris SR, Parry CM, Al-Mashhadani MN et al. (2012) Nat Genet 44 (11) 1215-1221, where we used BEAST to produce a dated phylogeny for the two clusters of invasive non-typhoidal Salmonella that are prevalent in sub-Saharan Africa.


Bacteria do not respect borders, and local outbreaks can, and sadly sometimes do, lead to global epidemics. By combining population genomic approaches with excellent metadata, we are able to move beyond simple dated phylogenies towards a greater understanding of how bacteria move in time and space. I have worked extensively on projects examining the phylogeography of bacterial pathogens such as Vibrio cholerae, Salmonella Typhimurium and Clostridium difficile, combining strain metadata with genomic information to derive insight into how and when pathogens of interest have spread around the world.


 Fig 4. Transmission events associated with the spread of seventh pandemic Vibrio cholerae around the world. Image from Mutreja A*, Kim DW*, Thomson NR*, Connor TR, Lee JH, Kariuki S et al. (2011) Nature 477 (7365) 462-465.


Cheng L, Connor T R, Aanensen D M, Spratt B G and Corander J (2011) Bayesian semi-supervised classification of bacterial samples using MLST databases. BMC Bioinformatics 12 302.

Cheng L, Connor T R, Siren J, Aanensen D M and Corander J (2013) Hierarchical and spatially explicit clustering of DNA sequences with BAPS software. Mol Biol Evol 30 (5) 1224-1228.

Dziva F, Hauser H*, Connor T R*, van Diemen P M, Prescott G, Langridge G C, Eckert S, Chaudhuri R R, Ewers C, Mellata M, Mukhopadhyay S, Curtiss R, 3rd, Dougan G, Wieler L H, Thomson N R, Pickard D J and Stevens M P (2013) Sequencing and functional annotation of avian pathogenic Escherichia coli serogroup O78 strains reveal the evolution of E. coli lineages pathogenic for poultry via distinct mechanisms. Infect Immun 81 (3) 838-849.

Fookes M, Schroeder G N, Langridge G C, Blondel C J, Mammina C, Connor T R, Seth-Smith H, Vernikos G S, Robinson K S, Sanders M, Petty N K, Kingsley R A, Baumler A J, Nuccio S P, Contreras I, Santiviago C A, Maskell D, Barrow P, Humphrey T, Nastasi A, Roberts M, Frankel G, Parkhill J, Dougan G and Thomson N R (2011) Salmonella bongori provides insights into the evolution of the Salmonellae. PLoS Pathog 7 (8) e1002191.

He M, Miyajima F, Roberts P, Ellison L, Pickard D J, Martin M J, Connor T R, Harris S R, Fairley D, Bamford K B, D'Arc S, Brazier J, Brown D, Coia J E, Douce G, Gerding D, Kim H J, Koh T H, Kato H, Senoh M, Louie T, Michell S, Butt E, Peacock S J, Brown N M, Riley T, Songer G, Wilcox M, Pirmohamed M, Kuijper E, Hawkey P, Wren B W, Dougan G, Parkhill J and Lawley T D (2013) Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat Genet 45 (1) 109-113.

Marttinen P, Hanage W P, Croucher N J, Connor T R, Harris S R, Bentley S D and Corander J (2012) Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res 40 (1) e6.

Mather A E, Reid S W, Maskell D J, Parkhill J, Fookes M C, Harris S R, Brown D J, Coia J E, Mulvey M R, Gilmour M W, Petrovska L, de Pinna E, Kuroda M, Akiba M, Izumiya H, Connor T R, Suchard M A, Lemey P, Mellor D J, Haydon D T and Thomson N R (2013) Distinguishable epidemics of multidrug-resistant Salmonella Typhimurium DT104 in different hosts. Science 341 (6153) 1514-1517.

Mutreja A, Kim D W, Thomson N R, Connor T R, Lee J H, Kariuki S, Croucher N J, Choi S Y, Harris S R, Lebens M, Niyogi S K, Kim E J, Ramamurthy T, Chun J, Wood J L, Clemens J D, Czerkinsky C, Nair G B, Holmgren J, Parkhill J and Dougan G (2011) Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477 (7365) 462-465.

Okoro C K, Kingsley R A, Connor T R, Harris S R, Parry C M, Al-Mashhadani M N, Kariuki S, Msefula C L, Gordon M A, de Pinna E, Wain J, Heyderman R S, Obaro S, Alonso P L, Mandomando I, MacLennan C A, Tapia M D, Levine M M, Tennant S M, Parkhill J and Dougan G (2012) Intracontinental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa. Nat Genet 44 (11) 1215-1221.