240 mammal genomes open for all in groundbreaking collaboration

11 November 2020

Earlham Institute researchers have contributed to the launch of 240 mammal genomes, the first ever consortium-driven genomics project that allows anyone to access the publicly available data prior to analysis. The resource - which includes 120 completely new genome sequences - will vastly accelerate our understanding of human health and conservation genomics.

The data, published today in Nature, will allow a much more detailed understanding of the human genome - including the genes causing common and rare human diseases - that may lead to both better diagnostics and more specific treatment options.

The publicly available data has already proven useful in understanding the genetic basis of virus transmission in different species, including humans, bats and pangolins - an analysis which was done quickly in the wake of the COVID-19 pandemic.

“The objective is to provide a multitool for comparative genomics. To increase the coverage of the mammalia at the family level,” says Dr Will Nash, a postdoctoral researcher in the Haerty Group at Earlham Institute (EI). “The project covers about 80% of mammal families, which means our understanding of the mammals is more broad.

“We now have a wide, comparative dataset that allows us to understand the DNA bases that are under evolutionary conservation - that haven’t changed in a very long time in about 110 million years of mammalian evolution. We can really zoom in on single nucleotides that have stayed at that exact same position in the DNA sequence over all this time.”

Previously, researchers could only confidently look at preserved regions of about 12 DNA bases, or 12 letters of the DNA code. With this new dataset - because there is such a wealth of data to compare from across this highly interrelated group of animals - we can drill down to a single letter. We’re looking at the genome in much greater resolution. It’s like comparing the images from your new 12-megapixel iPhone camera with the pixelated ones generated by an early smartphone.

To illustrate the usefulness of this, Dr Wilfried Haerty and collaborators at the University of Oxford are already using the new data to investigate changes in the DNA sequence of the regulatory regions in a type of calcium channel, mutations in which have been linked to schizophrenia and bipolar disorder. Understanding precisely where the changes are conserved, or specific to a certain condition, can help to guide therapeutic interventions.

“The main driving force for the 200+ mammal genomes project was to get that resolution from 12 to 1 nucleotide,” says Dr Wilfried Haerty, group leader at EI, who says that while the resource is of great use for studying human disease, we can learn a whole lot more besides by shifting our perspective. “But, as is very much highlighted in the paper, instead of focusing on humans this allows us to stop thinking in such an anthropocentric manner.”

Among the 240 mammal genomes are 34 species of bat, 16 species of cetacean and more than 60 rodents. The strength of the new resource is that it lets researchers compare any species of interest with any of the others, not just with humans. That aspect will be of particular benefit to researchers looking to save species on the brink of extinction, which are undergoing severe genetic bottlenecks.

“Look at every genome project, every alignment available. It’s a human reference based alignment, or a mouse reference based alignment. It’s not based on anything else,” Haerty explains. “With the release of the 240 mammal genomes, it’s a different way of thinking. We can put the focus on other species.

“This is a dataset we’ll be using for years that will very much revolutionise the way we are thinking. It’s very exciting, both in terms of human health and for conservation genomics.”