When the Human Genome Project launched in 1990, it was hailed as one of the greatest scientific endeavors of all time. The 13-year project identified about 20,000 genes and gave researchers a genetic blueprint to transform modern medicine. Doctors can now use genetic information to better diagnose diseases and debilitating conditions, for example by linking a rare case of leg pain to a single mutation. The research also ushered in hope for an age of precision medicine, where every treatment would be tailored to the individual. There was only one problem: the work wasn’t really finished.
That’s because, although humans are 99.9 percent identical, the remaining 0.1 percent of genetic difference explains our uniqueness and can also account for why some people are more susceptible to disease. One map of a single genome, which the 1990s-era project produced, does not adequately represent the breadth of the human population.
An international study published today in Nature is filling in these gaps by analyzing a much more diverse set of genetic sequences. “We’re retooling the foundation of genomics to create a diverse and inclusive representation of human variation as the fundamental reference structure,” says senior study author Benedict Paten, an associate director at the University of California, Santa Cruz Genomics Institute.
By eliminating bias and analyzing more inclusive genomic data, geneticists will gain a better understanding of how mutations affect a person’s genes, which could move us closer to a future with equitable healthcare.
What is a pangenome?
The research focused on creating a pangenome—a collection of DNA sequences within a single species. Past work focused on a reference genome, built from a few individuals, that was supposed to represent a broader set of genes. A pangenome, on the other hand, is created from multiple people worldwide to more accurately reflect our genetic diversity.
It’s not as though past geneticists did not want to sequence more genetic variation; they simply couldn’t. Erich Jarvis, a genetics professor at Rockefeller University and the Howard Hughes Medical Institute and a co-author of the study, says technology in the 1990s and early 2000s did not allow researchers to see large variations between haplotypes (groups of genes inherited together from a single parent) within and across individuals.
The focus of a pangenome is to study the genetic differences among individuals from across the world. Jarvis says knowing about genomic variations is important, because some mutations are associated with different traits and diseases. For example, the lipoprotein (a) gene has a complex structure that has not been sequenced in humans. But variations in the gene are known to be associated with an increased risk of heart disease among Black people. By sequencing the entire gene and understanding its variations, doctors may be able to revisit and treat previously unexplained cases of coronary heart disease.
“This paper helps us to understand that DNA [is] more than a sequence of letters; DNA is structurally organized, and human variation [in] that structure is important for genomic function and trait diversity,” says Sarah Fong, a postdoctoral scholar studying human population variation at the University of California, San Francisco who was not involved in the study.
What does the first draft reveal?
The authors collected data on 47 genetically diverse individuals. About half came from Africa, with the others representing four other continents (excluding Australia and Antarctica). The new data contributed information on 119 million base pairs and 1,115 duplications, mutations in which a portion of DNA on a gene is repeated. As expected, more than 99 percent of the genetic sequences were similar across individuals. But by including the less than 1 percent that varies in this new pangenome draft, the authors found that structural changes to genes accounted for 90 million of those base pairs.
“By moving beyond a single, arbitrary, and linear representation of the genome, the work by the Pangenome Reference Consortium more accurately describes the diversity that exists in our species,” says Rajiv McCoy, an assistant professor of biology at Johns Hopkins University who was not involved in the current study but was recently involved in the first complete sequencing of the human genome.
With the latest pangenome model, it may become easier for geneticists to detect and characterize hard-to-find genetic mutations. When the authors analyzed a separate set of genetic information using the pangenome draft as a reference, they detected 104 percent more structural variants, or slightly more than twice as many. They also improved the accuracy of the comparison sequence, reducing the variant error rate by 34 percent.
Still a work in progress
Creating the first draft of the pangenome is only phase one of this two-part project. The second phase will take a couple of years, as the authors build collaborations with other international researchers and perform community outreach in areas where there is less genomic data, for example among members of Indigenous cultures.
It might take decades before we see the drafts finalized into a complete picture of the human genome. There are several challenges to address, Fong says, such as the development of an efficient strategy to compare multiple human genomes and a concrete plan for testing for genetic variations in the medical field.
Still, Fong says the benefits will be worth the effort. Having complete, diverse human genomes will advance the way genetics is studied, and create a future where people’s genes are more fully considered when treating diseases.