Huazhong Agricultural University releases the first cotton pan-genome map to help molecular breeding
Recently, the cotton genetic improvement team of Huazhong Agricultural University published a research paper entitled "Cotton pan-genome retrieves the lost sequences and genes during domestication and selection". The paper presents the cotton genetic variation dataset that is, to date, the richest in terms of variation types. It also resolves the genomic basis of cotton domestication and improvement at multiple scales, providing new genetic loci for biological studies of how important cotton traits form, and new ideas, from a pangenomics perspective, for the precise improvement of those traits.
Cotton is an important cash crop grown widely around the world and a major source of natural textile fiber. Breeding cotton with high fiber quality, high yield, pest and disease resistance, heat tolerance, and ideal plant architecture has long been a goal of breeders. Recent cotton genomics studies have generated a large amount of genomic data, resolved the genetic contribution of artificial domestication to trait improvement, and identified a number of loci associated with agronomic traits.
The cotton genetic improvement team assembled high-quality reference genomes of upland cotton TM-1 and sea island cotton 3-79 in a previous study (Nature Genetics, 2019), which provided good reference sequences for analyzing genomic variation in large populations and identifying superior alleles. However, analysis based on a single reference genome misses much genetic variation, so the genetic diversity among different upland and sea island cotton accessions needs to be dissected comprehensively from a pan-genome perspective.
In this study, the cotton team constructed a variome (a catalog of genetic variation) from 1,913 cotton samples, containing 63 million single nucleotide polymorphisms (SNPs), 4.9 million small insertion/deletion variants (InDels), and 290,000 structural variants (SVs). The team comprehensively analyzed cotton population characteristics at multiple scales, dissected genomic divergence during domestication and improvement, and identified 162 QTLs associated with 16 traits, including fiber quality, yield, and flowering time.
The research team constructed a pan-genome of upland cotton (3,388 Mb of sequence) using a reference-genome alignment strategy, containing 63,489 (61.8%) core genes and 39,278 (38.2%) variable genes. A pan-genome was also constructed for sea island cotton (2,575 Mb of sequence), containing 68,789 (85.8%) core genes and 11,359 (14.2%) variable genes. Gene frequency analysis in wild and cultivated populations showed that a total of 6,231 genes were selectively retained or lost during domestication and improvement.
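The core and variable gene shares reported above can be checked with a short calculation. The Python sketch below is illustrative only and is not the team's pipeline: the function names are hypothetical, and `classify_genes` assumes the common convention that a core gene is one present in every accession, which the article itself does not spell out.

```python
from __future__ import annotations


def core_variable_share(core: int, variable: int) -> tuple[float, float]:
    """Return the core and variable gene shares as percentages (one decimal)."""
    total = core + variable
    return round(100 * core / total, 1), round(100 * variable / total, 1)


def classify_genes(presence: dict[str, list[bool]]) -> dict[str, str]:
    """Label each gene from its presence/absence calls across accessions.

    Assumption (not stated in the article): a gene is "core" only if it is
    present in every accession; otherwise it is "variable".
    """
    return {
        gene: "core" if all(calls) else "variable"
        for gene, calls in presence.items()
    }


# Gene counts as reported in the article.
print(core_variable_share(63489, 39278))  # 3388 Mb pan-genome -> (61.8, 38.2)
print(core_variable_share(68789, 11359))  # 2575 Mb pan-genome -> (85.8, 14.2)

# Toy presence/absence matrix with made-up gene names, for illustration only.
toy = {"geneA": [True, True, True], "geneB": [True, False, True]}
print(classify_genes(toy))  # {'geneA': 'core', 'geneB': 'variable'}
```

Both reported splits are consistent with the gene counts: 102,767 genes in total for the first pan-genome and 80,148 for the second.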
Finally, the researchers used pangenomic data to analyze the frequency changes of several genes associated with traits such as fiber quality during domestication and improvement.