Construction of SSR Fingerprint Library and Comprehensive Evaluation for Approved Cotton Varieties in China
[Objective] Cotton, an allotetraploid crop with a complex genome structure, has difficulty attaining high homozygosity because of frequent cross-pollination. The lack of effective technical supervision in the cotton seed market and persistent variety confusion have undermined the consistency of fiber quality. This study has three objectives: to establish a DNA fingerprint database for cotton varieties approved in China over the past 20 years, to develop a high-throughput SSR-based identification model for cotton varieties, and to provide a basis for authenticating existing varieties and for distinctness testing of new ones. In addition, we analyze the genetic diversity and population differentiation of the approved varieties, with the ultimate goal of providing a theoretical basis for identifying varieties suited to different ecological regions and for breeding varieties adapted to new environments.

[Method] Using multiplex PCR and capillary electrophoresis, we genotyped standard samples of 1 015 approved cotton varieties with 60 screened SSR markers to construct a DNA fingerprint library. The SSR fingerprints of the approved varieties were compared pairwise in the plant variety DNA fingerprint library management system to analyze the genetic differences among them and to screen core SSR loci for variety identification. Cluster analysis and population structure analysis were used to assess the genetic diversity of the 1 015 approved varieties, and the genetic differentiation index (FST) between subpopulations was calculated.

[Result] The 60 SSR markers amplified 216 alleles across the 1 015 approved varieties, an average of 3.6 alleles per marker, with a mean polymorphism information content (PIC) of 0.37. Pairwise comparison of the 1 015 fingerprints yielded 513 591 comparisons, and the maximum number of differing loci between two samples was 58. Most pairs (428 115, or 83.36%) differed at 41%-70% of loci, and the 51%-60% range contained the largest number of pairs (197 829, or 38.52%). More than 99% of all pairs differed at over 20% of loci, whereas pairs differing at fewer than 20% of loci accounted for only 0.58%. Using a combinatorial identification approach, we selected a core set of 10 SSR loci whose discrimination power among the 1 015 varieties reached 99%. Cluster and population structure analyses clearly divided the 1 015 varieties into five subpopulations. G1 (n=240), an early-maturing subpopulation mainly distributed in northern and inland China, had the richest genetic diversity, with an average genetic distance of 0.419 between varieties. G2 (n=277), a medium-maturing subpopulation from the Yangtze River Basin, contained a relatively large number of hybrids and had an average within-subpopulation genetic distance of 0.309. G3 (n=109), comprising early- and medium-maturing cotton from the Heilonggang region of Hebei, had a relatively narrow genetic background, with the smallest average genetic distance among the upland cotton subpopulations, only 0.150. G4 (n=254), a medium-early-maturing subpopulation mainly distributed in the Yellow River Basin, had an average within-subpopulation genetic distance of 0.307. G5 (n=37) consisted of sea island cotton samples and had the smallest average within-population genetic distance of all, only 0.149. Genetic differentiation was highest between sea island cotton and upland cotton, with an average FST of 0.503. Among the upland cotton subpopulations, G3 was the most differentiated from the others (FST 0.193-0.242), whereas differentiation between the Yangtze River Basin and Yellow River Basin subpopulations was the lowest (FST 0.112).

[Conclusion] A DNA fingerprint library of standard samples of 1 015 cotton varieties approved in China over the past 20 years was constructed. A core set of 10 SSR loci was selected that clearly distinguishes more than 99% of the varieties, and a high-throughput "core loci + extended loci" model for cotton variety identification was established. The 1 015 varieties were divided into five subpopulations, and upland cotton showed clear geographical distribution patterns.
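The mean PIC of 0.37 summarizes how informative the 60 SSR markers are. The abstract does not state which PIC formula was used; the minimal sketch below assumes the widely used Botstein et al. (1980) definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, computed from allele frequencies pooled over all varieties. Some SSR studies report the simpler 1 - sum(p_i^2) instead, so values from this sketch may not match the paper's exactly. The function names `pic` and `mean_pic` are illustrative, not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pic(allele_calls):
    """Polymorphism information content of one SSR locus (Botstein et al. 1980 formula).

    allele_calls: allele calls (e.g. fragment sizes in bp) pooled over all samples,
    with missing calls already removed.
    """
    if not allele_calls:
        raise ValueError("no allele calls for this locus")
    counts = Counter(allele_calls)
    total = len(allele_calls)
    freqs = [n / total for n in counts.values()]
    # sum(p_i^2): probability that two random allele draws are identical.
    homozygosity = sum(p * p for p in freqs)
    # Botstein correction: sum over allele pairs i < j of 2 * p_i^2 * p_j^2.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


def mean_pic(loci):
    """Average PIC over a list of loci, each given as a list of allele calls."""
    return sum(pic(calls) for calls in loci) / len(loci)
```

For example, a locus with two equally frequent alleles has PIC 0.375, and a monomorphic locus has PIC 0.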
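The pairwise fingerprint comparison and the distribution of "percentage of differing loci" can be illustrated with a minimal Python sketch. It assumes each fingerprint is a list of per-locus genotypes (here tuples of fragment sizes) in a fixed marker order, with `None` for a missing call, and that the percentage is taken over loci called in both samples. The 10%-wide binning convention (0-10%, 11-20%, ..., 51-60%, ...) is an assumption chosen to mirror the ranges quoted in the Result; the sketch does not reproduce the plant variety DNA fingerprint library management system, whose internals are not described.

```python
from collections import Counter
from itertools import combinations


def compare(fp_a, fp_b):
    """Return (differing, compared) loci, skipping loci missing (None) in either sample."""
    compared = differing = 0
    for ga, gb in zip(fp_a, fp_b):
        if ga is None or gb is None:
            continue
        compared += 1
        if ga != gb:
            differing += 1
    return differing, compared


def bin_label(differing, compared):
    """Map differing/compared to a 10%-wide bin label such as '51-60%' (assumed convention)."""
    # Integer ceiling of 10 * differing / compared avoids float rounding at bin edges.
    k = max(1, -(-10 * differing // compared))
    low = 0 if k == 1 else 10 * (k - 1) + 1
    return f"{low}-{10 * k}%"


def pairwise_distribution(fingerprints):
    """Tally every pair of varieties by bin and report the largest number of differing loci."""
    tally = Counter()
    max_diff = 0
    for fp_a, fp_b in combinations(fingerprints.values(), 2):
        differing, compared = compare(fp_a, fp_b)
        if compared == 0:
            continue  # no locus called in both samples
        tally[bin_label(differing, compared)] += 1
        max_diff = max(max_diff, differing)
    return tally, max_diff


fps = {
    "V1": [(180, 180), (200, 204), (150, 150)],
    "V2": [(180, 180), (204, 204), (150, 150)],
    "V3": [(184, 184), (204, 204), None],
}
tally, max_diff = pairwise_distribution(fps)
```

In this toy example V1 and V2 differ at 1 of 3 loci (bin 31-40%), V1 and V3 at 2 of 2 shared loci (91-100%), and V2 and V3 at 1 of 2 (41-50%), so the maximum number of differing loci is 2.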
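The abstract does not spell out the "combination identification method" used to choose the 10 core loci. As a hypothetical stand-in, the sketch below uses greedy forward selection: at each step it adds the locus that maximizes the fraction of varieties whose multi-locus profile is unique, stopping once a target fraction is reached (0.99, mirroring the reported discrimination) or 10 loci have been chosen. Defining "discrimination" as profile uniqueness and treating `None` (a missing call) as just another allele are both assumptions, as are the names `unique_fraction` and `greedy_core_loci`.

```python
from collections import Counter


def unique_fraction(fingerprints, loci):
    """Fraction of varieties whose genotype profile at `loci` is shared with no other variety."""
    profiles = [tuple(fp[i] for i in loci) for fp in fingerprints.values()]
    counts = Counter(profiles)
    return sum(1 for p in profiles if counts[p] == 1) / len(profiles)


def greedy_core_loci(fingerprints, n_loci, target=0.99, max_size=10):
    """Greedy forward selection of core loci (hypothetical stand-in for the paper's method).

    fingerprints: dict mapping variety name -> list of per-locus genotypes.
    Returns (chosen locus indices in selection order, final unique fraction).
    """
    chosen = []
    remaining = list(range(n_loci))
    score = 0.0
    while remaining and len(chosen) < max_size and score < target:
        # Score each candidate locus together with the loci already chosen.
        best = max(remaining, key=lambda i: unique_fraction(fingerprints, chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
        score = unique_fraction(fingerprints, chosen)
    return chosen, score


fps = {"A": [1, 1, 1], "B": [1, 2, 1], "C": [2, 1, 1], "D": [2, 2, 2]}
chosen, score = greedy_core_loci(fps, n_loci=3)
```

Greedy selection is cheap (each step scores every remaining locus once) but is not guaranteed to find the smallest locus set that reaches the target; an exhaustive search over locus combinations can do better at much higher cost.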
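FST values such as 0.503 (sea island vs. upland cotton) and 0.112 (Yangtze vs. Yellow River Basin) quantify differentiation between subpopulations. The abstract does not name the estimator, so the minimal sketch below uses Nei's G_ST, (H_T - H_S) / H_T, as a simple stand-in, summing H_S and H_T over loci before taking the ratio. Population genetics software often uses other estimators (for example Weir and Cockerham's theta), which can give different values; the function names here are illustrative.

```python
from collections import Counter


def allele_freqs(alleles):
    """Relative frequency of each allele call in one population at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_het(freqs):
    """Nei's gene diversity (expected heterozygosity), 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def multilocus_gst(pop_a, pop_b):
    """Nei's G_ST between two populations, as a ratio of sums over loci.

    pop_a and pop_b are lists with one entry per locus; each entry is the list of
    allele calls observed in that population (missing calls removed beforehand).
    """
    hs_total = ht_total = 0.0
    for calls_a, calls_b in zip(pop_a, pop_b):
        fa, fb = allele_freqs(calls_a), allele_freqs(calls_b)
        # H_S: mean within-population diversity (the two populations weighted equally).
        hs_total += (expected_het(fa) + expected_het(fb)) / 2
        # H_T: diversity computed from the unweighted mean allele frequencies.
        pooled = {x: (fa.get(x, 0.0) + fb.get(x, 0.0)) / 2 for x in set(fa) | set(fb)}
        ht_total += expected_het(pooled)
    return (ht_total - hs_total) / ht_total if ht_total > 0 else 0.0
```

Two populations fixed for different alleles give G_ST = 1, identical allele frequencies give 0, and frequencies of 0.75 vs. 0.25 for the same allele give 0.25.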