Transcriptome Data Analysis of Coreopsis tinctoria Based on High-throughput Sequencing
In this study, about 7.53 Gb of data in total were obtained from the leaf transcriptome of Coreopsis tinctoria 'Roulette' by Illumina high-throughput sequencing. After assembly and removal of redundancy, 36,739 unigenes with an average length of 1,197 bp were obtained. The unigene sequences were compared against seven functional databases, and a total of 27,372 (74.50%) were successfully annotated, including 20,454 in NR and 20,849 in Swiss-Prot. In the GO database, 18,971 unigenes were annotated and divided into 55 functional groups in 3 categories. Based on functional comparison with the KOG database, the unigenes were divided into 25 groups. Moreover, 7,697 unigenes were broadly divided into 5 categories and 19 sub-categories, and could be further annotated to 129 KEGG pathways. Based on homologous sequence comparison, 22,780 CDS were obtained, and a further 9,148 CDS were obtained by prediction with ESTScan. Using MISA software, 7,627 SSRs were detected in 6,150 unigene sequences. This transcriptome data analysis can provide a theoretical reference for gene expression studies, the development of SSR molecular markers, and the improvement and breeding of new varieties of Coreopsis tinctoria.
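To illustrate what SSR detection in unigene sequences involves, the following is a minimal regex-based sketch in Python. It is not MISA itself, and it does not reproduce the settings used in this study. The function name `find_ssrs`, the example sequence, and the minimum repeat thresholds (10 for mononucleotide, 6 for dinucleotide, and 5 for tri- to hexanucleotide motifs) are assumptions chosen for illustration only.

```python
import re

# Assumed thresholds (illustrative, not taken from the study):
# motif length -> minimum number of tandem repeats to call an SSR.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return perfect SSRs as (motif, repeat_count, start, end).

    Coordinates are 1-based and inclusive. Only perfect, uninterrupted
    repeats are reported; compound SSRs are not handled in this sketch.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" or "ATAT"), so each SSR is reported once.
            is_redundant = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_redundant:
                continue
            repeat_count = len(match.group(0)) // unit_len
            hits.append((motif, repeat_count, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Hypothetical sequence containing an (AT)7 and an (A)12 repeat.
    example = "GGC" + "AT" * 7 + "CCGT" + "A" * 12 + "TTG"
    for motif, count, start, end in find_ssrs(example):
        print(f"({motif}){count} at {start}-{end}")
```

Applied to each unigene in turn, a scan of this kind yields the per-sequence SSR counts from which totals such as those reported above are tallied. A dedicated tool additionally handles compound SSRs and configurable interruption distances.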