Comparison of Cell Type Identification Methods for Single-cell RNA-sequencing Data
Single-cell RNA-sequencing (scRNA-seq) technology provides gene expression profiles at single-cell resolution, which helps reveal cellular heterogeneity more accurately. Clustering is the main method for identifying cell types in biological tissues, and choosing a suitable clustering algorithm can improve the analysis of single-cell transcriptome sequencing data. In this paper, eight typical single-cell clustering methods are described: k-means, hierarchical clustering (HC), Leiden, SC3, SCENA, LAK, SIMLR, and dropClust. They are compared on 12 single-cell transcriptome sequencing datasets with ground-truth labels. Eight evaluation indexes are used to analyze and evaluate the performance of the eight clustering algorithms: the silhouette coefficient, Calinski-Harabasz index, adjusted Rand index (ARI), adjusted mutual information (AMI), Fowlkes-Mallows index (FMI), V-measure, Jaccard coefficient, and coefficient of variation. The experimental results show that HC, SC3, k-means, and SCENA have the best generalization and robustness in clustering performance, and SIMLR performs best on large-scale datasets. The Leiden algorithm performs best on small datasets, but it depends on the neighbor-node parameter and has low stability. The dropClust algorithm is the worst in terms of generalization and robustness. In addition, the performance of the eight clustering methods is related to data quality: when the coefficient of variation of the data is low, clustering scores generally increase, and vice versa.
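Several of the external indexes listed above (ARI, FMI, and the Jaccard coefficient) are pair-counting measures. Each one checks whether pairs of cells that share a reference label are also grouped together by the clustering. What follows is a minimal sketch in plain Python, not the paper's evaluation code. It computes these three indexes from a contingency table and also includes a coefficient of variation. The function names are hypothetical. The coefficient of variation here is for a single vector of values (for example, one gene across cells) and uses the population standard deviation. That choice is an assumption, because the abstract does not say how the coefficient of variation is computed or aggregated per dataset. For real analyses, scikit-learn's `adjusted_rand_score` and `fowlkes_mallows_score` provide ARI and FMI.

```python
from collections import Counter
from math import comb, sqrt
from statistics import mean, pstdev


def pair_counts(labels_true, labels_pred):
    """Return (same_both, same_true, same_pred, total_pairs) for two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two cells to count pairs")
    # Contingency table: number of cells for each (reference label, predicted cluster) combination.
    joint = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in joint.values())                 # pairs together in both labelings
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())  # pairs together in the reference
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())  # pairs together in the clustering
    return same_both, same_true, same_pred, comb(n, 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance agreement (Hubert-Arabie form)."""
    s_both, s_true, s_pred, total = pair_counts(labels_true, labels_pred)
    expected = s_true * s_pred / total
    max_index = (s_true + s_pred) / 2
    if max_index == expected:
        # Only happens when both labelings are trivial in the same way
        # (all cells in one cluster, or all singletons): treat as perfect agreement.
        return 1.0
    return (s_both - expected) / (max_index - expected)


def fowlkes_mallows(labels_true, labels_pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)) over co-clustered cell pairs."""
    s_both, s_true, s_pred, _ = pair_counts(labels_true, labels_pred)
    if s_true * s_pred == 0:
        return 0.0
    return s_both / sqrt(s_true * s_pred)


def pair_jaccard(labels_true, labels_pred):
    """Pair-counting Jaccard = TP / (TP + FP + FN)."""
    s_both, s_true, s_pred, _ = pair_counts(labels_true, labels_pred)
    denom = s_true + s_pred - s_both
    # Convention (assumption): no co-clustered pairs in either labeling counts as agreement.
    return 1.0 if denom == 0 else s_both / denom


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / m


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]  # toy reference cell-type labels
    pred = [0, 0, 1, 1, 2, 2]   # toy clustering result
    print(round(adjusted_rand_index(truth, pred), 4))
    print(round(fowlkes_mallows(truth, pred), 4))
    print(round(pair_jaccard(truth, pred), 4))
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

All three pair-counting indexes ignore the actual label values. A clustering that only renames the reference clusters, such as `[1, 1, 0, 0]` against `[0, 0, 1, 1]`, therefore scores 1.0. This is the property that makes them suitable for comparing cluster assignments with cell-type annotations.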
Keywords: Single-cell RNA-sequencing; Clustering; Cell type identification; Data quality; Performance evaluation