Intelligent Single-Cell Classification Based on Multisource Domain Adaptation
Single-cell ribonucleic acid (RNA) sequencing technology has proven effective in generating high-resolution cell maps of human tissues and organs, thereby improving researchers' understanding of cellular heterogeneity in diseased human tissues. Cell annotation is a crucial step in single-cell RNA sequencing data analysis. Many conventional models rely on a single labeled reference dataset to annotate the target dataset, but some cell types in the target dataset may not be represented in that reference. Integrating multiple reference datasets can therefore offer broader coverage of the cell types in the target dataset. However, batch effects arise between the reference datasets and the target dataset because of differences in sequencing technologies and other factors. To address this issue, this study introduces a single-cell classification model based on multisource domain adaptation. The model trains adversarially on multiple reference datasets, each annotated with cell types, together with an unlabeled target dataset, thereby mitigating batch effects. Additionally, virtual adversarial training is employed to make the model's predictions robust to minor perturbations or noise around data points, which helps prevent overfitting. Experimental results on multiple single-cell datasets show that the model improves cell recognition accuracy by at least 5 percentage points compared with current mainstream models, offering new approaches and benchmarks for identifying the cell types of newly sequenced single cells.
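The virtual adversarial training idea mentioned above can be illustrated with a minimal sketch. It is not the model from this study: the linear softmax classifier, the toy weights `W` and `b`, the three-gene input `x`, and the parameters `eps` and `xi` are all assumptions chosen for illustration. The sketch finds a small perturbation of the input that changes the predicted class distribution the most (one power-iteration step), and uses the resulting KL divergence as a smoothness penalty that needs no labels.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    # Class probabilities of a linear softmax classifier (illustrative stand-in).
    logits = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def kl(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def scale_to(v, length):
    # Rescale v to the given Euclidean length (zero vector stays zero).
    n = math.sqrt(sum(c * c for c in v))
    return [length * c / n for c in v] if n > 0 else [0.0] * len(v)

def vat_perturbation(W, b, x, eps=0.5, xi=1e-3, seed=0):
    # One power-iteration step: probe a small random direction, then follow
    # the gradient of KL(p || q(x + r)) with respect to r.
    rng = random.Random(seed)
    p = predict(W, b, x)
    d = scale_to([rng.gauss(0.0, 1.0) for _ in x], xi)
    q = predict(W, b, [a + c for a, c in zip(x, d)])
    # For a linear softmax model this gradient is W^T (q - p).
    g = [sum(W[k][j] * (q[k] - p[k]) for k in range(len(W)))
         for j in range(len(x))]
    return scale_to(g, eps)

def vat_loss(W, b, x, eps=0.5):
    # Smoothness penalty: divergence between predictions at x and x + r_adv.
    r_adv = vat_perturbation(W, b, x, eps=eps)
    x_adv = [a + c for a, c in zip(x, r_adv)]
    return kl(predict(W, b, x), predict(W, b, x_adv))

# Toy example: 2 cell types, 3 genes (all values hypothetical).
W = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1]]
b = [0.0, 0.0]
x = [0.4, 1.2, 0.0]
r_adv = vat_perturbation(W, b, x)
loss = vat_loss(W, b, x)
```

In a full training loop this penalty would be added to the supervised classification loss on the labeled reference cells and evaluated on both reference and unlabeled target cells; how the study weights these terms is not stated in the abstract.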