Prediction of Cell-Specific Subcellular Localization of lncRNA Based on Multi-Feature Fusion

Long non-coding RNAs (lncRNAs) play a crucial role in cellular biological processes and disease development. Because the subcellular localization of an lncRNA is closely related to its biological function, determining that localization is important. Several machine learning methods already exist for identifying the subcellular localization of lncRNAs, but work on the cell-specific localization of human lncRNAs remains limited. This study addresses lncRNA subcellular localization in human cell lines. It extracts k-mer, CKSNAP, SRS, and TSS features, fuses them, and predicts the subcellular localization with an algorithm that combines XGBoost and LightGBM. The model was evaluated by 10-fold cross-validation. Compared with existing prediction methods, the model improves the prediction success rate for lncRNA subcellular localization in human cell lines, with the highest AUROC on the benchmark dataset reaching 92.26%.
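The sequence-derived part of the feature fusion described above can be illustrated with a minimal sketch. It computes normalized k-mer frequencies and CKSNAP (composition of k-spaced nucleic acid pairs) frequencies, then concatenates them into one vector. The function names, the choice of `k=3` and `max_gap=2`, and fusion by simple concatenation are assumptions made for illustration; the abstract does not state the paper's actual settings, and the SRS and TSS features are not covered here.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style input is converted below


def kmer_features(seq, k=3):
    """Normalized frequency of every possible k-mer (4**k values)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def cksnap_features(seq, max_gap=2):
    """Normalized frequency of nucleotide pairs separated by 0..max_gap bases.

    Produces 16 values per gap size, so 16 * (max_gap + 1) values in total.
    """
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def fuse_features(seq, k=3, max_gap=2):
    """Assumed fusion strategy: concatenate the two feature vectors."""
    seq = seq.upper().replace("T", "U")
    return kmer_features(seq, k) + cksnap_features(seq, max_gap)


if __name__ == "__main__":
    vector = fuse_features("ACGTACGTAGCTAGCTA")
    print(len(vector))  # 64 k-mer values + 48 CKSNAP values
```

A fused vector of this kind could then be passed to a gradient boosting classifier; how the study combines XGBoost and LightGBM is not specified in the abstract, so that step is left out of the sketch.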

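Keywords: cell line specificity; long non-coding RNA; secondary structure; feature fusion; gradient boosting decision tree

Authors: Yang Jiahong (杨佳宏), Chen Yingli (陈颖丽), Gai Zhimin (盖智敏), Liu Shuhan (刘姝含)

Affiliation: School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021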

Funding: National Natural Science Foundation of China (62361047, 32160216)

2024

Journal of Inner Mongolia University (Natural Science Edition)
Inner Mongolia University


CSTPCD
Impact factor: 0.346
ISSN: 1000-1638
Year, volume (issue): 2024, 55(2)