
BERT and CNN-Based Deleterious Splicing Mutation Prediction Method

A key challenge in genetic diagnosis is assessing the pathogenicity of splicing-related genetic mutations. Most existing tools for predicting deleterious splicing mutations are built on traditional machine learning methods and depend mainly on manually extracted splicing features. This limits their predictive performance, which is especially poor for non-canonical splicing mutations. To address this, a deleterious splicing mutation prediction method based on bidirectional encoder representations from transformers (BERT) and a convolutional neural network (CNN), named BCsplice, is proposed. The BERT module in BCsplice comprehensively extracts the contextual information of a sequence. Combined with a CNN that extracts local features, the model can thoroughly learn the semantic information of the sequence and predict the pathogenicity of splicing mutations. The effect of a non-canonical splicing mutation often depends more heavily on deep semantic information in the sequence context. Using the CNN to combine and extract the multi-level semantic information from BERT yields a rich representation, which helps identify non-canonical splicing mutations. Comparative experiments indicate that BCsplice performs favorably, showing a certain performance advantage in non-canonical splicing regions in particular. The method can thus aid the identification of deleterious splicing mutations and clinical genetic diagnosis.
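The idea of combining multi-level BERT representations through a CNN can be illustrated with a small, framework-free sketch. Everything in it is an assumption made for illustration: the abstract does not specify which BERT layers are used, how they are combined, or the CNN configuration. Here the toy matrices `layer_a` and `layer_b` stand in for per-token hidden states from two encoder layers, and the helper names `concat_layers`, `conv1d_relu`, and `global_max_pool` are hypothetical. It is not the BCsplice implementation.

```python
# Illustrative sketch only: toy stand-ins for BERT hidden states,
# not the BCsplice implementation. All names and values are hypothetical.
from typing import List

Matrix = List[List[float]]  # shape: (tokens, features)


def concat_layers(layers: List[Matrix]) -> Matrix:
    """Concatenate per-token vectors from several encoder layers along the feature axis."""
    n_tokens = len(layers[0])
    if any(len(layer) != n_tokens for layer in layers):
        raise ValueError("all layers must cover the same number of tokens")
    return [[x for layer in layers for x in layer[t]] for t in range(n_tokens)]


def conv1d_relu(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide one filter of shape (width, features) along the token axis, then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for k in range(width):
            s += sum(a * b for a, b in zip(x[start + k], kernel[k]))
        out.append(max(0.0, s))
    return out


def global_max_pool(feature_map: List[float]) -> float:
    """Keep the strongest local response of the filter across all positions."""
    return max(feature_map)


# Toy hidden states for 4 tokens from two (assumed) encoder layers, 2 features each.
layer_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
layer_b = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

stacked = concat_layers([layer_a, layer_b])             # 4 tokens x 4 features
kernel = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]   # one filter of width 2
fmap = conv1d_relu(stacked, kernel)                     # [3.5, 1.0, 4.0]
pooled = global_max_pool(fmap)                          # 4.0
```

In a real model the filter weights would be learned, many filters of several widths would typically run in parallel, and the pooled features would feed a classifier that outputs a pathogenicity score. Those details are not given in the abstract.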

Deleterious Splicing Mutation; Deep Learning; Prediction Model; Pathogenicity Prediction

Song Chengcheng (宋程程), Zhao Yiran (赵依然), Li Xiaoyan (李晓艳), Xia Junfeng (夏俊峰)


Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601


National Natural Science Foundation of China (Grant No. U22A2038)

2024

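Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Chinese Association of Automation; National Research Center for Intelligent Computing Systems; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences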


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.954
ISSN: 1003-6059
Year, volume (issue): 2024, 37(2)