A heuristic biological sequence clustering algorithm based on Edlib

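Chinese abstract (translated): Purpose: To propose EdClust, a heuristic sequence clustering algorithm based on Edlib. It targets two problems that are common in current heuristic sequence clustering algorithms: overestimation of the number of clusters and low-quality cluster seed sequences. Methods: EdClust reads the first sequence and makes it the seed of the first cluster. It then reads the next sequence and uses Edlib to compute the similarity between that sequence and the seed sequence. If the similarity is greater than a given threshold, the sequence is assigned to that cluster. Otherwise, a new cluster is created with the sequence as its seed. These steps are repeated until every sequence has been clustered. Results: Two sets of experiments show that EdClust performs well on both the number of clusters and the quality of the seed sequences. Conclusions: Because EdClust uses Edlib for sequence alignment, it quickly obtains the similarity between a query sequence and a seed sequence. This improves seed quality and reduces overestimation of the number of clusters.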

English abstract: Purposes: To develop EdClust, a new heuristic sequence clustering algorithm based on Edlib. It addresses the overestimation of inferred clusters and the low seed quality found in many heuristic clustering algorithms. Methods: In EdClust, the first input sequence becomes the seed of the first cluster. Each subsequent input sequence is compared against all existing seeds using Edlib, a C/C++ library for sequence alignment. If the similarity is greater than the given threshold, the sequence is added to the corresponding cluster. Otherwise, a new cluster is created and the sequence becomes its seed. These steps are repeated until all sequences have been clustered. Results: EdClust was tested on two widely used databases, where it produced fewer clusters and achieved higher clustering sensitivity. Conclusions: EdClust uses Edlib for pairwise alignment, which can find the region of a seed that is most similar to a query sequence, wherever that region lies along the seed. The results show that EdClust improves seed quality and reduces the overestimation of the number of clusters.
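The greedy incremental procedure described in the abstracts can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the authors' EdClust code. The abstracts do not define the similarity score. The sketch assumes it is 1 minus the infix edit distance divided by the query length. It also assumes a best-hit rule when several seeds pass the threshold. The function names `infix_edit_distance`, `similarity` and `greedy_cluster` are hypothetical. A pure-Python dynamic program stands in for Edlib's infix ("HW") alignment mode, in which the query may align to any region of the seed.

```python
from typing import List


def infix_edit_distance(query: str, target: str) -> int:
    """Edit distance of `query` against its best-matching substring of `target`.

    Skipping target characters before and after the match is free, which
    mirrors Edlib's "HW" (infix) mode. Plain O(len(query) * len(target)) DP.
    """
    prev = [0] * (len(target) + 1)  # row 0: the match may start anywhere in target
    for i in range(1, len(query) + 1):
        curr = [i] + [0] * len(target)  # column 0: i query chars vs. empty target
        for j in range(1, len(target) + 1):
            cost = 0 if query[i - 1] == target[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # query char aligned to a gap
                          curr[j - 1] + 1)     # target char aligned to a gap
        prev = curr
    return min(prev)  # the match may end anywhere in target


def similarity(query: str, seed: str) -> float:
    """Hypothetical score: 1 - (infix edit distance / query length)."""
    if not query:
        return 1.0
    return 1.0 - infix_edit_distance(query, seed) / len(query)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[int]]:
    """Greedy incremental clustering; clusters[k][0] is the seed's index."""
    clusters: List[List[int]] = []
    for idx, seq in enumerate(seqs):
        # Compare the sequence with every existing seed and keep the best hit
        # (assumption: best-hit rule; ties keep the earliest cluster).
        best_k, best_sim = -1, -1.0
        for k, members in enumerate(clusters):
            sim = similarity(seq, seqs[members[0]])
            if sim > best_sim:
                best_k, best_sim = k, sim
        if best_k >= 0 and best_sim > threshold:
            clusters[best_k].append(idx)  # join the most similar seed's cluster
        else:
            clusters.append([idx])  # start a new cluster seeded by this sequence
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTACGT", "CGTACGTAC", "TTTTGGGGCCCC", "ACGTTCGTACGT"]
    print(greedy_cluster(reads, threshold=0.9))  # prints [[0, 1, 3], [2]]
```

In the usage example, the second read is an exact substring of the first seed (distance 0). The fourth read differs from that seed by one substitution, giving a similarity of 11/12, which is above 0.9. If the `edlib` Python binding is installed, `edlib.align(query, seed, mode="HW", task="distance")["editDistance"]` should give the same distance much faster. The DP is used here only so that the sketch has no dependencies. Each sequence is compared only with existing seeds, so the input order determines which sequences become seeds.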

Keywords:

sequence clustering; heuristic clustering; clustering quality; high-throughput sequencing

Authors:

Wei Zegang (卫泽刚), Chen Xu (陈旭), Zhang Xiaodan (张小丹), Hu Wanjing (胡婉靖), Liu Fei (刘飞)

Affiliation:

School of Physics and Optoelectronic Technology, Baoji University of Arts and Sciences, Baoji, Shaanxi 721016, China

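Funding:

National Natural Science Foundation of China (Young Scientists Fund); Baoji University of Arts and Sciences graduate innovation research project; Shaanxi Provincial Department of Science and Technology project; Shaanxi Provincial Department of Education project; Research program of the Shaanxi Institute of Basic Sciences (Mathematics, Physics); 2023 Ministry of Education Industry-University Cooperative Education Program; 17th batch of university-level undergraduate teaching reform research projects, Baoji University of Arts and Sciences; 2023 College Students' Innovation and Entrepreneurship Training Program, Baoji University of Arts and Sciences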

Grant numbers:

62402010; YJSCX23YB37; 2024SF-YBXM-134; 23JK0287; 23JSQ051; 230705211175618; 22JGYB37; S202310721033

Publication year:

2024

DOI:

10.13467/j.cnki.jbuns.2024.03.008

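Journal: Journal of Baoji University of Arts and Sciences (Natural Science) (宝鸡文理学院学报(自然科学版))

Publisher: Baoji University of Arts and Sciences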

Impact factor: 0.356

ISSN: 1007-1261

Year, volume (issue): 2024, 44(3)