A Review of the Application of Large Language Models in Protein Design
Zhang Jinxiong¹, Meng Xueli¹, Chen Yan², Wei Songjian¹, Lü Lilan³, Hu Xiaochun⁴
Author information
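- 1. School of Computer and Electronic Information, Guangxi University, Nanning 530004
- 2. School of Computer and Electronic Information, Guangxi University, Nanning 530004; School of Business Administration, Guangxi University, Nanning 530004
- 3. Subtropical Crops Research Institute of Guangxi Zhuang Autonomous Region, Nanning 530001
- 4. Guangxi Key Laboratory of Big Data in Finance and Economics, Nanning 530003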
Abstract
In protein design, the application of artificial intelligence has given rise to a number of large models. Computational protein design is the process of using computers to help determine the amino acid sequence of a protein so that it achieves a predefined structure and function. It can take the form of redesigning an existing protein or designing one de novo. The rapid generation of proteins with specific functions is of great significance for biomedical research, drug development, and bioengineering. This article first surveys computational protein design across traditional computational methods, machine learning methods, and deep learning methods. It then introduces the Transformer, the core architecture of large language models, and reviews the research and applications of protein large language models by category. Finally, it outlines priorities for future research.
Key words
Protein sequence / Protein structure / Large language model / Transformer architecture
Funding
National Natural Science Foundation of China (62362004)
Guangxi Key Research and Development Program (Guike AB24010031)
Year of publication
2024