
A Review of the Application of Large Language Models in Protein Design

In protein design, the application of artificial intelligence has given rise to a number of large models. Computational protein design is the process of using computers to help determine an amino acid sequence that realizes a preset structure and function. It can proceed either by redesigning existing proteins or by de novo design. The rapid generation of proteins with specific functions is of great significance for biomedical research, drug development, and bioengineering. This article first surveys computational protein design across traditional computational methods, machine learning methods, and deep learning methods. It then introduces the Transformer, the core architecture of large language models, and reviews by category the research and applications of protein large language models. Finally, it discusses priorities for future research.

Protein sequence; Protein structure; Large language model; Transformer architecture

Zhang Jinxiong (张锦雄), Meng Xueli (孟雪莉), Chen Yan (陈燕), Wei Songjian (韦松键), Lü Lilan (吕丽兰), Hu Xiaochun (胡小春)


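School of Computer, Electronics and Information, Guangxi University, Nanning 530004

School of Business Administration, Guangxi University, Nanning 530004

Guangxi Subtropical Crops Research Institute, Nanning 530001

Guangxi Key Laboratory of Big Data in Finance and Economics, Nanning 530003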



National Natural Science Foundation of China (62362004); Guangxi Key Research and Development Program (桂科AB24010031)

2024

Genomics and Applied Biology (基因组学与应用生物学)
Guangxi University


Indexed in: CSTPCD; PKU Core (北大核心)
Impact factor: 1.108
ISSN: 1674-568X
Year, volume (issue): 2024, 43(8)