Benchmarking text-integrated protein language model embeddings and embedding fusion on diverse downstream tasks

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News - According to news reporting based on a preprint abstract, our journalists obtained the following quote sourced from biorxiv.org:

"Protein language models (pLMs) have traditionally been trained in an unsupervised manner using large protein sequence databases with an autoregressive or masked-language modeling training paradigm. Recent methods have attempted to enhance pLMs by integrating additional information, in the form of text, which are referred to as 'text+protein' language models (tpLMs). We evaluate and compare six tpLMs (OntoProtein, ProteinDT, ProtST, ProteinCLIP, ProTrek, and ESM3) against ESM2, a baseline text-free pLM, across six downstream tasks designed to assess the learned protein representations. We find that while tpLMs outperform ESM2 in five out of six benchmarks, no tpLM was consistently the best. Thus, we additionally investigate the potential of embedding fusion, exploring whether combinations of tpLM embeddings can improve performance on the benchmarks by exploiting the strengths of multiple tpLMs. We find that combinations of tpLM embeddings outperform single tpLM embeddings in five out of six benchmarks, highlighting its potential as a useful strategy in the field of machine learning for proteins."
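The embedding fusion described in the abstract can be sketched as simple concatenation of per-protein feature vectors produced by different tpLMs. The sketch below is a minimal illustration, not the authors' implementation: the embedding dimensions and the random placeholder embeddings are assumptions standing in for real model outputs.

```python
import numpy as np

# Placeholder per-protein embeddings from two hypothetical tpLMs.
# The dimensions (512 and 640) are illustrative, not from the paper.
rng = np.random.default_rng(0)
n_proteins = 4
emb_model_a = rng.normal(size=(n_proteins, 512))  # e.g. one tpLM's embeddings
emb_model_b = rng.normal(size=(n_proteins, 640))  # e.g. another tpLM's embeddings

def fuse_embeddings(*embedding_sets):
    """Fuse embeddings by concatenating along the feature axis,
    yielding one combined vector per protein for a downstream model."""
    return np.concatenate(embedding_sets, axis=1)

fused = fuse_embeddings(emb_model_a, emb_model_b)
print(fused.shape)  # (4, 1152)
```

The fused vectors would then feed a downstream predictor (e.g. a linear probe), letting it exploit complementary strengths of the individual models.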

Keywords: Bioinformatics, Biotechnology, Biotechnology - Bioinformatics, Cyborgs, Emerging Technologies, Information Technology, Machine Learning

2024

Robotics & Machine Learning Daily News

ISSN:
Year, Volume (Issue): 2024 (Sep. 5)