ULEO: Unified Language of Experiment Operations for Representation of Synthesis Protocols

[Objective] To meet the demand of AI-driven research and science robotics for high-quality experimental protocol data, this study addresses the unified representation of experimental operation verbs in synthesis protocols. [Methods] We combined data-driven and expert-knowledge-driven approaches to identify and standardize experimental operation verbs from synthesis-related papers and patents. Operation verbs were identified mainly with ChatGLM2-6B, a relatively advanced open-source large model. They were standardized with a mix of Wu-Palmer and cosine similarity, with expert knowledge used to judge the accuracy of the classification. [Results] We obtained 149 operation verbs for inorganic synthesis experiments and 141 for organic synthesis experiments, with 124 verbs common to both. Most of the verbs that appeared in only one of the two types were judged not to have distinct category characteristics, so the union of the two sets, 166 verbs in total, can be used to represent operations in organic, inorganic, and hybrid synthesis experiments in a unified way. [Limitations] Only basic prompt engineering was used to elicit operation-verb recognition from the large model, so accuracy needs improvement; the data came mainly from currently free, publicly available datasets and is not sufficiently comprehensive or rich; and only operation verbs in synthesis, engineering, and basic steps were considered, not those in dynamic, analytical, and named-reaction steps. [Conclusions] This study builds a unified language for representing experimental operations in organic, inorganic, and hybrid synthesis reactions. Different types of synthesis experiments differ little in how their operation verbs are represented but do differ in usage frequency and tendency, which can guide the choice of which experimental operation functions to develop first for science robots.
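The abstract says standardization mixes Wu-Palmer and cosine similarity but does not say how the two are combined. The following is a minimal sketch under stated assumptions: the toy taxonomy `PARENT`, the toy vectors `VECTORS`, and the equal-weight average in `hybrid_similarity` are all hypothetical illustrations, not the authors' data or method.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "operation": None,
    "separate": "operation",
    "heat": "operation",
    "filter": "separate",
    "centrifuge": "separate",
    "reflux": "heat",
}

# Hypothetical 2-d embeddings, for illustration only.
VECTORS = {
    "filter": [1.0, 0.0],
    "centrifuge": [1.0, 1.0],
    "reflux": [0.0, 1.0],
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_similarity(a, b, weight=0.5):
    """Assumed combination rule: weighted average of the two measures."""
    return weight * wu_palmer(a, b) + (1 - weight) * cosine(VECTORS[a], VECTORS[b])


if __name__ == "__main__":
    for pair in [("filter", "centrifuge"), ("filter", "reflux")]:
        print(pair, round(hybrid_similarity(*pair), 3))
```

In this toy setup, "filter" and "centrifuge" share the parent "separate" and score higher than "filter" and "reflux", which meet only at the root. In practice a candidate pair above some threshold would still go to expert review, as the abstract describes.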

Keywords: Unified Language of Experiment Operations; AI for Science; Synthesis Experimental Protocols; Experiment Operations; Science Robotics

Fu Yun, Zhu Liya, Li Dan, Sun Mengge, Zhang Jianfeng, Liu Xiwen

National Science Library, Chinese Academy of Sciences, Beijing 100190

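Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190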

Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190

Key Program of the National Natural Science Foundation of China (Grant No. 72234005)

2024

Data Analysis and Knowledge Discovery
National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core Journals, EI
Impact factor: 1.452
ISSN: 2096-3467
Year, volume (issue): 2024, 8(1)