Evaluation Method for the Compositional Generalization Ability of SQL-to-text Models

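Abstract (translated from the Chinese): Translating Structured Query Language (SQL) into natural language (SQL-to-text) can improve the usability of relational databases. In recent years, research in this area has relied mainly on machine learning and has made some progress, but existing translation models are still not capable enough for practical use. Because compositional generalization is a necessary ability for SQL-to-text models to improve translation quality in practical applications, and because this ability has so far not been studied for such models, a method for evaluating the compositional generalization ability of SQL-to-text models is proposed. Based on an existing SQL-to-text dataset, the method generates a large number of SQL queries with corresponding natural-language translations (SQL-natural language pairs) and divides these pairs into training and test data according to the number of SQL clauses they contain, so that every SQL clause in the test data appears in the training data in a different combination. This yields a new dataset that can evaluate a model's compositional generalization ability. The evaluation results show that the method makes fuller use of query knowledge and divides the data more reasonably, and that the resulting dataset meets the requirements for evaluating compositional generalization, is close to the models' practical application scenarios, and is less constrained by the original dataset. The results also confirm that the compositional generalization ability of existing models still needs improvement. Among them, the relation-aware graph transformer model designed for the SQL-to-text task shows the weakest compositional generalization ability, indicating that the original SQL-to-text datasets do not adequately test this ability.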

English title: Combinatorial Generalization Ability Evaluation Method of SQL-to-text Model

English abstract: Translating Structured Query Language (SQL) into natural language (SQL-to-text) can improve the usability of relational databases. Research in this area mainly uses machine learning models and has made some progress. However, the capabilities of existing translation models are still insufficient for practical applications. Because compositional generalization is a necessary ability for an SQL-to-text model to improve translation quality in practical applications, and because there is currently a lack of research on this ability for such models, a method for evaluating the compositional generalization ability of SQL-to-text models is proposed. The method generates a large number of SQL queries and corresponding natural-language translations (referred to as SQL-natural language pairs) based on an existing SQL-to-text dataset. These SQL-natural language pairs are then divided into training and test data according to the number of SQL clauses they contain, so that every SQL clause in the test data appears in the training data in a different combination. This produces a new dataset that can be used to evaluate the compositional generalization ability of a model. The evaluation results show that the method makes fuller use of query knowledge and divides the data more reasonably. The resulting dataset meets the requirements for evaluating compositional generalization ability, is close to the actual application scenarios of the models, and is less restricted by the original dataset. The results also confirm that the compositional generalization ability of existing models still needs to be improved. Among them, the relation-aware graph transformer model designed for the SQL-to-text task has the weakest compositional generalization ability, indicating that the original SQL-to-text datasets do not adequately test this ability.
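The splitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clause inventory `CLAUSE_KEYWORDS`, the keyword-based clause detection, the threshold parameter `max_train_clauses`, and the choice to compare clause types rather than concrete clause instances are all assumptions made for illustration, since the abstract does not specify them. Pairs with few clauses go to training, pairs with more clauses go to testing, and a test pair is kept only if each of its clauses has already been seen in training.

```python
import re

# Hypothetical clause inventory; the abstract does not list the clauses used.
CLAUSE_KEYWORDS = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]


def clauses(sql):
    """Return the set of clause keywords found in a SQL string.

    Naive keyword matching (an assumption): it ignores subqueries and
    keywords that occur inside string literals.
    """
    normalized = " ".join(sql.upper().split())
    return {kw for kw in CLAUSE_KEYWORDS if re.search(r"\b" + kw + r"\b", normalized)}


def split_by_clause_count(pairs, max_train_clauses):
    """Split (sql, text) pairs by the number of clauses in the SQL.

    Pairs with at most `max_train_clauses` clauses form the training data;
    the rest are test candidates. A candidate is kept only if every one of
    its clauses occurs somewhere in the training data. Because a kept test
    query has more clauses than any training query, its clause combination
    is necessarily absent from training.
    """
    train, candidates = [], []
    for sql, text in pairs:
        target = train if len(clauses(sql)) <= max_train_clauses else candidates
        target.append((sql, text))

    seen = set()
    for sql, _ in train:
        seen |= clauses(sql)

    test = [(sql, text) for sql, text in candidates if clauses(sql) <= seen]
    return train, test


# Made-up example pairs, for illustration only.
pairs = [
    ("SELECT name FROM singer", "List the names of all singers."),
    ("SELECT name FROM singer WHERE age > 30", "List singers older than 30."),
    ("SELECT name FROM singer ORDER BY age", "List singers sorted by age."),
    ("SELECT name FROM singer WHERE age > 30 ORDER BY age",
     "List singers older than 30, sorted by age."),
    ("SELECT country FROM singer GROUP BY country HAVING count(*) > 1",
     "List countries with more than one singer."),
]
train, test = split_by_clause_count(pairs, max_train_clauses=3)
```

In this toy run the WHERE plus ORDER BY query lands in the test set because both clauses were seen separately during training, while the GROUP BY/HAVING query is dropped because those clauses never occur in the training data.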

English keywords:

Structured Query Language (SQL); compositional generalization; machine translation; database; Long Short-Term Memory (LSTM) model

Authors:

陈琳、范元凯、何震瀛、刘晓清、杨阳、汤路民

Author affiliations:

School of Computer Science, Fudan University, Shanghai 200433

Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233

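Keywords (translated from the Chinese):

Structured Query Language; compositional generalization; machine translation; database; long short-term memory model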

Publication year:

2024

DOI：

10.19678/j.issn.1000-3428.0067251

Computer Engineering (计算机工程)

East China Institute of Computing Technology (华东计算技术研究所); Shanghai Computer Society (上海市计算机学会)

CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(3)

Number of references: 32