
A Text2SQL method utilizing database schema information expanded by a dictionary

Existing Text2SQL methods rely heavily on explicit mentions of table and column names in natural language queries, so their accuracy drops sharply in real-world scenarios where the same object goes by different names. In addition, these methods rely solely on the database schema to capture the domain knowledge behind the database model, yet the schema, as structured metadata, has very limited capacity to express that knowledge. Even experienced programmers find it difficult to fully grasp the domain knowledge from the schema alone, and must rely on detailed database design documents to construct SQL statements that correctly express a given query. To address this, we propose a Text2SQL method that uses a dictionary to expand database schema information. The method parses the words or phrases out of table and column names, looks them up in a dictionary to obtain their semantic interpretations, and treats these interpretations as extended content for the corresponding table or column names. The interpretations are combined with the table names, column names, and other schema information (such as primary and foreign keys) as model input, so that the model can learn the application-domain knowledge behind the database model more comprehensively. Experiments on the Spider-syn and Spider datasets demonstrate the effectiveness of the proposed method: even when the table and column names used in natural language queries are completely different from the corresponding names in the database schema, the method still produces good SQL translations and clearly outperforms the most recently proposed methods that are robust to synonym substitution attacks.
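The schema-expansion step described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy dictionary `TOY_DICTIONARY`, the helper names `split_identifier`, `expand_name` and `serialize_schema`, and the text serialization format are hypothetical, since the abstract does not specify which dictionary is used or how the expanded schema is fed to the model.

```python
import re

# Hypothetical toy dictionary; the abstract does not say which dictionary is used.
TOY_DICTIONARY = {
    "singer": "a person who sings, especially professionally",
    "concert": "a live musical performance given in public",
    "country": "a nation with its own government",
    "id": "a value that uniquely identifies a record",
}


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def expand_name(name, dictionary):
    """Attach dictionary interpretations to a table or column name."""
    glosses = [f"{w}: {dictionary[w]}"
               for w in split_identifier(name) if w in dictionary]
    if not glosses:
        return name  # nothing found: keep the bare name
    return f"{name} ( {' ; '.join(glosses)} )"


def serialize_schema(tables, dictionary):
    """Combine expanded names with keys into one text input (assumed format)."""
    lines = []
    for table, info in tables.items():
        cols = " , ".join(expand_name(c, dictionary) for c in info["columns"])
        line = f"table {expand_name(table, dictionary)} | columns: {cols}"
        if info.get("primary_key"):
            line += f" | primary key: {info['primary_key']}"
        for col, ref_table, ref_col in info.get("foreign_keys", []):
            line += f" | foreign key: {col} -> {ref_table}.{ref_col}"
        lines.append(line)
    return "\n".join(lines)


# Illustrative schema fragment, not taken from the paper.
SCHEMA = {
    "singer": {"columns": ["singer_id", "country"], "primary_key": "singer_id"},
    "concert": {
        "columns": ["concert_id", "singer_id"],
        "primary_key": "concert_id",
        "foreign_keys": [("singer_id", "singer", "singer_id")],
    },
}

if __name__ == "__main__":
    print(serialize_schema(SCHEMA, TOY_DICTIONARY))
```

The idea behind such an expansion is that a question mentioning "vocalist" or "nation" shares no tokens with `singer` or `country`, but may overlap with, or be semantically close to, the attached interpretations; how the model actually exploits this is not covered by the sketch.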

Keywords: Database schema; Semantic extension; Interpretation information; Text2SQL

Authors: YU Xiaoxin (于晓昕), HE Dong (何东), YE Ziming (叶子铭), CHEN Li (陈黎), YU Zhonghua (于中华)


College of Computer Science, Sichuan University, Chengdu 610065


Funding: Sichuan Province Key Research and Development Program (2023YFG0265)

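Journal: Journal of Sichuan University (Natural Science Edition), Sichuan University
Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.358
ISSN: 0490-6756
Year, volume (issue): 2024, 61(1)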