
SQL generation method based on dependency relational graph attention network

This paper studies the generation of structured query language (SQL) from natural language questions (Text-to-SQL). A two-stage framework is proposed that decouples schema linking from SQL generation in order to reduce the difficulty of SQL generation. In the first stage, a schema linker based on a relational graph attention network identifies the database tables, columns, and values mentioned in the question. It uses the syntactic structure of the question and the internal relationships among database schema items to guide the model in learning the alignment between the question and the database. When the question graph is constructed, the original syntactic dependency tree is adapted to the Text-to-SQL task: relations irrelevant to schema linking are merged, and dependency relations are added between the dependents in coordinate structures and the other constituents of the sentence, which helps the model capture long-distance dependencies. In the second stage, SQL is generated by injecting the alignment information into the T5 encoder and fine-tuning T5. Experiments on the Spider, Spider-DK, and Spider-Syn datasets show that the method performs well, particularly on Text-to-SQL questions of medium difficulty and above.
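The question-graph construction described in the abstract can be illustrated with a small sketch. The abstract does not give the exact merging rules or edge types, so everything here is an assumption made for illustration: the `COARSE` label mapping, the function name `build_question_graph`, the spaCy-style dependency labels, and the hand-written parse of the example question. It merges fine-grained dependency labels into a few coarse relations and, for each later conjunct in a coordinate structure, adds an edge from the governor of the first conjunct.

```python
# Hypothetical mapping from fine-grained dependency labels to coarse
# relation types; labels not listed are merged into "other".
COARSE = {
    "nsubj": "arg", "dobj": "arg", "pobj": "arg",
    "amod": "mod", "compound": "mod", "nmod": "mod",
    "conj": "conj",
}


def build_question_graph(arcs):
    """Build a relation-labelled edge set from dependency arcs.

    arcs: list of (head_index, dependent_index, label) triples
          that form a dependency tree.
    Returns a set of (source, target, relation) edges.
    """
    # Step 1: keep every arc, but merge its label into a coarse relation.
    edges = {(h, d, COARSE.get(lbl, "other")) for h, d, lbl in arcs}

    # Step 2: for each conjunct, find the first conjunct of its chain and
    # connect the conjunct directly to that first conjunct's governor.
    head_of = {d: (h, lbl) for h, d, lbl in arcs}
    for head, dep, label in arcs:
        if label != "conj":
            continue
        first = head
        while first in head_of and head_of[first][1] == "conj":
            first = head_of[first][0]
        if first in head_of:  # the first conjunct is not the root
            governor, gov_label = head_of[first]
            edges.add((governor, dep, COARSE.get(gov_label, "other")))
    return edges


# Hand-written parse of "Show the name and age of students"
tokens = ["Show", "the", "name", "and", "age", "of", "students"]
arcs = [
    (0, 2, "dobj"),  # Show -> name
    (2, 1, "det"),   # name -> the
    (2, 3, "cc"),    # name -> and
    (2, 4, "conj"),  # name -> age
    (2, 5, "prep"),  # name -> of
    (5, 6, "pobj"),  # of -> students
]
edges = build_question_graph(arcs)
for src, dst, rel in sorted(edges):
    print(f"{tokens[src]} -[{rel}]-> {tokens[dst]}")
```

In this example the added edge links "Show" directly to "age", so a column mentioned second in a coordination sits one hop from the verb instead of two. The paper's actual rules may differ.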

Keywords: Text-to-SQL; natural language query; dependency parsing; relational graph attention network

Shu Qing, Liu Xiping, Tan Zhao, Li Xi, Wan Changxuan, Liu Dexi, Liao Guoqiong


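School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, Jiangxi, China

School of Software, Jiangxi Agricultural University, Nanchang 330013, Jiangxi, China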


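Funding: National Natural Science Foundation of China (62076112, 62272205, 62272206, 62272207); Jiangxi Provincial Natural Science Foundation (20232ACB202008); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ190255); Jiangxi Provincial Graduate Innovation Special Fund (YC2023-B185)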

2024

Journal of Zhejiang University (Engineering Science)
Zhejiang University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.625
ISSN: 1008-973X
Year, volume (issue): 2024, 58(5)