Intelligent interaction with railway passenger transportation marketing data based on NL2SQL
Railway passenger transportation marketing data suffer from a high query threshold and a low utilization rate, because many users lack professional knowledge, are unfamiliar with the database structure, and cannot write Structured Query Language (SQL). To address these problems, an intelligent interaction model for railway passenger transportation data based on NL2SQL is proposed. First, an experimental database containing multiple tables and a large volume of railway passenger transportation marketing data was built according to high-frequency query requirements, and 2,000 common query statements were manually labeled as experimental data. Next, the pre-trained model Chinese-RoBERTa-wwm-ext was fine-tuned with P-tuning on a corpus related to the railway passenger transportation marketing business. This produced numerical representations of unstructured text and a dynamic word embedding model specialized for the railway passenger transportation marketing business. Then, following the structural characteristics of SQL syntax, seven sub-models were built on bidirectional long short-term memory (BiLSTM) networks: keyword prediction, aggregation prediction, operation prediction, condition prediction, sorting prediction, aggregation condition prediction, and column prediction. The seven sub-models were fused into a single SQL prediction model according to the dependencies among them. Finally, the SQL prediction model was used as a downstream task for the fine-tuned Chinese-RoBERTa-wwm-ext model, giving a SQL prediction model based on dynamic word embedding and the SQL abstract syntax tree. The model was trained and tested on a mixed dataset consisting of the labeled marketing data and the CSpider data. In experiments, the model reached a logic form accuracy of 68.4% and an execution accuracy of 75.9%, and it predicts column names, table names, operators, and conditions accurately. This is a significant accuracy improvement over both the base model, which uses fixed GloVe word embeddings, and the contrast model, which fine-tunes the pre-trained module with a pooling-layer fine-tuning method. Applying the model helps lower the barrier to using railway passenger transportation marketing databases and makes the data more convenient to use for decision-making.
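The two reported metrics can differ because distinct SQL strings may return the same rows. The following is a minimal sketch of how logic form accuracy and execution accuracy are commonly computed, not the authors' evaluation code. The table `ticket_sales`, its columns, and the query pairs are hypothetical, and the string normalization is a crude stand-in for a proper comparison of SQL components.

```python
import sqlite3

def normalize(sql: str) -> str:
    # Crude canonical form (assumption): lowercase, drop semicolons, collapse whitespace.
    return " ".join(sql.lower().replace(";", " ").split())

def logic_form_match(pred: str, gold: str) -> bool:
    # Logic form match: the predicted SQL text equals the gold SQL after normalization.
    return normalize(pred) == normalize(gold)

def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    # Execution match: both queries return the same rows, ignoring row order.
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # an invalid predicted query counts as a miss
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)

def evaluate(conn, pairs):
    # Returns (logic form accuracy, execution accuracy) over (pred, gold) pairs.
    n = len(pairs)
    lf = sum(logic_form_match(p, g) for p, g in pairs)
    ex = sum(execution_match(conn, p, g) for p, g in pairs)
    return lf / n, ex / n

# Hypothetical toy database standing in for the marketing tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket_sales (station TEXT, tickets INTEGER)")
conn.executemany(
    "INSERT INTO ticket_sales VALUES (?, ?)",
    [("Beijing", 120), ("Shanghai", 95), ("Wuhan", 40)],
)

gold = "SELECT station FROM ticket_sales WHERE tickets > 90"
pairs = [
    ("SELECT station FROM ticket_sales WHERE tickets > 90", gold),   # same text, same rows
    ("SELECT station FROM ticket_sales WHERE tickets >= 95", gold),  # different text, same rows
    ("SELECT station FROM ticket_sales WHERE tickets > 100", gold),  # different rows
    ("SELECT statoin FROM ticket_sales", gold),                      # invalid column name
]

lf_acc, ex_acc = evaluate(conn, pairs)
print(lf_acc, ex_acc)
```

In this toy run the second pair counts toward execution accuracy but not logic form accuracy, which is consistent with execution accuracy being the higher of the two figures reported above.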