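In intelligent question answering for the culture and tourism domain, sparse text representations of user questions, colloquial expressions, polysemy, and the difficulty of recognizing domain-specific vocabulary make it hard for common matching models to match user questions precisely with standard questions. To address this problem, this paper constructs a culture and tourism customer service question matching dataset and a corresponding domain dictionary, and on this basis proposes SBIDD (Improved SBERT Model for Integrating Domain Dictionaries), a culture and tourism question matching model that fuses a domain dictionary. The model uses Sentence-BERT to produce vector representations of questions and integrates the domain dictionary into the Siamese network model to increase the weight of domain words in each question, which greatly improves the model's ability to recognize domain vocabulary. Experiments are conducted on the self-built dataset and on the public ATEC 2018 NLP dataset. The results show that the proposed model outperforms five classic text matching models (DSSM, BiMPM, ESIM, IMAF, and TSFR-RM) as well as the baseline model SBERT, reaching an F1 score of 95.65%, 2.75% higher than the baseline, and that it shows better adaptability and robustness on retrieval tasks.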
A matching model for culture and tourism customer service questions based on domain dictionary fusion
In intelligent question answering for culture and tourism, sparse text representations, colloquial expressions, polysemy, and the difficulty of recognizing domain-specific vocabulary make it hard for common matching models to match user questions accurately with standard questions. To address this issue, a customer service question matching dataset for culture and tourism and a corresponding domain dictionary were first constructed. A culture and tourism question matching model that integrates the domain dictionary, SBIDD (Improved SBERT Model for Integrating Domain Dictionaries), was then proposed. The model uses Sentence-BERT (SBERT) to vectorize questions and incorporates the domain dictionary into the Siamese network model to increase the weight of domain words in the questions, greatly improving its ability to recognize domain vocabulary. Experiments were conducted on both the self-built dataset and the public ATEC 2018 NLP dataset. The results show that SBIDD outperforms five classic text matching models (DSSM, BiMPM, ESIM, IMAF, and TSFR-RM) and the baseline model SBERT, with an F1 score of 95.65%, an increase of 2.75% over the baseline model, and that it shows higher adaptability and robustness in retrieval tasks.
question matching; culture and tourism customer service; Sentence-BERT; domain dictionary; intelligent question answering; search-based Q&A
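The abstract does not specify how the domain dictionary is fused into the Siamese network, so the following is only a minimal sketch of one plausible reading: domain words receive a larger weight during pooling, and both questions pass through the same encoder before a cosine comparison. Everything here is an assumption made for illustration: the hash-based `toy_token_vector` stands in for real SBERT token embeddings, and the names `encode`, `match_score`, and the `boost` parameter are hypothetical, not taken from the paper.

```python
import hashlib
import math

DIM = 16  # toy embedding size (assumption; SBERT embeddings are much larger)


def toy_token_vector(token, dim=DIM):
    """Deterministic stand-in for an SBERT token embedding (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def encode(tokens, domain_dict, boost=2.0):
    """Weighted mean pooling: tokens found in the domain dictionary get
    weight `boost`, all other tokens get weight 1.0 (hypothetical scheme)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    weights = [boost if t in domain_dict else 1.0 for t in tokens]
    total = sum(weights)
    pooled = [0.0] * DIM
    for tok, w in zip(tokens, weights):
        for i, x in enumerate(toy_token_vector(tok)):
            pooled[i] += w * x / total
    return pooled


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_score(q1, q2, domain_dict, boost=2.0):
    """Siamese scoring: both questions go through the same encoder."""
    return cosine(encode(q1, domain_dict, boost), encode(q2, domain_dict, boost))


# Hypothetical domain dictionary and pre-tokenized questions.
domain_dict = {"scenic_area", "ticket"}
user_q = ["how", "buy", "scenic_area", "ticket"]
standard_q = ["scenic_area", "ticket", "purchase", "method"]
score = match_score(user_q, standard_q, domain_dict)
```

With `boost=1.0` the encoder reduces to plain mean pooling, which gives a convenient baseline for checking whether the dictionary weighting changes the ranking of candidate standard questions.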