Open-Domain Question Answering Task Enhanced by Multiple Documents Refinement and Fusion
Open-domain question answering (OpenQA) is a challenging task in natural language processing. Conventional machine learning and deep learning techniques are commonly used to retrieve candidate document fragments related to the question from a raw corpus for answer extraction. However, the candidate fragments retrieved by current methods tend to contain considerable noise and information irrelevant to the question, and previous OpenQA models fall short in accurately answering questions that require multiple document fragments as correlated evidence. Therefore, this paper proposes an open-domain question answering method based on the refinement and fusion of multiple documents (RFMD). Specifically, RFMD uses a Transformer-based document refinement module in the retrieval stage to reduce noise in the candidate documents. In the reading comprehension stage, RFMD employs a generation-based question answering module. By constructing a global attention mechanism across document fragments, this module integrates information from multiple relevant fragments to accurately answer questions that require multiple document fragments as supporting evidence. RFMD achieves exact match (EM) scores of 45.8% and 63.4% on the NaturalQuestions and TriviaQA datasets, respectively, verifying the effectiveness and superiority of the model on OpenQA tasks.
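For readers unfamiliar with the reported metric, the following is a minimal sketch of how an EM score is commonly computed in OpenQA evaluation. It assumes the widely used SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the abstract does not state which normalization RFMD's evaluation uses, and the function names `normalize_answer`, `exact_match`, and `em_score` are illustrative rather than taken from the paper.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (assumed here, not confirmed by the paper)."""
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Replace English articles with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction equals any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a gold answer."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this definition, an EM score of 45.8% means that 45.8% of the predicted answers match at least one reference answer after normalization; partially correct answers receive no credit.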