Most current problem generation models are based on the Transformer architecture, but as text length increases, the Transformer's KV caching mechanism causes GPU memory usage to grow linearly, throughput to drop, and inference cost to rise. To address this problem, the RetNet architecture was used to construct the RetNet-Bert problem generation model. The model replaces the multi-head attention mechanism with a multi-scale retention mechanism, which has dual parallel and recurrent forms, improving inference efficiency. Experiments show that RetNet-Bert performs better on long-sequence modeling while achieving training parallelism, low-cost deployment, and efficient inference, and demonstrates a high level of feasibility and effectiveness on problem generation for construction and municipal information.
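The abstract does not give RetNet-Bert's implementation details, but the dual form it refers to can be illustrated with a minimal NumPy sketch of single-head retention (omitting the group normalization, gating, and multi-scale heads of the full mechanism; all names and shapes here are illustrative assumptions). The parallel form computes the whole sequence at once for training, while the mathematically equivalent recurrent form carries a constant-size state instead of a growing KV cache:

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel (training-time) form: O = (Q K^T * D) V,
    where D[n, m] = gamma**(n - m) for n >= m, else 0."""
    T = Q.shape[0]
    n = np.arange(T)[:, None]
    m = np.arange(T)[None, :]
    D = np.where(n >= m, float(gamma) ** (n - m), 0.0)  # causal decay mask
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent (inference-time) form: a fixed-size state S replaces the
    growing KV cache, so per-token cost is constant in sequence length."""
    S = np.zeros((K.shape[1], V.shape[1]))
    out = np.empty((Q.shape[0], V.shape[1]))
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])  # S_t = gamma*S_{t-1} + K_t^T V_t
        out[t] = Q[t] @ S                     # O_t = Q_t S_t
    return out

# The two forms are mathematically equivalent:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))
```

Because the recurrent form's state has a fixed size, memory during generation stays constant rather than growing linearly with the generated text, which is the inference-cost advantage the abstract describes.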
Key words
problem generation model/RetNet model/long sequence modeling/construction and municipal information