Standardization of Chinese Medical Terminology Based on Multi-Strategy Comparison Learning
[Objective] To address the challenges of short texts, high similarity, and single and multiple entailments in the standardization of Chinese medical terminology, this paper proposes a recall-ranking-quantity prediction framework that fuses multi-strategy contrastive learning. [Methods] First, we integrated statistical text features and deep semantic features to retrieve candidate entities and built the candidate set from their similarity scores. Second, for candidate ranking, we trained vector representations of the original terms, the standard entities, and the recalled candidate entities using pre-trained models and contrastive learning strategies, and then reranked the candidates by cosine similarity. Next, we updated the vector representations of the original terms through multi-head attention to predict the number of standard entities corresponding to each original term. Finally, we selected the standard entities according to the predicted quantity by integrating the similarity scores from candidate recall and ranking. [Results] We evaluated the framework on the Chinese medical terminology normalization dataset Yidu-N7k. Compared with statistical models and mainstream deep learning models, the proposed framework achieved an accuracy of 92.17%, an improvement of at least 0.94% over the pre-trained binary classification baseline model. Additionally, on a dataset of 150 expert-labeled mammography examination reports for female breast cancer, the framework reached an accuracy of 97.85%, achieving the best performance. [Limitations] The experiments were conducted only on medical datasets, and the effectiveness of the framework in other domains needs further exploration. [Conclusions] Multi-strategy candidate recall considers text information comprehensively, addressing the challenge of short texts. Contrastive-learning candidate ranking captures subtle textual differences, addressing the challenge of high similarity. Quantity prediction with multi-head attention enhances the vector representations, addressing the challenges of single and multiple entailments. The proposed method has the potential to promote medical information mining and clinical research.
Medical Terminology Normalization; Multi-Strategy Candidate Recall; Contrastive Learning; Breast Cancer Mammography Examination Report
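
The final selection step summarized in the abstract (fusing the recall and ranking similarity scores, then keeping as many entities as the quantity predictor returns) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the weighted-sum fusion, the weight `alpha`, and the toy vectors are all assumptions made for illustration, since the abstract does not specify how the two scores are combined.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_entities(term_vec, candidates, predicted_count, alpha=0.5):
    """Pick `predicted_count` standard entities for one original term.

    candidates: list of (entity_name, recall_score, entity_vector) tuples,
        where recall_score is assumed to come from the candidate recall stage.
    predicted_count: number of standard entities, assumed to come from the
        quantity prediction stage.
    alpha: hypothetical weight for the recall score in a weighted-sum fusion.
    """
    scored = []
    for name, recall_score, vec in candidates:
        rank_score = cosine(term_vec, vec)  # ranking-stage similarity
        fused = alpha * recall_score + (1 - alpha) * rank_score
        scored.append((fused, name))
    # Highest fused score first, then keep the predicted number of entities.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:predicted_count]]


# Toy example with made-up scores and 2-dimensional vectors.
term = [1.0, 0.0]
cands = [
    ("entity_A", 0.9, [1.0, 0.0]),
    ("entity_B", 0.8, [0.0, 1.0]),
    ("entity_C", 0.2, [1.0, 0.1]),
]
print(select_entities(term, cands, predicted_count=2))
```

In this toy case `entity_C` outranks `entity_B` despite its lower recall score, because its vector is much closer to the original term; this is the kind of correction the ranking stage is meant to supply.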