To improve the accuracy of structured classification of medical records, a conditional random field (CRF) semi-supervised dictionary segmentation algorithm and a latent Dirichlet allocation (LDA) medical record text classification algorithm are used to build a post-structuring system for medical records based on the CRF mechanism and LDA. The results show that when the number of topics is 40, LDA topic modeling reaches its minimum perplexity of -6.97, a decrease of 9.76% compared with the initial perplexity. The minimum coherence value, 0.361, is obtained with 3 topics, and the maximum, 0.442, is obtained with 40 topics, an increase of 22.44% over the minimum. In summary, the medical record post-structuring system that combines the CRF mechanism with LDA performs well in application.
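The 22.44% gain can be checked directly from the two values quoted above. The following is a minimal sketch of that arithmetic, not code from the paper; the helper name `relative_change` is hypothetical.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of |old|."""
    return (new - old) / abs(old) * 100.0


# Values quoted in the abstract: lowest score at 3 topics, highest at 40 topics.
min_score = 0.361
max_score = 0.442

gain = relative_change(min_score, max_score)
print(f"{gain:.2f}%")  # prints 22.44%
```

The same helper applies to the perplexity figure, but the abstract does not state the initial perplexity, so that check is left out here.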
Keywords
conditional random field/semi-supervised dictionary/latent Dirichlet allocation/medical record document/text classification