Automated Kaomoji Extraction Based on Large-scale Danmaku Texts
As a new type of emoticon that emerged in the Internet age, kaomoji are not only popular among Internet users and on mainstream social media but also have indispensable value in emotional expression, cultural promotion, and other aspects. Because kaomoji carry rich semantic and emotional information, studying them in the context of Internet texts can promote the analysis and understanding of such texts, thus improving the effectiveness of various natural language processing tasks. Detecting and extracting kaomoji are the primary steps in analyzing texts that contain them. However, due to the flexible structure, diverse types, and rapid evolution of kaomoji, most existing works lack a comprehensive analysis of kaomoji, resulting in limitations such as low accuracy, difficulty in determining boundaries, and poor timeliness. In this paper, through an in-depth analysis of kaomoji features, a kaomoji detection and extraction algorithm called Emoly, based on a large-scale danmaku text dataset, is proposed. It extracts preliminary candidate strings through preprocessing, combines various improved statistical indicators and filtering rules to select the final candidate strings, and ranks them by text similarity to produce the final results. Experimental results show that the Emoly algorithm achieves a recall of 91% on a dataset of millions of danmaku texts, detecting and extracting kaomoji from the texts effectively and accurately. It demonstrates robustness, superiority, and generality. Additionally, the proposed algorithm provides new ideas and methods for tasks such as Chinese word segmentation, sentiment analysis, and input method dictionary updates, offering broad application value.
Natural language processing; Data analysis; Kaomoji; Video danmaku
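The first stage outlined in the abstract, extracting preliminary candidate strings and filtering them with a simple statistic, can be illustrated with a minimal sketch. This is not the Emoly algorithm itself: the candidate pattern (runs of three or more characters that are not CJK ideographs, ASCII letters, digits, or whitespace), the function name `extract_candidates`, and the `min_count` threshold are all assumptions made for illustration, and raw frequency stands in for the paper's improved statistical indicators.

```python
import re
from collections import Counter

# Assumption: a candidate is a maximal run of 3+ characters that are not
# CJK ideographs, ASCII letters, digits, or whitespace. Kaomoji containing
# ASCII letters, such as (T_T), are split by this rule and therefore missed.
CANDIDATE_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s]{3,}")


def extract_candidates(texts, min_count=2):
    """Return (candidate, count) pairs seen at least min_count times.

    Raw frequency is a stand-in here for the statistical indicators and
    filtering rules mentioned in the abstract.
    """
    counts = Counter()
    for text in texts:
        counts.update(CANDIDATE_RE.findall(text))
    # most_common() orders candidates by descending frequency.
    return [(cand, n) for cand, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    danmaku = [
        "哈哈哈 (≧▽≦) 太好笑了",
        "前方高能 (≧▽≦)",
        "hello!!!",
        "233 (T_T)",
    ]
    for cand, n in extract_candidates(danmaku):
        print(cand, n)
```

The sketch also shows why boundary determination is hard: a purely character-class rule keeps punctuation runs such as `!!!` as candidates and breaks kaomoji that mix in letters, which is the kind of error that additional indicators and filtering rules are meant to reduce.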