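Abstract (translated from the Chinese)
With the rapid growth of social media, humor recognition has attracted wide attention from researchers. The task is to determine whether a given text contains a humorous expression. Existing methods are mainly grounded in theories of humor generation and use rules or neural network models to extract a variety of humor-related features, such as incongruity, sentiment, and phonetics. However, these methods do not fully capture the emotional features within the text and overlook the emotions implicit in humorous text, which reduces recognition accuracy. To address this problem, this paper proposes CMSOR, a method driven by dynamic commonsense and multi-dimensional semantic features. First, external commonsense knowledge is used to dynamically infer the speaker's implicit emotional expression from the text. Next, the WordNet lexicon is introduced to compute word-level semantic distances, which capture incongruity, and to compute ambiguity features. Finally, humor semantics are constructed from these three feature dimensions to perform humor recognition. Experiments show that CMSOR significantly outperforms current baseline models on three public datasets.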
Abstract
As an emerging topic in NLP, humor recognition aims to determine whether a given text expresses humor. To fully capture the emotional features within the text, we propose CMSOR, a method based on dynamic commonsense reasoning and multi-dimensional semantic features. It uses commonsense knowledge to infer the speaker's latent emotional features from the text, then leverages the WordNet lexicon to calculate word-level distances as incongruity features, together with ambiguity features. These three humor-specific features are used to construct humor semantics. Experiments on three publicly available benchmarks show that CMSOR outperforms state-of-the-art models.
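
The abstract does not give the formulas behind the word-level distance and ambiguity features, so the following is only a minimal sketch of the general idea, not the CMSOR implementation. It assumes a tiny hand-written taxonomy (`TOY_HYPERNYMS`, `TOY_SENSES`) standing in for WordNet, a path-length distance through the lowest common ancestor, the maximum and minimum pairwise word distance as incongruity features, and the summed log sense count as an ambiguity feature. All names and feature definitions here are hypothetical.

```python
from itertools import combinations
from math import log

# Toy hypernym links (child -> parent). A stand-in for WordNet, not real WordNet data.
TOY_HYPERNYMS = {
    "dog.n": "animal",
    "cat.n": "animal",
    "animal": "entity",
    "bank.river": "landform",
    "bank.money": "institution",
    "landform": "entity",
    "institution": "entity",
}

# Toy sense inventory (word -> list of senses). Also hypothetical.
TOY_SENSES = {
    "dog": ["dog.n"],
    "cat": ["cat.n"],
    "bank": ["bank.river", "bank.money"],
}


def path_to_root(sense):
    """Return the chain of nodes from a sense up to the taxonomy root."""
    path = [sense]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def path_distance(a, b):
    """Number of edges between two senses via their lowest common ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("senses share no common ancestor")


def word_distance(w1, w2):
    """Distance between two words: the closest pair of their senses (assumption)."""
    return min(
        path_distance(s1, s2)
        for s1 in TOY_SENSES[w1]
        for s2 in TOY_SENSES[w2]
    )


def incongruity_features(words):
    """Largest and smallest pairwise word distance in the text (assumed definition)."""
    dists = [word_distance(a, b) for a, b in combinations(words, 2)]
    return {"max_distance": max(dists), "min_distance": min(dists)}


def ambiguity_feature(words):
    """Sum of log sense counts; more senses per word means more ambiguity (assumed)."""
    return sum(log(len(TOY_SENSES[w])) for w in words)


if __name__ == "__main__":
    tokens = ["dog", "cat", "bank"]
    print(incongruity_features(tokens))
    print(ambiguity_feature(tokens))
```

In a real pipeline the toy dictionaries would be replaced by lookups in the WordNet lexicon, and the resulting features would be combined with the commonsense-derived emotion features that the abstract describes.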