CatRevenge: towards effective revenge text detection in online social media with paragraph embedding and CATBoost

Abstract: Internet users produce and consume huge amounts of data, most of it in natural language, and they express their feelings, emotions and thoughts on social media. It is the responsibility of the social media provider to provide a healthy communication system among users. Detecting revenge in social media text is very challenging because of long sentences in which the semantic relation between tokens dissolves. As a result, social media providers have not paid attention to identifying the users who spread revenge. This article proposes a novel model named CatRevenge, which identifies both active and passive revenge. The model preprocesses text with the Slangzy internet slang meaning dictionary to detect revenge text more efficiently. CatRevenge assigns an impact weight to each part of speech in a sentence based on its relevance and the TF-IDF score of the words. The CatRevenge model also uses a paragraph embedding model for contextual semantic analysis of revenge text. In addition, this research applies the gradient boosting CATBoost classifier with categorical features to reduce model overfitting. This feature ranking method can reduce the dimensionality of the data by ranking the most significant features. The research uses an English-language dataset of revenge posts from the Reddit social media platform, evaluated with binary and multiclass classification. Results demonstrate a 6-10% increase in binary classification and a 2.5-5% increase in multiclass classification on the weighted F1 metric.
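The abstract reports its gains on the weighted F1 metric. As a reading aid, the following minimal sketch shows how that metric is conventionally computed: per-class F1 scores averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the example labels are hypothetical illustrations, not taken from the paper, and the paper's own evaluation code may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += (n / total) * f1
    return score


# Made-up binary example (labels are assumptions, not dataset values).
y_true = ["revenge", "revenge", "neutral", "neutral", "neutral"]
y_pred = ["revenge", "neutral", "neutral", "neutral", "revenge"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class support, the same function applies unchanged to a multiclass labelling, for example one that separates active from passive revenge.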

Keywords:

Online Social Network; Paragraph Embedding; Text Classification; CATBoost; Natural Language Processing

Authors:

Sayani Ghosal, Amita Jain

Author affiliations:

NSUT East Campus (Erstwhile A.I.A.C.T.R.), Guru Gobind Singh Indraprastha University, Dwarka, Delhi, India || KIET Group of Institutions, Ghaziabad, Delhi-NCR, India

Netaji Subhas University of Technology, New Delhi, India

Publication year:

2024

DOI:

10.1007/s11042-024-18791-y

Multimedia Tools and Applications

EI, SCI

ISSN: 1380-7501

Year, volume (issue): 2024, 83(42)