An Approach to Identify the Complete Reduplicated Multiword Expressions in Digital Bengali Text

This work presents an approach for recognizing complete reduplication in bi-word multiword expressions in a robust Bengali dataset. Reduplication, the repetition of a linguistic unit, is a crucial aspect of identifying multiword expressions. The proposed method operates in two stages: the first covers pre-processing, and the second identifies bi-gram word pairs using two different methods, followed by a comprehensive validation to measure the accuracy of the proposed system. Using the Levenshtein distance method, the approach achieves an accuracy of 99% for three categories of bi-gram combinations of complete reduplicated multiword expressions, an improvement of 1% over the result of the related work.
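The second stage can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes whitespace tokenization and treats a bi-gram as a complete reduplication when the Levenshtein distance between its two words is zero. The function names and the `max_distance` parameter are hypothetical, introduced here only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed over Unicode code points."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def find_complete_reduplications(tokens, max_distance=0):
    """Return adjacent word pairs whose edit distance is at most max_distance.

    max_distance=0 (an assumed default) keeps only exact repetitions,
    which is how complete reduplication is modeled in this sketch.
    """
    return [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if levenshtein(first, second) <= max_distance
    ]


if __name__ == "__main__":
    # Assumed pre-processing: plain whitespace tokenization.
    sentence = "সে ধীরে ধীরে হাঁটে"
    print(find_complete_reduplications(sentence.split()))
```

With a threshold of zero the distance check reduces to string equality. Keeping the distance explicit leaves room to relax the threshold for near-identical pairs, which this sketch does not attempt to classify.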

BCRME; Complete reduplication; Multiword expressions; Part-of-speech tagging; Natural language processing

Subrata Pan

Department of Information Technology, Bankura Unnayani Institute of Engineering, Bankura 722146, West Bengal, India

2025

Journal of The Institution of Engineers (India), Series B. Electrical engineering, electronics and telecommunication engineering, computer engineering