Emad M. Al-Shawakfa, Qasem A. Al-Radaideh, Ahmed F. Al-Eroud
pp. 111-131
Abstract: Detecting and filtering e-mail alerts related to criminal or terrorist activities is of great interest to both security agencies and the public. This paper evaluates and compares the performance of a rule-based filter and the Paul Graham statistical filter for detecting alerts in Arabic e-mail messages. To evaluate the two filters, a set of 1500 Arabic messages related to criminal activities was collected manually from news websites such as Al-Jazeera Net and BBC Arabic news. The e-mails were preprocessed and normalized, and the relevant features were then extracted using categorical proportional difference (CPD) and term frequency variance (TFV) as feature-weighting methods for the rule-based filter. Several experiments were conducted to test the performance of the two filters, and the results show that the Paul Graham statistical filter was more accurate: it detected about 85% of the e-mail alerts used in the experiments. The rule-based filter achieved 80% accuracy using the CPD method and 70% accuracy using the TFV method.
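The abstract above does not give the filter's parameters, so the following is only a minimal sketch of the Paul Graham statistical filter as it is commonly described in his essay "A Plan for Spam", applied here to alert versus non-alert messages. The function names, the 0.4 default for unseen tokens, the 0.01 to 0.99 clamping, and the choice of the 15 most informative tokens are assumptions taken from that general description, not from the paper.

```python
from math import prod


def token_probability(bad_count, good_count, n_bad, n_good):
    """Graham-style probability that a message containing this token is an alert.

    bad_count / good_count: occurrences of the token in alert / non-alert mail.
    n_bad / n_good: number of alert / non-alert messages in the training set.
    """
    b = min(1.0, bad_count / n_bad)
    # Good counts are doubled to bias the filter against false positives.
    g = min(1.0, 2 * good_count / n_good)
    if b + g == 0:
        return 0.4  # assumed default for tokens never seen in training
    return max(0.01, min(0.99, b / (b + g)))


def combined_probability(probs, top_n=15):
    """Combine the top_n token probabilities farthest from the neutral 0.5."""
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    p_alert = prod(interesting)
    p_clean = prod(1 - p for p in interesting)
    return p_alert / (p_alert + p_clean)
```

A message would typically be flagged when `combined_probability` exceeds a chosen threshold (0.9 in Graham's description); the threshold used in the evaluated system is not stated in the abstract.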
Abstract: In this paper, we present a hybrid approach to Word Sense Disambiguation for the Arabic language (called WSD-AL) that combines unsupervised and knowledge-based methods. Pre-processing steps are applied to the texts in the corpus that contain the ambiguous words (1500 texts extracted from the web), and the salient words that affect the meaning of these words are extracted. A Context Matching algorithm is then applied; it returns a semantic coherence score corresponding to the context of use that is semantically closest to the original sentence. The contexts of use are generated from the glosses of the ambiguous word and from the corpus. The results obtained by the proposed system are satisfactory: we achieved a precision of 79%.
Abstract: This article focuses on the development of Natural Language Processing (NLP) tools for Computer Assisted Language Learning (CALL). First, we developed several NLP tools: a labelled dictionary of Arabic (as complete as possible), a generator of morphological derivatives, a conjugator, and a morphological analyzer for Arabic. Second, we used these tools to create a number of educational applications for learning the Arabic language with the proposed system SALA, an NLP-based authoring system organized into three distinct layers: functions, scripts, and activities.
Hanan N. Abu Obied, Maryam S. Nuser, Mohammed N. Al-Kabi
pp. 167-188
Abstract: Romanization is used to phonetically translate names and technical terms from languages written in non-Roman alphabets to languages written in Roman alphabets. Because almost all dictionaries contain standard English forms for some Arabic names, this problem has been solved using machine transliteration. Several programs exist to deal with transliteration; they are based on either a dictionary-based approach or a rule-based approach. This study compares the two approaches, using test data from the Yarmouk University library. The results show that while a rule-based Romanizer can romanize all names, a dictionary-based Romanizer romanizes 86% of the tested names. A second test was performed on the Romanization rules used by each Romanizer; its results show that the rules used by the dictionary-based Romanizer in this study are better, in terms of accuracy and usability, than those used by the rule-based Romanizer.
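The coverage difference reported above can be illustrated with a small sketch. The dictionary entry and the letter mapping below are hypothetical examples chosen for illustration; they are not the resources or rules used by the Romanizers in the study.

```python
# Hypothetical name dictionary: maps whole Arabic names to a standard English form.
NAME_DICTIONARY = {
    "محمد": "Mohammad",
}

# Hypothetical letter-by-letter rules (consonants only, for illustration).
LETTER_RULES = {
    "م": "m",
    "ح": "h",
    "د": "d",
}


def dictionary_romanize(name):
    """Return the stored English form, or None when the name is not covered."""
    return NAME_DICTIONARY.get(name)


def rule_romanize(name):
    """Map every letter through the rule table; unknown letters are kept as-is."""
    return "".join(LETTER_RULES.get(ch, ch) for ch in name)
```

In this toy setup the rule-based function returns some output for any input, while the dictionary-based function returns nothing for names it does not contain. The dictionary form also carries vowels that unvowelled Arabic script does not spell out, which is one plausible reason a dictionary's output can be preferable when a name is covered.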
Bassam H. Hammo, Azzam Sleit, Aladdin Baarah, Hani Abu-Salem...
pp. 189-206
Abstract: In this paper, we present a simple mining technique, the Quran Mining Technique (QMT), which attempts to automatically classify the Suras (i.e. chapters) of the Quran based on a predefined set of 10 themes. QMT consists mainly of two phases: a preprocessing phase and a classification phase. In the first phase, we manually label a set of representative words for the ten predefined themes. In the second phase, we apply QMT to a set of 14 Suras (the total number of Suras is 30), using a scoring function (SF) to identify their themes. The results of QMT are compared with results obtained from expert scholars in the field of Quranic studies, which we used as a benchmark. The average accuracy of the QMT classifier is close to 79%.
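The abstract does not define the scoring function, so the sketch below only assumes one plausible form: each theme is scored by the share of a Sura's tokens that appear in that theme's manually labelled word set, and the highest-scoring theme is returned. The function names and this particular scoring rule are assumptions, not the authors' SF.

```python
from collections import Counter


def score_themes(tokens, theme_words):
    """Score each theme by the fraction of tokens found in its word set (assumed SF)."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        theme: sum(counts[word] for word in words) / total
        for theme, words in theme_words.items()
    }


def classify(tokens, theme_words):
    """Return the theme with the highest score for the given tokens."""
    scores = score_themes(tokens, theme_words)
    return max(scores, key=scores.get)
```

Here `theme_words` stands in for the manually labelled representative words, given as a mapping from theme name to a set of words, and `tokens` for the preprocessed words of one Sura.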