Abdullah Gul University Reports Findings in Cancer (Building a challenging medical dataset for comparative evaluation of classifier capabilities)

Abstract

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News – New research on Cancer is the subject of a report. According to news reporting originating from Kayseri, Turkey, by NewsRx correspondents, research stated, “Since the 2000s, digitalization has been a crucial transformation in our lives. Nevertheless, digitalization brings a bulk of unstructured textual data to be processed, including articles, clinical records, web pages, and shared social media posts.” Our news editors obtained a quote from the research from Abdullah Gul University, “As a critical analysis, the classification task classifies the given textual entities into correct categories. Categorizing documents from different domains is straightforward since the instances are unlikely to contain similar contexts. However, document classification in a single domain is more complicated due to sharing the same context. Thus, we aim to classify medical articles about four common cancer types (Leukemia, Non-Hodgkin Lymphoma, Bladder Cancer, and Thyroid Cancer) by constructing machine learning and deep learning models. We used 383,914 medical articles about four common cancer types collected by the PubMed API. To build classification models, we split the dataset into 70% as training, 20% as testing, and 10% as validation. We built widely used machine-learning (Logistic Regression, XGBoost, CatBoost, and Random Forest Classifiers) and modern deep-learning (convolutional neural networks - CNN, long short-term memory - LSTM, and gated recurrent unit - GRU) models. We computed the average classification performances (precision, recall, F-score) to evaluate the models over ten distinct dataset splits. The best-performing deep learning model(s) yielded a superior F1 score of 98%. However, traditional machine learning models also achieved reasonably high F1 scores, 95% for the worst-performing case.”
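
The evaluation protocol described in the quote (a 70/20/10 split, repeated over ten distinct splits, with precision, recall, and F-score averaged) can be illustrated with a minimal Python sketch. This is not the authors' code. The toy documents, the keyword table, the word-vote classifier, stratifying the split by label, and macro-averaging the per-class scores are all assumptions made for illustration; the report does not say how the splits were drawn or how the scores were averaged across classes.

```python
import random
from collections import Counter, defaultdict
from statistics import mean

LABELS = ["Leukemia", "Non-Hodgkin Lymphoma", "Bladder Cancer", "Thyroid Cancer"]

# Hypothetical toy corpus standing in for the PubMed articles.
KEYWORDS = {
    "Leukemia": "leukemia marrow",
    "Non-Hodgkin Lymphoma": "lymphoma nodes",
    "Bladder Cancer": "bladder urothelial",
    "Thyroid Cancer": "thyroid nodule",
}
TOY_DOCS = [(f"{KEYWORDS[lab]} case{i}", lab) for lab in LABELS for i in range(10)]


def stratified_split(docs, seed):
    """Split (text, label) pairs 70/20/10 within each label.

    Stratifying by label is an assumption; the source only gives the ratios.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in docs:
        by_label[label].append((text, label))
    train, test, val = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_train = len(items) * 7 // 10  # integer arithmetic avoids float rounding
        n_test = len(items) * 2 // 10
        train += items[:n_train]
        test += items[n_train:n_train + n_test]
        val += items[n_train + n_test:]
    return train, test, val


def train_word_vote(train):
    """Toy stand-in classifier: each token votes for the labels it was seen with."""
    counts = defaultdict(Counter)
    for text, label in train:
        for tok in text.lower().split():
            counts[tok][label] += 1
    labels = sorted({label for _, label in train})

    def predict(text):
        scores = Counter()
        for tok in text.lower().split():
            scores.update(counts.get(tok, {}))
        return max(labels, key=lambda lab: scores[lab])

    return predict


def macro_prf(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1 (macro averaging is an assumption)."""
    ps, rs, fs = [], [], []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return mean(ps), mean(rs), mean(fs)


def evaluate_over_splits(docs, labels, n_splits=10):
    """Average test-set precision, recall, and F1 over n_splits seeded splits."""
    results = []
    for seed in range(n_splits):
        train, test, _val = stratified_split(docs, seed)  # validation set unused here
        predict = train_word_vote(train)
        y_true = [label for _, label in test]
        y_pred = [predict(text) for text, _ in test]
        results.append(macro_prf(y_true, y_pred, labels))
    return tuple(mean(col) for col in zip(*results))


if __name__ == "__main__":
    precision, recall, f1 = evaluate_over_splits(TOY_DOCS, LABELS)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In a real experiment the word-vote function would be replaced by one of the models named in the quote, and the validation portion, unused in this sketch, would typically serve for model selection or early stopping.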

Key words

Kayseri/Turkey/Eurasia/Cancer/Cyborgs/Emerging Technologies/Health and Medicine/Machine Learning/Oncology

Publication year

2024

Robotics & Machine Learning Daily News
