A stratified review of COVID-19 infection forecasting and an efficient methodology using multiple domain-based transfer learning


Elsevier

Abstract: The initial outbreak of COVID-19 was reported in China in December 2019. Since then, the pandemic has posed unforeseen challenges and caused severe economic and social disruption. An effective approach to forecasting infections would help the health sector and administrations with strategic planning and with managing preventive and curative measures. Most existing studies use image datasets for COVID-19 prediction, whereas studies involving structural data are very rare. This paper therefore first provides an exhaustive review of COVID-19 forecasting papers, with emphasis on structural data. It then introduces an approach to COVID-19 infection forecasting that uses structural datasets instead of the image datasets used in most prior work. It presents a novel multi-source transfer-learning framework to enhance prediction accuracy, integrating demographic, economic, and COVID-19 data for intra-provincial spread forecasts. COVID-19 forecasting depends on several parameters, such as current case statistics, geographical area, population density, and economic status (e.g., GDP). However, the dataset generated for an individual province of a country is alone inadequate for a precise forecast because of data scarcity. Transfer learning helps in such cases by drawing on data collected from multiple provinces. Because the data form a time series, we also use lagged features to predict COVID-19 cases. Apart from the detailed review, this study thus aims to develop robust machine learning models through a novel and efficient multi-source transfer learning technique for accurate forecasting of COVID-19 in a province. The proposed approach has been evaluated on a wide range of datasets covering sixty-two provinces from a diverse set of countries.
We also tuned the hyperparameters of the machine learning models using Bayesian optimisation, and then applied the Friedman and Nemenyi tests to compare the results of the different models. Empirical results show that forecasting with the proposed approach is considerably more precise with simpler models, such as decision trees, than with complex models. Under data scarcity, when target-domain data cannot be used for training or fine-tuning, simpler models outperform complex ones owing to their generalization capabilities. Hence, the proposed methodology is promising and valuable for governments and organizations facing any pandemic outbreak, supporting better healthcare planning and management even when data are scarce.
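
The idea of training on several source provinces with lagged features, and then forecasting for a target province whose data are never used for fitting, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the lag window, the exact feature set, or the model settings, so `N_LAGS`, the two static covariates, the tree depth, and the synthetic data generator `synthetic_province` are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

N_LAGS = 7  # assumed window length; the abstract does not state one


def make_lagged(cases, static, n_lags=N_LAGS):
    """Turn one province's daily case series into (X, y) rows.

    Each row holds the previous `n_lags` case counts followed by the
    province's static covariates; the target is the next day's count.
    """
    cases = np.asarray(cases, dtype=float)
    rows, targets = [], []
    for t in range(n_lags, len(cases)):
        rows.append(np.concatenate([cases[t - n_lags:t], static]))
        targets.append(cases[t])
    return np.array(rows), np.array(targets)


def synthetic_province(rng, n_days=120):
    """Hypothetical data generator standing in for real province records."""
    density = rng.uniform(50, 5000)   # population density (made-up values)
    gdp = rng.uniform(1, 60)          # GDP indicator (made-up values)
    growth = rng.uniform(0.01, 0.04)
    days = np.arange(n_days)
    cases = 20 * np.exp(growth * days) + rng.normal(0, 2, n_days)
    return np.clip(cases, 0, None), np.array([density, gdp])


rng = np.random.default_rng(0)
sources = [synthetic_province(rng) for _ in range(10)]
target_cases, target_static = synthetic_province(rng)

# Pool the lagged rows of every source province into one training set.
X_parts, y_parts = zip(*(make_lagged(c, s) for c, s in sources))
X_train, y_train = np.vstack(X_parts), np.concatenate(y_parts)

# A simple model, in line with the abstract's finding about decision trees.
model = DecisionTreeRegressor(max_depth=6, random_state=0)
model.fit(X_train, y_train)

# The target province is used only for evaluation, never for fitting.
X_target, y_target = make_lagged(target_cases, target_static)
pred = model.predict(X_target)
mae = float(np.mean(np.abs(pred - y_target)))
print(f"target rows: {len(y_target)}, MAE: {mae:.2f}")
```

With real data, each source province would contribute its own case history and demographic and economic covariates in place of the synthetic generator, and the tree's hyperparameters would be tuned (the abstract mentions Bayesian optimisation) rather than fixed as they are here.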

Keywords:

COVID-19; Gross domestic product (GDP); Infection forecasting; Machine learning regression models; Multi-source domain dataset; Multi-source transfer learning; Province-specific data

Authors:

Sandeep Kumar, Sonakshi Garg, Pranab K. Muhuri


Author affiliations:

Department of Computer Science, South Asian University, Rajpur Road, Maidan Garhi, New Delhi 110068, India; Department of Genetics Genomics and Informatics, The University of Tennessee, Health Science Center, Memphis, TN 38163, USA

Department of Computer Science, South Asian University, Rajpur Road, Maidan Garhi, New Delhi 110068, India; Department of Computing Science, Umea University, 90187 Umea, Sweden

Department of Computer Science, South Asian University, Rajpur Road, Maidan Garhi, New Delhi 110068, India

Publication year:

2025

DOI:

10.1016/j.eswa.2024.125277

Expert Systems with Applications

SCI

ISSN: 0957-4174

Year, volume (issue): 2025, 262 (Mar.)

Number of references: 92