Abstract
The collection and annotation of clinical polysomnography data are time-consuming and costly, and differences in study populations, acquisition devices, and annotating experts introduce variability into the recorded data, which increases the difficulty and complexity of sleep-related research. As with many other open clinical data resources, open-source physiological signal datasets for sleep research provide researchers worldwide with rich data and a unified platform for comparison, and have promoted in-depth research in sleep medicine. This paper therefore reviews the overview, characteristics, and applications of 18 open-source datasets commonly used in the sleep field. These datasets contain physiological signals such as the electroencephalogram (EEG), electrocardiogram (ECG), electrooculogram (EOG), and electromyogram (EMG), and cover multiple clinical areas, including sleep disorders, cardiovascular disease, and obesity. The paper also summarizes the limitations of existing open-source sleep datasets in terms of data quality, data standards, data security, sample representativeness, and external validity, and offers targeted suggestions and an outlook for future work.
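Funding
National Natural Science Foundation of China (62171123)
National Natural Science Foundation of China (62101129)
National Natural Science Foundation of China (62211530112)
National Key Research and Development Program of China (2023YFC3603600)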