
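Application of the R language in environmental epidemiology data processing: a case study of research on the health effects of air pollution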

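Objective To explore the use of the R language tidyverse package in environmental epidemiology data processing, to implement individual air pollution exposure assessment based on personal address information, and to share experience in using tidyverse. Methods Cardiovascular and cerebrovascular mortality data for Nanjing, 2017-2019, were simulated by computer, and meteorological and environmental pollution monitoring data for Nanjing, 2017-2019, were obtained online. The dplyr package in tidyverse was used to filter, join, and summarize the data; the tidyr package was used to reshape and convert the data; and purrr was used for iteration. Exposure at the nearest monitoring station and inverse distance interpolated exposure were calculated from latitude and longitude. Results Meteorological data, environmental pollutant monitoring data, and other data were obtained in bulk by web scraping with the rvest package; the tidyr and purrr packages were used for data cleaning; the geosphere package was used to process spatial data, and individual exposure was assessed by the nearest-station and inverse distance interpolation methods. Conclusion Compared with the base packages, the R language tidyverse offers consistent syntax, efficient data processing, and ease of learning. Using tidyverse for data cleaning, summary statistics, exposure calculation, and other data processing in environmental epidemiological research can effectively improve efficiency. This study provides computer code that uses the R language tidyverse package for data processing such as inverse distance weighting, implements a method for assessing individual daily air pollutant exposure, and offers an effective tool for air pollutant exposure assessment.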
Application of R software in data analysis of environmental epidemiology: health effects of air pollution
Objective To implement individual exposure assessment of air pollution based on personal address information using the R language tidyverse package, and to share experience in using it. Methods Cardiovascular and cerebrovascular mortality data for Nanjing from 2017 to 2019 were simulated by computer, and meteorological and environmental pollutant monitoring data for the same period were obtained online. The data were filtered, joined, and summarized with the dplyr package of the tidyverse, reshaped and converted with the tidyr package, and iterated over with the purrr package. Exposure at the nearest environmental monitoring site and inverse distance weighted exposure were calculated from latitude and longitude. Results Meteorological data, environmental pollutant monitoring data, and other data were obtained by web scraping with the rvest package; the tidyr and purrr packages were used for data cleaning; and the geosphere package was used to process spatial data, so that individual exposure was assessed from the nearest site and by inverse distance interpolation. Conclusion Compared with the base package, the R language tidyverse has the advantages of consistent syntax, efficient data processing, and being easy to master. Using tidyverse for data cleaning, summary statistics, exposure calculation, and other data processing in environmental epidemiological studies can effectively improve efficiency. This study provides code for data processing with the R language tidyverse package, including inverse distance weighting, and implements a method for assessing individual daily air pollutant exposure, which offers an effective tool for air pollutant exposure assessment.
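The nearest-site and inverse distance weighted exposure calculation described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' published code: the station coordinates, PM2.5 readings, and individuals are made-up values, the helper name `exposure_one` is hypothetical, and the distance power of 2 is an assumed (commonly used) choice that the abstract does not state.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(geosphere)

# Hypothetical monitoring stations; coordinates are illustrative only.
stations <- tibble(
  station = c("A", "B", "C"),
  lon = c(118.78, 118.80, 118.74),
  lat = c(32.06, 32.10, 32.01)
)

# Hypothetical daily PM2.5 readings, one column per station (wide format).
pm25_wide <- tibble(
  date = as.Date("2017-01-01"),
  A = 60, B = 80, C = 40
)

# tidyr: reshape to one row per station-day; dplyr: attach coordinates.
monitor <- pm25_wide %>%
  pivot_longer(cols = -date, names_to = "station", values_to = "pm25") %>%
  inner_join(stations, by = "station")

# Hypothetical geocoded individuals (person 1 sits exactly on station A).
people <- tibble(
  id = 1:2,
  date = as.Date("2017-01-01"),
  lon = c(118.78, 118.77),
  lat = c(32.06, 32.05)
)

# Exposure for one person-day, computed from that day's station readings.
# power = 2 is an assumed default, not a value taken from the paper.
exposure_one <- function(lon, lat, day, power = 2) {
  obs <- filter(monitor, date == day, !is.na(pm25))
  # Great-circle distance in metres from the person to every station.
  d <- distHaversine(c(lon, lat), cbind(obs$lon, obs$lat))
  if (any(d == 0)) {
    # The person is located at a station: use its value, avoid dividing by 0.
    idw <- obs$pm25[which.min(d)]
  } else {
    w <- 1 / d^power
    idw <- sum(w * obs$pm25) / sum(w)
  }
  tibble(nearest = obs$pm25[which.min(d)], idw = idw)
}

# purrr: iterate over person-days and bind the per-row results.
exposure <- map(
  seq_len(nrow(people)),
  function(i) exposure_one(people$lon[i], people$lat[i], people$date[i])
) %>%
  bind_rows()

result <- bind_cols(people, exposure)
print(result)
```

Iterating over row indices rather than over the date column directly keeps the `Date` class intact inside the helper. A real analysis would also need to handle days on which no station reports a value, which this sketch does not.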

environmental epidemiology; R language; tidyverse package; data processing; inverse distance weighted

Tang Wenbin (汤文斌), Li Xufeng (李徐凤), Wang Yufei (王玉斐), Yang Liang (杨亮), Zheng Hao (郑浩)


Changzhou Jintan District Center for Disease Control and Prevention, Changzhou 213200

Jiangsu Provincial Center for Disease Control and Prevention

environmental epidemiology; R language; tidyverse package; data processing; inverse distance interpolation

Preventive Medicine Research Project of the Jiangsu Provincial Health and Family Planning Commission

Y2018025

2024

Journal of Environmental Hygiene
Chinese Center for Disease Control and Prevention


CSTPCD
Impact factor: 0.735
ISSN: 2095-1906
Year, volume (issue): 2024, 14(1)