Journal information
Methods of information in medicine
Schattauer
ISSN: 0026-1270

Indexed in: SCI, AHCI, ISTP

    Security and Privacy in Distributed Health Care Environments

    Stephen V. Flowerday, Christos Xenakis
    2 pages
    Abstract: There is an increasing demand for distributed health care systems. Nevertheless, distributed health care environments do not come without risks. As distributed health care systems grow, so do the cybersecurity threats targeting them. Additionally, the demand for compliance with new regulations increases as these distributed health care systems hold sensitive patient data. The use of data-driven technologies presents a promising opportunity for significant advances in the field toward improved health care for patients and the general public [1,2]. Several recent studies have highlighted the importance and the necessity of developing a data-driven approach where patient data are collected, analyzed, and leveraged for medical research purposes with the help of different types of artificial intelligence. To address the privacy-related challenges, novel methods, such as protection of personal health information, ensuring compliance, guaranteeing FAIR information processing, and building of trust, are required. In this issue, new paradigms and prominent applications are presented for secure, trustworthy, and privacy-preserving data sharing and knowledge representation to address the emerging needs.

    A Semi-Automated Term Harmonization Pipeline Applied to Pulmonary Arterial Hypertension Clinical Trials

    Ryan J. Urbanowicz, John H. Holmes, Dina Appleby, Vanamala Narasimhan...
    8 pages
    Abstract: Objective: Data harmonization is essential to integrate individual participant data from multiple sites, time periods, and trials for meta-analysis. The process of mapping terms and phrases to an ontology is complicated by typographic errors, abbreviations, truncation, and plurality. We sought to harmonize medical history (MH) and adverse events (AE) term records across 21 randomized clinical trials in pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension. Methods: We developed and applied a semi-automated harmonization pipeline for use with domain-expert annotators to resolve ambiguous term mappings using exact and fuzzy matching. We summarized MH and AE term mapping success, including map quality measures, and imputation of a generalizing term hierarchy as defined by the applied Medical Dictionary for Regulatory Activities (MedDRA) ontology standard. Results: Over 99.6% of both MH (N = 37,105) and AE (N = 58,170) records were successfully mapped to MedDRA low-level terms. Automated exact matching accounted for 74.9% of MH and 85.5% of AE mappings. Term recommendations from fuzzy matching in the pipeline facilitated annotator mapping of the remaining 24.9% of MH and 13.8% of AE records. Imputation of the generalized MedDRA term hierarchy was unambiguous in 85.2% of high-level terms, 99.4% of high-level group terms, and 99.5% of system organ class in MH, and 75% of high-level terms, 98.3% of high-level group terms, and 98.4% of system organ class in AE. Conclusion: This pipeline dramatically reduced the burden of manual annotation for MH and AE term harmonization and could be adapted to other data integration efforts.
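The exact-then-fuzzy mapping idea described in this abstract might be sketched as follows. This is only an illustration, not the authors' pipeline: the tiny `DICTIONARY` list stands in for MedDRA low-level terms, and `normalize`, `map_term`, and the 0.8 similarity cutoff are hypothetical choices. Fuzzy candidates are routed to a human annotator rather than accepted automatically.

```python
import difflib

# Hypothetical mini-dictionary standing in for MedDRA low-level terms.
DICTIONARY = ["headache", "nausea", "dyspnoea", "peripheral oedema", "dizziness"]


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match exactly."""
    return " ".join(term.lower().split())


def map_term(raw: str, dictionary=DICTIONARY, cutoff: float = 0.8):
    """Return (status, candidates) for one raw term record.

    status is "exact" when the normalized term is in the dictionary,
    "review" when fuzzy matching offers candidates for an annotator,
    and "unmapped" when nothing is similar enough.
    """
    term = normalize(raw)
    if term in dictionary:
        return "exact", [term]
    # get_close_matches ranks dictionary entries by SequenceMatcher ratio.
    candidates = difflib.get_close_matches(term, dictionary, n=3, cutoff=cutoff)
    if candidates:
        return "review", candidates
    return "unmapped", []


for record in ["  Headache ", "headaches", "xyz"]:
    print(record.strip(), "->", map_term(record))
```

Keeping the fuzzy step advisory matches the semi-automated design the abstract describes, where ambiguous mappings are resolved by domain experts.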

    Ambiguous and Incomplete: Natural Language Processing Reveals Problematic Reporting Styles in Thyroid Ultrasound Reports

    Peggy L. Peissig, Eneida A. Mendonca, David F. Schneider, Priya H. Dedhia...
    8 pages
    Abstract: Objective: Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we describe the performance measures of NLP to capture granular details on nodules from thyroid ultrasound (US) reports and reveal critical issues with reporting language. Methods: We iteratively developed NLP tools using clinical Text Analysis and Knowledge Extraction System (cTAKES) and thyroid US reports from 2007 to 2013. We incorporated nine nodule features for NLP extraction. Next, we evaluated the precision, recall, and accuracy of our NLP tools using a separate set of US reports from an academic medical center (A) and a regional health care system (B) during the same period. Two physicians manually annotated each test-set report. A third physician then adjudicated discrepancies. The adjudicated "gold standard" was then used to evaluate NLP performance on the test set. Results: A total of 243 thyroid US reports contained 6,405 data elements. Inter-annotator agreement for all elements was 91.3%. Compared with the gold standard, overall recall of the NLP tool was 90%. NLP recall for thyroid lobe or isthmus characteristics was: laterality 96% and size 95%. NLP accuracy for nodule characteristics was: laterality 92%, size 92%, calcifications 76%, vascularity 65%, echogenicity 62%, contents 76%, and borders 40%. NLP recall for presence or absence of lymphadenopathy was 61%. Reporting style accounted for 18% of errors. For example, the word "heterogeneous" interchangeably referred to nodule contents or echogenicity. While nodule dimensions and laterality were often described, US reports only described contents, echogenicity, vascularity, calcifications, borders, and lymphadenopathy 46, 41, 17, 15, 9, and 41% of the time, respectively. Most nodule characteristics were equally likely to be described at hospital A compared with hospital B. Conclusions: NLP can automate extraction of critical information from thyroid US reports. However, ambiguous and incomplete reporting language hinders performance of NLP systems regardless of institutional setting. Standardized or synoptic thyroid US reports could improve NLP performance.
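As a rough illustration of how extracted data elements can be scored against an adjudicated gold standard, the sketch below computes precision and recall over sets of (nodule, feature, value) triples. The triple representation, the example values, and the `precision_recall` helper are assumptions made for this example and are not taken from the study.

```python
def precision_recall(extracted: set, gold: set):
    """Precision and recall of extracted elements against gold-standard elements."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard annotations for one report.
gold = {
    ("nodule1", "laterality", "left"),
    ("nodule1", "size", "1.2 cm"),
    ("nodule1", "borders", "irregular"),
    ("nodule1", "calcifications", "absent"),
}

# Hypothetical NLP output: one wrong value (borders) and one missed element.
extracted = {
    ("nodule1", "laterality", "left"),
    ("nodule1", "size", "1.2 cm"),
    ("nodule1", "borders", "smooth"),
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of three extracted elements are correct and two of four gold elements are recovered, so a mislabeled feature lowers both measures while an omitted one lowers only recall.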

    A Comparison of Methods to Detect Changes in Prediction Models

    Erin M. Schnellinger, Wei Yang, Michael O. Harhay, Stephen E. Kimmel...
    10 pages
    Abstract: Background: Prediction models inform decisions in many areas of medicine. Most models are fitted once and then applied to new (future) patients, despite the fact that model coefficients can vary over time due to changes in patients' clinical characteristics and disease risk. However, the optimal method to detect changes in model parameters has not been rigorously assessed. Methods: We simulated data, informed by post-lung transplant mortality data, and tested the following two approaches for detecting model change: (1) the "Direct Approach," which compares coefficients of the model refit on recent data to those at baseline; and (2) "Calibration Regression," which fits a logistic regression model of the log-odds of the observed outcomes versus the linear predictor from the baseline model (i.e., the log-odds of the predicted probabilities obtained from the baseline model) and tests whether the intercept and slope differ from 0 and 1, respectively. Four scenarios were simulated using logistic regression for binary outcomes as follows: (1) we fixed all model parameters, (2) we varied the outcome prevalence between 0.1 and 0.2, (3) we varied the coefficient of one of the ten predictors between 0.2 and 0.4, and (4) we varied the outcome prevalence and coefficient of one predictor simultaneously. Results: Calibration regression tended to detect changes sooner than the Direct Approach, with better performance (e.g., larger proportion of true claims). When the sample size was large, both methods performed well. When two parameters changed simultaneously, neither method performed well. Conclusion: Neither change detection method examined here proved optimal under all circumstances. However, our results suggest that if one is interested in detecting a change in overall incidence of an outcome (e.g., intercept), the Calibration Regression method may be superior to the Direct Approach. Conversely, if one is interested in detecting a change in other model covariates (e.g., slope), the Direct Approach may be superior.
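Calibration regression as described above can be sketched in plain Python: regress the observed outcomes on the baseline model's linear predictor and check whether the intercept departs from 0 and the slope from 1. Everything specific here is an assumption for illustration, including the simulated data, the intercept shift of 0.7, the Newton-Raphson fitting routine, and the use of Wald z-statistics; the abstract does not state which test the authors applied.

```python
import math
import random


def simulate(n, intercept_shift, seed):
    """Simulate baseline linear predictors and outcomes from a shifted model."""
    rng = random.Random(seed)
    lp, y = [], []
    for _ in range(n):
        x = rng.gauss(-1.5, 1.0)  # baseline linear predictor (log-odds)
        p_true = 1.0 / (1.0 + math.exp(-(x + intercept_shift)))
        lp.append(x)
        y.append(1 if rng.random() < p_true else 0)
    return lp, y


def fit_calibration(lp, y, iters=25):
    """Fit logit P(y=1) = a + b*lp by Newton-Raphson.

    Returns (a, b, se_a, se_b); standard errors come from the inverse of the
    information matrix evaluated at the last iteration.
    """
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * x      # gradient w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    se_a = math.sqrt(h11 / det)
    se_b = math.sqrt(h00 / det)
    return a, b, se_a, se_b


lp, y = simulate(n=5000, intercept_shift=0.7, seed=0)
a, b, se_a, se_b = fit_calibration(lp, y)
z_intercept = (a - 0.0) / se_a  # Wald statistic for H0: intercept = 0
z_slope = (b - 1.0) / se_b      # Wald statistic for H0: slope = 1
print(f"intercept={a:.3f} (z={z_intercept:.2f}), slope={b:.3f} (z={z_slope:.2f})")
```

Because only the intercept was shifted in the simulated data, the intercept statistic should be large while the slope stays near 1, which mirrors the abstract's point that this method is suited to detecting changes in overall incidence.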

    Identifying Pneumonia Subtypes from Electronic Health Records Using Rule-Based Algorithms

    Harshad Hegde, Ingrid Glurich, Aloksagar Panny, Jayanth G. Vedre...
    9 pages
    Abstract: Background: The International Classification of Diseases (ICD) coding for pneumonia classification is based on causal organism or use of general pneumonia codes, creating challenges for epidemiological evaluations where pneumonia is standardly subtyped by settings, exposures, and time of emergence. Pneumonia subtype classification requires data available in electronic health records (EHRs), frequently in nonstructured formats including radiological interpretation or clinical notes that complicate electronic classification. Objective: The current study undertook development of a rule-based pneumonia subtyping algorithm for stratifying pneumonia by the setting in which it emerged using information documented in the EHR. Methods: Pneumonia subtype classification was developed by interrogating patient information within the EHR of a large private health system. ICD coding was mined in the EHR applying requirements for "rule of two" pneumonia-related codes or one ICD code and radiologically confirmed pneumonia validated by natural language processing and/or documented antibiotic prescriptions. A rule-based algorithm flow chart was created to support subclassification based on features including symptomatic patient point of entry into the health care system, timing of pneumonia emergence, and identification of clinical, laboratory, or medication orders that informed definition of the pneumonia subclassification algorithm. Results: Data from 65,904 study-eligible patients with 91,998 episodes of pneumonia diagnoses documented by 380,509 encounters were analyzed, while 8,611 episodes were excluded following natural language processing classification of pneumonia status as "negative" or "unknown." Subtyping of 83,387 episodes identified community-acquired (54.5%), hospital-acquired (20%), aspiration-related (10.7%), health care-acquired (5%), and ventilator-associated (0.4%) cases, and 9.4% of cases were not classifiable by the algorithm. Conclusion: The study outcome indicated capacity to achieve electronic pneumonia subtype classification based on interrogation of big data available in the EHR. Examination of portability of the algorithm to achieve rule-based pneumonia classification in other health systems remains to be explored.
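One possible reading of the inclusion criterion in the Methods (two pneumonia-related ICD codes, or a single code corroborated by NLP-validated radiology and/or an antibiotic prescription) is sketched below. The `Episode` fields and the `meets_inclusion` function are hypothetical names introduced for this example, and the exact logic used in the study may differ.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    icd_code_count: int           # pneumonia-related ICD codes recorded for the episode
    nlp_radiology_positive: bool  # radiology report judged pneumonia-positive by NLP
    antibiotic_prescribed: bool   # antibiotic prescription documented


def meets_inclusion(ep: Episode) -> bool:
    """Assumed 'rule of two': two codes, or one code plus corroborating evidence."""
    if ep.icd_code_count >= 2:
        return True
    if ep.icd_code_count == 1:
        return ep.nlp_radiology_positive or ep.antibiotic_prescribed
    return False


episodes = [
    Episode(icd_code_count=2, nlp_radiology_positive=False, antibiotic_prescribed=False),
    Episode(icd_code_count=1, nlp_radiology_positive=True, antibiotic_prescribed=False),
    Episode(icd_code_count=1, nlp_radiology_positive=False, antibiotic_prescribed=False),
]
print([meets_inclusion(ep) for ep in episodes])
```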

    A Methodological Approach to Validate Pneumonia Encounters from Radiology Reports Using Natural Language Processing

    AlokSagar Panny, Harshad Hegde, Ingrid Glurich, Frank A. Scannapieco...
    8 pages
    Abstract: Introduction: Pneumonia is caused by microbes that establish an infectious process in the lungs. The gold standard for pneumonia diagnosis is radiologist-documented pneumonia-related features in radiology notes that are captured in electronic health records in an unstructured format. Objective: The study objective was to develop a methodological approach for assessing validity of a pneumonia diagnosis based on identifying presence or absence of key radiographic features in radiology reports with subsequent rendering of diagnostic decisions into a structured format. Methods: A pneumonia-specific natural language processing (NLP) pipeline was strategically developed applying Clinical Text Analysis and Knowledge Extraction System (cTAKES) to validate pneumonia diagnoses following development of a pneumonia feature-specific lexicon. Radiographic reports of study-eligible subjects identified by International Classification of Diseases (ICD) codes were parsed through the NLP pipeline. Classification rules were developed to assign each pneumonia episode into one of three categories: "positive," "negative," or "not classified: requires manual review" based on tagged concepts that support or refute diagnostic codes. Results: A total of 91,998 pneumonia episodes diagnosed in 65,904 patients were retrieved retrospectively. Approximately 89% (81,707/91,998) of the total pneumonia episodes were documented by 225,893 chest X-ray reports. NLP classified and validated 33% (26,800/81,707) of pneumonia episodes as "Pneumonia-positive," 19% (15,401/81,707) as "Pneumonia-negative," and 48% (39,209/81,707) as "episode classification pending further manual review." NLP pipeline performance metrics included accuracy (76.3%), sensitivity (88%), and specificity (75%). Conclusion: The pneumonia-specific NLP pipeline exhibited good performance comparable to other pneumonia-specific NLP systems developed to date.
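The three-category assignment described in the Methods could look something like the sketch below. The abstract does not give the actual rules, so the logic here is purely an assumption: an episode is "positive" when only supporting concepts were tagged, "negative" when only refuting concepts were tagged, and sent to manual review when the evidence is mixed or absent. The `classify_episode` function and the example counts are hypothetical.

```python
from collections import Counter

REVIEW = "not classified: requires manual review"


def classify_episode(supporting: int, refuting: int) -> str:
    """Assign a category from counts of tagged concepts (assumed rule set)."""
    if supporting > 0 and refuting == 0:
        return "positive"
    if refuting > 0 and supporting == 0:
        return "negative"
    # Mixed evidence or no tagged concepts: defer to a human reviewer.
    return REVIEW


# Hypothetical (supporting, refuting) concept counts for four episodes.
tagged = [(3, 0), (0, 2), (1, 1), (0, 0)]
labels = [classify_episode(s, r) for s, r in tagged]
print(Counter(labels))
```

Routing ambiguous episodes to review rather than forcing a label is consistent with the large share of episodes the abstract reports as pending manual review.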

    Automated Identification of Immunocompromised Status in Critically Ill Children

    Swaminathan Kandaswamy, Evan W. Orenstein, Elizabeth Quincer, Alfred J. Fernandez...
    9 pages
    Abstract: Background: Easy identification of immunocompromised hosts (ICHs) would allow for stratification of culture results based on host type. Methods: We utilized antimicrobial stewardship program (ASP) team notes written during handshake stewardship rounds in the pediatric intensive care unit (PICU) as the gold standard for host status; clinical notes from the primary team, medication orders during the encounter, problem list, and billing diagnoses documented prior to the ASP documentation were extracted to develop models that predict host status. We calculated performance for three models based on diagnoses/medications, with and without natural language processing from clinical notes. The susceptibility of pathogens causing bacteremia to commonly used empiric antibiotic regimens was then stratified by host status. Results: We identified 844 antimicrobial episodes from 666 unique patients; 160 (18.9%) were identified as ICHs. We randomly selected 675 initiations (80%) for model training and 169 initiations (20%) for testing. A rule-based model using diagnoses and medications alone yielded a sensitivity of 0.87 (0.86-0.88), specificity of 0.93 (0.92-0.93), and positive predictive value (PPV) of 0.74 (0.73-0.75). Adding clinical notes into an XGBoost model led to improved specificity of 0.98 (0.98-0.98) and PPV of 0.9 (0.88-0.91), but with decreased sensitivity of 0.77 (0.76-0.79). A total of 77 bacteremia episodes were identified during the study period, and a host-specific visualization was created. Conclusions: An electronic health record-based phenotype based on notes, diagnoses, and medications identifies ICH in the PICU with high specificity.

    Predicting Hospital Readmissions from Health Insurance Claims Data: A Modeling Study Targeting Potentially Inappropriate Prescribing

    Alexander Gerharz, Carmen Ruff, Lucas Wirbka, Felicitas Stoll...
    6 pages
    Abstract: Background: Numerous prediction models for readmissions are developed from hospital data whose predictor variables are based on specific data fields that are often not transferable to other settings. In contrast, routine data from statutory health insurances (in Germany) are highly standardized, ubiquitously available, and would thus allow for automatic identification of readmission risks. Objectives: To develop and internally validate prediction models for readmissions based on potentially inappropriate prescribing (PIP) in six diseases from routine data. Methods: In a large database of German statutory health insurance claims, we detected disease-specific readmissions after index admissions for acute myocardial infarction (AMI), heart failure (HF), a composite of stroke, transient ischemic attack or atrial fibrillation (S/AF), chronic obstructive pulmonary disease (COPD), type-2 diabetes mellitus (DM), and osteoporosis (OS). PIP at the index admission was determined by the STOPP/START criteria (Screening Tool of Older Persons' Prescriptions/Screening Tool to Alert doctors to the Right Treatment), which were candidate variables in regularized prediction models for specific readmission within 90 days. The risks from disease-specific models were combined ("stacked") to predict all-cause readmission within 90 days. Validation performance was measured by the c-statistics. Results: While the prevalence of START criteria was higher than for STOPP criteria, more single STOPP criteria were selected into models for specific readmissions. Performance in validation samples was the highest for DM (c-statistics: 0.68 [95% confidence interval (CI): 0.66-0.70]), followed by COPD (c-statistics: 0.65 [95% CI: 0.64-0.67]), S/AF (c-statistics: 0.65 [95% CI: 0.63-0.66]), HF (c-statistics: 0.61 [95% CI: 0.60-0.62]), AMI (c-statistics: 0.58 [95% CI: 0.56-0.60]), and OS (c-statistics: 0.51 [95% CI: 0.47-0.56]). Integrating risks from disease-specific models to a combined model for all-cause readmission yielded a c-statistics of 0.63 [95% CI: 0.63-0.64]. Conclusion: PIP successfully predicted readmissions for most diseases, opening the possibility for interventions to improve these modifiable risk factors. Machine-learning methods appear promising for future modeling of PIP predictors in complex older patients with many underlying diseases.
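For readers unfamiliar with the c-statistic used to report validation performance here, the following minimal sketch computes it as the share of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half. The function name and the four example predictions are illustrative assumptions and do not come from the study.

```python
def c_statistic(risks, outcomes):
    """Concordance statistic for binary outcomes (equivalent to the ROC AUC)."""
    events = [r for r, o in zip(risks, outcomes) if o == 1]
    non_events = [r for r, o in zip(risks, outcomes) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5  # tied predictions count as half-concordant
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted 90-day readmission risks and observed outcomes.
risks = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
print(c_statistic(risks, outcomes))
```

A value of 0.5 corresponds to chance-level ranking, which gives context for the reported range from 0.51 for osteoporosis to 0.68 for diabetes.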