Journal information
Journal of biomedical informatics.
Academic Press,
1532-0464

Officially published

    Semantic transference for enriching multilingual biomedical knowledge resources

    10 pages
    Abstract: Biomedical knowledge resources (KRs) are mainly expressed in English, and many applications using them suffer from the scarcity of knowledge in non-English languages. The goal of the present work is to make maximum use of existing multilingual biomedical KR lexicons to enrich their non-English counterparts. We propose to combine different automatic methods to generate pair-wise language alignments. More specifically, we use two well-known translation methods (GIZA++ and Moses), and we propose a new ad hoc method specially devised for multilingual KRs. The resulting alignments are then used to transfer semantics between KRs across their languages. Transference quality is ensured by checking the semantic coherence of the generated alignments. Experiments have been carried out over the Spanish, French and German UMLS Metathesaurus counterparts. As a result, the enriched Spanish KR can grow up to 1,514,217 concepts (originally 286,659), the French KR up to 1,104,968 concepts (originally 83,119), and the German KR up to 1,136,020 concepts (originally 86,842). (C) 2015 Elsevier Inc. All rights reserved.

    A study of active learning methods for named entity recognition in clinical text

    8 pages
    Abstract: Objectives: Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build because annotation requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from clinical notes.
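The sample selection idea behind active learning can be illustrated with a minimal sketch. It assumes least-confidence uncertainty sampling, a common AL strategy that is not necessarily one of the methods evaluated in the study. The sentence ids, the probability values, and the function names are hypothetical, and the product of per-token maxima is only a rough stand-in for a sequence model's confidence.

```python
from typing import Dict, List, Sequence


def least_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Uncertainty of one sentence: 1 minus the product of each token's top label probability."""
    confidence = 1.0
    for probs in token_probs:
        confidence *= max(probs)
    return 1.0 - confidence


def select_batch(pool: Dict[str, Sequence[Sequence[float]]], batch_size: int) -> List[str]:
    """Return the ids of the most uncertain unlabeled sentences, most uncertain first."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:batch_size]


# Hypothetical per-token label distributions produced by the current NER model.
pool = {
    "s1": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],  # model is fairly confident
    "s2": [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]],  # model is uncertain
    "s3": [[0.7, 0.2, 0.1]],
}

# These sentences would be sent to the annotator next.
batch = select_batch(pool, batch_size=2)
print(batch)
```

In a full loop, the selected sentences would be annotated, added to the training set, and the model retrained before the next selection round.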

    Exploiting the UMLS Metathesaurus for extracting and categorizing concepts representing signs and symptoms to anatomically related organ systems

    9 pages
    Abstract: Objective: To develop a method that exploits the UMLS Metathesaurus to extract concepts representing signs and symptoms from clinical text and categorize them by anatomically related organ system. The overarching goal is to classify patient-reported symptoms by organ system for population health and epidemiological analyses.

    A probabilistic topic model for clinical risk stratification from electronic health records

    9 pages
    Abstract: Background and objective: Risk stratification aims to provide physicians with an accurate assessment of a patient's clinical risk so that an individualized prevention or management strategy can be developed and delivered. Existing risk stratification techniques mainly focus on predicting the overall risk of an individual patient in a supervised manner and, at the cohort level, often offer little insight beyond a flat score-based segmentation of the labeled clinical dataset. To this end, in this paper, we propose a new approach for risk stratification by exploring a large volume of electronic health records (EHRs) in an unsupervised fashion.

    The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss

    12 pages
    Abstract: Objective: With the ARX data anonymization tool, structured biomedical data can be de-identified using syntactic privacy models, such as k-anonymity. Data is transformed with two methods: (a) generalization of attribute values, followed by (b) suppression of data records. The former method results in data that is well suited for analyses by epidemiologists, while the latter method significantly reduces loss of information. Our tool uses an optimal anonymization algorithm that maximizes output utility according to a given measure. To achieve scalability, existing optimal anonymization algorithms exclude parts of the search space by predicting the outcome of data transformations regarding privacy and utility without explicitly applying them to the input dataset. These optimizations cannot be used if data is transformed with generalization and suppression. As optimal data utility and scalability are important for anonymizing biomedical data, we had to develop a novel method.
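The two-step transformation described above (generalization of attribute values, then suppression of records) can be sketched for k-anonymity as follows. This is a toy illustration and not the ARX API or its optimal search algorithm: the quasi-identifiers (age, zip code), the fixed generalization levels, and all function names are assumptions.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[int, str]        # hypothetical quasi-identifiers: (age, zip code)
Generalized = Tuple[str, str]   # (age range, masked zip code)


def generalize(record: Record, age_width: int, zip_digits: int) -> Generalized:
    """Step (a): coarsen age into a range and mask trailing zip code digits."""
    age, zip_code = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return age_range, masked_zip


def anonymize(records: List[Record], k: int, age_width: int,
              zip_digits: int) -> Tuple[List[Generalized], int]:
    """Generalize every record, then suppress those in groups smaller than k."""
    generalized = [generalize(r, age_width, zip_digits) for r in records]
    class_sizes = Counter(generalized)
    # Step (b): drop records whose equivalence class would violate k-anonymity.
    kept = [g for g in generalized if class_sizes[g] >= k]
    return kept, len(generalized) - len(kept)


# Made-up example records.
records = [(34, "81675"), (36, "81677"), (38, "81925"),
           (52, "80331"), (57, "80335"), (71, "80805")]
kept, suppressed = anonymize(records, k=2, age_width=10, zip_digits=3)
print(kept, suppressed)
```

Suppressing the two outlier records lets the remaining four keep a fairly fine generalization level. Without suppression, every attribute would have to be generalized further until the outliers also fell into groups of at least k.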

    A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients

    11 pages
    Abstract: Liver cancer is the sixth most frequently diagnosed cancer and, particularly, Hepatocellular Carcinoma (HCC) represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have been developing strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from the clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients or the presence of missing data, a common drawback in healthcare contexts. In this work, a real complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation considering appropriate distance metrics for both heterogeneous and missing data (HEOM) and clustering studies to assess the underlying patient groups in the studied dataset (K-means). The final approach is applied in order to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm to build a representative dataset and use it as a training set for different machine learning procedures (logistic regression and neural networks). The results are evaluated in terms of survival prediction and compared across baseline approaches that do not consider clustering and/or oversampling using the Friedman rank test. Our proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models. (C) 2015 Elsevier Inc. All rights reserved.
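The HEOM distance named in the abstract above (the Heterogeneous Euclidean-Overlap Metric) handles mixed numeric and nominal features as well as missing values. The sketch below follows the standard definition of the metric. The patient features, their ranges, and the function name are hypothetical, and the abstract does not give the study's actual imputation procedure.

```python
import math
from typing import Dict, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def heom(x: Sequence[Value], y: Sequence[Value],
         numeric_ranges: Dict[int, float]) -> float:
    """HEOM distance between two records.

    numeric_ranges maps the index of each numeric feature to its range
    (max - min) in the dataset; every other index is treated as nominal.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:
            d = 1.0                      # missing value: maximal distance
        elif i in numeric_ranges:
            r = numeric_ranges[i]
            d = abs(a - b) / r if r > 0 else 0.0  # range-normalized difference
        else:
            d = 0.0 if a == b else 1.0   # overlap metric for nominal values
        total += d * d
    return math.sqrt(total)


# Hypothetical features: (age in years, sex, albumin in g/dL).
patient_a = (60.0, "M", 3.5)
patient_b = (50.0, "M", None)   # albumin not recorded
ranges = {0: 40.0, 2: 2.0}      # assumed ranges of the two numeric features

distance = heom(patient_a, patient_b, ranges)
print(round(distance, 4))
```

A nearest-neighbor imputation could then fill a missing value from the records closest under this distance, which is one plausible reading of "data imputation considering appropriate distance metrics".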

    Comparison of machine learning classifiers for influenza detection from emergency department free-text reports

    10 pages
    Abstract: Influenza is a yearly recurrent disease that has the potential to become a pandemic. An effective bio-surveillance system is required for early detection of the disease. In our previous studies, we have shown that electronic Emergency Department (ED) free-text reports can be of value to improve influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling missing clinical information from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two different datasets: training (468 cases, 29,004 controls), and test (176 cases and 1620 controls). We employed Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode them into one of three values: Acute, Non-acute, and Missing. Results show that all ML classifiers had areas under the ROC curve (AUC) ranging from 0.88 to 0.93, and performed significantly better than the expert-built Bayesian model. Missing clinical information marked as a value of missing (not missing at random) had a consistently improved performance among 3 (out of 4) ML classifiers when it was compared with the configuration of not assigning a value of missing (missing completely at random). The case/control ratios did not affect the classification performance given the large number of training cases. Our study demonstrates that ED reports, in conjunction with the use of ML and NLP and with the handling of missing value information, have great potential for the detection of infectious diseases. (C) 2015 Elsevier Inc. All rights reserved.
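The two ways of handling missing clinical information compared above can be pictured with a small encoding sketch. It only illustrates the two configurations and is not the behavior of Topaz or of the classifiers in the study. The finding names and the function name are hypothetical.

```python
from typing import Dict, Iterable


def encode_findings(extracted: Dict[str, str], feature_names: Iterable[str],
                    explicit_missing: bool) -> Dict[str, str]:
    """Build a feature vector from the findings extracted from one report.

    explicit_missing=True:  unmentioned findings get the value "Missing"
                            (absence is treated as informative).
    explicit_missing=False: unmentioned findings are left out, so a classifier
                            would treat them as unobserved.
    """
    encoded: Dict[str, str] = {}
    for name in feature_names:
        if name in extracted:
            encoded[name] = extracted[name]
        elif explicit_missing:
            encoded[name] = "Missing"
    return encoded


# Hypothetical finding names and one report's extracted values.
features = ["fever", "cough", "myalgia"]
report = {"fever": "Acute", "cough": "Non-acute"}

with_missing = encode_findings(report, features, explicit_missing=True)
without_missing = encode_findings(report, features, explicit_missing=False)
print(with_missing)
print(without_missing)
```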

    Cluster-based query expansion using external collections in medical information retrieval

    10 pages
    Abstract: Utilizing external collections to improve retrieval performance is a challenging research problem because various test collections are created for different purposes. Improving medical information retrieval has also gained much attention as various types of medical documents have become available to researchers ever since they started storing them in machine-processable formats. In this paper, we propose an effective method of utilizing external collections based on the pseudo relevance feedback approach. Our method incorporates the structure of external collections in estimating individual components in the final feedback model. Extensive experiments on three medical collections (TREC CDS, CLEF eHealth, and OHSUMED) were performed, and the results were compared with a representative expansion approach utilizing the external collections to show the superiority of our method. (C) 2015 Elsevier Inc. All rights reserved.
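Pseudo relevance feedback, the approach the method above builds on, assumes the top-ranked documents are relevant and expands the query with their frequent terms. The sketch below is a plain term-frequency version over a toy collection. It does not implement the cluster-based feedback model from the paper, and the documents, scoring function, and parameter names are assumptions.

```python
from collections import Counter
from typing import Dict, List


def score(query: List[str], doc_terms: List[str]) -> int:
    """Toy relevance score: how often the query terms occur in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query)


def expand_query(query: List[str], collection: Dict[str, List[str]],
                 top_docs: int = 2, new_terms: int = 2) -> List[str]:
    """Add the most frequent non-query terms of the top-ranked documents."""
    # Rank by descending score; ties are broken by document id for determinism.
    ranked = sorted(collection, key=lambda d: (-score(query, collection[d]), d))
    feedback: Counter = Counter()
    for doc_id in ranked[:top_docs]:
        feedback.update(t for t in collection[doc_id] if t not in query)
    candidates = sorted(feedback.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query) + [term for term, _ in candidates[:new_terms]]


# Made-up external collection of tokenized documents.
external = {
    "d1": ["chest", "pain", "angina", "ecg", "angina"],
    "d2": ["chest", "pain", "troponin", "angina"],
    "d3": ["knee", "pain", "fracture"],
}
expanded = expand_query(["chest", "pain"], external)
print(expanded)
```

The expanded query would then be run against the target collection. A real system would use a probabilistic retrieval model and weighted feedback terms instead of raw counts.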

    Prediction of drug's Anatomical Therapeutic Chemical (ATC) code by integrating drug-domain network

    9 pages
    Abstract: Predicting the Anatomical Therapeutic Chemical (ATC) codes of drugs is of vital importance for drug classification and repositioning. Discovering new association information related to drugs and ATC codes remains difficult. We propose a novel method named drug-domain hybrid (dD-Hybrid) that incorporates drug-domain interaction network information into prediction models to predict drugs' ATC codes. It is based on the assumption that drugs interacting with the same domain tend to share therapeutic effects. The results demonstrated that dD-Hybrid has comparable performance to other methods on the gold standard dataset. Further, several newly predicted drug-ATC pairs have been verified by experiments, which offers a novel way to utilize drugs for new purposes effectively. (C) 2015 Elsevier Inc. All rights reserved.

    Exploring methods for identifying related patient safety events using structured and unstructured data

    7 pages
    Abstract: Most healthcare systems have implemented patient safety event reporting systems to identify safety hazards. Searching the safety event data to find related patient safety reports and identify trends is challenging given the complexity and quantity of these reports. Structured data elements selected by the event reporter may be inaccurate and the free-text narrative descriptions are difficult to analyze. In this paper we present and explore methods for utilizing both the unstructured free-text and structured data elements in safety event reports to identify and rank similar events. We evaluate the results of three different free-text search methods, including a unique topic modeling adaptation, and structured element weights, using a patient fall use case. The various search techniques and weight combinations tended to prioritize different aspects of the event reports, leading to different search and ranking results. These search and prioritization methods have the potential to greatly improve the understanding that patient safety officers and other healthcare workers have of which safety event reports are related. (C) 2015 Elsevier Inc. All rights reserved.