A text mining framework for screening catalysts and critical process parameters from scientific literature - A study on Hydrogen production from alcohol

Hydrogen production is an active area of research with a vast body of scientific literature. However, this information is unstructured and scattered, which makes it difficult to use in both academic and industrial settings. This work aims to develop a recommendation system that identifies optimal process conditions and catalyst information using Natural Language Processing (NLP) tools. To this end, full-text articles were retrieved through the Elsevier API and processed with a custom XML parser. Latent Dirichlet allocation (LDA) was applied to this dataset to form clusters of topics. The experimental section of each article was annotated using state-of-the-art sentiment analysis techniques and divided into four categories based on the presence of catalyst and process information. This dataset was used to develop a dedicated NLP model, 'Ex-SciBERT', by performing transfer learning on the 'Sci-BERT' model. The model performs sentence classification followed by Named Entity Recognition (NER) to extract catalyst and process parameters. Ex-SciBERT achieves an accuracy of 0.915 (train dataset) and 0.890 (test dataset) on the sentence classification task, and 0.998 (train dataset) and 0.997 (test dataset) on the NER task. Deploying this model will automate and accelerate the screening of relevant information from the literature and reduce manual effort.
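
The two-stage flow described in the abstract (sentence classification into four categories, then entity extraction) can be illustrated with a small sketch. This is not the Ex-SciBERT model: the keyword patterns, the category names, and the functions `classify`, `extract_entities`, and `accuracy` are hypothetical stand-ins chosen for illustration, and the four categories are assumed to be the combinations of catalyst and process information being present or absent.

```python
import re

# Hypothetical hand-written patterns. The paper's model learns these
# distinctions from annotated text; the rules here only mimic the flow.
CATALYST_PATTERN = re.compile(r"\b(?:[Cc]atalysts?|Ni|Pt|Cu|Pd|Ru|Al2O3|CeO2)\b")
PROCESS_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s*(?:°C|K|bar|atm|MPa|h|min)\b")


def classify(sentence):
    """Assign one of four assumed categories based on what the sentence mentions."""
    has_catalyst = bool(CATALYST_PATTERN.search(sentence))
    has_process = bool(PROCESS_PATTERN.search(sentence))
    if has_catalyst and has_process:
        return "catalyst+process"
    if has_catalyst:
        return "catalyst"
    if has_process:
        return "process"
    return "neither"


def extract_entities(sentence):
    """Stand-in for the NER step: return the matched catalyst and process spans."""
    return {
        "catalyst": CATALYST_PATTERN.findall(sentence),
        "process": PROCESS_PATTERN.findall(sentence),
    }


def accuracy(predicted, expected):
    """Fraction of predictions that equal the reference labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    # Made-up example sentences, not taken from the paper's dataset.
    sentences = [
        "Ethanol steam reforming was carried out over a Ni/Al2O3 catalyst at 450 °C and 1 atm.",
        "The Pt catalyst was characterized by XRD.",
        "The reactor was held at 2 bar for 3 h.",
        "Hydrogen is a promising energy carrier.",
    ]
    expected = ["catalyst+process", "catalyst", "process", "neither"]
    predicted = [classify(s) for s in sentences]
    for sentence, label in zip(sentences, predicted):
        print(label, extract_entities(sentence))
    print("accuracy:", accuracy(predicted, expected))
```

In this sketch only sentences that the first stage flags as containing catalyst or process information would need to be passed to the extraction stage, which mirrors the classification-then-NER order stated in the abstract.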

Catalyst; Process parameter; LDA; Hydrogen; Alcohol; NLP; SciBERT; Classification; NER; Ex-SciBERT

Avan Kumar, Swathi Ganesh, Divyanshi Gupta

Department of Chemical Engineering, Indian Institute of Technology Delhi, 110016, India

Department of Chemical Engineering, Indian Institute of Technology Madras, 600036, India

2022

Chemical Engineering Research & Design

SCI
ISSN:0263-8762
Year, volume (issue): 2022, 184