Abstract: Customer modelling, particularly concerning length of stay or process duration, is vital for identifying customer patterns and optimising business processes. Recent advancements in computing and database technologies have revolutionised statistics and business process analytics by producing heterogeneous data that reflects diverse customer behaviours. Different models should be employed for distinct customer categories, culminating in an overall mixture model. Furthermore, some customers may remain "alive" at the conclusion of the observation period, meaning their journeys are incomplete, resulting in right-censored (RC) duration data. This combination of heterogeneous and right-censored data introduces complexity to process duration modelling and analysis. This paper presents a general approach to modelling process duration data using a gamma mixture model, where each gamma distribution represents a specific customer pattern. The model is adapted to account for RC data by modifying the likelihood function during model fitting. The paper explores three key application scenarios: (1) offline pattern clustering, which categorises customers who have completed their journeys; (2) online pattern tracking, which monitors and predicts customer behaviours in real time; and (3) concept drift detection and rationalisation, which identifies shifts in customer patterns and explains their underlying causes. The proposed method has been validated using synthetically generated data and real-world data from a hospital billing process. In all instances, the fitted models effectively represented the data and demonstrated strong performance across the three application scenarios.
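The likelihood modification described in this abstract, where completed journeys contribute a mixture density term and still-"alive" customers contribute a survival term, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-component setup, the parameter values, the 20-unit observation window, and the use of direct numerical optimisation (rather than, say, EM) are all assumptions made for demonstration.

```python
import numpy as np
from scipy import stats, optimize

def mixture_nll(params, t, censored):
    """Negative log-likelihood of a 2-component gamma mixture with
    right-censored durations.

    params   : (logit_w, log_k1, log_theta1, log_k2, log_theta2)
    t        : observed durations (completion times, or censoring times)
    censored : boolean mask, True where the journey is incomplete
    """
    w = 1.0 / (1.0 + np.exp(-params[0]))       # mixture weight in (0, 1)
    k1, th1, k2, th2 = np.exp(params[1:])      # positive shapes/scales
    # Uncensored customers contribute the mixture density f(t)
    pdf = w * stats.gamma.pdf(t, a=k1, scale=th1) \
        + (1 - w) * stats.gamma.pdf(t, a=k2, scale=th2)
    # Right-censored customers contribute the survival term P(T > t)
    sf = w * stats.gamma.sf(t, a=k1, scale=th1) \
       + (1 - w) * stats.gamma.sf(t, a=k2, scale=th2)
    ll = np.where(censored, np.log(sf + 1e-300), np.log(pdf + 1e-300))
    return -ll.sum()

rng = np.random.default_rng(0)
# Two synthetic customer patterns: short stays and long stays
t = np.concatenate([rng.gamma(2.0, 1.0, 600), rng.gamma(9.0, 2.0, 400)])
censored = t > 20.0              # observation window ends at t = 20
t = np.minimum(t, 20.0)          # censored customers only show the cutoff

# Start near plausible values; log/logit transforms keep parameters valid
x0 = np.array([0.0, np.log(2.0), 0.0, np.log(8.0), np.log(2.0)])
res = optimize.minimize(mixture_nll, x0, args=(t, censored),
                        method="Nelder-Mead", options={"maxiter": 5000})
w_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
print(round(w_hat, 2), np.round(np.exp(res.x[1:]), 2))
```

The log/logit reparameterisation is a common trick to turn the constrained problem (positive shapes and scales, weight in the unit interval) into an unconstrained one that generic optimisers handle directly.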
Abstract: The concept of digital shadows helps to move from handling large amounts of heterogeneous data in production to handling task- and context-dependent aggregated data sets that support a specific purpose. Current research lacks investigation of the characteristics digital shadows may have when they are applied at different levels of the automation pyramid. In this paper, we describe the application of the digital shadow concept to two use cases in injection molding, namely geometry-dependent process configuration and optimal production planning of jobs on an injection molding machine. In detail, we describe the creation process of digital shadows, the data needs relevant to the specific purpose, and the relevant models. Based on their usage, we describe the specifics of their characteristics and discuss commonalities and differences. These aspects can be taken into account when creating digital shadows for further use cases.
Abstract: The correctness of predictions rendered by an AI/ML model is key to its acceptability. To foster researchers' and practitioners' confidence in the model, it is necessary to render an intuitive understanding of the workings of a model. In this work, we attempt to explain a model's working by providing some insights into the quality of its data. In doing so, it is essential to consider that revealing the training data to the users is not feasible for logistical and security reasons. However, sharing some interpretable parameters of the training data and correlating them with the model's performance can be helpful in this regard. To this end, we propose a new measure based on the Euclidean Minimum Spanning Tree (EMST) for quantifying the intrinsic separation (or overlap) between the data classes. For experiments, we use datasets from diverse domains such as finance, medicine, and marketing. We use the state-of-the-art measure known as the Davies-Bouldin Index (DBI) to validate our approach on four different datasets from the aforementioned domains. The experimental results of this study establish the viability of the proposed approach in explaining the working and efficiency of a classifier. Firstly, the proposed measure of class-overlap quantification shows a stronger correlation with classification performance than DBI scores. Secondly, the results on multi-class datasets demonstrate that the proposed measure can be used to determine feature importance so as to learn a better classification model.
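One common EMST-based way to quantify class overlap, which this abstract's idea can be illustrated with, is the fraction of MST edges that join points of different classes: near zero for well-separated classes, and rising as classes intermingle. This sketch is an assumption about the general technique, not the paper's exact measure, and the synthetic two-blob data is purely for demonstration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def emst_overlap(X, y):
    """Fraction of Euclidean-MST edges joining points of different
    classes: ~0 for well-separated classes, higher with overlap."""
    d = squareform(pdist(X))                # dense pairwise Euclidean distances
    mst = minimum_spanning_tree(d).tocoo()  # the n-1 edges of the EMST
    cross = np.sum(y[mst.row] != y[mst.col])
    return cross / mst.nnz

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, (30, 2))     # class-0 cluster around the origin
B = rng.normal(10.0, 1.0, (30, 2))    # class-1 cluster far away
X = np.vstack([A, B])

separated = emst_overlap(X, np.repeat([0, 1], 30))  # true, separated labels
mixed = emst_overlap(X, np.tile([0, 1], 30))        # interleaved labels
print(round(separated, 3), round(mixed, 3))
```

With the true labels the two tight clusters are joined by essentially a single long MST edge, so the score is tiny; with interleaved labels roughly half of the short intra-cluster edges cross classes, so the score is much higher.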
Abstract: Reports on disability by the World Health Organization show that the number of people with disabilities is increasing. Consequently, accessibility should play an essential role in information systems engineering research. While there is an increasingly rich set of available web accessibility guidelines, testing frameworks, and, generally, accessibility features in modern web-based software systems, software development frameworks, and Integrated Development Environments, this paper shows, based on a systematic review of the literature and current modeling tools, that accessibility has so far received scant attention in conceptual modeling research. With this paper, we assess the state of the art of accessibility in conceptual modeling, identify current research gaps, and delineate a vision toward more accessible conceptual modeling methods and tools. As a concrete step toward this vision, we present a generic concept of keyboard-only modeling tool interaction that is implemented as a new module for the Graphical Language Server Platform (GLSP) framework. Using a currently developed UML modeling tool, we show how efficiently this module allows GLSP-based tool developers to introduce accessibility features into their modeling tools, thereby engaging physically disabled users in conceptual modeling.
Dal Forno, Ana Julia; Richetti, Graziela Piccoli; Knaesel, Vinicius Heinz
pp. 1.1-1.15
Abstract: Social media and news platforms give their users real-time, simultaneous access to a significant amount of content that may be true or false. Notably, with the evolution of Industry 4.0 technologies, the production and dissemination of fake news has also increased in recent years. Some content quickly reaches considerable popularity because it is accessed and shared on a large scale, especially on social networks, giving it the potential to go viral. This study therefore aimed to identify the algorithms and software used for fake news detection. This focus is justified because in Brazil this process is carried out manually by verification agencies; based on the mapping of the algorithms identified in the literature, an architecture proposal will be developed using artificial intelligence. As a methodology, a systematic literature review (SLR) was conducted in the Science Direct and Scopus databases using the keywords "fake news" and "machine learning" to locate reviews and research articles published in Engineering fields from 2018 to 2023. A total of 24 articles were analyzed, and the results pointed out that Facebook and X were the social networks most used to disseminate fake news. Moreover, the main topics addressed were the COVID-19 pandemic and the United States presidential elections of 2016 and 2020. As for the most used algorithms, a predominance of neural networks was observed. The contribution of this study lies in mapping the most used algorithms and their accuracy, as well as in identifying the themes, countries, and researchers contributing to the evolution of fake news research.