

Behavior-Invariance-Oriented Event Log Sampling Method
Information systems collect large volumes of business process event logs during execution. Model discovery aims to derive process models from the behavioral information in these logs, providing an evidence base for understanding and improving business processes. The directly-follows relation (DF) is the most basic behavioral information in an event log and the foundation of model discovery algorithms. Depending on whether they use the DF frequencies of a log, existing model discovery algorithms fall into two classes: frequency-aware and frequency-agnostic. Existing log sampling methods for model discovery focus on improving discovery efficiency but discard the DF frequency information in the event log. As a result, when a DF-frequency-based discovery algorithm is applied, the resulting sample log no longer exhibits the behavior of the original log. Therefore, for DF-frequency-based model discovery algorithms, a behavior-invariance-oriented event log sampling method is proposed. It consists of three stages: selecting trace variants and reducing their frequencies by a sampling ratio, calculating the DF weight of each trace, and sampling based on set coverage. These stages ensure that the behavioral information in the sample log, and hence the process model mined from it, is consistent with that of the original log. Experiments on public event log datasets show that, compared with existing log sampling methods, the sample logs produced by the proposed method retain the DF frequency information of the original log more accurately, thereby ensuring higher model discovery quality.
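The abstract names the three sampling stages but not their formulas. The Python sketch below illustrates the general idea on a toy log under stated assumptions. The function names `df_frequencies`, `sample_log`, and `df_distance` are hypothetical. So are the DF-weight definition, the greedy set-cover step, and the total-variation measure of DF fidelity. None of them is the authors' exact method. The sketch also shows why a frequency-agnostic sample (one trace per variant) distorts the DF frequency profile of the original log.

```python
from collections import Counter
from fractions import Fraction
from math import floor
from typing import Dict, Set, Tuple

Trace = Tuple[str, ...]  # a trace variant, e.g. ("a", "b", "c")
Log = Dict[Trace, int]   # variant -> number of traces with that variant
DF = Tuple[str, str]     # a directly-follows pair (a, b)


def df_pairs(trace: Trace) -> Set[DF]:
    """Distinct directly-follows pairs occurring in one trace variant."""
    return set(zip(trace, trace[1:]))


def df_frequencies(log: Log) -> Counter:
    """DF frequency of every pair; each occurrence in a variant counts `freq` times."""
    counts: Counter = Counter()
    for trace, freq in log.items():
        for pair in zip(trace, trace[1:]):
            counts[pair] += freq
    return counts


def sample_log(log: Log, ratio: float) -> Log:
    """Hypothetical three-stage sampler (illustration, not the paper's procedure)."""
    # Stage 1: keep each variant with its frequency scaled down by `ratio`.
    # Fraction(str(...)) avoids float artifacts such as 0.29 * 100 == 28.99...
    r = Fraction(str(ratio))
    sample: Log = {t: floor(f * r) for t, f in log.items() if floor(f * r) > 0}

    # Stage 2 (assumed weighting): a variant's DF weight is the share of the
    # original log's DF occurrences that fall on the pairs it contains.
    freqs = df_frequencies(log)
    total = sum(freqs.values()) or 1
    weight = {t: sum(freqs[p] for p in df_pairs(t)) / total for t in log}

    # Stage 3: greedy set cover, so every DF pair of the original log occurs
    # at least once in the sample; ties are broken by DF weight.
    uncovered = set(freqs) - set().union(*(df_pairs(t) for t in sample))
    while uncovered:
        best = max(
            (t for t in log if t not in sample),
            key=lambda t: (len(df_pairs(t) & uncovered), weight[t]),
        )
        sample[best] = 1
        uncovered -= df_pairs(best)
    return sample


def df_distance(original: Log, sample: Log) -> float:
    """Total variation distance between the normalized DF frequency profiles."""
    p, q = df_frequencies(original), df_frequencies(sample)
    sp, sq = sum(p.values()) or 1, sum(q.values()) or 1
    return 0.5 * sum(abs(p[x] / sp - q[x] / sq) for x in set(p) | set(q))


if __name__ == "__main__":
    log = {("a", "b", "c"): 60, ("a", "c", "b"): 30, ("a", "b", "d"): 5}
    sample = sample_log(log, 0.1)
    print(sample)  # {('a', 'b', 'c'): 6, ('a', 'c', 'b'): 3, ('a', 'b', 'd'): 1}
    print(round(df_distance(log, sample), 3))                  # 0.032
    print(round(df_distance(log, {t: 1 for t in log}), 3))     # 0.158
```

In this toy run, stage 1 alone would drop the rare variant `a,b,d` and with it the DF pair `(b, d)`. The set-cover stage restores that pair at the cost of a small frequency distortion. Keeping one trace per variant covers every pair too, but it shifts the DF frequency profile far more.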

Keywords: event log; log sampling; model discovery; behavior invariance

Authors: Zhang Shuaipeng, Liu Cong, Su Xuan, Wen Lijie, Song Rongjia, Zeng Qingtian


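School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China

School of Software, Tsinghua University, Beijing 100084, China

School of Management, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China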



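Funding: National Natural Science Foundation of China (62472264); Special Fund of the Taishan Scholars Program of Shandong Province (ts20190936, tsqn201909109); Excellent Young Scholars Fund of the Shandong Provincial Natural Science Foundation (ZR2021YQ45); Innovation Team Project of the Youth Innovation Science and Technology Program of Shandong Higher Education Institutions (2021KJ031)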

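Computer Integrated Manufacturing Systems, 2024, 30(8)
Sponsor: The 210th Research Institute of China North Industries Group

Indexing: CSTPCD; Peking University Core Journal
Impact factor: 1.092
ISSN: 1006-5911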