Event log sampling approach towards directly-follows relation rediscoverability and its application
SU Xuan 1, LIU Cong 2, WEN Lijie 3, MENG Xiaoliang 1, LI Caihong 1, ZENG Qingtian 4
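Author information
- 1. School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000
- 2. School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590
- 3. School of Software, Tsinghua University, Beijing 100084
- 4. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590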
Abstract
Event log sampling, a recent research focus in process mining, aims to improve the efficiency of process mining tasks such as model discovery, conformance checking, and process prediction. However, existing sampling methods cannot adequately guarantee the quality of the mined model, and they are inefficient when sampling large-scale event logs. The directly-follows relation between tasks is the basic unit for describing behavior in an event log and plays a key role in many process mining tasks. A general event log sampling method oriented towards directly-follows relation rediscoverability is therefore proposed, which guarantees the rediscoverability of directly-follows relations. To verify its effectiveness, the sampling method is applied to improve the efficiency of existing model discovery algorithms, and a process-tree-based model similarity measure is proposed to quantitatively evaluate the quality of the mined models. The sampling method has been implemented in the open-source process mining platforms ProM6 and PM4PY. On 12 public event log datasets, the proposed method is compared quantitatively with existing sampling methods in terms of the quality of the mined models. The experimental results show that the proposed method can greatly improve model discovery efficiency while preserving model quality.
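To make the notion of directly-follows relation rediscoverability concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a log is a list of traces and each trace is a sequence of activity labels; the function names `directly_follows`, `log_relations`, and `greedy_sample` are hypothetical. It uses a plain greedy set-cover heuristic to choose traces until the sample contains every directly-follows pair of the full log; start and end activities are not treated specially.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Trace = Sequence[Hashable]
Pair = Tuple[Hashable, Hashable]


def directly_follows(trace: Trace) -> Set[Pair]:
    """Return all pairs (a, b) where b immediately follows a in the trace."""
    return set(zip(trace, trace[1:]))


def log_relations(log: Sequence[Trace]) -> Set[Pair]:
    """Return the union of directly-follows pairs over all traces in the log."""
    relations: Set[Pair] = set()
    for trace in log:
        relations |= directly_follows(trace)
    return relations


def greedy_sample(log: Sequence[Trace]) -> List[Trace]:
    """Pick traces greedily until every directly-follows pair is covered.

    Illustrative heuristic only (assumption): at each step, take the trace
    that covers the most pairs not yet present in the sample.
    """
    uncovered = log_relations(log)
    remaining = list(log)
    sample: List[Trace] = []
    while uncovered:
        best = max(remaining, key=lambda t: len(directly_follows(t) & uncovered))
        sample.append(best)
        uncovered -= directly_follows(best)
        remaining.remove(best)
    return sample
```

For example, calling `greedy_sample` on a log with many repeated trace variants returns a much smaller list of traces whose `log_relations` equals that of the original log, which is the property a discovery algorithm based on directly-follows relations depends on.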
Key words
event log sampling/directly-follows relation rediscoverability/quality measure/model similarity
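Funding
National Natural Science Foundation of China (62472264)
Taishan Scholars Program of Shandong Province (ts20190936)
Taishan Scholars Program of Shandong Province (tsqn201909109)
Shandong Provincial Natural Science Foundation, Excellent Young Scientists Fund (ZR2021YQ45)
Youth Innovation Science and Technology Program for Higher Education Institutions of Shandong Province, Innovation Team Project (2021KJ031)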
Publication year
2024