
Mining cost-effective patterns in event logs


Philippe Fournier-Viger*, Jiaxuan Li, Jerry Chun-Wei Lin, Tin Truong Chi*, R. Uday Kiran

Source title: Knowledge-Based Systems, 105241, 2019 (ISI)

High utility pattern mining is a popular data analysis task. It consists of discovering patterns of high importance in databases. A common application is identifying high utility (profitable) patterns in customer transaction data. Though such analysis can help in understanding data, it does not consider the cost (e.g., effort, resources, money or time) required to obtain the utility (benefits). In this paper, we argue that to discover interesting patterns in event sequences, it is useful to consider both a utility model and a cost model. For example, to identify cost-effective ways of treating patients from medical pathway data, it is desirable to consider not only the ability of treatments to inhibit symptoms or cure a disease (utility) but also the resources consumed and the time spent (cost) to provide these treatments. Based on this perspective, this paper defines the novel task of discovering cost-effective event sequences in event logs. In this task, cost is modeled as numeric values, while utility is represented as either binary or numeric values. Measures are proposed to evaluate the trade-off and the correlation between the cost and utility of patterns, in order to identify cost-effective patterns (patterns having a low cost but providing a high utility). Three efficient algorithms called CEPB, corCEPB and CEPN are designed to extract these patterns. They rely on a tight lower bound on the cost and a memory buffering technique to find patterns efficiently. Experiments show that the proposed algorithms are efficient, that the proposed optimizations further improve efficiency, and that insightful cost-effective patterns are found in real-life e-learning data.
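To make the idea of a cost/utility trade-off concrete, the following is a minimal sketch in Python. It is not the paper's CEPB, corCEPB or CEPN algorithm, and it does not reproduce the paper's measure definitions. It assumes, purely for illustration, that each sequence in the log is a list of (event, cost) pairs with one numeric utility per sequence, that a pattern's cost in a sequence is the summed cost of its first left-to-right occurrence, and that the trade-off is the average cost divided by the average utility. The names `match_cost`, `cost_utility_summary` and `SAMPLE_LOG`, as well as the sample data, are hypothetical.

```python
from statistics import mean

# Hypothetical e-learning log: each entry is (events, utility), where events
# is a list of (event_name, cost) pairs and utility is, e.g., a final score.
SAMPLE_LOG = [
    ([("video", 10), ("quiz", 5), ("exam", 20)], 80),
    ([("video", 12), ("exam", 25)], 60),
    ([("quiz", 4), ("video", 8), ("quiz", 6)], 70),
]


def match_cost(pattern, events):
    """Summed cost of the first left-to-right occurrence of `pattern` as a
    subsequence of `events`, or None if the pattern does not occur.
    (Assumed cost model, chosen for illustration only.)"""
    total = 0.0
    i = 0  # index of the next pattern event still to be matched
    for name, cost in events:
        if i < len(pattern) and name == pattern[i]:
            total += cost
            i += 1
    return total if i == len(pattern) else None


def cost_utility_summary(pattern, log):
    """Average cost, average utility and their ratio over the sequences
    that contain `pattern`. Returns None if no sequence contains it."""
    costs, utilities = [], []
    for events, utility in log:
        cost = match_cost(pattern, events)
        if cost is not None:
            costs.append(cost)
            utilities.append(utility)
    if not costs:
        return None
    avg_cost = mean(costs)
    avg_utility = mean(utilities)
    # Assumed trade-off: lower means less cost per unit of utility.
    trade_off = avg_cost / avg_utility if avg_utility else float("inf")
    return {
        "support": len(costs),
        "avg_cost": avg_cost,
        "avg_utility": avg_utility,
        "trade_off": trade_off,
    }


if __name__ == "__main__":
    print(cost_utility_summary(["video", "quiz"], SAMPLE_LOG))
```

Under these assumptions, a pattern with a small trade-off value costs little relative to the utility of the sequences it appears in. A real miner would also need to enumerate candidate patterns and prune them, which is where the lower bound on cost mentioned in the abstract comes in.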