Nhảy đến nội dung

EHAUSM: An efficient algorithm for high average utility sequence mining

Authors: 

Tin Truong, Hai Duong, Bac Le, Philippe Fournier-Viger

Source title: 
Information Sciences, 515: 302-323, 2020 (ISI)
Academic year of acceptance: 
2019-2020
Abstract: 

Identifying high utility sequences in a quantitative sequence database is an important data mining task. However, a key problem of current approaches is that extensions of a high utility sequence often have a high utility. Hence, traditional techniques are often biased toward finding long patterns. To circumvent this problem, this paper proposes techniques for the problem of high average-utility sequence (HAUS) mining (HAUSM). HAUSs are more meaningful than high utility sequences because the former are found using the average-utility measure, which consider the length of patterns in utility calculations. HAUSM is more general than high average-utility itemset mining but it is also a more difficult problem because the downward-closure property used for search space reduction does not hold for the average-utility. To overcome that challenge, this paper introduces two upper bounds and a weak upper bound on the average-utility measure, and four width pruning, depth pruning, reducing, and tightening strategies. These strategies are designed to eliminate candidate HAUSs to speed up HAUS mining. Based on these theoretical results, a novel algorithm named EHAUSM is proposed for HAUSM. Experiments on both real-life and synthetic quantitative sequence databases have confirmed its efficiency in terms of memory consumption and runtime.