
Efficient Methods for Clickstream Pattern Mining on Incremental Databases


Bay Vo, Huy-Cuong Nguyen, Bao Huynh, Tuong Le

Source title: IEEE Access, 9: 161305-161317, 2021 (ISI)

Sequence pattern mining is a core data mining task in many fields. Clickstream pattern mining, a variant of sequence pattern mining, is widely used in e-commerce to analyze, evaluate, and predict online customer behavior, and its wide range of applications has attracted the interest of many researchers. However, most previous approaches are poorly suited to databases into which new clickstreams are inserted, because mining the updated database is too time-consuming. The challenge is to minimize runtime and to reduce the number of scans of the original database, and thus the computational cost of mining incremental databases. In this paper, we propose two effective methods for mining clickstream patterns from incremental databases, named inCMUB and Eff-inCMUB, both based on the pre-large concept. inCMUB inserts the new clickstreams from an inserted database into the existing tree and then mines all frequent clickstream patterns. Eff-inCMUB takes a new approach: it builds a new tree from the inserted database to find pre-large 1-patterns, and then updates the pre-large clickstream patterns mined from the original database to extract the frequent clickstream patterns. Experiments show that the proposed methods outperform the SMUB algorithm in runtime, memory usage, and scalability on real-world clickstream databases.
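
The abstract names the pre-large concept but does not define it. As a rough illustration only, the sketch below assumes the usual formulation from the pre-large pattern literature: an upper support threshold `upper` and a lower threshold `lower` split patterns into frequent, pre-large, and infrequent, and a safety bound f = floor((upper - lower) * d / (1 - upper)) estimates how many new sequences can be inserted into a database of size d before the original database must be rescanned. The function names and threshold values are hypothetical, and this is not the inCMUB or Eff-inCMUB implementation.

```python
from fractions import Fraction
from math import floor


def classify(support_count, db_size, upper, lower):
    """Label a pattern by its support ratio (assumed pre-large scheme)."""
    ratio = Fraction(support_count, db_size)
    if ratio >= upper:
        return "frequent"
    if ratio >= lower:
        return "pre-large"  # kept in a buffer; may become frequent after inserts
    return "infrequent"


def safety_bound(db_size, upper, lower):
    """Number of inserted sequences tolerated before a rescan (assumed formula)."""
    # Exact rational arithmetic avoids floating-point error before floor().
    return floor((upper - lower) * db_size / (1 - upper))


# Hypothetical thresholds: 50% upper support, 40% lower support.
UPPER = Fraction(1, 2)
LOWER = Fraction(2, 5)

labels = [classify(count, 100, UPPER, LOWER) for count in (50, 45, 10)]
bound = safety_bound(100, UPPER, LOWER)
```

With these assumed thresholds, a pattern occurring in 45 of 100 clickstreams is kept as pre-large rather than discarded, so its count can be updated from the inserted database alone as long as the number of inserted clickstreams stays within the bound.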