Mining frequent sequences using itemset-based extension

Ma, Zhixin; Xu, Yusheng; Dillon, Tharam S.; Chen, Xiaoyun

Access Status

Open access

Authors

Ma, Zhixin

Xu, Yusheng

Dillon, Tharam S.

Chen, Xiaoyun

Date

2008

Type

Conference Paper

Citation

Ma, Zhixin and Xu, Yusheng and Dillon, Tharam S. and Chen, Xiaoyun. 2008. Mining frequent sequences using itemset-based extension, in Craig Douglas and Ping-Kong Alexander Wai (eds), International MultiConference of Engineers and Computer Scientists (IMECS 2008), Mar 19 2008, pp. 591-596. Hong Kong: IAENG

Source Title

Proceedings of the international multiconference of engineers and computer scientists (IMECS 2008)

Source Conference

International MultiConference of Engineers and Computer Scientists (IMECS 2008)

ISBN

9789889867188

Faculty

Curtin Business School

The Centre for Extended Enterprises and Business Intelligence (CEEBI)

School

Centre for Extended Enterprises and Business Intelligence

Remarks

The link to the International MultiConference of Engineers and Computer Scientists (IMECS 2008) is: http://www.iaeng.org/IMECS2008/

URI

http://hdl.handle.net/20.500.11937/9047

Collection

Curtin Research Publications

Abstract

In this paper, we systematically explore an itemset-based extension approach for generating candidate sequences, which yields a better and more straightforward traversal of the search space than the traditional item-based extension approach. Based on this candidate generation approach, we present FINDER, a novel algorithm for discovering the set of all frequent sequences. FINDER is composed of two separate steps. In the first step, all frequent itemsets are discovered, which allows us to benefit from existing efficient itemset mining algorithms. In the second step, all frequent sequences with at least two frequent itemsets are detected by combining depth-first search with itemset-based extension candidate generation. A vertical bitmap data representation is adopted for rapid support counting. Several pruning strategies are used to reduce the search space and minimize the cost of computation. An extensive set of experiments demonstrates the effectiveness and linear scalability of the proposed algorithm.
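
The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FINDER implementation: the function names (`frequent_itemsets`, `support`, `extend`, `mine`), the brute-force itemset enumeration in step 1, and the use of Python integers as per-sequence bitmaps are all assumptions made for illustration, and the only pruning shown is the standard rule that an infrequent candidate is never extended further.

```python
from itertools import combinations


def support(bitmap):
    """Number of sequences whose bitmap has at least one bit set."""
    return sum(1 for bits in bitmap if bits)


def frequent_itemsets(db, min_sup):
    """Step 1 (brute-force stand-in for a real itemset miner).

    Returns {itemset: bitmap}, where bitmap[sid] is an int whose bit `pos`
    is set when the itemset is contained in element `pos` of sequence `sid`.
    """
    bitmaps = {}
    for sid, seq in enumerate(db):
        for pos, element in enumerate(seq):
            items = sorted(element)
            for k in range(1, len(items) + 1):
                for subset in combinations(items, k):
                    per_seq = bitmaps.setdefault(subset, [0] * len(db))
                    per_seq[sid] |= 1 << pos
    return {s: b for s, b in bitmaps.items() if support(b) >= min_sup}


def extend(prefix_bitmap, itemset_bitmap):
    """Itemset-based extension: keep itemset occurrences that lie strictly
    after the earliest position where the prefix pattern can end."""
    out = []
    for p, i in zip(prefix_bitmap, itemset_bitmap):
        if p == 0:
            out.append(0)
            continue
        low = p & -p                       # earliest end position of the prefix
        out.append(i & ~((low << 1) - 1))  # bits at later positions only
    return out


def mine(db, min_sup):
    """Step 2: depth-first search over sequences of two or more itemsets."""
    itemsets = frequent_itemsets(db, min_sup)
    results = {}

    def dfs(pattern, bitmap):
        for itemset, ibits in itemsets.items():
            new_bits = extend(bitmap, ibits)
            sup = support(new_bits)
            if sup >= min_sup:  # infrequent candidates are not extended
                new_pattern = pattern + (itemset,)
                results[new_pattern] = sup
                dfs(new_pattern, new_bits)

    for itemset, ibits in itemsets.items():
        dfs((itemset,), ibits)
    return results


if __name__ == "__main__":
    # Hypothetical toy database: each sequence is a list of itemsets.
    db = [
        [{"a", "b"}, {"c"}, {"a", "b"}],
        [{"a"}, {"a", "b"}, {"c"}],
        [{"a", "b"}, {"c"}],
    ]
    for pattern, sup in sorted(mine(db, min_sup=2).items()):
        print(pattern, sup)
```

In this sketch a candidate grows by a whole frequent itemset at a time, so the search tree has one branching rule instead of separate sequence-extension and itemset-extension steps. Support counting reduces to one bitwise AND per sequence followed by a nonzero check.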