Implications of frequent subtree mining using hybrid support definitions

Hadzic, Fedja; Tan, H.; Dillon, Tharam S.; Chang, Elizabeth

doi:10.2495/DATA070021

Access Status

Open access via publisher

Date

2007

Type

Book Chapter

Citation

Hadzic, Fedja and Tan, H. and Dillon, Tharam S. and Chang, Elizabeth. 2007. Implications of frequent subtree mining using hybrid support definitions, in Zanasi, A. and Brebbia, C.A. and Ebecken, N.F.F. (ed), Data mining VII: data, text, and web mining and their business applications, pp. 13-23. Southampton, UK: WIT Press.

Source Title

Data mining VII: data, text, and web mining and their business applications

Faculty

Curtin Business School

School of Information Systems

School

Centre for Extended Enterprises and Business Intelligence

URI

http://hdl.handle.net/20.500.11937/17005

Collection

Curtin Research Publications

Abstract

Frequent subtree mining has found many useful applications in areas where domain knowledge is represented in tree-structured form, such as bioinformatics, web mining, and scientific knowledge management. It involves extracting the set of frequent subtrees from a tree-structured database with respect to a user-specified minimum support. To date, the commonly used support definitions are occurrence-match and transaction-based support. In some application areas, neither of these support definitions provides the desired information directly; instead, the extracted patterns must be queried further. This has motivated us to develop a hybrid support definition that constrains the kind of patterns extracted and provides additional information not available under the previous support definitions, which simplifies some of the reasoning that commonly takes place in certain applications. In this paper we demonstrate the need for the hybrid support definition by presenting applications of tree mining in which the traditional support definitions fall short of providing the desired information. We have extended our previous tree mining algorithms to mine frequent subtrees using the hybrid support definition. Using real-world and synthetic data sets, we demonstrate the effectiveness of the method and discuss further implications for reasoning with the extracted patterns.
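
The abstract names three support definitions without spelling them out. As a rough illustration only, the sketch below contrasts them on a toy database. It assumes one plausible reading of hybrid support, namely that a pattern is frequent under a pair (x, y) when it occurs at least y times in each of at least x transactions; the chapter itself should be consulted for the authors' exact definition. The pattern is restricted to a single parent-child edge to keep the matching trivial, and every name here (count_edge, hybrid_support, the toy trees) is hypothetical rather than taken from the authors' algorithms.

```python
# Illustrative sketch, not the authors' algorithm.
# A tree is a (label, children) pair; a "transaction" is one tree in the database.

def count_edge(tree, parent, child):
    """Count occurrences of the 2-node pattern parent -> child in a tree."""
    label, children = tree
    here = sum(1 for c in children if label == parent and c[0] == child)
    return here + sum(count_edge(c, parent, child) for c in children)

def transaction_support(counts):
    """Number of transactions containing the pattern at least once."""
    return sum(1 for n in counts if n >= 1)

def occurrence_match_support(counts):
    """Total number of occurrences across the whole database."""
    return sum(counts)

def hybrid_support(counts, y):
    """ASSUMED reading: number of transactions with at least y occurrences."""
    return sum(1 for n in counts if n >= y)

def is_frequent_hybrid(counts, x, y):
    """ASSUMED reading: frequent if at least x transactions each hold >= y occurrences."""
    return hybrid_support(counts, y) >= x

# Toy database of four transactions (hypothetical data).
database = [
    ("a", [("b", []), ("b", []), ("c", [])]),
    ("a", [("b", []), ("c", [("a", [("b", [])])])]),
    ("a", [("c", [])]),
    ("a", [("b", [])]),
]

# Per-transaction occurrence counts of the pattern a -> b.
counts = [count_edge(t, "a", "b") for t in database]

print("per-transaction counts:", counts)
print("transaction-based support:", transaction_support(counts))
print("occurrence-match support:", occurrence_match_support(counts))
print("hybrid, x=2, y=2 frequent?", is_frequent_hybrid(counts, 2, 2))
print("hybrid, x=3, y=2 frequent?", is_frequent_hybrid(counts, 3, 2))
```

Under this assumed reading, the hybrid threshold answers a question that neither traditional count answers on its own: the transaction-based figure cannot tell whether the pattern repeats within a transaction, and the occurrence-match total cannot tell how those repeats are spread across transactions.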