Model guided algorithm for mining unordered embedded subtrees
Date
2010
Abstract
A large amount of online information is, or can be, represented using semi-structured documents such as XML. The information contained in an XML document can be effectively represented using a rooted ordered labeled tree. As a result, the frequent pattern mining problem has been recast as the frequent subtree mining problem, which is a prerequisite for association rule mining from tree-structured documents. Driven by different application needs, a number of algorithms have been developed for mining different subtree types under different support definitions. In this paper we present an algorithm for mining unordered embedded subtrees. It is an extension of our general tree model guided (TMG) candidate generation framework, and the proposed U3 algorithm considers all support definitions, namely transaction-based, occurrence-match and hybrid support. A number of experiments are presented on synthetic and real-world data sets. The results demonstrate the flexibility of our general TMG framework as well as its efficiency when compared to the existing state-of-the-art approach.
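The three support definitions named in the abstract can be illustrated with a minimal sketch. It is not the U3 algorithm and does no subtree matching: it assumes the number of occurrences of one candidate subtree in each transaction (tree) has already been counted. The function names are hypothetical, and the definitions follow the way these supports are commonly stated in the frequent subtree mining literature, which may differ in detail from the paper.

```python
from typing import Sequence


def transaction_support(counts: Sequence[int]) -> int:
    """Number of transactions that contain the subtree at least once."""
    return sum(1 for c in counts if c > 0)


def occurrence_match_support(counts: Sequence[int]) -> int:
    """Total number of occurrences of the subtree over all transactions."""
    return sum(counts)


def meets_hybrid_support(counts: Sequence[int],
                         min_transactions: int,
                         min_occurrences: int) -> bool:
    """True if at least `min_transactions` transactions each contain
    at least `min_occurrences` occurrences of the subtree."""
    qualifying = sum(1 for c in counts if c >= min_occurrences)
    return qualifying >= min_transactions


# Assumed input: occurrences of one candidate subtree in four transactions.
counts = [2, 0, 1, 3]

print(transaction_support(counts))         # transactions containing the subtree
print(occurrence_match_support(counts))    # all occurrences in the database
print(meets_hybrid_support(counts, 2, 2))  # hybrid threshold of 2 transactions, 2 occurrences each
```

Under these assumed definitions the same counts can pass one threshold and fail another, which is why an algorithm that handles all three definitions must track per-transaction occurrence information rather than a single total.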
Related items
Showing items related by title, author, creator and subject.
- Tan, H.; Hadzic, Fedja; Dillon, T. (2012) The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure ...
- Tan, H.; Dillon, Tharam S.; Hadzic, Fedja; Chang, Elizabeth (2006) Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labelled subtrees. Previously we have developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from a ...
- Tan, H.; Dillon, Tharam S.; Hadzic, Fedja; Chang, Elizabeth; Feng, L. (2006) Tree mining has recently attracted a lot of interest in areas such as Bioinformatics, XML mining, Web mining, etc. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns ...