Implications of frequent subtree mining using hybrid support definitions
Date
2007
Abstract
Frequent subtree mining has found many useful applications in areas where domain knowledge is represented in a tree-structured form, such as bioinformatics, web mining, and scientific knowledge management. It involves extracting the set of frequent subtrees from a tree-structured database with respect to a user-specified minimum support. To date, the commonly used support definitions are occurrence-match support and transaction-based support. In some application areas, neither of these definitions provides the desired information directly; instead, the extracted patterns must be queried further. This has motivated us to develop a hybrid support definition that constrains the kind of patterns extracted and provides additional information not available under the previous support definitions, which simplifies some of the reasoning that commonly takes place in certain applications. In this paper we demonstrate the need for the hybrid support definition by presenting applications of tree mining in which the traditional support definitions fall short of providing the desired information. We have extended our previous tree mining algorithms to mine frequent subtrees using the hybrid support definition. Using real-world and synthetic data sets, we demonstrate the effectiveness of the method and discuss further implications for reasoning with the extracted patterns.
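The difference between the support definitions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it skips subtree matching entirely and assumes the number of occurrences of one candidate subtree in each transaction is already known. The function names are hypothetical, and the reading of hybrid support used here (the number of transactions that contain at least a given number of occurrences) is an assumption, since the abstract does not give the formal definition.

```python
from typing import Sequence


def transaction_support(counts: Sequence[int]) -> int:
    """Transaction-based support: transactions containing the pattern at least once."""
    return sum(1 for c in counts if c > 0)


def occurrence_match_support(counts: Sequence[int]) -> int:
    """Occurrence-match support: total occurrences across all transactions."""
    return sum(counts)


def hybrid_support(counts: Sequence[int], min_occurrences: int) -> int:
    """Assumed hybrid reading: transactions with at least `min_occurrences` occurrences."""
    return sum(1 for c in counts if c >= min_occurrences)


# Hypothetical occurrence counts of one subtree in four transactions.
counts = [3, 0, 1, 2]

print(transaction_support(counts))       # transactions that contain the subtree
print(occurrence_match_support(counts))  # occurrences summed over the database
print(hybrid_support(counts, 2))         # transactions with two or more occurrences
```

Under this reading, the hybrid count answers a question such as "in how many transactions does the pattern repeat at least twice?" in a single pass, whereas the two traditional definitions would require further querying of the extracted patterns.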
Related items
Showing items related by title, author, creator and subject.
- Hadzic, Fedja; Tan, H.; Dillon, Tharam S. (2010) Large amount of online information is or can be represented using semi-structured documents, such as XML. The information contained in an XML document can be effectively represented using a rooted ordered labeled tree. ...
- Hadzic, Fedja; Dillon, Tharam S.; Sidhu, Amandeep; Chang, Elizabeth; Tan, H. (2006) In this paper we consider the 'Prions' database that describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent ...
- Tan, H.; Hadzic, Fedja; Dillon, T. (2012) The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure ...