    espace - Curtin’s institutional repository

    Tree model guided candidate generation for mining frequent subtrees from XML

    116268_9877_PUB-CBS-EEB-MC-45176.pdf (2.287Mb)
    Access Status
    Open access
    Authors
    Tan, Henry
    Hadzic, Fedja
    Dillon, Tharam S.
    Chang, Elizabeth
    Feng, Ling
    Date
    2008
    Type
    Journal Article
    
    Citation
    Tan, Henry and Hadzic, Fedja and Dillon, Tharam and Chang, Elizabeth and Feng, Ling. 2008. Tree model guided candidate generation for mining frequent subtrees from XML. ACM Transactions on Knowledge Discovery from Data 2 (2): pp. 1-43.
    Source Title
    ACM Transactions on Knowledge Discovery from Data
    Additional URLs
    http://doi.acm.org/10.1145/1376815.1376818
    ISSN
    15564681
    Faculty
    Curtin Business School
    Centre for Extended Enterprises and Business Intelligence
    Remarks

    © ACM, 2008. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Knowledge Discovery from Data, {VOL 2, ISSN 15564681, (2008)} http://doi.acm.org/10.1145/1376815.1376818

    URI
    http://hdl.handle.net/20.500.11937/14717
    Collection
    • Curtin Research Publications
    Abstract

    Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces a few challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, non-redundant enumeration strategy which enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this paper, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques.
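
    To illustrate the distinction between induced and embedded subtrees mentioned in the abstract, the following is a minimal Python sketch. It is not the authors' TMG, MB3-Miner, or iMB3-Miner implementation. It assumes that the level of embedding is the path length between an ancestor and a descendant, so a maximum level of 1 yields only parent-child (induced) pairs, while larger limits also admit ancestor-descendant (embedded) pairs. The function name `two_node_candidates` and the parent-array tree encoding are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple


def two_node_candidates(
    labels: List[str],
    parent: List[Optional[int]],
    max_level: int,
) -> List[Tuple[str, str]]:
    """Enumerate (ancestor_label, descendant_label) pairs from one tree.

    Assumption: nodes are listed in pre-order and parent[i] is the index of
    node i's parent (None for the root). A pair is kept only if the path
    from ancestor to descendant has at most max_level edges.
    """
    pairs = []
    for node in range(len(labels)):
        ancestor = parent[node]
        level = 1  # path length from the current ancestor down to node
        while ancestor is not None and level <= max_level:
            pairs.append((labels[ancestor], labels[node]))
            ancestor = parent[ancestor]
            level += 1
    return pairs


# Example tree: a is the root with children b and d; c is a child of b.
labels = ["a", "b", "c", "d"]
parent = [None, 0, 1, 0]

induced = two_node_candidates(labels, parent, max_level=1)
embedded = two_node_candidates(labels, parent, max_level=2)

print(induced)   # [('a', 'b'), ('b', 'c'), ('a', 'd')]
print(embedded)  # [('a', 'b'), ('b', 'c'), ('a', 'c'), ('a', 'd')]
```

    Raising the limit from 1 to 2 adds the pair (a, c), which skips the intermediate node b. This is the kind of pattern that only an embedded subtree can capture under the assumption stated above.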

    Related items

    Showing items related by title, author, creator and subject.

    • Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
      Tan, H.; Hadzic, Fedja; Dillon, T. (2012)
      The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure ...
    • IMB3-Miner: Mining induced/embedded subtrees by constraining the level of embedding
      Tan, H.; Dillon, Tharam S.; Hadzic, Fedja; Chang, Elizabeth; Feng, L. (2006)
      Tree mining has recently attracted a lot of interest in areas such as Bioinformatics, XML mining, Web mining, etc. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns ...
    • Razor: Mining distance-constrained embedded subtrees
      Tan, H.; Dillon, Tharam S.; Hadzic, Fedja; Chang, Elizabeth (2006)
      Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labelled subtrees. Previously we have developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from a ...