
    espace - Curtin’s institutional repository


    Razor: Mining distance-constrained embedded subtrees

    20227_downloaded_stream_215.pdf (381.6Kb)
    Access Status
    Open access
    Authors
    Tan, H.
    Dillon, Tharam S.
    Hadzic, Fedja
    Chang, Elizabeth
    Date
    2006
    Type
    Conference Paper
    
    Citation
    Tan, Henry and Dillon, Tharam and Hadzic, Fedja and Chang, Elizabeth. 2006. Razor: Mining distance-constrained embedded subtrees, in Tsumota, Shusaku (ed), IEEE International Conference on Data Mining Workshops, Dec 18 2006, pp. 8-13. Hong Kong: IEEE.
    Source Title
    Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
    Source Conference
    IEEE International Conference on Data Mining Workshops
    Faculty
    Curtin Business School
    School
    Centre for Extended Enterprises and Business Intelligence
    Remarks

    Copyright 2006 IEEE

    This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

    URI
    http://hdl.handle.net/20.500.11937/5006
    Collection
    • Curtin Research Publications
    Abstract

    Our work focuses on the task of mining frequent subtrees from a database of rooted, ordered, labeled subtrees. Previously we developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from such a database. Its efficiency comes from a novel Embedding List representation used for Tree Model Guided (TMG) candidate generation. As an extension, the IMB3 [13] algorithm introduces the Level of Embedding constraint. In this study we extend our past work by developing an algorithm, Razor, for mining embedded subtrees where the distance of nodes relative to the root of the subtree needs to be considered. This notion of distance-constrained embedded tree mining has important applications in web information systems, conceptual model analysis and more sophisticated ontology matching. Domains that represent their knowledge in a tree-structured form may require this additional distance information, as it commonly indicates the amount of specific knowledge stored about a particular concept within the hierarchy. Structure-based approaches to schema matching commonly take the distance among the concept nodes within a sub-structure into account when evaluating concept similarity across different schemas. We present an encoding strategy that efficiently enumerates candidate subtrees while taking the distance of nodes relative to the root of the subtree into account. The algorithm is applied to both synthetic and real-world datasets, and the experimental results demonstrate the correctness and effectiveness of the proposed technique.
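
To make the distance constraint concrete, here is a minimal Python sketch. It is not the Razor algorithm or the encoding strategy from the paper. It assumes a toy tree representation of nested `(label, children)` tuples and a hypothetical helper, `root_descendant_pairs`, that lists every two-node embedded subtree (an ancestor and one of its descendants) together with the descendant's distance from the ancestor. For simplicity it counts occurrences inside a single tree rather than support across a database.

```python
from collections import Counter

def root_descendant_pairs(tree):
    """List (ancestor_label, descendant_label, distance) for every
    ancestor-descendant pair, i.e. every 2-node embedded subtree.
    The tuple-based tree format is an assumption made for this sketch."""
    pairs = []

    def walk(node, ancestors):
        label, children = node
        # The nearest ancestor is at distance 1, the next one at 2, and so on.
        for distance, ancestor in enumerate(reversed(ancestors), start=1):
            pairs.append((ancestor, label, distance))
        for child in children:
            walk(child, ancestors + [label])

    walk(tree, [])
    return pairs

# Toy tree: a has children b and c; b has its own child c.
tree = ("a", [("b", [("c", [])]), ("c", [])])
pairs = root_descendant_pairs(tree)

# Without distance information, both (a, c) occurrences look identical.
plain = Counter((anc, desc) for anc, desc, _ in pairs)

# With distance information, they become two distinct candidates.
constrained = Counter(pairs)

print(plain[("a", "c")])
print(constrained[("a", "c", 1)], constrained[("a", "c", 2)])
```

In this toy tree the embedded pair `a`-`c` occurs twice when distance is ignored, but the two occurrences separate once the distance to the subtree root is recorded, which is the kind of distinction the distance constraint is meant to preserve.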

    Related items

    Showing items related by title, author, creator and subject.

    • Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
      Tan, H.; Hadzic, Fedja; Dillon, T. (2012)
      The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure ...
    • Mining unordered distance-constrained embedded subtrees
      Hadzic, Fedja; Tan, Henry; Dillon, Tharam S. (2008)
      Frequent subtree mining is an important problem in the area of association rule mining from semi-structured or tree structured documents, often found in many commercial, web and scientific domains. This paper presents the ...
    • Tree model guided candidate generation for mining frequent subtrees from XML
      Tan, Henry; Hadzic, Fedja; Dillon, Tharam S.; Chang, Elizabeth; Feng, Ling; Feng, L. (2008)
      Due to the inherent flexibilities in both structure and semantics, XML association rule mining faces a few challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent ...
