    espace - Curtin’s institutional repository

    X3-Miner: Mining patterns from XML database

    19376_downloaded_stream_468.pdf (467.8Kb)
    Access Status
    Open access
    Authors
    Chang, Elizabeth
    Tan, H.
    Dillon, Tharam S.
    Feng, L.
    Hadzic, Fedja
    Date
    2005
    Type
    Conference Paper
    Citation
    Chang, Elizabeth and Tan, Henry and Dillon, Tharam and Feng, Ling and Hadzic, Fedja. 2005. X3-Miner: Mining patterns from XML database, in Zanasi, A. and Brebbia, C.A. and Ebecken, N.F.F. (ed), 6th International Conference on Data Mining, Text Mining and their Business Applications, May 25 2005, pp. 287-298. Skiathos, Greece: WIT Press.
    Source Title
    Data Mining VI: Data mining, text mining and their business applications
    Source Conference
    6th International Conference on Data Mining, Text Mining and their Business Applications
    Additional URLs
    http://www.witpress.com
    http://library.witpress.com/pages/listpapers.asp?q_bid=327&q_subject=Computing%20_%20Information%20Management
    Faculty
    Curtin Business School
    School of Information Systems
    School
    Centre for Extended Enterprises and Business Intelligence
    Remarks

    Originally published by WIT Press, Southampton, UK.

    URI
    http://hdl.handle.net/20.500.11937/46300
    Collection
    • Curtin Research Publications
    Abstract

    An XML-enabled framework for representing association rules in databases was first presented in [Feng03]. In Frequent Structure Mining (FSM), techniques have been proposed to mine frequent patterns from databases of complex trees and graphs. One popular approach is graph matching; graph-matching algorithms use data structures such as the adjacency matrix [Inokuchi00] or the adjacency list [FSG01]. Another approach represents semi-structured, tree-like structures as strings, which is more space-efficient and relatively easy to manipulate [Zaki02]. In the XML era, however, mining association rules faces further challenges due to the inherent flexibility of XML in both structure and semantics. The primary challenges are 1) a more complicated hierarchical data structure with tags and attributes; 2) an ordered data context; and 3) a much larger data size. To tackle these challenges, in this paper we propose X3-Miner, an approach that efficiently extracts patterns from a large XML data set. It overcomes the challenges by:
    (1) Exploring the use of a model-validating approach to reduce the number of candidates generated. The basic idea is that by taking into account the semantics embedded in the tree-like structure of an XML database while generating candidates directly from the XML tree, we obtain only valid (i.e., possibly existing) candidates from the XML database;
    (2) Minimising I/O overhead by first trimming the infrequent 1-itemsets from the XML database. The XML database is intersected with the frequent 1-itemsets, resulting in a smaller XML tree that contains only frequent 1-itemsets. The algorithm also progressively trims k-itemsets that contain infrequent (k-1)-itemsets;
    (3) Extending the notion of a string representation of a tree structure proposed in [Zaki02] to xstring, which describes an XML document in a flat format without loss of either structure or semantics. Such an extension enables easier traversal of the tree-structured XML data during our model-validating candidate generation.
    Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data.
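Points (2) and (3) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' X3-Miner or xstring implementation. It uses only the plain string encoding associated with [Zaki02]: node labels in pre-order, with a backtrack symbol (written here as -1) appended whenever traversal returns from a child to its parent. It has no tag or attribute handling. It treats each distinct node label as a 1-itemset whose support is the number of trees containing it. The abstract does not say how X3-Miner reconnects the children of a trimmed node, so re-attaching them to the nearest retained ancestor is an assumption. The names Node, encode, decode, labels, frequent_labels and trim are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List, Set

BACKTRACK = "-1"  # backtrack symbol marking a return from a child to its parent


@dataclass
class Node:
    """An ordered, labelled tree node (hypothetical stand-in for an XML element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def encode(root: Node) -> List[str]:
    """Flatten a tree: labels in pre-order, BACKTRACK after each child's subtree."""
    out = [root.label]
    for child in root.children:
        out.extend(encode(child))
        out.append(BACKTRACK)
    return out


def decode(tokens: List[str]) -> Node:
    """Rebuild the ordered tree from its string encoding."""
    root = Node(tokens[0])
    stack = [root]  # path from the root to the node currently being filled
    for tok in tokens[1:]:
        if tok == BACKTRACK:
            stack.pop()
        else:
            child = Node(tok)
            stack[-1].children.append(child)
            stack.append(child)
    return root


def labels(node: Node) -> Iterator[str]:
    """Yield every label in the subtree rooted at node, in pre-order."""
    yield node.label
    for child in node.children:
        yield from labels(child)


def frequent_labels(db: List[Node], minsup: int) -> Set[str]:
    """Frequent 1-itemsets: labels occurring in at least minsup trees."""
    counts: Counter = Counter()
    for tree in db:
        counts.update(set(labels(tree)))  # count each label once per tree
    return {lab for lab, n in counts.items() if n >= minsup}


def trim(node: Node, keep: Set[str]) -> List[Node]:
    """Drop nodes whose label is not in keep.

    Assumption: retained descendants of a dropped node are re-attached to the
    nearest retained ancestor, so dropping the root can yield a forest.
    """
    kept_children = [t for child in node.children for t in trim(child, keep)]
    if node.label in keep:
        return [Node(node.label, kept_children)]
    return kept_children


if __name__ == "__main__":
    t1 = Node("library", [Node("book", [Node("title"), Node("author")]),
                          Node("book", [Node("title")])])
    t2 = Node("library", [Node("journal", [Node("title")])])
    print(" ".join(encode(t1)))  # library book title -1 author -1 -1 book title -1 -1
    assert decode(encode(t1)) == t1  # the encoding loses no structure
    keep = frequent_labels([t1, t2], minsup=2)  # the set {"library", "title"}
    for tree in trim(t1, keep):
        print(" ".join(encode(tree)))  # library title -1 title -1
```

Each child subtree is closed by exactly one backtrack symbol, so decode can rebuild the tree with a stack. This is the property that lets a flat string stand in for the tree during traversal. Trimming before encoding shortens the strings that later candidate-generation passes would have to scan.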

    Related items

    Showing items related by title, author, creator and subject.

    • Tree model guided candidate generation for mining frequent subtrees from XML
      Tan, Henry; Hadzic, Fedja; Dillon, Tharam S.; Chang, Elizabeth; Feng, Ling; Feng, L. (2008)
      Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces few challenges, such as: a more complicated hierarchical data structure and ordered data context. Mining frequent ...
    • Quality and interestingness of association rules derived from data mining of relational and semi-structured data
      Mohd Shaharanee, Izwan Nizal (2012)
      Deriving useful and interesting rules from a data mining system are essential and important tasks. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation ...
    • Mining Induced/Embedded Subtrees using the Level of Embedding Constraint
      Tan, H.; Hadzic, Fedja; Dillon, T. (2012)
      The increasing need for representing information through more complex structures where semantics and relationships among data objects can be more easily expressed has resulted in many semi-structured data sources. Structure ...