X3-Miner: Mining patterns from XML database

Chang, Elizabeth; Tan, H.; Dillon, Tharam S.; Feng, L.; Hadzic, Fedja

dc.contributor.author	Chang, Elizabeth
dc.contributor.author	Tan, H.
dc.contributor.author	Dillon, Tharam S.
dc.contributor.author	Feng, L.
dc.contributor.author	Hadzic, Fedja
dc.date.accessioned	2017-01-30T15:26:17Z
dc.date.available	2017-01-30T15:26:17Z
dc.date.created	2008-11-12T23:21:42Z
dc.date.issued	2005
dc.identifier.citation	Chang, Elizabeth and Tan, Henry and Dillon, Tharam and Feng, Ling and Hadzic, Fedja. 2005. X3-Miner: Mining patterns from XML database, in Zanasi, A. and Brebbia, C.A. and Ebecken, N.F.F. (ed), 6th International Conference on Data Mining, Text Mining and their Business Applications, May 25 2005, pp. 287-298. Skiathos, Greece: WIT Press.
dc.identifier.uri	http://hdl.handle.net/20.500.11937/46300
dc.description.abstract	An XML-enabled framework for representing association rules in databases was first presented in [Feng03]. In Frequent Structure Mining (FSM), techniques have been proposed to mine frequent patterns from complex tree and graph databases. One popular approach is graph matching, whose algorithms use data structures such as the adjacency matrix [Inokuchi00] or adjacency list [FSG01]. Another approach represents semi-structured, tree-like data using a string representation, which is more space-efficient and relatively easy to manipulate [Zaki02]. However, in the XML era, mining association rules faces more challenges because of the inherent flexibility of XML in both structure and semantics. The primary challenges include (1) a more complicated hierarchical data structure with tags and attributes; (2) an ordered data context; and (3) a much larger data size. To tackle these challenges, this paper proposes an approach, X3-Miner, that efficiently extracts patterns from a large XML data set and overcomes the challenges by: (1) Exploring the use of a model-validating approach to reduce the number of candidates generated. The basic idea is that by taking into account the semantics embedded in the tree-like structure of an XML database while generating candidates directly from the XML tree, we obtain only valid (i.e., possibly existing) candidates from the XML database; (2) Minimising I/O overhead by first trimming the infrequent 1-itemsets from the XML database. The XML database is intersected with the frequent 1-itemsets, resulting in a smaller XML tree that contains only frequent 1-itemsets. The algorithm also progressively trims infrequent k-itemsets that contain infrequent (k-1)-itemsets; (3) Extending the string representation of a tree structure proposed in [Zaki02] to xstring, which describes an XML document in a flat format without loss of structure or semantics. Such an extension enables easier traversal of the tree-structured XML data during our model-validating candidate generation. Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data.
dc.publisher	WIT Press
dc.relation.uri	http://www.witpress.com
dc.relation.uri	http://library.witpress.com/pages/listpapers.asp?q_bid=327&q_subject=Computing%20_%20Information%20Management
dc.subject	X3-Miner
dc.subject	Data Mining
dc.subject	Semantic Relationships
dc.subject	Association Mining
dc.subject	information systems
dc.subject	Algorithm
dc.subject	database mining
dc.subject	XML
dc.title	X3-Miner: Mining patterns from XML database
dc.type	Conference Paper
dcterms.source.startPage	287
dcterms.source.endPage	298
dcterms.source.title	Data Mining VI: Data mining, text mining and their business applications
dcterms.source.series	Data Mining VI: Data mining, text mining and their business applications
dcterms.source.conference	6th International Conference on Data Mining, Text Mining and their Business Applications
dcterms.source.conference-start-date	May 25 2005
dcterms.source.conferencelocation	Skiathos, Greece
dcterms.source.place	Southampton, Boston
curtin.note	Originally published by WIT Press, Southampton, UK.
curtin.department	Centre for Extended Enterprises and Business Intelligence
curtin.identifier	EPR-561
curtin.accessStatus	Open access
curtin.faculty	Curtin Business School
curtin.faculty	School of Information Systems
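
The abstract's second and third steps (trimming infrequent 1-itemsets, then flattening the tree into a string) can be illustrated with a minimal Python sketch. It is not the X3-Miner implementation and does not reproduce the xstring format, which this record does not specify. It assumes that an "item" is an element tag, that support is counted once per document, that attributes are ignored, that children of a trimmed node are re-attached to the nearest kept ancestor, and that the root tag is frequent. The string uses the preorder encoding with -1 as the backtrack marker from [Zaki02]. The names `frequent_tags`, `encode`, and `to_string` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def frequent_tags(docs, min_support):
    """Return the tags that occur in at least min_support documents."""
    counts = Counter()
    for doc in docs:
        # Assumption: each tag counts once per document.
        counts.update({el.tag for el in ET.fromstring(doc).iter()})
    return {tag for tag, n in counts.items() if n >= min_support}


def encode(node, keep):
    """Preorder token list; "-1" marks a backtrack to the parent.

    Assumption: a node whose tag is not in `keep` is dropped and its
    children are attached to the nearest kept ancestor.
    """
    kept = node.tag in keep
    tokens = [node.tag] if kept else []
    for child in node:
        tokens.extend(encode(child, keep))
    if kept:
        tokens.append("-1")
    return tokens


def to_string(doc, keep):
    """Flatten one XML document, keeping only frequent tags."""
    tokens = encode(ET.fromstring(doc), keep)
    # The [Zaki02] encoding has no backtrack after the root
    # (assumes the root tag is in `keep`).
    if tokens and tokens[-1] == "-1":
        tokens.pop()
    return " ".join(tokens)


docs = [
    "<order><item><name/><price/></item><note/></order>",
    "<order><item><name/></item></order>",
    "<order><item><name/><price/></item></order>",
]
keep = frequent_tags(docs, min_support=2)  # "note" occurs once and is trimmed
encoded = to_string(docs[0], keep)         # "order item name -1 price -1 -1"
```

Trimming before encoding means later candidate generation only traverses strings built from frequent tags, which is the I/O saving the abstract refers to.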

Files in this item

Name: 19376_downloaded_stream_468.pdf
Size: 467.8Kb
Format: PDF

This item appears in the following Collection(s)

Curtin Research Publications
