XML document clustering using structure-preserving flat representation of XML content and structure

Hadzic, Fedja; Hecker, Michael; Tagarelli, A.

dc.contributor.author	Hadzic, Fedja
dc.contributor.author	Hecker, Michael
dc.contributor.author	Tagarelli, A.
dc.contributor.editor	Deyi Li
dc.contributor.editor	Bing Liu
dc.contributor.editor	Charu C Aggarwal
dc.date.accessioned	2017-01-30T10:43:09Z
dc.date.available	2017-01-30T10:43:09Z
dc.date.created	2012-03-07T20:00:59Z
dc.date.issued	2011
dc.identifier.citation	Hadzic, Fedja and Hecker, Michael and Tagarelli, Andrea. 2011. XML document clustering using structure-preserving flat representation of XML content and structure, in Li, Deyi and Liu, Bing and Aggarwal, Charu C. (eds), 7th International Conference on Advanced Data Mining and Applications (ADMA 2011), Dec 17-19 2011, pp. 403-416. Beijing, China: Springer.
dc.identifier.uri	http://hdl.handle.net/20.500.11937/4997
dc.description.abstract	With the increasing use of XML in many domains, XML document clustering has become a central research topic in semistructured data management and mining. Due to the semistructured nature of XML data, the clustering problem is particularly challenging, mainly because structural similarity measures specifically designed for tree- or graph-shaped data can be quite expensive. Specialized clustering techniques are being developed to address this difficulty; however, most of them still assume that XML documents are represented using a semistructured data model. In this paper we take a simpler approach, whereby the structural aspects of XML are extracted from the documents to generate a flat data format to which well-established clustering methods can be applied directly. Hence, the expensive process of tree/graph data mining is avoided, while the structural properties are still preserved. Our experimental evaluation on a number of real-world datasets, comparing against existing structural clustering methods, has demonstrated the significance of our approach.
dc.publisher	Springer
dc.title	XML document clustering using structure-preserving flat representation of XML content and structure
dc.type	Conference Paper
dcterms.source.title	Proceedings of the 7th international conference on advanced data mining and applications (ADMA 2011)
dcterms.source.series	Proceedings of the 7th international conference on advanced data mining and applications (ADMA 2011)
dcterms.source.conference	7th International Conference on Advanced Data Mining and Applications (ADMA 2011)
dcterms.source.conference-start-date	Dec 17 2011
dcterms.source.conferencelocation	Beijing, China
dcterms.source.place	Heidelberg
curtin.department	Digital Ecosystems and Business Intelligence Institute (DEBII)
curtin.accessStatus	Fulltext not available
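The abstract describes extracting the structural aspects of XML documents into a flat data format so that well-established clustering methods can be applied directly. This record does not describe the paper's actual representation, so the Python sketch below is only a generic illustration of that idea under stated assumptions: each document becomes one row of a shared table whose columns are root-to-node tag paths (plus path-qualified words of text content), and the rows are clustered with plain k-means. The helper names `path_features`, `flatten`, `sq_dist` and `kmeans` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Count root-to-node tag paths (e.g. 'book/author') in one XML document.

    Illustrative assumption, not the paper's scheme: text content is kept as
    extra features of the form 'path:word', so both structure and content end
    up as flat columns.
    """
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        counts[path] += 1
        if node.text and node.text.strip():
            for word in node.text.lower().split():
                counts[f"{path}:{word}"] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def flatten(docs):
    """Map every document to one row of a shared table (one column per feature)."""
    feats = [path_features(d) for d in docs]
    columns = sorted(set().union(*feats))
    rows = [[f.get(c, 0) for c in columns] for f in feats]
    return columns, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        # Next seed: the row farthest from all seeds chosen so far.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><author/></book>",
    "<movie><title/><director/></movie>",
    "<movie><title/><director/><actor/></movie>",
]
columns, rows = flatten(docs)
print(kmeans(rows, 2))  # → [0, 0, 1, 1]
```

Because each row is an ordinary numeric vector, any off-the-shelf clustering algorithm could replace the k-means step. Plain path counts do, however, discard sibling order and the position of repeated subtrees, so this sketch preserves less structure than a representation designed specifically for that purpose.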

Files in this item

Name:: 173281_41559_PUB-CBS-EEB-MC-63 ...
Size:: 331.0Kb
Format:: PDF

This item appears in the following Collection(s)

Curtin Research Publications
