    espace - Curtin’s institutional repository


    XML document clustering using structure-preserving flat representation of XML content and structure

    Access Status
    Fulltext not available
    Authors
    Hadzic, Fedja
    Hecker, Michael
    Tagarelli, A.
    Date
    2011
    Type
    Conference Paper
    
    Citation
    Hadzic, Fedja and Hecker, Michael and Tagarelli, Andrea. 2011. XML document clustering using structure-preserving flat representation of XML content and structure, in Li, Deyi and Liu, Bing and Aggarwal, Charu C. (eds), 7th International Conference on Advanced Data Mining and Applications (ADMA 2011), Dec 17-19 2011, pp. 403-416. Beijing, China: Springer.
    Source Title
    Proceedings of the 7th international conference on advanced data mining and applications (ADMA 2011) 
    Source Conference
    7th International Conference on Advanced Data Mining and Applications (ADMA 2011) 
    School
    Digital Ecosystems and Business Intelligence Institute (DEBII)
    URI
    http://hdl.handle.net/20.500.11937/4997
    Collection
    • Curtin Research Publications
    Abstract

    With the increasing use of XML in many domains, XML document clustering has been a central research topic in semistructured data management and mining. Due to the semistructured nature of XML data, the clustering problem becomes particularly challenging, mainly because structural similarity measures specifically designed to deal with tree/graph-shaped data can be quite expensive. Specialized clustering techniques are being developed to account for this difficulty; however, most of them still assume that XML documents are represented using a semistructured data model. In this paper we take a simpler approach whereby XML structural aspects are extracted from the documents to generate a flat data format to which well-established clustering methods can be directly applied. Hence, the expensive process of tree/graph data mining is avoided, while the structural properties are still preserved. Our experimental evaluation, using a number of real-world datasets and comparing with existing structural clustering methods, has demonstrated the significance of our approach.
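
    The general idea of turning XML structure into a flat format can be illustrated with a minimal sketch. This is not the representation used in the paper, whose details are not given in the abstract. It assumes, purely for illustration, that each document is described by the set of its root-to-node tag paths, that the flat format is a binary document-by-path table, and that documents are compared with Jaccard similarity. The names `tag_paths`, `flat_table`, `jaccard` and `DOCS` are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def flat_table(docs):
    """Build a binary table: one row per document, one column per tag path."""
    path_sets = [tag_paths(d) for d in docs]
    columns = sorted(set().union(*path_sets))
    rows = [[1 if c in ps else 0 for c in columns] for ps in path_sets]
    return columns, rows


def jaccard(a, b):
    """Jaccard similarity between two binary rows of the flat table."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


# Toy documents (assumed for illustration only, not from the paper's datasets).
DOCS = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2011</year></book>",
    "<article><title>C</title><journal>J</journal></article>",
]

if __name__ == "__main__":
    columns, rows = flat_table(DOCS)
    print(columns)
    for row in rows:
        print(row)
    print(jaccard(rows[0], rows[1]), jaccard(rows[0], rows[2]))
```

    Once the documents are rows in such a table, any standard vector-based clustering method can be run on them without tree or graph matching, which is the trade-off the abstract describes.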

    Related items

    Showing items related by title, author, creator and subject.

    • Improving the relevance of web search results by combining web snippet categorization, clustering and personalization
      Zhu, Dengya (2010)
      Web search results are far from perfect due to the polysemous and synonymous characteristics of nature languages, information overload as the results of information explosion on the Web, and the flat list, “one size fits ...
    • Improving the relevance of search results via search-term disambiguation and ontological filtering
      Zhu, Dengya (2007)
      With the exponential growth of the Web and the inherent polysemy and synonymy problems of the natural languages, search engines are facing many challenges such as information overload, mismatch of search results, missing ...
    • Evidence-based evaluation of programme interventions to achieve positive community integration outcomes for adults with acquired brain injury
      Parvaneh, Shahriar (2010)
      Background. The growing population of people with acquired brain injury (ABI) requires a strong focus on clients to be integrated into the community in order to use their productive skills in society, to help them live ...
