Sparse subspace representation for spectral document clustering
We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Notably, the performance improvement over other methods is prominent on these datasets.
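The pipeline described in the abstract (sparse self-representation, affinity graph, spectral clustering) can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not spell out. The sketch assumes a lasso-style objective, min over C of 0.5·||X − XC||² + λ·||C||₁ with a zero diagonal so that no document represents itself, solved approximately by proximal gradient (ISTA). The names `sparse_codes` and `spectral_labels`, the value of `lam`, the symmetrization |C| + |C|ᵀ, and the toy term-document matrix are all illustrative assumptions.

```python
import numpy as np

def sparse_codes(X, lam=0.01, n_iter=500):
    """Approximate min_C 0.5*||X - X C||_F^2 + lam*||C||_1 with diag(C) = 0.

    X holds one unit-norm document vector per column (assumed layout).
    Solved with proximal gradient steps (ISTA); not the paper's solver.
    """
    n = X.shape[1]
    G = X.T @ X                           # Gram matrix of the documents
    step = 1.0 / np.linalg.norm(G, 2)     # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        C = C - step * (G @ C - G)        # gradient step on the quadratic term
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft threshold
        np.fill_diagonal(C, 0.0)          # a document may not represent itself
    return C

def spectral_labels(W, k, n_iter=50):
    """Normalized spectral clustering on a symmetric affinity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)           # eigenvalues come back in ascending order
    U = vecs[:, :k]                       # embedding from the k smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Simple k-means with deterministic farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Toy term-document matrix (hypothetical data): 6 terms, 10 documents.
# Documents 0-4 use only terms 0-2, documents 5-9 use only terms 3-5.
rng = np.random.default_rng(0)
A = np.vstack([rng.random((3, 5)), np.zeros((3, 5))])
B = np.vstack([np.zeros((3, 5)), rng.random((3, 5))])
X = np.hstack([A, B])
X /= np.linalg.norm(X, axis=0)            # unit-normalize each document

C = sparse_codes(X)
W = np.abs(C) + np.abs(C).T               # symmetric affinity graph
labels = spectral_labels(W, k=2)
print(labels)
```

Because the two toy topics share no vocabulary, each document can only be reconstructed from documents in its own group, so the affinity matrix has no edges between the groups. Real corpora overlap in vocabulary, and the choice of `lam` then controls how aggressively weak cross-topic links are pruned.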
Showing items related by title, author, creator and subject.
Improving the relevance of web search results by combining web snippet categorization, clustering and personalization
Zhu, Dengya (2010) Web search results are far from perfect due to the polysemous and synonymous characteristics of natural languages, information overload as a result of the information explosion on the Web, and the flat list, “one size fits ...
Budhaditya, S.; Phung, D.; Pham, DucSon; Venkatesh, S. (2012) We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each ...
Hadzic, Fedja; Hecker, Michael; Tagerelli, A. (2011) With the increasing use of XML in many domains, XML document clustering has been a central research topic in semistructured data management and mining. Due to the semistructured nature of XML data, the clustering problem ...