A random finite set model for data clustering
The goal of data clustering is to partition data points into groups so as to optimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, that is, a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is often not known in advance. This paper proposes a new class of models for data clustering that handles set-valued data as well as an unknown number of clusters, using a Dirichlet process mixture of Poisson random finite sets. We also develop an efficient Markov chain Monte Carlo posterior inference technique that learns the number of clusters and the mixture parameters automatically from the data. Numerical studies demonstrate the salient features of the new model, in particular its capacity to discover extremely unbalanced clusters in the data.
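The following is a minimal Python sketch of the kind of model the abstract describes. It generates set-valued data from a Dirichlet process mixture of Poisson random finite sets (RFSs) using the Chinese restaurant process, and it evaluates the log-density of one set under a single Poisson RFS component. It assumes one-dimensional set elements with Gaussian feature densities. The Gamma/Normal base measure, the fixed standard deviation, and the helper names (`sample_poisson`, `poisson_rfs_loglik`, `sample_dp_poisson_rfs`) are hypothetical choices for illustration, not the paper's specification. The sketch covers only the generative model and the component likelihood, not the paper's MCMC inference technique.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def poisson_rfs_loglik(points, rate, mean, std):
    """Log-density of a finite set of 1-D points under one Poisson RFS component.

    Cardinality ~ Poisson(rate); elements i.i.d. Normal(mean, std) (hypothetical
    Gaussian feature density). Under the set-integral convention,
    f(X) = exp(-rate) * prod_{x in X} rate * p(x): the n! in the Poisson pmf
    cancels against the n! orderings of the n elements of X.
    """
    log_density = -rate + len(points) * math.log(rate)
    for x in points:
        log_density += -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)
    return log_density


def sample_dp_poisson_rfs(num_sets, alpha, rng):
    """Generate sets from a DP mixture of Poisson RFSs via the Chinese restaurant process.

    Hypothetical base measure: rate ~ Gamma(shape=2, scale=3),
    mean ~ Normal(0, 10), std fixed at 1.
    """
    clusters, labels, data = [], [], []
    for _ in range(num_sets):
        # Existing cluster k is chosen with weight equal to its size; a new cluster with weight alpha.
        weights = [c["count"] for c in clusters] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(clusters):
            # Open a new cluster and draw its parameters from the base measure.
            clusters.append({"rate": rng.gammavariate(2.0, 3.0),
                             "mean": rng.gauss(0.0, 10.0), "std": 1.0, "count": 0})
        c = clusters[k]
        c["count"] += 1
        # A Poisson RFS draw: first the cardinality, then i.i.d. elements.
        n = sample_poisson(c["rate"], rng)
        data.append([rng.gauss(c["mean"], c["std"]) for _ in range(n)])
        labels.append(k)
    return data, labels, clusters


if __name__ == "__main__":
    rng = random.Random(0)
    data, labels, clusters = sample_dp_poisson_rfs(20, alpha=1.0, rng=rng)
    first = clusters[labels[0]]
    print("clusters:", len(clusters))
    print("log f(X_0):", poisson_rfs_loglik(data[0], first["rate"], first["mean"], first["std"]))
```

Because the number of clusters is not fixed in advance, it grows with the data under the Chinese restaurant process, and the concentration parameter `alpha` controls how readily new clusters appear. In a Gibbs-style sampler over cluster labels, a set X would typically be reassigned to an existing cluster k with probability proportional to that cluster's size times exp(`poisson_rfs_loglik`(X, ...)); how the paper treats new-cluster proposals and parameter updates is not shown here.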
Showing items related by title, author, creator and subject.
Kent, Peter; Kongsted, A. (2012). Background: Recently, there has been interest in using the short message service (SMS, or text messaging) to gather frequent information on the clinical course of individual patients. One possible role for identifying ...
Li, Q.; Liu, Wan-Quan; Li, Ling (2018). Subspace clustering refers to the problem of finding low-dimensional subspaces (clusters) for high-dimensional data. Current state-of-the-art subspace clustering methods are usually based on spectral ...
Planck intermediate results. II. Comparison of Sunyaev-Zeldovich measurements from Planck and from the Arcminute Microkelvin Imager for 11 galaxy clusters. Hurley-Walker, N. (2013). A comparison is presented of Sunyaev-Zeldovich measurements for 11 galaxy clusters as obtained by Planck and by the ground-based interferometer, the Arcminute Microkelvin Imager. Assuming a universal spherically-symmetric ...