
    espace - Curtin’s institutional repository


    Unsupervised modeling of multiple data sources: a latent shared subspace approach

    170275_Gupta2011.pdf (1.187Mb)
    Access Status
    Open access
    Authors
    Gupta, Sunil Kumar
    Date
    2011
    Supervisor
    Prof. Svetha Venkatesh
    Type
    Thesis
    Award
    PhD
    
    School
    Department of Computing
    URI
    http://hdl.handle.net/20.500.11937/2583
    Collection
    • Curtin Theses
    Abstract

    The growing number of information sources has given rise to joint analysis. While the research community has mainly focused on analyzing data from a single source, there have been relatively few attempts at jointly analyzing multiple data sources to exploit their statistical sharing strengths. In general, the data from these sources emerge without labeling information, and thus it is imperative to perform the joint analysis in an unsupervised manner.

    This thesis addresses the above problem and presents a general shared subspace learning framework for jointly modeling multiple related data sources. Since the data sources are related, there exist common structures across these sources, which can be captured through a shared subspace. However, each source also has some individual structure, which can be captured through an individual subspace. Incorporating these concepts into nonnegative matrix factorization (NMF) based subspace learning, we develop a nonnegative shared subspace learning model for two data sources and demonstrate its application to tag-based social media retrieval. Extending this model, we impose additional regularization constraints of mutual orthogonality on the shared and individual subspaces and show that, compared to its unregularized counterpart, the new regularized model effectively deals with the problem of negative knowledge transfer, a key issue faced by transfer learning methods. The effectiveness of the regularized model is demonstrated through retrieval and clustering applications on a variety of data sets. To take advantage of more than one auxiliary source, we extend the above models from two sources to multiple sources, with the added flexibility of allowing sources to have arbitrary sharing configurations. The usefulness of this model is demonstrated through the improved performance achieved with multiple auxiliary sources. In addition, this model is used to relate items from disparate media types, allowing us to perform cross-media retrieval using tags.

    Departing from the nonnegative models, we use a linear-Gaussian framework and develop Bayesian shared subspace learning, which not only models mixed-sign data but also learns probabilistic subspaces. Learning the subspace dimensionalities for the shared subspace models plays an important role in optimum knowledge transfer but requires model selection, a task that is computationally intensive and time consuming. To this end, we propose a nonparametric Bayesian joint factor analysis model that circumvents the problem of model selection by using a hierarchical beta process prior, inferring subspace dimensionalities automatically from the data. The effectiveness of this model is shown on both synthetic and real data sets. For the synthetic data set, successful recovery of both shared and individual subspace dimensionalities is demonstrated, whilst for the real data sets the model outperforms recent state-of-the-art techniques for text modeling and image retrieval.
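
    To make the shared/individual subspace idea concrete, the sketch below illustrates a two-source nonnegative shared subspace factorization of the kind the abstract describes: X1 ~ [W U1] H1 and X2 ~ [W U2] H2, where W is a basis shared by both sources and U1, U2 are source-specific bases. This is not the thesis' actual algorithm; it assumes the standard NMF multiplicative updates for a squared Frobenius loss, and the function name shared_subspace_nmf and parameters k_shared, k1, k2 are hypothetical. The regularized (mutually orthogonal), multi-source, and Bayesian variants discussed above are not shown.

    # Minimal illustrative sketch of two-source shared subspace NMF
    # (assumed Frobenius-loss multiplicative updates, not the thesis' method).
    import numpy as np

    def shared_subspace_nmf(X1, X2, k_shared, k1, k2, n_iter=200, eps=1e-9):
        """Factorize X1 ~ [W U1] H1 and X2 ~ [W U2] H2 with all factors nonnegative.

        W      : basis shared by both sources (common structure)
        U1, U2 : bases private to each source (individual structure)
        """
        rng = np.random.default_rng(0)
        m = X1.shape[0]                     # both sources share the same feature space
        W  = rng.random((m, k_shared))
        U1 = rng.random((m, k1))
        U2 = rng.random((m, k2))
        H1 = rng.random((k_shared + k1, X1.shape[1]))
        H2 = rng.random((k_shared + k2, X2.shape[1]))

        for _ in range(n_iter):
            A1 = np.hstack([W, U1])         # combined basis for source 1
            A2 = np.hstack([W, U2])         # combined basis for source 2

            # Update the encodings with the standard NMF multiplicative rule.
            H1 *= (A1.T @ X1) / (A1.T @ A1 @ H1 + eps)
            H2 *= (A2.T @ X2) / (A2.T @ A2 @ H2 + eps)

            # Split encoding rows into the parts that multiply W vs. U1/U2.
            H1s, H1i = H1[:k_shared], H1[k_shared:]
            H2s, H2i = H2[:k_shared], H2[k_shared:]

            # The shared basis W receives gradient contributions from both sources.
            W *= (X1 @ H1s.T + X2 @ H2s.T) / (A1 @ H1 @ H1s.T + A2 @ H2 @ H2s.T + eps)
            # Each individual basis only sees its own source.
            U1 *= (X1 @ H1i.T) / (A1 @ H1 @ H1i.T + eps)
            U2 *= (X2 @ H2i.T) / (A2 @ H2 @ H2i.T + eps)

        return W, U1, U2, H1, H2

    In a retrieval setting such as the tag-based one described above, the shared columns of H1 and H2 would give comparable low-dimensional representations of items from the two sources, while the individual factors absorb source-specific structure so it does not contaminate the transfer.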

    Related items

    Showing items related by title, author, creator and subject.

    • Regularised nonnegative shared subspace learning
      Gupta, Sunil; Phung, Dinh; Adams, Brett; Venkatesh, Svetha (2011)
      Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering, information retrieval etc. However, diversity among various data sources might ...
    • A bayesian framework for learning shared and individual subspaces from multiple data sources
      Gupta, Sunil; Phung, Dinh; Adams, Brett; Venkatesh, Svetha (2011)
      This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse ...
    • Nonnegative shared subspace learning and its application to social media retrieval
      Gupta, Sunil; Phung, Dinh; Adams, Brett; Tran, Truyen; Venkatesh, Svetha (2010)
      Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social ...
