
dc.contributor.author: Nguyen, T.
dc.contributor.author: Tran, The Truyen
dc.contributor.author: Phung, D.
dc.contributor.author: Venkatesh, S.
dc.date.accessioned: 2017-01-30T15:23:59Z
dc.date.available: 2017-01-30T15:23:59Z
dc.date.created: 2016-03-17T19:30:18Z
dc.date.issued: 2016
dc.identifier.citation: Nguyen, T. and Tran, T.T. and Phung, D. and Venkatesh, S. 2016. Graph-induced restricted Boltzmann machines for document modeling. Information Sciences. 328: pp. 60-75.
dc.identifier.uri: http://hdl.handle.net/20.500.11937/45888
dc.identifier.doi: 10.1016/j.ins.2015.08.023
dc.description.abstract:

© 2015 Elsevier Inc. All rights reserved. Discovering knowledge from unstructured text is a central theme in data mining and machine learning. We focus on the fast discovery of thematic structures in a corpus. Our approach is based on a versatile probabilistic formulation, the restricted Boltzmann machine (RBM), whose underlying graphical model is an undirected bipartite graph. Inference is efficient: a document representation can be computed with a single matrix projection, which makes RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate under the bag-of-words assumption and ignore the inherent relational structure among words, which results in less coherent thematic groupings of words. We introduce graph-based regularization schemes that exploit linguistic structures, which can in turn be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means of estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.

dc.publisher: Elsevier Inc
dc.title: Graph-induced restricted Boltzmann machines for document modeling
dc.type: Journal Article
dcterms.source.volume: 328
dcterms.source.startPage: 60
dcterms.source.endPage: 75
dcterms.source.issn: 0020-0255
dcterms.source.title: Information Sciences
curtin.department: Multi-Sensor Proc & Content Analysis Institute
curtin.accessStatus: Fulltext not available
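
The abstract mentions two ideas that a short sketch may help make concrete: computing a document representation with a single matrix projection, and regularizing an RBM with a word graph. The Python below is only an illustration under stated assumptions, not the authors' implementation. It assumes a plain RBM with sigmoid hidden units, and it uses a generic graph-Laplacian smoothness penalty as a stand-in for the paper's regularization schemes, whose exact form is not given in this record. All names (`hidden_representation`, `graph_penalty`, `V`, `W`, `b`, `A`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def hidden_representation(V, W, b):
    """Map documents to hidden-unit activation probabilities.

    V: (n_docs, n_words) bag-of-words matrix.
    W: (n_words, n_hidden) RBM weights (assumed already trained).
    b: (n_hidden,) hidden biases.
    The whole computation is one matrix product plus a nonlinearity.
    """
    return sigmoid(V @ W + b)


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric word graph A."""
    return np.diag(A.sum(axis=1)) - A


def graph_penalty(W, A):
    """Assumed smoothness penalty tr(W^T L W).

    Equals 0.5 * sum_ij A[i, j] * ||W[i] - W[j]||^2, so words linked in
    the graph are pushed toward similar weight vectors.
    """
    L = graph_laplacian(A)
    return float(np.trace(W.T @ L @ W))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = rng.integers(0, 3, size=(4, 5)).astype(float)  # toy word counts
    W = rng.normal(scale=0.1, size=(5, 2))             # toy weights
    b = np.zeros(2)
    A = np.zeros((5, 5))
    A[0, 1] = A[1, 0] = 1.0                            # words 0 and 1 are linked
    print(hidden_representation(V, W, b).shape)
    print(graph_penalty(W, A))
```

In a training loop, a penalty of this kind would typically be added to the RBM objective with a tunable weight. The graph `A` could be built from corpus statistics or from domain knowledge, as the abstract suggests.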


