Probabilistic models for mining imbalanced relational data

Ghanem, Amal Saleh

dc.contributor.author	Ghanem, Amal Saleh
dc.contributor.supervisor	Prof. Svetha Venkatesh
dc.contributor.supervisor	Prof. Geoff West
dc.date.accessioned	2017-01-30T10:19:38Z
dc.date.available	2017-01-30T10:19:38Z
dc.date.created	2010-05-27T02:00:29Z
dc.date.issued	2009
dc.identifier.uri	http://hdl.handle.net/20.500.11937/2266
dc.description.abstract	Most data mining and pattern recognition techniques are designed for learning from flat data files with the assumption of equal populations per class. However, most real-world data are stored as rich relational databases that generally have imbalanced class distributions. For such domains, a rich relational technique is required to accurately model the different objects and relationships in the domain, which cannot be easily represented as a set of simple attributes, and at the same time handle the imbalanced class problem. Motivated by the significance of mining imbalanced relational databases that represent the majority of real-world data, learning techniques for mining imbalanced relational domains are investigated. In this thesis, the employment of probabilistic models in mining relational databases is explored, in particular Probabilistic Relational Models (PRMs), which were proposed as an extension of attribute-based Bayesian Networks. The effectiveness of PRMs in mining real-world databases was explored by learning PRMs from a real-world university relational database. A visual data mining tool is also proposed to aid the interpretation of the learned PRM models. Despite the effectiveness of PRMs in relational learning, the performance of PRMs as predictive models is significantly hindered by the imbalanced class problem. This is because PRMs share the assumption, common to other learning techniques, of relatively balanced class distributions in the training data. Therefore, this thesis proposes a number of models that build on the effectiveness of PRMs in relational learning and extend it to mining imbalanced relational domains. The first model introduced in this thesis examines the problem of mining imbalanced relational domains for a single two-class attribute. The model is proposed by enriching PRM learning with the ensemble learning technique. The premise behind this model is that an ensemble of models would attain better performance than a single model, as an instance misclassified by one of the models can often be correctly classified by the others. Based on this approach, another model is introduced to address the problem of mining multiple imbalanced attributes, in which it is important to predict several attributes rather than a single one. In this model, the ensemble bagging sampling approach is exploited to attain a single model for mining several attributes. Finally, the thesis outlines the problem of imbalanced multi-class classification and introduces a generalized framework to handle this problem for both relational and non-relational domains.
dc.language	en
dc.publisher	Curtin University
dc.subject	relational learning
dc.subject	probabilistic relational models (PRMs)
dc.subject	mining imbalanced relational domains
dc.subject	imbalanced class distribution
dc.subject	attribute-based Bayesian Networks
dc.title	Probabilistic models for mining imbalanced relational data
dc.type	Thesis
dcterms.educationLevel	PhD
curtin.department	Department of Computing
curtin.accessStatus	Open access

Files in this item

Name: 138115_Ghanem2009.pdf
Size: 2.732Mb
Format: PDF

This item appears in the following Collection(s)

Curtin Theses
