Learning in imbalanced relational data
Access Status
Fulltext not available
Authors
Ghanem, Amal
Venkatesh, Svetha
West, Geoff
Date
2008
Type
Conference Paper
Citation
Ghanem, A. and Venkatesh, S. and West, G. 2008. Learning in imbalanced relational data, in Ejiri, M. and Kasturi, R. and Sanniti di Baja, G. (ed), 19th International Conference on Pattern Recognition, Dec 8-11 2008. Tampa, Florida: IAPR.
Source Title
Proceedings of the 19th International Conference on Pattern Recognition
Source Conference
19th International Conference on Pattern Recognition
ISBN
School
Department of Computing
Collection
Abstract
Traditional learning techniques learn from flat data files and assume that each class has a similar number of examples. However, most real-world data are stored in relational systems with an imbalanced class distribution, where one class is over-represented compared with the others. We propose extending a relational learning technique, Probabilistic Relational Models (PRMs), to handle the class imbalance problem. We address learning from imbalanced relational data with an ensemble of PRMs and propose a new model, PRMs-IM. We evaluate PRMs-IM on a real university relational database, where the task is to identify students at risk.
Related items
Showing items related by title, author, creator and subject.
-
Ghanem, Amal Saleh (2009) Most data mining and pattern recognition techniques are designed for learning from flat data files with the assumption of equal populations per class. However, most real-world data are stored as rich relational databases ...
-
Ghanem, Amal; Venkatesh, Svetha; West, Geoffrey (2010) The majority of multi-class pattern classification techniques are proposed for learning from balanced datasets. However, in several real-world domains, the datasets have imbalanced data distribution, where some classes ...
-
Ghanem, Amal; Venkatesh, Svetha; West, Geoffrey (2009) Real-world data are often stored as relational database systems with different numbers of significant attributes. Unfortunately, most classification techniques are proposed for learning from balanced nonrelational data ...