Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification

Nurunnabi, Abdul; West, Geoff

doi:10.1109/ICDMW.2012.107

dc.contributor.author	Nurunnabi, Abdul
dc.contributor.author	West, Geoff
dc.contributor.editor	Jilles Vreeken
dc.contributor.editor	Charles Ling
dc.contributor.editor	Mohammed J. Zaki
dc.contributor.editor	Arno Siebes
dc.contributor.editor	Jeffrey Xu Yu
dc.contributor.editor	Bart Goethals
dc.contributor.editor	Geoff Webb
dc.contributor.editor	Xindong Wu
dc.date.accessioned	2017-01-30T11:37:16Z
dc.date.available	2017-01-30T11:37:16Z
dc.date.created	2013-03-26T20:00:53Z
dc.date.issued	2012
dc.identifier.citation	Nurunnabi, Abdul and West, Geoff. 2012. Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification, in The 12th IEEE International Conference on Data Mining (ICDMW), Dec 10 2012, pp. 643-652. Brussels, Belgium: IEEE.
dc.identifier.uri	http://hdl.handle.net/20.500.11937/13467
dc.identifier.doi	10.1109/ICDMW.2012.107
dc.description.abstract	Logistic regression is well known to the data mining research community as a tool for modeling and classification. The presence of outliers is an unavoidable phenomenon in data analysis. Detection of outliers is important to increase the accuracy of the required estimates and for reliable knowledge discovery from the underlying databases. Most of the existing outlier detection methods in regression analysis are based on the single case deletion approach, which is inefficient in the presence of multiple outliers because of the well-known masking and swamping effects. To avoid these effects, the multiple case deletion approach has been introduced. We propose a diagnostic measure based on a group deletion approach for identifying multiple influential observations in logistic regression. We also introduce a plotting technique that can classify data into outliers, high leverage points, and influential and regular observations. This paper has two objectives. First, it investigates the problems of outlier detection in logistic regression, proposes a new method that can find multiple influential observations, and classifies the types of outlier. Second, it shows the necessity of properly identifying outliers and influential observations as a prelude to reliable knowledge discovery from modeling and classification via logistic regression. We demonstrate the efficiency of our method, compare its performance with existing popular diagnostic methods, and explore the necessity of outlier detection for reliability and robustness in modeling and classification using real datasets.
dc.publisher	Conference Publishing Services
dc.subject	data mining
dc.subject	influential observation
dc.subject	pattern recognition
dc.subject	knowledge discovery
dc.subject	reliability
dc.subject	regression
dc.subject	high leverage point
dc.subject	statistical computing
dc.subject	outlier
dc.title	Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification
dc.type	Conference Paper
dcterms.source.startPage	643
dcterms.source.endPage	652
dcterms.source.title	Proceedings of 2012 IEEE 12th International Conference on Data Mining
dcterms.source.series	Proceedings of 2012 IEEE 12th International Conference on Data Mining
dcterms.source.isbn	9781467351645
dcterms.source.conference	The 12th IEEE International Conference on Data Mining
dcterms.source.conference-start-date	Dec 10 2012
dcterms.source.conferencelocation	Brussels, Belgium
dcterms.source.place	USA
curtin.note	Copyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
curtin.department
curtin.accessStatus	Open access
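
Illustrative note on the abstract: the single case deletion diagnostics that the abstract contrasts with group deletion can be sketched as follows. This is a minimal sketch and not the authors' proposed method. It fits a one-predictor logistic model by Newton-Raphson and computes, for each case, the leverage, the Pearson residual, and a Cook-type distance in its usual one-step form for generalized linear models. The toy data, the function names, and the choice of a single predictor are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector X'(y - p).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX = [[a, b], [b, c]].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def diagnostics(x, y):
    """Return (leverage, Pearson residual, Cook-type distance) per case."""
    b0, b1 = fit_logistic(x, y)
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = a * c - b * b
    k = 2  # number of fitted parameters (intercept and slope)
    rows = []
    for xi, yi, pi, wi in zip(x, y, p, w):
        # Leverage: h_i = w_i * x_i' (X'WX)^{-1} x_i with x_i = (1, xi).
        h = wi * (c - 2.0 * b * xi + a * xi * xi) / det
        r = (yi - pi) / math.sqrt(wi)  # Pearson residual
        cook = r * r * h / (k * (1.0 - h) ** 2)  # one-step Cook-type distance
        rows.append((h, r, cook))
    return rows


# Hypothetical toy data: the response mostly rises with x, except the last case.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]
diag = diagnostics(x, y)
for i, (h, r, cook) in enumerate(diag):
    print(f"case {i:2d}  leverage={h:.3f}  residual={r:+.3f}  cook={cook:.3f}")
```

Each measure above is computed from one fit to all cases, so several outliers that sit close together can distort that fit and hide one another. This is the masking effect the abstract refers to, and it is the motivation given there for deleting cases in groups.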

Files in this item

Name: 190958_190958.pdf
Size: 1.064Mb
Format: PDF

This item appears in the following Collection(s)

Curtin Research Publications
