
    espace - Curtin’s institutional repository


    Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification

    190958_190958.pdf (1.064Mb)
    Access Status
    Open access
    Authors
    Nurunnabi, Abdul
    West, Geoff
    Date
    2012
    Type
    Conference Paper
    
    Citation
    Nurunnabi, Abdul and West, Geoff. 2012. Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification, in The 12th IEEE International Conference on Data Mining (ICDMW), Dec 10 2012, pp. 643-652. Brussels, Belgium: IEEE.
    Source Title
    Proceedings of 2012 IEEE 12th International Conference on Data Mining
    Source Conference
    The 12th IEEE International Conference on Data Mining
    DOI
    10.1109/ICDMW.2012.107
    ISBN
    9781467351645
    Remarks

    Copyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    URI
    http://hdl.handle.net/20.500.11937/13467
    Collection
    • Curtin Research Publications
    Abstract

    Logistic regression is well known to the data mining research community as a tool for modeling and classification. The presence of outliers is an unavoidable phenomenon in data analysis. Detecting outliers is important for increasing the accuracy of the required estimates and for reliable knowledge discovery from the underlying databases. Most existing outlier detection methods in regression analysis are based on the single case deletion approach, which is inefficient in the presence of multiple outliers because of the well-known masking and swamping effects. The multiple case deletion approach was introduced to avoid these effects. We propose a diagnostic measure, based on a group deletion approach, for identifying multiple influential observations in logistic regression. We also introduce a plotting technique that can classify data into outliers, high leverage points, influential observations, and regular observations. This paper has two objectives. First, it investigates the problems of outlier detection in logistic regression, proposes a new method that can find multiple influential observations, and classifies the types of outlier. Second, it shows the necessity of properly identifying outliers and influential observations as a prelude to reliable knowledge discovery from modeling and classification via logistic regression. We demonstrate the efficiency of our method, compare its performance with existing popular diagnostic methods, and explore the necessity of outlier detection for reliability and robustness in modeling and classification using real datasets.
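
    The contrast the abstract draws between single case deletion and group deletion can be illustrated with a minimal sketch. This is not the diagnostic measure proposed in the paper, which this record does not reproduce. It is a generic coefficient-shift comparison on synthetic data, and the function names (fit_logistic, group_deletion_shift), the data-generating settings, and the planted outlier cluster are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit logistic regression by Newton-Raphson (IRLS). X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X' W X) step = X' (y - p); a tiny ridge term keeps the solve stable.
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def group_deletion_shift(X, y, group):
    """Norm of the coefficient change when the cases in `group` are deleted together."""
    keep = np.ones(len(y), dtype=bool)
    keep[list(group)] = False
    return np.linalg.norm(fit_logistic(X, y) - fit_logistic(X[keep], y[keep]))

# Synthetic data (hypothetical): 200 regular cases with a positive slope,
# plus a cluster of three identical outliers at x = 6 labelled 0.
rng = np.random.default_rng(0)
x_clean = rng.normal(size=200)
y_clean = (rng.random(200) < 1.0 / (1.0 + np.exp(-x_clean))).astype(float)
x_all = np.concatenate([x_clean, [6.0, 6.0, 6.0]])
y_all = np.concatenate([y_clean, [0.0, 0.0, 0.0]])
X_all = np.column_stack([np.ones_like(x_all), x_all])

outliers = [200, 201, 202]
beta_full = fit_logistic(X_all, y_all)
single_shift = group_deletion_shift(X_all, y_all, outliers[:1])  # delete one outlier
group_shift = group_deletion_shift(X_all, y_all, outliers)       # delete the whole cluster

print("shift after deleting one outlier:   ", single_shift)
print("shift after deleting all three:     ", group_shift)
```

    Deleting a single member of the cluster leaves the other two in place to keep pulling the fit, so the single-case shift understates the cluster's influence; deleting the group together exposes it. A practical group deletion diagnostic would also need a rule for choosing the suspect group, which this sketch sidesteps by using the known planted indices.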

    Related items

    Showing items related by title, author, creator and subject.

    • Some applications of local influence diagnostics.
      Yick, John S. (2000)
      The influence of observations on the outcome of an analysis is of importance in statistical data analysis. A practical and well-established approach to influence analysis is case deletion. However, it has its drawbacks ...
    • Diagnostic-robust statistical analysis for Local Surface Fitting in 3D Point Cloud Data
      Nurunnabi, Abdul; Belton, David; West, Geoff (2012)
      Objectives: Surface reconstruction and fitting for geometric primitives and three-dimensional (3D) modeling is a fundamental task in the field of photogrammetry and reverse engineering. However, it is impractical to get ...
    • Statistical analysis of genomic data : a new model for class prediction and inference
      Jiang, Zhenyu (2011)
      Genomics is a major scientific revolution in this century. High-throughput genomic data provides an opportunity for identifying genes and SNPs (single-nucleotide polymorphisms) that are related to various clinical phenotypes. ...
