    espace - Curtin’s institutional repository

    Explaining anomalies in coal proximity and coal processing data with Shapley and tree-based models

    Explaining anomalies in coal proximity and coal processing data with Shapley and tree-based models Preprint.pdf (1.587Mb)
    Access Status
    Open access
    Authors
    Liu, Xiu
    Aldrich, Chris
    Date
    2022
    Type
    Journal Article
    
    Citation
    Liu, X. and Aldrich, C. 2022. Explaining anomalies in coal proximity and coal processing data with Shapley and tree-based models. FUEL. 335: 126891.
    Source Title
    FUEL
    DOI
    10.1016/j.fuel.2022.126891
    ISSN
    0016-2361
    Faculty
    Faculty of Science and Engineering
    School
    WASM: Minerals, Energy and Chemical Engineering
    URI
    http://hdl.handle.net/20.500.11937/97646
    Collection
    • Curtin Research Publications
    Abstract

    Modelling the characteristics and composition of coal is important, as proximity data and other measurements needed to do so are typically expensive or hard to acquire in real time. Understanding anomalies in these relatively small data sets is important, as removing them may result in an unnecessary loss of data or bias in the data used in the model. Although anomaly detection has been considered in depth in the literature, very little work has been devoted to the explanation of anomalies. In this paper, a general anomaly detection and identification methodology is considered, based on three models, viz. an isolation forest (IF), a random forest (RF) and a tree SHAP explanatory model. Three case studies related to the composition of coal and coal processing are considered. In these case studies, the IF-RF-SHAP approach identified outliers or data anomalies not identifiable with principal component analysis. The model is a new variant of some of the integrated approaches that have recently been considered. A further contribution of the study lies in the empirical comparison of IF anomaly scores with distance-based and reconstruction-based anomaly scores generated with principal component models. In the case studies considered, the IF anomaly scores were better able to identify anomalies in the data than the scores derived from the principal component models. As a result, the methodology can complement distance-based approaches, such as principal component analysis, to explain anomalies or outliers detected in data. Apart from the proposed IF-RF-SHAP approach, four approaches to compare the contributions of variables in random forest models are considered as well. These were simple correlation of individual predictors with anomaly scores of samples, random forest prediction based on an impurity criterion, random forest prediction based on a permutation criterion, and the tree SHAP approach. If the latter is considered as a benchmark, then the impurity criterion gave the most reliable results, while simple predictor correlations gave the least reliable results.
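
    The explanation step in the abstract attributes each sample's anomaly score to its individual variables using Shapley values. The sketch below is not the authors' implementation and is not tree SHAP: it computes exact Shapley values by brute-force enumeration of variable coalitions, which is only feasible for a handful of variables. The function `toy_score`, the variable names (ash, moisture, volatiles) and all numbers are hypothetical stand-ins for a fitted random forest that predicts isolation forest anomaly scores.

    ```python
    from itertools import combinations
    from math import factorial


    def shapley_values(model, x, baseline):
        """Exact Shapley values of model(x) relative to model(baseline).

        Variables outside a coalition are set to their baseline value.
        The cost grows as 2^n, so this suits only a few variables.
        """
        n = len(x)

        def value(coalition):
            # Evaluate the model with only the coalition's variables "switched on".
            z = [x[i] if i in coalition else baseline[i] for i in range(n)]
            return model(z)

        phi = [0.0] * n
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for size in range(n):
                # Shapley weight for coalitions of this size that exclude variable i.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                for subset in combinations(others, size):
                    s = set(subset)
                    phi[i] += weight * (value(s | {i}) - value(s))
        return phi


    def toy_score(z):
        """Hypothetical anomaly score (assumption): stands in for a fitted model."""
        ash, moisture, volatiles = z
        return (0.5 * abs(ash - 10.0)
                + 0.1 * abs(moisture - 5.0)
                + 0.2 * abs(ash - 10.0) * abs(volatiles - 30.0))


    baseline = [10.0, 5.0, 30.0]   # hypothetical "typical" sample
    sample = [16.0, 6.0, 32.0]     # hypothetical anomalous sample
    phi = shapley_values(toy_score, sample, baseline)

    for name, contribution in zip(["ash", "moisture", "volatiles"], phi):
        print(f"{name}: {contribution:.3f}")
    ```

    The contributions sum to the difference between the sample's score and the baseline score, which is the property that lets each anomaly score be broken down variable by variable. In this toy case the interaction term is split equally between ash and volatiles.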

    Related items

    Showing items related by title, author, creator and subject.

    • Data-driven approach for labelling process plant event data
      Corrêa, D.; Polpo, A.; Small, Michael; Srikanth, S.; Hollins, K.; Hodkiewicz, M. (2022)
      An essential requirement in any data analysis is to have a response variable representing the aim of the analysis. Much academic work is based on laboratory or simulated data, where the experiment is controlled, and the ...
    • Change point detection in time series data with random forests
      Auret, L.; Aldrich, Chris (2010)
      A large class of monitoring problems can be cast as the detection of a change in the parameters of a static or dynamic system, based on the effects of these changes on one or more observed variables. In this paper, the use ...
    • Non-compliance by school principals : the effects of experience, stakeholder characteristics and governance mechanisms on reasoned risk-taking in decision-making
      Trimmer, Karen Joy (2011)
      Reasoned risk-taking has long been associated with governance mechanisms for organisations within business contexts. Research has been conducted in business contexts to develop theories of risk-taking that incorporate ...
