Empirical comparison of tree ensemble variable importance measures

Auret, L.; Aldrich, Chris

doi:10.1016/j.chemolab.2010.12.004

Access Status

Fulltext not available

Authors

Auret, L.

Aldrich, Chris

Date

2011

Type

Journal Article

Citation

Auret, Lidia and Aldrich, Chris. 2011. Empirical comparison of tree ensemble variable importance measures. Chemometrics and Intelligent Laboratory Systems. 105 (2): pp. 157-170.

Source Title

Chemometrics and Intelligent Laboratory Systems

DOI

10.1016/j.chemolab.2010.12.004

ISSN

0169-7439

School

WASM Minerals Engineering and Extractive Metallurgy Teaching Area

URI

http://hdl.handle.net/20.500.11937/47469

Collection

Curtin Research Publications

Abstract

Tree ensembles are becoming well established as popular and powerful data modelling techniques. Although their individual member trees may be interpretable, tree ensemble models are essentially black boxes, and interest in interpreting them has grown along with their popularity. This study presents variable importance measures associated with random forests, conditional inference forests and boosted trees, and compares these measures on a number of simulated data sets. Overall, variable importance indicators based on bagged conditional inference forests appear to strike a good balance between identifying significant variables and avoiding unnecessary flagging of correlated variables. Data preprocessing and interpretation by experts familiar with the specific data set remain vital.
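
To make the idea of a tree ensemble variable importance measure concrete, the following is a minimal sketch of permutation importance on simulated data. It is not the authors' experimental setup: the ensemble here is a toy bag of depth-one regression trees (stumps) with random feature subsets, conditional inference forests and boosted trees are not implemented, and the data-generating model, sample sizes and helper names (`simulate`, `fit_stump`, `predict`, `mse`) are all assumptions chosen for illustration. The response depends only on `x0`; `x1` is merely correlated with `x0`, and `x2` is pure noise.

```python
import random


def simulate(n, rng):
    """Assumed toy data: y depends on x0 only; x1 is correlated with x0; x2 is noise."""
    X, y = [], []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = x0 + 0.5 * rng.gauss(0, 1)  # correlated with x0, not used to generate y
        x2 = rng.gauss(0, 1)             # independent noise variable
        X.append([x0, x1, x2])
        y.append(3 * x0 + 0.5 * rng.gauss(0, 1))
    return X, y


def fit_stump(X, y, features):
    """Fit a depth-one regression tree; return (feature, threshold, left_mean, right_mean)."""
    n = len(y)
    total = sum(y)
    best = None
    for j in features:
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            if X[order[k - 1]][j] == X[order[k]][j]:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            # Maximising this score is equivalent to minimising the squared error of the split.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                threshold = (X[order[k - 1]][j] + X[order[k]][j]) / 2
                best = (score, j, threshold, left_sum / k, right_sum / (n - k))
    return best[1:]


def predict(forest, row):
    """Average the predictions of all stumps in the ensemble."""
    return sum(lm if row[j] <= t else rm for j, t, lm, rm in forest) / len(forest)


def mse(forest, X, y):
    return sum((predict(forest, r) - t) ** 2 for r, t in zip(X, y)) / len(y)


rng = random.Random(0)
X_train, y_train = simulate(400, rng)
X_test, y_test = simulate(300, rng)

# Bagging: each stump sees a bootstrap sample and a random subset of 2 of the 3 variables.
forest = []
for _ in range(30):
    idx = [rng.randrange(len(y_train)) for _ in range(len(y_train))]
    features = rng.sample(range(3), 2)
    forest.append(fit_stump([X_train[i] for i in idx], [y_train[i] for i in idx], features))

# Permutation importance: increase in test error after shuffling one variable's column.
baseline = mse(forest, X_test, y_test)
importance = []
for j in range(3):
    column = [row[j] for row in X_test]
    rng.shuffle(column)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
    importance.append(mse(forest, X_perm, y_test) - baseline)

print(importance)
```

In this sketch the correlated variable `x1` typically receives a clearly non-zero importance even though it plays no part in generating the response, because stumps that are not offered `x0` fall back on it. This is the kind of flagging of correlated variables that the abstract refers to.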