Multivariate Image Processing in Minerals Engineering with Vision Transformers

Liu, Xiu; Aldrich, Chris

doi:10.1016/j.mineng.2024.108599

Access Status

Open access

Authors

Liu, Xiu

Aldrich, Chris

Date

2024

Type

Journal Article

Citation

Liu, X. and Aldrich, C. 2024. Multivariate Image Processing in Minerals Engineering with Vision Transformers. Minerals Engineering. 208: 108599.

Source Title

Minerals Engineering

DOI

10.1016/j.mineng.2024.108599

ISSN

0892-6875

Faculty

Faculty of Science and Engineering

School

WASM: Minerals, Energy and Chemical Engineering

Abstract

Vision transformers (ViTs) are a new class of deep learning algorithms that have recently emerged as a competitive alternative to convolutional neural networks. In this investigation, their application to two operations previously studied in the mineral processing industry is considered. These are image recognition of fines in coal particles on conveyor belts and characterisation of the particle size in the underflow of a hydrocyclone. Promising results were achieved with vision transformers, as they performed as well as, or better than, convolutional neural networks in these image recognition problems. In addition, features extracted from the best ViT model could be used to visualise its performance, and these features could also serve as a basis for nonlinear process monitoring models. Furthermore, explainability techniques such as attention maps for ViTs were implemented to better understand the ViT models, similar to techniques such as occlusion sensitivity maps used with convolutional neural networks.
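
For readers unfamiliar with attention maps, the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a single attention head, randomly initialised weights standing in for trained parameters, and no positional embeddings; the function names `split_into_patches` and `cls_attention_map`, the patch size, and the embedding dimension are all hypothetical choices made for illustration. The image is cut into patches, each patch is projected to a token, and the attention that a class token pays to each patch is reshaped into a coarse map over the image.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    # Crop to a whole number of patches, then group pixels by patch.
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)
    return x, (gh, gw)


def cls_attention_map(image, patch=8, dim=16, seed=0):
    """Return a (grid_h, grid_w) map of class-token attention over patches.

    Weights are random (illustrative only); a real ViT would use trained
    parameters, several heads and layers, and positional embeddings.
    """
    rng = np.random.default_rng(seed)
    patches, grid = split_into_patches(image, patch)

    # Linear patch embedding plus a prepended class token.
    w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))
    cls_token = rng.normal(scale=0.02, size=(1, dim))
    tokens = np.vstack([cls_token, patches @ w_embed])  # (1 + N, dim)

    # Single-head scaled dot-product attention weights.
    w_q = rng.normal(scale=dim ** -0.5, size=(dim, dim))
    w_k = rng.normal(scale=dim ** -0.5, size=(dim, dim))
    scores = (tokens @ w_q) @ (tokens @ w_k).T / np.sqrt(dim)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)

    # Row 0 is the class token; drop its self-attention and renormalise.
    cls_to_patches = attn[0, 1:]
    cls_to_patches = cls_to_patches / cls_to_patches.sum()
    return cls_to_patches.reshape(grid)


if __name__ == "__main__":
    demo_image = np.random.default_rng(1).random((32, 32, 3))
    print(cls_attention_map(demo_image).round(3))
```

With a trained model, such a map would typically be upsampled to the image resolution and overlaid on the input to show which regions most influenced the prediction.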