A gradient descent boosting spectrum modeling method based on back interval partial least squares
Date
2015
Abstract
When boosting regression is applied to near-infrared spectroscopy, the full spectrum of the samples is generally used to build the partial least squares (PLS) models. However, the full spectrum contains a large amount of redundant information and noise, which both increases the complexity of the model and reduces its predictive performance. In addition, the boosting method is sensitive to noise in the data: when the data contain too much noise, the generalization performance of boosting decreases, and the prediction error and variance of PLS become relatively large. To solve these problems, this paper proposes a gradient descent boosting ensemble method combined with backward interval PLS (GD-Boosting-BiPLS). BiPLS is used to select the effective variables for the boosting base model, and each base model is trained sequentially by resampling. The spectral segmentation parameter of BiPLS and the iteration parameter of boosting are fused, and the weight of each base model is assigned by the gradient descent strategy. This yields a new ensemble model (a forward additive model) in the direction of reduced residuals. The final model is the ensemble model that attains the minimum root mean square error of prediction (RMSEP). The proposed method is applied to the quantitative prediction of ethanol concentrations. Over iterations 1–50, the average correlation coefficients of the calibration and validation sets are 0.9628 and 0.9388, respectively, and the average root mean square error of cross-validation and the average RMSEP are 0.0732 and 0.0675, respectively. The overall performance of the proposed GD-Boosting-BiPLS method is compared with that of various ensemble strategies and four state-of-the-art spectral modeling methods. The experimental results show that the proposed method has the best generalization performance and stability.
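The abstract does not give the algorithm in detail, so the following is only a rough sketch of the general idea it describes: a forward additive ensemble whose base models are fitted to the current residuals on a resampled calibration set, weighted so that the residuals shrink, and stopped at the iteration with the lowest RMSEP. Several parts are assumptions made for illustration and are not taken from the paper: the base learner is a one-variable least-squares line rather than a PLS model, choosing the best single variable stands in for BiPLS interval selection, the weight comes from an exact line search for squared error rather than the authors' gradient descent rule, and the function names (`synthetic_spectra`, `fit_line`, `boost`) and the data are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def synthetic_spectra(n, p, seed=0):
    """Hypothetical toy data: p noisy 'wavelengths', only columns 2 and 5 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[2] - row[5] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


def fit_line(x, r):
    """Least-squares fit r ~ a + b*x (assumed stand-in for a PLS base model)."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx if sxx > 0 else 0.0
    return mr - b * mx, b


def boost(X_cal, y_cal, X_val, y_val, n_iter=20, seed=0):
    """Forward additive ensemble; returns (best iteration, cal RMSE list, RMSEP list)."""
    rng = random.Random(seed)
    n, p = len(X_cal), len(X_cal[0])
    mean_y = sum(y_cal) / n
    f_cal = [mean_y] * n            # ensemble prediction, calibration set
    f_val = [mean_y] * len(X_val)   # ensemble prediction, validation set
    cal_hist, val_hist = [rmse(y_cal, f_cal)], [rmse(y_val, f_val)]
    for _ in range(n_iter):
        resid = [y - f for y, f in zip(y_cal, f_cal)]
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        # Assumed stand-in for BiPLS: keep the single variable that best
        # explains the residuals on the resampled data.
        best = None
        for j in range(p):
            xs = [X_cal[i][j] for i in idx]
            rs = [resid[i] for i in idx]
            a, b = fit_line(xs, rs)
            sse = sum((r - (a + b * x)) ** 2 for x, r in zip(xs, rs))
            if best is None or sse < best[0]:
                best = (sse, j, a, b)
        _, j, a, b = best
        h_cal = [a + b * row[j] for row in X_cal]
        h_val = [a + b * row[j] for row in X_val]
        # Base-model weight by exact line search on the squared residuals
        # (an assumption; the paper's gradient descent rule may differ).
        denom = sum(h * h for h in h_cal)
        w = sum(r * h for r, h in zip(resid, h_cal)) / denom if denom > 0 else 0.0
        f_cal = [f + w * h for f, h in zip(f_cal, h_cal)]
        f_val = [f + w * h for f, h in zip(f_val, h_val)]
        cal_hist.append(rmse(y_cal, f_cal))
        val_hist.append(rmse(y_val, f_val))
    # Final model: the ensemble size with the minimum RMSEP.
    best_m = min(range(len(val_hist)), key=val_hist.__getitem__)
    return best_m, cal_hist, val_hist
```

For example, `X, y = synthetic_spectra(60, 8)` followed by `boost(X[:40], y[:40], X[40:], y[40:])` returns the selected number of base models together with the calibration RMSE and RMSEP at every iteration. With the line-search weight, the calibration error cannot increase from one iteration to the next, which is one simple reading of building the ensemble "in the direction of reduced residuals".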