Stochastic diagonal approximate greatest descent in convolutional neural networks
© 2017 IEEE. Deep structures of Convolutional Neural Networks (CNNs) have recently gained intense attention due to their strong performance in object recognition. One of the crucial components in a CNN is the learning mechanism that updates the weight parameters through backpropagation. In this paper, stochastic diagonal Approximate Greatest Descent (SDAGD) is proposed to train the weight parameters in CNNs. SDAGD adopts the concept of a multistage control system together with a diagonal Hessian approximation for weight optimization, and can be formulated as a two-phase optimization. In phase 1, when the initial guess is far from the solution, SDAGD constructs local search regions and sets the step length of the next iteration at the boundary of the current search region. Subsequently, when the solution lies within the final search region, SDAGD shifts to phase 2, approximating the Newton method to obtain fast weight convergence. Computing only the diagonal approximation of the Hessian incurs less computational cost than the full Hessian calculation. Experiments showed that the SDAGD learning algorithm achieved a misclassification rate of 8.85% on the MNIST dataset.
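The two-phase behaviour described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the authors' exact update rule: the function `sdagd_step`, the `radius` and `switch_tol` parameters, and the gradient-norm test used to switch phases are all illustrative assumptions.

```python
import numpy as np

def sdagd_step(w, grad, diag_hess, radius=1.0, switch_tol=1e-1):
    """One SDAGD-style update (hypothetical sketch of the two phases).

    Phase 1 (far from the solution): move to the boundary of a local
    spherical search region, so the step length is exactly `radius`.
    Phase 2 (near the solution): approximate-Newton step that divides each
    gradient component by the corresponding diagonal Hessian entry, which
    is far cheaper than inverting the full Hessian.
    """
    gnorm = np.linalg.norm(grad)
    if gnorm > switch_tol:                    # phase 1: bounded step length
        return w - radius * grad / gnorm
    return w - grad / (diag_hess + 1e-8)      # phase 2: diagonal Newton step

# Toy quadratic f(w) = 0.5 * sum(h * w**2): gradient = h*w, diagonal Hessian = h.
h = np.array([4.0, 1.0])

# Far from the minimum: phase 1 takes a step of length `radius`.
w_far = np.array([10.0, -8.0])
w_next = sdagd_step(w_far, h * w_far, h, radius=0.5)
print(np.linalg.norm(w_next - w_far))         # step length equals the radius, 0.5

# Near the minimum: phase 2's diagonal Newton step lands on it (up to eps).
w_near = np.array([0.01, -0.02])
print(sdagd_step(w_near, h * w_near, h))
```

On this toy quadratic the diagonal Hessian equals the true Hessian, so the phase-2 step is an exact Newton step; in a real CNN the diagonal is only an approximation, which is the cost/accuracy trade-off the abstract refers to.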
Showing items related by title, author, creator and subject.
Teunissen, Peter (2010) Global navigation satellite system (GNSS) carrier phase integer ambiguity resolution is the key to high-precision positioning and attitude determination. In this contribution, we develop new integer least-squares (ILS) ...
Tan, H.; Lim, Hann; Harno, H. (2017) © 2017 IEEE. Stochastic Diagonal Approximate Greatest Descent (SDAGD) is proposed to manage the optimization in two stages, (a) apply a radial boundary to estimate step length when the weights are far from solution, (b) ...
Tan, H.; Lim, Hann; Harno, H. (2017) © 2017 IEEE. Optimization is important in neural networks to iteratively update weights for pattern classification. Existing optimization techniques suffer from suboptimal local minima and slow convergence rate. In this ...