ABSTRACT

In machine learning, deep neural networks find applications in many areas, such as aerospace design, mechanical engineering, and chemical engineering, to solve problems like object detection, pose recognition, and object tracking. Given an appropriate objective function, deep neural networks, the most commonly used class of machine learning models, can encode increasingly complex features. However, increasing the number of hidden layers to improve a model's performance imposes significant challenges on training.

Optimizers train a deep neural network by adjusting attributes such as the weights and learning rate of the model in order to minimize the loss and improve the model. They help to minimize the effort required, or to maximize the expected outcome, by finding the conditions under which a function attains its minimum or maximum value. Many optimization algorithms adapt the learning rate automatically. A good optimizer not only trains the model quickly but also helps it make accurate predictions without becoming stuck in local minima. Selecting an appropriate optimizer with appropriate parameters therefore helps to improve the accuracy of the neural network. In this work, derivative-based optimization approaches, namely Stochastic Gradient Descent (SGD), Adaptive Gradient (AdaGrad), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Adaptive Delta (AdaDelta), Adaptive Moment Estimation Extension Based on Infinity Norm (Adamax), Nesterov-Accelerated Adaptive Moment Estimation (Nadam), and SGD with momentum, are studied from both a theoretical and a practical perspective to understand their relevance to training neural networks for improved accuracy. These algorithms have been tested empirically on the deep neural network architectures ResNet50, VGG19, DenseNet201, and EfficientNetB5 to highlight their strengths and weaknesses. The models were trained on a COVID-19 dataset that is publicly available on Kaggle. The experimental results have been evaluated and analyzed using performance metrics such as accuracy and loss. It has been observed that DenseNet201 gives the best performance compared with the other architectures (VGG19, ResNet50, and EfficientNetB5), reaching the highest validation accuracy of 98% with the Adam optimizer.
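
As a concrete illustration of the comparison described above, the following is a minimal sketch, assuming a TensorFlow/Keras setup, that fine-tunes an ImageNet-pretrained DenseNet201 classification head with each candidate optimizer in turn and records the best validation accuracy. The dataset directory layout, image size, epoch count, and learning rates are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: compare the eight studied optimizers on a pretrained DenseNet201.
# Assumptions (not from the paper): dataset paths, image size, batch size,
# learning rates, and number of epochs.
import tensorflow as tf

IMG_SIZE = (224, 224)

# Hypothetical layout: covid_dataset/{train,val}/<class_name>/*.png
train_ds = tf.keras.utils.image_dataset_from_directory(
    "covid_dataset/train", image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "covid_dataset/val", image_size=IMG_SIZE, batch_size=32)
num_classes = len(train_ds.class_names)

optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=1e-3),
    "SGD+momentum": tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
    "AdaGrad": tf.keras.optimizers.Adagrad(learning_rate=1e-3),
    "RMSProp": tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    "AdaDelta": tf.keras.optimizers.Adadelta(learning_rate=1.0),
    "Adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
    "Adamax": tf.keras.optimizers.Adamax(learning_rate=1e-3),
    "Nadam": tf.keras.optimizers.Nadam(learning_rate=1e-3),
}

results = {}
for name, opt in optimizers.items():
    # Rebuild a fresh ImageNet-pretrained backbone per run so every
    # optimizer starts from identical weights.
    base = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=IMG_SIZE + (3,))
    base.trainable = False  # train only the classification head

    inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
    x = tf.keras.applications.densenet.preprocess_input(inputs)
    x = base(x, training=False)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=val_ds,
                        epochs=10, verbose=0)
    results[name] = max(history.history["val_accuracy"])

for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: best validation accuracy = {acc:.3f}")
```

Freezing the backbone and rebuilding it for each run keeps the comparison fair, since every optimizer then sees the same initial weights; the same loop can be repeated with ResNet50, VGG19, or EfficientNetB5 by swapping the application class and its matching preprocess_input.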