Comparative Study of High-Speed Back-Propagation Learning Algorithms
Authors: Saduf, Mohd. Arif Wani
Journal: International Journal of Modern Education and Computer Science (IJMECS)
Issue: No. 12, Vol. 6, 2014.
Back propagation is one of the well-known training algorithms for the multilayer perceptron. However, the rate of convergence of back propagation learning tends to be relatively slow, which in turn makes it computationally expensive. Over the years, many modifications have been proposed to improve the efficiency and convergence speed of the back propagation algorithm. The main emphasis of this paper is on investigating the performance of improved versions of the back propagation algorithm in training the neural network. All of them are assessed on different training sets and a comparative analysis is made. Results of computer simulations with standard benchmark problems such as XOR, 3-BIT PARITY, MODIFIED XOR and IRIS are presented. The training performance of these algorithms is evaluated in terms of percentage of accuracy and convergence speed.
Keywords: ANN, gain, momentum, error saturation, local minima
Short address: https://sciup.org/15014712
IDR: 15014712
Published Online December 2014 in MECS DOI: 10.5815/ijmecs.2014.12.05
I. Introduction

Neural networks are information processing networks that mimic the human nervous system. The sigmoid activation function, which models the actual behaviour of a neuron, is the one mainly used, even though various other functions exist that represent the characteristics of a neuron. A neural network can be trained to imitate any function by using a training set that contains input-output pairs of the desired function. The most popular structure of neural networks is the multilayer perceptron (MLP), in which the neurons are arranged in layers and the signal is transmitted in one direction. An MLP consists of an input layer, one or more hidden layers and an output layer. The hidden layers capture the non-linear relationships among the input variables, and the output layer produces the predicted output. The back propagation algorithm (BP), introduced by [1], is a supervised learning method based on gradient descent of the quadratic error function and is considered a universal function approximator. A back-propagation-based MLP (BPNN) can approximate any smooth function to an arbitrary degree of accuracy, provided the tuning parameters are optimized properly.

Supervised learning of a neural network can be viewed as a curve fitting process. Training vector pairs, each consisting of an input vector from the input space and a target vector as the desired neural response, are presented to the network. Based on the learning algorithm, the neural network adjusts its weights so that the error between the actual output vectors and the target vectors is minimized with respect to some optimization criterion. Once trained, the neural network performs interpolation in the output vector space, which is referred to as its generalization capability. In unsupervised learning, by contrast, the target output pattern is not known and the system learns on its own by discovering features in the input patterns; this type of learning is based on clustering techniques. Reinforcement learning draws on both supervised and unsupervised learning: although a teacher is present, the correct answer is not presented to the network; the teacher only indicates whether the computed answer is correct or not.
The rest of this paper is organized into two main parts. Section II gives an overview of BP and briefly describes some of its proposed improvements. Section III presents a comparative performance analysis of the results obtained by implementing some of these improved versions of BP on a number of benchmark problems.
II. BP Algorithm
The multilayer feedforward neural network (MFNN) is presented with a set of exemplar cases consisting of an input pattern and a target pattern. The input pattern is fed directly into the input layer. The activations of the input nodes are multiplied by the weighted connections and passed through a transfer function at each node in the first hidden layer. The activations from the first hidden layer are then passed to the neurons in the next layer, and this process is repeated until the output activations are obtained from the output layer. The output activations and the target pattern are compared, and an error signal is calculated from the difference between the target and the computed pattern. This error signal is then propagated backwards to adjust the network weights so that the network will generate the correct output for the presented input pattern. The training patterns are presented repeatedly until the error reaches an acceptable value or some other convergence criterion is satisfied. As this technique involves performing the computation backwards, it is named back propagation.
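The forward and backward passes described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the 2-2-1 network size, the random weights, the learning rate and the single training pair below are assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Assumed 2-2-1 network with small random weights.
W1 = rng.normal(scale=0.5, size=(2, 2))    # input -> hidden connections
W2 = rng.normal(scale=0.5, size=(2, 1))    # hidden -> output connections

x = np.array([1.0, 0.0])                   # one input pattern
t = np.array([1.0])                        # its target pattern

# Forward pass: activations flow from the input layer towards the output layer.
h = sigmoid(x @ W1)                        # hidden layer outputs
o = sigmoid(h @ W2)                        # output layer outputs

# Backward pass: the error signal is propagated from the output layer backwards.
delta_o = (o - t) * o * (1.0 - o)          # output layer error signal
delta_h = (delta_o @ W2.T) * h * (1.0 - h) # hidden layer error signal

# Weight adjustment so that the network moves towards the correct output.
eta = 0.5                                  # assumed learning rate
W2 -= eta * np.outer(h, delta_o)
W1 -= eta * np.outer(x, delta_h)
```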
The purpose of the training process is to reduce the sum squared error for pattern n, which is given as:

$E(n) = \frac{1}{2}\sum_{k \in C}\left(t_k(n) - o_k(n)\right)^2$    (1)

where $t_k$ and $o_k$ are the desired and actual outputs of neuron k and C denotes the set of output neurons. The averaged squared error over the total number of patterns N is given by:

$E_{av} = \frac{1}{N}\sum_{n=1}^{N} E(n)$    (2)
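As a quick numerical check of Eqs. (1) and (2), the snippet below evaluates the per-pattern error E(n) and its average over N patterns; the target and output values are made up purely for illustration.

```python
import numpy as np

# Assumed targets and actual outputs for N = 3 patterns with 2 output neurons each.
targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
outputs = np.array([[0.8, 0.1], [0.3, 0.7], [0.9, 0.6]])

# Eq. (1): E(n) = 1/2 * sum_k (t_k(n) - o_k(n))^2, one value per pattern n.
E_n = 0.5 * np.sum((targets - outputs) ** 2, axis=1)

# Eq. (2): E_av = (1/N) * sum_n E(n).
E_av = E_n.mean()
print(E_n, E_av)
```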
The output of unit j in any layer, after applying the sigmoid function, is given by:

$o_j = \frac{1}{1 + e^{-a_j}}$

and the activation $a_j$ is given by:

$a_j = \sum_i w_{ij}\, o_i$
$w_{ij}$ refers to the weighted connection between neuron i and neuron j. A weight change $\Delta w_{ij}$ is calculated as:

$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$

where $\eta$ is the learning rate. Once the initial weights are known, the weight update rule takes the following form:

$\Delta W_{k+1} = -\eta\, E_w(W_k)$    (3)
where $E_w(W_k)$ is the partial derivative of E with respect to the weight vector $W_k$. Weights in BP can be updated in two ways: online updating and batch updating. In batch updating the weight adjustments are made only at the end of an epoch, i.e. after the presentation of the entire training set. In online updating, on the other hand, the weights are updated after every pattern presentation. In both schemes the learning process continues until the sum squared error reaches a predefined value.
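The two updating schemes can be contrasted in a short sketch. The quadratic error and its gradient below merely stand in for the back-propagated gradient E_w(W); the data, learning rate and epoch count are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))     # assumed training inputs (8 patterns)
T = rng.normal(size=(8, 1))     # assumed targets
eta = 0.1                       # assumed learning rate

def grad(W, x, t):
    # Stand-in for E_w(W): gradient of 1/2 * (t - x.W)^2 for a single pattern.
    return np.outer(x, x @ W - t)

# Batch updating: accumulate the gradient over the whole epoch, adjust once per epoch.
W = np.zeros((3, 1))
for epoch in range(100):
    G = sum(grad(W, x, t) for x, t in zip(X, T))
    W -= eta * G / len(X)

# Online updating: adjust the weights after every pattern presentation.
W = np.zeros((3, 1))
for epoch in range(100):
    for x, t in zip(X, T):
        W -= eta * grad(W, x, t)
```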
The BP algorithm has considerable advantages: it is computationally efficient, simple and easy to implement. However, it also has disadvantages: it may converge to a local minimum, and convergence to the global minimum is not always guaranteed. Even when the specified termination criterion is eventually reached, the algorithm can take a long time to converge. Most researchers attribute these limitations of BP to the flat spot problem and to the choice of the initial values of the network weight connections and of the algorithm parameters, such as the learning rate and momentum. The flat spot problem generally occurs when the actual output lies in the saturation areas, i.e. near '0' or '1'. It makes it very difficult for a neural network to learn: it results in a slow learning speed and only slight weight updates, so the network takes a long time to converge. Many modifications have been proposed to improve the performance of BP, including 1) momentum strategies, 2) error saturation prevention functions, 3) proper weight initialisation methods, and 4) adjusting the steepness of the sigmoid function. These are summarized below.
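The flat spot effect can be seen directly from the sigmoid derivative o(1 - o), which scales every back-propagated error signal: when the actual output saturates near '0' or '1' the derivative, and hence the weight update, collapses towards zero even if the raw error is still large. The values below are arbitrary and only illustrate the effect.

```python
import numpy as np

outputs = np.array([0.5, 0.9, 0.99, 0.999, 0.0001])   # actual neuron outputs
derivative = outputs * (1.0 - outputs)                 # sigmoid derivative via its output
error = 1.0 - outputs                                  # suppose the target is 1 in every case

# The back-propagated signal error * derivative almost vanishes in the
# saturation areas, which is what slows learning down (the flat spot).
print(error * derivative)
```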
A. Momentum Strategy
The learning strategy used in the original BP is gradient descent. Considering the effect of the learning rate on BP reveals that the smaller the learning rate, the smaller the changes to the synaptic weights in the network from one iteration to the next. On the other hand, if the learning rate is made too large, it results in large changes to the synaptic weights, which can make the network unstable. A simple method of increasing the learning rate while avoiding the risk of network instability is to add a momentum term to the weight update rule. The momentum strategy [2] adds a fraction of the last weight change to the current direction of weight change. The weight update rule for BP with momentum is given as:
$\Delta W_{k+1} = -\eta\, E_w(W_k) + \alpha\, \Delta W_k$    (4)

where $\alpha$ is the momentum coefficient.
However, it was found that a fixed momentum coefficient accelerates learning only when the current downhill gradient of the error function and the last weight change have a similar direction. When the current gradient points in a direction opposing the previous weight change, the momentum causes the weights to be adjusted up the slope instead of down the slope. In order to make learning more effective, a number of methods have been proposed to dynamically vary the momentum coefficient.
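A minimal sketch of the momentum rule (4) is given below; the quadratic error surface, the learning rate and the momentum coefficient are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def bp_with_momentum(grad, W, eta=0.1, alpha=0.9, steps=100):
    """Apply rule (4): dW_new = -eta * E_w(W) + alpha * dW_old.

    `grad` is any callable returning the gradient E_w at W; eta and alpha
    are illustrative values, not ones recommended by the paper.
    """
    dW = np.zeros_like(W)
    for _ in range(steps):
        dW = -eta * grad(W) + alpha * dW   # a fraction of the last change is carried over
        W = W + dW
    return W

# Example on a simple quadratic error surface E(W) = 1/2 * ||W - W_star||^2,
# whose gradient is simply W - W_star.
W_star = np.array([2.0, -1.0])
W_final = bp_with_momentum(lambda W: W - W_star, np.zeros(2))
print(W_final)   # approaches W_star
```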
References
[1] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. Rumelhart and J. McClelland, eds.), pp. 318-362, 1986.
[2] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
[3] D.J. Swanston, J.M. Bishop, and R.J. Mitchell, "Simple adaptive momentum: new algorithm for training multilayer perceptrons," Electronics Letters, vol. 30, no. 18, pp. 1498-1500, 1994.
[4] C. Yu and B. Liu, "A backpropagation algorithm with adaptive learning rate and momentum coefficient," Proc. Int. Joint Conf. on Neural Networks (IJCNN'02), vol. 2, pp. 1218-1223, 2002.
[5] H.M. Shao and G.F. Zheng, "A new BP algorithm with adaptive momentum for FNNs training," Proc. WRI Global Congress on Intelligent Systems (GCIS'09), vol. 4, pp. 16-20, 2009.
[6] S.H. Oh, "Improving the error back-propagation algorithm with a modified error function," IEEE Trans. Neural Networks, vol. 8, no. 3, pp. 799-803, 1997.
[7] S.C. Ng, S.H. Leung, and A. Luk, "Fast and global convergent weight evolution algorithm based on the modified back-propagation," Proc. IEEE International Conference on Neural Networks, pp. 3004-3008, 1995.
[8] A.V. Ooyen and B. Nienhuis, "Improving the learning convergence of the back propagation algorithm," Neural Networks, vol. 5, pp. 465-471, 1992.
[9] H.M. Lee, C.M. Chen, and T.C. Huang, "Learning efficiency improvement of back-propagation algorithm by error saturation prevention method," Neurocomputing, vol. 41, pp. 125-143, 2001.
[10] J.Y. Yam and T.W. Chow, "A weight initialization method for improving training speed in feedforward neural network," Neurocomputing, vol. 30, pp. 219-232, 2000.
[11] T. Masters, Practical Neural Network Recipes in C++, Academic Press, Boston, 1993.
[12] D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights," Proc. Int. Joint Conf. on Neural Networks, San Diego, vol. 3, pp. 21-26, 1990.
[13] X.G. Wang, Z. Tang, H. Tamura, M. Ishii, and W.D. Sun, "An improved backpropagation algorithm to avoid the local minima problem," Neurocomputing, vol. 56, pp. 455-460, 2004.
[14] Y. Bai, H. Zhang, and Y. Hao, "The performance of the backpropagation algorithm with varying slope of the activation function," Chaos, Solitons and Fractals, vol. 40, pp. 69-77, 2009.
[15] N.M. Nawi, R.S. Ransing, and M.R. Ransing, "A new method to improve the gradient based search direction to enhance the computational efficiency of back propagation based neural network algorithms," Proc. IEEE Second Asia International Conference on Modelling & Simulation, pp. 546-551, DOI 10.1109/AMS.2008.70, 2008.
[16] M. Gori and A. Tesi, "On the problem of local minima in backpropagation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 1, pp. 76-86, 1992.
[17] H. Ishibuchi, R. Fujioka, and H. Tanaka, "Neural networks that learn from fuzzy if-then rules," IEEE Trans. Fuzzy Syst., vol. 1, no. 2, pp. 85-97, 1993.
[18] Saduf and M. Arif Wani, "Comparative study of adaptive learning rate with momentum and resilient back propagation algorithms for neural net classifier optimization," International Journal of Distributed and Cloud Computing, vol. 2, pp. 1-6, 2014.