Why we need a better learning algorithm than Backpropagation in Deep Learning

Few would dispute that backpropagation is a revolutionary learning algorithm. It has helped us train almost every neural network architecture in use today. Combined with GPUs, it has cut training time from months to hours or days, making efficient training of neural networks possible.

I can think of two reasons for its widespread adoption: (1) we didn’t have anything better than backpropagation, and (2) it worked. Backpropagation is based on the chain rule of differentiation.
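To make the chain rule concrete, here is a minimal sketch of backpropagation through a two-layer scalar network. The network shape, the squared-error loss, and all names and values (x, t, w1, w2) are assumptions chosen for illustration, not something taken from a particular library. The analytic gradients are compared against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Two scalar layers: h = sigmoid(w1 * x), y = w2 * h."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y

def loss(x, t, w1, w2):
    """Squared error between the output y and the target t."""
    _, y = forward(x, w1, w2)
    return 0.5 * (y - t) ** 2

def backprop(x, t, w1, w2):
    """Gradients of the loss w.r.t. w1 and w2, obtained with the chain rule."""
    h, y = forward(x, w1, w2)
    dL_dy = y - t                  # dL/dy
    dL_dw2 = dL_dy * h             # dL/dw2 = dL/dy * dy/dw2
    dL_dh = dL_dy * w2             # gradient handed back to layer 1
    dh_dz = h * (1.0 - h)          # derivative of the sigmoid
    dL_dw1 = dL_dh * dh_dz * x     # dL/dw1 = dL/dh * dh/dz * dz/dw1
    return dL_dw1, dL_dw2

def numeric_grad(f, w, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

# Hypothetical input, target, and weights.
x, t, w1, w2 = 1.5, 0.3, 0.8, -1.2
g1, g2 = backprop(x, t, w1, w2)
n1 = numeric_grad(lambda w: loss(x, t, w, w2), w1)
n2 = numeric_grad(lambda w: loss(x, t, w1, w), w2)
print(g1, n1)
print(g2, n2)
```

Note that the gradient for w1 cannot be formed until dL_dh, which comes from the layer above, has been computed.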

The problem lies in how the backpropagation algorithm itself works. To calculate the gradients of the current layer we need the gradients of the next layer, so the current layer is locked: it cannot calculate its gradients until the next layer's gradients are available. In a network with thousands of layers, the first layer has to wait for the backward pass to travel through every layer above it before its weights are updated. The first few layers are also the ones that tend not to get updated properly. With saturating activations such as the sigmoid, whose derivative never exceeds 0.25, the gradient tends to vanish as it is propagated back; with large weights it can instead explode.
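Both effects can be seen in a small sketch. It assumes a chain of scalar sigmoid layers that all share one hypothetical weight w, and it sets the gradient at the output to 1 purely for illustration. The backward loop has to run from the last layer to the first, and the signal shrinks at every step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backward_signal(depth, w=1.0, x=0.5):
    """Return |dL/da_k| for k = 0..depth in a chain a_k = sigmoid(w * a_{k-1}).

    Assumes dL/da_depth = 1 at the output, for illustration only.
    """
    # Forward pass: store every activation, a_0 = x.
    acts = [x]
    for _ in range(depth):
        acts.append(sigmoid(w * acts[-1]))

    # Backward pass: the gradient at layer k - 1 needs the gradient at
    # layer k, so the layers must be visited strictly from last to first.
    grad = 1.0
    grads = [0.0] * depth + [grad]
    for k in range(depth, 0, -1):
        a = acts[k]
        grad = grad * a * (1.0 - a) * w   # sigmoid'(z_k) * w
        grads[k - 1] = abs(grad)
    return grads

grads = backward_signal(depth=10)
for k, g in enumerate(grads):
    print(f"layer {k}: {g:.3e}")
```

With w = 1, each step multiplies the gradient by at most 0.25, so after 10 layers the signal reaching the input is at most 0.25 to the power of 10.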

When we make decisions, we base them on our current observations and on what we have learned before. Current neural networks and deep learning algorithms are not designed to decide the way we do. Our experience shapes our decisions. For example, when we walk we use vision, audio, and other sensory inputs to make decisions. We also use what we learn from one task to learn other tasks.

Limitations of the Backpropagation algorithm:

  • It is slow: all previous layers are locked until the gradients for the current layer are calculated
  • It suffers from the vanishing or exploding gradient problem
  • It suffers from overfitting and underfitting problems
  • It considers only the predicted value and the actual value when calculating the error and the gradients (this relates mainly to the objective function and only partially to the backpropagation algorithm)
  • It doesn’t consider the spatial, associative, and dis-associative relationships between classes when calculating errors (again, this relates mainly to the objective function and only partially to the backpropagation algorithm)
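The last two points can be illustrated with a small sketch. It assumes cross-entropy with a one-hot target as the objective, which the list above does not name, and the class labels and probabilities are hypothetical. Because the loss looks only at the probability given to the true class, confusing a cat with a dog costs exactly as much as confusing it with a truck.

```python
import math

def cross_entropy(probs, true_idx):
    """Cross-entropy for a one-hot target: only probs[true_idx] is used."""
    return -math.log(probs[true_idx])

classes = ["cat", "dog", "truck"]   # hypothetical labels
true_idx = 0                        # the image is a cat

# Two hypothetical predictions that give the true class the same probability.
mostly_dog = [0.2, 0.7, 0.1]
mostly_truck = [0.2, 0.1, 0.7]

loss_dog = cross_entropy(mostly_dog, true_idx)
loss_truck = cross_entropy(mostly_truck, true_idx)
print(loss_dog, loss_truck)
```

Both predictions receive the same loss, so nothing in the error signal tells the network that a dog is a closer mistake than a truck.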

DeepMind’s synthetic gradients offer a workaround, but they are not a solution. In my opinion, we have to start from scratch and design a new learning algorithm that learns efficiently and lets our networks learn in real time.

Disclaimer: This is my personal opinion and it is solely based on my studies and research. I invite you all to share your thoughts on this.

Thank you for reading.

If you want to get in touch, you can reach me at ahikailash1@gmail.com

About author:

Kailash Ahirwar

“I am a Co-Founder of MateLabs, where we have built Mateverse, an ML Platform which enables everyone to easily build and train Machine Learning Models, without writing a single line of code.”

Note: I recently published a book on GANs titled “Generative Adversarial Networks Projects”, in which I cover most of the widely used GAN architectures and their implementations. DCGAN, StackGAN, CycleGAN, Pix2pix, Age-cGAN, and 3D-GAN are covered in detail at the implementation level, and each architecture has a chapter dedicated to it. I explain these networks in simple, descriptive language using the Keras framework with a TensorFlow backend. If you are working on GANs or planning to use them, give it a read and share your feedback with me at ahikailash1@gmail.com

You can grab a copy of the book from:

http://www.amazon.com/Generative-Adversarial-Networks-Projects-next-generation/dp/1789136679

https://www.amazon.in/Generative-Adversarial-Networks-Projects-next-generation/dp/1789136679?fbclid=IwAR0X2pDk4CTxn5GqWmBbKIgiB38WmFX-sqCpBNI8k9Z8I-KCQ7VWRpJXm7I

https://www.packtpub.com/big-data-and-business-intelligence/generative-adversarial-networks-projects?fbclid=IwAR2OtU21faMFPM4suH_HJmy_DRQxOVwJZB0kz3ZiSbFb_MW7INYCqqV7U0c
