Mini-batch performs poorly than Batch gradient descent?

Question

I am able to get pretty good results from batch gradient descent(batch size 37000) but when I try out mini-batch gradient descent, I get very poor results (even with adam and dropout).

In batch gd, I'm able to get 100% train and 97% dev/cv accuracy. Whereas in mini-batch of size 128, I'm getting only around 88% accuracy in both.

The train loss seems to revolve around 1.6 and doesn't decrease with any further iteration but slowly decreases when I increase the batch size(hence improving accuracy). And eventually I arrive at batch size of 37000 for max accuracy.

I tried tweaking alpha but still same accuracy.

I'm training the mnist digits dataset.

What could be the reason? Please help.

Mostafa Labib · Accepted Answer · 2020-07-10 17:20:11Z

2

In Batch Gradient Descent, all the training data is taken into consideration to take a single step. In mini batch gradient descent you consider some of data before taking a single step so the model update frequency is higher than batch gradient descent.

But mini-batch gradient descent comes with a cost:

Firstly, mini-batch makes some learning problems from technically untackleable to be tackleable due to the reduced computation demand with smaller batch size.

Secondly, reduced batch size does not necessarily mean reduced gradient accuracy. The training samples many have lots of noises or outliers or biases.

I believe that because of the oscillations in mini-batch you might fell into a local minima. Try to increase the learning rate with mini-batch it may solve the problem. also try to normalize the pictures it may help too.

answered Jul 10, 2020 at 17:20

Mostafa Labib

8097 silver badges17 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Lelouche Lamperouge Over a year ago

I fixed it by decreasing the lambda from 10 to 0.1. weird fix but it works.

Lelouche Lamperouge · Accepted Answer · 2020-07-11 06:56:09Z

0

I found the solution

The lmbda value i used for batch gd (i.e 10) seems to to be too big for mini batch gd. And by decreasing it to 0.1 , i fixed the problem.

answered Jul 11, 2020 at 6:56

Lelouche Lamperouge

1915 bronze badges

Collectives™ on Stack Overflow

Mini-batch performs poorly than Batch gradient descent?

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related