I am able to get pretty good results from batch gradient descent (batch size 37,000, i.e. the full training set), but when I try mini-batch gradient descent I get much poorer results, even with Adam and dropout.
With batch GD I get 100% train and 97% dev/CV accuracy, whereas with a mini-batch size of 128 I only get around 88% accuracy on both.
The training loss hovers around 1.6 and doesn't decrease with further iterations, but it slowly decreases as I increase the batch size (improving accuracy with it), and I eventually arrive at a batch size of 37,000 for maximum accuracy.
I tried tweaking alpha (the learning rate), but the accuracy stayed the same.
I'm training on the MNIST digits dataset.
What could be the reason? Please help.
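For context, my mini-batch loop follows the usual pattern. Here is a minimal, self-contained sketch of that pattern on a toy logistic-regression problem (illustrative only; this is not my actual MNIST model, and all names and hyperparameters here are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nearly-separable binary data as a stand-in for MNIST features.
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def loss(w, X, y):
    # Numerically stable logistic log-loss.
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

def train(batch_size, lr, epochs=20):
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle the training set each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # sigmoid predictions
            grad = Xb.T @ (p - yb) / len(idx)    # mean gradient over the mini-batch
            w -= lr * grad
    return w

w = train(batch_size=128, lr=0.1)
print("initial loss:", loss(np.zeros(d), X, y))
print("final loss:  ", loss(w, X, y))
```

In this toy version the loss does go down with batch size 128, which is why I'm confused that my real model's loss plateaus around 1.6.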
