
I was reading about momentum and I was trying to implement the momentum update equation in my mini-batch gradient descent code. (image: the momentum update equations)
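
As far as I understand it (this is my reading of the equation, not a quote from the image), the update is the classical momentum form:

$$z \leftarrow \beta z + \nabla f(w), \qquad w \leftarrow w - \alpha z$$

where $w$ stands for the parameters (m and b here), $\beta$ is the momentum coefficient (betha = 0.81 in my code) and $\alpha$ is the learning rate (stepper = 0.0001).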

The problem is that it is not working: the regression line ends up far away from the ideal line, and I'm not sure whether the implementation is correct.

(plot: the fitted regression line drifting away from the ideal line)

import math

def stochastic_gradient_descent_step(m, b, data_sample):

    n_points = data_sample.shape[0]  # size of the mini-batch
    m_grad = 0
    b_grad = 0
    stepper = 0.0001  # this is the learning rate
    z_m = 1.0
    z_b = 1.0
    betha = 0.81

    for i in range(n_points):

        # Get the current pair (x, y)
        x = data_sample[i, 0]
        y = data_sample[i, 1]
        if math.isnan(x) or math.isnan(y):  # skip rows with missing data instead of crashing
            # print("is nan")
            continue

        # Accumulate the partial derivative for each value in the data
        # Partial derivative with respect to 'm'
        dm = -((2 / n_points) * x * (y - (m * x + b)))

        # Partial derivative with respect to 'b'
        db = -((2 / n_points) * (y - (m * x + b)))

        # Update the accumulated gradient
        m_grad = m_grad + dm
        b_grad = b_grad + db

    # Calculate the momentum
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad

    # Set the new 'better' updated 'm' and 'b'
    m_updated = m - stepper * z_m
    b_updated = b - stepper * z_b

    return m_updated, b_updated


Edited

I have now edited my code. As Sasha suggested, I put the gradient calculation in one function and the momentum update in another, and I made z_m and z_b global so they don't lose their value between iterations.

z_m = 0.0  # initialise to 0
z_b = 0.0  # initialise to 0

def getGradient(m, b, data_sample):
    n_points = data_sample.shape[0]  # size of the mini-batch
    m_grad = 0
    b_grad = 0

    for i in range(n_points):

        # Get the current pair (x, y)
        x = data_sample[i, 0]
        y = data_sample[i, 1]
        if math.isnan(x) or math.isnan(y):  # skip rows with missing data instead of crashing
            # print("is nan")
            continue

        # Accumulate the partial derivative for each value in the data
        # Partial derivative with respect to 'm'
        dm = -((2 / n_points) * x * (y - (m * x + b)))

        # Partial derivative with respect to 'b'
        db = -((2 / n_points) * (y - (m * x + b)))

        # Update the accumulated gradient
        m_grad = m_grad + dm
        b_grad = b_grad + db

    return m_grad, b_grad

def calculateMomentum(m, b, m_grad, b_grad, betha=0.81, stepper=0.0001):
    global z_m, z_b
    # Calculate the momentum (z_m and z_b keep their value between calls)
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    # Set the new 'better' updated 'm' and 'b'
    m_updated = m - stepper * z_m
    b_updated = b - stepper * z_b
    return m_updated, b_updated
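
For context, this is roughly how the two functions are called from the mini-batch loop. It is only a sketch: the synthetic data, sample_minibatch, and the iteration count are placeholders for what is actually in the linked notebook.

    import numpy as np

    # synthetic data just for illustration: roughly y = 3x + 2 plus noise
    rng = np.random.default_rng(0)
    x_vals = rng.uniform(0, 100, 500)
    y_vals = 3 * x_vals + 2 + rng.normal(0, 5, 500)
    data = np.column_stack([x_vals, y_vals])

    def sample_minibatch(data, batch_size=32):
        # pick a random mini-batch of rows (placeholder for the notebook's sampling code)
        idx = rng.choice(data.shape[0], batch_size, replace=False)
        return data[idx]

    m, b = 0.0, 0.0
    for _ in range(1000):
        batch = sample_minibatch(data)
        m_grad, b_grad = getGradient(m, b, batch)
        # z_m and z_b live at module level, so the momentum carries over between iterations
        m, b = calculateMomentum(m, b, m_grad, b_grad)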

Now the regression line is calculated correctly (maybe). With plain SGD the final error is 59706304 and with momentum it is 56729062, but the difference could just come from the random mini-batches chosen when the gradient is calculated.
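
The error I quote above comes from a squared-error style measure over the whole dataset; the exact function is in the linked notebook, but it is along these lines (compute_error is just a placeholder name, not the notebook's function):

    def compute_error(m, b, data):
        # mean squared error of y = m*x + b over the full dataset, skipping NaN rows
        total = 0.0
        count = 0
        for i in range(data.shape[0]):
            x = data[i, 0]
            y = data[i, 1]
            if math.isnan(x) or math.isnan(y):
                continue
            total += (y - (m * x + b)) ** 2
            count += 1
        return total / count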

(plot: the regression line computed after the edit)

  • Not working is the classic useless description here on SO! Commented Jul 27, 2017 at 0:00
  • Sorry, I will update it. Commented Jul 27, 2017 at 0:03
  • You can see the rest of the code in my github file github.com/matvi/GradientDescent/blob/master/SGD.ipynb Commented Jul 27, 2017 at 0:06
  • That momentum usage makes no sense here! The momentum is a form of state between weight updates. Yours only lives for one update and is then lost when the function finishes (your code would need to be refactored). Apart from that, those calculations also look wrong (imagine a gradient of 0.00001; you are always adding 0.81 to that; obviously that's not good). Commented Jul 27, 2017 at 0:15
  • I know what momentum is used for. But you don't seem to understand the logic. It's a state which persists between mini-batches, so those can't be local variables within your mini-batch function. I think you've got the idea and can refactor your code. You probably want to avoid doing the step in that function at all, making it a pure calc-gradient function. Then the momentum smoothing can be used in one outer function. Commented Jul 27, 2017 at 15:19

1 Answer


First of all, the initialisation is invalid: z_m and z_b should be initialised to 0 (that is your first guess of the gradient). Second, in the current functional form you never "store" z_m or z_b for the next iteration, so they get reset (to the invalid value of 1) on every call.
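
A minimal sketch of one way to keep that state alive between mini-batch steps is to hold the momentum terms outside the step function and pass them in and out explicitly (momentum_step and mini_batches are illustrative names, not taken from your notebook):

    def momentum_step(m, b, m_grad, b_grad, z_m, z_b, betha=0.81, stepper=0.0001):
        # z_m and z_b come from the previous call; they start at 0
        z_m = betha * z_m + m_grad
        z_b = betha * z_b + b_grad
        return m - stepper * z_m, b - stepper * z_b, z_m, z_b

    m, b = 0.0, 0.0
    z_m, z_b = 0.0, 0.0  # first guess of the gradient
    for batch in mini_batches:  # however you sample your mini-batches
        m_grad, b_grad = getGradient(m, b, batch)
        m, b, z_m, z_b = momentum_step(m, b, m_grad, b_grad, z_m, z_b)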


1 Comment

Sasha says that "it's a state which persists between mini-batches".
