I was reading about momentum and tried to implement the momentum update in my mini-batch gradient descent code.

The problem is that it is not working: the regression line ends up far from the ideal line, and I'm not sure whether the implementation is correct.
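For reference, the update rule I was trying to follow is the classical momentum formulation (at least as I understand it; $g_t$ is the gradient of the loss at step $t$):

$$z_{t+1} = \beta z_t + g_t, \qquad w_{t+1} = w_t - \alpha z_{t+1}$$

where $\beta$ is the momentum coefficient and $\alpha$ the learning rate.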
import math

def stochastic_gradient_descent_step(m, b, data_sample):
    n_points = data_sample.shape[0]  # size of the mini-batch
    m_grad = 0.0
    b_grad = 0.0
    stepper = 0.0001  # learning rate
    z_m = 1.0
    z_b = 1.0
    betha = 0.81
    for i in range(n_points):
        # Get the current pair (x, y)
        x = data_sample[i, 0]
        y = data_sample[i, 1]
        if math.isnan(x) or math.isnan(y):  # skip missing data instead of crashing
            continue
        # Accumulate the partial derivative for each point in the mini-batch
        # Partial derivative with respect to 'm'
        dm = -((2 / n_points) * x * (y - (m * x + b)))
        # Partial derivative with respect to 'b'
        db = -((2 / n_points) * (y - (m * x + b)))
        m_grad = m_grad + dm
        b_grad = b_grad + db
    # Calculate the momentum
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    # Set the new, updated 'm' and 'b'
    m_updated = m - stepper * z_m
    b_updated = b - stepper * z_b
    return m_updated, b_updated
Edit:

I have edited my code. As Sasha suggested, I split the gradient calculation and the momentum update into separate functions, and I made z_m and z_b global so they don't lose their value between iterations.
z_m = 0.0  # initialise to 0
z_b = 0.0  # initialise to 0

def getGradient(m, b, data_sample):
    n_points = data_sample.shape[0]  # size of the mini-batch
    m_grad = 0.0
    b_grad = 0.0
    for i in range(n_points):
        # Get the current pair (x, y)
        x = data_sample[i, 0]
        y = data_sample[i, 1]
        if math.isnan(x) or math.isnan(y):  # skip missing data instead of crashing
            continue
        # Accumulate the partial derivative for each point in the mini-batch
        # Partial derivative with respect to 'm'
        dm = -((2 / n_points) * x * (y - (m * x + b)))
        # Partial derivative with respect to 'b'
        db = -((2 / n_points) * (y - (m * x + b)))
        m_grad = m_grad + dm
        b_grad = b_grad + db
    return m_grad, b_grad

def calculateMomentum(m, b, m_grad, b_grad, betha=0.81, stepper=0.0001):
    global z_m, z_b
    # Update the velocity terms with the new gradient
    z_m = betha * z_m + m_grad
    z_b = betha * z_b + b_grad
    # Apply the momentum step to get the updated 'm' and 'b'
    m_updated = m - stepper * z_m
    b_updated = b - stepper * z_b
    return m_updated, b_updated
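For context, this is roughly how I wire the two functions together each step. The data here is just a hypothetical synthetic set for illustration (full_data, batch_size, and the sampling loop are my own sketch, not part of the original problem):

import numpy as np

# Hypothetical synthetic data: y ≈ 3x + 2 plus noise
rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, size=1000)
ys = 3.0 * xs + 2.0 + rng.normal(0, 1, size=1000)
full_data = np.column_stack((xs, ys))

m, b = 0.0, 0.0
batch_size = 32
for step in range(5000):
    # Draw a random mini-batch
    idx = rng.integers(0, full_data.shape[0], size=batch_size)
    batch = full_data[idx]
    m_grad, b_grad = getGradient(m, b, batch)
    m, b = calculateMomentum(m, b, m_grad, b_grad)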
Now the regression line seems to be calculated correctly. With plain SGD the final error is 59706304 and with momentum it is 56729062, but the difference could just come from the random mini-batch chosen at the moment the gradient is calculated.
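To make the comparison less sensitive to which mini-batch was drawn, one option is to evaluate the error on the full dataset after training. A minimal sketch, assuming full_data is the same hypothetical array as above:

def mean_squared_error(m, b, data):
    # MSE over all rows, ignoring rows with missing values
    x = data[:, 0]
    y = data[:, 1]
    mask = ~(np.isnan(x) | np.isnan(y))
    residuals = y[mask] - (m * x[mask] + b)
    return np.mean(residuals ** 2)

print(mean_squared_error(m, b, full_data))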

