I was trying to build a gradient descent function in Python. I used binary cross-entropy as the loss function and sigmoid as the activation function.
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def binary_crossentropy(y_pred, y):
    # clip predictions away from 0 and 1 so the logs stay finite
    epsilon = 1e-15
    y_pred_new = np.array([max(i, epsilon) for i in y_pred])
    y_pred_new = np.array([min(i, 1 - epsilon) for i in y_pred_new])
    return -np.mean(y*np.log(y_pred_new) + (1 - y)*np.log(1 - y_pred_new))

def gradient_descent(X, y, epochs=10, learning_rate=0.5):
    features = X.shape[0]
    w = np.ones(shape=(features, 1))
    bias = 0
    n = X.shape[1]
    for i in range(epochs):
        weighted_sum = w.T@X + bias
        y_pred = sigmoid(weighted_sum)
        loss = binary_crossentropy(y_pred, y)
        d_w = (1/n)*(X@(y_pred - y).T)
        d_bias = np.mean(y_pred - y)
        w = w - learning_rate*d_w
        bias = bias - learning_rate*d_bias
        print(f'Epoch:{i}, weights:{w}, bias:{bias}, loss:{loss}')
    return w, bias
```
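To convince myself the loss function itself behaves sensibly, I also ran a quick sanity check in isolation. This is just the `binary_crossentropy` above applied to a 1-D `y_pred` (where the clipping loop works elementwise) with made-up 0/1 labels, not my actual data:

```python
import numpy as np

def binary_crossentropy(y_pred, y):
    # same clipping as in the question, applied to a 1-D y_pred
    epsilon = 1e-15
    y_pred_new = np.array([max(i, epsilon) for i in y_pred])
    y_pred_new = np.array([min(i, 1 - epsilon) for i in y_pred_new])
    return -np.mean(y*np.log(y_pred_new) + (1 - y)*np.log(1 - y_pred_new))

y_true = np.array([1.0, 0.0])
y_pred = np.array([0.9, 0.1])
loss = binary_crossentropy(y_pred, y_true)
print(loss)  # about 0.1054, i.e. -log(0.9), since both predictions are 0.9 "confident"
```

So on a toy binary example the loss comes out as expected.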
So, as input I gave

```python
X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4],
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4
```

and then called `w, bias = gradient_descent(X, y, epochs=100)`. The output was `w = array([[-20.95], [-29.95]])`, `b = -55.50000017801383`, and `loss = 40.406546076763014`. The weights keep decreasing (becoming more negative), and the bias also keeps decreasing as I run more epochs. The expected output was `w = [[2], [-3]]` and `b = 0.4`.
I don't know what I am doing wrong. The loss is also not converging; it stays constant across all the epochs.
In response to a comment asking what the `weighted_sum` is that I pass into `sigmoid`: it is `weighted_sum = w.T@X + bias`. Sorry, I typed it incorrectly; I have now edited and corrected it.
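To make the shapes in that line concrete, here is a minimal sketch using placeholder data with the same layout as my `X` (2 features, 10 samples), not the actual values:

```python
import numpy as np

X = np.ones((2, 10))   # 2 features x 10 samples, same layout as the question's X
w = np.ones((2, 1))    # one weight per feature, as in gradient_descent
bias = 0
weighted_sum = w.T @ X + bias
print(weighted_sum.shape)  # (1, 10): one weighted sum per sample
```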