
I was trying to build a gradient descent function in Python, using binary cross-entropy as the loss function and sigmoid as the activation function.

import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

def binary_crossentropy(y_pred, y):
    epsilon = 1e-15
    # clip predictions away from 0 and 1 so the logs stay finite
    y_pred_new = np.array([max(i, epsilon) for i in y_pred])
    y_pred_new = np.array([min(i, 1-epsilon) for i in y_pred_new])
    return -np.mean(y*np.log(y_pred_new) + (1-y)*np.log(1-y_pred_new))
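(As an aside, the two list comprehensions above can be replaced by a single vectorized `np.clip` call; a minimal sketch with the same behavior:)

```python
import numpy as np

def binary_crossentropy(y_pred, y):
    epsilon = 1e-15
    # clip predictions into [epsilon, 1 - epsilon] in one vectorized step
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y*np.log(y_pred) + (1-y)*np.log(1-y_pred))
```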

def gradient_descent(X, y, epochs=10, learning_rate=0.5):
    features = X.shape[0]
    w = np.ones(shape=(features, 1))
    bias = 0
    n = X.shape[1]
    for i in range(epochs):
        weighted_sum = w.T@X + bias
        y_pred = sigmoid(weighted_sum)
        
        loss = binary_crossentropy(y_pred, y)
        
        d_w = (1/n)*(X@(y_pred-y).T)
        d_bias = np.mean(y_pred-y)
        
        w = w - learning_rate*d_w
        bias = bias - learning_rate*d_bias
        
        print(f'Epoch:{i}, weights:{w}, bias:{bias}, loss:{loss}')
    return w, bias

So, as input I gave

X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4], 
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4

and then called w, bias = gradient_descent(X, y, epochs=100). The output was w = array([[-20.95],[-29.95]]), bias = -55.50000017801383, and loss: 40.406546076763014. The weights keep decreasing (becoming more negative) and the bias also decreases with more epochs. The expected output was w = [[2],[-3]] and bias = 0.4.

I don't know what I am doing wrong. The loss is also not converging; it stays constant across all the epochs.
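(For reference, a quick check shows the targets produced by this formula are not confined to [0, 1], which matters when the model's output goes through a sigmoid:)

```python
import numpy as np

X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4],
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4

# several targets are negative, so sigmoid output in (0, 1) can never reach them
print(y.min(), y.max())  # min is -1.1, max is 0.4
```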

  • What is weighted_sum that you pass into sigmoid? Commented Mar 5, 2021 at 14:28
  • weighted_sum = w.T@X + bias. Sorry, I typed it incorrectly; I have now edited and corrected it. Commented Mar 5, 2021 at 15:39
1 Answer

Usually, binary cross-entropy loss is used for binary classification tasks. Your task, however, is linear regression, so I would prefer using mean squared error as the loss function. Here is my suggestion:

def gradient_descent(X, y, epochs=1000, learning_rate=0.5):
    w = np.ones((X.shape[0], 1))
    bias = 1
    n = X.shape[1]

    for i in range(epochs):
        y_pred = w.T @ X + bias  # linear model, no sigmoid

        mean_square_err = (1.0 / n) * np.sum(np.power((y - y_pred), 2))

        # gradients of the MSE with respect to the weights and bias
        d_w = (-2.0 / n) * (y - y_pred) @ X.T
        d_bias = (-2.0 / n) * np.sum(y - y_pred)

        w -= learning_rate * d_w.T
        bias -= learning_rate * d_bias

        print(f'Epoch:{i}, weights:{w}, bias:{bias}, loss:{mean_square_err}')

    return w, bias


X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4],
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4

w, bias = gradient_descent(X, y, epochs=5000, learning_rate=0.5)

print(f'w = {w}')
print(f'bias = {bias}')

Output:

w = [[ 1.99999999], [-2.99999999]]
bias = 0.40000000041096756
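(You can cross-check this against the closed-form least-squares solution; a sketch using `np.linalg.lstsq` with a column of ones appended for the bias:)

```python
import numpy as np

X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4],
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4

# design matrix: samples as rows, plus a ones column for the bias term
A = np.column_stack([X.T, np.ones(X.shape[1])])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # ≈ [2, -3, 0.4]
```

Since the targets were generated exactly by the linear model, the least-squares fit recovers the coefficients up to floating-point error, matching what gradient descent converges to.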