Neural Networks: Sigmoid Activation Function for continuous output variable

Question

Okay, so I am in the middle of Andrew Ng's machine learning course on Coursera and would like to adapt the neural network that was completed as part of assignment 4.

In particular, the neural network which I had completed correctly as part of the assignment was as follows:

Sigmoid activation function: g(z) = 1/(1+e^(-z))
10 output units, each of which can take the value 0 or 1
1 hidden layer
Back-propagation method used to minimize cost function
Cost function:

$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\left( (h_\theta(x^{(i)}))_k \right) + (1-y_k^{(i)}) \log\left( 1-(h_\theta(x^{(i)}))_k \right) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^{2}$

where L=number of layers, s_l = number of units in layer l, m = number of training examples, K = number of output units

Now I want to adjust the exercise so that there is a single continuous output unit that takes any value in [0, 1], and I am trying to work out what needs to change. So far I have:

Replaced the data with my own, i.e., such that the output is a continuous variable between 0 and 1
Updated references to the number of output units
Updated the cost function in the back-propagation algorithm to: $J = \frac{1}{2m}\sum_{i=1}^{m}\left( a_3^{(i)} - y^{(i)} \right)^2 + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^{2}$ where $a_3 = g(z_3)$ is the value of the output unit determined from forward propagation.

I am certain that something else must change, because gradient checking shows that the gradient determined by back-propagation and the numerical approximation no longer match. I did not change the sigmoid gradient; it is left at f(z)*(1-f(z)), where f(z) is the sigmoid function 1/(1+e^(-z)). Nor did I update the numerical approximation of the derivative, which is simply (J(theta+e) - J(theta-e))/(2e).

Can anyone advise on what other steps are required?

Coded in Matlab as follows:

% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);

% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;

% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );

% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

I have since realised that this question is similar to one asked by @Mikhail Erofeev on StackOverflow; however, in this case I want the continuous variable to lie between 0 and 1 and therefore use a sigmoid function.

Did you make it work for a continuous outcome? I made it run, but it stops around the 40th iteration and doesn't produce a good outcome. It would be great if you could share what you ended up with.
– cmelan, Commented Apr 9, 2015 at 4:58

lennon310 · Accepted Answer · 2013-12-18 22:50:25Z

2

First, your cost function should be:

J = 1/m * sum( (a3-y).^2 );

I think your Theta2_grad = (delta3'*a2)/m; should match the numerical approximation once delta3 is changed to delta3 = 1/2 * (a3 - y);.

Check this slide for more details.

EDIT: In case there is some minor discrepancy between our code, I have pasted mine below for your reference. It has already been compared against the numerical approximation using checkNNGradients(lambda); the relative difference is less than 1e-4 (which does not meet the 1e-11 requirement set by Dr. Andrew Ng, though).

function [J grad] = nnCostFunctionRegression(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

m = size(X, 1);   
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


X = [ones(m, 1) X];   
z1 = sigmoid(X * Theta1');
zs = z1;
z1 = [ones(m, 1) z1];
z2 = z1 * Theta2';
ht = sigmoid(z2);


y_recode = zeros(length(y),num_labels);
for i=1:length(y)
    y_recode(i,y(i))=1;
end    
y = y_recode;


regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');

delta_cap2 = delta_3' * z1; 
delta_cap1 = delta_2' * X;

Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));

Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));


grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
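As an independent sanity check (not the answerer's code, and written in Python rather than MATLAB so it runs without the course files), here is a minimal sketch of gradient checking on a toy network with one sigmoid output unit and the unregularized cost J = 1/(2m) * sum((a3 - y)^2). The network size, weights, and data are made up for illustration. It compares the central-difference estimate (J(theta+e) - J(theta-e))/(2e) with back-propagation for two choices of output delta: delta3 = a3 - y, and delta3 = (a3 - y) * a3 * (1 - a3), which includes the sigmoid gradient at the output layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(theta1, theta2, x):
    """Forward pass for one example; a bias unit is prepended to a1 and a2."""
    a1 = [1.0] + list(x)
    a2 = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]
    a3 = sigmoid(sum(w * a for w, a in zip(theta2, a2)))
    return a1, a2, a3


def cost(theta1, theta2, X, y):
    """Unregularized squared-error cost J = 1/(2m) * sum((a3 - y)^2)."""
    m = len(X)
    return sum((forward(theta1, theta2, x)[2] - t) ** 2 for x, t in zip(X, y)) / (2 * m)


def backprop(theta1, theta2, X, y, include_sigmoid_gradient):
    """Analytic gradient, flattened as [Theta1 rows..., Theta2]."""
    m = len(X)
    g1 = [[0.0] * len(row) for row in theta1]
    g2 = [0.0] * len(theta2)
    for x, t in zip(X, y):
        a1, a2, a3 = forward(theta1, theta2, x)
        delta3 = a3 - t
        if include_sigmoid_gradient:
            delta3 *= a3 * (1.0 - a3)  # g'(z3) = a3 * (1 - a3)
        for j in range(len(theta2)):
            g2[j] += delta3 * a2[j] / m
        for j, row in enumerate(theta1):
            h = a2[j + 1]  # activation of hidden unit j (index 0 is the bias)
            delta2 = delta3 * theta2[j + 1] * h * (1.0 - h)
            for i in range(len(row)):
                g1[j][i] += delta2 * a1[i] / m
    return [g for row in g1 for g in row] + g2


def numerical_gradient(theta1, theta2, X, y, eps=1e-4):
    """Central differences, perturbing one weight at a time in place."""
    grads = []
    for row in theta1 + [theta2]:
        for i in range(len(row)):
            old = row[i]
            row[i] = old + eps
            j_plus = cost(theta1, theta2, X, y)
            row[i] = old - eps
            j_minus = cost(theta1, theta2, X, y)
            row[i] = old
            grads.append((j_plus - j_minus) / (2 * eps))
    return grads


def max_gradient_gap(include_sigmoid_gradient):
    """Largest absolute gap between analytic and numerical gradients."""
    # Hypothetical toy problem: 2 inputs, 3 hidden units, 4 examples, targets in [0, 1].
    theta1 = [[math.sin(3 * j + i + 1) / 10 for i in range(3)] for j in range(3)]
    theta2 = [math.sin(k + 10) / 10 for k in range(4)]
    X = [[0.1, 0.7], [0.4, 0.2], [0.9, 0.5], [0.3, 0.8]]
    y = [0.9, 0.8, 0.95, 0.85]
    analytic = backprop(theta1, theta2, X, y, include_sigmoid_gradient)
    numeric = numerical_gradient(theta1, theta2, X, y)
    return max(abs(a - n) for a, n in zip(analytic, numeric))


if __name__ == "__main__":
    print("gap with delta3 = a3 - y:                   ", max_gradient_gap(False))
    print("gap with delta3 = (a3 - y) * a3 * (1 - a3): ", max_gradient_gap(True))
```

In this toy setting the gap is tiny only when the output delta carries the a3 * (1 - a3) factor, which suggests that a squared-error cost on a sigmoid output needs the sigmoid gradient at the output layer as well as at the hidden layer.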

edited Dec 18, 2013 at 22:50

answered Dec 18, 2013 at 5:48

lennon310


5 Comments

user1420372 Over a year ago

Thank you for your suggestions; I tried updating both delta3 and delta2 as you suggested, but the gradients still don't match.

lennon310 Over a year ago

@user1420372 your cost function should be a3-y rather than sigmoid(a3)-y, see my update in answer.

user1420372 Over a year ago

Thanks! I had actually just noticed that - however the gradient is still incorrect - will edit code in question to fix.

lennon310 Over a year ago

I added my code in the answer for your reference. It matches the numerical approximation, although the relative difference is 1e-4, which is larger than 1e-11.

user1420372 Over a year ago

Thanks a lot for your help. I have also realised that by leaving the cost function as it was in the exercise, i.e., with the logs, the gradients match well and the output looks roughly as expected (preliminary - still testing).
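A short derivation may explain why the log cost works with delta3 = a3 - y while the squared-error cost does not. It is a sketch for a single output unit and a single training example, ignoring regularization, with $a = g(z)$ the sigmoid output and $g'(z) = a(1-a)$; the symbols $C_{\log}$ and $C_{\mathrm{sq}}$ are introduced here for illustration only.

```latex
% Log (cross-entropy) cost: the sigmoid derivative cancels.
C_{\log} = -\left[ y \log a + (1-y) \log(1-a) \right]
\quad\Rightarrow\quad
\frac{\partial C_{\log}}{\partial a} = \frac{a - y}{a(1-a)}
\quad\Rightarrow\quad
\frac{\partial C_{\log}}{\partial z} = \frac{a - y}{a(1-a)} \cdot a(1-a) = a - y

% Squared-error cost: the sigmoid derivative remains.
C_{\mathrm{sq}} = \tfrac{1}{2}(a - y)^2
\quad\Rightarrow\quad
\frac{\partial C_{\mathrm{sq}}}{\partial z} = (a - y)\, a(1-a)
```

So with the log cost the output delta is a3 - y, whereas with the squared-error cost and a sigmoid output it would be (a3 - y) .* a3 .* (1 - a3). The log cost also remains well defined for targets anywhere in [0, 1], not only for 0 and 1.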

Žarko Milošević · Accepted Answer · 2017-08-07 19:07:50Z

0

If you want a continuous output, try not to apply the sigmoid activation at the output layer when computing the predicted value.

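a1 = [ones(m, 1) X];          % input layer with bias unit
a2 = sigmoid(a1 * Theta1');   % hidden layer activations
a2 = [ones(m, 1) a2];         % add bias unit to the hidden layer
a3 = a2 * Theta2';            % linear output: no sigmoid applied
ht = a3;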

Normalize the inputs before passing them to nnCostFunction. Everything else remains the same.
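The normalization step is not spelled out above. One common reading is per-feature scaling to zero mean and unit standard deviation, which could be sketched as follows. This is written in Python rather than MATLAB, the function name feature_normalize and the sample data are hypothetical, and using the sample standard deviation is an assumption.

```python
import statistics


def feature_normalize(X):
    """Scale each column of X to zero mean and unit standard deviation.

    X is a list of rows (at least two). Returns (X_norm, mu, sigma) so the
    same mu and sigma can be reused on new inputs at prediction time.
    """
    columns = list(zip(*X))
    mu = [statistics.mean(col) for col in columns]
    # Sample standard deviation (assumption); a constant column falls back
    # to 1.0 so that the division below never divides by zero.
    sigma = [statistics.stdev(col) or 1.0 for col in columns]
    X_norm = [[(v - m) / s for v, m, s in zip(row, mu, sigma)] for row in X]
    return X_norm, mu, sigma


if __name__ == "__main__":
    # Made-up inputs with two features on very different scales.
    X = [[2104, 3], [1600, 3], [2400, 4], [1416, 2]]
    X_norm, mu, sigma = feature_normalize(X)
    print(mu, sigma)
    print(X_norm)
```

Any new input would need to be scaled with the stored mu and sigma from the training set rather than with statistics recomputed on the new data.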

edited Aug 7, 2017 at 19:07

answered Aug 6, 2017 at 21:40

Žarko Milošević
