
I am new to TensorFlow and am trying to train the following two-layer network. It does not seem to be working: the cross-entropy does not decrease from one iteration to the next. I think I messed up connecting the hidden layer to the output layer. Please help if you can spot the problem.

import tensorflow as tf
from scipy.io import loadmat
import numpy as np
import sys

x = loadmat('../mnist_data/ex4data1.mat')
X = x['X']

# one hot conversion
y_temp = x['y']
y_temp = np.reshape(y_temp, (len(y_temp),))
y = np.zeros((len(y_temp),10))
y[np.arange(len(y_temp)), y_temp-1] = 1.



input_size = 400
hidden1_size = 25
output_size = 10
num_iters = 50
reg_alpha = 0.05


x = tf.placeholder(tf.float32, [None, input_size], name='data')
W1 = tf.Variable(tf.zeros([hidden1_size, input_size], tf.float32, name='weights_1st_layer'))
b1 = tf.Variable(tf.zeros([hidden1_size], tf.float32), name='bias_layer_1')
W2 = tf.Variable(tf.zeros([output_size, hidden1_size], tf.float32, name='weights_2nd_layer'))
b2 = tf.Variable(tf.zeros([output_size], tf.float32), name='bias_layer_2')


# forward pass: ReLU hidden layer, then a linear output layer (logits)
hidden_op = tf.nn.relu(tf.add(tf.matmul(x, W1, transpose_b=True), b1))
output_op = tf.matmul(hidden_op, W2, transpose_b=True) + b2
pred = tf.nn.softmax(output_op) 

y_ = tf.placeholder(tf.float32, [None, 10], name='actual_labels')


cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    labels=y_, logits=output_op))
train_step = tf.train.GradientDescentOptimizer(reg_alpha).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

for i in range(num_iters):
    print('training..', i)
    print(sess.run([train_step, cross_entropy], feed_dict={x: X, y_: y}))

corr_pred = tf.equal(tf.argmax(pred, axis=1), tf.argmax(y_, axis=1))
acc = tf.reduce_mean(tf.cast(corr_pred, tf.float32))
print (sess.run(acc, feed_dict={x:X, y_:y}))
sess.close()
    I'm not familiar with TF, but it looks like you're initializing all of the weights and biases to zero. Is that correct? If so, it's a huge problem, as it prevents breaking symmetry. Commented Sep 4, 2017 at 7:02

1 Answer


Try initialising your weights with small random values instead of zeros.

So instead of:

W1 = tf.Variable(tf.zeros([hidden1_size, input_size], tf.float32, name='weights_1st_layer'))
W2 = tf.Variable(tf.zeros([output_size, hidden1_size], tf.float32, name='weights_2nd_layer'))

use:

W1 = tf.Variable(tf.truncated_normal([hidden1_size, input_size], stddev=0.1), name='weights_1st_layer')
W2 = tf.Variable(tf.truncated_normal([output_size, hidden1_size], stddev=0.1), name='weights_2nd_layer')

Check this nice summary of why initialising all the weights to zero prevents the network from training.
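
To see the problem concretely, here is a minimal sketch of the symmetry issue (plain NumPy rather than the question's TensorFlow graph; the shapes are made up for illustration). With all-zero weights the ReLU pre-activations are all zero, so the ReLU gradient is zero everywhere and the first layer never receives any update at all:

import numpy as np

X = np.random.rand(5, 4)       # 5 samples, 4 features
W1 = np.zeros((3, 4))          # 3 hidden units, all rows identical
z = X @ W1.T                   # pre-activations: all zero
h = np.maximum(z, 0)           # ReLU activations: all zero too

g = np.random.rand(5, 3)       # stand-in for the upstream gradient
g = g * (z > 0)                # ReLU backward pass zeroes everything out
dW1 = g.T @ X                  # gradient for W1
print(np.all(dW1 == 0))        # True: W1 can never move

In the question's network the hidden activations are also all zero, so the gradient for W2 vanishes as well and only b2 ever updates, which is consistent with the cross-entropy staying flat.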


1 Comment

Yes, that works. Initially I was following an implementation of a single-layer network, where the weights were initialised to 0. I think the final layer feeding the output can have zero initialisation, but the layers before it must have non-zero initialisation for gradient descent to work... Thanks
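
For what it's worth, a quick sketch (mine, not from the thread) supporting that claim: if only the output weights start at zero, the hidden activations are non-zero, so the output layer receives a non-zero, non-symmetric gradient on the very first step and training can proceed:

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 4))
W1 = rng.normal(0.0, 0.1, (3, 4))   # hidden layer: small random values
W2 = np.zeros((2, 3))               # output layer: zeros

h = np.maximum(X @ W1.T, 0)         # non-zero hidden activations
p = np.full((5, 2), 0.5)            # softmax of the all-zero logits
y = np.eye(2)[rng.integers(0, 2, 5)]
dW2 = (p - y).T @ h                 # cross-entropy gradient for W2
print(np.any(dW2 != 0))             # True: W2 escapes zero immediately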
