Differences between numpy.random.rand vs numpy.random.randn in Python

Question

What are the differences between numpy.random.rand and numpy.random.randn?

From the documentation, I know the only difference between them is the probability distribution each number is drawn from; the overall structure (dimensions) and the data type used (float) are the same. I am having a hard time debugging a neural network because of this.

Specifically, I am trying to re-implement the neural network provided in the book Neural Networks and Deep Learning by Michael Nielsen. The original code can be found here. My implementation is the same as the original, except that I defined and initialized the weights and biases with numpy.random.rand in the init function, rather than with numpy.random.randn as in the original.

However, my code that uses random.rand to initialize weights and biases does not work. The network won't learn and the weights and biases will not change.

What difference between the two random functions causes this weirdness?

The former draws from a uniform distribution and the latter from a normal distribution. "Why do initial weights drawn from a normal distribution work better in deep learning" is more suited for Cross Validated though. This is not related to numpy or a deep learning framework at all.
– user2285236, Commented Nov 11, 2017 at 16:46
@ayhan thanks for the comment. I thought this was a numpy problem rather than an initial-weights problem, because even if I initialize the weights as zeros, I get worse performance than when initializing with random.randn, but the network still learns. Whereas if I use random.rand, the network just keeps repeating the initial result over and over and doesn't learn anything.
– Phúc Lê, Commented Nov 11, 2017 at 16:59

Dylan · Accepted Answer · 2021-07-28 23:32:13Z

146

First, as you can see from the documentation, numpy.random.randn generates samples from the standard normal distribution, while numpy.random.rand generates samples from a uniform distribution (in the range [0, 1)).

Second, why did the uniform distribution not work? The main reason is the activation function, especially in your case where you use the sigmoid function. The sigmoid is an S-shaped curve: it is steepest at an input of 0 and flattens out toward 0 and 1 as the input moves away from 0 in either direction.

So you can see that if the input is far from 0, the slope of the function drops off quickly, and as a result you get a tiny gradient and a tiny weight update. If you have many layers, those gradients get multiplied many times in the backward pass, so even "proper" gradients become small after the multiplications and stop having any influence. So if many of your weights push the input into those regions, your network is hardly trainable. That is what numpy.random.rand does here: every weight and bias is positive (0.5 on average), so when the inputs are also non-negative (pixel intensities, for example), the weighted sum over many inputs is large and positive and the sigmoid saturates. That's why it is usual practice to initialize network variables around zero. This ensures that you get reasonable gradients (the sigmoid's slope peaks at 0.25, at an input of 0) to train your net.
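To make the saturation argument concrete, here is a minimal sketch that compares the average sigmoid slope of one layer under the two initializations. The layer sizes (784 inputs, 100 units), the random non-negative input vector, and the helper names are assumptions made for illustration; they are not taken from the book's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid; its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def mean_slope(weights, biases, x):
    # Average sigmoid slope over all units of one layer for input x.
    z = weights @ x + biases
    return float(np.mean(sigmoid_prime(z)))

np.random.seed(0)

# Hypothetical layer sizes and a made-up non-negative input vector.
n_in, n_out = 784, 100
x = np.random.rand(n_in, 1)

# rand: all weights and biases lie in [0, 1), so z is large and positive.
slope_rand = mean_slope(np.random.rand(n_out, n_in),
                        np.random.rand(n_out, 1), x)

# randn: weights and biases are centered on zero, so z is centered on zero.
slope_randn = mean_slope(np.random.randn(n_out, n_in),
                         np.random.randn(n_out, 1), x)

print("mean sigmoid slope with rand :", slope_rand)
print("mean sigmoid slope with randn:", slope_randn)
```

With rand the printed slope should be effectively zero, which matches the "weights never change" symptom in the question. With randn at least some units keep a usable slope, although a layer this wide still saturates many of them.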

However, a uniform distribution is not completely undesirable; you just need to make the range smaller and centered on zero. One good practice is Xavier initialization. In this approach you can initialize your weights with either:

Normal distribution, with mean 0 and standard deviation sqrt(2. / (in + out)) (that is, variance 2. / (in + out)), where in is the number of inputs to the neurons and out is the number of outputs.
Uniform distribution in the range [-sqrt(6. / (in + out)), +sqrt(6. / (in + out))].
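Both variants can be built from the same two NumPy functions by scaling and shifting their output. The following is only a sketch: the function names xavier_normal and xavier_uniform and the layer sizes are hypothetical, chosen for this example.

```python
import numpy as np

def xavier_normal(n_in, n_out):
    # Hypothetical helper: zero-mean normal weights with
    # standard deviation sqrt(2 / (n_in + n_out)).
    std = np.sqrt(2.0 / (n_in + n_out))
    return std * np.random.randn(n_out, n_in)

def xavier_uniform(n_in, n_out):
    # Hypothetical helper: uniform weights in [-limit, limit).
    # rand() returns values in [0, 1), so 2 * rand() - 1 lies in [-1, 1).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return (2.0 * np.random.rand(n_out, n_in) - 1.0) * limit

np.random.seed(0)
n_in, n_out = 784, 30  # made-up layer sizes

w_normal = xavier_normal(n_in, n_out)
w_uniform = xavier_uniform(n_in, n_out)

print("normal : mean=%.4f std=%.4f" % (w_normal.mean(), w_normal.std()))
print("uniform: min=%.4f max=%.4f" % (w_uniform.min(), w_uniform.max()))
```

The two variants have the same variance, 2 / (in + out), because a uniform distribution on [-a, a] has variance a² / 3.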

edited Jul 28, 2021 at 23:32 by Dylan
answered Nov 11, 2017 at 17:55 by asakryukin


4 Comments

Phúc Lê Over a year ago

Thank you. I know vanishing gradients are a thing, but I never thought that just switching from random.randn to random.rand could render the network totally useless right from the beginning.

Sam Hammamy Over a year ago

I think that's why people stopped using the sigmoid as an activation function. That book is a great intro, by the way! But I wish he had coded up the ReLU instead. Still, since early neural networks used the sigmoid, it does make sense.

titus Over a year ago

Did the same experiment with normalized input, 2-3 FCs, ReLU and rand init: same behaviour, doesn't converge.

Mounesh Over a year ago

How will the mean for randn be zero? I tried it, but I didn't get a mean of 0.

YaOzI · Accepted Answer · 2019-07-01 03:56:53Z

69

np.random.rand is for Uniform distribution (in the half-open interval [0.0, 1.0))
np.random.randn is for Standard Normal (aka. Gaussian) distribution (mean 0 and variance 1)
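A quick numeric check of these properties, as a sketch with an arbitrary seed and sample size. Note that the mean and variance describe the distribution: the sample mean of randn output will be close to 0 but not exactly 0, and it gets closer as the sample grows.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only for repeatability
n = 100_000         # arbitrary sample size

u = np.random.rand(n)    # uniform on [0, 1)
g = np.random.randn(n)   # standard normal

# Theoretical values: uniform mean 0.5 and variance 1/12; normal mean 0 and variance 1.
print("rand : min=%.3f max=%.3f mean=%.3f var=%.3f"
      % (u.min(), u.max(), u.mean(), u.var()))
print("randn: min=%.3f max=%.3f mean=%.3f var=%.3f"
      % (g.min(), g.max(), g.mean(), g.var()))
```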

You can visually explore the differences between these two very easily:

import numpy as np
import matplotlib.pyplot as plt

sample_size = 100000
uniform = np.random.rand(sample_size)
normal = np.random.randn(sample_size)

pdf, bins, patches = plt.hist(uniform, bins=20, range=(0, 1), density=True)
plt.title('rand: uniform')
plt.show()

pdf, bins, patches = plt.hist(normal, bins=20, range=(-4, 4), density=True)
plt.title('randn: normal')
plt.show()

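Which produces two histograms: a roughly flat one for rand, covering [0, 1), and a bell-shaped one for randn, centered at 0.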

answered Jul 1, 2019 at 3:56 by YaOzI


Perry · Accepted Answer · 2020-06-17 20:49:02Z

-2

1) numpy.random.rand generates samples from a uniform distribution (in the range [0, 1))

2) numpy.random.randn generates samples from the standard normal distribution

edited Jun 17, 2020 at 20:49 by Perry
answered Jun 17, 2020 at 18:15 by Prasheek07

1 Comment

Wai Ha Lee Over a year ago

This doesn't add anything that wasn't said three years ago.
