
Suppose I have the following two arrays of means and standard deviations:

import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

Then:

a = np.random.normal(mu, sigma)

In [1]: a
Out[1]: array([1715.6903716 , 3028.54168667, 4731.34048645, 933.18903575])

However, if I ask for 100 draws for each element of mu and sigma:

a = np.random.normal(mu, sigma, 100)

Traceback (most recent call last):
  File "<ipython-input-417-4aadd7d15875>", line 1, in <module>
    a = np.random.normal(mu, sigma, 100)
  File "mtrand.pyx", line 1652, in mtrand.RandomState.normal
  File "mtrand.pyx", line 265, in mtrand.cont2_array
ValueError: shape mismatch: objects cannot be broadcast to a single shape
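The error comes from NumPy broadcasting: `size` sets the output shape, and `mu` and `sigma` must broadcast to exactly that shape. With `size=100` the output shape is `(100,)`, whose trailing axis (100) cannot be matched with the `(4,)` parameter arrays. A minimal sketch that checks this with `np.broadcast_shapes` (available in NumPy 1.20 and later, an assumption about your version); the helper `fits` is hypothetical, written only for illustration:

```python
import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

def fits(param_shape, size):
    """Hypothetical helper: True if parameters of param_shape broadcast to the output shape size."""
    try:
        return np.broadcast_shapes(param_shape, size) == tuple(size)
    except ValueError:
        return False

# size=100 means an output of shape (100,): trailing axis 100 vs 4, so it fails
print(fits(mu.shape, (100,)))
# The tuple attempt gives a 4-D output whose trailing axis is still 100
print(fits(mu.shape, (100, 100, 100, 100)))
# Putting the 4 parameters on the trailing axis works
print(fits(mu.shape, (100, 4)))
```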

I have also tried passing a tuple as the size:

s = (100, 100, 100, 100)
a = np.random.normal(mu, sigma, s)

What am I missing?

3 Answers


As far as I know, you can't pass a plain integer as the size parameter when the mean and std are given as lists/vectors, because size must be a shape the parameter arrays can broadcast to. Instead, you can iterate over each pair and then concatenate:

np.concatenate(
   [np.random.normal(m, s, 100) for m, s in zip(mu, sigma)]
) 

This gives you a (400,) array. If you want a (4, 100) array instead, call np.array instead of np.concatenate.
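If the loop is only there to get one row per (mu, sigma) pair, a single call can produce the same (4, 100) layout by turning the parameters into column vectors so they broadcast against `size=(4, 100)`. A sketch assuming the question's `mu` and `sigma`; the names `n`, `rows` and `flat` are illustrative:

```python
import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])
n = 100  # draws per (mu, sigma) pair

# mu[:, None] has shape (4, 1), which broadcasts to the (4, 100) output,
# so row i is drawn from a normal with mean mu[i] and std sigma[i]
rows = np.random.normal(mu[:, None], sigma[:, None], size=(len(mu), n))
print(rows.shape)

# Flattening row by row gives the same (400,) ordering as np.concatenate
flat = rows.ravel()
print(flat.shape)
```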


4 Comments

Thank you. This was my guess too, as the documentation is not clear on this. I was hoping I could avoid iterating with a for loop, though.
@user177324 Well, you can "avoid using a for loop", yes: np.array(list(map(np.random.normal, mu, sigma, [100] * len(mu)))). But if you want to know how to avoid calling the function more than once, I think it may not be possible.
Thank you, this is indeed helpful. I am just a bit worried that if I have to do this 10000 times, a for loop would be considerably slower.
@user177324 Yes, if you want to generate 1 million random numbers, that would indeed be slow with a loop!

If you want to make only one call, the normal distribution is easy enough to shift and rescale after the fact. (I'm making up a 10000-long vector of mu and sigma from your example here):

mu = np.random.choice([2000., 3000., 5000., 1000.], 10000)               
sigma = np.random.choice([250., 152., 397., 180.], 10000)

a = np.random.normal(size=(10000, 100)) * sigma[:,None] + mu[:,None]
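The shift-and-rescale works because an affine transform of a normal variable is again normal. For a standard normal draw Z, scaling by sigma and adding mu gives the target mean and variance:

```latex
Z \sim \mathcal{N}(0, 1)
\;\Longrightarrow\;
X = \sigma Z + \mu, \qquad
\mathbb{E}[X] = \sigma\,\mathbb{E}[Z] + \mu = \mu, \qquad
\operatorname{Var}[X] = \sigma^{2}\operatorname{Var}[Z] = \sigma^{2},
\qquad\text{hence } X \sim \mathcal{N}(\mu, \sigma^{2}).
```

In the code, `sigma[:,None]` and `mu[:,None]` apply this transform row by row, so row i of `a` is drawn from a normal with mean `mu[i]` and standard deviation `sigma[i]`.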

This works fine. Whether speed matters is up to you; on my system, the following loop-based version is only about 50% slower:

a = np.array([np.random.normal(m, s, 100) for m,s in zip(mu, sigma)])
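To check the speed difference on your own machine, here is a rough timing sketch using the standard-library `timeit` module. The sizes mirror the answer's made-up 10000-long vectors; absolute timings depend on the machine and NumPy version, so no numbers are claimed here:

```python
import timeit
import numpy as np

n_pairs, n_draws = 10000, 100
mu = np.random.choice([2000., 3000., 5000., 1000.], n_pairs)
sigma = np.random.choice([250., 152., 397., 180.], n_pairs)

def vectorized():
    # One call: standard normals, rescaled and shifted row by row
    return np.random.normal(size=(n_pairs, n_draws)) * sigma[:, None] + mu[:, None]

def looped():
    # One call per (mu, sigma) pair
    return np.array([np.random.normal(m, s, n_draws) for m, s in zip(mu, sigma)])

t_vec = timeit.timeit(vectorized, number=5)
t_loop = timeit.timeit(looped, number=5)
print(f"vectorized: {t_vec:.3f} s, loop: {t_loop:.3f} s (5 runs each)")
```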

1 Comment

This is an excellent answer! Add some information on why this works (mathematically), and it's a perfect answer.

This is an old question, but I ran into the same issue recently and the documentation is still unclear, so this answer may be useful to others.

The point is that to draw n_sample samples from each of n_param (uncorrelated) normal distributions, the size argument must be the tuple (n_sample, n_param), so that mu and sigma broadcast along the last axis. Back to your example:

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

n_sample = 10
n_param = len(mu)

np.random.normal(mu, sigma, (n_sample, n_param))
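Because the parameter arrays line up with the last axis of size, column j of the result is drawn with `mu[j]` and `sigma[j]`. A quick empirical sketch with a much larger sample, using the newer `np.random.default_rng` Generator API (the seed and sample count are arbitrary choices for illustration):

```python
import numpy as np

mu = np.array([2000, 3000, 5000, 1000])
sigma = np.array([250, 152, 397, 180])

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
n_sample = 100_000

# Shape (n_sample, n_param): parameters broadcast along the trailing axis
draws = rng.normal(mu, sigma, size=(n_sample, len(mu)))

# Column-wise sample statistics should be close to mu and sigma
print(draws.mean(axis=0))
print(draws.std(axis=0))

# Transpose if you prefer one row per (mu, sigma) pair: shape (4, n_sample)
per_pair = draws.T
```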

which returns something like the following (the values are random and differ on each run):

array([[2048.27840802, 2997.96810385, 4388.76381537,  834.58578664],
       [2284.62302217, 3057.37011582, 5141.42601472,  757.21437687],
       [1933.16814182, 3060.13736788, 5431.56812414,  949.80295487],
       [2444.69699622, 3049.32584965, 4850.82175943,  772.26041345],
       [2129.87928253, 2976.20614441, 5140.33783836, 1017.96741881],
       [1906.47137372, 2829.44037933, 4894.20964032, 1245.29240452],
       [2031.94886175, 2693.19106648, 5385.33674047,  849.72485587],
       [2034.22639971, 3017.86916011, 5050.08920701, 1198.48286148],
       [2278.8297283 , 3036.31308636, 5043.93694099,  988.87438521],
       [1760.04486593, 2875.0750094 , 4615.1775128 ,  946.76458665]])

