Python Numpy Random Numbers - inconsistent?

Question

I am trying to generate log-normally distributed random numbers in Python (for a later Monte Carlo simulation), and the results are quite inconsistent once the parameters get a bit larger.

Below I generate a series of log-normals in two ways: from normals (then applying exp), and directly with the log-normal generator. The resulting means are tolerable, but the variances are far off. The same holds for mu = 4, 5, ...

If you re-run the code below a couple of times, the results come back quite different.

Code:

import numpy as np

mu = 10
n = int(1e7)  # size must be an integer

# Log-normals built from normals, then exponentiated
tmp1 = np.random.normal(loc=-mu, scale=np.sqrt(mu*2), size=n)
tmp1 = np.exp(tmp1)
print(tmp1.mean(), tmp1.var())

# Log-normals drawn directly
tmp2 = np.random.lognormal(mean=-mu, sigma=np.sqrt(mu*2), size=n)
print(tmp2.mean(), tmp2.var())

print('True Mean:', np.exp(0), 'True Var:', np.exp(mu*2) - 1)

Any advice on how to fix this? I also tried this on Wakari.io and see the same behavior there.

Update: I've taken the 'True' Mean and Variance formula from Wikipedia: https://en.wikipedia.org/wiki/Log-normal_distribution
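For reference, the 'True' values can be reproduced by evaluating the standard log-normal moment formulas for the parameters used above. This is only a minimal sketch: the helper name `lognormal_moments` is hypothetical and is not part of NumPy.

```python
import numpy as np

def lognormal_moments(mu_log, sigma_log):
    """Theoretical mean and variance of exp(Y) for Y ~ Normal(mu_log, sigma_log**2).

    Hypothetical helper for illustration only.
    """
    mean = np.exp(mu_log + sigma_log**2 / 2.0)
    var = (np.exp(sigma_log**2) - 1.0) * np.exp(2.0 * mu_log + sigma_log**2)
    return mean, var

mu = 10
# Same parameters as in the question: loc = -mu, scale = sqrt(2*mu)
true_mean, true_var = lognormal_moments(-mu, np.sqrt(2 * mu))
print('True Mean:', true_mean, 'True Var:', true_var)
```

With loc = -mu and scale**2 = 2*mu the exponent of the mean cancels to zero, which is why the expressions reduce to exp(0) and exp(2*mu) - 1.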

Snapshots of results:

1)

0.798301881219 57161.0894726
1.32976988569 2651578.69947
True Mean: 1.0 True Var: 485165194.41

2)

1.20346203176 315782.004309
0.967106664211 408888.403175
True Mean: 1.0 True Var: 485165194.41

3) Last one with n=1e8 random numbers

1.17719369919 2821978.59163
0.913827160458 338931.343819
True Mean: 1.0 True Var: 485165194.41

Can you re-run the code for us a couple of times and post the results?
– SethMMorton, Commented Oct 22, 2013 at 17:30
This code doesn't run, because you never imported any of those functions from anywhere. You may have wanted from numpy import sqrt, exp, but that's just a guess.
– abarnert, Commented Oct 22, 2013 at 17:38
Beware, scale for numpy.random.normal is the standard deviation, not the variance.
– Jblasco, Commented Oct 22, 2013 at 17:40
Also, where did you get these algorithms from? -2*mu+mu*2 is 0, so I'm not sure what you're trying to calculate.
– abarnert, Commented Oct 22, 2013 at 17:41
@abarnert I bet that is where the imprecision of the variance came from...
– SethMMorton, Commented Oct 22, 2013 at 17:42

Robert Kern · Accepted Answer · 2013-10-22 18:23:34Z

6

Even with the large sample size that you have, with these parameters the estimated variance is going to change wildly from run to run. That's just the nature of the fat-tailed lognormal distribution. Try running np.exp(np.random.normal(...)).var() several times; you will see the same kind of swing as with np.random.lognormal(...).var().

In any case, np.random.lognormal() is just implemented as np.exp(np.random.normal()) (well, the C equivalent).
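A rough way to see both points is to repeat the experiment with a handful of seeds. The sketch below is illustrative only: the seeds and the reduced sample size are arbitrary assumptions, and it uses np.random.RandomState so that the two generators can be started from the same state.

```python
import numpy as np

mu = 10
sigma = np.sqrt(2 * mu)
n = 10**6  # smaller than the question's 1e7, just to keep the run short

variances = []
for seed in range(5):  # arbitrary seeds, used only for repeatability
    rng_a = np.random.RandomState(seed)
    rng_b = np.random.RandomState(seed)

    via_normal = np.exp(rng_a.normal(loc=-mu, scale=sigma, size=n))
    direct = rng_b.lognormal(mean=-mu, sigma=sigma, size=n)

    # Check whether both routes produce the same numbers from the same seed
    same = np.allclose(via_normal, direct)

    variances.append(via_normal.var())
    print(seed, via_normal.mean(), via_normal.var(), 'same sample:', same)

print('max/min ratio of the variance estimates:', max(variances) / min(variances))
```

The spread of the printed variances across seeds is the run-to-run swing described above; a larger n narrows it only slowly.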


3 Comments

senderle Over a year ago

Yes, it's the fat tail. If you scan through the values you'll see some wild outliers.

Vytautas Over a year ago

Ok, this does make sense. I was simply not expecting that such errors would appear already when the chosen mean is up to 10, 15.

Jblasco Over a year ago

Still, if the values posted above are right, the variances that Vytautas gets are a couple of orders of magnitude below what they should be. So the outliers are actually too few. I agree about the fat tail; I just think we are not getting enough of the high points (see the comment by @Craig J Copi below).

Jblasco · Answer · 2013-10-22 18:25:21Z

1

OK, since you have just built the sample, here is a check using the notation from Wikipedia (first section, mu and sigma) and your example:

import numpy as np
from numpy import log, exp, sqrt

mu = -10
scale = sqrt(2*10)   # scale is sigma (standard deviation), not variance
tmp1 = np.random.normal(loc=mu, scale=scale, size=int(1e8))  # size must be an integer
# Just checking
print(tmp1.mean(), tmp1.std())
# 10.0011028634 4.47048010775, perfectly accurate
tmp1_exp = exp(tmp1)    # Not sensible to use the same name for two samples
# WIKIPEDIA NOTATION!
m = tmp1_exp.mean()     # until proven wrong, this is a measure of the mean
v = tmp1_exp.var()      # again, until proven wrong, this is a measure of the variance
# Now, according to Wikipedia
print("This:", log(m**2/sqrt(v+m**2)), "should be similar to", mu)
# I get This:  13.9983309499 should be similar to 10
print("And this:", sqrt(log(1+v/m**2)), "should be similar to", scale)
# I get And this: 3.39421327037 should be similar to 4.472135955

So, even if the values are not exactly perfect, I wouldn't claim that they are completely wrong.
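The two checks in this answer can be put side by side in a short sketch: recovering mu and sigma by inverting the sample mean and variance of the exponentiated values, versus simply taking the mean and standard deviation of the logarithms. The seed and the smaller sample size are arbitrary assumptions for illustration, so the numbers will not match the ones quoted above.

```python
import numpy as np

mu, sigma = -10.0, np.sqrt(20.0)
rng = np.random.RandomState(0)  # arbitrary seed, for repeatability only
x = np.exp(rng.normal(loc=mu, scale=sigma, size=10**6))

# Route 1: invert the sample mean and variance of the log-normal values
m, v = x.mean(), x.var()
mu_from_moments = np.log(m**2 / np.sqrt(v + m**2))
sigma_from_moments = np.sqrt(np.log(1.0 + v / m**2))

# Route 2: estimate directly on the log scale
logs = np.log(x)
mu_from_logs, sigma_from_logs = logs.mean(), logs.std()

print('from moments:', mu_from_moments, sigma_from_moments)
print('from logs:   ', mu_from_logs, sigma_from_logs)
print('target:      ', mu, sigma)
```

The log-scale route depends only on the well-behaved normal sample, while the moment route inherits all the noise of the sample variance, so it is the one to distrust when the two disagree.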


2 Comments

Vytautas Over a year ago

They were not 'completely' wrong, but I wanted to use this for modelling and get reliable, 'convergent' results... This problem took me a while to find in the full code; I was sure that I had a 'math' bug somewhere...

Craig J Copi Over a year ago

@Vytautas I would think there is either a problem with the width of your distribution (your scale) or you are just going to need a lot more samples. For scale=sqrt(20) the "3-sigma" range for x in the log normal distribution spans more than 11 orders of magnitude! [That is, exp[6*scale]~10^{11.6}.] Thus I would expect to need more than 10^{11} values to properly sample the distribution.
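The arithmetic behind that estimate can be checked in a few lines. The second half of the sketch goes beyond the comment and is an added assumption-based illustration: for X = exp(Y) with Y normal, the integrand of E[X**2] peaks at Y = mu + 2*sigma**2, that is, 2*sigma standard deviations above the mean of Y, so the code also reports how rare draws that far out are.

```python
import math

scale = math.sqrt(20.0)  # sigma of the underlying normal, as in the question

# Width of the -3 sigma .. +3 sigma range of exp(Y), in powers of ten
decades = 6.0 * scale / math.log(10.0)
print('3-sigma range spans about %.2f orders of magnitude' % decades)

# Added illustration (not from the comment): where the second moment comes from
z_peak = 2.0 * scale                                # in standard deviations of Y
p_tail = 0.5 * math.erfc(z_peak / math.sqrt(2.0))   # P(Z > z_peak), Z standard normal
print('second-moment integrand peaks %.2f sigma above the mean' % z_peak)
print('P(Z > %.2f) is about %.1e' % (z_peak, p_tail))
```

If that reasoning holds, samples of 1e7 or 1e8 values almost never reach the region that carries most of the variance, which would be consistent with the estimates in the question falling far below the true value.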
