1

I am trying to generate log-normally distributed random numbers in python (for later MC simulation), and I find the results to be quite inconsistent when parameters are a bit larger.

Below I am generating a series of LogNormals from Normals (and then using Exp) and directly from LogNormals. The resulting means are bearable, but the variances - quite imprecise.. this also holds for mu = 4,5,...

If you re-run the below code a couple of times - the results come back quite different.

Code:

import numpy as np
mu = 10;
tmp1 = np.random.normal(loc=-mu, scale=np.sqrt(mu*2),size=1e7)
tmp1 = np.exp(tmp1)
print tmp1.mean(), tmp1.var()
tmp2 = np.random.lognormal(mean=-mu, sigma=np.sqrt(mu*2), size=1e7)
print tmp2.mean(), tmp2.var()
print 'True Mean:', np.exp(0), 'True Var:',(np.exp(mu*2)-1)

Any advice how to fix this? I've tried this also on Wakari.io - so the result is consistent there as well

Update: I've taken the 'True' Mean and Variance formula from Wikipedia: https://en.wikipedia.org/wiki/Log-normal_distribution

Snapshots of results: 1)

0.798301881219 57161.0894726
1.32976988569 2651578.69947
True Mean: 1.0 True Var: 485165194.41

2)

1.20346203176 315782.004309
0.967106664211 408888.403175
True Mean: 1.0 True Var: 485165194.41

3) Last one with n=1e8 random numbers

1.17719369919 2821978.59163
0.913827160458 338931.343819
True Mean: 1.0 True Var: 485165194.41
16
  • Can you re-run the code for us a couple of times and post the results? Commented Oct 22, 2013 at 17:30
  • This code doesn't run, because you never imported any of those functions from anywhere. You may have wanted from numpy import sqrt, exp, but that's just a guess. Commented Oct 22, 2013 at 17:38
  • Beware, scale for numpy.random.normal is the standard deviation, not the variance. Commented Oct 22, 2013 at 17:40
  • 5
    Also, where did you get these algorithms from? -2*mu+mu*2 is 0, so I'm not sure what you're trying to calculate. Commented Oct 22, 2013 at 17:41
  • 1
    @abarnert I bet that is where the imprecision of the variance came from... Commented Oct 22, 2013 at 17:42

2 Answers 2

6

Even with the large sample size that you have, with these parameters, the estimated variance is going to change wildly from run to run. That's just the nature of the fat-tailed lognormal distribution. Try running the np.exp(np.random.normal(...)).var() several times. You will see a similar swing of values as np.random.lognormal(...).var().

In any case, np.random.lognormal() is just implemented as np.exp(np.random.normal()) (well, the C equivalent).

Sign up to request clarification or add additional context in comments.

3 Comments

Yes, it's the fat tail. If you scan through the values you'll see some wild outliers.
Ok, this does make sense. I was simply not expecting that such errors would appear already when the chosen mean is up to 10, 15.
Still, if the values posted above are true, the values that Vytautas gets are a couple of orders of magnitude below what it should be. So, the outliers are actually too few. I do agree with the fat-tail, just that I think that we do not get enough of the high points (see comment by @Craig J Copi below).
1

Ok, as you have just built the sample, and using the notation in wikipedia (first section, mu and sigma) and the example given by you:

from numpy import log, exp, sqrt
import numpy as np
mu = -10
scale = sqrt(2*10)   # scale is sigma, not variance
tmp1 = np.random.normal(loc=mu, scale=scale, size=1e8)
# Just checking
print tmp1.mean(), tmp1.std()
# 10.0011028634 4.47048010775, perfectly accurate
tmp1_exp = exp(tmp1)    # Not sensible to use the same name for two samples
# WIKIPEDIA NOTATION!
m = tmp1_exp.mean()     # until proven wrong, this is a meassure of the mean
v = tmp1_exp.var()  # again, until proven wrong, this is sigma**2
#Now, according to wikipedia
print "This: ", log(m**2/sqrt(v+m**2)), "should be similar to", mu
# I get This:  13.9983309499 should be similar to 10
print "And this:", sqrt(log(1+v/m**2)), "should be similar to", scale
# I get And this: 3.39421327037 should be similar to 4.472135955

So, even if the values are not exactly perfect, I wouldn't claim that they are completely wrong.

2 Comments

They were not 'completely' wrong - but I did want to use this for modelling and have reliable, 'convergent' results... This problem took me a while to find in the full code - I was sure that I have some 'math' bug somewhere...
@Vytautas I would think there is either a problem with the width of your distribution (your scale) or you are just going to need a lot more samples. For scale=sqrt(20) the "3-sigma" range for x in the log normal distribution spans more than 11 orders of magnitude! [That is, exp[6*scale]~10^{11.6}.] Thus I would expect to need more than 10^{11} values to properly sample the distribution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.