8

I'm new to Python. While reading, please mention any other suggestions regarding ways to improve my Python code.

Question: How do I generate a 8xN dimensional array in Python containing random numbers? The constraint is that each column of this array must contain 8 draws without replacement from the integer set [1,8]. More specifically, when N = 10, I want something like this.

[[ 6.  2.  3.  4.  7.  5.  5.  7.  8.  4.]
 [ 1.  4.  5.  5.  4.  4.  8.  5.  7.  5.]
 [ 7.  3.  8.  8.  3.  8.  7.  3.  6.  7.]
 [ 3.  6.  7.  1.  5.  6.  2.  1.  5.  1.]
 [ 8.  1.  4.  3.  8.  2.  3.  4.  3.  3.]
 [ 5.  8.  1.  7.  1.  3.  6.  8.  1.  6.]
 [ 4.  5.  2.  6.  2.  1.  1.  6.  4.  2.]
 [ 2.  7.  6.  2.  6.  7.  4.  2.  2.  8.]]

To do this I use the following approach:

import numpy.random
import numpy
def rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 
    return M

In practice N will be ~1e7. The algorithm above is O(n) in time and it takes roughly .38 secs when N=1e3. The time therefore when N = 1e7 is ~1hr (i.e. 3800 secs). There has to be a much more efficient way.

Timing the function

from timeit import Timer 
t = Timer(lambda: rand_M(1000))
print(t.timeit(5))
0.3863314103162543
2
  • I would refer to this question, seems to be what you need stackoverflow.com/questions/2709818/… Commented Aug 12, 2015 at 3:53
  • @Ericwright thanks this is helpful thought I'm not sure it answers my question. My constraint is much different. Specifically, I have to make draw without replacement. This post is a good start though so thanks! Commented Aug 12, 2015 at 7:01

4 Answers 4

9

Create a random array of specified shape and then sort along the axis where you want to keep the limits, thus giving us a vectorized and very efficient solution. This would be based on this smart answer to MATLAB randomly permuting columns differently. Here's the implementation -

Sample run -

In [122]: N = 10

In [123]: np.argsort(np.random.rand(8,N),axis=0)+1
Out[123]: 
array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4],
       [8, 4, 3, 2, 2, 8, 5, 5, 6, 2],
       [1, 2, 4, 6, 5, 4, 4, 3, 4, 7],
       [5, 6, 2, 5, 8, 2, 7, 8, 5, 8],
       [2, 8, 6, 3, 4, 7, 1, 1, 2, 6],
       [6, 7, 7, 8, 6, 6, 3, 2, 7, 3],
       [4, 1, 1, 4, 3, 3, 8, 6, 8, 1],
       [3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64)

Runtime tests -

In [124]: def sortbased_rand8(N):
     ...:     return np.argsort(np.random.rand(8,N),axis=0)+1
     ...: 
     ...: def rand_M(N):
     ...:     M = np.zeros(shape = (8, N))
     ...:     for i in range (0, N):
     ...:         M[:, i] = np.random.choice(8, size = 8, replace = False) + 1 
     ...:     return M
     ...: 

In [125]: N = 5000

In [126]: %timeit sortbased_rand8(N)
100 loops, best of 3: 1.95 ms per loop

In [127]: %timeit rand_M(N)
1 loops, best of 3: 233 ms per loop

Thus, awaits a 120x speedup!

Sign up to request clarification or add additional context in comments.

1 Comment

Beware that there's a non-zero probability of collisions using this method: i.e. what if there are duplicate numbers in the output of np.random.rand? Also, to generate an MxN matrix like this has O(M log(M) N) time complexity, rather than the optimal O(M N). See en.wikipedia.org/wiki/Shuffling#Shuffling_algorithms
1

How about shuffling, that is to say, permuting?

import random
import numpy
from timeit import Timer 

def B_rand_M(N):
    a = numpy.arange(1,9)
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.permutation(a)
    return M

# your original implementation
def J_rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i in range (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1 
    return M

some timings:

def compare(N):
    for f in (J_rand_M, B_rand_M):
        t = Timer(lambda: f(N)).timeit(6)
        print 'time for %s(%s): %.6f' % (f.__name__, N, t)

for i in range(6):
    print 'N = 10^%s' % i
    compare(10**i)
    print

gives

N = 10^0
time for J_rand_M(1): 0.001199
time for B_rand_M(1): 0.000080

N = 10^1
time for J_rand_M(10): 0.001112
time for B_rand_M(10): 0.000335

N = 10^2
time for J_rand_M(100): 0.011118
time for B_rand_M(100): 0.003022

N = 10^3
time for J_rand_M(1000): 0.110887
time for B_rand_M(1000): 0.030528

N = 10^4
time for J_rand_M(10000): 1.100540
time for B_rand_M(10000): 0.304696

N = 10^5
time for J_rand_M(100000): 11.151576
time for B_rand_M(100000): 3.049474

3 Comments

thanks, however, on my computer for large N your approach is slower. I suspect this is due to the fact that you do not pre-allocation space in memory for the M. Additionally the M you return is not an array but a list. I do, however, like your permuations idea. Thanks!
@JacobH see edited answer, I added pre-allocation and made it return a numpy array - I'm on a different machine now but it still seems a bit faster than your code.
Thanks, this looks great!
0

Just a comment on your runtime analysis of the problem - my intuition is that O(n) is the best possible runtime you can possibly obtain when generating O(n) truly random numbers.

Have you tried actually running your code with n = 10 million? Your assumption that the runtime will scale by 1000 when the input grows by a factor of 1000 may not be true in practice, as there is usually a constant term when executing any program (loading libraries, etc.), which may be significant depending on the problem.

That being said, it looks like the question linked by Eric Wright does a very thorough job and can easily be adapted to fit your question.

1 Comment

yep your right, I will not be able to get a better algorithm than O(n) complexity. However, I'm however, looking for a faster approach
0

Use below code for your array generation

import numpy as np
N=1e7 # THe value you want to have
np.random.randint(1,high=8,size=(8,N))

Hope this helps, it will surely not going to take that much time.

2 Comments

this does not generate the desired result. I'll rewrite to improve clarity.
your approach does not work for me. Please see if above constraint in bold italics above.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.