Randomly grow values in a NumPy Array

Question

I have a program that takes some large NumPy arrays and, based on some outside data, grows them by adding one to randomly selected cells until the array's sum is equal to the outside data. A simplified and smaller version looks like:

import numpy as np
my_array = np.random.randint(0, 101, [100, 100])
## Just creating a sample version of the array, then getting its sum:
np.sum(my_array)
499097

So, supposing I want to grow the array until its sum is 1,000,000, and that I want to do so by repeatedly selecting a random cell and adding 1 to it until we hit that sum, I'm doing something like:

import random

diff = 1000000 - np.sum(my_array)
counter = 0
while counter < diff:
    row = random.randrange(0,99)
    col = random.randrange(0,99)
    coord = [row, col]
    my_array[coord] += 1
    counter += 1

Here row and col together pick a random cell in the array, and that cell is incremented by 1. The loop repeats until the number of increments equals the difference between the original array's sum and the target sum (1,000,000).

However, when I check the result after running this, the sum is always off. In this case, after running it with the same numbers as above:

np.sum(my_array)
99667203

I can't figure out what accounts for this massive difference. Is there a more Pythonic way to go about this?

grovesNL · Accepted Answer · 2015-07-11 02:42:47Z

1

my_array[coord] does not do what you expect. Because coord is a list, NumPy treats both row and col as row indices, so it selects two entire rows and adds 1 to every entry in them. Use my_array[row, col] instead.

You could simply write something like:

import random
# randrange excludes its upper bound, so use 100 to cover indices 0..99
for _ in range(1000000 - np.sum(my_array)):
    my_array[random.randrange(0, 100), random.randrange(0, 100)] += 1

(or xrange instead of range if using Python 2.x)
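To see the difference between the two index forms, here is a minimal sketch on a small zero-filled array. The arrays a and b and the indices 1 and 2 are only illustrative and are not taken from the question.

```python
import numpy as np

a = np.zeros((4, 4), dtype=int)

# A list index is advanced indexing along the first axis:
# this adds 1 to every entry of rows 1 and 2 (8 cells in total).
a[[1, 2]] += 1
list_sum = int(a.sum())

b = np.zeros((4, 4), dtype=int)

# A tuple index (b[1, 2] is shorthand for b[(1, 2)]) addresses one cell.
b[1, 2] += 1
tuple_sum = int(b.sum())

print(list_sum, tuple_sum)  # 8 1
```

The list form touches a whole row per index, which is why the sum in the question grows by roughly 200 per iteration instead of 1.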

edited Jul 11, 2015 at 2:42

answered Jul 10, 2015 at 19:50


1 Comment

grovesNL Over a year ago

@WarrenWeckesser: Thanks, corrected. I had actually written that in my example, but not in the first instance for some reason.

Michael Bird · Accepted Answer · 2015-07-10 19:46:36Z

0

Replace my_array[coord] with my_array[row][col]. Your method chose two random integers and added 1 to every entry in the rows corresponding to both integers.

Basically you had a minor misunderstanding of how numpy indexes arrays.

Edit: To make this clearer, the code posted chose two numbers, say 30 and 45, and added 1 to all 100 entries of row 30 and all 100 entries of row 45.

From this you would expect the total sum to be 100,679,697 = 200*(1,000,000 - 499,097) + 499,097.

However, when the random integers are identical (say, 45 and 45), only 1 is added to every entry of row 45, not 2, so in that case the sum only jumps by 100.
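The two cases described here can be checked directly. This is a small sketch on a zero-filled 100 x 100 array (not the asker's data), with the variable names chosen only for illustration.

```python
import numpy as np

a = np.zeros((100, 100), dtype=int)

# Two distinct row indices: 2 rows x 100 columns are incremented.
a[[30, 45]] += 1
distinct_jump = int(a.sum())

# The same row index twice: the row is incremented once, not twice,
# because the in-place add is buffered for repeated indices.
before = int(a.sum())
a[[45, 45]] += 1
repeated_jump = int(a.sum()) - before

# Upper bound on the final sum if every iteration hit two distinct rows.
upper_bound = 200 * (1_000_000 - 499_097) + 499_097

print(distinct_jump, repeated_jump, upper_bound)  # 200 100 100679697
```

Since some iterations draw the same index twice, the observed sum lands somewhat below that upper bound.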

edited Jul 10, 2015 at 19:46

answered Jul 10, 2015 at 19:38


1 Comment

Warren Weckesser Over a year ago

That should be my_array[row, col]. my_array[row][col] will work, but it is inefficient.

ali_m · Accepted Answer · 2015-07-11 01:13:50Z

The problem with your original approach is that you are indexing your array with a list, which NumPy interprets as a sequence of indices into the row dimension (advanced indexing), rather than as separate indices into the row and column dimensions. Try passing a tuple instead of a list:

coord = row, col
my_array[coord] += 1

A much faster approach would be to find the difference between the sum over the input array and the target value, then generate an array containing the same number of random indices into the array and increment them all in one go, thus avoiding looping in Python:

import numpy as np

def grow_to_target(A, target=1000000, inplace=False):

    if not inplace:
        A = A.copy()

    # how many times do we need to increment A?
    n = target - A.sum()

    # pick n random indices into the flattened array
    # (randint excludes its upper bound, so this covers 0 .. A.size - 1)
    idx = np.random.randint(0, A.size, n)

    # how many times did we sample each unique index?
    uidx, counts = np.unique(idx, return_counts=True)

    # increment the array counts times at each unique index
    A.flat[uidx] += counts

    return A

For example:

a = np.zeros((100, 100), dtype=int)

b = grow_to_target(a)
print(b.sum())
# 1000000

%timeit grow_to_target(a)
# 10 loops, best of 3: 91.5 ms per loop
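The same vectorized idea can also be sketched with np.bincount and the newer Generator API. This is a variation under stated assumptions, not the answer's own code: the function name grow_with_bincount, the rng parameter, and the negative-difference check are all hypothetical additions.

```python
import numpy as np

def grow_with_bincount(A, target, rng=None):
    """Return a copy of A with random cells incremented until its sum equals target."""
    # Hypothetical helper: rng defaults to a fresh, unseeded Generator.
    rng = np.random.default_rng() if rng is None else rng
    A = A.copy()

    # Number of single increments still needed.
    n = int(target - A.sum())
    if n < 0:
        raise ValueError("target is smaller than the current sum")

    # Draw n flat indices in [0, A.size); the upper bound is exclusive.
    idx = rng.integers(0, A.size, size=n)

    # Count how often each flat index was drawn, padded to A.size entries.
    counts = np.bincount(idx, minlength=A.size)

    # Add the per-cell counts back in the array's own shape and dtype.
    A += counts.reshape(A.shape).astype(A.dtype)
    return A

a = np.zeros((100, 100), dtype=int)
b = grow_with_bincount(a, 1_000_000, rng=np.random.default_rng(0))
print(b.sum())  # 1000000
```

np.bincount produces one count per cell directly, so no np.unique pass is needed. np.add.at(A, (rows, cols), 1) is another option that handles repeated indices correctly, though it is usually slower for large n.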
