
I have a program that takes some large NumPy arrays and, based on some outside data, grows them by adding one to randomly selected cells until the array's sum is equal to the outside data. A simplified and smaller version looks like:

import numpy as np
## Just creating a sample version of the array (random ints 0..100 inclusive), then getting its sum:
my_array = np.random.randint(0, 101, size=(100, 100))
np.sum(my_array)
# 499097

So, supposing I want to grow the array until its sum is 1,000,000, and that I want to do so by repeatedly selecting a random cell and adding 1 to it until we hit that sum, I'm doing something like:

import random

diff = 1000000 - np.sum(my_array)
counter = 0
while counter < diff:
    row = random.randrange(0, 100)  # random row index 0..99
    col = random.randrange(0, 100)  # random column index 0..99
    coord = [row, col]
    my_array[coord] += 1
    counter += 1

Here row and col together select a random cell in the array, and that cell is incremented by 1. The loop repeats until the number of increments equals the difference between the target sum (1,000,000) and the original array's sum.

However, when I check the result after running this, the sum is always far off. For example, after running it with the same numbers as above:

np.sum(my_array)
# 99667203

I can't figure out what accounts for this massive difference. Also, is there a more Pythonic way to go about this?

3 Answers


my_array[coord] does not do what you expect. Indexing with the list [row, col] selects two entire rows (row and col) and adds 1 to every entry in them. Use my_array[row, col] instead.
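A small sketch, using a hypothetical 3x3 array rather than the question's data, shows the difference between indexing with a list and indexing with a tuple:

```python
import numpy as np

a = np.zeros((3, 3), dtype=int)
row, col = 0, 2

# A list is a fancy index into the first axis: it selects whole rows 0 and 2.
a[[row, col]] += 1
print(a.sum())  # 6: both rows (3 entries each) were incremented

b = np.zeros((3, 3), dtype=int)
# a[row, col] passes a tuple, i.e. one index per axis: a single cell.
b[row, col] += 1
print(b.sum())  # 1
```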

More concisely, you could write:

for _ in range(1000000 - np.sum(my_array)):
    # a (row, col) tuple index selects one cell; randrange's upper bound is exclusive
    my_array[random.randrange(0, 100), random.randrange(0, 100)] += 1

(or xrange instead of range if using Python 2.x)


1 Comment

@WarrenWeckesser: Thanks, corrected. I had actually written that in my example, but not in the first instance for some reason.

Replace my_array[coord] with my_array[row][col]. Your method chose two random integers and added 1 to every entry in the rows corresponding to both integers.

Basically, you had a minor misunderstanding of how NumPy indexes arrays.

Edit: to make this clearer, the posted code chooses two numbers, say 30 and 45, and adds 1 to all 100 entries of row 30 and all 100 entries of row 45.

So each iteration adds 200 to the sum, and you would expect a total of 200*(1,000,000 - 499,097) + 499,097 = 100,679,697.

However, when the two random integers are identical (say, 45 and 45), 1 is added to each entry of row 45 only once, not twice, so in that iteration the sum grows by only 100. These collisions pull the actual total below the estimate.
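The effect of a repeated index can be checked directly. As a contrast, this sketch also uses `np.add.at`, NumPy's unbuffered in-place ufunc method, which is not part of the original answer:

```python
import numpy as np

a = np.zeros((100, 100), dtype=int)
# The same random integer drawn twice: the fancy index [45, 45] repeats row 45.
a[[45, 45]] += 1
print(a.sum())  # 100, not 200: the repeated row is incremented only once

# np.add.at performs unbuffered addition, so repeated indices accumulate.
b = np.zeros((100, 100), dtype=int)
np.add.at(b, [45, 45], 1)
print(b.sum())  # 200
```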

1 Comment

That should be my_array[row, col]. my_array[row][col] works, but it is less efficient: it performs two indexing operations, first creating a view of the row and then indexing into it.
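A quick check of why the chained form still modifies the array: `a[row]` returns a view, so the second indexing step writes back into `a`. The array below is a hypothetical example.

```python
import numpy as np

a = np.zeros((4, 4), dtype=int)
row, col = 1, 2

# a[row] is a view of that row, so incrementing through it updates a ...
a[row][col] += 1
print(np.shares_memory(a[row], a))  # True

# ... but a[row, col] reaches the same cell in a single indexing operation.
a[row, col] += 1
print(a[row, col])  # 2
```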

The problem with your original approach is that you are indexing your array with a list, which is interpreted as a sequence of indices into the row dimension, rather than as separate indices into the row/column dimensions (see here). Try passing a tuple instead of a list:

coord = row, col  # a tuple: one index per axis
my_array[coord] += 1

A much faster approach is to compute the difference between the input array's sum and the target value, generate that many random indices into the array, and increment them all in one go, thus avoiding a Python-level loop:

import numpy as np

def grow_to_target(A, target=1000000, inplace=False):

    if not inplace:
        A = A.copy()

    # how many times do we need to increment A?
    n = target - A.sum()
    if n < 0:
        raise ValueError("A already sums to more than target")

    # pick n random indices into the flattened array (upper bound exclusive)
    idx = np.random.randint(0, A.size, n)

    # how many times did we sample each unique index?
    uidx, counts = np.unique(idx, return_counts=True)

    # increment the array counts times at each unique index
    A.flat[uidx] += counts

    return A

For example:

a = np.zeros((100, 100), dtype=int)

b = grow_to_target(a)
print(b.sum())
# 1000000

%timeit grow_to_target(a)
# 10 loops, best of 3: 91.5 ms per loop
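As a further sketch (not from the original answer), `np.bincount` can replace the `np.unique`/`A.flat` step: it returns the draw count for every flat index at once, and that count array can be reshaped and added to the array. This version also uses NumPy's `Generator` API (`np.random.default_rng`, NumPy 1.17+); the function name `grow_to_target_bincount` is hypothetical.

```python
import numpy as np

def grow_to_target_bincount(A, target=1000000, rng=None):
    """Return a copy of A grown by random +1 increments until its sum is target.

    Hypothetical variant: counts the sampled flat indices with np.bincount
    instead of np.unique + A.flat.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(target - A.sum())
    if n < 0:
        raise ValueError("A already sums to more than target")
    # n uniformly random flat indices in [0, A.size)
    idx = rng.integers(0, A.size, size=n)
    # counts[i] = number of times flat index i was drawn (length A.size)
    counts = np.bincount(idx, minlength=A.size)
    return A + counts.reshape(A.shape)

a = np.zeros((100, 100), dtype=int)
b = grow_to_target_bincount(a, rng=np.random.default_rng(0))
print(b.sum())  # 1000000
```

Because `A + counts` builds a new array, the input is never modified, so no separate `inplace` flag is needed here.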
