I have a program that takes some large NumPy arrays and, based on some outside data, grows them by adding one to randomly selected cells until the array's sum matches a target value taken from that outside data. A simplified and smaller version looks like:
import numpy as np
import random
my_array = np.random.random_integers(0, 100, [100, 100])
## Just creating a sample version of the array, then getting its sum:
np.sum(my_array)
499097
So, supposing I want to grow the array until its sum is 1,000,000, and I want to do so by repeatedly selecting a random cell and adding 1 to it until that sum is reached, I'm doing something like:
diff = 1000000 - np.sum(my_array)
counter = 0
while counter < diff:
    row = random.randrange(0, 99)
    col = random.randrange(0, 99)
    coord = [row, col]
    my_array[coord] += 1
    counter += 1
Here row and col are meant to pick out a random cell in the array, and that cell is then grown by 1. The loop repeats until the number of times it has added 1 to a random cell equals the difference between the original array's sum and the target sum (1,000,000).
However, when I check the result after running this, the sum is always off. In this case, after running it with the same numbers as above:
np.sum(my_array)
99667203
I can't figure out what accounts for this massive difference. And is there a more pythonic way to go about this?
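For what it's worth, the loop-free version I was hoping might exist looks roughly like the sketch below. I'm assuming here that np.add.at accumulates at repeated indices (so picking the same cell twice adds 2) and that drawing all the row and column indices up front with np.random.randint is equivalent to picking them one at a time inside the loop; I haven't confirmed either, which is part of why I'm asking.

import numpy as np

target = 1000000
diff = target - np.sum(my_array)
# One random (row, col) pair for each unit that still needs to be added.
rows = np.random.randint(0, my_array.shape[0], size=diff)
cols = np.random.randint(0, my_array.shape[1], size=diff)
# Unbuffered in-place add: repeated (row, col) pairs each contribute their own +1.
np.add.at(my_array, (rows, cols), 1)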