I would like to create a NumPy array with a somewhat repetitive structure: a particular function (here, as an example, shuffle()) takes two numbers and returns an array (here of length 8, though it could be more). These arrays are then concatenated.
import numpy

def shuffle(a, b):
    return numpy.array([
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
    ])

pairs = [
    (0.1, 0.2),
    (3.14, 2.71),
    # ... many, without a particular pattern ...
    (0.707, 0.577)
]
out = numpy.concatenate([shuffle(*pair) for pair in pairs])
I suppose what happens here is that all subarrays of length 8 are independently created in memory, just to be copied over right away to form the larger array out. This gets needlessly inefficient when there are lots of pairs (a, b) or when shuffle is replaced by something that returns more data.
One way around this would be to hardcode out à la
out = numpy.array([
    [+0.1, +0.2],
    [-0.1, +0.2],
    # ...
    [-0.2, -0.1],
    [+3.14, +2.71],
    # ...
])
but that's obviously not desirable either.
In C, I'd perhaps use a macro parsed by the preprocessor.
Any hints on how to arrange the above code to avoid unnecessary copies?
See itertools.permutations.
If you allocate the output with np.empty(dims) and then fill it block-by-block, that would avoid the extra copy.
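The preallocate-and-fill idea could look roughly like the sketch below. It is only one way to arrange it: `shuffle_into`, `SIGNS`, and `BLOCK` are hypothetical names that do not appear in the question, and the sign-table trick assumes the specific pattern of the example shuffle(). Each call writes straight into a slice (a view) of `out`, so there is no final concatenation step.

```python
import numpy

# Hypothetical helpers, not part of the original question.
BLOCK = 8  # rows produced per (a, b) pair
SIGNS = numpy.array([[+1, +1], [-1, +1], [+1, -1], [-1, -1]])

def shuffle_into(block, a, b):
    """Write the 8 rows for (a, b) into `block`, an (8, 2) view of the output."""
    numpy.multiply(SIGNS, (a, b), out=block[:4])  # [+a, +b], [-a, +b], ...
    numpy.multiply(SIGNS, (b, a), out=block[4:])  # [+b, +a], [-b, +a], ...

pairs = [(0.1, 0.2), (3.14, 2.71), (0.707, 0.577)]

# Allocate the full result once, then fill it one block at a time.
out = numpy.empty((BLOCK * len(pairs), 2))
for i, (a, b) in enumerate(pairs):
    shuffle_into(out[i * BLOCK:(i + 1) * BLOCK], a, b)
```

Basic slicing returns views, so the writes land directly in `out`. For a different function in place of shuffle(), the same layout should work as long as that function can write into a given block instead of returning a new array.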