
For image processing, I need an array (the array module's array type, not a NumPy array) holding 2 million 32-bit words. If I use something like:

import array
tb = array.array('i', (0,) * 2000000)

it takes 126 msec. The array is large, and I don't even need it initialized. I don't know Python internals, but I assume the statement generates tons of malloc() (memory allocator) and free() (memory deallocator) calls.

Is there another way to create a very large Python array?

  • Why wouldn't a NumPy array work for your image processing? Using NumPy for images is certainly something that's been done: scikit-image.org/docs/dev/user_guide/numpy_images.html (commented Nov 19, 2021 at 21:23)
  • Is there a reason you wouldn't use NumPy? (commented Nov 19, 2021 at 21:26)
  • But bytearray(8_000_000) takes only 1 msec, and the same goes for bytes(8_000_000). That's a good argument for asking for a size argument when creating an array.array(). Thanks, everybody. (commented Nov 20, 2021 at 22:04)
  • @pts Thanks for your intelligent solution. It's about one hundred times faster (approx. 2 msec). (commented Nov 20, 2021 at 22:09)
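The bytearray(8_000_000) timing in the comment above points at a workaround that needs no new size argument: according to the array module documentation, when array.array() receives a bytes or bytearray initializer it passes it to frombytes(). A zero-filled bytes object of the right length can therefore go straight to the constructor. A minimal sketch, where zeroed_array is a hypothetical helper name and the byte count is derived from itemsize instead of assuming 4-byte items:

```python
import array


def zeroed_array(typecode: str, n: int) -> array.array:
    """Return an array.array of n zero items (zeroed_array is a hypothetical helper)."""
    # itemsize of an empty array with this typecode, e.g. 4 for 'i' on common platforms
    itemsize = array.array(typecode).itemsize
    # bytes(k) is k zero bytes; the constructor forwards a bytes initializer to frombytes()
    return array.array(typecode, bytes(n * itemsize))


tb = zeroed_array('i', 2_000_000)
print(len(tb))  # 2000000
```

Like the frombytes() answer, this still allocates one temporary bytes object of n * itemsize bytes, which is then copied into the array.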

2 Answers


This is much faster, because it doesn't create a long, temporary tuple:

import array
tb = array.array('i', (0,)) * 2000000
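This works because array.array supports sequence repetition with *, just like list: a one-element array is repeated n times without building a multi-million-element tuple first. A short sketch of that behaviour, which also shows that the same trick handles a non-zero fill value or a repeating pattern (something a buffer of zero bytes cannot express directly):

```python
import array

# A one-element array repeated n times: every item equals the fill value
filled = array.array('i', (7,)) * 5
print(filled)  # array('i', [7, 7, 7, 7, 7])

# A multi-element array is repeated as a pattern
pattern = array.array('i', (1, 2)) * 3
print(pattern.tolist())  # [1, 2, 1, 2, 1, 2]

# Zero-filled array of 2 million 'i' items, as in the answer
tb = array.array('i', (0,)) * 2_000_000
print(len(tb), len(tb) * tb.itemsize)  # item count, total bytes
```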

  • Yup, and that's also materially faster than my .frombytes(b'\0' * 8_000_000). Cool!
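To compare the approaches from this thread on your own machine, here is a rough timeit sketch. The function names are made up for illustration, and the absolute numbers (the 126 msec and roughly 2 msec figures quoted above) will vary with the machine and Python version.

```python
import array
import timeit

N = 2_000_000


def from_tuple():
    # Original approach: builds a temporary N-element tuple first
    return array.array('i', (0,) * N)


def by_repetition():
    # Repeats a one-element array N times; no large temporary tuple
    return array.array('i', (0,)) * N


def from_zero_bytes():
    # Fills the buffer from N * itemsize zero bytes
    tb = array.array('i')
    tb.frombytes(b'\0' * (N * tb.itemsize))
    return tb


for fn in (from_tuple, by_repetition, from_zero_bytes):
    # Best of 5 single calls, reported in milliseconds
    best = min(timeit.repeat(fn, number=1, repeat=5))
    print(f"{fn.__name__:16s} {best * 1000:8.2f} ms")
```

Taking the minimum of several runs is a common way to reduce noise from other processes; it does not remove it entirely.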

This does the same thing, but "should" run at least 10 times faster, by avoiding the needless expense of creating, and crawling over, a multi-million element tuple of unbounded Python ints:

>>> import array
>>> tb = array.array('i')
>>> tb.frombytes(b'\0' * 8_000_000)
>>> len(tb)
2000000
>>> all(i == 0 for i in tb)
True

Note: I'm assuming you're running on a platform where the array typecode i denotes a 4-byte integer type (that's why I changed your 2 million to 8 million). That's very likely, but if you're not sure of that, then slightly fancier code is needed:

>>> tb = array.array('i')
>>> tb.frombytes(b'\0' * (2_000_000 * tb.itemsize))

Of course tb.itemsize there returns 4 on my box.
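If you would rather not depend on i being 4 bytes at all, one option is to pick whichever integer typecode has an itemsize of 4 on the running platform. A minimal sketch, where int_typecode_with_size is a hypothetical helper that only checks the standard integer typecodes h, i, l and q (or their unsigned counterparts H, I, L and Q):

```python
import array


def int_typecode_with_size(nbytes: int, signed: bool = True) -> str:
    """Return an array integer typecode whose itemsize is nbytes (hypothetical helper)."""
    candidates = 'hilq' if signed else 'HILQ'
    for code in candidates:
        # Probe the platform's item size with an empty array
        if array.array(code).itemsize == nbytes:
            return code
    raise ValueError(f"no {'signed' if signed else 'unsigned'} "
                     f"integer typecode with itemsize {nbytes}")


code = int_typecode_with_size(4)  # usually 'i'
tb = array.array(code)
tb.frombytes(b'\0' * (2_000_000 * tb.itemsize))
print(code, tb.itemsize, len(tb))
```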
