\$\begingroup\$

I need to do the following in Python 3.x:

  1. Interpret an array of bytes as an array of single-precision floats.
  2. Then group each four consecutive floats into subarrays, i.e. transform [a,b,c,d,e,f,g,h] into [[a,b,c,d], [e,f,g,h]]. The subarrays are called pixels, and the array of pixels forms an image.
  3. Flip the image vertically.

Here is what I have now:

floats = array.array('f')
floats.frombytes(tile_data)  # array.fromstring() was removed in Python 3.9
pix = []
for y in range(tile_h - 1, -1, -1):
    stride = tile_w * 4
    start_index = y * stride
    end_index = start_index + stride
    pix.extend(floats[i:i + 4] for i in range(start_index, end_index, 4))

tile_data is the input array of raw bytes, tile_w and tile_h are respectively the width and height of the image, pix is the final upside down image.

While this code works correctly, it takes around 50 ms to complete on my machine for a 256x256 image.

Is there anything obviously slow in this code? Would numpy be a potentially good avenue for optimization?


Edit: here is a standalone program to run the code and measure performance:

import array
import random
import struct
import time

# Size of the problem.
tile_w = 256
tile_h = 256

# Generate input data.
tile_data = []
for f in (random.uniform(0.0, 1.0) for _ in range(tile_w * tile_h * 4)):
    tile_data.extend(struct.pack("f", f))
tile_data = bytes(tile_data)

start_time = time.time()

# Code of interest.
floats = array.array('f')
floats.frombytes(tile_data)  # array.fromstring() was removed in Python 3.9
pix = []
for y in range(tile_h - 1, -1, -1):
    stride = tile_w * 4
    start_index = y * stride
    end_index = start_index + stride
    pix.extend(floats[i:i + 4] for i in range(start_index, end_index, 4))

print("runtime: {0} ms".format((time.time() - start_time) * 1000))
\$\endgroup\$

1 Answer

\$\begingroup\$

Would numpy be a potentially good avenue for optimization?

Yes. Pushing Python loops into C extensions often makes sense.
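As a rough illustration of what the numpy route could look like, here is a minimal sketch. The helper name `flip_tile` is made up for this example, and it assumes the downstream code can accept a numpy array of shape (tile_w * tile_h, 4) instead of a list of `array.array` slices.

```python
import struct

import numpy as np


def flip_tile(tile_data, tile_w, tile_h):
    """Hypothetical helper: bytes -> vertically flipped (h*w, 4) float32 array."""
    # Reinterpret the raw bytes as native float32 values (read-only, no copy).
    floats = np.frombuffer(tile_data, dtype=np.float32)
    # View the data as rows x columns x 4 channels.
    image = floats.reshape(tile_h, tile_w, 4)
    # Reverse the row axis, then flatten back to one pixel per row.
    # reshape() has to copy here because the flipped view is not contiguous.
    return image[::-1].reshape(-1, 4)


# Tiny 2x2 tile whose floats are 0..15, so the result is easy to check by eye.
example_bytes = struct.pack("16f", *range(16))
example_pix = flip_tile(example_bytes, 2, 2)
print(example_pix.tolist())
```

Whether this is actually faster for a 256x256 tile needs to be measured on the target machine; the sketch only shows that the three steps map onto `frombuffer`, `reshape` and a reversed slice.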

You might prefer to start the timer after pix = [], since you're not focused on improving fromstring's performance.
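One way to see how much each phase contributes is to time the conversion and the regrouping loop separately. The sketch below is only a suggestion: `time_phases` is a hypothetical name, the loop is copied from the question, and `time.perf_counter` is swapped in for `time.time` because it is intended for measuring short intervals.

```python
import array
import struct
import time


def time_phases(tile_data, tile_w, tile_h):
    """Hypothetical helper: return (pix, conversion_ms, loop_ms)."""
    t0 = time.perf_counter()
    # Phase 1: bytes -> floats.
    floats = array.array('f')
    floats.frombytes(tile_data)
    t1 = time.perf_counter()
    # Phase 2: regroup into pixels, bottom row first (same loop as the question).
    pix = []
    for y in range(tile_h - 1, -1, -1):
        stride = tile_w * 4
        start_index = y * stride
        end_index = start_index + stride
        pix.extend(floats[i:i + 4] for i in range(start_index, end_index, 4))
    t2 = time.perf_counter()
    return pix, (t1 - t0) * 1000, (t2 - t1) * 1000


# Small 2x2 input just to exercise the function; use real tile data for timing.
sample = struct.pack("16f", *range(16))
pix, conversion_ms, loop_ms = time_phases(sample, 2, 2)
print("conversion: {0} ms, loop: {1} ms".format(conversion_ms, loop_ms))
```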

There's nothing obviously slow beyond the bulk data movement and regrouping themselves. Avoiding that work would mean changing the problem, for example by introducing a level of indirection. One could hoist the stride constant out of the loop, but that's pretty far down the list of worries.
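To make "a level of indirection" concrete, here is one possible sketch that never builds the flipped list and instead translates coordinates on access. `FlippedTileView` and its `pixel` method are hypothetical names, and this only helps if the consumer can fetch pixels through such an accessor, which is exactly the change to the problem mentioned above.

```python
import struct


class FlippedTileView:
    """Hypothetical lazy view: looks up pixels as if the tile were flipped."""

    def __init__(self, tile_data, tile_w, tile_h):
        # Reinterpret the bytes as native single-precision floats without copying.
        self._floats = memoryview(tile_data).cast('f')
        self._w = tile_w
        self._h = tile_h

    def pixel(self, x, y):
        # Row y of the flipped image is row (h - 1 - y) of the source data.
        src_row = self._h - 1 - y
        start = (src_row * self._w + x) * 4
        return self._floats[start:start + 4].tolist()


# 2x2 tile whose floats are 0..15.
view = FlippedTileView(struct.pack("16f", *range(16)), 2, 2)
print(view.pixel(0, 0))
print(view.pixel(1, 1))
```

The cost moves from one up-front pass to a small amount of arithmetic per lookup, so it pays off only when few pixels are read or when the copy itself is the bottleneck.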

\$\endgroup\$
  • \$\begingroup\$ Thanks for your input! Regarding "you're not focused on improving fromstring's performance": yes, I actually am. Basically I'm reading pixel data from a child process' stdout, so I need the bytes->floats conversion. It turns out to be insignificant compared to the loop below it. As expected, hoisting stride out of the loop has no measurable effect. Thanks again! \$\endgroup\$ Commented Jul 30, 2017 at 20:27
