What I want to do is loop over an image pixel by pixel, using each pixel value to draw a circle in another, corresponding image. My approach is as follows:

it = np.nditer(pixels, flags=['multi_index'])
while not it.finished:
    y, x = it.multi_index
    color = it[0]
    it.iternext()
    center = (x*20 + 10, y*20 + 10) # corresponding circle center
    cv2.circle(circles, center, int(8 * color/255), 255, -1)

Looping this way is somewhat slow. I tried adding the @njit decorator of numba, but apparently it has problems with opencv.

Input images are 32x32 pixels. They map to output images of 32x32 circles, with each circle drawn inside a 20x20 pixel square. That is, the output image is 640x640 pixels.

A single image takes around 100 ms to be transformed to circles, and I was hoping to lower that to 30 ms or less.

Any recommendations?

  • In this one only 4 circles should be drawn. How slow is it, and how fast should it be? Is the input image 2D and the output 3D? Are the input images square? Do they only contain squares? Please add some sample (input) images. Commented Apr 10, 2022 at 15:50
  • Maybe first run it without cv2.circle to check whether the slowness comes from the looping or from cv2.circle. Commented Apr 10, 2022 at 16:43
  • "efficiently" is irrelevant if you have few pixels. what performance do you have, and what do you need? -- numba's njit only works on python code. any calls into library code (numpy, opencv) will not be optimized because that's already compiled Commented Apr 10, 2022 at 17:09
  • @CristiFati The input images are grayscale raster images, and the output image is also a 2D pixels image. I usually work on 32 x 32 pixel images, but a single image with these dimensions takes around 100 ms to be transformed into circles. Applying that to videos, only FPS of 10 will be achieved in the best case scenario. I was hoping to optimize that to reach 30 FPS. Commented Apr 10, 2022 at 17:52
  • use lookup tables then. pre-draw a circle for each level of 0 .. 255 and then copy those sprites as needed. Commented Apr 10, 2022 at 19:51

1 Answer

When:

  • Dealing with drawings

  • The number of possible drawings is reasonably small (in this case: 256)

  • Speed is important (I guess that's always the case)

  • There's no other restriction preventing this approach

the best approach is to cache the drawings: draw them once into another array (up front, or on demand, depending on how much startup overhead is acceptable), and wherever a drawing would normally take place, copy the matching cached drawing into the target area (as @ChristophRackwitz stated in one of the comments). That copy is a very fast NumPy operation compared to drawing.

As a side note, this is a generic method not necessarily limited to drawings.
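As a rough illustration of how far the lookup idea can be pushed, here is a minimal sketch that replaces the per-pixel Python loop with a single fancy-indexing step. It is not part of the benchmark below. The names make_disc_cache and draw_img_vectorized are made up for this example, and the sprites are built with a plain NumPy distance mask as a stand-in for cv2.circle (so the block needs only NumPy); sprites pre-drawn with cv2.circle should slot in the same way, although the exact pixels of each disc may differ.

```python
import numpy as np


def make_disc_cache(cell=20, levels=256, max_radius=8, color=255):
    """Pre-render one (cell x cell) sprite per gray level.

    Assumption: a squared-distance mask stands in for cv2.circle here.
    """
    yy, xx = np.ogrid[:cell, :cell]
    dist2 = (yy - cell // 2) ** 2 + (xx - cell // 2) ** 2
    cache = np.zeros((levels, cell, cell), dtype=np.uint8)
    for level in range(levels):
        radius = int(max_radius * level / 255)
        cache[level][dist2 <= radius * radius] = color
    return cache


def draw_img_vectorized(arr_in, cache):
    """Place cache[arr_in[y, x]] at cell (y, x) without a Python-level loop."""
    rows, cols = arr_in.shape
    cell_h, cell_w = cache.shape[1:]
    tiles = cache[arr_in]  # shape: (rows, cols, cell_h, cell_w)
    # Interleave cell rows with image rows, then flatten into one 2D image.
    return tiles.transpose(0, 2, 1, 3).reshape(rows * cell_h, cols * cell_w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
    out = draw_img_vectorized(img, make_disc_cache())
    print(out.shape)
```

Unlike the in-place functions below, this sketch returns a new array on every call; whether it is actually faster than the loop-based cache version would need to be measured on the target machine.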

But the result you report (~100 ms to turn one 32x32 image into a 640x640 image of circles) didn't make sense to me: OpenCV is fast, and 1024 circles shouldn't be such a big deal. So I wrote a program to convince myself.

code00.py:

#!/usr/bin/env python

import itertools as its
import sys
import time

import cv2
import numpy as np


def draw_img_orig(arr_in, arr_out):
    factor = round(arr_out.shape[0] / arr_in.shape[0])
    factor_2 = factor // 2
    it = np.nditer(arr_in, flags=["multi_index"])
    while not it.finished:
        y, x = it.multi_index
        color = it[0]
        it.iternext()
        center = (x * factor + factor_2, y * factor + factor_2) # corresponding circle center
        # int(color) keeps 8 * color from wrapping around in uint8 under NumPy 2 promotion rules
        cv2.circle(arr_out, center, int(8 * int(color) / 255), 255, -1)


def draw_img_regular_iter(arr_in, arr_out):
    factor = round(arr_out.shape[0] / arr_in.shape[0])
    factor_2 = factor // 2
    for row_idx, row in enumerate(arr_in):
        for col_idx, col in enumerate(row):
            # int(col) keeps 8 * col from wrapping around in uint8 under NumPy 2 promotion rules
            cv2.circle(arr_out, (col_idx * factor + factor_2, row_idx * factor + factor_2), int(8 * int(col) / 255), 255, -1)


def draw_img_cache(arr_in, arr_out, cache):
    factor = round(arr_out.shape[0] / arr_in.shape[0])
    it = np.nditer(arr_in, flags=["multi_index"])
    while not it.finished:
        y, x = it.multi_index
        yf = y * factor
        xf = x * factor
        arr_out[yf: yf + factor, xf: xf + factor] = cache[it[0]]
        it.iternext()


def generate_input_images(shape, count, dtype=np.uint8):
    return np.random.randint(256, size=(count,) + shape, dtype=dtype)


def generate_circles(shape, dtype=np.uint8, count=256, rad_func=lambda arg: int(8 * arg / 255), color=255):
    ret = np.zeros((count,) + shape, dtype=dtype)
    cy = shape[0] // 2
    cx = shape[1] // 2
    for idx, arr in enumerate(ret):
        cv2.circle(arr, (cx, cy), rad_func(idx), color, -1)
    return ret


def test_draw(imgs_in, img_out, count, draw_func, *draw_func_args):
    print("\nTesting {:s}".format(draw_func.__name__))
    start = time.time()
    for i, e in enumerate(its.cycle(range(imgs_in.shape[0]))):
        if i >= count:  # stop after exactly `count` images (checking after drawing processed one extra)
            break
        draw_func(imgs_in[e], img_out, *draw_func_args)
    print("Took {:.3f} seconds ({:d} images)".format(time.time() - start, count))


def test_speed(shape_in, shape_out, dtype=np.uint8):
    imgs_in = generate_input_images(shape_in, 50, dtype=dtype)
    #print(imgs_in.shape, imgs_in)
    img_out = np.zeros(shape_out, dtype=dtype)
    circles = generate_circles((shape_out[0] // shape_in[0], shape_out[1] // shape_in[1]))
    count = 250
    funcs_data = (
        (draw_img_orig,),
        (draw_img_regular_iter,),
        (draw_img_cache, circles),
    )
    for func_data in funcs_data:
        test_draw(imgs_in, img_out, count, func_data[0], *func_data[1:])


def test_accuracy(shape_in, shape_out, dtype=np.uint8):
    img_in = np.arange(np.prod(shape_in), dtype=dtype).reshape(shape_in)  # np.product was removed in NumPy 2.0
    circles = generate_circles((shape_out[0] // shape_in[0], shape_out[1] // shape_in[1]))
    funcs_data = (
        (draw_img_orig, "orig.png"),
        (draw_img_regular_iter, "regit.png"),
        (draw_img_cache, "cache.png", circles),
    )
    imgs_out = [np.zeros(shape_out, dtype=dtype) for _ in funcs_data]
    for idx, func_data in enumerate(funcs_data):
        func_data[0](img_in, imgs_out[idx], *func_data[2:])
        cv2.imwrite(func_data[1], imgs_out[idx])
    for idx, img in enumerate(imgs_out[1:], start=1):
        if not np.array_equal(img, imgs_out[0]):
            print("Image index different: {:d}".format(idx))


def main(*argv):
    dt = np.uint8
    shape_in = (32, 32)
    factor_io = 20
    shape_out = tuple(i * factor_io for i in shape_in)
    test_speed(shape_in, shape_out, dtype=dt)
    test_accuracy(shape_in, shape_out, dtype=dt)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Notes:

  • Besides your implementation that uses np.nditer (which I placed in a function called draw_img_orig), I created 2 more:

    • One that iterates over the input array the regular Python way, with nested for loops (draw_img_regular_iter)

    • One that uses cached circles, and also iterates via np.nditer (draw_img_cache)

  • There are 2 tests, each run on all 3 approaches above:

    • Speed: measure the time taken to process a number of images

    • Accuracy: measure the output for a 32x32 input containing the interval [0, 255] (4 times)

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q071818080]> sopr.bat
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###

[prompt]> dir /b
code00.py

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" code00.py
Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32


Testing draw_img_orig
Took 0.908 seconds (250 images)

Testing draw_img_regular_iter
Took 1.061 seconds (250 images)

Testing draw_img_cache
Took 0.426 seconds (250 images)

Done.

[prompt]>
[prompt]> dir /b
cache.png
code00.py
orig.png
regit.png

Above are the speed test results: your approach took a bit less than a second for 250 images (about 3.6 ms per image). So I was right: I don't know where your slowness comes from, but it's not from here (maybe you got the measurements wrong?).
The regular method is a bit slower, while the cached one is ~2X faster.
I ran the code on my laptop:

  • Win 10 pc064
  • CPU: Intel i7 6820HQ @ 2.70GHz (fairly old)
  • GPU: not relevant, as I didn't notice any spikes during execution
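
To compare the totals above with the per-image and FPS figures used in the question, here is a small sketch of the conversion, plus a timing loop built on time.perf_counter. The function names are hypothetical and not part of code00.py; the loop simply calls a function repeatedly and divides the elapsed wall time by the number of calls.

```python
import time


def to_ms_and_fps(total_seconds, images):
    """Convert a total run time into (milliseconds per image, images per second)."""
    ms_per_image = 1000.0 * total_seconds / images
    fps = images / total_seconds if total_seconds > 0 else float("inf")
    return ms_per_image, fps


def measure(func, runs=250):
    """Call func() `runs` times and report the average cost per call."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    return to_ms_and_fps(time.perf_counter() - start, runs)


if __name__ == "__main__":
    # Totals taken from the output above (seconds for 250 images).
    for name, seconds in (("orig", 0.908), ("regular_iter", 1.061), ("cache", 0.426)):
        ms, fps = to_ms_and_fps(seconds, 250)
        print("{:s}: {:.2f} ms per image, {:.0f} FPS".format(name, ms, fps))
```

With these numbers, even the uncached approaches sit well below the 30 ms per image target, which is why the 100 ms figure looks like a measurement problem rather than a drawing problem.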

Regarding the accuracy test, all 3 output arrays are identical (there's no message saying otherwise). Here's one saved image:

img0

1 Comment

Turns out I was not calculating the FPS accurately. The proper measurement was 170 FPS without caching. After caching the circles, the FPS was 270. Thanks a lot buddy <3
