numpy array indicator operation [duplicate]

Question

I want to modify an empty bitmap using an array of indices (x and y coordinates). For every coordinate in that array, the value at that position should be incremented by one.

This works as long as the coordinates are unique. If the index array contains duplicate coordinates, however, the value is incremented only once.

>>> img
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

>>> inds
array([[0, 0],
       [3, 4],
       [3, 4]])

Operation:

>>> img[inds[:,1], inds[:,0]] += 1

Result:

>>> img
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0]])

Expected result:

>>> img
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0]])

Does someone have an idea how to solve this? Preferably a fast approach without the use of loops.

Also see stackoverflow.com/a/7435155/8033585

– AGN Gazer, Commented May 30, 2018 at 1:26

jpp · Accepted Answer · 2018-05-30 01:18:34Z

6

This is one way. Counting algorithm courtesy of @AlexRiley.

For performance implications of relative sizes of img and inds, see @PaulPanzer's answer.

# count occurrences of each row and return array
counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)

# apply indices and counts
img[inds[:,1], inds[:,0]] += counts

print(img)

[[1 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 2 0]]
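
To see what the broadcast comparison computes, here is a minimal sketch using the question's data. The intermediate name `same` is introduced only for illustration and is not part of the answer. Because all duplicate rows receive the same count, it does not matter which of the duplicate writes wins in the fancy-indexed `+=`.

```python
import numpy as np

img = np.zeros((5, 5), dtype=int)
inds = np.array([[0, 0], [3, 4], [3, 4]])  # (x, y) pairs from the question

# same[i, j] is True when row i and row j of inds are identical
same = (inds[:, None] == inds).all(axis=2)  # shape (3, 3)
counts = same.sum(axis=1)                   # occurrences of each row
print(counts)                               # [1 2 2]

img[inds[:, 1], inds[:, 0]] += counts
print(img[0, 0], img[4, 3])                 # 1 2
```

The intermediate comparison has shape (n, n, 2) for n index rows, so time and memory grow quadratically with len(inds). That is consistent with the slow 10000-row case in the benchmarks in @PaulPanzer's answer.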

edited May 30, 2018 at 1:18

answered May 29, 2018 at 23:27

miradulo · Accepted Answer · 2018-05-30 00:44:00Z

5

You could use numpy.add.at with a bit of manipulation to get the indices ready.

np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

With larger inds arrays this approach should remain fast, though Paul Panzer's np.bincount solution is faster in most of the benchmarked cases.
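
A minimal sketch contrasting the two behaviours on the question's data: a fancy-indexed `+=` is buffered, so a repeated index is written only once, whereas `np.add.at` is unbuffered and accumulates every occurrence. The names `idx`, `buffered` and `unbuffered` are chosen here for illustration.

```python
import numpy as np

inds = np.array([[0, 0], [3, 4], [3, 4]])  # (x, y) pairs from the question
idx = tuple(inds[:, [1, 0]].T)             # (row array, column array), i.e. (y, x)

buffered = np.zeros((5, 5), dtype=int)
buffered[idx] += 1                         # duplicate (4, 3) is counted once

unbuffered = np.zeros((5, 5), dtype=int)
np.add.at(unbuffered, idx, 1)              # duplicate (4, 3) is counted twice

print(buffered[4, 3], unbuffered[4, 3])    # 1 2
```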

edited May 30, 2018 at 0:44

answered May 29, 2018 at 23:41

Paul Panzer · Accepted Answer · 2018-05-30 01:01:47Z

4

Two remarks on the other two answers:

1) @jpp's can be improved by using np.unique with the axis and return_counts keywords.

2) If we translate to flat indexing, we can use np.bincount, which is often (but not always; see the last test case in the benchmarks) faster than np.add.at.

Thanks to @miradulo for the initial version of the benchmarks.

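import numpy as np
from timeit import timeit

def jpp(img, inds):
    # pairwise comparison of all rows: O(n^2) in len(inds)
    counts = (inds[:, None] == inds).all(axis=2).sum(axis=1)
    img[inds[:,1], inds[:,0]] += counts

def jpp_pp(img, inds):
    # same idea, but np.unique finds the distinct rows and their counts
    unq, cnts = np.unique(inds, axis=0, return_counts=True)
    img[unq[:,1], unq[:,0]] += cnts

def miradulo(img, inds):
    # unbuffered in-place add: repeated indices accumulate
    np.add.at(img, tuple(inds[:, [1, 0]].T), 1)

def pp(img, inds):
    # assumes img is C-contiguous, so ravel() returns a view and not a copy
    imgf = img.ravel()
    # inds.T[::-1] is in (row, column) order, so the dims are img.shape;
    # for the square images used below this equals img.shape[::-1]
    indsf = np.ravel_multi_index(inds.T[::-1], img.shape)
    # histogram of the flat indices, one bin per pixel
    imgf += np.bincount(indsf, None, img.size)

inds = np.random.randint(0, 5, (3, 2))          # few indices, small image
big_inds = np.random.randint(0, 5, (10000, 2))  # many indices, small image
sml_inds = np.random.randint(0, 1000, (5, 2))   # few indices, large image

for f in jpp, jpp_pp, miradulo, pp:
    print(f.__name__)
    # i: index array, n: number of timed calls, a: side length of the image
    for i, n, a in [(inds, 1000, 5), (big_inds, 10, 5), (sml_inds, 10, 1000)]:
        img = np.zeros((a, a), int)
        print(timeit("f(img, i)", globals=dict(img=img, i=i, f=f), number=n) * 1000 / n, 'ms')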

Output:

jpp
0.011815106990979984 ms
2623.5026352020213 ms
0.04642329877242446 ms
jpp_pp
0.041291153989732265 ms
5.418520100647584 ms
0.05826510023325682 ms
miradulo
0.007099648006260395 ms
0.7788308983435854 ms
0.009103797492571175 ms
pp
0.0035401539935264736 ms
0.06540440081153065 ms
3.486583800986409 ms
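
The flat-index translation can also be checked against np.add.at on a small non-square image. This is only a sketch under assumptions: the image shape (4, 6), the seed and the helper name `flat_bincount` are made up for illustration. The dimensions are passed as `img.shape` because the indices are given in (row, column) order, and the histogram is reshaped and added to `img` directly so the sketch does not depend on `ravel()` returning a view.

```python
import numpy as np

def flat_bincount(img, inds):
    # inds holds (x, y) pairs; img is indexed as img[y, x]
    flat = np.ravel_multi_index((inds[:, 1], inds[:, 0]), img.shape)
    # minlength=img.size makes the histogram cover every pixel
    img += np.bincount(flat, minlength=img.size).reshape(img.shape)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
h, w = 4, 6
inds = np.column_stack([rng.integers(0, w, 50), rng.integers(0, h, 50)])

a = np.zeros((h, w), dtype=int)
b = np.zeros((h, w), dtype=int)
flat_bincount(a, inds)
np.add.at(b, (inds[:, 1], inds[:, 0]), 1)

print(np.array_equal(a, b))  # True
```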

edited May 30, 2018 at 1:01

answered May 30, 2018 at 0:42

2 Comments

miradulo Over a year ago

Ahh this is nice, I'll get rid of my benchmark in favor of yours.

Paul Panzer Over a year ago

@jpp There is one scenario where bincount should not be the best which is large img, small inds. I'll try and add that to the benchmarks.
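
One way to see why: with minlength set to img.size, np.bincount returns one entry per pixel no matter how few indices are passed, so the work scales with the image size. A small sketch, with the image size taken from the last benchmark case and the flat index values made up for illustration:

```python
import numpy as np

img = np.zeros((1000, 1000), dtype=int)
flat = np.array([0, 7, 7, 999_999, 42])  # five made-up flat indices

hist = np.bincount(flat, minlength=img.size)
print(hist.size)               # 1000000
print(np.count_nonzero(hist))  # 4
```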
