
Seemingly simple question: I have an array with two columns, the first representing an ID and the second a count. I'd like to update it with another, similar array so that counts for matching IDs are added together and rows with new IDs are appended:

import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

a.update(b)  # ????
>>> np.array([[1, 2],
              [2, 4],
              [3, 2],
              [4, 5],
              [5, 3]])

Is there a way to do this with indexing/slicing such that I don't simply have to iterate over each row?

  • Are those ID columns sorted? Commented Jun 4, 2015 at 20:32

3 Answers


Generic case

Approach #1: You can use np.add.at to perform this kind of ID-based accumulation, like so -

# First column of output array as the union of first columns of a,b              
out_id = np.union1d(a[:,0],b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)
    
# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

As a likely faster alternative for finding a_idx and b_idx, np.searchsorted could be used like so -

a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')
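
As a quick check (a minimal sketch using the a and b from the question, with numpy imported as above), both index-finding methods give the same result, because np.union1d already returns a sorted array, which is exactly what np.searchsorted needs:

out_id = np.union1d(a[:,0], b[:,0])                          # array([1, 2, 3, 4, 5])

_, a_idx_where = np.where(a[:,None,0]==out_id)               # indices via broadcasting
a_idx_sorted = np.searchsorted(out_id, a[:,0], side='left')  # indices via binary search

print(np.array_equal(a_idx_where, a_idx_sorted))             # True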

Sample input-output:

In [538]: a
Out[538]: 
array([[1, 2],
       [4, 2],
       [3, 1],
       [5, 5]])

In [539]: b
Out[539]: 
array([[3, 7],
       [1, 1],
       [4, 0],
       [2, 3],
       [6, 2]])

In [540]: out
Out[540]: 
array([[1, 3],
       [2, 3],
       [3, 8],
       [4, 2],
       [5, 5],
       [6, 2]])

Approach #2: You can use np.bincount to do the same ID-based summing -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Gather all IDs and counts into single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))

# Get binned summations (using weights makes bincount return a float array)
summed_vals = np.bincount(id_arr,count_arr)

# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)

# Mask valid summed bins for final counts array output
out_count = summed_vals[mask]

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
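
For reference, here is a minimal end-to-end sketch of Approach #2 on the arrays from the question (the function name merge_counts is purely illustrative); since the weighted bincount returns floats, the counts are cast back to int before stacking:

def merge_counts(a, b):
    out_id = np.union1d(a[:,0], b[:,0])
    id_arr = np.concatenate((a[:,0], b[:,0]))
    count_arr = np.concatenate((a[:,1], b[:,1]))
    summed_vals = np.bincount(id_arr, count_arr)          # float because of weights
    mask = np.in1d(np.arange(np.max(out_id)+1), out_id)
    return np.column_stack((out_id, summed_vals[mask].astype(int)))

merge_counts(a, b)
# array([[1, 2],
#        [2, 4],
#        [3, 2],
#        [4, 5],
#        [5, 3]])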

Specific case

If the ID columns in a and b are sorted, it becomes easier, as we can just use masks with np.in1d to index into the output ID array created with np.union1d, like so -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Masks of first columns of a and b matches in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

Sample run -

In [552]: a
Out[552]: 
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [8, 5]])

In [553]: b
Out[553]: 
array([[2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [6, 2],
       [8, 2]])

In [554]: out
Out[554]: 
array([[1, 2],
       [2, 4],
       [3, 2],
       [4, 5],
       [5, 3],
       [6, 2],
       [8, 7]])
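
As a side note (my own sketch, not part of the original answer): since mask2 selects each output position at most once, the np.add.at call in the sorted case can be replaced with a plain masked in-place add -

out_count[mask1] = a[:,1]
out_count[mask2] += b[:,1]   # safe here because every masked position occurs only once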

You can collect the union of IDs, find which of them are missing from a, and append the corresponding rows of b:

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [5, 3]])

Note that if you want the result to be sorted, you can use np.lexsort:

result[np.lexsort((result[:,1],result[:,0]))]

Explanation:

First you can find the unique IDs with the following command:

>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])

Then find the difference between the IDs of a and all of the IDs:

>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])

Then find the rows of b whose IDs are in dif:

>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])

And finally, concatenate val with a:

>>> np.concatenate((a,val))

Consider another example, with sorting:

>>> a = np.array([[1, 2],
...               [2, 2],
...               [3, 1],
...               [7, 5]])
>>> 
>>> b = np.array([[2, 2],
...               [3, 1],
...               [4, 0],
...               [5, 3]])
>>> 
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]

>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,1],result[:,0]))]
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [7, 5]])
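
Putting the steps together (a sketch; the function name update_ids is purely illustrative), the whole answer reads:

def update_ids(a, b):
    col = np.unique(np.hstack((b[:,0], a[:,0])))   # all IDs from both arrays
    dif = np.setdiff1d(col, a[:,0])                # IDs that only appear in b
    val = b[np.in1d(b[:,0], dif)]                  # rows of b carrying those IDs
    result = np.concatenate((a, val))
    return result[np.lexsort((result[:,1], result[:,0]))]   # sorted by ID

update_ids(a, b)   # with the second example's a and b
# array([[1, 2],
#        [2, 2],
#        [3, 1],
#        [4, 0],
#        [5, 3],
#        [7, 5]])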



That's an old question, but here is a solution with pandas (which could be generalized to aggregation functions other than sum). Sorting also happens automatically:

import pandas as pd
import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

print((pd.DataFrame(a[:, 1], index=a[:, 0])
        .add(pd.DataFrame(b[:, 1], index=b[:, 0]), fill_value=0)
        .astype(int))
        .reset_index()
        .to_numpy())

Output:

[[1 2]
 [2 4]
 [3 2]
 [4 5]
 [5 3]]
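
To illustrate the generalization mentioned above (a sketch; the column names and the choice of max are arbitrary), the same merge can be written with groupby and any aggregation function:

import pandas as pd
import numpy as np

merged = (pd.DataFrame(np.vstack((a, b)), columns=['id', 'count'])
            .groupby('id')['count']
            .max()                     # swap in .sum(), .mean(), ...
            .reset_index()
            .to_numpy())

print(merged)
# [[1 2]
#  [2 2]
#  [3 1]
#  [4 5]
#  [5 3]]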
