Suppose that I have two numpy arrays of the form

x = [[1,2]
     [2,4]
     [3,6]
     [4,NaN]
     [5,10]]

y = [[0,-5]
     [1,0]
     [2,5]
     [5,20]
     [6,25]]

Is there an efficient way to merge them on the first column, so that I get

xmy = [[0, NaN, -5  ]
       [1, 2,    0  ]
       [2, 4,    5  ]
       [3, 6,    NaN]
       [4, NaN,  NaN]
       [5, 10,   20 ]
       [6, NaN,  25 ]]

I can implement a simple function that searches for the matching index, but this is not elegant and is potentially inefficient for many arrays and large dimensions. Any pointers are appreciated.
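For reference, a minimal sketch of such a search-based merge in plain NumPy. It assumes the keys in each array are unique and contain no NaN; the helper name outer_join_by_first_column is hypothetical, not a NumPy function.

```python
import numpy as np

def outer_join_by_first_column(x, y):
    """Outer-join two 2-column float arrays on their first column (hypothetical helper)."""
    # Sorted, unique keys drawn from both inputs.
    keys = np.union1d(x[:, 0], y[:, 0])
    # Start from NaN everywhere, so keys missing from one input stay NaN.
    out = np.full((keys.size, 3), np.nan)
    out[:, 0] = keys
    # keys is sorted and contains every input key, so searchsorted
    # returns the exact output row for each input row.
    out[np.searchsorted(keys, x[:, 0]), 1] = x[:, 1]
    out[np.searchsorted(keys, y[:, 0]), 2] = y[:, 1]
    return out

x = np.array([[1, 2], [2, 4], [3, 6], [4, np.nan], [5, 10]], dtype=float)
y = np.array([[0, -5], [1, 0], [2, 5], [5, 20], [6, 25]], dtype=float)
xmy = outer_join_by_first_column(x, y)
```

This handles two arrays at a time; merging many arrays this way would mean repeating the union and search for each one.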

1 Answer

See numpy.lib.recfunctions.join_by

It only works on structured arrays or recarrays, so there are a couple of kinks.

First, you need to be at least somewhat familiar with structured arrays. See the NumPy documentation on structured arrays if you're not.
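As a quick refresher, here is a small sketch of what a structured array looks like. The field names 'key' and 'field' are just the ones used in this answer.

```python
import numpy as np

# Each element is a record with a named integer field and a named float field.
dtype = [('key', int), ('field', float)]
rows = np.array([(1, 2.0), (2, 4.0)], dtype=dtype)

keys = rows['key']    # all values of one field, as an ordinary 1-D array
first = rows[0]       # a single record holding both fields
```

Note that the array is one-dimensional: the fields are part of the dtype, not a second axis.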

import numpy as np
import numpy.lib.recfunctions

# Define the starting arrays as structured arrays with two fields ('key' and 'field').
# (The old np.int, np.float and np.NaN aliases are gone from recent NumPy releases,
#  so this uses the builtin int and float, and np.nan.)
dtype = [('key', int), ('field', float)]
x = np.array([(1, 2),
              (2, 4),
              (3, 6),
              (4, np.nan),
              (5, 10)],
             dtype=dtype)

y = np.array([(0, -5),
              (1, 0),
              (2, 5),
              (5, 20),
              (6, 25)],
             dtype=dtype)

# You want an outer join, rather than the default inner join
# (all values are returned, not just ones with a common key)
join = np.lib.recfunctions.join_by('key', x, y, jointype='outer')

# Now we have a structured array with three fields: 'key', 'field1', and 'field2'
# (since 'field' was in both arrays, it renamed x['field'] to 'field1', and
#  y['field'] to 'field2')

# This returns a masked array. If you want the missing values filled with NaN,
# set one fill value per field: the integer 'key' field cannot hold NaN, so it
# gets a placeholder (0) that should go unused, since keys are not masked here.
join.fill_value = (0, np.nan, np.nan)
join = join.filled()

# Just displaying it... Keep in mind that as a structured array,
#  it has one dimension, where each row contains the 3 fields
for row in join:
    print(row)

This outputs:

(0, nan, -5.0)
(1, 2.0, 0.0)
(2, 4.0, 5.0)
(3, 6.0, nan)
(4, nan, nan)
(5, 10.0, 20.0)
(6, nan, 25.0)

Hope that helps!

Edit 1: Added example.
Edit 2: Really shouldn't join on floats... Changed the 'key' field to an int.


5 Comments

Thanks for this insightful response. Is there a simple way to convert a structured array to a plain ndarray? Thanks.
@leon - Here's one way (using the "join" array in the example): join.view(float).reshape((join.size, 3)) Hope that helps!
This actually doesn't work, because the first column is stored as an int. This is why I was asking.
@leon - Whoops! I tested it, but I had everything as floats... As far as I know, there isn't a catch-all way to convert a structured array with a mixed (e.g. int and float) dtype back to a 2D NumPy array of uniform dtype. Maybe it's best to make 'key' a float again? You take a risk joining on floats, but it should let you view the result as a uniform 2D array. That's not a great answer, though...
Well, this is ugly, but it works even with the mixed dtype: np.vstack([join[name] for name in join.dtype.names]).T
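A minimal sketch of that last suggestion, using np.column_stack so no transpose is needed. The small join array here is a hand-built stand-in for the filled result, not the output of join_by itself.

```python
import numpy as np

# Stand-in for the filled join result: one int field and two float fields.
join = np.array([(0, np.nan, -5.0), (1, 2.0, 0.0), (2, 4.0, 5.0)],
                dtype=[('key', int), ('field1', float), ('field2', float)])

# Pull out each field as a 1-D array and stack them as columns.
# NumPy promotes the int and float columns to a common float64 dtype.
as_2d = np.column_stack([join[name] for name in join.dtype.names])
```

Unlike join.view(float), this copies and converts each field by value, so the integer keys come through as the floats 0.0, 1.0, 2.0 rather than as reinterpreted bytes.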
