merging indexed array in Python

Question

Suppose that I have two numpy arrays of the form

x = [[1,2]
     [2,4]
     [3,6]
     [4,NaN]
     [5,10]]

y = [[0,-5]
     [1,0]
     [2,5]
     [5,20]
     [6,25]]

is there an efficient way to merge them such that I have

xmy = [[0, NaN, -5  ]
       [1, 2,    0  ]
       [2, 4,    5  ]
       [3, 6,    NaN]
       [4, NaN,  NaN]
       [5, 10,   20 ]
       [6, NaN,  25 ]]

I can implement a simple function using search to find the index but this is not elegant and potentially inefficient for a lot of arrays and large dimensions. Any pointer is appreciated.

unutbu · Accepted Answer · 2010-05-05 17:44:11Z

10

See numpy.lib.recfunctions.join_by

It only works on structured arrays or recarrays, so there are a couple of kinks.

First you need to be at least somewhat familiar with structured arrays. See the NumPy documentation on structured arrays if you're not.
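If your data starts out as plain 2D arrays like the ones in the question, one possible way to get structured arrays is to allocate an empty structured array and copy the columns in field by field. This is only a sketch: the variable name `x2d` and the field names 'key' and 'field' are illustrative assumptions, and it assumes the key column holds whole numbers with no NaNs.

```python
import numpy as np

# The question's x as an ordinary 2D float array (NaN forces a float dtype).
x2d = np.array([[1, 2],
                [2, 4],
                [3, 6],
                [4, np.nan],
                [5, 10]], dtype=float)

# Assumed field names; the key is stored as an int so the join is exact.
dtype = [('key', int), ('field', float)]

# Allocate the structured array, then fill each field from a column.
x = np.empty(len(x2d), dtype=dtype)
x['key'] = x2d[:, 0].astype(int)   # assumes no NaN in the key column
x['field'] = x2d[:, 1]
```

The same steps would apply to y.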

import numpy as np
import numpy.lib.recfunctions

# Define the starting arrays as structured arrays with two fields ('key' and 'field')
dtype = [('key', int), ('field', float)]
x = np.array([(1, 2),
             (2, 4),
             (3, 6),
             (4, np.nan),
             (5, 10)],
             dtype=dtype)

y = np.array([(0, -5),
             (1, 0),
             (2, 5),
             (5, 20),
             (6, 25)],
             dtype=dtype)

# You want an outer join, rather than the default inner join
# (all values are returned, not just ones with a common key)
join = np.lib.recfunctions.join_by('key', x, y, jointype='outer')

# Now we have a structured array with three fields: 'key', 'field1', and 'field2'
# (since 'field' was in both arrays, it renamed x['field'] to 'field1', and
#  y['field'] to 'field2')

# This returns a masked array. If you want it filled with
# NaNs, do the following...
join.fill_value = np.nan
join = join.filled()

# Just displaying it... Keep in mind that as a structured array,
#  it has one dimension, where each row contains the 3 fields
for row in join:
    print(row)

This outputs:

(0, nan, -5.0)
(1, 2.0, 0.0)
(2, 4.0, 5.0)
(3, 6.0, nan)
(4, nan, nan)
(5, 10.0, 20.0)
(6, nan, 25.0)
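If you would rather stay with plain 2D arrays, here is a rough alternative sketch that does not use join_by at all. It builds the sorted union of the keys with np.union1d and then uses np.searchsorted to place each array's values in the right rows. The helper name `outer_merge` is made up for illustration, and the sketch assumes that keys are unique within each array, that no key is NaN, and that the float keys hold exact whole numbers (joining on arbitrary floats is risky).

```python
import numpy as np

def outer_merge(x, y):
    """Outer-join two 2-column float arrays on column 0 (hypothetical helper)."""
    keys = np.union1d(x[:, 0], y[:, 0])       # sorted, unique keys from both arrays
    out = np.full((keys.size, 3), np.nan)     # NaN marks "no value for this key"
    out[:, 0] = keys
    # keys is sorted and contains every input key, so searchsorted
    # returns the exact row of each key.
    out[np.searchsorted(keys, x[:, 0]), 1] = x[:, 1]
    out[np.searchsorted(keys, y[:, 0]), 2] = y[:, 1]
    return out

x = np.array([[1, 2], [2, 4], [3, 6], [4, np.nan], [5, 10]], dtype=float)
y = np.array([[0, -5], [1, 0], [2, 5], [5, 20], [6, 25]], dtype=float)
xmy = outer_merge(x, y)
```

The result is an ordinary float ndarray of shape (7, 3), so the key column is a float rather than an int.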

Hope that helps!

Edit1: Added example.
Edit2: Really shouldn't join with floats... Changed 'key' field to an int.

edited May 5, 2010 at 17:44

unutbu

answered May 5, 2010 at 16:22

Joe Kington

5 Comments

leon Over a year ago

Thanks for this insightful response. Pardon my ignorance, but is there a simple way to convert a structured array to an ndarray? Thanks.

Joe Kington Over a year ago

@leon - Here's one way (using the "join" array in the example...): join.view(float).reshape((join.size, 3)) Hope that helps!

leon Over a year ago

This actually doesn't work because the first column is cast as int. This is why I was asking.

Joe Kington Over a year ago

@leon - Whoops! I tested it, but I had everything as floats... Hmm... As far as I know there isn't a catch-all way to convert structured arrays with a mixed (e.g. int & float) dtype back to a 2D numpy array of uniform dtype... Maybe it's best to change the 'key' back to a float? You take a risk joining based on floats, but it should let you view things as a uniform 2D array... That's not a great answer, though...

Joe Kington Over a year ago

Well, this is ugly, but it works even with the mixed dtype... np.vstack([join[name] for name in join.dtype.names]).T
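For what it's worth, newer NumPy versions (1.16 and later) also ship numpy.lib.recfunctions.structured_to_unstructured, which may be a tidier way to do the same conversion. The sketch below uses a small stand-in for the "join" array (two rows only, values taken from the example output) and compares it with the vstack approach from the comment above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Small stand-in for the joined array: int key, two float fields.
join = np.array([(0, np.nan, -5.0),
                 (1, 2.0, 0.0)],
                dtype=[('key', int), ('field1', float), ('field2', float)])

# Cast every field to float and return a regular 2D ndarray.
as_2d = rfn.structured_to_unstructured(join, dtype=float)

# The vstack approach: stack the fields as rows, then transpose.
stacked = np.vstack([join[name] for name in join.dtype.names]).T
```

Both give a float array with one row per record and one column per field.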
