I'm trying to combine these three arrays into the single array shown below. This is basically the equivalent of a SQL outer join, with the 'pos' field as the key/index.
a1 = array([('2:6506', 4.6725971801473496e-25, 0.99999999995088695),
('2:6601', 2.2452745388799898e-27, 0.99999999995270605),
('2:21801', 1.9849650921836601e-31, 0.99999999997999001),],
dtype=[('pos', '|S100'), ('col1', '<f8'), ('col2', '<f8')])
a2 = array([('3:6506', 4.6725971801473496e-25, 0.99999999995088695),
('3:6601', 2.2452745388799898e-27, 0.99999999995270605),
('3:21801', 1.9849650921836601e-31, 0.99999999997999001),],
dtype=[('pos', '|S100'), ('col1', '<f8'), ('col2', '<f8')])
a3 = array([('2:6506', 4.6725971801473496e-25, 0.99999999995088695),
('2:6601', 2.2452745388799898e-27, 0.99999999995270605),
('2:21801', 1.9849650921836601e-31, 0.99999999997999001),],
dtype=[('pos', '|S100'), ('col3', '<f8'), ('col4', '<f8')])
Desired result:
array([('2:6506', 4.6725971801473496e-25, 0.99999999995088695, 4.6725971801473496e-25, 0.99999999995088695),
('2:6601', 2.2452745388799898e-27, 0.99999999995270605, 2.2452745388799898e-27, 0.99999999995270605),
('2:21801', 1.9849650921836601e-31, 0.99999999997999001, 1.9849650921836601e-31, 0.99999999997999001),
('3:6506', 4.6725971801473496e-25, 0.99999999995088695, NaN, NaN),
('3:6601', 2.2452745388799898e-27, 0.99999999995270605, NaN, NaN),
('3:21801', 1.9849650921836601e-31, 0.99999999997999001, NaN, NaN),
],
dtype=[('pos', '|S100'), ('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8'), ('col4', '<f8')])
I think this answer might be on the right track, but I can't quite see how to apply it.
Update:
I tried running unutbu's answer, but I'm getting this error:
Traceback (most recent call last):
  File "fail2.py", line 21, in <module>
    a4 = recfunctions.join_by('pos', a4, a, jointype='outer')
  File "/usr/local/msg/lib/python2.6/site-packages/numpy/lib/recfunctions.py", line 973, in join_by
    current = output[f]
  File "/usr/local/msg/lib/python2.6/site-packages/numpy/ma/core.py", line 2943, in __getitem__
    dout = ndarray.__getitem__(_data, indx)
ValueError: field named col12 not found.
Update 2:
The error only occurred on numpy 1.5.1. After upgrading to 1.8.1 it went away.
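For reference, here is a minimal sketch of how the join might look using `numpy.lib.recfunctions.join_by`, the function that appears in the traceback above. The shortened float values and the names `stacked`, `joined`, `col3` and `col4` (as variables) are placeholders of mine, not taken from the answer. It also assumes that a1 and a2 can simply be concatenated first, since they share the same columns and have no keys in common.

```python
import numpy as np
from numpy.lib import recfunctions

# Simplified stand-ins for the arrays in the question (values shortened).
left_dtype = [('pos', '|S100'), ('col1', '<f8'), ('col2', '<f8')]
right_dtype = [('pos', '|S100'), ('col3', '<f8'), ('col4', '<f8')]

a1 = np.array([('2:6506', 4.67e-25, 0.9999),
               ('2:6601', 2.25e-27, 0.9999),
               ('2:21801', 1.98e-31, 0.9999)], dtype=left_dtype)
a2 = np.array([('3:6506', 4.67e-25, 0.9999),
               ('3:6601', 2.25e-27, 0.9999),
               ('3:21801', 1.98e-31, 0.9999)], dtype=left_dtype)
a3 = np.array([('2:6506', 4.67e-25, 0.9999),
               ('2:6601', 2.25e-27, 0.9999),
               ('2:21801', 1.98e-31, 0.9999)], dtype=right_dtype)

# a1 and a2 have identical dtypes, so stack them into one array first.
stacked = np.concatenate([a1, a2])

# Outer join on 'pos'. With usemask=True the result is a masked array,
# and rows with no match in a3 have col3/col4 masked.
joined = recfunctions.join_by('pos', stacked, a3,
                              jointype='outer', usemask=True)

# Turn the masked entries of a column into NaN, one field at a time.
col3 = joined['col3'].filled(np.nan)
col4 = joined['col4'].filled(np.nan)
```

Note that `join_by` sorts its output by the key, so the rows come back ordered by 'pos' rather than in the order shown in the desired result.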