
I have two different arrays, one with strings and another with ints. I want to concatenate them into one array where each column keeps its original datatype. My current solution (see below) converts the entire array to a string dtype, which seems very memory inefficient.

combined_array = np.concatenate((A, B), axis = 1)

Is it possible to have multiple dtypes in combined_array when A has a string dtype and B has an int dtype?
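To put a number on the memory concern, here is a minimal sketch with small, made-up 1-D inputs standing in for the question's A and B (the real arrays are 2-D). It compares storing the integers as fixed-width Unicode strings with a structured array that keeps one field per original dtype. Exact sizes depend on the string length and integer type chosen.

```python
import numpy as np

# Hypothetical example inputs standing in for the question's A and B.
A = np.array(["ab", "cd", "ef"])          # Unicode strings, dtype '<U2'
B = np.array([1, 2, 3], dtype=np.int64)   # 8-byte integers

# Forcing the integers into a string dtype stores each one as fixed-width Unicode.
B_as_str = B.astype(str)
print(B_as_str.dtype, B_as_str.itemsize)

# A structured array instead keeps one field per original dtype.
combined = np.empty(len(A), dtype=[("A", A.dtype), ("B", B.dtype)])
combined["A"] = A
combined["B"] = B
print(combined.dtype.itemsize)  # bytes per row: 8 for '<U2' plus 8 for int64
```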

Comment:

The question is about using a NumPy array. However, if having a NumPy array is not essential, then a pandas DataFrame would work well for this situation. (Commented May 3, 2015 at 1:42)

3 Answers


One approach might be to use a record array. The "columns" won't be like the columns of standard numpy arrays, but for most use cases, this is sufficient:

>>> import numpy
>>> a = numpy.array(['a', 'b', 'c', 'd', 'e'])
>>> b = numpy.arange(5)
>>> records = numpy.rec.fromarrays((a, b), names=('keys', 'data'))
>>> records
rec.array([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)],
          dtype=[('keys', '<U1'), ('data', '<i8')])
>>> records['keys']
array(['a', 'b', 'c', 'd', 'e'], dtype='<U1')
>>> records['data']
array([0, 1, 2, 3, 4])

Note that you can also do something similar with a standard array by specifying the datatype of the array. This is known as a "structured array":

>>> arr = numpy.array([('a', 0), ('b', 1)],
...                   dtype=[('keys', 'U1'), ('data', 'i8')])
>>> arr
array([('a', 0), ('b', 1)], dtype=[('keys', '<U1'), ('data', '<i8')])

The difference is that record arrays also allow attribute access to individual data fields. Standard structured arrays do not.

>>> records.keys
array(['a', 'b', 'c', 'd', 'e'], dtype='<U1')
>>> arr.keys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
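The question's arrays are 2-D and joined along axis=1, while the examples above use 1-D inputs. One way to adapt a structured array to that case, sketched here under the assumption of small hypothetical inputs, is to give each field a subarray shape so that a single field holds all of that input's columns with their original dtype:

```python
import numpy as np

# Hypothetical 2-D inputs like the question's: A has string columns, B has int columns.
A = np.array([["a", "b"], ["c", "d"], ["e", "f"]])   # shape (3, 2), dtype '<U1'
B = np.array([[1], [2], [3]], dtype=np.int64)         # shape (3, 1)

# One structured record per row; each field is a small subarray holding
# that input's columns with its original dtype.
dt = [("A", A.dtype, (A.shape[1],)), ("B", B.dtype, (B.shape[1],))]
combined = np.empty(A.shape[0], dtype=dt)
combined["A"] = A
combined["B"] = B

print(combined["A"].dtype, combined["B"].dtype)  # <U1 int64
print(combined[1])                               # second row, both fields
```

The trade-off is that `combined` is a 1-D array of shape `(3,)` rather than a `(3, 3)` 2-D array, so column-wise work goes through the field names (`combined["B"][:, 0]`) instead of plain column slicing.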

Comment:

arr = np.array([('cat', 5), ('dog', 20)], dtype=[('name', object), ('age', int)]); in a structured array the name column can then be accessed as arr['name'].
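Indexing by field name, as in arr['name'], is also the more robust choice on record arrays. Attribute access on a record array only falls back to a field when the name is not already an ndarray attribute, and the field called `data` in the first answer collides with `ndarray.data` (the raw memory buffer). A short sketch reusing that answer's field names:

```python
import numpy as np

a = np.array(["a", "b", "c"])
b = np.arange(3)
records = np.rec.fromarrays((a, b), names=("keys", "data"))

# 'keys' is not an ndarray attribute, so attribute access reaches the field.
print(records.keys)

# 'data' is also ndarray.data (the raw buffer), which shadows the field.
print(type(records.data))  # <class 'memoryview'>

# Indexing by field name always reaches the field, for record and structured arrays alike.
print(records["data"])
```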

A simple solution: convert your data to the object dtype ('O'):

import numpy as np

z = np.zeros((2, 2), dtype='U2')   # 2x2 block of empty strings
o = np.ones((2, 1), dtype='O')     # 2x1 block of Python int objects
np.hstack([o, z])

creates the array:

array([[1, '', ''],
       [1, '', '']], dtype=object)
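The object approach trades speed and memory for flexibility: each cell holds a reference to a separate Python object, so arithmetic on a column runs element by element at Python speed rather than as a native vectorized loop. A minimal sketch, with made-up inputs, of building such an array and casting one column back to a native dtype when numeric work is needed:

```python
import numpy as np

# Hypothetical inputs: one string column and one integer column.
strings = np.array([["x"], ["y"]])
ints = np.array([[1], [2]])

# Stacking with an object-dtype block yields an object array of Python objects.
mixed = np.hstack([ints.astype(object), strings])
print(mixed.dtype)  # object

# Before fast numeric work, a column is usually cast back to a native dtype.
int_col = mixed[:, 0].astype(np.int64)
print(int_col.sum())  # 3
```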

Comments:

This causes all kinds of problems down the line if you actually want to do any meaningful operations on the slices of that array.
What kind of problems ?
@Astrid could you elaborate on your thoughts?
Suppose, for argument's sake, that you turned that into a DataFrame and then wanted to filter it, say with df.loc[df.col == item]. Pandas expects the items in a column to share a type, so if you mix strings and integers in the same column you are effectively comparing apples and oranges, and pandas may throw an error.

According to the NumPy documentation, the function numpy.lib.recfunctions.merge_arrays can be used to merge NumPy arrays of different data types into either a structured array or a record array.

Example:

>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> A = np.array([1, 2, 3])
>>> B = np.array(['a', 'b', 'c'])
>>> b = rfn.merge_arrays((A, B))
>>> b
array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('f0', '<i4'), ('f1', '<U1')])
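The automatically chosen field names f0, f1 can be replaced with meaningful ones using rfn.rename_fields, and passing asrecarray=True to merge_arrays returns a record array with attribute access. A small sketch reusing the inputs above; the new names num and letter are arbitrary choices for illustration:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

A = np.array([1, 2, 3])
B = np.array(["a", "b", "c"])

# merge_arrays names the fields f0, f1, ...; rename_fields returns a view
# of the same data under new field names.
merged = rfn.merge_arrays((A, B))
named = rfn.rename_fields(merged, {"f0": "num", "f1": "letter"})
print(named.dtype.names)  # ('num', 'letter')

# asrecarray=True returns a record array, which adds attribute access.
rec = rfn.merge_arrays((A, B), asrecarray=True)
print(rec.f1)
```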

For more detail, see the numpy.lib.recfunctions documentation.
