Store different datatypes in one NumPy array?

Question

I have two different arrays, one with strings and another with ints. I want to concatenate them into one array where each column keeps its original datatype. My current solution (see below) converts the entire array to a string dtype, which seems very memory-inefficient.

combined_array = np.concatenate((A, B), axis = 1)

Is it possible to have multiple dtypes in combined_array when A.dtype = string and B.dtype = int?

The question is about using a NumPy array. However, if having a NumPy array is not essential, then a Pandas DataFrame would work well for this situation.
– crayzeewulf, Commented May 3, 2015 at 1:42

senderle · Accepted Answer · answered 2012-07-03, last edited 2023-06-26 22:49:35Z by Nate Anderson

57

One approach might be to use a record array. The "columns" won't be like the columns of standard numpy arrays, but for most use cases, this is sufficient:

>>> import numpy
>>> a = numpy.array(['a', 'b', 'c', 'd', 'e'])
>>> b = numpy.arange(5)
>>> records = numpy.rec.fromarrays((a, b), names=('keys', 'data'))
>>> records
rec.array([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)], 
      dtype=[('keys', '|S1'), ('data', '<i8')])
>>> records['keys']
rec.array(['a', 'b', 'c', 'd', 'e'], 
      dtype='|S1')
>>> records['data']
array([0, 1, 2, 3, 4])

Note that you can also do something similar with a standard array by specifying the datatype of the array. This is known as a "structured array":

>>> arr = numpy.array([('a', 0), ('b', 1)], 
                      dtype=([('keys', '|S1'), ('data', 'i8')]))
>>> arr
array([('a', 0), ('b', 1)], 
      dtype=[('keys', '|S1'), ('data', '<i8')])
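For the case in the question, where the two arrays already exist, a minimal sketch is to allocate an empty structured array and fill it field by field. The arrays `A` and `B` below are hypothetical stand-ins for the question's data, and the field names `keys` and `data` are just carried over from the example above:

```python
import numpy as np

# Hypothetical stand-ins for the question's string and int arrays.
A = np.array(['a', 'b', 'c'])
B = np.array([10, 20, 30], dtype=np.int64)

# One field per source array, reusing each array's own dtype.
combined = np.empty(len(A), dtype=[('keys', A.dtype), ('data', B.dtype)])
combined['keys'] = A
combined['data'] = B

# The int field is still a real integer array, so numeric operations work.
total = combined['data'].sum()
```

Each field keeps its own dtype, so nothing is converted to strings, which is the memory concern raised in the question.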

The difference between the two is that record arrays also allow attribute access to individual data fields. Standard structured arrays do not.

>>> records.keys
chararray(['a', 'b', 'c', 'd', 'e'], 
      dtype='|S1')
>>> arr.keys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
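If attribute access is wanted on an existing structured array, one option is to view it as a `numpy.recarray`. The sketch below assumes a Python 3 session, so the string field uses a Unicode dtype (`'U1'`) rather than the `'|S1'` shown above:

```python
import numpy as np

# Same layout as the structured array above, with a Unicode string field.
arr = np.array([('a', 0), ('b', 1)],
               dtype=[('keys', 'U1'), ('data', 'i8')])

# Viewing as np.recarray shares the memory and adds attribute access.
rec = arr.view(np.recarray)
keys_by_attr = rec.keys      # same values as arr['keys']
data_by_key = rec['data']    # key access still works on the view

# Caveat: ndarray's own attributes are looked up first, so rec.data is
# the array's underlying buffer, not the 'data' field. Use rec['data'].
buffer_not_field = rec.data
```

Because of that name clash, key access is the safer habit whenever a field name might match an existing array attribute.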

edited Jun 26, 2023 at 22:49

Nate Anderson


answered Jul 3, 2012 at 11:41

senderle



1 Comment

Bharath Ram Over a year ago

With arr = np.array([('cat', 5), ('dog', 20)], dtype=[('name', object), ('age', int)]), the name column can be accessed as arr['name'] in a structured array.

astromancer · Accepted Answer · 2017-05-18 21:47:41Z

14

A simple solution: convert your data to object 'O' type

import numpy as np

z = np.zeros((2,2), dtype='U2')
o = np.ones((2,1), dtype='O')
np.hstack([o, z])

creates the array:

array([[1, '', ''],
       [1, '', '']], dtype=object)
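One thing to keep in mind with this approach is that every column of the result has dtype object, including the numeric one. A small sketch of what that means in practice (the variable names are illustrative only):

```python
import numpy as np

z = np.zeros((2, 2), dtype='U2')
o = np.ones((2, 1), dtype='O')
combined = np.hstack([o, z])

# Slicing out the numeric column still gives an object array.
first_col = combined[:, 0]

# An explicit conversion is needed to get a numeric dtype back.
as_int = first_col.astype(np.int64)
```

The object array stores references to Python objects rather than packed values, so the per-column dtype information the question asks for is not preserved.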

answered May 18, 2017 at 21:47

astromancer


4 Comments

Astrid Over a year ago

This causes all kinds of problems down the line if you actually want to do any meaningful operations on the slices of that array.

matthieu Over a year ago

What kind of problems ?

flow2k Over a year ago

@Astrid could you elaborate on your thoughts?

Astrid Over a year ago

Suppose, for argument's sake, that you turned that into a dataframe and then wanted to filter it with, say, df.loc[(df.col == item)]. That would not work, because when pandas does the filtering it expects all the items to be of the same type. So if, for example, you mixed strings and integers in the same column, you would effectively be comparing apples and oranges, and pandas would throw an error.

lX-Xl · Accepted Answer · 2020-03-09 23:05:26Z

3

According to the NumPy documentation, there is a function named numpy.lib.recfunctions.merge_arrays which can be used to merge NumPy arrays of different data types into either a structured array or a record array.

Example:

>>> import numpy as np
>>> from numpy.lib import recfunctions as rfn
>>> A = np.array([1, 2, 3])
>>> B = np.array(['a', 'b', 'c'])
>>> b = rfn.merge_arrays((A, B))
>>> b
array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('f0', '<i4'), ('f1', '<U1')])

For more detail, please refer to the NumPy documentation for numpy.lib.recfunctions.
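As a follow-up sketch, the default field names `f0` and `f1` can be replaced with `rfn.rename_fields`, and `merge_arrays` accepts `asrecarray=True` to return a record array instead. The names `number` and `label` below are made up for illustration:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

A = np.array([1, 2, 3])
B = np.array(['a', 'b', 'c'])

# Fields are named 'f0', 'f1', ... by default.
merged = rfn.merge_arrays((A, B))

# Give the fields more descriptive (hypothetical) names.
named = rfn.rename_fields(merged, {'f0': 'number', 'f1': 'label'})

# Ask for a record array to get attribute access to the fields.
as_records = rfn.merge_arrays((A, B), asrecarray=True)
labels = as_records.f1
```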

answered Mar 9, 2020 at 23:05

lX-Xl

