76

I have a dataframe in which I would like to store 'raw' numpy.array:

df['COL_ARRAY'] = df.apply(lambda r: np.array(do_something_with_r), axis=1)

but it seems that pandas tries to 'unpack' the numpy.array.

Is there a workaround other than using a wrapper (see the edit below)?

I tried reduce=False with no success.

EDIT

This works, but I have to use the 'dummy' Data class to wrap around the array, which is unsatisfactory and not very elegant.

class Data:
    def __init__(self, v):
        self.v = v

meas = pd.read_excel(DATA_FILE)
meas['DATA'] = meas.apply(
    lambda r: Data(np.array(pd.read_csv(r['filename']))),
    axis=1
)

9 Answers

74

Use a wrapper around the NumPy array, i.e. pass the array inside a one-element list:

a = np.array([5, 6, 7, 8])
df = pd.DataFrame({"a": [a]})

Output:

             a
0  [5, 6, 7, 8]
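To check that the cell really holds an ndarray rather than a list, here is a minimal sketch. The name df_unpacked is only illustrative, and the behaviour described is what I would expect from a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

a = np.array([5, 6, 7, 8])

# Wrapping the array in a one-element list makes it a single cell value.
df = pd.DataFrame({"a": [a]})

cell = df.loc[0, "a"]
print(type(cell))  # the stored object should still be a numpy.ndarray
print(df.shape)    # (1, 1): one row, one column

# Without the list, the array's elements are spread over rows instead.
df_unpacked = pd.DataFrame({"a": a})
print(df_unpacked.shape)  # (4, 1)
```

The list is only the container pandas iterates over, so the array itself ends up in the cell unchanged.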

Or, given a DataFrame like the one below, you can build a tuple from each row and then call apply(np.array):

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})

df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)

Output :

     a    b  id            new
0   on   on   1    [on, on, 1]
1   on  off   2   [on, off, 2]
2  off   on   3   [off, on, 3]
3  off  off   4  [off, off, 4]

df['new'][0]

Output :

array(['on', 'on', '1'], dtype='<U2')
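Note that the id value comes back as the string '1', because np.array coerces mixed values to a common string dtype. If the original types matter, one option (a sketch, not part of the answer above) is to request dtype=object when building each array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})

# dtype=object keeps each element as-is instead of casting everything to str.
df['new'] = (df.apply(lambda r: tuple(r), axis=1)
               .apply(lambda t: np.array(t, dtype=object)))

first = df['new'][0]
print(first.dtype)  # object
print(first)        # the id stays a number, not the string '1'
```

The trade-off is that object arrays lose NumPy's vectorised speed, so this is mainly useful when the row genuinely mixes types.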
5 Comments

That works, but then I'd rather use a dummy class instead of a list.
Does that work if, instead of tuple(r), you do something like np.array([[1,2],[3,4]]), i.e. a 2-dim array?
tuple(r) works with 2D array too. Did you mean replacing tuple(r) with 2D np.array?
Yes. I mean, I understand your solution, and it works, but what if I want to have a 2D np.array in the new column (and not a 1D array as shown)?
Can you add the expected output to your question? All the elements in the row should be NumPy arrays if you want to create a new 2D array; my solution works in that case. If the row has mixed types, you have to use an if/else first to turn each element into a NumPy array.
33

If you first set a column to have type object, you can insert an array without any wrapping:

df = pd.DataFrame(columns=[1])
df[1] = df[1].astype(object)
df.loc[1, 1] = np.array([5, 6, 7, 8])
df

Output:

    1
1   [5, 6, 7, 8]

7

You can wrap each of the DataFrame's data arguments in square brackets to keep the np.array intact in each cell:

one_d_array = np.array([1,2,3])
two_d_array = one_d_array*one_d_array[:,np.newaxis]
two_d_array

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])


pd.DataFrame([
    [one_d_array],
    [two_d_array] ])

                                   0
0                          [1, 2, 3]
1  [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
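A quick way to confirm what each cell holds, including the 2D case, is to inspect the type and shape of the stored objects. This is a small sketch reusing the arrays above; the variable names cell_1d and cell_2d are just for illustration.

```python
import numpy as np
import pandas as pd

one_d_array = np.array([1, 2, 3])
two_d_array = one_d_array * one_d_array[:, np.newaxis]

# Each inner list is one row, so each array becomes a single cell.
df = pd.DataFrame([[one_d_array], [two_d_array]])

cell_1d = df.iloc[0, 0]
cell_2d = df.iloc[1, 0]
print(type(cell_1d), cell_1d.shape)
print(type(cell_2d), cell_2d.shape)
```

Both cells should report numpy.ndarray, with the second keeping its two dimensions.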

3 Comments

Those become lists no?
@javadba I set the above output to df and see print(type(df.iloc[0,0])) --> <class 'numpy.ndarray'> This is Pandas 0.23.0. Are you seeing something different with another version?
You are correct, actually. One of the more highly upvoted answers made it sound like the ndarray would be converted to a list. I just tested this: without the brackets we get rows in the df, and with brackets we get ndarrays, as you say.
2

Preset the dtype of your column to object; this allows you to store a NumPy array as-is:

df['COL_ARRAY'] = pd.Series(dtype='object')

df['COL_ARRAY'] = df.apply(lambda r: np.array(do_something_with_r), axis=1)
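For a runnable version of the two lines above, here is a sketch in which make_array is a hypothetical stand-in for do_something_with_r and the columns x and y are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})


def make_array(row):
    # Hypothetical stand-in for do_something_with_r:
    # builds an array whose length depends on the row.
    return np.arange(row["x"]) * row["y"]


# Preset the column as object dtype, then fill it row by row.
df["COL_ARRAY"] = pd.Series(dtype="object")
df["COL_ARRAY"] = df.apply(make_array, axis=1)

print(df["COL_ARRAY"].dtype)
print(type(df.loc[2, "COL_ARRAY"]))
```

Each cell then holds its own ndarray, and the arrays do not need to share a length.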

1

The built-in eval function is easy to use and easy to read.

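# First make sure the column has object dtype so it can store strings
df['col_2'] = df['col_2'].astype(object)
# Read: parse the stored string back into a Python object
arr_obj = eval(df.at[df[df.col_1 == 'xyz'].index[0], 'col_2'])
# Write: store the object as its string representation
df.at[df[df.col_1 == 'xyz'].index[0], 'col_2'] = str(arr_obj)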

The stored value is displayed in a human-readable form:

col_1,  col_2
xyz,    "['aaa', 'bbb', 'ccc', 'ddd']"

1 Comment

Using eval() is usually not a good idea. Transforming freely between strings and code is insecure and confuses the IDE/linter/type-checker/other people reading your code. Consider using the other methods mentioned in this post.
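If the cells only ever contain Python literals such as lists of strings, the standard library's ast.literal_eval is a safer way to parse them than eval, since it refuses anything that is not a literal. A minimal sketch, assuming a frame with the col_1 and col_2 columns shown in the answer:

```python
import ast

import pandas as pd

df = pd.DataFrame({"col_1": ["xyz"],
                   "col_2": ["['aaa', 'bbb', 'ccc', 'ddd']"]})

idx = df[df.col_1 == "xyz"].index[0]

# Read: literal_eval only accepts literals (lists, strings, numbers, ...).
arr_obj = ast.literal_eval(df.at[idx, "col_2"])
arr_obj.append("eee")

# Write: store the updated list back as its string representation.
df.at[idx, "col_2"] = str(arr_obj)
print(df.at[idx, "col_2"])
```

This keeps the human-readable storage format while avoiding the execution of arbitrary code from the file.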
1

Suppose you have a DataFrame ds with a column named 'class'. If ds['class'] contains strings or numbers and you want to replace some of them with numpy.ndarrays or lists, the following code would help. In the code, class2vector is a numpy.ndarray or list, and ds_class is the class value used as the filter condition.

ds['class'] = ds['class'].map(lambda x: class2vector if (isinstance(x, str) and (x == ds_class)) else x)

0

Just wrap what you want to store in a cell in a list object in the first apply, and extract it with index 0 of that list in the second apply:

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})


df['new'] = df.apply(lambda x: [np.array(x)], axis=1).apply(lambda x: x[0])

df

output:

    id  a       b       new
0   1   on      on      [1, on, on]
1   2   on      off     [2, on, off]
2   3   off     on      [3, off, on]
3   4   off     off     [4, off, off]

0

Here is my two cents (tested on Python 3.7):

import pandas as pd
import numpy as np

dataArray = np.array([0.0, 1.0, 2.0])
df = pd.DataFrame()
df['User Col A'] = [1]
df['Array'] = [dataArray]

0

If you only want some of the columns, you could do something like this. Taking the example of @allenyllee,

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})

df['new'] = df[['a','b']].apply(lambda x: np.array(x), axis=1)

which outputs

   id    a    b         new
0   1   on   on    [on, on]
1   2   on  off   [on, off]
2   3  off   on   [off, on]
3   4  off  off  [off, off]

You can also change the order of [['a', 'b']] if you need a specific order.
