15

I'm trying to apply a function to a pandas DataFrame. The function requires two np.array inputs and fits them using a well-defined model.

The problem is that I can't apply this function to the selected columns, because each of their rows contains a list read from a JSON file and not an np.array.

Now, I've tried different solutions:

# Here is where I discovered the problem:

train_df['result'] = train_df.apply(my_function(train_df['col1'],train_df['col2']))

# So I tried to cast the Series before passing them to the function, in these ways:

X_col1_casted = train_df['col1'].dtype(np.array)
X_col2_casted = train_df['col2'].dtype(np.array)

doesn't work.

X_col1_casted = train_df['col1'].astype(np.array)
X_col2_casted = train_df['col2'].astype(np.array)

doesn't work.

X_col1_casted = train_df['col1'].dtype(np.array)
X_col2_casted = train_df['col2'].dtype(np.array)

doesn't work.

What I'm thinking of doing now is a longer procedure:

Starting from the uncast column Series, convert them to lists, iterate over them, apply the function to each pair of elements after converting them with np.array(), and append the results to a temporary list. Once done, I will turn this list into a new column. (Clearly, I don't know whether it will work.)
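A minimal sketch of the procedure described above, for illustration only. `my_function` here is a hypothetical stand-in (an element-wise sum) for the real model-fitting function, and the data mirrors the example given in the EDIT.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder for the real function, which needs two arrays.
def my_function(a, b):
    return a + b

train_df = pd.DataFrame({'col1': [[1, 2, 3], [0, 0, 0]],
                         'col2': [[4, 5, 6], [1, 2, 3]]})

# Walk both columns in parallel, converting each list to an array
# just before calling the function.
results = []
for x, y in zip(train_df['col1'], train_df['col2']):
    results.append(my_function(np.array(x), np.array(y)))

# One result per row, stored as a new column.
train_df['result'] = results
```

The original columns still hold lists afterwards; only the values passed to the function are converted.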

Does anyone know how to help me?

EDIT: I'm adding an example to be clear:

The function expects two np.arrays as input. Right now it gets two lists, since they are retrieved from a JSON file. The situation is this:

col1        col2    result
[1,2,3]     [4,5,6]  [5,7,9]
[0,0,0]     [1,2,3]  [1,2,3]

Clearly the function is not a sum but my own function. For the moment, assume that this sum works only on arrays and not on lists. What should I do?

Thanks in advance

5
  • Use the .values attribute to convert it into an array. Commented Sep 21, 2016 at 13:59
  • Could you also tell me how? I need to apply it to single cell elements, not to the whole columns in one shot. I need one array per row. Commented Sep 21, 2016 at 14:00
  • what do you mean one array per row? I understood from the question that you want to convert a whole column to a numpy array. Commented Sep 21, 2016 at 14:03
  • I've edited the question with an example. The function works per row and expects np.array inputs, not lists. That's the point. I hope it's clear now. Commented Sep 21, 2016 at 14:10
  • I actually have the opposite requirement: my pandas DataFrame has numpy.ndarray values that I want to convert to lists so that they can be stored in a DynamoDB table. Does anyone have any input on how I can do that? Commented Nov 17, 2022 at 18:37

2 Answers

29

Use apply to convert each element to its equivalent array:

df['col1'] = df['col1'].apply(lambda x: np.array(x))

type(df['col1'].iloc[0])
numpy.ndarray

Data:

df = pd.DataFrame({'col1': [[1,2,3],[0,0,0]]})
df
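As a sketch of how this could be extended to the two-column case from the question: convert both columns, then call the function row by row with `DataFrame.apply(..., axis=1)`. `my_function` is a hypothetical placeholder (an element-wise sum that rejects lists), not the asker's actual function.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder: accepts only numpy arrays, like the real function.
def my_function(a, b):
    if not (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)):
        raise TypeError("expected numpy arrays")
    return a + b

df = pd.DataFrame({'col1': [[1, 2, 3], [0, 0, 0]],
                   'col2': [[4, 5, 6], [1, 2, 3]]})

# Convert every cell of both columns from list to ndarray.
for col in ['col1', 'col2']:
    df[col] = df[col].apply(np.array)

# axis=1 passes one row at a time, so each call sees one array per column.
df['result'] = df.apply(lambda row: my_function(row['col1'], row['col2']), axis=1)
```

The columns keep dtype `object`; only the individual cells become arrays.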


2 Comments

df['col1'] = df['col1'].apply(np.array) works as well
I came here because I wanted to get one big np.array from a column of lists (i.e. no pandas types at all). For those who want that, you can do this: np.stack(df['col1']) (e.g. necessary for keras)
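A short sketch of the `np.stack` suggestion from the comment above, assuming every list in the column has the same length (otherwise stacking fails). The data is the same toy frame used in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [[1, 2, 3], [0, 0, 0]]})

# Stack the per-row lists into a single 2-D array: one row per DataFrame row.
stacked = np.stack(df['col1'])

print(stacked.shape)
```

Unlike the per-cell conversion, this produces one array for the whole column rather than one array per row.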
0

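You can apply pd.Series to each list. This expands the lists into columns, so the result is a single 2-D array for the whole column rather than one array per row, e.g.: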

>>> X_train = df.col1.apply(pd.Series).to_numpy()

>>> type(X_train)
numpy.ndarray
