Use pandas dataframe apply to replace row values from a numpy array

Question

I have a pandas dataframe whose Labels column holds integer values from 0 to 4.

df.head()
            Labels
Date
2020-01-02       0
2020-01-03       0
2020-01-06       1
2020-01-07       2
2020-01-08       2

I have a numpy array

np_arr
array([[12., 17., 10.,  3.],
       [10., 23.,  9.,  6.],
       [16.,  9.,  5.,  9.],
       [17., 22., 14.,  9.],
       [19., 14., 10.,  8.]])

I have another dataframe, filled with zeros, with the same index (and number of rows) as df.

df_final.head()
           col_0  col_1  col_2  col_3  
Date
2020-01-02    0.0    0.0    0.0    0.0 
2020-01-03    0.0    0.0    0.0    0.0 
2020-01-06    0.0    0.0    0.0    0.0 
2020-01-07    0.0    0.0    0.0    0.0 
2020-01-08    0.0    0.0    0.0    0.0

I would like to use apply on df_final to replace row values from np_arr based on the Labels value of dataframe df.

For example, if the label in the first row of df is x:

x = df['Labels'].iloc[0]
df_final.iloc[0] = np_arr[x]

Thank you for your help.

nizarcan · Accepted Answer · 2021-03-01 14:24:24Z


The code below should do the trick. It is also much more efficient than operating row by row, because indexing np_arr with the whole Labels column looks up every row in a single NumPy operation.

df_final = pd.DataFrame(np_arr[df.Labels].reshape(df_final.shape[0], df_final.shape[1]), index=df_final.index, columns=df_final.columns)
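To make the one-liner easier to try, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the same integer-array indexing. The `columns` list and the `rows` variable are names introduced here for illustration, and the `reshape` call is left out on the assumption that `np_arr[labels]` already has one row per label. A row-by-row `apply` version is included only for comparison.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
dates = pd.to_datetime(
    ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
)
df = pd.DataFrame({"Labels": [0, 0, 1, 2, 2]}, index=pd.Index(dates, name="Date"))

np_arr = np.array([
    [12., 17., 10., 3.],
    [10., 23., 9., 6.],
    [16., 9., 5., 9.],
    [17., 22., 14., 9.],
    [19., 14., 10., 8.],
])

# Column names assumed to follow the col_0 ... col_3 pattern from the question.
columns = [f"col_{i}" for i in range(np_arr.shape[1])]

# Integer-array indexing: row i of `rows` is np_arr[labels[i]].
rows = np_arr[df["Labels"].to_numpy()]
df_final = pd.DataFrame(rows, index=df.index, columns=columns)
print(df_final)

# Row-by-row equivalent with apply, shown only for comparison (slower).
df_apply = df["Labels"].apply(lambda x: pd.Series(np_arr[x], index=columns))
```

Both `df_final` and `df_apply` should hold the same values; the indexed version avoids building one Series per row.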

edited Mar 1, 2021 at 14:24

answered Mar 1, 2021 at 14:01

nizarcan


SultanOrazbayev · Accepted Answer · 2021-03-01 14:14:00Z

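Here's another approach: convert np_arr to a dataframe and merge it with df on the Labels column. The example uses random sample data. You also added a dask tag, so the code at the end shows how to do the same merge with dask.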

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(5, size=(10,1)), columns=['Labels'])
np_arr = np.random.randint(1, 10, size=(5,4))
np_arr_to_df = pd.DataFrame(np_arr, columns=[f'col_{x}' for x in range(4)])

df_final = pd.merge(df, np_arr_to_df, how='left', left_on=['Labels'], right_index=True)
print(df_final)

#    Labels  col_0  col_1  col_2  col_3
# 0       0      6      2      5      1
# 1       4      8      8      3      2
# 2       2      8      5      5      1
# 3       1      2      3      1      3
# 4       0      6      2      5      1
# 5       4      8      8      3      2
# 6       1      2      3      1      3
# 7       0      6      2      5      1
# 8       4      8      8      3      2
# 9       0      6      2      5      1


import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=2)
ddf_final = dd.merge(ddf, np_arr_to_df, how='left', left_on='Labels', right_index=True)

# if ddf_final is large, you do not want to use .compute() until necessary
print(ddf_final.compute())

#    Labels  col_0  col_1  col_2  col_3
# 0       0      6      2      5      1
# 1       4      8      8      3      2
# 2       2      8      5      5      1
# 3       1      2      3      1      3
# 4       0      6      2      5      1
# 5       4      8      8      3      2
# 6       1      2      3      1      3
# 7       0      6      2      5      1
# 8       4      8      8      3      2
# 9       0      6      2      5      1
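
The merge above runs on random data with a default integer index. As a rough sketch of how it might look on the date-indexed data from the question (pandas only, no dask), the block below merges on Labels and then drops that column to get the layout of df_final. The name `lookup` is introduced here for illustration; the sample values are copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
dates = pd.to_datetime(
    ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
)
df = pd.DataFrame({"Labels": [0, 0, 1, 2, 2]}, index=pd.Index(dates, name="Date"))

np_arr = np.array([
    [12., 17., 10., 3.],
    [10., 23., 9., 6.],
    [16., 9., 5., 9.],
    [17., 22., 14., 9.],
    [19., 14., 10., 8.],
])

# Lookup table: the row with index label k holds np_arr[k].
lookup = pd.DataFrame(np_arr, columns=[f"col_{i}" for i in range(np_arr.shape[1])])

# Left merge: match each value in Labels against the index of the lookup table.
merged = pd.merge(df, lookup, how="left", left_on="Labels", right_index=True)

# Drop the key column so only col_0 ... col_3 remain, as in df_final.
df_final = merged.drop(columns="Labels")
print(df_final)
```

In this sketch the result keeps the Date index of df, so no re-indexing step should be needed afterwards.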
