I have a pandas DataFrame, df, whose Labels column holds integer values from 0 to 4.

df.head()
            Labels
Date
2020-01-02       0
2020-01-03       0
2020-01-06       1
2020-01-07       2
2020-01-08       2

I have a NumPy array with one row per label value:

np_arr
array([[12., 17., 10.,  3.],
       [10., 23.,  9.,  6.],
       [16.,  9.,  5.,  9.],
       [17., 22., 14.,  9.],
       [19., 14., 10.,  8.]])

I have another DataFrame, df_final, filled with zeros. It has the same index as df and one column per column of np_arr.

df_final.head()
           col_0  col_1  col_2  col_3  
Date
2020-01-02    0.0    0.0    0.0    0.0 
2020-01-03    0.0    0.0    0.0    0.0 
2020-01-06    0.0    0.0    0.0    0.0 
2020-01-07    0.0    0.0    0.0    0.0 
2020-01-08    0.0    0.0    0.0    0.0 

I would like to use apply on df_final to replace each row with the row of np_arr selected by the corresponding Labels value in df.

For example:

x = df['Labels'].iloc[0]       # label of the first row
df_final.iloc[0] = np_arr[x]   # copy the matching row of np_arr

Thank you for your help.

2 Answers

You don't need apply here. Indexing np_arr with the whole Labels column selects the matching row for every date in one vectorized step, which is much more efficient than operating row by row.

df_final = pd.DataFrame(np_arr[df['Labels'].to_numpy()], index=df_final.index, columns=df_final.columns)
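
For anyone who wants to run this end to end, here is a minimal sketch built on the sample data from the question. The way df is constructed and the name `rows` are my own assumptions for illustration; only the indexing step comes from the line above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
dates = pd.to_datetime(
    ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
)
df = pd.DataFrame({"Labels": [0, 0, 1, 2, 2]}, index=pd.Index(dates, name="Date"))
np_arr = np.array(
    [
        [12.0, 17.0, 10.0, 3.0],
        [10.0, 23.0, 9.0, 6.0],
        [16.0, 9.0, 5.0, 9.0],
        [17.0, 22.0, 14.0, 9.0],
        [19.0, 14.0, 10.0, 8.0],
    ]
)
columns = [f"col_{i}" for i in range(np_arr.shape[1])]

# Integer-array indexing picks np_arr[label] for every label at once,
# producing an array of shape (len(df), np_arr.shape[1]).
rows = np_arr[df["Labels"].to_numpy()]

# Wrap the result in a DataFrame that reuses the Date index of df.
df_final = pd.DataFrame(rows, index=df.index, columns=columns)
print(df_final)
```

The first two dates both carry label 0, so both get a copy of np_arr[0]. Every label must be a valid row position in np_arr, otherwise NumPy raises an IndexError.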

Here's another approach: turn np_arr into a lookup DataFrame and left-merge it onto df by Labels. Since the question is also tagged dask, the code at the end shows the same merge with dask.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(5, size=(10,1)), columns=['Labels'])
np_arr = np.random.randint(1, 10, size=(5,4))
np_arr_to_df = pd.DataFrame(np_arr, columns=[f'col_{x}' for x in range(4)])

df_final = pd.merge(df, np_arr_to_df, how='left', left_on=['Labels'], right_index=True)
print(df_final)

#    Labels  col_0  col_1  col_2  col_3
# 0       0      6      2      5      1
# 1       4      8      8      3      2
# 2       2      8      5      5      1
# 3       1      2      3      1      3
# 4       0      6      2      5      1
# 5       4      8      8      3      2
# 6       1      2      3      1      3
# 7       0      6      2      5      1
# 8       4      8      8      3      2
# 9       0      6      2      5      1


import dask.dataframe as dd

ddf = dd.from_pandas(df, npartitions=2)
ddf_final = dd.merge(ddf, np_arr_to_df, how='left', left_on='Labels', right_index=True)

# if ddf_final is large, you do not want to use .compute() until necessary
print(ddf_final.compute())

#    Labels  col_0  col_1  col_2  col_3
# 0       0      6      2      5      1
# 1       4      8      8      3      2
# 2       2      8      5      5      1
# 3       1      2      3      1      3
# 4       0      6      2      5      1
# 5       4      8      8      3      2
# 6       1      2      3      1      3
# 7       0      6      2      5      1
# 8       4      8      8      3      2
# 9       0      6      2      5      1
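
The examples above use random data and keep the Labels column in the result. As a rough sketch of how the same merge could be applied to the date-indexed data from the question, the snippet below drops Labels afterwards so only col_0 to col_3 remain. The names `lookup` and `merged` are mine, and the sample frames are reconstructed from the question.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
dates = pd.to_datetime(
    ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
)
df = pd.DataFrame({"Labels": [0, 0, 1, 2, 2]}, index=pd.Index(dates, name="Date"))
np_arr = np.array(
    [
        [12.0, 17.0, 10.0, 3.0],
        [10.0, 23.0, 9.0, 6.0],
        [16.0, 9.0, 5.0, 9.0],
        [17.0, 22.0, 14.0, 9.0],
        [19.0, 14.0, 10.0, 8.0],
    ]
)

# Lookup table: the row position in np_arr becomes the index to join on.
lookup = pd.DataFrame(np_arr, columns=[f"col_{i}" for i in range(np_arr.shape[1])])

# Left merge: one output row per row of df, matched on Labels == lookup index.
merged = pd.merge(df, lookup, how="left", left_on="Labels", right_index=True)

# Drop the key column so the layout matches df_final in the question.
df_final = merged.drop(columns="Labels")
print(df_final)
```

With how='left', a label that has no matching row in the lookup table yields NaN in that row instead of raising an error, which can be handy when the labels are not guaranteed to be valid.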
