The first dataframe is:

   data_date cookie_type   dau  next_dau  dau_7  dau_15
0   20181006    avg(0-d)  2288       NaN    NaN     NaN
1   20181006    avg(e-f)  2284       NaN    NaN     NaN
2   20181007    avg(e-f)  2296       NaN    NaN     NaN

The second dataframe is:

  data_date cookie_type  next_dau
0  20181006    avg(e-f)       908
1  20181006    avg(0-d)       904

How can I update the first dataframe's next_dau from the second one? I have tried combine_first and fillna, but they do not seem to support a multi-index:

cols = ['data_date', 'cookie_type']

if frame1 is not None and not frame1.empty:
    frame1.set_index(cols)
    print(frame1)
    print(next_day_dau)
    frame1.combine_first(next_day_dau.set_index(cols))
    frame1.combine_first(dau_7.set_index(cols))
    frame1.combine_first(dau_15.set_index(cols))
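
One possible reason the attempt above shows no effect, independent of the multi-index: set_index and combine_first both return a new DataFrame rather than modifying frame1 in place, so their results have to be kept. A minimal sketch, assuming frames shaped like the samples in the question (the literal values are re-typed from the tables above, and only next_dau is filled):

```python
import numpy as np
import pandas as pd

cols = ["data_date", "cookie_type"]

# Sample data re-typed from the question (an assumption about the real frames).
frame1 = pd.DataFrame({
    "data_date": [20181006, 20181006, 20181007],
    "cookie_type": ["avg(0-d)", "avg(e-f)", "avg(e-f)"],
    "dau": [2288, 2284, 2296],
    "next_dau": [np.nan] * 3,
    "dau_7": [np.nan] * 3,
    "dau_15": [np.nan] * 3,
})
next_day_dau = pd.DataFrame({
    "data_date": [20181006, 20181006],
    "cookie_type": ["avg(e-f)", "avg(0-d)"],
    "next_dau": [908, 904],
})

# Keep the returned objects: neither call changes frame1 in place.
indexed = frame1.set_index(cols)
filled = indexed.combine_first(next_day_dau.set_index(cols))

# combine_first may reorder rows and columns, so look values up by label.
print(filled.loc[(20181006, "avg(e-f)"), "next_dau"])
```

The same pattern would apply to dau_7 and dau_15 by chaining further combine_first calls on the returned frame.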

Finally, I solved this problem with help from "tianhua liao":

            frame1.index = frame1.data_date.astype(str) + frame1.cookie_type
            next_day_dau.index = next_day_dau.data_date.astype(str) + next_day_dau.cookie_type
            dau_7.index = dau_7.data_date.astype(str) + dau_7.cookie_type
            dau_15.index = dau_15.data_date.astype(str) + dau_15.cookie_type
            # Boolean masks: which rows of frame1 have a matching key in each lookup frame
            next_day_dau_idx = frame1.index.isin(next_day_dau.index)
            dau_7_idx = frame1.index.isin(dau_7.index)
            dau_15_idx = frame1.index.isin(dau_15.index)
            # Fill each column only where a match exists; values are aligned on the key
            if any(next_day_dau_idx):
                frame1.loc[next_day_dau_idx, "next_dau"] = next_day_dau.next_dau
            if any(dau_7_idx):
                frame1.loc[dau_7_idx, "dau_7"] = dau_7.dau_7
            if any(dau_15_idx):
                frame1.loc[dau_15_idx, "dau_15"] = dau_15.dau_15

1 Answer

Working with a multi-index can be complicated.

Here is a simpler way to solve it: build a single string key from the two columns and use it as the index.

frame1.index = frame1.data_date.astype(str) + frame1.cookie_type
frame2.index = frame2.data_date.astype(str) + frame2.cookie_type

frame1.loc[frame2.index,"next_dau"] = frame2.next_dau

Once processing is complete, you can remove the index.
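
To make the steps concrete, here is a minimal end-to-end sketch on the sample data from the question. The frame literals are re-typed from the tables above, and using reset_index(drop=True) to remove the helper index is one option among several, not necessarily what the answer had in mind:

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question (an assumption about the real frames).
frame1 = pd.DataFrame({
    "data_date": [20181006, 20181006, 20181007],
    "cookie_type": ["avg(0-d)", "avg(e-f)", "avg(e-f)"],
    "dau": [2288, 2284, 2296],
    "next_dau": [np.nan] * 3,
})
frame2 = pd.DataFrame({
    "data_date": [20181006, 20181006],
    "cookie_type": ["avg(e-f)", "avg(0-d)"],
    "next_dau": [908, 904],
})

# Build one string key per row, e.g. "20181006avg(e-f)".
frame1.index = frame1.data_date.astype(str) + frame1.cookie_type
frame2.index = frame2.data_date.astype(str) + frame2.cookie_type

# The assigned Series is aligned on the key, so frame2's row order does not matter.
frame1.loc[frame2.index, "next_dau"] = frame2.next_dau

# Drop the helper key and go back to a plain positional index.
frame1 = frame1.reset_index(drop=True)
print(frame1)
```

This assumes every key in frame2 also exists in frame1 and that the keys are unique.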

3 Comments

If frame2 does not contain the index, it raises an error: "Index(['20180930avg(0-d)'], dtype='object') not in index"
This may happen with the latest pandas. In my version (0.23.0), .loc raises a warning that passing any missing label will raise a KeyError in the future, and the warning also gives some advice about this. You could run frame1 = frame1.reindex(frame2.index) first, and after that use frame1.loc[frame2.index, "next_dau"] = frame2.next_dau.
I use index.isin to do this:
next_day_dau_idx = frame1.index.isin(next_day_dau.index)
if any(next_day_dau_idx):
    frame1.loc[next_day_dau_idx, "next_dau"] = next_day_dau.next_dau
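
The error quoted in these comments names a key that is missing from the frame being indexed with .loc. As a rough sketch of another way to avoid it, the assignment can be limited to keys present in both frames with Index.intersection. The extra 20180930 row and its value of 900 are made up for illustration; the other values are re-typed from the question:

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame({
    "data_date": [20181006, 20181006, 20181007],
    "cookie_type": ["avg(0-d)", "avg(e-f)", "avg(e-f)"],
    "dau": [2288, 2284, 2296],
    "next_dau": [np.nan] * 3,
})
# Hypothetical: frame2 carries one extra key (20180930) that frame1 lacks.
frame2 = pd.DataFrame({
    "data_date": [20180930, 20181006, 20181006],
    "cookie_type": ["avg(0-d)", "avg(e-f)", "avg(0-d)"],
    "next_dau": [900, 908, 904],
})

frame1.index = frame1.data_date.astype(str) + frame1.cookie_type
frame2.index = frame2.data_date.astype(str) + frame2.cookie_type

# Only keys present in both frames are passed to .loc, so no label is missing.
common = frame1.index.intersection(frame2.index)
frame1.loc[common, "next_dau"] = frame2.loc[common, "next_dau"]
print(frame1)
```

Unlike reindexing frame1 to frame2's keys, this leaves frame1's rows unchanged and simply skips keys that exist only in frame2.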
