I have two dataframes. I need to use the first dataframe to add a new column to the second dataframe, which has the value TRUE if the row exists in the first dataframe and FALSE otherwise.

The first dataframe has the State and RegionName of university towns in the USA:

    State    RegionName
  0 Alabama  Auburn
  1 Alabama  Florence
  2 Alabama  Jacksonville
  3 Illinois Chicago

The second dataframe has growth rates per quarter. It is indexed on State and RegionName

                         2008q3         2008q4
State       RegionName                  
Alabama     Jacksonville 499766.666667  487933.333333
California  Los Angeles  469500.000000  443966.666667
Illinois    Chicago      232000.000000  227033.333333

So the output dataframe will be

                         2008q3         2008q4         univ_town
State       RegionName                  
Alabama     Jacksonville 499766.666667  487933.333333  TRUE
California  Los Angeles  469500.000000  443966.666667  FALSE
Illinois    Chicago      232000.000000  227033.333333  TRUE

Any help will be very much appreciated

  • This one is from the Coursera Introduction to Data Science... I just finished that course. You don't need to do what you describe: you can perform an inner merge to obtain a university_towns dataframe and then take the difference between your all_towns and university_towns dataframes. Check out the pandas Index.difference method. Commented Dec 29, 2016 at 13:49
  • Sorry, I should have mentioned that I have done that already; however, I was trying to see if there is a more Pythonic solution. Commented Dec 29, 2016 at 13:58
  • I understand, which is why I upvoted the answer. But added my solution as a comment in case you needed a quick fix :-) Commented Dec 29, 2016 at 14:07
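
For reference, the merge-and-difference idea from the first comment might look roughly like the sketch below. It is only an illustration under assumptions: the sample frames are rebuilt from the data shown in the question, and the names `university_towns`, `other_index`, and `other_towns` are made up for this example, not taken from the commenter's code.

```python
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama", "Illinois"],
    "RegionName": ["Auburn", "Florence", "Jacksonville", "Chicago"],
})
df2 = pd.DataFrame(
    {
        "2008q3": [499766.666667, 469500.0, 232000.0],
        "2008q4": [487933.333333, 443966.666667, 227033.333333],
    },
    index=pd.MultiIndex.from_tuples(
        [
            ("Alabama", "Jacksonville"),
            ("California", "Los Angeles"),
            ("Illinois", "Chicago"),
        ],
        names=["State", "RegionName"],
    ),
)

# Inner merge keeps only the df2 rows whose (State, RegionName) pair is in df1.
university_towns = (
    df2.reset_index()
    .merge(df1, on=["State", "RegionName"], how="inner")
    .set_index(["State", "RegionName"])
)

# Index.difference returns the df2 index entries that are not university towns.
other_index = df2.index.difference(university_towns.index)
other_towns = df2.loc[other_index]

print(university_towns)
print(other_towns)
```

This yields two separate frames rather than a single flag column, which is why the question asks for something more direct.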

1 Answer

One possible approach is to use the Index.isin method to check whether the keys at each level of df2's MultiIndex are present in the corresponding column of df1.

Then use np.where to assign 'TRUE' where both boolean masks hold (combined with &) and 'FALSE' otherwise.

import numpy as np

cond1 = df2.index.isin(df1['State'], level=0)       # level 0 (State) appears in df1['State']
cond2 = df2.index.isin(df1['RegionName'], level=1)  # level 1 (RegionName) appears in df1['RegionName']

# Note: the two levels are checked independently of each other.
df2.assign(univ_town=np.where(cond1 & cond2, 'TRUE', 'FALSE'))
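
One caveat with checking each level separately: a row such as (Alabama, Chicago) would be flagged, because "Alabama" appears in df1['State'] and "Chicago" appears in df1['RegionName'], even though that pair is not in df1. A possible variation, sketched below, matches whole (State, RegionName) pairs instead. It assumes a pandas version that provides MultiIndex.from_frame, and the names `univ_pairs`, `is_univ_town`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama", "Illinois"],
    "RegionName": ["Auburn", "Florence", "Jacksonville", "Chicago"],
})
df2 = pd.DataFrame(
    {
        "2008q3": [499766.666667, 469500.0, 232000.0],
        "2008q4": [487933.333333, 443966.666667, 227033.333333],
    },
    index=pd.MultiIndex.from_tuples(
        [
            ("Alabama", "Jacksonville"),
            ("California", "Los Angeles"),
            ("Illinois", "Chicago"),
        ],
        names=["State", "RegionName"],
    ),
)

# Build a MultiIndex of (State, RegionName) pairs so both levels are matched together.
univ_pairs = pd.MultiIndex.from_frame(df1[["State", "RegionName"]])

# With no `level` argument, isin compares full index tuples against univ_pairs.
is_univ_town = df2.index.isin(univ_pairs)

result = df2.assign(univ_town=np.where(is_univ_town, "TRUE", "FALSE"))
print(result)
```

On the sample data from the question both approaches give the same flags; they differ only when a state and a region name each occur in df1 but never together.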

