If I have 2 dataframes like these two:

import pandas as pd

df1 = pd.DataFrame({'Type':list('AABAC')})
df2 = pd.DataFrame({'Type':list('ABCDEF'), 'Value':[1,2,3,4,5,6]})

  Type
0    A
1    A
2    B
3    A
4    C

  Type  Value
0    A      1
1    B      2
2    C      3
3    D      4
4    E      5
5    F      6

I would like to add a column in df1 based on the values in df2. df2 only contains unique values, whereas df1 has multiple entries of each value. So the resulting df1 should look like this:

  Type Value
0    A     1
1    A     1
2    B     2
3    A     1
4    C     3

My actual dataframe df1 is quite long, so I need something that is efficient (I tried it in a loop but this takes forever).

5
  • by 'the values' do you just mean the column 'Value' in df2? Commented Aug 5, 2016 at 9:13
  • There are lots of similar questions and approaches to this, have you considered merge for instance? Commented Aug 5, 2016 at 9:14
  • Yes, exactly, based on the column 'Value' Commented Aug 5, 2016 at 9:15
  • pd.merge will probably do it for you then. Commented Aug 5, 2016 at 9:19
  • @EdChum In the actual dataframe df2 I have more than one column, but I need only the information from 1 of those. Furthermore, when using merge, the resulting dataframe is sorted by 'Type' which I don't want: df3 = pd.merge(df1, df2, on="Type") will sort df3.Type = A A A B C and not keep the initial order of df1 Commented Aug 5, 2016 at 9:21
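The merge route from the comments can still be made to fit both concerns raised there. This is only a minimal sketch: the `Other` column is a hypothetical stand-in for the unused columns in the real df2. Selecting just the two needed columns before merging keeps the rest out of the result, and `how='left'` is documented to preserve the key order of the left frame.

```python
import pandas as pd

df1 = pd.DataFrame({'Type': list('AABAC')})
# 'Other' is a hypothetical extra column, standing in for the unused
# columns the asker mentions in the real df2.
df2 = pd.DataFrame({'Type': list('ABCDEF'),
                    'Value': [1, 2, 3, 4, 5, 6],
                    'Other': list('uvwxyz')})

# Pass only the key and the wanted column to merge; how='left' keeps
# every row of df1 in its original order.
df3 = df1.merge(df2[['Type', 'Value']], on='Type', how='left')
print(df3)
```

Rows of df1 whose Type has no match in df2 would get NaN in Value with a left merge.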

3 Answers 3

4

As requested, I am posting a solution that uses map without the need to create a temporary dict:

In[3]:
df1['Value'] = df1['Type'].map(df2.set_index('Type')['Value'])
df1

Out[3]: 
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3

This relies on two things. First, every key being looked up should exist in df2; map does not raise for a missing key, it leaves NaN in that row. Second, df2 must not contain duplicate Type entries; set_index itself accepts duplicates, but map on the resulting Series then raises InvalidIndexError: Reindexing only valid with uniquely valued Index objects
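To make those two conditions concrete, here is a small sketch with made-up data: df1 contains a key (`'Z'`) that is absent from df2, and df2 lists `'A'` twice. As far as I can tell, in recent pandas versions a key missing from the lookup Series yields NaN rather than an exception, and duplicates can be dropped per key with `drop_duplicates(subset=...)` before setting the index. Keeping the first row per Type is an assumption; which duplicate is the right one depends on the data.

```python
import pandas as pd

# Hypothetical data: 'Z' has no match in df2, and 'A' appears twice in df2.
df1 = pd.DataFrame({'Type': list('AABAZ')})
df2 = pd.DataFrame({'Type': list('ABCA'), 'Value': [1, 2, 3, 9]})

# Keep the first row per Type so the lookup index is unique.
lookup = df2.drop_duplicates(subset='Type').set_index('Type')['Value']

# Unmatched keys become NaN, so the column is upcast to float.
df1['Value'] = df1['Type'].map(lookup)
print(df1)
```

If unmatched rows should be treated as errors instead, checking `df1['Value'].isna().any()` afterwards is one way to detect them.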


2 Comments

It may help that one can always remove duplicate entries on df2. For example, one can use: df2.drop_duplicates().reset_index(drop=True)
@JairoAlves the OP has no duplicates in df2, if it did you wouldn't be able to set the index to the Type column
2

You could create a dict from df2 with the to_dict method and then map it onto the Type column of df1:

replace_dict = dict(df2.to_dict('split')['data'])

In [50]: replace_dict
Out[50]: {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}

df1['Value'] = df1['Type'].map(replace_dict)

In [52]: df1
Out[52]:
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3
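Note that `dict(df2.to_dict('split')['data'])` only works when df2 has exactly two columns, since each row must be a (key, value) pair. When df2 carries additional columns, one possible variant is to build the dict from the two relevant columns explicitly. A minimal sketch, where `Other` is a hypothetical extra column:

```python
import pandas as pd

df1 = pd.DataFrame({'Type': list('AABAC')})
# 'Other' is a hypothetical extra column that should be ignored.
df2 = pd.DataFrame({'Type': list('ABCDEF'),
                    'Value': [1, 2, 3, 4, 5, 6],
                    'Other': list('uvwxyz')})

# Pair up only the key column and the value column.
replace_dict = dict(zip(df2['Type'], df2['Value']))

df1['Value'] = df1['Type'].map(replace_dict)
print(df1)
```

If df2 had duplicate Type entries, the last one would silently win in the dict, so this variant does not raise on duplicates.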

9 Comments

you could have set the index to 'Type' on df2 so df1['Value'] = df1['Type'].map(df2.set_index('Type')['Value']) would've worked also
@EdChum this works with the above example, but not with my full dataset pandas.core.index.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
It helps us if you post a representative example to stop wasting our time trying to help you, if you post a simplistic example you get a simplistic answer. Post either a representative example or your real data
@anton this doesn't work with my full dataset, as df2 has several columns which are however not used (I probably should have mentioned this in the first place, sorry!)
@Cleb have posted my answer now
0

Another way to do this is with the label-based indexer loc. First make the Type column the index of df2 with .set_index, then select rows using df1's Type column, and finally turn Type back into a regular column with .reset_index:

df2.set_index('Type').loc[df1['Type'],:].reset_index()

Either use this as your new df1 or extract the Value column:

df1['Value'] = df2.set_index('Type').loc[df1['Type'],:].reset_index()['Value']
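One caveat with the assignment above: it aligns on the index, so it relies on df1 having the default 0..n-1 index that `reset_index` produces. A sketch of a variant that assigns by position instead, assuming a df1 with a non-default index (the index values here are made up):

```python
import pandas as pd

# Hypothetical df1 whose index is not the default 0..n-1 range.
df1 = pd.DataFrame({'Type': list('AABAC')}, index=[10, 11, 12, 13, 14])
df2 = pd.DataFrame({'Type': list('ABCDEF'), 'Value': [1, 2, 3, 4, 5, 6]})

# Look up one Value per row of df1; the result is indexed by Type.
looked_up = df2.set_index('Type').loc[df1['Type'], 'Value']

# Assign the raw values so that no index alignment takes place.
df1['Value'] = looked_up.to_numpy()
print(df1)
```

Unlike map, `.loc` raises a KeyError in current pandas versions if df1 contains a Type that is missing from df2.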
