Python Pandas merge only certain columns

Question

Is it possible to merge only some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc.

I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b - not the entire DataFrame.

The result would be a DataFrame with x, y, z, a, b.

I could merge then delete the unwanted columns, but it seems like there is a better method.

Andy: Holy cow that was easy... I need a break, I'm obviously making this too complicated. Thanks for the clarity!
– BubbleGuppies, Commented Jul 31, 2013 at 19:07

Arthur D. Howland · Accepted Answer · 2017-03-13 14:18:52Z

301

You want to use TWO brackets (the outer pair indexes the DataFrame, the inner pair is a list of column names), so if you are doing a VLOOKUP sort of action:

df = pd.merge(df, df2[['Key_Column', 'Target_Column']], on='Key_Column', how='left')

This gives you everything in the original df plus the one corresponding column from df2 that you want to join.
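In terms of the question's frames, a minimal sketch might look like the following. The sample values (and the extra column c) are invented for illustration; only the column names x, y, z, a, b come from the question.

```python
import pandas as pd

# Sample data invented for illustration; column names follow the question.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})
df2 = pd.DataFrame({
    "x": [1, 2, 4],
    "a": ["a1", "a2", "a4"],
    "b": ["b1", "b2", "b4"],
    "c": [0.1, 0.2, 0.4],  # a column we do not want in the result
})

# Double brackets select a sub-DataFrame: the join key plus the wanted columns.
result = pd.merge(df1, df2[["x", "a", "b"]], on="x", how="left")
print(result)
```

With how='left', every row of df1 is kept, and rows without a match in df2 get missing values in a and b.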

answered Mar 13, 2017 at 14:18

Arthur D. Howland



5 Comments

Gathide Over a year ago

Can Target_Column be a list of columns?

rmmariano Over a year ago

I believe this should be the accepted answer. @BubbleGuppies

Cornelius Roemer Over a year ago

@Gathide Yes, there can be multiple target columns like df2[['key','target1','target2']]

Andyrey Over a year ago

What are df, 'Key_Column','Target_Column' , why the answer is not in terms of the question?

Mitchell Leefers Over a year ago

@Andyrey The columns inside the double brackets are all of the columns you are using from the data frame you are merging in. You could have any number of 'Key_Columns' and 'Target_Columns'. You want to make sure you include any columns you want to match values on as well as merge into the new data frame.

beroe · Accepted Answer · 2015-10-27 07:05:27Z

109

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])
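Note that list('xab') only works here because every column name is a single character. A small sketch of the difference, using invented data and a hypothetical multi-character column called long_name:

```python
import pandas as pd

# list() splits a string into its individual characters.
single_char_cols = list("xab")

# Invented sample data; 'long_name' is a hypothetical column for illustration.
df1 = pd.DataFrame({"x": [1, 2], "y": [5, 6]})
df2 = pd.DataFrame({"x": [1, 2], "a": [7, 8], "b": [9, 10], "long_name": [11, 12]})

# Works because each column name is exactly one character long.
merged_short = df1.merge(df2[single_char_cols])

# For longer names, write the list out explicitly and include the join key.
merged_long = df1.merge(df2[["x", "long_name"]])
```

Passing a multi-character name through list() would split it into letters, which is why an explicit list of names is the safer habit.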

edited Oct 27, 2015 at 7:05

beroe


answered Jul 31, 2013 at 18:46

Andy Hayden


4 Comments

Andy Hayden Over a year ago

Hmmm, I wonder if there should be a native way to do this, like subset in dropna... will put together github issue

CoolDocMan Over a year ago

Hmmm ... I tried using this to merge column 'Unique_External_Users' from df2 to df1 but got an error ... "None of [Index(['U', 'n', 'i', 'q', 'u', 'e', '_', 'E', 'x', 't', 'e', 'r', 'n', 'a',\n 'l', '_', 'U', 's', 'e', 'r', 's'],\n dtype='object')] are in the [columns]".

CoolDocMan Over a year ago

Here's the code: df1.merge(df2[list('Unique_External_Users')])

SOf_PUAR Over a year ago

@CoolDocMan I think you missed something from the proposed answer: list('xab') takes each element (letter) of the string 'xab' and converts it to a list element so list('xab') returns ['x', 'a', 'b']. That works if each column has a single letter as a name. In your case I think you need to do df1.merge(df2['Unique_External_Users'], *other_arguments). ...Most probably you already solved it by now, just leaving this for newbies around, like me

tonneofash · Accepted Answer · 2022-02-08 14:41:18Z

48

If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:

df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop(columns = ['key1'])

The .drop(columns = ['key1']) part removes 'key1' from the resulting data frame, even though it was required for the join in the first place.
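To see why the drop is needed, here is a small sketch with invented data. The column names key1, key2, a, b follow the answer; y and c are hypothetical extras.

```python
import pandas as pd

# Invented sample data; 'key2' and 'key1' hold the same identifiers
# under different names in the two frames.
df1 = pd.DataFrame({"key2": [1, 2, 3], "y": ["p", "q", "r"]})
df2 = pd.DataFrame({
    "key1": [1, 2],
    "a": [0.1, 0.2],
    "b": [True, False],
    "c": ["unused", "unused"],
})

merged = df1.merge(
    df2[["a", "b", "key1"]], how="left", left_on="key2", right_on="key1"
)
# With different key names, both key columns survive the merge:
# key2, y, a, b, key1
cleaned = merged.drop(columns=["key1"])
```

When both frames use the same key name and you pass on=..., pandas keeps a single key column, so no drop is needed in that case.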

edited Feb 8, 2022 at 14:41

answered Oct 14, 2019 at 10:14

tonneofash


5 Comments

Tanya Branagan Over a year ago

I get the following error if I try this: KeyError: "['key1'] not found in axis"

psangam Over a year ago

try .drop(columns= ['key1'])

tonneofash Over a year ago

Or .drop('key1', axis = 1)

maciejwww Over a year ago

or shorter: .drop('key1', 1) (passing axis positionally only works in pandas versions before 2.0)

pas-calc Over a year ago

Very good point. This is different from SQL, where you can SELECT df1..., df1..., df2.a, df2.b FROM df1 LEFT JOIN df2 ON df1.key2 = df2.key1 without needing to select either df1.key2 or df2.key1. It is also a good idea to replace the original df1 with the new df1, so that the merge simply adds the new columns.

Ajean · Accepted Answer · 2016-12-22 01:09:53Z

12

You can use .iloc to select specific columns by position (with all rows) and merge just that slice. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2 with a left join on 'key'. For dataframe2, .iloc lets you specify the rows and columns you want by integer position: the : selects all rows, and 0:5 selects the first 5 columns. Note that the 'key' column must be among the selected columns. You could use .loc to select by name instead, but if you're dealing with long column names, .iloc may be more convenient.
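A small sketch comparing positional and label-based column selection before a merge. The data and the column names v, c1, c2, c3 are invented for illustration, and the slice takes three columns rather than five to keep the frame small.

```python
import pandas as pd

# Invented sample data for illustration.
dataframe1 = pd.DataFrame({"key": [1, 2, 3], "v": ["a", "b", "c"]})
dataframe2 = pd.DataFrame({
    "key": [1, 2],
    "c1": [10, 20],
    "c2": [30, 40],
    "c3": [50, 60],
})

# Positional slice: all rows, first three columns (key, c1, c2).
subset_by_position = dataframe2.iloc[:, 0:3]

# Label-based equivalent; unlike .iloc, a .loc slice includes the end label.
subset_by_label = dataframe2.loc[:, "key":"c2"]

merged = pd.merge(dataframe1, subset_by_position, how="left", on="key")
```

Positional selection is compact but silently changes meaning if the column order of dataframe2 changes, so selecting by name is usually more robust.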

edited Dec 22, 2016 at 1:09

Ajean


answered Dec 14, 2016 at 20:33

Terrance DeJesus


1 Comment

smci Over a year ago

Beware that .loc will make a copy, and on a large df that can be painful. It might be better to merge then immediately take a column slice in the same expression.

nick · Accepted Answer · 2019-01-14 23:19:22Z

10

This is to merge selected columns from two tables.

If table_1 contains t1_a,t1_b,t1_c..,id,..t1_z columns, and table_2 contains t2_a, t2_b, t2_c..., id,..t2_z columns, and only t1_a, id, t2_a are required in the final table, then

mergedCSV = table_1[['t1_a', 'id']].merge(table_2[['t2_a', 'id']], on='id', how='left')
# save resulting output file
mergedCSV.to_csv('output.csv', index=False)
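As a rough sketch of the same idea with in-memory data (the values and the extra columns t1_b, t2_b are invented, and the CSV export is left out so nothing is written to disk):

```python
import pandas as pd

# Invented sample tables; only t1_a, id, and t2_a are wanted in the result.
table_1 = pd.DataFrame({"id": [1, 2, 3], "t1_a": ["a", "b", "c"], "t1_b": [0, 0, 0]})
table_2 = pd.DataFrame({"id": [1, 3], "t2_a": ["x", "z"], "t2_b": [9, 9]})

# Trim both sides to the needed columns (each keeps the key) before merging.
merged = table_1[["t1_a", "id"]].merge(table_2[["t2_a", "id"]], on="id", how="left")
print(merged)
```

Trimming the left table as well means no columns have to be dropped after the merge.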

edited Jan 14, 2019 at 23:19

nick


answered May 22, 2017 at 21:48

Marco167



Cornelius Roemer · Accepted Answer · 2021-07-07 16:05:59Z

3

Slight extension of the accepted answer for multi-character column names, using inner join by default:

df1 = df1.merge(df2[["Key_Column", "Target_Column1", "Target_Column2"]])

This assumes that Key_Column is the only column both dataframes have in common.
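The reason for that assumption: when on is omitted, merge joins on every column name the two frames have in common. A sketch with invented data, where shared is a hypothetical second column present in both frames:

```python
import pandas as pd

# Invented sample data; 'shared' is a hypothetical column present in both frames.
df1 = pd.DataFrame({"Key_Column": [1, 2], "shared": ["u", "v"]})
df2 = pd.DataFrame({
    "Key_Column": [1, 2],
    "shared": ["u", "DIFFERENT"],
    "Target_Column1": [10, 20],
})
right = df2[["Key_Column", "shared", "Target_Column1"]]

# No `on`: pandas joins on all common columns (Key_Column AND shared),
# so the row where 'shared' differs is lost in the inner join.
implicit = df1.merge(right)

# Explicit `on`: only Key_Column is matched; the other common column
# is kept from both sides with the default _x/_y suffixes.
explicit = df1.merge(right, on="Key_Column")
```

Passing on explicitly is a cheap way to avoid surprises when the frames might share more than the key.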

answered Jul 7, 2021 at 16:05

Cornelius Roemer



Youssuf Abdula · Accepted Answer · 2025-05-08 10:12:09Z

0

import pandas as pd

tabela_contratos = pd.read_excel("Contratos.xlsx")
tabela_emails = pd.read_excel("Emails.xlsx")

# perform the merge
tabela_final = pd.merge(
    tabela_contratos,
    tabela_emails,
    left_on="Cliente",
    right_on="Nome",
    how="inner",
)

display(tabela_final)  # display() is available in Jupyter/IPython sessions

answered May 8 at 10:12

Youssuf Abdula


1 Comment

Dalija Prasnikar May 8 at 13:42

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. Also, Stack Overflow is an English-only site. Thus, all answers must be written in English.
