1

My question is an expansion of the one answered here:

Adding two pandas dataframes

Assuming the same dataframes but with a new column containing strings.

import pandas as pd

df1 = pd.DataFrame([('Dog',1,2),('Cat',3,4),('Rabbit',5,6)], columns=['Animal','a','b'])

df2 = pd.DataFrame([('Dog',100,200),('Cat',300,400),('Rabbit',500,600)], columns=['Animal','a','b']

Using the solution would create this:

df_add = df1.add(df2, fill_value=0)

Out: 

       Animal        a    b
    0  DogDog       101  202
    1  CatCat       303  404
    2  RabbitRabbit 505  606

A potential solution could be just to index the Animal column and then run the .add function and then unindex the animal column again. But is there a more simple way that just adjusts this formula df_add = df1.add(df2, fill_value=0)so that the following solution is given:

Out: 

       Animal     a    b
    0  Dog       101  202
    1  Cat       303  404
    2  Rabbit    505  606

I tried df_add.iloc[:,1:] = df1.iloc[:,1:].add(df2.iloc[:,1:], fill_value=0) and it did not work.

3
  • what if you had "Parrot" in place of rabbit in df1? What should be the expected result? Commented Aug 1, 2024 at 8:30
  • 1
    Setting the keys temporarily as index is the best approach IMO. Commented Aug 1, 2024 at 8:34
  • Also, can you have duplicated keys in either input? If yes, what should happen? Commented Aug 1, 2024 at 8:39

2 Answers 2

1

Your question is not fully clear. Pandas performs addition operations after aligning the indexes (index + columns), thus if you want to ensure Dog is added to Dog irrespective of its position, setting the index is the way to go:

key = ['Animal']

out = df1.set_index(key).add(df2.set_index(key), fill_value=0).reset_index()

Output:

   Animal    a    b
0     Dog  101  202
1     Cat  303  404
2  Rabbit  505  606

This will ensure that there is no mismatch. For example:

df1 = pd.DataFrame([('Cat',3,4),('Dog',1,2),('Parrot',8,9)], columns=['Animal','a','b'])
df2 = pd.DataFrame([('Dog',100,200),('Cat',300,400),('Rabbit',800,00)], columns=['Animal','a','b'])
                   
key = ['Animal']
df_add = df1.set_index(key).add(df2.set_index(key), fill_value=0).reset_index()

Output:

   Animal      a      b
0     Cat  303.0  404.0
1     Dog  101.0  202.0
2  Parrot    8.0    9.0
3  Rabbit  500.0  600.0

Now, if your DataFrames are already aligned. I.e. the Animals are in the same order and the DataFrame indices are identical. You could use quick trick to ignore the Animal column: set that of df2 as empty string:

df_add = df1.add(df2.assign(Animal=''))

Output:

   Animal    a    b
0     Dog  101  202
1     Cat  303  404
2  Rabbit  505  606

This is however risky if you're not fully sure if the animals and DataFrame indices are identical.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, in my case i am sure the indices are identical. But for safety i will use your 1st suggestion.
1

I would recommend you to keep the columns for the key and the aggregation as separate variables so the code would be prettier and more flexible:

key_columns = ['Animal']
sum_columns = ['a', 'b']

After that all you need to do is to concatenate the dataframes and aggregate them by the key columns:

pd.concat([df1, df2], ignore_index=True) \
    .fillna(0) \
    .groupby(key_columns) \
    .agg({col: 'sum' for col in sum_columns}) \
    .reset_index()

In case if you don't want to list all the columns in sum_columns, you can calculate it from all column names in df1.columns and key_columns like this, for example:

key_columns = ['Animal']
sum_columns = list(set(df1.columns) - set(key_columns))

Enjoy!

PS: fillna(0) fills NaN values with 0, reset_index() restores Animal as a column, because it became an index after groupby...agg procedure.

2 Comments

Thank you i think this is also a very useful solution, i like it. My only concern is that if you have a scenario where the number of columns is large (>10) writing out all the column names in sum_columns may not be practical.
@Dinks123 I edited the answer. You can calculate aggregation names as it mentioned.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.