How to update values in a DataFrame based on values in another DataFrame?

Question

Suppose I have the following DataFrames:

Containers:

Key ContainerCode       Quantity
1   P-A1-2097-05-B01    0
2   P-A1-1073-13-B04    0
3   P-A1-2024-09-H05    0
5   P-A1-2018-08-C05    0
6   P-A1-2089-03-C08    0
7   P-A1-3033-16-H07    0
8   P-A1-3035-18-C02    0
9   P-A1-4008-09-G01    0

Inventory:

Key SKU     ContainerCode       Quantity
1   22-3-1  P-A1-4008-09-G01    1
2   2132-12 P-A1-3033-16-H07    55
3   222-12  P-A1-4008-09-G01    3
4   4561-3  P-A1-3083-12-H01    126

How do I update the Quantity values in Containers to reflect the number of units in each container based on the information in Inventory? Note that multiple SKUs can reside in a single ContainerCode, so we need to add to the quantity, rather than just replace it, and there may be multiple entries in Containers for a particular ContainerCode.

What are the possible ways to accomplish this, and what are their relative pros and cons?

EDIT

The following code seems to serve as a good test case:

import pandas as pd

inventory = pd.DataFrame({'Container Code': ['A1', 'A2', 'A2', 'A4'],
                          'Quantity': [10, 87, 2, 44],
                          'SKU': ['123-456', '234-567', '345-678', '456-567']})

containers = pd.DataFrame({'Container Code': ['A1', 'A2', 'A3', 'A4'],
                           'Quantity': [2, 0, 8, 4],
                           'Path Order': [1, 2, 3, 4]})

summedInventory = inventory.groupby('Container Code')['Quantity'].sum()

print('Containers Data Frame')
print(containers)
print('\nInventory Data Frame')
print(inventory)
print('\nSummed Inventory List')
print(summedInventory)
print('\n')

newContainers = (containers.drop('Quantity', axis=1)
                 .join(inventory.groupby('Container Code').sum(), on='Container Code'))
print(newContainers)

This seems to produce the desired output.

I also tried using a regular merge:

pd.merge(containers.drop('Quantity', axis=1), \
    summedInventory,how='inner',left_on='Container Code', right_index=True)

But that raises 'IndexError: list index out of range'.

Any ideas?

Using pd.merge(containers.drop('Quantity', axis=1), summedInventory, how='inner', left_on='Container Code', right_on='Container Code') produces KeyError: 'Container Code'. I want to use pd.merge() because the inventory DF can have different column names than the container DF (which is in fact the case in my real-world application).
– PTTHomps, Commented Nov 7, 2014 at 16:59
It looks like the pd.merge() error comes from the line summedInventory = inventory.groupby('Container Code')['Quantity'].sum(). If we delete the ['Quantity'] portion, it goes through just fine.
– PTTHomps, Commented Nov 7, 2014 at 17:36
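
The errors described in these comments are consistent with summedInventory being a Series indexed by container code rather than a DataFrame that has a 'Container Code' column. One possible way around this, sketched below under that assumption, is to keep the grouping key as a regular column and then merge with left_on and right_on, which also covers the case where the two frames use different column names. The column name 'Bin' is hypothetical and only stands in for "a differently named key column".

```python
import pandas as pd

containers = pd.DataFrame({
    'Container Code': ['A1', 'A2', 'A3', 'A4'],
    'Quantity': [2, 0, 8, 4],
    'Path Order': [1, 2, 3, 4],
})
# Hypothetical inventory whose key column is named 'Bin' instead of 'Container Code'.
inventory = pd.DataFrame({
    'Bin': ['A1', 'A2', 'A2', 'A4'],
    'Quantity': [10, 87, 2, 44],
    'SKU': ['123-456', '234-567', '345-678', '456-567'],
})

# as_index=False keeps 'Bin' as a regular column, so the result is a DataFrame
# that merge can match on with right_on.
summed = inventory.groupby('Bin', as_index=False)['Quantity'].sum()

merged = pd.merge(
    containers.drop('Quantity', axis=1),
    summed,
    how='left',                 # keep containers that have no inventory rows
    left_on='Container Code',
    right_on='Bin',
).drop('Bin', axis=1)           # the duplicate key column is no longer needed

print(merged)
```

With how='left', a container that has no inventory rows (A3 in this data) stays in the result with a missing Quantity; how='inner' would drop it instead.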

tktk · Accepted Answer · 2014-11-06 20:19:36Z


I hope I understood your scenario correctly. I think you can use:

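# Sum only Quantity per container, so the other inventory columns (Key, SKU) stay out of the join.
totals = inventory.groupby('ContainerCode')['Quantity'].sum()
containers.drop('Quantity', axis=1).join(totals, on='ContainerCode')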

First, drop Quantity from containers, since you don't need it: it will be recreated from inventory.
Then group inventory by container code and sum the quantity for each container.
Finally, join the two: each container code present in containers receives its summed quantity from inventory. Container codes that do not appear in inventory end up with a missing value (NaN).
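
As a minimal, self-contained sketch of the steps above, here is the same approach run on the test data from the question's edit (where the column is named 'Container Code', with a space). Filling missing values with 0 and the second variant that adds to the existing quantities are assumptions about the desired result, not part of the answer as posted.

```python
import pandas as pd

inventory = pd.DataFrame({
    'Container Code': ['A1', 'A2', 'A2', 'A4'],
    'Quantity': [10, 87, 2, 44],
    'SKU': ['123-456', '234-567', '345-678', '456-567'],
})
containers = pd.DataFrame({
    'Container Code': ['A1', 'A2', 'A3', 'A4'],
    'Quantity': [2, 0, 8, 4],
    'Path Order': [1, 2, 3, 4],
})

# Total units per container; selecting 'Quantity' keeps SKU out of the sum.
totals = inventory.groupby('Container Code')['Quantity'].sum()

# Replace the old Quantity column with the summed inventory.
# Containers with no inventory rows (A3 here) come back as NaN, so fill with 0.
replaced = containers.drop('Quantity', axis=1).join(totals, on='Container Code')
replaced['Quantity'] = replaced['Quantity'].fillna(0).astype(int)

# Assumed variant: add the inventory totals to the existing quantities instead.
added = containers.copy()
added['Quantity'] = (
    added['Quantity'] + added['Container Code'].map(totals).fillna(0).astype(int)
)

print(replaced)
print(added)
```

Which variant applies depends on whether the Quantity already stored in containers should be kept; the question says to add to it, while the join above replaces it.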

answered Nov 6, 2014 at 20:19


4 Comments

PTTHomps Over a year ago

Will this modify the original DataFrame, or create a new instance?

tktk Over a year ago

@TraxusIV This will create a new one. You can do the drop in place, but to the best of my knowledge you can't join in place.

PTTHomps Over a year ago

It works, but how would we handle a case where the column names are different between the left and right DataFrames?

tktk Over a year ago

@TraxusIV Your comments on the question suggest you've solved it, correct?
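
On the in-place question raised in the comments above: join returns a new DataFrame, but one possible alternative is to assign the summed totals back onto the existing Quantity column, which modifies containers itself. This is only a sketch using the question's test data; using map for the lookup and filling containers without inventory with 0 are assumptions, not something the answer proposes.

```python
import pandas as pd

inventory = pd.DataFrame({
    'Container Code': ['A1', 'A2', 'A2', 'A4'],
    'Quantity': [10, 87, 2, 44],
    'SKU': ['123-456', '234-567', '345-678', '456-567'],
})
containers = pd.DataFrame({
    'Container Code': ['A1', 'A2', 'A3', 'A4'],
    'Quantity': [2, 0, 8, 4],
    'Path Order': [1, 2, 3, 4],
})

# Series indexed by container code, holding the total units per container.
totals = inventory.groupby('Container Code')['Quantity'].sum()

# Look up each container's total by its code and overwrite the column.
# Column assignment changes `containers` itself; no new DataFrame is created.
containers['Quantity'] = (
    containers['Container Code'].map(totals).fillna(0).astype(int)
)

print(containers)
```

Because the lookup goes through each row's container code, repeated entries for the same code in containers would each receive the same total.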
