
Suppose I have the following DataFrames:

Containers:

Key ContainerCode       Quantity
1   P-A1-2097-05-B01    0
2   P-A1-1073-13-B04    0
3   P-A1-2024-09-H05    0
5   P-A1-2018-08-C05    0
6   P-A1-2089-03-C08    0
7   P-A1-3033-16-H07    0
8   P-A1-3035-18-C02    0
9   P-A1-4008-09-G01    0

Inventory:

Key SKU     ContainerCode       Quantity
1   22-3-1  P-A1-4008-09-G01    1
2   2132-12 P-A1-3033-16-H07    55
3   222-12  P-A1-4008-09-G01    3
4   4561-3  P-A1-3083-12-H01    126

How do I update the Quantity values in Containers to reflect the number of units in each container based on the information in Inventory? Note that multiple SKUs can reside in a single ContainerCode, so we need to add to the quantity, rather than just replace it, and there may be multiple entries in Containers for a particular ContainerCode.

What are the possible ways to accomplish this, and what are their relative pros and cons?

EDIT

The following code seems to serve as a good test case:

import pandas as pd

inventory = pd.DataFrame({'Container Code':['A1','A2','A2','A4'],
                               'Quantity':[10,87,2,44],
                               'SKU':['123-456','234-567','345-678','456-567']})

containers = pd.DataFrame({'Container Code':['A1','A2','A3','A4'],
                               'Quantity':[2,0,8,4],
                               'Path Order':[1,2,3,4]})

summedInventory = inventory.groupby('Container Code')['Quantity'].sum()

print('Containers Data Frame')
print(containers)
print('\nInventory Data Frame')
print(inventory)
print('\nSummed Inventory List')
print(summedInventory)
print('\n')

newContainers = containers.drop('Quantity', axis=1). \
     join(inventory.groupby('Container Code').sum(), on='Container Code')
print(newContainers)

This seems to produce the desired output.

I also tried using a regular merge:

pd.merge(containers.drop('Quantity', axis=1), summedInventory,
         how='inner', left_on='Container Code', right_index=True)

But that raises 'IndexError: list index out of range'.

Any ideas?

  • Using pd.merge(containers.drop('Quantity', axis=1), summedInventory, how='inner', left_on='Container Code', right_on='Container Code') produces KeyError: 'Container Code'. I want to use pd.merge() because in my real-world application the inventory DF has different column names than the container DF. Commented Nov 7, 2014 at 16:59
  • It looks like the error with pd.merge() comes from the line summedInventory = inventory.groupby('Container Code')['Quantity'].sum(). If we delete the ['Quantity'] portion, it goes through just fine. Commented Nov 7, 2014 at 17:36
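
The observation in the comments above suggests the merge fails because selecting ['Quantity'] turns the grouped result into a Series rather than a DataFrame. A minimal sketch of one workaround, assuming the test data from the question: calling reset_index() turns the summed Series back into a DataFrame that has 'Container Code' as an ordinary column, so it can be merged by name. The names summed_series, summed_frame and result are illustrative only, and whether a bare Series is accepted by pd.merge() may depend on the pandas version.

```python
import pandas as pd

inventory = pd.DataFrame({'Container Code': ['A1', 'A2', 'A2', 'A4'],
                          'Quantity': [10, 87, 2, 44],
                          'SKU': ['123-456', '234-567', '345-678', '456-567']})

containers = pd.DataFrame({'Container Code': ['A1', 'A2', 'A3', 'A4'],
                           'Quantity': [2, 0, 8, 4],
                           'Path Order': [1, 2, 3, 4]})

# Selecting ['Quantity'] yields a Series indexed by 'Container Code'.
summed_series = inventory.groupby('Container Code')['Quantity'].sum()

# reset_index() moves the index back into a regular column,
# giving a two-column DataFrame: 'Container Code' and 'Quantity'.
summed_frame = summed_series.reset_index()

# Both sides now have a 'Container Code' column to merge on.
result = pd.merge(containers.drop('Quantity', axis=1), summed_frame,
                  how='inner', on='Container Code')
print(result)
```

With how='inner', containers that have no inventory rows (A3 here) are dropped from the result; how='left' would keep them with a missing Quantity.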

1 Answer


I hope I understood your scenario correctly. I think you can use:

containers.drop('Quantity', axis=1) \
          .join(inventory.groupby('ContainerCode').sum(),
                on='ContainerCode')
  1. First, drop Quantity from containers, because you don't need it; we'll recreate it from inventory.
  2. Then, group inventory by container code and sum the quantities for each container.
  3. Finally, join the two: each ContainerCode that exists in containers receives its summed quantity from inventory.
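
The three steps above can be sketched end to end as follows. This is only an illustration on made-up data shaped like the question's test case; the names summed, replaced and added are hypothetical. It also shows two things the steps do not cover: containers with no inventory rows come back as NaN after the join, and, if the existing Quantity should be added to rather than replaced (as the question mentions), Series.map is one possible way to do that.

```python
import pandas as pd

containers = pd.DataFrame({'ContainerCode': ['A1', 'A2', 'A3', 'A4'],
                           'Quantity': [2, 0, 8, 4]})

inventory = pd.DataFrame({'ContainerCode': ['A1', 'A2', 'A2', 'A4'],
                          'Quantity': [10, 87, 2, 44],
                          'SKU': ['123-456', '234-567', '345-678', '456-567']})

# Step 2: sum units per container. Selecting 'Quantity' keeps the
# SKU strings out of the aggregation.
summed = inventory.groupby('ContainerCode')['Quantity'].sum()

# Steps 1 and 3: drop the old Quantity, then join the summed values
# (a named Series indexed by ContainerCode) onto containers.
replaced = containers.drop('Quantity', axis=1).join(summed, on='ContainerCode')

# Containers without inventory rows (A3) get NaN; treat them as 0 units.
replaced['Quantity'] = replaced['Quantity'].fillna(0).astype(int)

# Variant (assumption: existing quantities should be kept and added to).
added = containers.copy()
added['Quantity'] += added['ContainerCode'].map(summed).fillna(0).astype(int)

print(replaced)
print(added)
```

Both variants return new objects and leave the original containers DataFrame unchanged.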

4 Comments

Will this modify the original DataFrame, or create a new instance?
@TraxusIV This will create a new one. You can do the drop in place, but to the best of my knowledge you can't join in place.
It works, but how would we handle a case where the column names are different between the left and right DataFrames?
@TraxusIV From your comments on the question, it seems like you've solved it, correct?
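
For the case raised in the comments, where the key column is named differently in the two DataFrames, one option is DataFrame.merge with left_on and right_on. The sketch below is an assumption-laden illustration: the inventory column names 'Bin' and 'Units' are invented for the example and do not come from the question.

```python
import pandas as pd

containers = pd.DataFrame({'Container Code': ['A1', 'A2', 'A3', 'A4'],
                           'Path Order': [1, 2, 3, 4]})

# Hypothetical column names: 'Bin' plays the role of the container
# code and 'Units' the role of the quantity.
inventory = pd.DataFrame({'Bin': ['A1', 'A2', 'A2', 'A4'],
                          'Units': [10, 87, 2, 44]})

# as_index=False keeps 'Bin' as a regular column in the summed result.
summed = inventory.groupby('Bin', as_index=False)['Units'].sum()

merged = (containers
          .merge(summed, how='left', left_on='Container Code', right_on='Bin')
          .drop(columns='Bin')                    # duplicate key column
          .rename(columns={'Units': 'Quantity'}))

# A left merge keeps containers with no inventory; fill those with 0.
merged['Quantity'] = merged['Quantity'].fillna(0).astype(int)
print(merged)
```

A left merge is used here so that every container row is preserved, which matches the behaviour of the join in the answer.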
