1

I have the following csv file:

type    sku quantity    country account
Order   CHG-FOOD1COMP-CA    1   usa hch
Order   CHG-FOOD2COMP-CA    1   usa hch
Order   CHG-FOOD2COMP-CA    1   usa hch
Order   CHG-FOOD1COMP-CA    1   usa hch
Order   CHG-FOODCONT1-CA    1   usa hch
Order           usa hch
Order   Q7-QDH0-EBB5-CA 1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   Q7-QDH0-EBB5-CA 1   usa hch
Order   Q7-QDH0-EBB5-CA 1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   CB-BB-CLR12-CA  1   usa hch
Order   CB-BB-AMB12-CA  1   usa hch

Order           usa hch
Order   CB-BB-AMB12-CA  1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   CB-BB-CLR12-CA  1   usa hch
Order   CHG-FRY-12PT5-CA    1   usa hch
Order   CHG-FOODCONT1-CA    1   usa hch
Refund  CHG-FRY-9PT5-CA 1   usa hch
Order   CHG-FOOD1COMP-CA    1   usa hch

I want to get the total quantity per sku.

SQL: Select sku, sum(quantity) As TotalQty, country, account
     From (usa_chc_Date.csv)
     group by sku,...

I don't mind getting the sum first and then adding the country/account columns, which are always the same. My purpose is to store the info in these CSV files so they are easy to load into Django, and then delete the files. This is what I am looking for:

sku   TotalQty country account
sku1   7       mx     chc
sku3   4       mx     chc
sku4   2       mx     chc
sku5   1       mx     chc
sku6   7       mx     chc
sku7   9       mx     chc

I also named the file to include the country/account info. I guess I could just use the file name and strip the country and account from it as I save the model.

Side note: the accounts do not change since they are on the same report. Once they are loaded, the skus can have duplicates, but those duplicates have different countries.

I tried this:

 df = df.groupby(['sku','quantity']).sum()
2
  • Where are the brand/country columns in the data? It's unclear what you're trying to do with the sample data provided. Commented Sep 3, 2017 at 11:34
  • @Andrew I changed my question. Account and brand are the same, sorry. I hope it is a bit clearer now. I am trying to get a total per sku. So if SKU1 came up on 7 orders, and 2 of those orders had 2 each, and the rest had 1, TotalQty would say 9 and the row would be: sku | TotalQty | Country | Account Commented Sep 3, 2017 at 12:35

2 Answers

1

You're calling groupby on the wrong columns: grouping by "quantity" makes it a key instead of the value being summed.

Your question suggests that "country" and "account" are the same for all "sku". In this case you should use:

df.groupby(['sku', 'country', 'account'], as_index=False).quantity.sum()
Out []:
                sku country account  quantity
0    CB-BB-AMB12-CA     usa     hch         2
1    CB-BB-CLR12-CA     usa     hch         2
2  CHG-FOOD1COMP-CA     usa     hch         3
3  CHG-FOOD2COMP-CA     usa     hch         2
4  CHG-FOODCONT1-CA     usa     hch         2
5  CHG-FRY-12PT5-CA     usa     hch         4
6   CHG-FRY-9PT5-CA     usa     hch         1
7   Q7-QDH0-EBB5-CA     usa     hch         3

Note: I removed the two lines from your example that have neither a "sku" nor a "quantity". If these cases should be handled, just tell me in a comment.
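
If the blank rows should be handled in code rather than removed by hand, one option is to drop them before grouping. This is only a minimal sketch: it assumes the report is tab-separated, and the `raw` string, the variable names, and the `TotalQty` rename are illustrative stand-ins rather than anything taken from the real file.

```python
import io
import pandas as pd

# Illustrative stand-in for the report. For the real file, something like
# pd.read_csv(path, sep="\t") would be used (the tab separator is an assumption).
raw = (
    "type\tsku\tquantity\tcountry\taccount\n"
    "Order\tCHG-FOOD1COMP-CA\t1\tusa\thch\n"
    "Order\t\t\tusa\thch\n"
    "Order\tQ7-QDH0-EBB5-CA\t1\tusa\thch\n"
    "Order\tCHG-FOOD1COMP-CA\t1\tusa\thch\n"
    "Refund\tCHG-FRY-9PT5-CA\t1\tusa\thch\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Rows without a sku or a quantity have nothing to sum, so drop them.
# The missing values made 'quantity' a float column; cast it back to int.
clean = df.dropna(subset=["sku", "quantity"]).astype({"quantity": "int64"})

# Group by the columns that identify a row and sum only 'quantity'.
totals = (
    clean.groupby(["sku", "country", "account"], as_index=False)["quantity"]
    .sum()
    .rename(columns={"quantity": "TotalQty"})
)
print(totals)
```

Because `as_index=False` keeps the group keys as ordinary columns, the result already has the sku | TotalQty | country | account shape asked for in the question, just in a different column order.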


1 Comment

I tried that and it returned all the columns; you were right about the suggestion. I was also looking to add the header back, since it doesn't show up in the CSV.
0
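# Sum 'actual sales' per sku/Country/Account; the keys become the index
df = df.groupby(['sku','Country','Account'],as_index=True)['actual sales'].sum()
# Turn the index levels back into ordinary columns
df = df.reset_index()
# The summed column keeps the name 'actual sales' (not 0), so rename that
df.rename(columns={'actual sales':'count'}, inplace=True)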

I changed a column name for my own convenience; otherwise the change is irrelevant.
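
On the header that goes missing from the CSV: one way to keep it is to make sure the result is a DataFrame with the group keys as ordinary columns, then save it with `index=False`. The sketch below uses made-up rows and an in-memory buffer in place of a real file path; only the column names come from the snippet above.

```python
import io
import pandas as pd

# Made-up rows using the column names from the snippet above.
df = pd.DataFrame({
    "sku": ["A", "A", "B"],
    "Country": ["usa", "usa", "usa"],
    "Account": ["hch", "hch", "hch"],
    "actual sales": [1, 2, 1],
})

# Sum per group, then turn the keys back into columns and name the total.
totals = (
    df.groupby(["sku", "Country", "Account"])["actual sales"]
    .sum()
    .reset_index(name="count")
)

# The buffer stands in for a file path such as an output CSV.
buffer = io.StringIO()
totals.to_csv(buffer, index=False)  # header row is written by default
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `name="count"` to `reset_index` names the summed column in the same step, so no separate `rename` call is needed.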
