python combine rows in dataframe and add up values

Question

I have a dataframe:

 Type:  Volume:
 Q     10
 Q     20 
 T     10 
 Q     10
 T     20
 T     20
 Q     10

and I want to combine the type T rows into one row and add up their volumes, but only when two (or more) T rows are consecutive,

i.e. to:

 Q    10
 Q    20 
 T    10 
 Q    10 
 T    20+20=40
 Q    10

Is there any way to achieve this? Would DataFrame.groupby work?

This looks like it might start to address your question: stackoverflow.com/a/45679091/4365003
– RagingRoosevelt, Commented Sep 5, 2017 at 16:26
I think that's kind of different... I want to combine rows instead of counting them.
– bing, Commented Sep 5, 2017 at 16:38
~~Wouldn't you just use a different aggregate function, then?~~
– RagingRoosevelt, Commented Sep 5, 2017 at 16:49
I can't find the aggregate function that does this... Sorry, I'm new to Python.
– bing, Commented Sep 5, 2017 at 16:53

a.deshpande012 · Accepted Answer · 2017-09-06 01:42:56Z

1

I think this will help. This code can handle any number of consecutive 'T's, and you can even change which character to combine. I've added comments in the code to explain what it does.

https://pastebin.com/FakbnaCj

import pandas as pd

def combine(df):
    combined = [] # Init empty list
    length = len(df.iloc[:,0]) # Get the number of rows in DataFrame
    i = 0
    while i < length:
        num_elements = num_elements_equal(df, i, 0, 'T') # Get the number of consecutive 'T's
        if num_elements <= 1: # If there is at most one 'T', append only that row to combined, keeping its type
            combined.append([df.iloc[i,0],df.iloc[i,1]])
        else: # Otherwise, append the sum of all the elements to combined, with 'T' type
            combined.append(['T', sum_elements(df, i, i+num_elements, 1)])
        i += max(num_elements, 1) # Increment i by the number of elements combined, with a min increment of 1
    return pd.DataFrame(combined, columns=df.columns) # Return as DataFrame

def num_elements_equal(df, start, column, value): # Counts the number of consecutive elements
    i = start
    num = 0
    while i < len(df.iloc[:,column]):
        if df.iloc[i,column] == value:
            num += 1
            i += 1
        else:
            return num
    return num

def sum_elements(df, start, end, column): # Sums the elements from start to end
    return sum(df.iloc[start:end, column])

frame = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
               "Volume": [10,   20,  10,  10,  20,  20,  10]})
print(combine(frame))
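
For comparison, the same result can probably be reached without explicit loops. The following is only a sketch, not part of the original answer: it labels each row with a block number so that consecutive 'T' rows share a number, then groups on that. The names is_t, continues_run and block_id are made up for illustration, and shift(fill_value=...) assumes a reasonably recent pandas.

```python
import pandas as pd

frame = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
                      "Volume": [10,   20,  10,  10,  20,  20,  10]})

is_t = frame["Type"].eq("T")
# A row continues the previous block only if it is a 'T' directly below another 'T'.
continues_run = is_t & is_t.shift(fill_value=False)
# Every other row starts a new block; the running count of starts is the block number.
block_id = (~continues_run).cumsum().rename("block")

result = (frame.groupby(block_id, sort=False)
               .agg({"Type": "first", "Volume": "sum"})
               .reset_index(drop=True))
print(result)
```

Here every Q row ends up in a block of its own, so only runs of 'T' are actually summed.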

edited Sep 6, 2017 at 1:42

answered Sep 6, 2017 at 1:37

a.deshpande012


3 Comments

bing Over a year ago

Thank you very much for your reply. May I ask how I can change this code if I have a dataframe with more than 2 columns, and I only want to add up the values of one column and leave the rest unchanged? I.e. instead of 'Type' and 'Volume', I have 'Type', 'Time', 'Volume', etc., and I only want to add up the values for 'Volume'.

a.deshpande012 Over a year ago

When you append the element to the combined list, also put in df.iloc[i,col], where col is the column index of the 'Time' column. So combined.append([df.iloc[i,0],df.iloc[i,1]]) becomes combined.append([df.iloc[i,0],df.iloc[i,1],df.iloc[i,2]]), and combined.append(['T', sum_elements(df, i, i+num_elements, 1)]) becomes combined.append(['T', df.iloc[i,1], sum_elements(df, i, i+num_elements, 2)]). This keeps the 'Time' of the first row in each combined run.
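
A rough sketch of how the extra-column case might look with groupby instead of index arithmetic. This is an assumption-laden illustration, not code from the answer: combine_runs and its parameters are hypothetical names, the 'Time' values are invented, and keeping the first value of every non-summed column is just one possible choice.

```python
import pandas as pd

def combine_runs(df, value="T", type_col="Type", sum_col="Volume"):
    """Merge consecutive rows whose type_col equals value, summing sum_col."""
    is_value = df[type_col].eq(value)
    # A row continues the previous block only if it and the row above both match value.
    continues = is_value & is_value.shift(fill_value=False)
    block_id = (~continues).cumsum().rename("block")
    # Keep the first value of every column, except sum_col, which is summed.
    how = {col: "first" for col in df.columns}
    how[sum_col] = "sum"
    return df.groupby(block_id, sort=False).agg(how).reset_index(drop=True)

frame = pd.DataFrame({
    "Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
    "Time":   [1, 2, 3, 4, 5, 6, 7],  # made-up values for illustration
    "Volume": [10, 20, 10, 10, 20, 20, 10],
})
combined = combine_runs(frame)
print(combined)
```

Any additional columns would be carried along the same way, since the aggregation dictionary is built from df.columns.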

bing Over a year ago

stackoverflow.com/questions/46099924/…

javidcf · Accepted Answer · 2017-09-05 17:01:21Z

1

If you just need the partial sums, here is a little trick to do that:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
                   "Volume": [10,   20,  10,  10,  20,  20,  10]})
s = np.diff(np.r_[0, df.Type == "T"])
s[s < 0] = 0
res = df.groupby(["Type", np.cumsum(s) - 1]).sum().loc["T"]  # a list of keys; newer pandas treats a tuple as a single key
print(res)

Output:

   Volume
0      10
1      40
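
To see why the trick works, it may help to look at the intermediate arrays. The sketch below repeats the same idea step by step; the names is_t, starts, run_id and sums are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Type":   ["Q", "Q", "T", "Q", "T", "T", "Q"],
                   "Volume": [10,   20,  10,  10,  20,  20,  10]})

is_t = (df["Type"] == "T").to_numpy()
# Difference of the 0/1 mask: +1 where a run of 'T' starts, -1 on the row after it ends.
s = np.diff(np.r_[0, is_t.astype(int)])
# Keep only the run starts.
starts = np.where(s > 0, 1, 0)
# Counting the starts so far numbers the runs 0, 1, 2, ...
run_id = np.cumsum(starts) - 1

# Sum 'Volume' per run, looking only at the 'T' rows.
sums = df[is_t].groupby(run_id[is_t])["Volume"].sum()
print(s)
print(run_id)
print(sums)
```

Rows before the first 'T' get run number -1, and Q rows after a run inherit that run's number, which is why the T rows have to be selected before (or after) grouping.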

answered Sep 5, 2017 at 17:01

javidcf


3 Comments

bing Over a year ago

stackoverflow.com/questions/46099924/…

javidcf Over a year ago

@bing Is that the same question repeated?

bing Over a year ago

Not exactly the same; the new dataframe has more than two columns.
