1

I have a pandas dataframe like this -

Time    1 A    2 A     3 A     4 A     5 A    6 A    100 A
    5    10     4       6       6       4      6      4
    3    7      19      2       7       7      9      18
    6    3      6       3       3       8      10     56
    2    5      9       1       1       9      12     13

The Time columns gives me the number of A columns that I need to sum up.So that the output looks like this -

 Time   1 A    2 A     3 A     4 A     5 A    6 A    100 A    Total
    5    10     4       6       6       4      6      4         30
    3    7      19      2       7       7      9      18        28
    6    3      6       3       3       8      10     56        33
    2    5      9       1       1       9      12     13        14

In other words, when the value in Time column is 3, it should sum up 1A, 2A and 3A when the value in Time column is 5, it should sum up 1A, 2A, 3A, 4A and 5A

Note: There are other columns also in between the As. So I cant sum using simple indexing.

Highly appreciate any help in finding a solution.

0

2 Answers 2

3

Use numpy - idea is compare array created by np.arange with length of columns with Time columns converted to index with broadcasting to 2d mask, get matched values by numpy.where and last sum:

df1 = df.set_index('Time')
m = np.arange(len(df1.columns)) < df1.index.values[:, None]
df['new'] = np.where(m, df1.values, 0).sum(axis=1)
print (df)
   Time  1 A  2 A  3 A  4 A  5 A  6 A  100 A  new
0     5   10    4    6    6    4    6      4   30
1     3    7   19    2    7    7    9     18   28
2     6    3    6    3    3    8   10     56   33
3     2    5    9    1    1    9   12     13   14

Details:

print (df1)
      1 A  2 A  3 A  4 A  5 A  6 A  100 A
Time                                     
5      10    4    6    6    4    6      4
3       7   19    2    7    7    9     18
6       3    6    3    3    8   10     56
2       5    9    1    1    9   12     13

print (m) 
[[ True  True  True  True  True False False]
 [ True  True  True False False False False]
 [ True  True  True  True  True  True False]
 [ True  True False False False False False]]

print (np.where(m, df1.values, 0))
[[10  4  6  6  4  0  0]
 [ 7 19  2  0  0  0  0]
 [ 3  6  3  3  8 10  0]
 [ 5  9  0  0  0  0  0]]
Sign up to request clarification or add additional context in comments.

4 Comments

This code doesnt work as expected if I insert another row in between the As
So you want filter only values with A ? then use df1 = df.set_index('Time').filter(regex=' A$')
@eiram_mahera - there are values of A from 1 to 100 ?
yes. not fixed from 1 to 100, it can be from 1 to 60 or anything
1

Try:

df['total'] = df.apply(lambda x: sum([x[i+1] for i in range(x['Time'])]), axis=1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.