2

I need to find out how many of the first N rows of a dataframe make up (just over) 50% of the sum of values for that column.

Here's an example:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 1), columns=list("A"))

0  0.681991
1  0.304026
2  0.552589
3  0.716845
4  0.559483
5  0.761653
6  0.551218
7  0.267064
8  0.290547
9  0.182846

therefore

sum_of_A = df["A"].sum()

4.868260213425804

and with this example I need to find, starting from row 0, how many rows I need to get a sum of at least 2.43413 (approximating 50% of sum_of_A).

Of course I could iterate through the rows and sum and break when I get over 50%, but is there a more concise/Pythonic/efficient way of doing this?

1
  • There is "cumsum" for a cumulative sum and (if column has no negative values) "searchsorted" to find the point where the sum is greater than a given value. Commented Jan 17, 2023 at 16:05

1 Answer 1

2

I would use .cumsum(), which we can use to get all the rows where the cumulative sum is at least half of the total sum:

df[df["A"].cumsum() < df["A"].sum() / 2]
Sign up to request clarification or add additional context in comments.

5 Comments

Very interesting idea but it seem to select the rows which go OVER the 50% value. Using the example above your code would select rows 5-9
Yes, did you want the rows under 50%? If so, change the >= to <=.
Yes thanks. The correct comparison for me is using '<'.
Got it. If this answer helped you, please consider accepting it for the benefit of future readers. Have a great day!
If anyone's curious about the real use case, I have a dataframe with usernames in a column and the number of times they've commented in the other and have it sorted descending by this second column. With this I am selecting the first N users which have contributed to around 50% of the comments total :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.