I am racking my brain over what is probably a pretty simple question, and I just can't find the answer anywhere. I want to create a new column containing a grouped sum of another column while keeping all individual rows. This is what the docs show:

import polars as pl

df = pl.DataFrame(
    {
        "a": ["a", "b", "a", "b", "c"],
        "b": [1, 2, 1, 3, 3],
    }
)

df.group_by("a").agg(pl.col("b").sum())  

The output of this would be:

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ c   ┆ 3   │
│ b   ┆ 5   │
└─────┴─────┘

However, what I need would be this:

shape: (5, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ sum(b) │
│ --- ┆ --- ┆ ------ │
│ str ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ a   ┆ 1   ┆ 2      │
│ b   ┆ 2   ┆ 5      │
│ a   ┆ 1   ┆ 2      │
│ b   ┆ 3   ┆ 5      │
│ c   ┆ 3   ┆ 3      │
└─────┴─────┴────────┘

I could compute the sum in a separate DataFrame and then join it with the original one, but I am pretty sure there is an easier solution.

1 Answer

All you need is a window function:

df.with_columns(
    b_sum=pl.col("b").sum().over(pl.col("a"))
)


shape: (5, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ b_sum │
│ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64   │
╞═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 2     │
│ b   ┆ 2   ┆ 5     │
│ a   ┆ 1   ┆ 2     │
│ b   ┆ 3   ┆ 5     │
│ c   ┆ 3   ┆ 3     │
└─────┴─────┴───────┘

2 Comments

Jeez, sometimes it's really that easy. I was totally unaware of over(). This helps a lot. I'll accept it as soon as the 5 minutes have passed :D.
@gernophil Exactly. It's similar to SQL.
