0

I have clickhouse table with 3 columns: respondent, spot и weight.

Data looks like this:

resp, spot, weight
1, 10, 100
2, 10, 200
3, 10, 300
1, 10, 100
1, 20, 100
3, 20, 300
4, 20, 400
1, 30, 100
2, 30, 200
3, 30, 300
5, 30, 500

spot column is sorted. I need to write a query to calculate the accumulated Weight for Spots in the order of 10, 20, 30 so that each respondent is counted only once. That is, in spot = 20 respondents that were in spot = 10 should be taken into cumulative sum, but only once. In spot = 30 respondents that were in spot = 10 and spot = 20 should be taken into cumulative sum, but only once.

The final result will be as follows:

spot, summed_weight
10, 600
20, 1000
30, 1500

I tried to use group_array function to combine respondent and its weight, and found last rows per spot via row_number function, but RAM consumption had become extremely high. Something like this:

resp, spot, weight, group_array, row_number
1, 10, 100, [(1,100)], 4
2, 10, 200, [(1,100), (2,200)], 3
3, 10, 300, [1,100), (2,200), (3,300)], 2
1, 10, 100, [1,100), (2,200), (3,300)], 1
1, 20, 100, [1,100), (2,200), (3,300)], 3
3, 20, 300, [1,100), (2,200), (3,300)], 2
4, 20, 400, [1,100), (2,200), (3,300), (4,400)], 1
1, 30, 100, [1,100), (2,200), (3,300), (4,400)], 4
2, 30, 200, [1,100), (2,200), (3,300), (4,400)], 3
3, 30, 300, [1,100), (2,200), (3,300), (4,400)], 2
5, 30, 500, [1,100), (2,200), (3,300), (4,400), (5,500)], 1
2
  • I'll add a reply below but always provide your current SQL query if you can + schema + INSERT. Makes it easier for everyone Commented Mar 5, 2024 at 9:40
  • Based on your description its hard to see how spot 20 has a result of 1000 10 = 100 (resp 1) +200 (resp 2) +300 (resp 3) = 600 so 20 = 100 + 100 (resp 1) + 300 + 300(resp 3) + 400 ( resp3) = 1200? Commented Mar 5, 2024 at 9:51

2 Answers 2

3

Thanks to @mark for making me realise what you meant. I think the following should be alittle more efficient

CREATE TABLE test
(
    `resp` Int64,
    `spot` Int64,
    `weight` Int64
)
ENGINE = Memory

INSERT INTO test VALUES (1,10,100),(2,10,200),(3,10,300),(1,10,100),(1,20,100),(3,20,300),(4,20,400),(1,30,100),(2,30,200),(3,30,300),(5,30,500)

SELECT
    min_spot AS spot,
    sum(sum(weight)) OVER (ORDER BY spot ASC) AS weight
FROM
(
    SELECT
        resp,
        min(spot) AS min_spot,
        argMin(weight, spot) AS weight
    FROM test
    GROUP BY resp
)
GROUP BY spot
Sign up to request clarification or add additional context in comments.

1 Comment

Yes, I agree with you. Your code should be really more efficient.
1

Try this:

SELECT spot, sum (sum (weight)) over (order by spot) as weigt
FROM 
(
SELECT *
FROM
(
SELECT 
  *
, row_number() over (partition by resp order by spot)
  as rn
FROM VALUES
(
  'resp UInt32, spot UInt32, weight UInt32'
, (1, 10, 100)
, (2, 10, 200)
, (3, 10, 300)
, (1, 10, 100)
, (1, 20, 100)
, (3, 20, 300)
, (4, 20, 400)
, (1, 30, 100)
, (2, 30, 200)
, (3, 30, 300)
, (5, 30, 500)
)
)
WHERE rn = 1
)
GROUP BY spot

The result is:

spot weigt
10 600
20 1000
30 1500

fiddle

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.