2

I have the following data:

WITH data as (
  SELECT 18 AS value, 1 AS id, "A" AS other_value,
  UNION ALL SELECT 20 AS value, 1 AS id, "B",
  UNION ALL SELECT 22 AS value, 2 AS id, "C"
  UNION ALL SELECT 30 AS value, 3 AS id, "A"
  UNION ALL SELECT 37 AS value, 2 AS id, "B"
  UNION ALL SELECT 31 AS value, 2 AS id, "C"
  UNION ALL SELECT 42 AS value, 1 AS id, "D"
)

I am using the following query

select
   FIRST_VALUE(id) over w1 as id
 , ARRAY_AGG(value) over w1  as data
 , FIRST_VALUE(other_value) over w1 as first_other_data
 , LAST_VALUE(other_value) over w1  as last_other_data
from data
WINDOW w1 as (PARTITION BY id order by value ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

And I get

id  data  first_other_data  last_other_data 
1   18    A                 D 
    20
    42
1   18    A                 D
    20
    42
1   18    A                 D
    20
    42
2   22    C                 B
    31
    37
2   22    C                 B
    31
    37
2   22    C                 B
    31
    37
3   30    A                 A

But i am getting duplicates that I don't want. I was thinking to use distinct keyword, but bigquery do not like it. My expected result is :

id  data  first_other_data  last_other_data     
1   18     A                 D
    20
    42
2   22     C                 B
    31
    37
3   30     A                 A

I have found similar questions but not exactly this case. Thanks EDIT: In my attempt to simplify the scenario for this SO question I took out some essential components. I have modified this with a more accurate version of my problem.

2 Answers 2

3

I think you still want an aggregation with grouping :) Since value and other_value seem to be related (you order by value in the window to select other_value) I'd simply put them into the array, too:

WITH data as (
  SELECT 18 AS value, 1 AS id, "A" AS other_value,
  UNION ALL SELECT 20 AS value, 1 AS id, "B",
  UNION ALL SELECT 22 AS value, 2 AS id, "C"
  UNION ALL SELECT 30 AS value, 3 AS id, "A"
  UNION ALL SELECT 37 AS value, 2 AS id, "B"
  UNION ALL SELECT 31 AS value, 2 AS id, "C"
  UNION ALL SELECT 42 AS value, 1 AS id, "D"
)

select 
  id,
  array_agg(struct(value, other_value)) as data
from data
group by id

So, if you need it later you can write a subquery to get it - or you can add another step to do in one query with the aggregation in between:

WITH data as (
  SELECT 18 AS value, 1 AS id, "A" AS other_value,
  UNION ALL SELECT 20 AS value, 1 AS id, "B",
  UNION ALL SELECT 22 AS value, 2 AS id, "C"
  UNION ALL SELECT 30 AS value, 3 AS id, "A"
  UNION ALL SELECT 37 AS value, 2 AS id, "B"
  UNION ALL SELECT 31 AS value, 2 AS id, "C"
  UNION ALL SELECT 42 AS value, 1 AS id, "D"
), 
temp as (
  select 
    id,
    array_agg(struct(value, other_value)) as data
  from data
  group by id
)
select 
  id,
  array(select value from unnest(data)) as data,
  (select other_value from unnest(data) order by value ASC limit 1) first_other_data,
  (select other_value from unnest(data) order by value DESC limit 1) last_other_data
from temp
Sign up to request clarification or add additional context in comments.

3 Comments

Hi Martin, my bad, In my attempt to simplify the query for this SO question I took out some essential components. I have modified it with a more accurate representation of my problem. but given the scenario presented you're right, your query would solve the previous problem.
Yes it works, thank you for your reply. I have upvoted your answer and comments, however I will approve Mikhail's answer as it looks more neat. Thank you.
it does?! I wouldn't expect it to scale well
2

The easiest way to achieve your goal without changing your initial query is to wrap it with extra select as in below example

#standardSQL
SELECT ANY_VALUE(t).* FROM (
  SELECT
     FIRST_VALUE(id) OVER w1 AS id
   , ARRAY_AGG(value) OVER w1  AS data
   , FIRST_VALUE(other_value) OVER w1 AS first_other_data
   , LAST_VALUE(other_value) OVER w1  AS last_other_data
  FROM data
  WINDOW w1 AS (PARTITION BY id ORDER BY value ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) t
GROUP BY FORMAT('%t', t)  

with result

enter image description here

2 Comments

interesting, thank you. What does FORMAT in the group by exactly does?
FORMAT('%t', t) generates kind of fingerprint of row so you can then use it to deduplicate your result. Please consider voting up the answer if it was helpful :o)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.