
I have a table that looks like this:

id   label1  label2  label3
abc  1       12      122
abc  1       12      129
xyz  2       23      NULL
xyz  2       24      NULL

I then aggregate label1, label2, and label3 into one array per label column, and finally I want to put all the non-null labels into one combined array. My query looks like this:

#standardSQL
WITH
table AS (
SELECT 'abc' id, 1 label1, 12 label2, 122 label3 UNION ALL
SELECT 'abc', 1, 12, 129 UNION ALL
SELECT 'xyz', 2, 23, NULL UNION ALL
SELECT 'xyz', 2, 24, NULL
),

each_label_agg AS (
  SELECT
    id,
    ARRAY_AGG(label1 IGNORE NULLS) AS label1_agg,
    ARRAY_AGG(label2 IGNORE NULLS) AS label2_agg,
    ARRAY_AGG(label3 IGNORE NULLS) AS label3_agg
  FROM
    table
  GROUP BY
    id
)
SELECT
  each_label_agg.*,
  ARRAY_CONCAT(
    each_label_agg.label1_agg,
    each_label_agg.label2_agg,
    each_label_agg.label3_agg
  ) AS combined_labels
FROM
  each_label_agg

The query runs, but the output is not what I expected: for id xyz, I expected combined_labels to be [2,2,23,24].

ARRAY_CONCAT does not accept the IGNORE NULLS modifier. I am guessing that combined_labels ends up malformed because label3_agg is empty for xyz. How can I get combined_labels for xyz to be [2,2,23,24]?

2 Answers

#standardSQL
WITH table AS (
  SELECT 'abc' id, 1 label1, 12 label2, 122 label3 UNION ALL
  SELECT 'abc', 1, 12, 129 UNION ALL
  SELECT 'xyz', 2, 23, NULL UNION ALL
  SELECT 'xyz', 2, 24, NULL
), each_label_agg AS (
  SELECT
    id,
    ARRAY_AGG(label1 IGNORE NULLS) AS label1_agg,
    ARRAY_AGG(label2 IGNORE NULLS) AS label2_agg,
    ARRAY_AGG(label3 IGNORE NULLS) AS label3_agg
  FROM table
  GROUP BY id
)
SELECT
  each_label_agg.*,
  -- ARRAY_CONCAT returns NULL if any argument is NULL,
  -- so replace each NULL array with an empty array first.
  ARRAY_CONCAT(
    IFNULL(each_label_agg.label1_agg, []),
    IFNULL(each_label_agg.label2_agg, []),
    IFNULL(each_label_agg.label3_agg, [])
  ) AS combined_labels
FROM each_label_agg

1 Comment

Works perfectly as I wanted!! Thanks again. But the way it had to be handled seemed to me a bit counter-intuitive.

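This happens because of how BigQuery treats NULL arrays: ARRAY_CONCAT returns NULL if any of its arguments is NULL, as documented for that function. For id xyz, label3_agg is NULL rather than an empty array, because ARRAY_AGG with IGNORE NULLS had no non-null values to collect, so the whole concatenation becomes NULL. The fix is to replace NULL arrays with empty arrays before concatenating, since a NULL array and an empty array are two distinct values in BigQuery.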
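
To see the difference in isolation, here is a minimal sketch that does not use the question's table. The CTE name demo, the columns a and b, and the output aliases are made up for illustration; b stands in for a NULL array such as label3_agg for xyz.

```sql
#standardSQL
-- Illustrative only: demo, a, b and the output aliases are hypothetical names.
WITH demo AS (
  SELECT
    [2, 2] AS a,                       -- a regular, non-NULL array
    CAST(NULL AS ARRAY<INT64>) AS b    -- a NULL array, not an empty one
)
SELECT
  -- One NULL argument makes the whole ARRAY_CONCAT result NULL.
  ARRAY_CONCAT(a, b) IS NULL AS concat_is_null,
  -- Swapping the NULL array for an empty array keeps the elements of a.
  ARRAY_CONCAT(a, IFNULL(b, [])) AS concat_with_default
FROM demo
```

Checking IS NULL on an aggregated column is also a quick way to confirm whether an array that looks empty in the results is really NULL inside the query.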
