Efficiently outer join two array columns row-wise in BigQuery table

Question

I'll first state the question as simply as possible, then elaborate with more detail and an example.

Concise question without context

I have a table with rows containing columns of arrays. I need to outer join the elements of some pairs of these, compute some variables, and then aggregate the results back into a new array. I'm currently using a pattern where I:

unnest each column in the pair to be joined (cross join to PK of row)
full outer join the two on the PK and compute desired fields
group by PK to get back to single row with array column that summarizes the results

Is there a way to do this without the multiple unnesting and grouping back down?

More context and an example

I have a table which represents edits to an entity that is made up of multiple sub-records. Each row represents a single entity. There is a column before that contains the records before the edit, and another after that contains the records afterwards.

My goal is to label each sub-record with exactly one of the four valid edit types:

DELETE - record exists in before but not after
ADD - record exists in after but not before
EDIT - record exists in both before and after but any field was changed
NONE - record exists in both before and after and no fields were changed

Each of the sub-record values is represented by its ID and a hash of all of its fields. I've created some fake data and provided my initial implementation below. This works, but it seems very roundabout.

WITH source_data AS (
  SELECT
    1 AS pkField,
    [
      STRUCT(1 AS id, 1 AS fieldHash),
      STRUCT(2 AS id, 2 AS fieldHash),
      STRUCT(3 AS id, 3 AS fieldHash)
    ] AS before,
    [
      STRUCT(1 AS id, 1 AS fieldHash),
      STRUCT(2 AS id, 0 AS fieldHash),  -- record 2 edited
      -- record 3 deleted
      STRUCT(4 AS id, 4 AS fieldHash), -- record 4 added
      STRUCT(5 AS id, 5 AS fieldHash)  -- record 5 added
    ] AS after
)
SELECT
  pkField,
  ARRAY_AGG(STRUCT(
    id,
    CASE
      WHEN beforeHash IS NULL THEN "ADD"
      WHEN afterHash IS NULL THEN "DELETE"
      WHEN beforeHash <> afterHash THEN "EDIT"
      ELSE "NONE"
    END AS editType
  )) AS edits
FROM (
  SELECT pkField, id, fieldHash AS beforeHash
  FROM source_data
  CROSS JOIN UNNEST(source_data.before)
)
FULL OUTER JOIN (
  SELECT pkField, id, fieldHash AS afterHash
  FROM source_data
  CROSS JOIN UNNEST(source_data.after)
)
USING (pkField, id)
GROUP BY pkField

Is there a simpler and/or more efficient way to do this? Perhaps something that avoids the multiple unnesting and grouping back down?

i think, what you have is already the simplest and most efficient way! — Mikhail Berlyant
– Mikhail Berlyant, Commented Aug 24, 2022 at 22:21
But, looks like there is always better one exists :o) - see below answer — Mikhail Berlyant
– Mikhail Berlyant, Commented Aug 24, 2022 at 22:28

Mikhail Berlyant · Accepted Answer · 2022-08-24 22:34:13Z

2

I think, what you have is already simple and efficient way!
Meantime, you can consider below optimized version

select pkField, 
  array(select struct(
      id, case
      when b.fieldHash is null then 'ADD'
      when a.fieldHash is null then 'DELETE'
      when b.fieldHash != a.fieldHash then 'EDIT'
      else 'NONE'
      end as editType
    ) edits
    from (select id, fieldHash from t.before) b
    full outer join (select id, fieldHash from t.after) a
    using(id)
  ) edits
from source_data t

if applied to sample data in your question - output is

edited Aug 24, 2022 at 22:34

answered Aug 24, 2022 at 22:27

Mikhail Berlyant

174k10 gold badges172 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Mack Over a year ago

Perfect! This is exactly what I was hoping would exist. Thank you!

Collectives™ on Stack Overflow

Efficiently outer join two array columns row-wise in BigQuery table

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related