I have a table which has an array field of ids, and I want to replace the ids with names, but keep it in the same array format. I have a separate 'lookup' table with the ids & their corresponding names. Since it's an array, I cannot do a simple join. I've tried flattening the array field, then doing a simple join - which works - but then I don't know how to put it back into the same original array format. Are there any better ways of doing this?
2 Answers
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION VLOOKUP(expr ANY TYPE, map ANY TYPE) AS ((
  IFNULL((SELECT name FROM UNNEST(map) WHERE id = expr), expr)
));
WITH `project.dataset.table` AS (
  SELECT 'aaa' other_fields, [STRUCT('456' AS id_field), STRUCT('3367'), STRUCT('xyz')] AS array_fields UNION ALL
  SELECT 'bbb', [STRUCT('56'), STRUCT('89')] UNION ALL
  SELECT 'ccc', [STRUCT('40'), STRUCT('768'), STRUCT('8766'), STRUCT('abc')]
), `project.dataset.lookup_table` AS (
  SELECT '456' id, 'A' name UNION ALL
  SELECT '56', 'B' UNION ALL
  SELECT '89', 'C' UNION ALL
  SELECT '40', 'D'
)
SELECT t.* REPLACE(
  ARRAY(
    SELECT AS STRUCT id_field, VLOOKUP(id_field, kv) AS desired_new_field
    FROM t.array_fields
  ) AS array_fields
)
FROM `project.dataset.table` t,
  (SELECT ARRAY_AGG(STRUCT(id, name)) AS kv FROM `project.dataset.lookup_table`) arr
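In the result, each array element keeps its id_field and gains a desired_new_field that holds the looked-up name, or the original id when the lookup table has no match for it.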
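If the array holds plain ids rather than structs, as the question describes, the same lookup idea might be sketched as below. This is only a sketch under assumptions: the column name `ids`, the aliases `element`, `pos` and `pair`, and the sample rows are hypothetical, and the lookup table is assumed to have at most one name per id (otherwise the scalar subquery would fail).

```sql
#standardSQL
-- Hypothetical schema: the table has ids ARRAY<STRING>; the lookup table has (id, name)
WITH `project.dataset.table` AS (
  SELECT 'aaa' AS other_fields, ['456', '3367'] AS ids UNION ALL
  SELECT 'bbb', ['56', '89']
), `project.dataset.lookup_table` AS (
  SELECT '456' AS id, 'A' AS name UNION ALL
  SELECT '56', 'B' UNION ALL
  SELECT '89', 'C'
)
SELECT t.* REPLACE(
  ARRAY(
    -- Swap each id for its name; keep the id itself when the lookup has no match
    SELECT IFNULL(
      (SELECT pair.name FROM UNNEST(arr.kv) AS pair WHERE pair.id = element),
      element)
    FROM UNNEST(t.ids) AS element WITH OFFSET AS pos
    ORDER BY pos  -- preserve the original element order
  ) AS ids
)
FROM `project.dataset.table` AS t,
  (SELECT ARRAY_AGG(STRUCT(id, name)) AS kv FROM `project.dataset.lookup_table`) AS arr
```

Ordering by the offset is a deliberate choice here: without it, the element order of the rebuilt array is not guaranteed.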
If the table has a primary key, you can flatten the array and then group back on the primary key, using ARRAY_AGG to reconstruct the array. Without a primary key, you can still do an INNER JOIN (but not OUTER JOINs) over the elements of the array, i.e.
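-- Assumes main_table.id_field is ARRAY<STRING> and lookup_table has columns (id, name)
SELECT
  ARRAY(
    -- Pair each id with its name; ids without a match are dropped by the INNER JOIN
    SELECT STRUCT<id_field STRING, desired_id_field STRING>(element, lookup_table.name)
    FROM UNNEST(id_field) AS element
    INNER JOIN lookup_table
    ON element = lookup_table.id
  ) AS id_field
FROM main_table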
The INNER JOIN drops ids that have no match in the lookup table, so you will still need to merge id_field and desired_id_field later to fill those gaps.
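The primary-key approach mentioned at the start of this answer (flatten, join, then regroup with ARRAY_AGG) could look roughly like the sketch below. The column `pk`, the aliases `element` and `pos`, and the sample rows are assumptions for illustration, not taken from the question.

```sql
#standardSQL
-- Hypothetical schema: main_table(pk, id_field ARRAY<STRING>), lookup_table(id, name)
WITH main_table AS (
  SELECT 1 AS pk, ['456', '3367'] AS id_field UNION ALL
  SELECT 2, ['56', '89']
), lookup_table AS (
  SELECT '456' AS id, 'A' AS name UNION ALL
  SELECT '56', 'B' UNION ALL
  SELECT '89', 'C'
)
SELECT
  m.pk,
  -- Rebuild the array in its original order; unmatched ids fall back to the id itself
  ARRAY_AGG(IFNULL(l.name, element) ORDER BY pos) AS id_field
FROM main_table AS m
CROSS JOIN UNNEST(m.id_field) AS element WITH OFFSET AS pos
LEFT JOIN lookup_table AS l
  ON l.id = element
GROUP BY m.pk
```

Because the array is flattened into ordinary rows before the join, a LEFT JOIN is possible here, which avoids the gap-filling step. Note that rows whose array is empty disappear in the CROSS JOIN, so they would need separate handling if they must be kept.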

