On Bigquery, I have a table with many columns. I want to create a new table showing counts of all distinct values for all columns.
I am using the below code:
SELECT key AS column_name,
COUNT(DISTINCT val) AS number_of_options,
ARRAY_AGG(DISTINCT val) AS distinct_options
FROM (
SELECT TRIM(x[0], '"') AS key,
TRIM(x[1], '"') AS val
FROM t,
UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'), ',"')) kv,
UNNEST([STRUCT(SPLIT(kv, '":') AS x)])
)
GROUP BY key
The above code saves me from specifying all column names or looping over them. But this throws the error 'Array index 1 is out of bounds (overflow)'. Without referencing x[1] (which contains the value for each row-column), the code works fine. I tried adding 'where x[1] is not null', etc., but everytime the code throws the above error.
Will appreciate any help. Can elaborate more if needed.