1

Considering a BigQuery view that has one column ARRAY<STRUCT<col1 FLOAT, col2 FLOAT>> called X.

Which is the best way to retrieve those rows with an additional column that is a computation of those elements in the "X" array. Is it possible to make it via big query stored procedure?

it would be great having something like:

select computation(X), * from something

computation(X) would loop through the various element of the X array and sum them with some additional rules. Regarding that, it seems BigQuery functions do not handle structures/arrays but only scalar types so X can't be taken in account due to the fact that datatype is ARRAY<STRUCT<col1 FLOAT, col2 FLOAT>>. Loops are still needed so I was thinking about using a procedure but I couldn't find any example that does operations on repeatable columns and give the result with the query itself.

One of the point was keeping this query usable within BigQuery query console, avoiding solutions like external script (like python).

Is there any example of procedures that internally have a select and they return additional information to it? Like a map function over a query. A sort of filter like the previous function computation (X) example.

As requested, to give more context I have a column that is:

ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>

It contains GCP price tiers. I have to iterate through all of them and compute the final price. FLOAT64 is a placeholder for a reliable datatype for currencies. I am still looking for that on BigQuery.

I would reach something like that, implementing a function called get_tiers_total_expense


-- Ex. Those are 2 tiers.
-- Tier 1: It starts from 0 usages and goes until 20. The costs 10 for every unit used.
-- Tier 2: It starts from 20 usages and cost is 5

select get_tiers_total_expense(array(
  select as struct 1.0, 0.0, 10.0, 9.0 union all
  select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));
4
  • 1
    you can define an UDF taking an array of struct as a parameter. Commented Apr 5, 2023 at 12:37
  • Do you have any specifc example of input and it's expected output ? return additional information you mentioned is not clear to me. At least knowing a return type of a function would be helpful to suggest a example of UDF. Commented Apr 6, 2023 at 4:20
  • Thank you @Jaytiger, I wrote additional info and my current approach. Commented Apr 6, 2023 at 6:56
  • I've written down some working example based on your explanation. kindly check that out and get back to me if you need more explanation. Commented Apr 6, 2023 at 7:07

2 Answers 2

2

You might consider below. (I would suggest SQL UDF rather than JS UDF.)

-- sample data
CREATE TEMP TABLE tiers AS
SELECT 1.0 pricing_unit_quantity, 0.0 start_usage_amount,
       10.0 usd_amount, 9.0 account_currency_amount UNION ALL
SELECT 1.0, 20.0, 5.0, 4.0;

-- define UDFs here
CREATE TEMP FUNCTION get_tiers_total_expense (
  tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
                     start_usage_amount FLOAT64,
                     usd_amount FLOAT64,
                     account_currency_amount FLOAT64>>
) AS ((
  -- it takes tiers and return the final cost
  -- -> you can adjust the aggregation logic
  SELECT SUM(pricing_unit_quantity * usd_amount * account_currency_amount)
    FROM UNNEST(tiers) tier
));

-- sample query here using UDFs and sample data
SELECT get_tiers_total_expense(
         ARRAY_AGG(
           STRUCT(
             pricing_unit_quantity,
             start_usage_amount,
             usd_amount,
             account_currency_amount
           )
         )
       ) AS get_tiers_total_expense
  FROM tiers t
 --GROUP BY -- later you can change group-by columns depending on your use cases.

-- query result
+-------------------------+
| get_tiers_total_expense |
+-------------------------+
|                   110.0 |
+-------------------------+
Sign up to request clarification or add additional context in comments.

2 Comments

In general I agree with you regarding the usage of SQL instead of JS in UDFs. In my specific case I have additional logic to deal with and JS is simpler, at least for me. Thanks
Sure, if JS is more flexible to implement your logic, definitely you can use JS UDF.
0

I ended up by following the suggestion of @jaytiger and so I created a BigQuery UDF.

I had to iterate through all the gcp tiers to calculate the final cost cost. I used FLOAT as placeholder for currencies. I surely need to figure out which is a better data type for storing them in BigQuery...

Here the partial solution:

CREATE OR REPLACE FUNCTION get_tiers_total_expense(
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>
)
RETURNS NUMBER
LANGUAGE js AS """
// it takes tiers and return the final cost
// Luckily float64 are converted to Number in JS and not float
// that is not great for currency computation
""";


-- Ex. Those are 2 tiers.
-- Tier 1: It starts from 0 usages and goes until 20. The costs 10 for every unit used.
-- Tier 2: It starts from 20 usages and cost is 5

select get_tiers_total_expense(array(
  select as struct 1.0, 0.0, 10.0, 9.0 union all
  select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.