I'm looking to replicate the width_bucket function that is available in Oracle with a new function in BigQuery. The function creates equi-width buckets, based on the number you specify, between a min and max value. For example, width_bucket(user_count, 0, 35, 10) would create 10 equal buckets (0 - 3.5, 3.5 - 7, etc.) and tell you which bucket user_count falls in. Any assistance would be greatly appreciated!

Oracle doc - https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions214.htm

Here's what I have, and I believe it works, but I'd like to avoid referencing a table to generate the row numbers, if possible.

CREATE OR REPLACE FUNCTION functions.widthBucket(
  value NUMERIC,
  minValue NUMERIC,
  maxValue NUMERIC,
  buckets INT64)
AS ((
  SELECT resultBucket 
  FROM (
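      SELECT CASE 
               -- Bucket n covers [minValue + width * (n - 1), minValue + width * n),
               -- where width = (maxValue - minValue) / buckets.
               WHEN value >= minValue + ((maxValue - minValue) / buckets) * (bucketNumber - 1)
                AND value < minValue + ((maxValue - minValue) / buckets) * bucketNumber 
               THEN bucketNumber
               -- Fold value = maxValue into the last bucket.
               WHEN value = maxValue AND bucketNumber = buckets 
               THEN bucketNumber
               ELSE -1 
             END AS resultBucket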
      FROM (
            SELECT ROW_NUMBER() OVER (PARTITION BY '') as bucketNumber
            FROM project.dateTable
           ) x
      WHERE bucketNumber <= buckets) x
  WHERE resultBucket != -1
  ));
  • Can you provide an example of how you plan to use such a UDF, with some dummy data for testing? :o) Asking because I don't seem to understand what that table is doing inside the function. Commented Jul 23, 2020 at 1:38

1 Answer

The following is for BigQuery Standard SQL.

Try the function below; I think it does exactly what you asked.

CREATE TEMP FUNCTION widthBucket(
  value NUMERIC, 
  minValue NUMERIC, 
  maxValue NUMERIC, 
  buckets NUMERIC
) AS (
  RANGE_BUCKET(value, GENERATE_ARRAY(minValue, maxValue, (maxValue - minValue)/buckets))
);

Usage is as simple as in your question, for example widthBucket(user_count, 0, 35, 10).
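
As a quick check with dummy data, the function can be run against a few inline values. This is only a sketch: the sample values and the column alias bucket are made up for illustration, and the function definition is repeated because a temp function has to live in the same script as the query that calls it.

```sql
CREATE TEMP FUNCTION widthBucket(
  value NUMERIC,
  minValue NUMERIC,
  maxValue NUMERIC,
  buckets NUMERIC
) AS (
  RANGE_BUCKET(value, GENERATE_ARRAY(minValue, maxValue, (maxValue - minValue)/buckets))
);

-- Illustrative sample values only; boundaries are 0, 3.5, 7, ..., 35.
SELECT
  user_count,
  widthBucket(user_count, 0, 35, 10) AS bucket
FROM UNNEST([NUMERIC '-1', NUMERIC '0', NUMERIC '3.5', NUMERIC '20', NUMERIC '35']) AS user_count;
-- -1  -> 0   (below minValue)
-- 0   -> 1
-- 3.5 -> 2   (a boundary value belongs to the bucket it opens)
-- 20  -> 6
-- 35  -> 11  (value = maxValue lands in buckets + 1)
```

RANGE_BUCKET effectively counts how many boundaries are less than or equal to the value, which is why a value below minValue yields 0 and a value at or above maxValue yields buckets + 1.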

To address the edge case where value equals maxValue, use the following variation:

CREATE TEMP FUNCTION widthBucket(
  value NUMERIC, 
  minValue NUMERIC, 
  maxValue NUMERIC, 
  buckets NUMERIC
) AS ((
  SELECT IF(bucket > buckets, buckets, bucket)
  FROM (
    SELECT RANGE_BUCKET(value, GENERATE_ARRAY(minValue, maxValue, (maxValue - minValue)/buckets)) bucket
  )
));
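
Note that the IF above caps every result greater than buckets, so values larger than maxValue also end up in the last bucket. If only the exact maximum should be folded into the last bucket, while larger values still report buckets + 1, something along these lines might work. It is a sketch, not a tested drop-in: the name widthBucketClosedMax and the sample values are hypothetical. As far as I can tell from the Oracle documentation, Oracle itself uses an overflow bucket numbered num_buckets + 1, so which variant is "right" depends on the behavior you want.

```sql
CREATE TEMP FUNCTION widthBucketClosedMax(
  value NUMERIC,
  minValue NUMERIC,
  maxValue NUMERIC,
  buckets NUMERIC
) AS (
  IF(
    value = maxValue,
    -- Only the exact maximum is moved into the last bucket.
    CAST(buckets AS INT64),
    RANGE_BUCKET(value, GENERATE_ARRAY(minValue, maxValue, (maxValue - minValue)/buckets))
  )
);

-- Illustrative sample values only.
SELECT
  user_count,
  widthBucketClosedMax(user_count, 0, 35, 10) AS bucket
FROM UNNEST([NUMERIC '34.9', NUMERIC '35', NUMERIC '36']) AS user_count;
-- 34.9 -> 10
-- 35   -> 10  (folded into the last bucket)
-- 36   -> 11  (still reported as overflow)
```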

2 Comments

Ah yes, that works pretty well and is much cleaner. One problem though: if you set the value equal to the maxValue, the result is buckets + 1. Thoughts?
Sure. This behavior makes sense, as it is how RANGE_BUCKET works. See the update in the answer to handle this scenario.
