I have the following scenario in Postgres (I'm using 9.4.1).

I have a table of this format:

create table test(
    id serial,
    val numeric not null,
    created timestamp not null default(current_timestamp),
    fk integer not null
);

I also have a numeric threshold field in another table, which should be used to label each row of test. Every row where the running total of val is >= threshold should be marked true, and a row marked true should reset the running total to 0 for subsequent rows, e.g.

Data set:

insert into test(val, created, fk) values
  (100, now() + interval '10 minutes', 5),
  (25,  now() + interval '20 minutes', 5),
  (30,  now() + interval '30 minutes', 5),
  (45,  now() + interval '40 minutes', 5),
  (10,  now() + interval '50 minutes', 5);

With a threshold of 50 I would like to get the output as:

100 -> true (as 100 > 50) [reset]
25  -> false (as 25 < 50)
30  -> true (as 25 + 30 > 50) [reset]
45  -> false (as 45 < 50)
10  -> true (as 45 + 10 > 50)

Is it possible to do this in a single SQL query? So far I have experimented with using a window function.

select t.*,
       sum(t.val) over (
         partition by t.fk order by t.created
       ) as threshold_met
from test t
where t.fk = 5;

As you can see, I have got as far as a cumulative sum, and I suspect that tweaking the window frame (rows between x preceding and current row) may be what I'm looking for. I just can't work out how to perform the reset, i.e. how to set x in the above to the appropriate value.

    Excellent question with all the necessary details. More questions like this one please. :) Commented Apr 2, 2015 at 16:17

1 Answer

Create your own aggregate function, which can be used as a window function.

Specialized aggregate function

It's easier than one might think:

CREATE OR REPLACE FUNCTION f_sum_cap50 (numeric, numeric)
  RETURNS numeric LANGUAGE sql AS
'SELECT CASE WHEN $1 > 50 THEN 0 ELSE $1 END + $2';

CREATE AGGREGATE sum_cap50 (numeric) (
  sfunc    = f_sum_cap50
, stype    = numeric
, initcond = 0
);

Then:

SELECT *, sum_cap50(val) OVER (PARTITION BY fk
                               ORDER BY created) > 50 AS threshold_met 
FROM   test
WHERE  fk = 5;

Result exactly as requested.
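
To see why this works, it can help to look at the raw aggregate state next to the boolean. A minimal sketch, assuming sum_cap50() from above already exists; the inline VALUES list and the ord column are stand-ins for the test table and its created column, not part of the original setup:

```sql
-- Assumes the aggregate sum_cap50(numeric) defined above.
-- ord is a hypothetical ordering column replacing "created".
SELECT val
     , sum_cap50(val) OVER (ORDER BY ord)      AS running_sum
     , sum_cap50(val) OVER (ORDER BY ord) > 50 AS threshold_met
FROM  (VALUES (1, 100::numeric), (2, 25), (3, 30), (4, 45), (5, 10)) AS t(ord, val)
ORDER  BY ord;

--  val | running_sum | threshold_met
-- -----+-------------+---------------
--  100 |         100 | t
--   25 |          25 | f
--   30 |          55 | t
--   45 |          45 | f
--   10 |          55 | t
```

Whenever the previous state is above 50, the transition function discards it and the next row starts counting from 0 again.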

Generic aggregate function

To make it work for any threshold and any (numeric) data type, and also allow NULL values:

CREATE OR REPLACE FUNCTION f_sum_cap (anyelement, anyelement, anyelement)
  RETURNS anyelement
  LANGUAGE sql STRICT AS
$$SELECT CASE WHEN $1 > $3 THEN '0' ELSE $1 END + $2;$$;

CREATE AGGREGATE sum_cap (anyelement, anyelement) (
  sfunc    = f_sum_cap
, stype    = anyelement
, initcond = '0'
);

Then, to call with a limit of, say, 110 with any numeric type:

SELECT *
     , sum_cap(val, '110') OVER (PARTITION BY fk
                                 ORDER BY created) AS capped_at_110
     , sum_cap(val, '110') OVER (PARTITION BY fk
                                 ORDER BY created) > 110 AS threshold_met 
FROM   test
WHERE  fk = 5;
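
The question mentions that the threshold lives in another table. The second argument does not have to be a literal, so it could be read from a joined row. A rough sketch, assuming sum_cap() from above; the table fk_threshold and its columns are hypothetical, since the question does not show that table:

```sql
-- Hypothetical lookup table: name and columns are assumptions.
CREATE TABLE fk_threshold (
  fk        integer PRIMARY KEY
, threshold numeric NOT NULL
);

INSERT INTO fk_threshold (fk, threshold) VALUES (5, 50);

-- val and threshold are both numeric, so the polymorphic
-- aggregate resolves without casts.
SELECT t.*
     , sum_cap(t.val, th.threshold) OVER (PARTITION BY t.fk
                                          ORDER BY t.created) > th.threshold AS threshold_met
FROM   test t
JOIN   fk_threshold th USING (fk)
WHERE  t.fk = 5;
```

This relies on the threshold being the same for every row of a given fk, because the transition function compares the running state against the threshold of the current row.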

Explanation

In your case we don't have to defend against NULL values since val is defined NOT NULL. If NULL can be involved, define f_sum_cap() as STRICT and it works because (per documentation):

If the state transition function is declared "strict", then it cannot be called with null inputs. With such a transition function, aggregate execution behaves as follows. Rows with any null input values are ignored (the function is not called and the previous state value is retained) [...]
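
The effect of STRICT can be checked with a small sketch. It assumes sum_cap() as defined above; the inline VALUES list and the ord column are made up for illustration:

```sql
-- Assumes the STRICT transition function f_sum_cap() and the
-- aggregate sum_cap() defined above.
SELECT ord, val
     , sum_cap(val, 50::numeric) OVER (ORDER BY ord) AS running_sum
FROM  (VALUES (1, 40::numeric), (2, NULL), (3, 20)) AS t(ord, val)
ORDER  BY ord;

--  ord | val | running_sum
-- -----+-----+-------------
--    1 |  40 |          40
--    2 |     |          40   -- NULL row skipped, previous state retained
--    3 |  20 |          60
```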

For the generic variant, both the function and the aggregate take one more argument: the threshold. In the polymorphic variant, that argument can be a hard-coded data type or the same polymorphic type as the leading arguments.

Note the use of untyped string literals, not numeric literals, which would default to integer!
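
A short sketch of the difference, assuming sum_cap() and the test table from above. The failing call is left commented out, and the exact error text may vary by version:

```sql
-- Works: the untyped literal '110' takes on the type of val (numeric).
SELECT sum_cap(val, '110') OVER (ORDER BY created) AS capped
FROM   test
WHERE  fk = 5;

-- Also works: an explicit cast to the same type as val.
SELECT sum_cap(val, 110::numeric) OVER (ORDER BY created) AS capped
FROM   test
WHERE  fk = 5;

-- Expected to fail: 110 is an integer literal, val is numeric, and
-- anyelement requires all such arguments to share one actual type.
-- SELECT sum_cap(val, 110) OVER (ORDER BY created) AS capped
-- FROM   test
-- WHERE  fk = 5;
```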

3 Comments

Brilliant, thanks Erwin. One question: is it possible to define the threshold at runtime (as it will not always be 50)? Thanks :-)
Ah I think I've got it, I have changed the f_sum_cap function to take an additional numeric parameter which I pass through at query time, e.g. sum_cap(val, 40) and then have the f_sum_cap function use 'select case when $1 > $3 then 0 else $1 end + $2'. Is this the best approach?
@jabclab: Basically yes. While at it, I would also make it polymorphic and fit for NULL values. Consider the update.
