
Suppose I have the same example data as in this question and additionally declare the following two functions:

CREATE OR REPLACE FUNCTION example.markout_666_example_666_price_table_666_price(_symbol text, _time_of timestamptz, _start interval, _duration interval)
  RETURNS float8
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS  -- !
$func$
SELECT p.price
FROM   example.price_table p
WHERE  p.symbol = _symbol
AND    p.time_of >= _time_of + _start
AND    p.time_of <= _time_of + _start + _duration
ORDER  BY p.time_of
LIMIT  1;
$func$;

CREATE OR REPLACE FUNCTION example.markout_666_example_666_price_table_666_volume(_symbol text, _time_of timestamptz, _start interval, _duration interval)
  RETURNS float8
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS  -- !
$func$
SELECT p.volume
FROM   example.price_table p
WHERE  p.symbol = _symbol
AND    p.time_of >= _time_of + _start
AND    p.time_of <= _time_of + _start + _duration
ORDER  BY p.time_of
LIMIT  1;
$func$;

These two functions are similar but reference different columns. In a more general case they might also reference different tables. I declare two separate functions because passing a column name (or a table name) as a function argument seems to be regarded as an anti-pattern when writing Postgres functions.

I can use both of these functions in a query like:

SELECT symbol, time_of, example.markout_666_example_666_price_table_666_price(symbol, time_of, '3 hours', '24 hours') as markout_price,
                        example.markout_666_example_666_price_table_666_price(symbol, time_of, '25 hours', '24 hours') as markout_price_2,
                        example.markout_666_example_666_price_table_666_volume(symbol, time_of, '3 hours', '24 hours') as markout_volume
from example.interesting_times it; 

This is quite verbose, however, and symbol and time_of have to be written several times. With functions declared for more tables and more columns of those tables, the queries can get quite complex. Is it possible to instead write something like:

SELECT symbol, time_of, example.markout('example.price_table', 'price', '3 hours', '24 hours') as markout_price,
                        example.markout('example.price_table', 'price', '25 hours', '24 hours') as markout_price_2,
                        example.markout('example.price_table', 'volume', '3 hours', '24 hours') as markout_volume
from example.interesting_times it; 

where example.markout is a macro or metaprogramming-type construct that is evaluated exactly as if we had used the more verbose syntax? Is there any metaprogramming-like technique that can be used here?

All I can find when searching is SQL_MACRO in Oracle Database and a page on "macro commands" from an outdated version of the Postgres manual, which is no longer part of the current manual.

3 Answers

You can use dynamic SQL to meet your needs as below:

CREATE OR REPLACE FUNCTION example.markout(
    _tbl text, 
    _col text, 
    _symbol text, 
    _time_of timestamptz, 
    _start interval, 
    _duration interval
)
RETURNS float8
AS
$func$
DECLARE
    _stmt text;
    _result float8;
BEGIN
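    -- %s with a regclass cast keeps a schema-qualified table name intact
    -- (%I would quote 'example.price_table' as one single identifier).
    -- Literals injected with %L are untyped, so they need explicit casts.
    _stmt := format(
        'SELECT p.%I
         FROM   %s p
         WHERE  p.symbol = %L
         AND    p.time_of >= %L::timestamptz + %L::interval
         AND    p.time_of <= %L::timestamptz + %L::interval + %L::interval
         ORDER  BY p.time_of
         LIMIT  1;',
         _col, _tbl::regclass, _symbol, _time_of, _start, _time_of, _start, _duration
    );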

    RAISE NOTICE '%', _stmt;  -- For debugging

    EXECUTE _stmt INTO _result;  -- Fetch the result into a variable

    RETURN _result;  -- Return the fetched result
END;
$func$ LANGUAGE plpgsql STABLE STRICT PARALLEL SAFE;

This can be invoked as:

select  example.markout('example.price_table', 'price', symbol, time_of, '3 hours', '24 hours') as markout_price,
        example.markout('example.price_table', 'price', symbol, time_of, '25 hours', '24 hours') as markout_price_2,
        example.markout('example.price_table', 'volume', symbol, time_of, '3 hours', '24 hours') as markout_volume
from example.interesting_times it;

E&EO


3 Comments

Good answer. I'd only use dynamic SQL for such functions if it saves a lot of effort. Otherwise, the increased complexity and often worse performance (not to forget the risk of SQL injection) make it less desirable.
This is interesting. Is this likely to be comparable speed to using separate functions?
@Stuart, since it is a small and simple function, it will be only negligibly slower. That said, its main benefit is avoiding the overhead of maintaining many separate functions.

To make the function work for different tables and different (sets of) columns, you need dynamic SQL. Makes the design more sophisticated. You need to know your PL/pgSQL and beware of SQL injection!

If you are not so sure, and there are just a couple of lookup-tables, rather create one dedicated function per table, returning the super-set of possible columns. Even I would do that.

That said, here is a perfectly safe and optimized function.
There are multiple advanced concepts at work.

CREATE OR REPLACE FUNCTION f_markout(_tbl regclass
                                   , _symbol text
                                   , _time_of timestamptz
                                   , _start interval
                                   , _duration interval
                                   , VARIADIC _cols text[]  -- last IN param!
                                   , OUT _rec record        -- short syntax
                                    )
  LANGUAGE plpgsql STABLE STRICT PARALLEL SAFE AS
$func$
BEGIN
   EXECUTE format(
      $q$
      SELECT %1$s
      FROM   %2$s p
      WHERE  p.symbol = $1
      AND    p.time_of >= $2
      AND    p.time_of <= $3
      ORDER  BY p.time_of
      LIMIT  1;
      $q$
    , (SELECT string_agg(quote_ident(c), ', ') FROM unnest(_cols) c)  -- %1 (quoted as identifiers!)
    , _tbl                                                            -- %2 (auto-quoted!)                                                    
      )
   USING _symbol                        -- $1
       , _time_of + _start              -- $2
       , _time_of + _start + _duration  -- $3
   INTO _rec;
END
$func$;


Call:

SELECT *
FROM f_markout('price_table', 'GME', '2016-01-02 00:30+0', '3h', '24h', 'price', 'volume') AS p(p1 float8, v1 float8);  -- !!!

This is one of the rare cases where a function returning anonymous records actually makes sense.
Note how it demands a column definition list in the call. Use any column names, but data types must match!

Your query:

SELECT i.symbol, i.time_of, m1.*, m2.*
FROM   interesting_times i
     , f_markout('price_table' , i.symbol, i.time_of, '3 h', '24 h', 'price', 'volume')     AS m1(price1 float8, volume float8)
     , f_markout('price_table2', i.symbol, i.time_of, '3 h', '24 h', 'price', 'Clown Item') AS m2(price2 float8, "Clown Item" text);

Note how I call the function in the FROM list. The comma is effectively short syntax for CROSS JOIN LATERAL, which is safe for my function because it always returns exactly one row. (It wouldn't be safe for a set-returning "table function", which can return 0 rows and thereby eliminate the outer row from the result. There we'd use LEFT JOIN LATERAL ... ON true instead.)

This way, each function is called once only. If you put the function in the SELECT list and decomposed the result there, that would result in one function call per result column.
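
The difference can be observed with a small stand-alone demo. This is only a sketch: f_demo_pair() is a hypothetical helper that is not part of the answer, and it raises a notice on every call so the number of evaluations becomes visible.

```sql
-- Hypothetical demo function: reports every call with a NOTICE
CREATE OR REPLACE FUNCTION f_demo_pair(OUT a int, OUT b int)
  LANGUAGE plpgsql VOLATILE AS
$func$
BEGIN
   RAISE NOTICE 'f_demo_pair() called';
   a := 1;
   b := 2;
END
$func$;

-- Decomposed in the SELECT list: (f()).* is expanded to one call per output column
SELECT (f_demo_pair()).*;

-- Called in the FROM list: evaluated once, then decomposed
SELECT p.* FROM f_demo_pair() p;
```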

This way we can access each table once per time frame. Doing it multiple times for multiple result columns would also multiply the cost.

You want to be able to pass any number of column names. At the same time we do not want to pass that as concatenated string, which would be wide open to SQL injection. The clean and elegant solution is a VARIADIC parameter. Must be the last one in the list of IN parameters to be unambiguous.
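
If the column names already exist as an array, the VARIADIC keyword can also be used in the call to pass that array as is. A sketch, reusing the sample values from the call shown earlier and assuming f_markout() and price_table exist as defined there:

```sql
-- Same call as before, but passing the column names as one array
SELECT *
FROM   f_markout('price_table', 'GME', '2016-01-02 00:30+0', '3h', '24h'
               , VARIADIC '{price,volume}'::text[]) AS p(p1 float8, v1 float8);
```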

Before concatenating, I make sure each column name is double-quoted where needed with quote_ident(), thereby making SQL injection through column names impossible. Column names must be passed case-sensitively!
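
To illustrate what quote_ident() does with different inputs, here is a short sketch. The input strings are made up for illustration; the comments show how each one comes back.

```sql
SELECT quote_ident('price')           AS plain         -- price
     , quote_ident('Clown Item')      AS with_space    -- "Clown Item"
     , quote_ident('Price')           AS mixed_case    -- "Price"
     , quote_ident('x; DROP TABLE t') AS hostile;      -- "x; DROP TABLE t"
```

Whatever is passed ends up as a single identifier, so a hostile string can at worst name a column that does not exist.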

The table name is passed as type regclass. That takes care of proper quoting automatically and fails immediately for non-existent tables. The input may be schema-qualified or not.
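
A quick way to see this behavior outside the function, as a sketch that assumes example.price_table from the question exists and that example.no_such_table (a made-up name) does not:

```sql
-- Resolves to the table; raises an error right away if it does not exist
SELECT 'example.price_table'::regclass;

-- to_regclass() returns NULL instead of raising an error for an unknown name
SELECT to_regclass('example.no_such_table');
```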

I pass values as values to EXECUTE with the USING clause. Makes SQL-injection impossible, and also avoids cost and potential errors from casting input to text, concatenating and casting back in the query.
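
The mechanism can be tried in isolation with an anonymous code block. A minimal sketch with made-up variable names, showing that $1 and $2 arrive in the dynamic statement with their original data types:

```sql
DO
$do$
DECLARE
   _ts  timestamptz := '2016-01-02 00:30+0';
   _res timestamptz;
BEGIN
   -- Values are bound to $1 and $2; nothing is converted to text and back
   EXECUTE 'SELECT $1 + $2'
   INTO  _res
   USING _ts, interval '3h';

   RAISE NOTICE 'result: %', _res;
END
$do$;
```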

Plain SQL

As a reminder: plain SQL will still be slightly faster. More verbose, but less error-prone.
The equivalent to above query:

SELECT i.symbol, i.time_of, p1.*, p2.*
FROM   interesting_times i
LEFT   JOIN LATERAL (
   SELECT p.price AS price1, p.volume AS volume1
   FROM   price_table p
   WHERE  p.symbol = i.symbol
   AND    p.time_of >= i.time_of + interval '3h'
   AND    p.time_of <= i.time_of + interval '27h'
   ORDER  BY p.time_of
   LIMIT  1
   ) p1 ON true
LEFT   JOIN LATERAL (
   SELECT p.price AS price2, p."Clown Item"
   FROM   price_table2 p
   WHERE  p.symbol = i.symbol
   AND    p.time_of >= i.time_of + interval '3h'
   AND    p.time_of <= i.time_of + interval '27h'
   ORDER  BY p.time_of
   LIMIT  1
   ) p2 ON true
ORDER  BY 1, 2;

Note the LEFT JOIN in this case.


Just create a single function, and select only the columns you need. Function inlining should work in your case, and then the unneeded columns will be pruned as usual. You just need to change it to RETURNS TABLE and remove STRICT.

CREATE OR REPLACE FUNCTION example.price_table_666(_symbol text, _time_of timestamptz, _start interval, _duration interval)
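  RETURNS TABLE (price float8, volume float8)  -- column list is required; float8 assumed from the question's functions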
  LANGUAGE sql STABLE PARALLEL SAFE AS  -- !
$func$
SELECT
    p.price,
    p.volume
FROM   example.price_table p
WHERE  p.symbol = _symbol
AND    p.time_of >= _time_of + _start
AND    p.time_of <= _time_of + _start + _duration
ORDER  BY p.time_of
LIMIT  1;
$func$;

You then use this in a FROM clause or a lateral JOIN, e.g.:

SELECT
  t.*,
  price1.price,
  price1.volume,
  price2.price AS price2
FROM someTable t
CROSS JOIN example.price_table_666(symbol, time_of, '3 hours', '24 hours') AS price1
CROSS JOIN example.price_table_666(symbol, time_of, '25 hours', '24 hours') AS price2;
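
One way to check whether the function is actually inlined is to look at the query plan. A sketch, assuming the four-argument function above has been created and borrowing the sample symbol and timestamp used elsewhere on this page:

```sql
EXPLAIN (VERBOSE, COSTS OFF)
SELECT price
FROM   example.price_table_666('GME', '2016-01-02 00:30+0', '3 hours', '24 hours');
```

If inlining took place, the plan shows a scan on example.price_table rather than a Function Scan node, and the Output lines of the VERBOSE plan list the columns each node carries.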

In a more generalized case with multiple tables, you can use a series of UNION ALL arms with conditions. Again, if the parameter is a constant, the whole union will be pruned to just the relevant arm.

CREATE OR REPLACE FUNCTION example.price_table_666(_tableName text, _symbol text, _time_of timestamptz, _start interval, _duration interval)
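  RETURNS TABLE (price float8, volume float8)  -- column list is required; float8 assumed from the question's functions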
  LANGUAGE sql STABLE PARALLEL SAFE AS  -- !
$func$
SELECT p.*
FROM (
    SELECT
        p.price,
        p.volume
    FROM   example.price_table p
    WHERE  p.symbol = _symbol
    AND    p.time_of >= _time_of + _start
    AND    p.time_of <= _time_of + _start + _duration
    AND    _tableName = 'price'
    ORDER  BY p.time_of
    LIMIT  1
) p

UNION ALL
SELECT p.*
FROM (
    SELECT
        p.price,
        p.volume
    FROM   example.price_table2 p
    WHERE  p.symbol = _symbol
    AND    p.time_of >= _time_of + _start
    AND    p.time_of <= _time_of + _start + _duration
    AND    _tableName = 'price2'
    ORDER  BY p.time_of
    LIMIT  1
) p
;
$func$;

2 Comments

Thanks for this. In my actual use case I have ~50 columns in each of a few different tables but might only want to apply the function to a small number of them in any one query. Making one function that calculates everything seems like it would be a lot slower than using the separate functions. I was hoping to get the same speed (which intuitively seems possible, as I basically just want a syntax change through metaprogramming or maybe a wrapper function).
It won't be slow in the simple SELECT columnsHere case, because the planner will prune unused columns and impossible branches of logic. It only falls down if you do something complex, such as window functions pushed into subqueries. I think you are overthinking this: just put all the columns in and be done.
