How to create a function that loops through another function in PostgreSQL?

Question

I'm using PostgreSQL 9.3.9 and have a procedure called list_all_upsells that takes the beginning and the end of a month (see sqlfiddle.com/#!15/abd02 for sample data). For example, the code below counts the upselled accounts for the month of October:

select COUNT(up.*) as "Total Upsell Accounts in October" from 
list_all_upsells('2015-10-01 00:00:00'::timestamp, '2015-10-31 23:59:59'::timestamp) as up
where up.user_id not in
(select distinct user_id from paid_users_no_more 
where concat(extract(month from payment_stop_date),'-',extract(year from payment_stop_date))<>
concat(extract(month from payment_start_date),'-',extract(year from payment_start_date)));

The list_all_upsells procedure looks like this:

DECLARE
  payor_email_2 text;
BEGIN
  FOR payor_email_2 in select distinct payor_email from paid_users LOOP
    return query
    execute
    'select paid_users.* from paid_users,
    (
      select payment_start_date as first_time from paid_users
      where payor_email = $3
      order by payment_start_date limit 1
    ) as dummy
    where payor_email = $3
    and payment_start_date > first_time
    and payment_start_date between $1 and $2
    and first_time < $1'
    using a, b, payor_email_2;
  END LOOP;
  return;
END
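To make the cost of this approach concrete, here is a rough in-memory model of what the loop computes. It is written in Python rather than SQL so that it runs stand-alone.

- PaidUser and list_all_upsells_loop are hypothetical names.
- a and b stand for the two timestamp parameters that the (unshown) function header presumably declares.
- As with the dynamic query, the whole table is scanned once per distinct payor.

```python
from datetime import datetime
from typing import NamedTuple


class PaidUser(NamedTuple):
    # Hypothetical in-memory stand-in for a row of paid_users
    user_id: int
    payor_email: str
    payment_start_date: datetime


def list_all_upsells_loop(rows, a, b):
    """Mimic the plpgsql loop: one full scan of the rows per distinct payor."""
    result = []
    for email in sorted({r.payor_email for r in rows}):  # FOR payor_email_2 IN ...
        mine = [r for r in rows if r.payor_email == email]  # WHERE payor_email = $3
        first_time = min(r.payment_start_date for r in mine)  # ORDER BY ... LIMIT 1
        for r in mine:
            if (r.payment_start_date > first_time
                    and a <= r.payment_start_date <= b  # BETWEEN $1 AND $2
                    and first_time < a):
                result.append(r)
    return result
```

Because every payor triggers another pass over the table, the work grows roughly with payors × rows. That is why the comments on the question suggest replacing the loop with joins.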

I want to be able to run this for all months that we have records and query the data together in one table like this:

Month   | Total Upselled Accounts
---------------------------------
08/2014 | 23
09/2014 | 35
ETC...
10/2015 | 56

I have a query to grab the first of each month and last of each month for the months we have been in business:

select distinct date_trunc('month', payment_start_date)::date as startmonth
from paid_users ORDER BY startmonth;

And the last day of each month:

SELECT distinct (date_trunc('MONTH', payment_start_date) + 
INTERVAL '1 MONTH - 1 day')::date as endmonth from paid_users 
ORDER BY endmonth;
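The two queries above compute the first and last day of each month separately. Below is a Python sketch of the same date arithmetic, using the hypothetical helpers month_start, month_end and month_bounds. It produces the pair in one pass, which also guarantees that each start date is matched with the end date of the same month:

```python
from datetime import date, timedelta


def month_start(ts):
    # Same idea as date_trunc('month', ts)::date
    return date(ts.year, ts.month, 1)


def month_end(ts):
    # Same idea as (date_trunc('month', ts) + interval '1 month - 1 day')::date
    first = month_start(ts)
    next_first = date(first.year + first.month // 12, first.month % 12 + 1, 1)
    return next_first - timedelta(days=1)


def month_bounds(start_dates):
    """Distinct (first day, last day) pairs, in order, for the given timestamps."""
    return sorted({(month_start(d), month_end(d)) for d in start_dates})
```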

Now how would I create a function that loops through list_all_upsells and grabs the count for each of these months? The startmonth query gives me 2014-03-01, 2014-04-01, ... up to 2015-10-01, and the endmonth query gives me 2014-03-31, 2014-04-30, ... up to 2015-10-31. I want to run list_all_upsells on each of these months so that I get an aggregate count of how many upselled accounts we have each month.
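What the question describes is essentially the following loop, sketched in Python: call the upsell function once per month and count what comes back.

- The callable passed in as list_all_upsells is a stand-in for the SQL function.
- counts_per_month is a hypothetical name.

```python
def counts_per_month(bounds, list_all_upsells):
    """Call the upsell function once per (first, last) month pair and count rows.

    `list_all_upsells` is any callable taking (start, end) and returning rows;
    it stands in for the SQL function of the same name.
    """
    report = []
    for first, last in bounds:
        rows = list_all_upsells(first, last)
        report.append((first.strftime("%m/%Y"), len(rows)))
    return report
```

This works, but it repeats the full upsell computation for every month. The answer below does the same job in a single set-based query instead.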

My paid_users table looks like this:

CREATE TABLE paid_users
(
  user_id integer,
  user_email character varying(255),
  payor_id integer,
  payor_email character varying(255),
  payment_start_date timestamp without time zone DEFAULT now()
);

paid_users_no_more:

CREATE TABLE paid_users_no_more
(
  user_id integer,
  payment_stop_date timestamp without time zone DEFAULT now()
);

I am really not good with Postgres, but isn't it possible to replace EXECUTE with proper joins? – GSazheniuk, Oct 7, 2015 at 21:05
When looking at layers of looping it's almost always way, way faster to convert to a combined query with subqueries, joins, etc. – Craig Ringer, Oct 8, 2015 at 0:43

Patrick · Accepted Answer · 2015-10-16 03:40:18Z

You have a couple of issues with your function, so let's start there. In short:

(1) You need only a single parameter to indicate the month. Passing the beginning and end of the month sets you up for problems; for instance, an upper bound of '2015-10-31 23:59:59' silently skips a payment made at 23:59:59.5.
(2) You do not need a dynamic query, because you are not changing identifiers (table or column names).
(3) You do not need a loop.
(4) Your logic is wrong.

I could also mention that PostgreSQL uses functions, and that they all start with a line like CREATE FUNCTION list_all_upsells(...), but that would be just too picky.
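Point (1) is easiest to see with a timestamp that carries fractional seconds. PostgreSQL timestamps have microsecond resolution, so a closed range ending at 23:59:59 can miss rows. A half-open range [first of month, first of next month), computed from a single input, cannot. Here is a Python sketch, with hypothetical helper names:

```python
from datetime import datetime


def in_closed_range(ts, first, last):
    # Mirrors: ts BETWEEN '2015-10-01 00:00:00' AND '2015-10-31 23:59:59'
    return first <= ts <= last


def next_month_start(ts):
    return datetime(ts.year + ts.month // 12, ts.month % 12 + 1, 1)


def in_month(ts, any_moment):
    # Half-open range [first of month, first of next month), driven by one value
    first = datetime(any_moment.year, any_moment.month, 1)
    return first <= ts < next_month_start(first)
```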

To start with the logic: apparently a user, identified by their email address, takes out a subscription from a certain payment_start_date until a certain payment_stop_date, and can do this multiple times. You are looking for users who took out their first subscription before the month in question and who started a new (not a first) subscription in that month. In that case the filter payment_start_date > first_time is redundant. You already require the first subscription to be before the month (first_time < $1) and the new one to be in the month (payment_start_date BETWEEN $1 AND $2). Together these guarantee payment_start_date >= $1 > first_time.
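The redundancy can be checked mechanically. This Python sketch (hypothetical helper names) compares the original three-part filter with the version that drops payment_start_date > first_time, over random dates. A random check is not a proof, but the one-line argument in the comment is:

```python
import random
from datetime import datetime, timedelta


def keep_original(start, first_time, a, b):
    return start > first_time and a <= start <= b and first_time < a


def keep_simplified(start, first_time, a, b):
    # first_time < a <= start already implies start > first_time
    return a <= start <= b and first_time < a


def agree_on_random_samples(n=10_000, seed=0):
    rng = random.Random(seed)
    base = datetime(2015, 1, 1)
    a, b = datetime(2015, 10, 1), datetime(2015, 10, 31, 23, 59, 59)
    for _ in range(n):
        # Dates are drawn independently, so start may even precede first_time
        first_time = base + timedelta(days=rng.randint(0, 364))
        start = base + timedelta(days=rng.randint(0, 364))
        if keep_original(start, first_time, a, b) != keep_simplified(start, first_time, a, b):
            return False
    return True
```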

Points (1), (2) and (3) really only become obvious when rewriting the query inside the function:

CREATE FUNCTION list_all_upsells(timestamp) RETURNS SETOF paid_users AS $$
  SELECT paid_users.*
  FROM paid_users
  JOIN (  -- This JOIN keeps only those rows where the payor_email has a prior subscription
    SELECT DISTINCT payor_email,
           first_value(payment_start_date) OVER (PARTITION BY payor_email ORDER BY payment_start_date) AS dummy
    FROM paid_users
    WHERE payment_start_date < date_trunc('month', $1)
  ) dummy USING (payor_email)
  -- This filter keeps only those rows with new subscriptions in the month
  WHERE date_trunc('month', payment_start_date) = date_trunc('month', $1)
$$ LANGUAGE sql STRICT;

Since the body of the function has been reduced to a single SQL statement, the function can now be a sql language function, which typically has less overhead than plpgsql. You now supply only a single parameter, which can be any moment in the month you want the data for, so list_all_upsells(LOCALTIMESTAMP) will give you the results for the current month. In terms of the query you posted it would be:

SELECT count(up.*) AS "Total Upsell Accounts in October"
FROM list_all_upsells(LOCALTIMESTAMP) up
WHERE up.user_id NOT IN 
  (SELECT DISTINCT user_id FROM paid_users_no_more 
   WHERE date_trunc('month', payment_stop_date) <>
         date_trunc('month', up.payment_start_date)
  );
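Here is a rough Python model of the rewritten function. PaidUser, trunc_month and this list_all_upsells are hypothetical stand-ins, not the SQL objects. The JOIN becomes a set of payors with an earlier subscription, and the WHERE becomes a month comparison. Passing any moment in the month gives the same result:

```python
from datetime import datetime
from typing import NamedTuple


class PaidUser(NamedTuple):
    # Hypothetical in-memory stand-in for a row of paid_users
    user_id: int
    payor_email: str
    payment_start_date: datetime


def trunc_month(ts):
    # Python counterpart of date_trunc('month', ts)
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def list_all_upsells(rows, any_moment):
    month = trunc_month(any_moment)
    # Payors with at least one subscription before the month (the JOIN)
    earlier = {r.payor_email for r in rows if r.payment_start_date < month}
    # New subscriptions in the month from those payors (the WHERE)
    return [r for r in rows
            if r.payor_email in earlier and trunc_month(r.payment_start_date) == month]
```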

This, incidentally, raises the question of why you have the table paid_users_no_more at all. Why not simply add a column payment_stop_date to table paid_users? Where that column is NULL, the user is still subscribed. The whole query is rather odd anyway: list_all_upsells() returns new subscriptions during the month, so why bother with subscriptions cancelled at some other time?
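A minimal sketch of the suggested single-table design, assuming paid_users gains a nullable payment_stop_date. It is modeled as a Python Optional, with None standing in for NULL. Subscription, still_subscribed and active_at are hypothetical names:

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Subscription(NamedTuple):
    # Hypothetical merged row: paid_users plus a nullable payment_stop_date
    user_id: int
    payor_email: str
    payment_start_date: datetime
    payment_stop_date: Optional[datetime] = None  # None plays the role of NULL


def still_subscribed(rows):
    # Equivalent of: WHERE payment_stop_date IS NULL
    return [r.user_id for r in rows if r.payment_stop_date is None]


def active_at(rows, moment):
    # Equivalent of: WHERE payment_start_date <= moment
    #   AND (payment_stop_date IS NULL OR payment_stop_date > moment)
    return [r.user_id for r in rows
            if r.payment_start_date <= moment
            and (r.payment_stop_date is None or r.payment_stop_date > moment)]
```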

Now on to your real question:

SELECT months.m "Month", coalesce(count(up.*), 0) "Total Upselled Accounts"
FROM generate_series('2014-08-01'::timestamp,
                     date_trunc('month', LOCALTIMESTAMP),
                     '1 month') AS months(m)
LEFT JOIN list_all_upsells(months.m) AS up ON date_trunc('month', payment_start_date) = m
GROUP BY 1
ORDER BY 1;

This generates a series of months from a starting month up to the current month and then counts the new subscriptions in each month. Because of the LEFT JOIN, months without any upsells still appear, with a count of 0.
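The same idea can be sketched in Python, assuming the upsell start dates have already been identified. It counts each upsell once by its month, then walks a generated series of months and fills the gaps with 0, which is the role the LEFT JOIN plays in the query. month_series and monthly_counts are hypothetical names:

```python
from collections import Counter
from datetime import date


def month_series(first, last):
    """Like generate_series(first, last, '1 month') for first-of-month dates."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def monthly_counts(upsell_start_dates, first, last):
    # Count every upsell once by its month, then fill gaps with 0 (the LEFT JOIN)
    per_month = Counter(date(d.year, d.month, 1) for d in upsell_start_dates)
    return [(m.strftime("%m/%Y"), per_month.get(m, 0)) for m in month_series(first, last)]
```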

SQLFiddle

Patrick · answered Oct 8, 2015 at 2:16 · edited Oct 16, 2015 at 3:40

Ashley I. Over a year ago

This is a really great answer and the logic makes perfect sense. I tried running your list_all_upsells create function however and got a syntax error near "SELECT" (5th line) - why is this? @Patrick

Patrick Over a year ago

Oh, that was a nasty error. Nothing to do with the SELECT. I started working from your code and changing the scalar sub-query in the select list of the main query to a regular JOIN: you need to remove the , in the second line, after FROM paid_users. Took me a while to find that finicky littl'un!

Ashley I. Over a year ago

Hey Patrick - should it be CURRENT_DATE instead of CURRENT_TIME? I get "ERROR: function list_all_upsells(time with time zone) does not exist LINE 2: FROM list_all_upsells(current_time) up ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts." when I use current_time. When I try current_date I get 0, though.

Patrick Over a year ago

Ah. CURRENT_TIME gives a time with time zone (a time of day, not a timestamp). Use LOCALTIMESTAMP instead; that gives a plain timestamp without time zone. I thought the cast would be automatic. Answer updated.

Ashley I. Over a year ago

Hm I don't know why but I get the result as 0 and when I run the last query I only get the month of October returned
