I'm using PostgreSQL 9.3.9 and I have a procedure called list_all_upsells that takes in the beginning of a month and the end of a month. (see sqlfiddle.com/#!15/abd02 for sample data) For example, the below code would list the count of upselled accounts for the month of October:

select COUNT(up.*) as "Total Upsell Accounts in October" from 
list_all_upsells('2015-10-01 00:00:00'::timestamp, '2015-10-31 23:59:59'::timestamp) as up
where up.user_id not in
(select distinct user_id from paid_users_no_more 
where concat(extract(month from payment_stop_date),'-',extract(year from payment_stop_date))<>
concat(extract(month from payment_start_date),'-',extract(year from payment_start_date)));

The list_all_upsells procedure looks like this:

DECLARE
payor_email_2 text;
   BEGIN
FOR payor_email_2 in select distinct payor_email from paid_users LOOP
return query
execute
'select paid_users.* from paid_users,
(
select payment_start_date as first_time from paid_users
where payor_email = $3
order by payment_start_date limit 1
) as dummy
where payor_email = $3
and payment_start_date > first_time
and payment_start_date between $1 and $2
and first_time < $1'
using a, b, payor_email_2;
END LOOP;
return;
END

I want to be able to run this for all months that we have records and query the data together in one table like this:

Month   | Total Upselled Accounts
---------------------------------
08/2014 | 23
09/2014 | 35
ETC...
10/2015 | 56

I have a query to grab the first of each month and last of each month for the months we have been in business:

select distinct date_trunc('month', payment_start_date)::date as startmonth
from paid_users ORDER BY startmonth;

Last of month:

SELECT distinct (date_trunc('MONTH', payment_start_date) + 
INTERVAL '1 MONTH - 1 day')::date as endmonth from paid_users 
ORDER BY endmonth;

Now how would I create a function to loop through list_all_upsells and grab the count for each of these months? That is, the first query (startmonth) gives me 2014-03-01, 2014-04-01, ... up to 2015-10-01, whereas the second query (endmonth) gives me 2014-03-31, 2014-04-30, ... up to 2015-10-31. I want to run list_all_upsells on each of these months so that I can get an aggregate count, per month, of how many upselled accounts we have.

My paid_users table looks like this:

CREATE TABLE paid_users
(
  user_id integer,
  user_email character varying(255),
  payor_id integer,
  payor_email character varying(255),
  payment_start_date timestamp without time zone DEFAULT now()
)

paid_users_no_more:

CREATE TABLE paid_users_no_more
(
  user_id integer,
  payment_stop_date timestamp without time zone DEFAULT now()
)
  • I am really not good with Postgres, but isn't it possible to replace the EXECUTE with proper joins? Commented Oct 7, 2015 at 21:05
  • Hey @GSazheniuk I have no idea :s Commented Oct 7, 2015 at 21:32
  • When looking at layers of looping it's almost always way, way faster to convert to using a combined query with subqueries, joins, etc. Commented Oct 8, 2015 at 0:43

1 Answer

You have a couple of issues with your function, so let's start there. The short of it is that (1) you need only a single parameter to indicate the month, because passing both the beginning and the end of the month is setting yourself up for problems; (2) you do not need a dynamic query because you are not changing identifiers (table or column names); (3) you do not need a loop; and (4) your logic is wrong. I could also mention that PostgreSQL uses functions and that they all start with a line like CREATE FUNCTION list_all_upsells(...) but that would be just too picky.

To start with the logic: Apparently a user identified by his email address takes out a subscription from a certain payment_start_date until a certain payment_stop_date and can do this multiple times. You are looking for those users who took out their first subscription before the month in question, and who started a new subscription in the month in question but not a first subscription. In that case the filter payment_start_date > first_time is useless because you already filter for a first subscription being prior to the month in question (first_time < $1) and a new subscription (payment_start_date BETWEEN $1 AND $2).

Points (1), (2) and (3) really only become obvious when rewriting the query inside the function:

CREATE FUNCTION list_all_upsells(timestamp) RETURNS SETOF paid_users AS $$
  SELECT paid_users.*
  FROM paid_users
  JOIN (  -- This JOIN keeps only those rows where the payor_email has a prior subscription
    SELECT DISTINCT payor_email,
           first_value(payment_start_date) OVER (PARTITION BY payor_email ORDER BY payment_start_date) AS dummy
    FROM paid_users
    WHERE payment_start_date < date_trunc('month', $1)
  ) dummy USING (payor_email)
  -- This filter keeps only those rows with new subscriptions in the month
  WHERE date_trunc('month', payment_start_date) = date_trunc('month', $1)
$$ LANGUAGE sql STRICT;

Since the body of the function has been reduced to a single SQL statement, the function is now a sql language function, which avoids the loop and the dynamic EXECUTE and is typically more efficient than the plpgsql version. You now supply only a single parameter, which can be any moment in the month you want the data for, so list_all_upsells(LOCALTIMESTAMP) will give you the results for the current month. In terms of the query you posted it would be:

SELECT count(up.*) AS "Total Upsell Accounts in October"
FROM list_all_upsells(LOCALTIMESTAMP) up
WHERE up.user_id NOT IN 
  (SELECT DISTINCT user_id FROM paid_users_no_more 
   WHERE date_trunc('month', payment_stop_date) <>
         date_trunc('month', up.payment_start_date)
  );

This, incidentally, really raises the question of why you have the table paid_users_no_more. Why not simply add a column payment_stop_date to table paid_users? Where that column is NULL the user is still subscribed. But the whole query is rather odd, because list_all_upsells() returns new subscriptions during the month, so why bother with subscriptions cancelled at some other time?
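A minimal sketch of that single-table layout, in case it helps. It assumes each user_id appears at most once in paid_users_no_more, which the question does not confirm; if a user_id appears several times there, the UPDATE below picks one of the stop dates arbitrarily.

```sql
-- Hypothetical migration: fold paid_users_no_more into paid_users.
-- A NULL payment_stop_date means the user is still subscribed.
ALTER TABLE paid_users
  ADD COLUMN payment_stop_date timestamp without time zone;

-- Copy the known stop dates across, matching on user_id (assumed unique
-- in paid_users_no_more).
UPDATE paid_users AS pu
SET    payment_stop_date = pnm.payment_stop_date
FROM   paid_users_no_more AS pnm
WHERE  pnm.user_id = pu.user_id;

-- Active subscriptions are then simply:
SELECT *
FROM   paid_users
WHERE  payment_stop_date IS NULL;
```

Note that list_all_upsells() is declared as RETURNS SETOF paid_users, so adding a column changes its row type as well.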

Now on to your real question:

SELECT months.m "Month", coalesce(count(up.*), 0) "Total Upselled Accounts"
FROM generate_series('2014-08-01'::timestamp,
                     date_trunc('month', LOCALTIMESTAMP),
                     '1 month') AS months(m)
LEFT JOIN list_all_upsells(months.m) AS up ON date_trunc('month', payment_start_date) = m
GROUP BY 1
ORDER BY 1;

Generate a series of months from some starting month until the current month, then count the new subscriptions for each month, possibly 0.
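As a variation, here is a sketch that derives the first month from the data instead of hard-coding '2014-08-01', and formats the month as MM/YYYY to match the table in the question. The use of min(payment_start_date) as the starting point and the explicit LATERAL keyword are my own choices here, not something the question requires; it assumes the single-parameter list_all_upsells(timestamp) defined above.

```sql
SELECT to_char(months.m, 'MM/YYYY') AS "Month",
       count(up.*)                  AS "Total Upselled Accounts"
FROM generate_series(
       -- first month with any record (assumption: start where the data starts)
       (SELECT date_trunc('month', min(payment_start_date)) FROM paid_users),
       date_trunc('month', LOCALTIMESTAMP),
       interval '1 month') AS months(m)
-- LATERAL lets the function call refer to months.m; the function already
-- restricts its rows to that month, so no further join condition is needed.
LEFT JOIN LATERAL list_all_upsells(months.m) AS up ON true
GROUP BY months.m
ORDER BY months.m;
```

Grouping and ordering by months.m rather than by the formatted string keeps the rows in chronological order; sorting on 'MM/YYYY' text would put 01/2015 before 08/2014.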

SQLFiddle

10 Comments

This is a really great answer and the logic makes perfect sense. I tried running your list_all_upsells create function however and got a syntax error near "SELECT" (5th line) - why is this? @Patrick
Oh, that was a nasty error. Nothing to do with the SELECT. I started working from your code and changing the scalar sub-query in the select list of the main query to a regular JOIN: you need to remove the , in the second line, after FROM paid_users. Took me a while to find that finicky littl'un!
Hey Patrick - should it be CURRENT_DATE instead of CURRENT_TIME? I get "ERROR: function list_all_upsells(time with time zone) does not exist LINE 2: FROM list_all_upsells(current_time) up ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts." when I do current_time. When i try current_date I get 0 tho
Ah. CURRENT_TIME gives a time with time zone, not a timestamp, so there is no matching function. Use LOCALTIMESTAMP instead; that gives a normal timestamp (without time zone). I thought the cast would be automatic. Answer updated.
Hm I don't know why but I get the result as 0 and when I run the last query I only get the month of October returned
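On the type error discussed in the comments above: a quick way to see what each of these special values returns is pg_typeof. This is only a diagnostic sketch, and the column aliases are made up for illustration.

```sql
SELECT pg_typeof(CURRENT_TIME)      AS current_time_type,       -- time with time zone
       pg_typeof(CURRENT_DATE)      AS current_date_type,       -- date
       pg_typeof(CURRENT_TIMESTAMP) AS current_timestamp_type,  -- timestamp with time zone
       pg_typeof(LOCALTIMESTAMP)    AS localtimestamp_type;     -- timestamp without time zone
```

Only the last one matches the function's timestamp parameter exactly. A date can be cast to timestamp implicitly, which is why CURRENT_DATE is accepted, while a time value cannot, hence the "function does not exist" error.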