Postgresql: Create a date sequence, use it in date range query

Question

I'm not great with SQL but I have been making good progress on a project up to this point. Now I am completely stuck.

I'm trying to get a count for the number of apartments with each status. I want this information for each day so that I can trend it over time. I have data that looks like this:

table: y_unit_status

unit | date_occurred | start_date | end_date   | status
1    | 2017-01-01    | 2017-01-01 | 2017-01-05 | Occupied No Notice
1    | 2017-01-06    | 2017-01-06 | 2017-01-31 | Occupied Notice
1    | 2017-02-01    | 2017-02-01 |            | Vacant
2    | 2017-01-01    | 2017-01-01 |            | Occupied No Notice

And I want to get output that looks like this:

date       | occupied_no_notice | occupied_notice | vacant
2017-01-01 | 2                  | 0               | 0
...
2017-01-10 | 1                  | 1               | 0
...
2017-02-01 | 1                  | 0               | 1

Or, this approach would work:

date       | status             | count
2017-01-01 | occupied no notice | 2
2017-01-01 | occupied notice    | 0

date_occurred: Date when the status of the unit changed start_date: Same as date_occurred end_date: Date when status stopped being x and changed to y.

I am pulling in the number of bedrooms and a property id so the second approach of selecting counts for one status at a time would produce a relatively large number of rows vs. option 1 (if that matters).

I've found a lot of references that have gotten me close to what I'm looking for but I always end up with a sort of rolling, cumulative count.

Here's my query, which produces a column of dates and counts, which accumulate over time rather than reflecting a snapshot of counts for a particular day. You can see my references to another table where I'm pulling in a property id. The table schema is Property -> Unit -> Unit Status.

WITH t AS(
    SELECT i::date from generate_series('2016-06-29', '2017-08-03', '1 day'::interval) i
    )

SELECT t.i as date,
u.hproperty,
count(us.hmy) as count --us.hmy is the id
FROM t
LEFT OUTER JOIN y_unit_status us ON t.i BETWEEN us.dtstart AND 
us.dtend
INNER JOIN y_unit u ON u.hmy = us.hunit -- to get property id
WHERE us.sstatus = 'Occupied No Notice'
AND t.i >= us.dtstart
AND t.i <= us.dtend
AND u.hproperty = '1'
GROUP BY t.i, u.hproperty
ORDER BY t.i
limit 1500

I also tried a FOR loop, iterating over the dates to determine cases where the date was between start and end but my logic wasn't working. Thanks for any insight!

If end_date is NULL in a row (I suppose that's what happens in your example data, when you omit it) does that mean that its status is assumed to be somewhere in the future (but not sure when it will change)? — pozs
– pozs, Commented May 2, 2017 at 14:35
Yes, that was the correct assumption. NULL end_dates are indicative of current status and will be modified when unit status changes. — jmbull
– jmbull, Commented May 2, 2017 at 20:14

pozs · Accepted Answer · 2017-05-02 14:57:15Z

3

You are on the right track, but you'll need to handle NULL values in end_date. If those means that status is assumed to be changed somewhere in the future (but not sure when it will change), the containment operators (@> and <@) for the daterange type are perfect for you (because ranges can be "unbounded"):

with params as (
  select date '2017-01-01' date_from,
         date '2017-02-02' date_to
)
select     date_from + d, status, count(unit)
from       params
cross join generate_series(0, date_to - date_from) d
left join  y_unit_status on daterange(start_date, end_date, '[]') @> date_from + d
group by   1, 2

To achieve the first variant, you can use conditional aggregation:

with params as (
  select date '2017-01-01' date_from,
         date '2017-02-02' date_to
)
select     date_from + d,
           count(unit) filter (where status = 'Occupied No Notice') occupied_no_notice,
           count(unit) filter (where status = 'Occupied Notice') occupied_notice,
           count(unit) filter (where status = 'Vacant') vacant
from       params
cross join generate_series(0, date_to - date_from) d
left join  y_unit_status on daterange(start_date, end_date, '[]') @> date_from + d
group by   1

Notes:

The syntax filter (where <predicate>) is new to 9.4+. Before that, you can use CASE (and the fact that most aggregate functions does not include NULL values) to emulate it.
You can even index the expression daterange(start_date, end_date, '[]') (using gist) for better performance.

http://rextester.com/HWKDE34743

edited May 2, 2017 at 14:57

answered May 2, 2017 at 14:52

pozs

36.6k5 gold badges61 silver badges66 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

jmbull Over a year ago

This is perfect. Quick, thorough response with demo code. Thanks so much!

Collectives™ on Stack Overflow

Postgresql: Create a date sequence, use it in date range query

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related