I have time series data for a stock exchange that is only open between 9am and 4pm, and I want to disregard any rows that fall outside those hours.

To find out whether I need to make API calls, I compare a generate_series for a given time period against the time series data stored in my database.

I do a LEFT JOIN from the generate_series results to my av_intraday table.

This is the query...

SELECT *
FROM generate_series(date_trunc('day', localtimestamp - interval '1 day')
                    , localtimestamp
                    , interval '15 min') g(start_time)
LEFT JOIN av_intraday av ON av.date = g.start_time
                    AND extract(hour from g.start_time) >= 9 
                    AND extract(hour from g.start_time) <= 16
ORDER BY g.start_time;

This is the resulting data...

av_intraday

     start_time      | symbol |        date         |   open    |   high    |    low    |   close   | volume  
---------------------+--------+---------------------+-----------+-----------+-----------+-----------+---------

...
 2019-07-30 08:45:00 |        |                     |           |           |           |           |        
 2019-07-30 09:00:00 |        |                     |           |           |           |           |        
 2019-07-30 09:15:00 |        |                     |           |           |           |           |        
 2019-07-30 09:30:00 |        |                     |           |           |           |           |        
 2019-07-30 09:45:00 | fb     | 2019-07-30 09:45:00 |    194.95 |    195.83 |    194.54 |    195.58 | 2004674
 2019-07-30 09:45:00 | amzn   | 2019-07-30 09:45:00 |  1891.115 |    1905.3 |   1883.48 |   1904.28 |  564821
 2019-07-30 09:45:00 | goog   | 2019-07-30 09:45:00 |   1225.41 |   1234.87 | 1223.4301 | 1232.0699 |  203333
 2019-07-30 09:45:00 | tsla   | 2019-07-30 09:45:00 |       234 |    235.54 |    233.72 |    235.07 |  207546
 2019-07-30 09:45:00 | aapl   | 2019-07-30 09:45:00 |    208.88 |    208.93 |   208.335 |  208.6148 |  602413
...
 2019-07-30 16:00:00 | amzn   | 2019-07-30 16:00:00 |   1897.42 |    1899.9 |   1896.61 |   1898.33 |  152442
 2019-07-30 16:00:00 | goog   | 2019-07-30 16:00:00 |   1225.09 |   1226.34 |    1223.3 |    1223.4 |  169110
 2019-07-30 16:00:00 | tsla   | 2019-07-30 16:00:00 |    241.94 |     242.3 |    241.93 |     242.2 |  151572
 2019-07-30 16:00:00 | aapl   | 2019-07-30 16:00:00 |    208.94 |    208.94 |  208.3436 |   208.685 | 1096338
 2019-07-30 16:15:00 |        |                     |           |           |           |           |        
 2019-07-30 16:30:00 |        |                     |           |           |           |           |        
 2019-07-30 16:45:00 |        |                     |           |           |           |           |        
 2019-07-30 17:00:00 |        |                     |           |           |           |           |        
 2019-07-30 17:15:00 |        |                     |           |           |           |           |        
 2019-07-30 17:30:00 |        |                     |           |           |           |           |        
...

Data should look like the following...

av_intraday

     start_time      | symbol |        date         |   open    |   high    |    low    |   close   | volume  
---------------------+--------+---------------------+-----------+-----------+-----------+-----------+---------

 2019-07-30 09:00:00 |        |                     |           |           |           |           |        
 2019-07-30 09:15:00 |        |                     |           |           |           |           |        
 2019-07-30 09:30:00 |        |                     |           |           |           |           |        
 2019-07-30 09:45:00 | fb     | 2019-07-30 09:45:00 |    194.95 |    195.83 |    194.54 |    195.58 | 2004674
 2019-07-30 09:45:00 | amzn   | 2019-07-30 09:45:00 |  1891.115 |    1905.3 |   1883.48 |   1904.28 |  564821
 2019-07-30 09:45:00 | goog   | 2019-07-30 09:45:00 |   1225.41 |   1234.87 | 1223.4301 | 1232.0699 |  203333
 2019-07-30 09:45:00 | tsla   | 2019-07-30 09:45:00 |       234 |    235.54 |    233.72 |    235.07 |  207546
 2019-07-30 09:45:00 | aapl   | 2019-07-30 09:45:00 |    208.88 |    208.93 |   208.335 |  208.6148 |  602413
...
 2019-07-30 16:00:00 | amzn   | 2019-07-30 16:00:00 |   1897.42 |    1899.9 |   1896.61 |   1898.33 |  152442
 2019-07-30 16:00:00 | goog   | 2019-07-30 16:00:00 |   1225.09 |   1226.34 |    1223.3 |    1223.4 |  169110
 2019-07-30 16:00:00 | tsla   | 2019-07-30 16:00:00 |    241.94 |     242.3 |    241.93 |     242.2 |  151572
 2019-07-30 16:00:00 | aapl   | 2019-07-30 16:00:00 |    208.94 |    208.94 |  208.3436 |   208.685 | 1096338
 2019-07-30 16:15:00 |        |                     |           |           |           |           |        
 2019-07-30 16:30:00 |        |                     |           |           |           |           |        
 2019-07-30 16:45:00 |        |                     |           |           |           |           |        
...

Got the desired result with the following...

SELECT *
FROM (
    SELECT g.start_time as g_start_time, av.symbol, av.open, av.high, av.low, av.close, av.volume
    FROM generate_series(date_trunc('day', localtimestamp - interval '1 day')
                        , localtimestamp
                        , interval '15 min') g(start_time)
    LEFT JOIN av_intraday av ON av.date = g.start_time
    ORDER BY g.start_time
) t
WHERE extract(hour from g_start_time) >= 9
AND extract(hour from g_start_time) <= 16;

But I'm not sure why the original query didn't work. Can someone explain? Thank you.

  • Could you fill in what the columns are in your result please? Particularly which table they're from. Commented Jul 31, 2019 at 4:15
  • @Schwern I've added the column names and table names for reference. Commented Jul 31, 2019 at 4:23

1 Answer

LEFT JOIN av_intraday av ON av.date = g.start_time
                    AND extract(hour from g.start_time) >= 9 
                    AND extract(hour from g.start_time) <= 16

This says to match each row of the series against the rows of av_intraday whose date equals the start_time, and only when the start_time's hour is between 9 and 16, which effectively limits the matching dates to those hours.

But this only affects the join. A left join returns all the rows of the left table, that is, the table in the from clause: your series. Any row on the left without a match on the right simply gets null for the right-hand columns. See the quite excellent Visual Representation of SQL Joins for more.

Instead you need to move your restrictions on the series into a where clause. There they filter the joined result, so series rows outside those hours are dropped from the output.

You also don't want to subtract a day from localtimestamp when starting the series; otherwise it will start from the day before.

select *
from generate_series(
  date_trunc('day', localtimestamp),
  localtimestamp,
  interval '15 min'
) g(start_time)
left join av_intraday av
  on av.date = g.start_time
where extract(hour from g.start_time) between 9 and 16
order by g.start_time;
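
To see the difference between the two placements side by side, here is a minimal sketch that runs both forms of the query against a tiny in-memory table. It makes several assumptions that are not in the question: it uses SQLite through Python's sqlite3 module rather than PostgreSQL, a hypothetical slots table stands in for generate_series, strftime('%H', ...) replaces extract(hour from ...), and the sample rows (including the after-hours 17:00 row) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'slots' is a hypothetical stand-in for the generate_series output.
    CREATE TABLE slots (start_time TEXT);
    CREATE TABLE av_intraday (date TEXT, symbol TEXT);

    INSERT INTO slots VALUES
        ('2019-07-30 08:45:00'),
        ('2019-07-30 09:30:00'),
        ('2019-07-30 09:45:00'),
        ('2019-07-30 17:00:00');

    -- Made-up sample rows; the 17:00 row is an after-hours row.
    INSERT INTO av_intraday VALUES
        ('2019-07-30 09:45:00', 'fb'),
        ('2019-07-30 17:00:00', 'fb');
""")

# SQLite equivalent of extract(hour from g.start_time).
HOUR = "CAST(strftime('%H', g.start_time) AS INTEGER)"

# Hour condition in ON: it only decides which right-hand rows may match.
on_filter = conn.execute(f"""
    SELECT g.start_time, av.symbol
    FROM slots g
    LEFT JOIN av_intraday av
      ON av.date = g.start_time
     AND {HOUR} BETWEEN 9 AND 16
    ORDER BY g.start_time
""").fetchall()

# Hour condition in WHERE: it filters the rows of the joined result.
where_filter = conn.execute(f"""
    SELECT g.start_time, av.symbol
    FROM slots g
    LEFT JOIN av_intraday av
      ON av.date = g.start_time
    WHERE {HOUR} BETWEEN 9 AND 16
    ORDER BY g.start_time
""").fetchall()

print("condition in ON:   ", on_filter)
print("condition in WHERE:", where_filter)
```

With the condition in ON, every slot is still returned, and the 17:00 slot shows null for symbol even though a row with that date exists. With the condition in WHERE, only the slots whose hour is between 9 and 16 remain.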