-1

I'm struggling to optimize a query which currently uses multiple subqueries. The subqueries run fast individually, but when joined together, are very slow. Below I've outlined a sample set of data and my current query.

Purpose of Query

To display the date different events occurred on an order

Structure of data

There is an Orders table and an Order History table

Orders

Order Part Status
1 A Planned
2 B Released
3 A Closed
4 B Released
5 C Closed

Order History

Timestamp Order Message NewStatus
10:00am 1 Created Planned
10:15am 2 Created Planned
10:16am 2 Status Changed Released
10:20am 3 Created Planned
10:24am 3 Status Changed Released
10:25am 2 Status Changed Planned
10:30am 4 Created Planned
10:35am 3 Status Changed Closed
10:40am 5 Created Planned
10:45am 4 Status Changed Released
10:50am 5 Status Changed Released
10:55am 2 Status Changed Released
11:00am 5 Status Changed Planned
11:05am 5 Status Changed Released
11:15am 5 Status Changed Closed
11:20am 3 Status Changed Released
11:25am 3 Status Changed Closed

Desired Result

Order Part Status Created Date First Release Latest Release Closed Date
1 A Planned 10:00am
2 B Released 10:15am 10:16am 10:55am
3 A Closed 10:20am 10:24am 11:20am 11:25am
4 B Released 10:30am 10:45am 10:45am
5 C Closed 10:40am 10:50am 11:05am 11:15am

Current Query

My current query does a subquery against the history table for each timestamp field (Created, First Release, Latest Release, Closed) and then joins them all up:

SELECT 
c.order
,c.part
,c.status
,c.cDate
,fr.frDate
,lr.lrDate
,cd.cdDate
FROM
(SELECT 
 o.order
 ,o.part
 ,o.status
 ,min(oh.timestamp)
 FROM order o
 JOIN order_history oh
 ON o.order = oh.order
 WHERE oh.message = 'Created' AND oh.NewStatus = 'Planned'
 group by o.order, o.part, o.status) c
JOIN 
(SELECT 
 o.order
 ,o.part
 ,o.status
 ,min(oh.timestamp)
 FROM order o
 JOIN order_history oh
 ON o.order = oh.order
 WHERE oh.message = 'Status Changed' AND oh.NewStatus = 'Released'
 group by o.order, o.part, o.status) fr
ON c.order = fr.order and c.part = fr.part and c.status = fr.status
JOIN 
(SELECT 
 o.order
 ,o.part
 ,o.status
 ,max(oh.timestamp)
 FROM order o
 JOIN order_history oh
 ON o.order = oh.order
 WHERE oh.message = 'Status Changed' AND oh.NewStatus = 'Released'
 group by o.order, o.part, o.status) lr
ON c.order = lr.order and c.part = lr.part and c.status = lr.status
JOIN 
(SELECT 
 o.order
 ,o.part
 ,o.status
 ,max(oh.timestamp)
 FROM order o
 JOIN order_history oh
 ON o.order = oh.order
 WHERE oh.message = 'Status Changed' AND oh.NewStatus = 'Closed'
 group by o.order, o.part, o.status) cd
ON c.order = cd.order and c.part = cd.part and c.status = cd.status

So as you can see... it's ugly! And really slow. What can I do to optimize?

Please keep in mind this is much simplified, fake example. I've tried to show the complexity in the data and query. Apologies if there are mistakes in the query, but I can verify that the current query does work to give me the results I want, so I'm not looking for just corrections, but optimizations.

9

2 Answers 2

3

It's impossible to optimize a query without access to the real query, table and index structures, and execution plan. However simplifying it to remove the subqueries and reduce the number of joins is unlikely to hurt, and might give you a better place to start investigating the plan and look for real optimization opportunities.

It also doesn't help that your 'simplified, fake' version uses illegal identifiers and has various other errors (including needing outer joins to get all five rows back), which you said to ignore but as Mitch commented, those things and the general fakeness make it kind of useless. Still, as a general idea of where to start, perhaps, with your example you could rewrite it using conditional aggregation, for example as:

SELECT 
o.order_num
,o.part
,o.status
,MIN(CASE WHEN oh.message = 'Created' AND oh.NewStatus = 'Planned' THEN oh.timestamp END) AS cDate
,MIN(CASE WHEN oh.message = 'Status Changed' AND oh.NewStatus = 'Released' THEN oh.timestamp END) AS frDate
,MAX(CASE WHEN oh.message = 'Status Changed' AND oh.NewStatus = 'Released' THEN oh.timestamp END) AS lrDate
,MAX(CASE WHEN oh.message = 'Status Changed' AND oh.NewStatus = 'Closed' THEN oh.timestamp END) AS cdDate
FROM orders o
LEFT JOIN order_history oh
ON o.order_num = oh.order_num
GROUP BY o.order_num, o.part, o.status
ORDER BY o.order_num, o.part, o.status
ORDER_NUM PART STATUS CDATE FRDATE LRDATE CDDATE
1 A Planned 10:00am null null null
2 B Released 10:15am 10:16am 10:55am null
3 A Close 10:20am 10:24am 11:20am 11:25am
4 B Released 10:30am 10:45am 10:45am null
5 C Closed 10:40am 10:50am 11:05am 11:15am

fiddle including a working version of your query with legal identifiers (though 'timestamp' still isn't ideal) and other changes, and this modified query.

(This works with the 'timestamp' as strings with your fake values, but would also work with real dates or timestamps, or course. And it doesn't necessarily need to be an outer join if there is always a 'Created' record, which seems likely.)

Sign up to request clarification or add additional context in comments.

Comments

2

You can use MATCH_RECOGNIZE for row-by-row pattern matching:

SELECT m.order_num,
       o.part,
       m.status,
       m.cdate,
       m.frdate,
       m.lrdate,
       m.cddate
FROM   order_history
       MATCH_RECOGNIZE(
         PARTITION BY order_num
         ORDER BY timestamp
         MEASURES
           LAST(newstatus)           AS status,
           FIRST(timestamp)          AS cdate,
           FIRST(released.timestamp) AS frdate,
           LAST(released.timestamp)  AS lrdate,
           LAST(closed.timestamp)    AS cddate
         PATTERN (created (planned|released|closed)*)
         DEFINE
           created  AS message = 'Created',
           planned  AS message = 'Status Changed' AND newstatus = 'Planned',
           released AS message = 'Status Changed' AND newstatus = 'Released',
           closed   AS message = 'Status Changed' AND newstatus = 'Closed'
       ) m
       INNER JOIN orders o
       ON o.order_num = m.order_num

Note: The only thing the orders table is necessary for is the part column and everything else can be generated from the order_history table; so if you don't need that column then you can eliminate the join.

Which, borrowing Alex Poole's sample data, outputs:

ORDER_NUM PART STATUS CDATE FRDATE LRDATE CDDATE
1 A Planned 10:00am null null null
2 B Released 10:15am 10:16am 10:55am null
3 A Closed 10:20am 10:24am 11:20am 11:25am
4 B Released 10:30am 10:45am 10:45am null
5 C Closed 10:40am 10:50am 11:05am 11:15am

fiddle

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.