SQL query challenge

Question

So here's yet another 'write a query to X' challenge.

I'm monitoring a number of networked vending machines. Each machine has a number of parts, e.g. bank note acceptor, coin system, printer and so on.

Problems with machine parts are logged in table, let's call it 'faults', which looks something like this (irrelevant fields omitted):

machineid           partid         start_time            end_time
---------           ------         ----------------      ----------------
       1                2          2009-10-05 09:00      NULL
       1                3          2009-10-05 08:00      2009-10-05 10:00
       2                2          2009-09-30 12:00      2009-09-30 14:00
       3                4          2009-09-28 13:00      2009-09-28 15:00
       3                2          2009-09-28 12:00      2009-09-28 14:00

end_date is NULL if the problem is currently ongoing.

I need a query which show time periods for which the machine as a whole is down, and which can account for overlapping ranges, collapsing them down into a single record. So for the sample data above, it would produce:

machineid          start_time            end_time
---------          ----------------      ----------------
       1           2009-10-05 08:00      NULL
       2           2009-09-30 12:00      2009-09-30 14:00
       3           2009-09-28 12:00      2009-09-28 15:00

It's not tough to write procedural code to do this line by line, but a nice declarative SQL query would be more useful, more elegant. It seems like it ought to be possible, I just can't quite get there though.

SQL dialect is Oracle. Analytic functions are availabe if that would help.

Thanks!

Does it need to show all time periods when a machine is down, or only the last time? — J. Polfer
– J. Polfer, Commented Oct 5, 2009 at 20:58
I might add that all this is a precursor to calculating percent uptime of machines, and MTBFs, that sort of thing. — Res Cogitans
– Res Cogitans, Commented Oct 5, 2009 at 21:18

Vincent Malgrat · Accepted Answer · 2009-10-05 22:42:37Z

8

using analytics, you can build a query that will make a single pass on the data (with a large data set this will be the most efficient):

SELECT machineid, MIN(start_time), MAX(end_time)
  FROM (SELECT machineid, start_time, end_time, 
               SUM(gap) over(PARTITION BY machineid 
                             ORDER BY start_time) contiguous_faults
           FROM (SELECT machineid, start_time, 
                        coalesce(end_time, DATE '9999-12-31') end_time,
                         CASE
                            WHEN start_time > MAX(coalesce(end_time, 
                                                           DATE '9999-12-31'))
                                              over(PARTITION BY machineid 
                                                   ORDER BY start_time 
                                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                                            AND 1 preceding)
                            THEN 1
                         END gap
                    FROM faults))
 GROUP BY machineid, contiguous_faults
 ORDER BY 1, 2

This query starts by determining if a row is contiguous to any row that started before. We then group the rows that are contiguous.

edited Oct 5, 2009 at 22:42

answered Oct 5, 2009 at 22:37

Vincent Malgrat

67.9k9 gold badges122 silver badges176 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Res Cogitans Over a year ago

Genius! Took me a while a grok it, but my admiration grew the whole time. Very elegant! Thanks.

DVK Over a year ago

Looks like it may work - did you test it on the data set that I suggested to najmeddine? (his test set + 1 more row)

DVK Over a year ago

What does "over()" do? (I'm a Sybase person, not Oracle :( )

DVK Over a year ago

If someone could write the same thing as Sybase syntax I'd be incredibly greatful (I sure hope this doen't use.

Vincent Malgrat Over a year ago

@DVK: I tested the query on the data set as you suggested and the additional row is correctly merged into the last segment. The OVER keyword transforms a GROUP function into a windowing function (the function is applied on the window defined after the OVER keyword). Analytics are really powerful, alas I'm not a Sybase person and I wouldn't know if something equivalent can be written in Sybase :>

|

Community · Accepted Answer · 2020-06-20 09:12:55Z

2

Basically, you can not do it (find a covering partition set of a forest) in pure set theory (e.g. as a bounded # of queries without a loop).

To do it in the most set-like way,

Create a temp table for forest partitioning (10 or 11 columns, 4 from failure #1, 4 from failure #2, 1 for partition ID, 1 for round in which node was inserted, and 1 for assorted optimizations I can't think of with a fever of 38C.
Run a loop (BFS or DFS, whatever you find to easier implement the forest partitioning algorithm in). The tricky part, compared to graphs, is that you can have many sub-trees joined from the top to current sub-tree

You can use sheepsimulator's query as basic building block for the loop (e.g. finding 2 connected node)
When the partitioning loop is done, simply do

   select min(p1.start_time), max(p2.end_time), p1.partition,p2.partition
   from partitions p1, partitions p2
   where p1.partition = p2.partition
   group by p1.partition,p2.partition
   

    /* This will need to be tweaked using COALESCE 
       to deal with NULL end times in obvious way) */

I apologize for not spelling the exact code for forest partitioning (it may be filed under tree partitioning) - I'm dead tired and I'm certain some Googling will yield one now that you know the tdata structure and problem name (or you can post this as a more precisely formulated Q on StackOverflow - e.g. "How to implement an algorithm for complete partitioning of a forest of trees as a loop in SQL".

edited Jun 20, 2020 at 9:12

CommunityBot

11 silver badge

answered Oct 5, 2009 at 21:40

DVK

130k33 gold badges219 silver badges337 bronze badges

1 Comment

DVK Over a year ago

P.S. I have a feeling you can optimize this to run in O(logN) and avoid the mins/maxs comuttation to boot by computing transitive edges in loop, but not 100% certain it will work

manji · Accepted Answer · 2009-10-05 22:22:28Z

2

SELECT  DISTINCT 
        t1.machineId, 
        MIN(t2.start_time) start_time, 
        MAX(COALESCE(t2.end_time, '3210/01/01')) end_time
FROM FAULTS t1
JOIN FAULTS t2 ON t1.machineId = t2.machineId
                  AND ((t2.start_time >= t1.start_time
                       AND (t1.end_time IS NULL OR t2.start_time <= t1.end_time)
                  )
                  OR
                  (t1.start_time >= t2.start_time 
                       AND (t2.end_time IS NULL OR t1.start_time <= t2.end_time) 
                  ))
GROUP BY t1.machineId, t1.part_id

I tested this query on the following data:

machine_id   |part_id |start_time           |end_time
-------------------------------------------------------------------------
1           |2       |05 Oct 2009 09:00:00  |NULL
1           |3       |05 Oct 2009 08:00:00  |05 Oct 2009 10:00:00
2           |2       |30 Sep 2009 12:00:00  |30 Sep 2009 14:00:00
2           |3       |30 Sep 2009 15:00:00  |30 Sep 2009 16:00:00
2           |4       |30 Sep 2009 16:00:00  |30 Sep 2009 17:00:00
3           |2       |28 Sep 2009 12:00:00  |28 Sep 2009 14:00:00
3           |4       |28 Sep 2009 13:00:00  |28 Sep 2009 15:00:00

I got this:

machine_id   |start_time             |end_time
-----------------------------------------------------------------
1           |05 Oct 2009 08:00:00   |01 Jan 3210 00:00:00
2           |30 Sep 2009 12:00:00   |30 Sep 2009 14:00:00
2           |30 Sep 2009 15:00:00   |30 Sep 2009 17:00:00
3           |28 Sep 2009 12:00:00   |28 Sep 2009 15:00:00

edited Oct 5, 2009 at 22:22

answered Oct 5, 2009 at 21:37

manji

48k5 gold badges98 silver badges104 bronze badges

4 Comments

DVK Over a year ago

Won't this brak if there were 3 consecutive faults? of or am I mis-reading it?

DVK Over a year ago

Please add |2|5|30 Sep 2009 17:00:00|30 Sep 2009 18:00:00| row and re-run your query. <P> If it produces 4 rows as above (with 3rd row ending at 18:00), your query works 100%. If 5 rows, I am not mis-reading and it does not work

DVK Over a year ago

@najmeddine - any luck? It would be so flipping cool if you aactually can do it in 1 query!

Res Cogitans Over a year ago

Indeed, with the row added the above doesn't do the job. Nice try though!

user180100 · Accepted Answer · 2009-10-05 21:03:10Z

0

SELECT machineid, min(start_time), max(ifnull(end_time, '3000-01-01 00:00'))
FROM faults
GROUP BY machineid

should do the job (replacing ifnull by the equivalent Oracle function if needed).

answered Oct 5, 2009 at 21:03

user180100

3 Comments

Lukasz Lysik Over a year ago

No, it does not if times are not overlapping.

CSharpAtl Over a year ago

not if you have the same machine that went down at 2 different times like 3:00 - 5:00 and then 6:00 - 7:00, you would say it was down from 3:00 - 7:00 which is not true.

J. Polfer Over a year ago

This query would possibly imply incorrect things. It will return a downtime period that goes from the earliest start_time to the latest end_time for all records for a given machineid. This time could include uptimes as well as downtimes!

J. Polfer · Accepted Answer · 2009-10-05 21:10:18Z

0

I wish I had time to give a full answer, but here is a hint to find overlapping downtimes:

select a.machineid, a.start_time, a.end_time, b.start_time, b.end_time
from faults a,
     faults b,
where a.machineid = b.machineid
  and b.start_time >= a.start_time
  and b.start_time <= a.end_time;

answered Oct 5, 2009 at 21:10

J. Polfer

12.6k10 gold badges56 silver badges84 bronze badges

2 Comments

DVK Over a year ago

This will only find 2 breakages caused by 2 parts. Not N

J. Polfer Over a year ago

@DVK - True. As I said in the above answer, I didn't solve the problem completely. But it is part of the solution.

Charles Bretana · Accepted Answer · 2009-10-05 21:52:22Z

0

I believe you would need a stored proc to do this, or something like recursive 'Common Table Expressions (CTEs) (as exists in SQL srever), or otherwise (in a single SQL Statement) you would not be able to get the right answer when 3 or more rows togeher form a contiguous range of covered dates.

like:

 |----------|
           |---------------|
                        |----------------|

Without actually going through the exercise, I might suggest that in the stored proc, build a table of all "candidate dates" and then construct a table that contains all the dates that are NOT covered by a daterange in an existing row, then build your output resultset by "negating" this set.

edited Oct 5, 2009 at 21:52

answered Oct 5, 2009 at 21:46

Charles Bretana

147k24 gold badges158 silver badges219 bronze badges

Comments

Cade Roux · Accepted Answer · 2009-10-05 22:03:07Z

0

See this discussion - with a solution near the bottom: http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.sqlserver.programming&tid=2bae93da-c70e-4de4-a58b-d8cc0bf8ffd5

answered Oct 5, 2009 at 22:03

Cade Roux

90.1k42 gold badges186 silver badges268 bronze badges

Comments

Erwin Smout · Accepted Answer · 2009-10-05 22:27:46Z

Hehe.

In SIRA_PRISE, which supports interval types, solving this problem would be as easy as

SELECT machineID, period FROM Faults.

IN which 'period' is an attribute of the time interval type whose start and end points are start_time and end_time of your SQL table.

But since you are presumably forced to solve this in SQL, and with a system that does not support interval types, I can only wish you a lot of courage.

Two hints :

The union of two intervals can be handled in SQL using complex CASE constructs (if interval_values_overlap then lowest_start_time highest_end_time, all that sort of stuff).

Since you cannot tell beforehand how many rows will merge into one, you will presumably find yourself forced to write recursive SQL.

Collectives™ on Stack Overflow

SQL query challenge

8 Answers 8

7 Comments

1 Comment

4 Comments

3 Comments

2 Comments

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

8 Answers 8

7 Comments

1 Comment

4 Comments

3 Comments

2 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related