-2

With the following table, I need to make a query to replace the null values of “time” (number of minutes, integer) by a value consistent with the preceding and following values, with partitioning by “esz” and sorting by ascending “rank”. In the example, the data is already sorted. Ideally:

  • the value should be as close as possible to the preceding one (+1) when it exists
  • the value should be as close as possible to the one that follows (-1) if none precedes (rank 1)
  • if the difference between the preceding and following values is insufficient (0 or 1, for example), then the value must be equal to the smaller of the two.
  • the value must never be less than the previous one, nor greater than the following one. No "esz" can have only empty "time" values I've tried LAG() and LEAD() but, as the null values can be anywhere, I don't think this is the best option. I guess I'd have to use a recursive query, but I can't do it. Can anyone help me find a solution? Thank you very much
id esz code rank time
1 1 "SOM" 1 5
2 1 "QUI" 2 NULL
3 1 "VER" 3 10
4 1 "NSC" 4 15
5 1 "3SM" 99 NULL
6 2 "QUI" 1 7
7 2 "VER" 2 NULL
8 2 "SOM" 3 NULL
9 2 "NSC" 4 12
10 2 "3SM" 99 NULL
11 3 "NSC" 1 NULL
12 3 "VER" 2 NULL
13 3 "QUI" 3 11
14 3 "SOM" 4 12
15 3 "3SM" 99 NULL
16 4 "SOM" 1 2
17 4 "QUI" 2 NULL
18 4 "NSC" 3 3
19 4 "VER" 4 NULL
20 4 "3SM" 99 NULL
21 5 "NSC" 1 NULL
22 5 "SOM" 2 4
23 5 "VER" 3 NULL
24 5 "QUI" 4 7
25 5 "3SM" 99 NULL

Sample output:

id esz code rank time
1 1 "SOM" 1 5
2 1 "QUI" 2 6
3 1 "VER" 3 10
4 1 "NSC" 4 15
5 1 "3SM" 99 16
6 2 "QUI" 1 7
7 2 "VER" 2 8
8 2 "SOM" 3 9
9 2 "NSC" 4 12
10 2 "3SM" 99 13
11 3 "NSC" 1 9
12 3 "VER" 2 10
13 3 "QUI" 3 11
14 3 "SOM" 4 12
15 3 "3SM" 99 13
16 4 "SOM" 1 2
17 4 "QUI" 2 2
18 4 "NSC" 3 3
19 4 "VER" 4 4
20 4 "3SM" 99 5
21 5 "NSC" 1 3
22 5 "SOM" 2 4
23 5 "VER" 3 5
24 5 "QUI" 4 7
25 5 "3SM" 99 8

I can handle most of the cases in the table with LAG() and LEAD() and CASE. But I still have problems with some of them: id=11 and id=12 for example. Either I have to go backwards from the value of id=13, or I have to know when filling id=11 that I have to leave a value available to fill id=12.

My last query:

    SELECT
      esz,
      code,
      rank,
      CASE
        WHEN time IS NOT NULL THEN time
        ELSE
          CASE
            WHEN rank = 1 THEN 1
            WHEN LAG(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL AND LEAD(time) OVER(PARTITION BY esz ORDER BY rank) IS NULL THEN LAG(time) OVER(PARTITION BY esz ORDER BY rank) + 1
            WHEN LEAD(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL AND LAG(time) OVER(PARTITION BY esz ORDER BY rank) IS NULL THEN LEAD(time) OVER(PARTITION BY esz ORDER BY rank) - 1
            WHEN LAG(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL AND LEAD(time) OVER(PARTITION BY esz ORDER BY rank) IS NOT NULL THEN
              CASE
                WHEN LEAD(time) OVER(PARTITION BY esz ORDER BY rank) - LAG(time) OVER(PARTITION BY esz ORDER BY rank) <= 1 THEN LAG(time) OVER(PARTITION BY esz ORDER BY rank)
                ELSE LAG(time) OVER(PARTITION BY esz ORDER BY rank) +1
              END
            WHEN rank > LEAD(rank) OVER(PARTITION BY esz ORDER BY rank) OR LEAD(rank) OVER(PARTITION BY esz ORDER BY rank) IS NULL THEN 999
          END
      END AS time
      FROM mytable
      ORDER BY esz, rank

Here's the result:

id esz code rank time
1 1 "SOM" 1 5
2 1 "QUI" 2 6
3 1 "VER" 3 10
4 1 "NSC" 4 15
5 1 "3SM" 99 16
6 2 "QUI" 1 7
7 2 "VER" 2 8
8 2 "SOM" 3 <11>
9 2 "NSC" 4 12
10 2 "3SM" 99 13
11 3 "NSC" 1 <1>
12 3 "VER" 2 10
13 3 "QUI" 3 11
14 3 "SOM" 4 12
15 3 "3SM" 99 13
16 4 "SOM" 1 2
17 4 "QUI" 2 2
18 4 "NSC" 3 3
19 4 "VER" 4 4
20 4 "3SM" 99 <999>
21 5 "NSC" 1 <1>
22 5 "SOM" 2 4
23 5 "VER" 3 5
24 5 "QUI" 4 7
25 5 "3SM" 99 8

I've highlighted the values that differ from the expected result. It's not perfect, but there are no inconsistencies. However, there would be a problem if more than 3 consecutive overrides were inserted.

7
  • Upload your sample output .... Don't do too much complecated ... @Nicolas Commented Apr 27, 2024 at 22:52
  • I've added an example solution in my post. Thanks in advance Commented Apr 28, 2024 at 4:29
  • "I need to make a query to replace ..." But where is your query? Stackoverflow is not a (free) code writing service... Commented Apr 28, 2024 at 13:04
  • Your comment seems full of reproaches... I didn't think my code was interesting since it doesn't achieve the objective. But if I have to provide proof of my work in order to get help, I'll add the latest version without any problem. Commented Apr 28, 2024 at 15:22
  • How many consecutive NULL value will come at column time, for condition: over(partition by esz order by ranks). - @Nicolas Commented Apr 28, 2024 at 18:08

4 Answers 4

1

I found a way to resolved this issue correctly.
As there are two NULL scenario in your inputs. So, design query according to that ...

select id, esz,  code, ranks,
  -- lag(times) over(partition by esz order by ranks) as prev_time,
  -- lead(times) over(partition by esz order by ranks) as next_time,
  times,
  CASE
      -- Previous || Current*  || Next :: Scenario
      --    ANY   || NOT NULL* || ANY  
      WHEN(times is not NULL) 
           THEN times
      -- NOT NULL ||   NULL*   || NOT NULL
      WHEN(times is NULL AND 
           (lag(times) over(partition by esz order by ranks)) is not NULL AND 
           (lead(times) over(partition by esz order by ranks)) is not NULL) 
           THEN CASE 
                  WHEN((lag(times) over(partition by esz order by ranks) + 1) <
                        (lead(times) over(partition by esz order by ranks)) - 1)
                     THEN ((lag(times) over(partition by esz order by ranks)) + 1)
                  ELSE
                      ((lead(times) over(partition by esz order by ranks)) - 1)
                END
      --   NOT NULL ||  NULL  ||   NULL*   ||  ANY   (esz = 4 and 2)
      WHEN(times is NULL AND
           (lag(times) over(partition by esz order by ranks)) is NULL AND
           (lag(times, 2) over(partition by esz order by ranks)) is not NULL) 
           THEN CASE
               WHEN((lead(times) over(partition by esz order by ranks)) is NULL)
                  THEN (lag(times, 2) over(partition by esz order by ranks) + 2)
               ELSE CASE
                  WHEN((lag(times, 2) over(partition by esz order by ranks) + 2) <
                       (lead(times) over(partition by esz order by ranks) - 1))
                     THEN (lag(times, 2) over(partition by esz order by ranks) + 2)
                  ELSE (lead(times) over(partition by esz order by ranks) - 1)
               END
            END
       --   ANY  ||   NULL*   ||  NULL || NOT NULL  (esz = 3)
      WHEN(times is NULL AND
           (lead(times) over(partition by esz order by ranks)) is NULL AND
           (lead(times, 2) over(partition by esz order by ranks)) is not NULL) 
           THEN CASE
              WHEN ((lag(times) over(partition by esz order by ranks)) is NULL)
                  THEN (lead(times, 2) over(partition by esz order by ranks) - 2)
              ELSE CASE
                  WHEN((lag(times) over(partition by esz order by ranks) + 1) <
                        (lead(times, 2) over(partition by esz order by ranks) - 2))
                      THEN (lag(times) over(partition by esz order by ranks) + 1)
                  ELSE (lead(times, 2) over(partition by esz order by ranks) - 2)
              END
            END
      --    NOT NULL ||   NULL*   || ANY
      WHEN(times is NULL AND 
           (lag(times) over(partition by esz order by ranks)) is not NULL) 
           THEN ((lag(times) over(partition by esz order by ranks)) + 1)
      --       NULL  ||   NULL*   ||  NOT NULL
      WHEN(times is NULL AND 
           (lag(times) over(partition by esz order by ranks)) is NULL AND 
           (lead(times) over(partition by esz order by ranks)) is not NULL) 
           THEN ((lead(times) over(partition by esz order by ranks)) - 1)
      ELSE -999
  END as time_output
from T1

Run the Query: db<>fiddle

Output with validation:
enter image description here

This solution work for two null scenario top or bottom or both, as follows:

value | null | *(Current Position = NULL) | null | value
value | null | *(Current Position = NULL) | anything
anything | *(Current Position = NULL) | null | value

So, go to db<>fiddle page and there I explain all conditions one by one.
Please add the Case-03 (null-CurrentPos-null) scenario if required.

I think you understand my approach. So, design your query according to that.

Sign up to request clarification or add additional context in comments.

Comments

1

Thank you @art-bindu. You've done what I refused to do, thinking that a simpler solution might exist: look further than the previous line and the next line. I tested your query on a larger set: more than 5 rows per esz and more consecutive null values. As you said, an adaptation was necessary and I managed it without any problem. However, an inconsistency appeared: at one point, two consecutive beats were not increasing (a 9 fed from row r-2 and an 8 fed from row r+2). So I reworked to handle even more cases. Here's the result, using several technical values:

  • an "index": COUNT (time) OVER(PARTITION BY esz ORDER BY rank),
  • for each esz, a "max_time" of the values of each index: MAX(time) OVER(PARTITION BY esz, index)
  • a "next_time" corresponding to LEAD(time_max) OVER(ORDER BY esz, rank) I run through the rows 3 times and if there are any NULLs left on the last run, I assign the "next_time" value. The remaining values may correspond either to values in the middle of a long series of consecutive NULLs, or to values that are framed by a LAG() and a LEAD() whose difference is small.

Here's my query:

WITH
e1 AS (SELECT esz, code, rank, time AS initial_time, time FROM dni.test),
e2_1 AS (SELECT esz, code, rank, initial_time, time, COUNT(time) OVER(PARTITION BY esz ORDER BY rank) AS i FROM e1),
e2_2 AS (SELECT *, CASE WHEN MAX(time) OVER(PARTITION BY esz, i) IS NULL THEN 1 ELSE MAX(time) OVER(PARTITION BY esz, i) END AS max_time FROM e2_1),
e2_3 AS (SELECT *, LEAD(max_time) OVER(ORDER BY esz, rank) AS next_time FROM e2_2),
e2_4 AS (
SELECT esz, code, rank, initial_time,
    CASE
        WHEN time IS NOT NULL THEN time
        ELSE
            CASE
                WHEN i = LAG(i) OVER(ORDER BY esz, rank) AND max_time = LAG(max_time) OVER(ORDER BY esz, rank) THEN
                    CASE
                        WHEN next_time - max_time > 1 THEN next_time -1
                        WHEN next_time - max_time < 0 OR next_time IS NULL THEN max_time +1
                        ELSE time
                    END
            ELSE
            CASE
                WHEN next_time = 1 THEN 1
                ELSE next_time -1
            END
        END
    END AS time,
    i, max_time, next_time
FROM e2_3
),
e3_1 AS (SELECT esz, code, rank, initial_time, time, COUNT(time) OVER(PARTITION BY esz ORDER BY rank) AS i FROM e2_4),
e3_2 AS (SELECT *, CASE WHEN MAX(time) OVER(PARTITION BY esz, i) IS NULL THEN 1 ELSE MAX(time) OVER(PARTITION BY esz, i) END AS max_time FROM e3_1),
e3_3 AS (SELECT *, LEAD(max_time) OVER(ORDER BY esz, rank) AS next_time FROM e3_2),
e3_4 AS (
SELECT esz, code, rank, initial_time,
    CASE
        WHEN time IS NOT NULL THEN time
        ELSE
            CASE
                WHEN i = LAG(i) OVER(ORDER BY esz, rank) AND max_time = LAG(max_time) OVER(ORDER BY esz, rank) THEN
                    CASE
                        WHEN next_time - max_time > 1 THEN next_time -1
                        WHEN next_time - max_time < 0 OR next_time IS NULL THEN max_time +1
                        ELSE time
                    END
            ELSE
            CASE
                WHEN next_time = 1 THEN 1
                ELSE next_time -1
            END
        END
    END AS time,
    i, max_time, next_time
FROM e3_3
),
e4_1 AS (SELECT esz, code, rank, initial_time, time, COUNT(time) OVER(PARTITION BY esz ORDER BY rank) AS i FROM e3_4),
e4_2 AS (SELECT *, CASE WHEN MAX(time) OVER(PARTITION BY esz, i) IS NULL THEN 1 ELSE MAX(time) OVER(PARTITION BY esz, i) END AS max_time FROM e4_1),
e4_3 AS (SELECT *, LEAD(max_time) OVER(ORDER BY esz, rank) AS next_time FROM e4_2),
e4_4 AS (
SELECT esz, code, rank, initial_time,
    CASE
        WHEN time IS NOT NULL THEN time
        ELSE
            CASE
                WHEN i = LAG(i) OVER(ORDER BY esz, rank) AND max_time = LAG(max_time) OVER(ORDER BY esz, rank) THEN
                    CASE
                        WHEN next_time - max_time > 1 THEN next_time -1
                        WHEN next_time - max_time < 0 OR next_time IS NULL THEN max_time +1
                        ELSE next_time
                    END
            ELSE
            CASE
                WHEN next_time = 1 THEN 1
                ELSE next_time -1
            END
        END
    END AS time,
    i, max_time, next_time
FROM e4_3
)
SELECT * FROM e4_4 ORDER BY esz, rank

I suppose you could do better, but, well, it works.

1 Comment

Thank you @Nicolas... I'm happy to helping you... Must post the doubts... It also helps us to learn something
0

You can start with something like this DBFIDDLE (https://dbfiddle.uk/adwfE2_v):

SELECT 
  id,
  esz,
  code,
  rank,
  CASE WHEN NOT TIME is null 
            THEN time
       WHEN LEAD(time) OVER (PARTITION BY esz ORDER BY id) - LAG(time) OVER (PARTITION BY esz ORDER by id) >=2
            OR LEAD(time) OVER (PARTITION BY esz ORDER BY id) is null
            THEN (LAG(time) OVER (ORDER BY id))+1 
       ELSE 9999
  END,
  LAG(time) OVER (PARTITION BY esz ORDER by id) as LAG,
  LEAD(time) OVER (PARTITION BY esz ORDER by id) as LEAD
FROM mytable

And the keep adding WHEN to the CASE statement until you no longer se any 9999 or NULL values.

(After that you get loose the columns lag and lead too 😉, which might only useful in debugging)

output:

id esz code rank case lag lead
1 1 "SOM" 1 5 null null
2 1 "QUI" 2 6 5 10
3 1 "VER" 3 10 null 15
4 1 "NSC" 4 15 10 null
5 1 "3SM" 99 16 15 null
6 2 "QUI" 1 7 null null
7 2 "VER" 2 8 7 null
8 2 "SOM" 3 9999 null 12
9 2 "NSC" 4 12 null null
10 2 "3SM" 99 13 12 null
11 3 "NSC" 1 null null null
12 3 "VER" 2 9999 null 11
... ... ... ... ... ... ...

Comments

0

Try on your full dataset - I added here a 6th group with all time NULL to see what happens:

with data(id, esz, code, rank, time) as (
    select 1, 1, 'SOM', 1, 5 union all
    select 2, 1, 'QUI', 2, NULL union all
    select 3, 1, 'VER', 3, 10 union all
    select 4, 1, 'NSC', 4, 15 union all
    select 5, 1, '3SM', 99, NULL union all
    select 6, 2, 'QUI', 1, 7 union all
    select 7, 2, 'VER', 2, NULL union all
    select 8, 2, 'SOM', 3, NULL union all
    select 9, 2, 'NSC', 4, 12 union all
    select 10, 2, '3SM', 99, NULL union all
    select 11, 3, 'NSC', 1, NULL union all
    select 12, 3, 'VER', 2, NULL union all
    select 13, 3, 'QUI', 3, 11 union all
    select 14, 3, 'SOM', 4, 12 union all
    select 15, 3, '3SM', 99, NULL union all
    select 16, 4, 'SOM', 1, 2 union all
    select 17, 4, 'QUI', 2, NULL union all
    select 18, 4, 'NSC', 3, 3 union all
    select 19, 4, 'VER', 4, NULL union all
    select 20, 4, '3SM', 99, NULL union all
    select 21, 5, 'NSC', 1, NULL union all
    select 22, 5, 'SOM', 2, 4 union all
    select 23, 5, 'VER', 3, NULL union all
    select 24, 5, 'QUI', 4, 7 union all
    select 25, 5, '3SM', 99, NULL union all
    select 26, 6, 'YYY', 50, NULL union
    select 27, 6, 'ZZZ', 99, NULL 
)
select d.id, d.esz, d.code, d.rank, time, --grp, ptime, pdelta, ntime, ndelta,
    coalesce(time, 
        case when ntime-ndelta < ptime+pdelta
        then
            -- neither ptime and ntime are NULL: we take the smallest
            -- but not smaller than latest NOT NULL one
            greatest(ntime-ndelta, ptime)
        else
            -- one of ptime, ntime may be NULL
            -- as latest option we take the ordering in the group of NULL
            coalesce(ptime+pdelta, ntime-ndelta,pdelta)
        end 
    ) as time_output
from (
    select d.*, 
        -- pdelta: delta in the NULL group from previous NOT NULL in increasing order
        row_number() over(partition by esz, grp order by rank) as pdelta,
        -- ndelta: delta in the NULL group from next NOT NULL in decreasing order
        row_number() over(partition by esz, grp order by rank desc) as ndelta
    from (
        select d.*, 
            -- ptime is the latest NOT NULL in the partition in the range before up to to the previous row
            coalesce(time, ltime) as ptime,
            -- attribution of an unique number to each group of consecutive NULL in the partition
            -- that will allow to give an ordering number (pdelta and ndelta) to each NULL inside each subgroup
            row_number() over(partition by esz order by rank) -
                sum(case when time is null then 1 end) over(partition by esz, 
                    case when time is null then 1 end 
                    order by case when time is null then 1 end, rank) as grp,
            -- ntime is the first next NOT NULL in the partition in the range after starting from the next row
            coalesce(time, ftime )
            as ntime
        from ( 
            -- the technique to get around the fact that PostgreSQL  doesn't have
            -- "ignore nulls" option in first_value() and last_value()
            select id, esz, code, rank, time,
              coalesce(time, max(time) over (partition by esz, pgrp)) as ltime,
              coalesce(time, max(time) over (partition by esz, ngrp)) as ftime
            from 
            (
              select id, esz, code, rank, time,
                count(time) over(partition by esz order by rank) as pgrp,
                count(time) over(partition by esz order by rank desc) as ngrp
              from data 
            )
        ) d
    ) d
) d
order by esz, rank ;

https://dbfiddle.uk/8Z48_mg4 to compare with ORACLE version, supporting first_value/last_value with "ignore nulls" https://dbfiddle.uk/ycoe6qPj

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.