I need help solving a performance problem with a recursive function in SQL Server. I have a table of tasks for items, each of which has a lead time. My function calls itself recursively to calculate the due date for each task, based (simplistically put) on the sum of the lead times of the preceding tasks. The function performs slowly at large scale, I believe mainly because it must recalculate the due date of every ancestor for each subsequent task.

So I am wondering: is there a way to store a calculated value that persists from function call to function call, but lasts only for the lifetime of the connection? Then my function could 'short-circuit' if it found a pre-calculated value, and avoid re-evaluating it for each due date request. The basic schema is below, with a crude representation of the function in question (this could also be done with a CTE, but it would still repeat the same calculations):

Create Table Projects(id int, DueDate DateTime)
Create Table Items(id int, Parent int, Project int, Offset int)
Create Table Tasks (id int, Parent int, Leadtime Int, Sequence int)

insert into Projects Values
(100,'1/1/2021')

Insert into Items Values
(0,null, 100, 0)
,(1,12, null, 0)
,(2,15, null, 1)

Insert into Tasks Values
 (10,0,1,1)
,(11,0,1,2)
,(12,0,2,3)
,(13,0,1,4)
,(14,1,1,1)
,(15,1,1,2)
,(16,2,2,1)
,(17,2,1,2);

CREATE FUNCTION GetDueDate(@TaskID int)
    Returns DATETIME
    AS BEGIN
    Declare @retval DateTime = null
    Declare @parent int = (Select Parent from Tasks where ID = @TaskID)
    Declare @parentConsumingOp int = (select Parent from Items where ID = @parent)
    Declare @parentOffset int = (select Offset from Items where ID = @parent)
    Declare @seq int = (Select Sequence from Tasks where ID = @TaskID)
    Declare @NextTaskID int = (select ID from Tasks where Parent = @parent and Sequence = @seq-1)
    Declare @Due DateTime = (select DueDate from Projects where ID = (Select Project from Items where ID = (Select Parent from Tasks where ID = @TaskID)))
    Declare @leadTime int = (Select LeadTime from Tasks where ID = @TaskID)
    if @NextTaskID is not null
    BEGIN
        SET @retval = DateAdd(Day,@leadTime * -1,dbo.GetDueDate(@NextTaskID))
    END ELSE IF @parentConsumingOp Is Not Null
    BEGIN
        SET @retval = DateAdd(Day,(@leadTime + @parentOffset)*-1,dbo.GetDueDate(@parentConsumingOp))
    END ELSE SET @retval = DateAdd(Day,@parentOffset*-1,@Due)
    Return @retval
END

EDIT: Sql Fiddle Here

  • Why not use a temporary table to hold the values? Temporary tables exist within the session, so your code would need to check whether the table exists and create it if it doesn't (i.e. on the first call of the function). If you could call the function multiple times in the same session, it would probably be good practice to explicitly drop it (or delete its contents) at the end of each recursion. Commented Dec 7, 2020 at 10:43
  • Provide sample data, desired results, and an explanation of the language you want to implement. You may not need a separate function or recursive CTE to do what you want (whatever that might be). Commented Dec 7, 2020 at 13:01
  • Thanks @GordonLinoff. Sample data is provided, and duplicated at the Sql Fiddle link. The desired result is a faster calculation of due dates. An explanation of how I might achieve that is in the original question. All of this is shown in the Sql Fiddle as well. Am I missing anything? Commented Dec 7, 2020 at 22:11
  • @NickW, it looks like I am unable to write to any tables (temporary or base) from a function, which is where I am stalling out a bit... Commented Dec 8, 2020 at 8:37
  • Is this a good opportunity for sqlclr? Commented Dec 9, 2020 at 0:45

1 Answer

Caveat: the following is based on the sample data you've provided rather than trying to work through the logic in your function (i.e. what you are trying to achieve rather than how you have implemented it)...

The result of the function appears to be:

for "this task"

project.due_date - (sum(tasks.leadtime) - 1) where tasks.sequence <= sequence of this task and tasks.parent = parent of this task

If this is the case then this function gives the same result as yours but is much simpler:

CREATE FUNCTION GetDueDate1(@TaskID int)
    Returns DATETIME
    AS BEGIN
    Declare @retval DateTime = null
    Declare @parent int = (Select Parent from Tasks where ID = @TaskID)
    Declare @seq int = (Select sequence from Tasks where ID = @TaskID)
    Declare @totlead int = (select Sum(Leadtime) - 1 from Tasks where parent = @parent and sequence <= @Seq)
    Declare @duedate DateTime = (select p.DueDate from tasks t inner join items i on t.parent = i.id inner join projects p on i.Project = p.id where t.id = @TaskID)
    SET @retval = DateAdd(Day,@totlead * -1,@duedate)
    Return @retval
END;

If I run both functions against your data:

select id
,leadtime
, sequence
, [dbo].[GetDueDate](id) "YourFunction"
, [dbo].[GetDueDate1](id) "MyFunction"
from tasks
where parent = 0;

I get the same result:

id  leadtime    sequence    YourFunction            MyFunction
10  1           1           2021-01-01 00:00:00.000 2021-01-01 00:00:00.000
11  1           2           2020-12-31 00:00:00.000 2020-12-31 00:00:00.000
12  2           3           2020-12-29 00:00:00.000 2020-12-29 00:00:00.000
13  1           4           2020-12-28 00:00:00.000 2020-12-28 00:00:00.000

Hope this helps. If it doesn't, please provide some sample data where my function doesn't produce the same result as yours.

Update following Comment

Good point, the code above doesn't work for all your data. I've been thinking this problem through and have come up with the following - please feel free to point it out if I have misunderstood anything:

  1. Your function will, obviously, only return the due date for the task you have passed in as a parameter. It will also calculate the due date of each of the preceding tasks only once during this process
  2. Therefore there is no point "saving" the due dates calculated for other tasks within a single call: each is used only once in the calculation for the initial task id, so holding these values gives no performance gain. They also won't be available if you call the function again, because that's not how functions work: a function can't "know" that you called it previously and already calculated the due date for that task id as an intermediate step

Re-reading your initial explanation, it appears that you actually want to calculate the due dates for a number of tasks (or all of them?) - not just a single one. If this is the case then I wouldn't (just) use a function (which is inherently limited to 1 task), instead I would write a Stored Procedure that would loop through all your tasks, calculate their due date and save this to a table (either a new table or update your Tasks table).

  1. You would need to ensure that the tasks were processed in an appropriate order so that those used in calculations for subsequent tasks were calculated first
  2. You can re-use the logic in your function (or even call the function from within the SP) but add step(s) that check whether the due date has already been calculated (i.e. try to select it from the table), use it if it has, or calculate it (and save it to the table) if it hasn't
  3. You would need to run this SP whenever relevant data in the tables used in the calculation was amended
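
As a rough illustration of the steps above, here is a minimal sketch of such a procedure. It is untested against your real data and rests on several assumptions: the table TaskDueDates and the procedure RefreshTaskDueDates are hypothetical names of my own, (Parent, Sequence) is assumed to be unique in Tasks, and the rules are copied from your GetDueDate function. Instead of looping task by task, each pass inserts every task whose dependency already has a due date, so each due date is calculated exactly once:

```sql
-- Hypothetical cache table (name and columns are assumptions, not part of your schema)
CREATE TABLE TaskDueDates (TaskID int PRIMARY KEY, DueDate datetime NOT NULL);
GO  -- batch separator for SSMS/sqlcmd; CREATE PROCEDURE must start its own batch

CREATE PROCEDURE RefreshTaskDueDates
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @added int;

    DELETE FROM TaskDueDates;  -- rebuild from scratch on every run

    -- Pass 0: tasks with no preceding task whose item has no parent task.
    -- These depend only on the project due date and the item offset.
    -- (Items without a project are skipped; your function would return NULL for them.)
    INSERT INTO TaskDueDates (TaskID, DueDate)
    SELECT t.id, DATEADD(DAY, -i.Offset, p.DueDate)
    FROM Tasks t
    JOIN Items i    ON i.id = t.Parent
    JOIN Projects p ON p.id = i.Project
    WHERE i.Parent IS NULL
      AND NOT EXISTS (SELECT 1 FROM Tasks prev
                      WHERE prev.Parent = t.Parent
                        AND prev.Sequence = t.Sequence - 1);
    SET @added = @@ROWCOUNT;

    -- Later passes: add every uncalculated task whose dependency is already stored.
    WHILE @added > 0
    BEGIN
        INSERT INTO TaskDueDates (TaskID, DueDate)
        SELECT t.id,
               CASE WHEN prev.id IS NOT NULL
                    -- preceding task in the same item
                    THEN DATEADD(DAY, -t.Leadtime, dPrev.DueDate)
                    -- first task of an item that hangs off a parent task
                    ELSE DATEADD(DAY, -(t.Leadtime + i.Offset), dPar.DueDate)
               END
        FROM Tasks t
        JOIN Items i ON i.id = t.Parent
        LEFT JOIN Tasks prev         ON prev.Parent = t.Parent
                                    AND prev.Sequence = t.Sequence - 1
        LEFT JOIN TaskDueDates dPrev ON dPrev.TaskID = prev.id
        LEFT JOIN TaskDueDates dPar  ON dPar.TaskID = i.Parent
        WHERE NOT EXISTS (SELECT 1 FROM TaskDueDates d WHERE d.TaskID = t.id)
          AND (   dPrev.TaskID IS NOT NULL
               OR (prev.id IS NULL AND dPar.TaskID IS NOT NULL));
        SET @added = @@ROWCOUNT;
    END
END;
GO

-- Usage: rebuild the cache, then compare it with the original function
EXEC RefreshTaskDueDates;
SELECT t.id, d.DueDate AS CachedDueDate, dbo.GetDueDate(t.id) AS FunctionDueDate
FROM Tasks t
LEFT JOIN TaskDueDates d ON d.TaskID = t.id;
```

The loop stops as soon as a pass adds no rows, so a task whose dependency chain never reaches a project is simply left out of the table rather than causing an endless loop. If the two columns in the comparison query ever differ, treat that as a sign that I have misread one of your rules.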

1 Comment

Thanks a lot, @NickW. Your code above is correct for the first four rows, but doesn't match my results in the last four rows. The missing link here is that each Item may have a parent Task, and in that scenario the lead time needs to be rolled up recursively: calculate the due date of the parent Task, then use that as the base for the summed lead times of the current Item.
