(C#) I have an Http Starter function which triggers an Orchestration function containing five asynchronous Activity functions within a While loop.

Pseudo-structure of ideal functional layout which I'm trying to achieve:

static long latestLogEventIdx = 0;
static long batchIncrement = 1000;

[FunctionName("MainOrchestrator")]
public static async Task<List<string>> RunMainOrchestrator([context])
{
    // Ideally runs once: gets and sets the latest logged index.
    await context.CallActivityAsync<string>("ActivityFunc1", data);

    while (latestLogEventIdx < maxIndex)
    {
        try
        {
            await context.CallActivityAsync<string>("ActivityFunc2", data);
            await context.CallActivityAsync<string>("ActivityFunc3", data);
            await context.CallActivityAsync<string>("ActivityFunc4", data);
            await context.CallActivityAsync<string>("ActivityFunc5", data);
        }
        catch
        {
            // handle error
        }

        // Shift the window forward by one batch and start over.
        latestLogEventIdx += batchIncrement;
    }

    return outputs;
}

(The only difference between the code above and my actual code is that my actual code has all the activity functions inside the while loop, whereas ideally the first would execute only once. I'm just not quite sure how to achieve that.)

The Activity functions make queries to several SQL databases and process the response data. The script is long-running and consequently needs to process the data in batches between record indexes to avoid timing out and other issues, which is what I thought to use my While loop for - to iterate over the Activity functions while within a specified range of indices returned from my first Activity function.

I created a global variable to keep track of the latest index logged to a SQL database (the first activity function gets and sets this index). Then I query another database for a range of data between the latest index logged + batchIncrement which I store in a global list (second activity function).

Then I process this data in the third Activity function using the data I stored in my global var from the previous activity function. Finally, I save it to another DB in the last activity function.

Then I shift the latestLogEventIdx forward by the batchIncrement and start over, ideally from the second Activity function (if somehow I can skip the first Activity function).

This is causing some issues beyond the first pass, which now seems logical given my limited understanding of Durable Functions and of the execution order of orchestration functions, which I understand (minimally) from this MS Visual Studio tutorial on YouTube.

Based on the log.LogInformation output in my terminal, it looks as if the orchestrator is being called on top of itself after the first full execution. Activity functions appear to run on top of one another: the output shows several consecutive executions of each function in seemingly random order, and it gets more erratic over time. These errors only occur after the first full pass of the orchestrator function.

Any help on how to achieve a working approach of the semi-functional pseudo-code above would be much appreciated!

1 Answer

The big issue, I think, is the global/static values and list that you are using. Durable orchestrators should not use data defined outside the scope of the function, except for constants, because these values may be lost or changed from the outside. If you look at this link, you will find static variables listed as something to avoid.
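One way to keep state out of statics is to pass it in as activity input and hand it back as the return value. The following is only a minimal sketch, assuming the in-process Durable Functions 2.x package (Microsoft.Azure.WebJobs.Extensions.DurableTask). The names BatchRequest, BatchOrchestrator, FetchBatch and ProcessBatch are hypothetical, and the SQL work is replaced by placeholder code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// Hypothetical input type: everything an activity needs travels in its input.
public class BatchRequest
{
    public long StartIndex { get; set; }
    public int BatchSize { get; set; }
}

public static class BatchFunctions
{
    [FunctionName("BatchOrchestrator")]
    public static async Task<int> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The batch window comes from the orchestration input, not a static field.
        var request = context.GetInput<BatchRequest>();

        // Each result is returned to the orchestrator and passed on explicitly.
        List<string> rows =
            await context.CallActivityAsync<List<string>>("FetchBatch", request);
        List<string> processed =
            await context.CallActivityAsync<List<string>>("ProcessBatch", rows);

        return processed.Count;
    }

    [FunctionName("FetchBatch")]
    public static List<string> FetchBatch([ActivityTrigger] BatchRequest request)
    {
        // Placeholder for the real SQL query over the requested index range.
        return Enumerable.Range(0, request.BatchSize)
            .Select(offset => $"row-{request.StartIndex + offset}")
            .ToList();
    }

    [FunctionName("ProcessBatch")]
    public static List<string> ProcessBatch([ActivityTrigger] List<string> rows)
    {
        // Placeholder for the real processing step.
        return rows.Select(row => row.ToUpperInvariant()).ToList();
    }
}
```

Because activity return values are persisted by the runtime, very large batches may be better written to a database or blob inside the activity, with only a key or a count returned to the orchestrator.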

The way durable functions execute is that the orchestrator runs until it reaches an activity function call, and then it stops executing. At that point the entire function app may be unloaded from memory to save on cost, which resets any static values. The activity function then runs, either in the same process or in a different one, so static values may or may not be the same there. When the activity has succeeded, any returned value is written to an Azure Storage table.

The orchestrator can now continue, but there is no way to resume execution in the middle of the code, so the orchestrator is restarted from the beginning. This is why you see the log output multiple times. To avoid this, use the logger returned from log = context.CreateReplaySafeLogger(log); (Logging durable functions). When the restarted orchestrator reaches an activity call that has already executed, it does not execute the activity again. Instead it looks in table storage, sees that the activity has already run, and returns the stored value (if any) from the context.CallActivityAsync method. It can then continue to the next activity. When that one finishes, the orchestrator restarts again, but now runs past the first two activities to the third, and so on.
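The replay behaviour described above can be imitated with a toy model in plain C#. This is not the Durable Functions API: ReplaySimulator, its history list and the exception used to stop a pass are all hypothetical stand-ins, meant only to show why a plain logger prints repeatedly while each activity still executes once.

```csharp
using System;
using System.Collections.Generic;

// Toy model of orchestrator replay (hypothetical, not the real runtime).
public sealed class ReplaySimulator
{
    // Thrown to end a pass once a new activity has been run and recorded.
    private sealed class YieldException : Exception { }

    private readonly List<string> history = new List<string>(); // stands in for table storage
    private int position; // index of the next activity call in the current pass

    public int ActivityExecutions { get; private set; }
    public int PlainLogLines { get; private set; }
    public int ReplaySafeLogLines { get; private set; }

    // True while the current pass is still re-reading recorded results.
    private bool IsReplaying => position < history.Count;

    private string CallActivity(string name)
    {
        if (position < history.Count)
        {
            // Already ran in an earlier pass: return the stored result.
            return history[position++];
        }
        // First time: execute, record the result, and end this pass.
        ActivityExecutions++;
        history.Add(name + "-result");
        throw new YieldException();
    }

    private void Log(string message)
    {
        PlainLogLines++; // what a plain logger would write on every pass
        if (!IsReplaying)
        {
            ReplaySafeLogLines++; // what a replay-safe logger would write
            Console.WriteLine(message);
        }
    }

    private void Orchestrator()
    {
        Log("orchestrator started");
        foreach (var name in new[] { "ActivityFunc1", "ActivityFunc2", "ActivityFunc3" })
        {
            CallActivity(name);
            Log(name + " finished");
        }
    }

    public void Run()
    {
        while (true)
        {
            position = 0; // every pass restarts the orchestrator from the top
            try { Orchestrator(); return; }
            catch (YieldException) { /* wait for the next pass */ }
        }
    }
}

public static class ReplayDemoProgram
{
    public static void Main()
    {
        var sim = new ReplaySimulator();
        sim.Run();
        Console.WriteLine($"activities executed: {sim.ActivityExecutions}"); // 3
        Console.WriteLine($"plain log lines: {sim.PlainLogLines}");           // 10
        Console.WriteLine($"replay-safe log lines: {sim.ReplaySafeLogLines}"); // 4
    }
}
```

In this model the orchestrator body runs four times for three activities, so a plain logger would write ten lines while the replay-safe check lets through four. The real runtime does not use exceptions to pause an orchestrator; that is just the simplest way to end a pass in a short sketch.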

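[FunctionName("MainOrchestrator")]
public static async Task<List<string>> RunMainOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)
{
    // Only writes log entries when the orchestrator is not replaying.
    log = context.CreateReplaySafeLogger(log);
    const int batchSize = 1000;
    var outputs = new List<string>();

    // Runs once, before the loop.
    int maxIndexToProcess = await context.CallActivityAsync<int>("ActivityFunc1", null);

    for (int i = 0; i < maxIndexToProcess; i += batchSize) // increment by batch size
    {
        try
        {
            // State flows through inputs and return values, not static fields.
            var data = await context.CallActivityAsync<string>("ActivityFunc2", i);
            var processedData = await context.CallActivityAsync<string>("ActivityFunc3", data);
            bool success = await context.CallActivityAsync<bool>("ActivityFunc4", processedData);
            outputs.Add($"Batch {i}: {(success ? "saved" : "not saved")}");
        }
        catch (Exception ex)
        {
            // Log the error with the replay-safe logger.
            log.LogError(ex, "Batch starting at index {Index} failed.", i);
        }
    }

    return outputs;
}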

I chose a for loop, but a while loop would work as well. I hope this helps you move forward and understand the inner workings of durable functions better. Please ask if something needs to be clarified.

2 Comments

Thanks so much for the reply, Bjorne! (Are you Danish?) While I learned a great deal and will take a lot from your post moving forward, including the removal of global statics for good practice, the issue I was having turned out to be a result of data left over in my local Azure Storage Emulator. When I open the Storage Emulator UI and run the command .\AzureStorageEmulator.exe clear all, I no longer see the sporadic logging and strange execution order in my Function App output. Thank you again for taking the time to explain these things to me.
Yes, the weird execution when debugging can definitely be the result of old executions that were never cleared and that continue when you restart the debugger, which affects both the static variables and the log. This is another reason not to use statics: without them you can have multiple orchestrators of the same type running without affecting each other's work. Even if clearing the emulator works when debugging, the app may still behave strangely when deployed, since then you can't guarantee that everything runs in the same process. Btw, I'm from Sweden :)
