0

Problem i'm trying to solve:

For each directory, it exists some files, and I want to upload this to Azure.

So I want to do this: Task1 - uploading files in directory 1 to azure Task 2 - uploading files in directory 2 to azure

I want to do this concurrently.

I have the following code:

private async Task ProcessMatFiles(string directory, List<FileInfo> matFiles)
{
    foreach (var file in matFiles)
    {
        if (!string.IsNullOrEmpty(file.Name) && !string.IsNullOrEmpty(directory) && !string.IsNullOrEmpty(file.FullName))
        {
            var cloudBlockBlob = this._cloudBlobContainer.GetBlockBlobReference("textures/" + directory + "/" + file.Name);

            if (!await cloudBlockBlob.ExistsAsync())
                await cloudBlockBlob.UploadFromFileAsync(file.FullName);
        }
    }
List<Task> tasks = new List<Task>();
foreach (var directory in matFileDirectories)
{
    // Get all the files in the directory
    var matFiles = new DirectoryInfo(directory).EnumerateFiles().ToList();

    // Get the directory name of the files
    var matDirectory = Path.GetFileName(Path.GetDirectoryName(matFiles.FirstOrDefault().FullName));

    if (matFiles.Count > 0 && !string.IsNullOrEmpty(matDirectory))
    {
        var task = new Task(() =>this.ProcessMatFiles(matDirectory, matFiles));
        tasks.Add(task);
        task.Start();
    }
}

Task.WaitAll(tasks.ToArray());

With this code, i get the following warning:

Because this call is not awaited, execution of the current method continues before the call is completed. Consider applying the 'await' operator to the result of the call.

What does that mean? How does this affect my code?

I can remove the warning by doing like this:

var task = new Task(async () => await this.ProcessMatFiles());

Is this the correct way to this?

10
  • No, don't use Task.Start() and Task.Wait(). That defeats the purpose of tasks. They aren't threads, they are a promise that something will produce a result in the future. Use Task.Run or await Commented Oct 10, 2019 at 12:25
  • @PanagiotisKanavos check my updated question. Im creating many tasks and then wait for them to complete. Commented Oct 10, 2019 at 12:27
  • @PanagiotisKanavos how would you do instead in my case? Commented Oct 10, 2019 at 12:28
  • Why don't you simply use await? Every edit adds extra problems. Task.Wait() won't wait on the async process it will await on the task that starts that async process. The result is still a fire-and-forget call with an extra Task object for the GC to collect. You could have just written ProcessMatFiles Commented Oct 10, 2019 at 12:29
  • @PanagiotisKanavos can you give me an example how you would do it? Commented Oct 10, 2019 at 12:30

1 Answer 1

1

The real problem seems to be how to process multiple files in parallel. ProcessMatFiles returns a Task already and I'd assume it doesn't run anything heavy on the caller's thread. That task can be stored in the tasks list. That list can be awaited without blocking with

await Task.WhenAll(tasks);

A better solution would be to convert the entire loop into a LINQ query that returns the Tasks and await it.

var tasks = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            let files=dir.GetFiles()
            select ProcessMatFiles(dir.Name, files));

await Task.WhenAll(tasks);

The problem with this is that enumerating the files in a folder is expensive itself, and GetFiles(), or using EnumerateFiles().ToList() has to wait for the enumeration to finish. It would be better if ProcessMatFiles received the DirectoryInfo object and enumerated the files in a separate thread.

Another improvement would be to process the files one-by-one :

var tasks = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            from file in dir.EnumerateFiles()
            select ProcessMatFile(dir.Name, file));

It's possible to improve this further if one knows what ProcessMatFiles does, eg use Dataflow blocks or Channels for throttling and using a specific number of tasks, breaking the process into multiple concurrent steps etc.

Update

Since this is a file upload operation, each file is a separate asynchronous operation. Most of the checks can be removed when working with DirectoryInfo and FileInfo objects.

The upload method should be just :

async Task Upload(FileInfo file)
{
    var folder=file.Directory.Name;
    var blob = _cloudBlobContainer.GetBlockBlobReference(${"textures/{folder}/{file.Name}";
    if (!await blob.ExistsAsync())
    {
        await blob.UploadFromFileAsync(file.FullName);
    }
}

The task-producing query can be simplified to :

var tasks = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            from file in dir.EnumerateFiles()
            select UploadFile(file);

await Task.WhenAll(tasks);

This will try to fire off all upload operations as fast as the files can be iterated. This could flood the network. One solution is to use an ActionBlock that will only use eg 8 tasks at a time to upload files. A limit is placed on the input buffer too, to avoid filling it with eg 1000 FileInfo items :

var options=new ExecutionDataflowBlockOptions
      {
         MaxDegreeOfParallelism = 8,  //Only 8 concurrent operations
         BoundedCapacity=64           //Block posters if the input buffer has too many items
      } ;
var block=new ActionBlock<FileInfo>(async file=>UploadFile(file),options);

var files = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            from file in dir.EnumerateFiles()
            select file;

foreach(var file in files)
{
    //Wait here if the input buffer is full
    await block.SendAsync(file);
}

block.Complete();

//Wait for all uploads to finish
await block.Completion;
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.