I have an application that we are developing using .NET 4.0 and EF 6.0. The premise of the program is quite simple: watch a particular folder on the file system. As a new file gets dropped into this folder, look up information about this file in the SQL Server database (using EF), and then, based on what is found, move the file to another folder on the file system. Once the file move is complete, go back to the DB and update the information about this file (register the file move).

These are large media files so it might take a while for each of them to move to the target location. Also, we might start this service with hundreds of these media files sitting in the source folder already that will need to be dispatched to the target location(s).

So to speed things up, I started out using the Task Parallel Library (async/await is not available, as this is .NET 4.0). For each file in the source folder, I look up info about it in the DB, determine which target folder it needs to move to, and then start a new task that begins to move the file…

LookupFileinfoinDB(filename)
{
  // use EF DB Context to look up file in DB
}

// start a new task to begin the file move
var moveFileTask = Task<bool>.Factory.StartNew(
                () =>
                    {
                        var success = false;

                        try
                        {
                            // the code that actually moves the file goes here…
                            .......
                        }
                        catch (Exception)
                        {
                            // a failed move leaves success == false
                        }

                        return success;
                    });

Now, once this task completes, I have to go back to the DB and update the info about the file. And that is where I am running into problems. (Keep in mind that I might have several of these 'move file' tasks running in parallel, and they will finish at different times.) Currently, I am using task continuations to register the file move in the DB:

moveFileTask.ContinueWith(
                       t =>
                       {
                           // check the status first: reading Result on a faulted task rethrows
                           if (t.Status == TaskStatus.RanToCompletion && t.Result)
                           {
                             RegisterFileMoveinDB();
                           }
                       });

The problem is that I am using the same DB context for looking up the file info in the main task as well as inside the RegisterFileMoveinDB() method later, which executes on the nested task. I was getting all kinds of weird exceptions thrown at me (mostly about the SQL Server data reader etc.) when moving several files together. An online search for the answer revealed that sharing a DB context among several tasks, like I am doing here, is a big no-no, as EF is not thread safe.

I would rather not create a new DB context for each file move, as there could be dozens or even hundreds of them going at the same time. What would be a good alternative approach? Is there a way to 'signal' the main task when a nested task completes and finish the file move registration in the main task? Or am I approaching this problem in the wrong way altogether, and is there a better way to go about this?

  • I would just scope separate DbContext object inside each of RegisterFileMoveinDB and LookupFileinfoinDB. Commented Jun 26, 2017 at 20:04
  • You are working with external resources (file system, database), so async/await may suit your case better than "wasting" threads on I/O operations. You can use async/await in .NET 4.0; see "Using async/await without .NET Framework 4.5". Commented Jun 26, 2017 at 20:41
  • @Fabio - how are threads being wasted? What do you think happens when you call an awaitable xyzAsync(...) method? Commented Jun 26, 2017 at 21:20
  • @Moho, a thread that executes a blocking I/O operation does nothing but wait for the response. async/await makes it possible to run asynchronous I/O operations on a single thread. Note that I am talking about asynchronous I/O operations. What happens when you call await xyzAsync(...) depends on how xyzAsync is implemented. Commented Jun 27, 2017 at 6:14
  • So if you await an async I/O operation, a thread is not allocated or created per operation, and all I/O operations are handled by the same thread? Commented Jun 27, 2017 at 6:22

3 Answers

6

Your best bet is to scope your DbContext to each thread. Parallel.ForEach has overloads that are useful for this (the ones that take a Func<TLocal> localInit):

Parallel.ForEach( 
    fileNames, // the filenames IEnumerable<string> to be processed
    () => new YourDbContext(), // Func<TLocal> localInit
    ( fileName, parallelLoopState, dbContext ) => // body
    {
        // your logic goes here
        // LookUpFileInfoInDB( dbContext, fileName )
        // MoveFile( ... )
        // RegisterFileMoveInDB( dbContext, ... )

        // pass dbContext along to the next iteration
        return dbContext;
    },
    ( dbContext ) => // Action<TLocal> localFinally
    {
        dbContext.SaveChanges(); // single SaveChanges call for each worker task
        dbContext.Dispose();
    } );

You can call SaveChanges() within the body expression/RegisterFileMoveInDB if you prefer to have the DB updated ASAP. I would suggest tying the file system operations in with the DB transaction so that if the DB update fails, the file system operations are rolled back.
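The "update the DB ASAP" variant can be tried without a database. The following is only a minimal sketch: FakeContext is a hypothetical stand-in for the real DbContext, and RegisterFileMove, LocalContextDemo and Run are illustrative names that do not come from the question. It shows that Parallel.ForEach hands each worker task its own context, so nothing is shared between threads, while SaveChanges is called once per file inside the body.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an EF DbContext so the sketch runs without a database.
public sealed class FakeContext : IDisposable
{
    public static int Created; // number of contexts constructed
    public static int Saved;   // total changes flushed by SaveChanges
    private int _pending;      // changes tracked by this instance only

    public FakeContext() { Interlocked.Increment(ref Created); }

    public void RegisterFileMove(string fileName) { _pending++; }

    public void SaveChanges()
    {
        Interlocked.Add(ref Saved, _pending);
        _pending = 0;
    }

    public void Dispose() { }
}

public static class LocalContextDemo
{
    public static void Run(IEnumerable<string> fileNames)
    {
        Parallel.ForEach(
            fileNames,
            () => new FakeContext(),           // localInit: one context per worker task
            (fileName, loopState, context) =>  // body: look up, move, register
            {
                context.RegisterFileMove(fileName);
                context.SaveChanges();         // flush right away instead of in localFinally
                return context;                // hand the same context to the next iteration
            },
            context => context.Dispose());     // localFinally: only cleanup is left
    }

    public static void Main()
    {
        var names = new List<string>();
        for (var i = 0; i < 200; i++) names.Add("file" + i + ".mov");
        Run(names);
        Console.WriteLine("contexts: {0}, registered: {1}",
            FakeContext.Created, FakeContext.Saved);
    }
}
```

The number of contexts created depends on how the loop partitions the work, but it is typically far smaller than the number of files, which addresses the concern about creating one context per file move.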

1 Comment

Hi Moho, this scenario seems to be similar to what I need... could you please help me with it? Here is the link: stackoverflow.com/questions/46333707/…
1

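You could also pass the ExclusiveScheduler of a ConcurrentExclusiveSchedulerPair instance as a parameter of ContinueWith. This way the continuations will run sequentially instead of concurrently with respect to each other. Note that ConcurrentExclusiveSchedulerPair was introduced in .NET Framework 4.5, so it is not available when targeting .NET 4.0.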

TaskScheduler exclusiveScheduler
    = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;

//...

moveFileTask.ContinueWith(t => 
{
    if (t.Result)
    {
        RegisterFileMoveinDB();
    }
}, exclusiveScheduler);
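To check the serialization claim on a runtime where ConcurrentExclusiveSchedulerPair is available, here is a rough sketch. The sleeps simulate the file move and the DB update, and ExclusiveContinuationDemo, RegisterFileMove, Run and the counters are hypothetical names added for illustration. It records the largest number of continuations that were ever inside the registration step at the same time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ExclusiveContinuationDemo
{
    private static readonly object Gate = new object();
    private static int _active;    // continuations currently inside RegisterFileMove
    private static int _maxActive; // highest value _active ever reached

    // Stand-in for RegisterFileMoveinDB(): records how many callers overlap.
    private static void RegisterFileMove()
    {
        var now = Interlocked.Increment(ref _active);
        lock (Gate) { if (now > _maxActive) _maxActive = now; }
        Thread.Sleep(5); // simulate the DB round trip
        Interlocked.Decrement(ref _active);
    }

    public static int Run(int fileCount)
    {
        TaskScheduler exclusiveScheduler =
            new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;
        var continuations = new List<Task>();

        for (var i = 0; i < fileCount; i++)
        {
            // The simulated file moves still run in parallel on the default scheduler.
            var moveFileTask = Task<bool>.Factory.StartNew(() =>
            {
                Thread.Sleep(10);
                return true;
            });

            // Only the continuations are funnelled through the exclusive scheduler.
            continuations.Add(moveFileTask.ContinueWith(t =>
            {
                if (t.Status == TaskStatus.RanToCompletion && t.Result)
                {
                    RegisterFileMove();
                }
            }, exclusiveScheduler));
        }

        Task.WaitAll(continuations.ToArray());
        return _maxActive;
    }

    public static void Main()
    {
        Console.WriteLine("max concurrent registrations: {0}", Run(20));
    }
}
```

Because the registrations never overlap, a single context used only inside those continuations is not touched by two threads at once. The lookup in the main task would still need its own context, or would have to go through the same scheduler.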

0

Regarding @Moho's question:

  1. The threads used by, for example, the built-in async I/O operations come from the .NET runtime's thread pool, so it is a very efficient mechanism. If you create threads yourself, you are doing it the old way, which is inefficient, especially for I/O operations.

  2. When you call an async method, you don't have to wait on it immediately. Postpone the wait until the result is actually needed.

Best Regards.

1 Comment

Are you saying he should not use threads for database write operations, and should maybe adopt some other optimization, because creating threads that insert into the DB is costly?
