
I have a large DataTable that contains user details. I need to complete the users' details in this table from several tables in the database. I run through each row in the table, make several calls to different tables in the database using ADO.NET objects and methods, process and reorganize the results, and add them back to the main table. It works fine, but too slow... My idea was to split the large table into a few small tables, run the CompleteAddressDetails method on them in several threads simultaneously, and at the end merge the small tables back into one result table. I have implemented this idea using the Task object of the TPL; the code is below. It works fine, but without any improvement in execution time. A couple of questions:

  1. Why is there no improvement in execution time?
  2. What do I have to do to improve it?

Thank you for any advice!

    resultTable1 = data.Clone();
    resultTable2 = data.Clone();
    resultTable3 = data.Clone();
    resultTable4 = data.Clone();
    resultTable5 = data.Clone();

    DataTable[] tables = new DataTable[] { resultTable1, resultTable2, resultTable3, resultTable4, resultTable5 };

    // Distribute the rows round-robin across the five partitions.
    for (int i = 0; i < data.Rows.Count; i += 5)
    {
        for (int j = 0; j < 5; j++)
        {
            if (data.Rows.Count > i + j)
            {
                tables[j].Rows.Add(data.Rows[i + j].ItemArray);
            }
        }
    }

    // Process each partition on its own task and wait for all of them to finish.
    Task[] taskArray =
    {
        Task.Factory.StartNew(() => CompleteAddressDetails(resultTable1)),
        Task.Factory.StartNew(() => CompleteAddressDetails(resultTable2)),
        Task.Factory.StartNew(() => CompleteAddressDetails(resultTable3)),
        Task.Factory.StartNew(() => CompleteAddressDetails(resultTable4)),
        Task.Factory.StartNew(() => CompleteAddressDetails(resultTable5))
    };

    Task.WaitAll(taskArray);
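The merge step described above isn't shown in the code; a minimal sketch of how the partitions could be recombined afterwards (the result variable here is an assumption for illustration):

    DataTable result = data.Clone();
    foreach (DataTable partition in tables)
    {
        result.Merge(partition);   // recombine the processed partitions into one table
    }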
  • Running the methods with Task.Factory.StartNew is not the best way forward here. It would be better to rewrite the CompleteAddressDetails method to be properly asynchronous (async/await). Given the level of detail you provide, it's not possible to say where you're bottlenecking. How fast do the queries execute when you run them directly against the database? Are you sure you've got everything covered with respect to indexes on the tables that your query would benefit from? Commented Oct 29, 2015 at 10:58
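For illustration only, a rough sketch of what an async per-row lookup could look like (the real CompleteAddressDetails isn't shown in the question; the connection string, table, and column names below are assumptions):

    using System.Data;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    // Hypothetical async variant of a per-row lookup; not the original method.
    private static async Task CompleteAddressDetailsAsync(DataTable table, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            await connection.OpenAsync();

            foreach (DataRow row in table.Rows)
            {
                using (var command = new SqlCommand(
                    "SELECT Street, City FROM Address WHERE UserId = @userId", connection))
                {
                    command.Parameters.AddWithValue("@userId", row["UserId"]);

                    using (var reader = await command.ExecuteReaderAsync())
                    {
                        if (await reader.ReadAsync())
                        {
                            row["Street"] = reader["Street"];
                            row["City"] = reader["City"];
                        }
                    }
                }
            }
        }
    }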

2 Answers


When multi-threaded parallelism yields no performance benefit, there are basically two possibilities:

  1. The code isn't CPU-bound, so throwing more CPUs on the task isn't going to help
  2. The code uses too much synchronization to actually allow realistic parallel execution

In this case, 1 is likely the cause. Your code isn't doing enough CPU work to benefit from multi-threading. Most likely, you're simply waiting for the database to do the work.

It's hard to give any pointers without seeing what the CompleteAddressDetails method does - I assume it goes through all the rows one by one, and executes a couple of separate queries to fill in the details. Even if each individual query is fast enough, doing thousands of separate queries is going to hurt your performance no matter what you do - and especially so if those queries require locking some shared state in the DB.

First, think of a better way to fill in the details. Perhaps you can join some of those queries together, or maybe you can even load all of the rows at once. Second, try profiling the actual queries as they happen on the server. Find out if there's something you can do to improve their performance - say, by adding some indices, or by better using the existing ones.
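For example, here is a sketch of loading all the address details in one query and completing the rows in memory; the connectionString variable and the table and column names are assumptions, since the actual schema isn't shown:

    // Requires System.Data, System.Data.SqlClient and System.Linq
    // (plus System.Data.DataSetExtensions for AsEnumerable).
    // One round trip for all users instead of several queries per row.
    var details = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter(
        @"SELECT u.UserId, a.Street, a.City
          FROM Users u
          JOIN Address a ON a.UserId = u.UserId", connection))
    {
        adapter.Fill(details);
    }

    // Index the details by UserId so the main table can be completed without further queries.
    var byUserId = details.AsEnumerable().ToDictionary(r => (int)r["UserId"]);

    foreach (DataRow row in data.Rows)
    {
        if (byUserId.TryGetValue((int)row["UserId"], out DataRow detail))
        {
            row["Street"] = detail["Street"];
            row["City"] = detail["City"];
        }
    }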


2 Comments

You are totally right, I run through all the rows one by one and make 3-5 very simple calls to the DB, depending on the results. What do you think: if I load the data from those tables into DataTable objects first, and then query them using LINQ in separate threads, can this help to improve the times?
@LevZ There is no reason for multi-threading on the client side at all - all the work is done on the network and in the database engine. Just try to find a way to reduce the number of queries - for a typical operation, this should be only a few queries (we're talking single digits) per operation. For example, you could pass multiple IDs to your queries at once (e.g. using a table valued parameter) and then update the relevant rows in the data tables manually. But there really isn't much specific I can say, just the general pointers.
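As a rough sketch of the "multiple IDs at once" idea, assuming a table type has already been created on the server (for example CREATE TYPE dbo.IdList AS TABLE (Id INT); the type, table, and column names are illustrative only):

    // Requires System.Data and System.Data.SqlClient.
    // Collect the ids from the main table into a DataTable shaped like dbo.IdList.
    var ids = new DataTable();
    ids.Columns.Add("Id", typeof(int));
    foreach (DataRow row in data.Rows)
    {
        ids.Rows.Add(row["UserId"]);
    }

    var details = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        @"SELECT a.UserId, a.Street, a.City
          FROM Address a
          JOIN @ids i ON i.Id = a.UserId", connection))
    {
        var parameter = command.Parameters.AddWithValue("@ids", ids);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.IdList";

        new SqlDataAdapter(command).Fill(details);   // one round trip for all ids
    }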

There is no improvement because you can't code your way around how the SQL Server database handles your calls.

I would recommend using a User-Defined Table Type on SQL Server, a Stored Procedure that accepts this table type, and then just send the DataTable you have through to the Stored Procedure and do your processing in there. You'd then be able to optimize from there going forward.
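A sketch of what that could look like; the type, procedure, and column names below are illustrative assumptions, not something taken from the question, and the DataTable's columns would have to match the table type:

    // Assumed T-SQL setup:
    //   CREATE TYPE dbo.UserTableType AS TABLE (UserId INT, Street NVARCHAR(200), City NVARCHAR(100));
    //   CREATE PROCEDURE dbo.CompleteAddressDetails @users dbo.UserTableType READONLY
    //   AS
    //       SELECT u.UserId, a.Street, a.City
    //       FROM @users u
    //       JOIN Address a ON a.UserId = u.UserId;

    var completed = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.CompleteAddressDetails", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        // Send the whole DataTable to the server in one call.
        var parameter = command.Parameters.AddWithValue("@users", data);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.UserTableType";

        new SqlDataAdapter(command).Fill(completed);
    }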

