I have a SQL Server stored procedure (SP) and would like to execute it for around 10 million user record sets to process the user info and update the DB. Currently it takes a few milliseconds to update one user in the DB. Since it has to process millions of records, what are the best ways to do it?

I am thinking of doing it from the C# application using multithreading or a parallel foreach. Below is the basic code that runs the SP; how can I use it to run in multithreaded mode to make the processing job faster?

    string connString = @"data source=.\SQLEXPRESS;initial catalog=Test;integrated security=True;";
    SqlConnection conn = new SqlConnection(connString);
    conn.Open();
    SqlCommand cmd = new SqlCommand("ETL_TEST", conn);
    cmd.CommandType = CommandType.StoredProcedure;

    cmd.Parameters.Add(new SqlParameter("@user_id", "12345"));
    cmd.ExecuteNonQuery();
    conn.Close();
  • Multithreading this will end up being a LOT slower. You would effectively have to change this to be row by row. Or is this process already doing it row by row? Commented Feb 17, 2016 at 21:58
  • There is another SP written to do the row-by-row operation, and it takes a lot of time. Commented Feb 17, 2016 at 22:00
  • @SeanLange: No, it won't be a lot slower. Up to a point, multiple concurrent requests to SQL Server speed up overall throughput. Commented Feb 17, 2016 at 22:00
  • What user information (i.e., which columns and datatypes) are you trying to update? And where is the update information coming from? Commented Feb 18, 2016 at 0:36

3 Answers

how can I use it to run in multithreaded mode to make the processing job faster?

Parallel.ForEach() is a very reasonable starting point to accomplish your goal. Up to a point, sending concurrent requests to SQL Server can improve overall throughput.

Put your code into the action of Parallel.ForEach(). Use using statements to ensure that things are properly disposed (right now you will leave a connection open if an exception is thrown before it is closed).

Note that the .NET Framework will attempt to find an optimal number of threads (and different versions of the framework may end up selecting different numbers). It is possible that .NET will select a number of threads that is either

  • Too low (which you cannot influence through Parallel.ForEach(), but which you could solve by creating and managing tasks explicitly)
  • Too high, resulting in inefficient execution in SQL Server (due to IO or lock contention, high CPU, etc.). In this case, you can use the MaxDegreeOfParallelism property to limit the number of threads, as in the sketch below. Don't forget to adjust the property if you move to a more or less powerful SQL Server.
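
Putting those pieces together, a minimal sketch might look like the following. It assumes userIds is an IEnumerable<int> holding the IDs to process and connString is the connection string from the question; the MaxDegreeOfParallelism value of 8 is an arbitrary starting point to tune, not a recommendation:

    using System.Data;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    // options caps the number of concurrent calls into SQL Server.
    var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

    Parallel.ForEach(userIds, options, userId =>
    {
        // using blocks guarantee disposal even if the stored procedure throws.
        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand("ETL_TEST", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add(new SqlParameter("@user_id", userId));
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    });

Opening a connection per iteration is cheap here because ADO.NET pools connections under the covers; each thread simply borrows one from the pool and returns it on dispose.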

Note that a multi-threaded solution, while probably more efficient than a single-threaded one, is still very chatty. It makes a request to the database for each user record. In this type of situation, I have achieved order-of-magnitude performance improvements by sending a batch of work over to SQL Server rather than a single work item. You'll likely see the greatest gains by redesigning the stored procedure to accept a number of users at once.
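
For illustration, a batched call might look like the sketch below. Note that the ETL_TEST_BATCH procedure and the dbo.UserIdList table type are hypothetical (they are not part of your current database); the per-user logic would have to be moved into the new procedure:

    // Hypothetical server-side objects, NOT from the question:
    //   CREATE TYPE dbo.UserIdList AS TABLE (user_id INT);
    //   CREATE PROCEDURE ETL_TEST_BATCH @user_ids dbo.UserIdList READONLY AS ...
    var table = new DataTable();
    table.Columns.Add("user_id", typeof(int));
    foreach (int id in batchOfUserIds)   // e.g. 1,000 IDs per round trip
        table.Rows.Add(id);

    using (var conn = new SqlConnection(connString))
    using (var cmd = new SqlCommand("ETL_TEST_BATCH", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        SqlParameter p = cmd.Parameters.Add(
            new SqlParameter("@user_ids", SqlDbType.Structured));
        p.TypeName = "dbo.UserIdList";   // must match the table type's name
        p.Value = table;
        conn.Open();
        cmd.ExecuteNonQuery();
    }

At 1,000 IDs per call, 10 million records become 10,000 round trips instead of 10 million, and the procedure can operate on the whole batch with set-based statements instead of row-by-row work.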

Comments

It is very important to stress that the TPL will generally be completely off from the optimal number of threads for IO work. The number of threads will (literally) trend toward infinity over time. This is easy to reproduce with a Sleep(100). +1 because you said that, but I wanted to stress it even more.
@usr: In my experience the TPL does a reasonable, if not perfect, job of selecting the number of threads. In doing the work related to the linked question, I did have to specifically limit the degree of parallelism. Can you cite an example of when the TPL will tend toward a very large number of threads? I would be curious to understand that situation.
github.com/dotnet/coreclr/issues/1754 Really bad. Unlimited thread creation in all realistic IO scenarios.
That link complains that Core is too slow in creating new threads when the workload profile changes drastically. It doesn't mention that an arbitrary number of threads would ever be created.
Quoting: "The memory use is proportional to the number of threads (here: 731)." Both problems are present: a slow increase, but an infinite one. That picture is scary! Linear growth of thread count over time. Extrapolate that overnight. This is the kind of problem that takes production systems down.

Since it has to process millions of records, what are the best ways to do it?

If you are asking what is the best way to make the process parallel, then @Eric's answer looks promising. I would agree that, generally speaking, "Up to a point, sending concurrent requests to SQL Server can improve overall throughput," as long as we really stress the "Up to a point" part. But "that point" could easily be just a few concurrent requests at a time (depending on the logic and schema), which isn't the type of gain you are looking for.

If you are asking what is the best way to accomplish the processing of 10 million "sets" of records as quickly and efficiently as possible, that is a much more involved question, and it requires a bit more analysis. At a bare minimum, it would require reviewing both the logic in your ETL_TEST stored procedure and your schema (tables, indexes, etc.). So for the moment, there is not enough information presented here to provide any meaningful assistance along these lines.


There are several ways you could speed this up:

  1. Passing the data in a table-valued parameter so you can process all of the records in one call is likely the best way, but it takes a bit of setup, and if this is a one-time thing it's not really worth it, IMO.
  2. Since you are dealing with a single parameter, turn it into a comma-separated string and pass that in, so you can process a couple of thousand IDs per call on the server.
  3. The simplest thing you can do without changing much code is to put the whole thing in a transaction and commit every couple of thousand records, or all at once; a sketch of this follows the list. Doing this will increase the speed by about 100-fold.
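
A minimal sketch of option 3, assuming the ETL_TEST procedure and connection string from the question and a userIds collection of the IDs to process; the batch size of 2,000 is an arbitrary value to tune:

    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        SqlTransaction tx = conn.BeginTransaction();
        int count = 0;

        foreach (int userId in userIds)
        {
            using (var cmd = new SqlCommand("ETL_TEST", conn, tx))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add(new SqlParameter("@user_id", userId));
                cmd.ExecuteNonQuery();
            }

            // Commit every 2,000 rows so the transaction log is flushed
            // once per batch instead of once per row.
            if (++count % 2000 == 0)
            {
                tx.Commit();
                tx = conn.BeginTransaction();
            }
        }

        tx.Commit();   // commit the final partial batch
    }

The gain comes from reducing log flushes: in autocommit mode SQL Server hardens the log to disk after every EXEC, while an explicit transaction amortizes that cost across thousands of rows.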
