I want to download a large number of files with HttpClient, perform some time-consuming but not expensive computation on each of them, and then add the result to my database after running a query that shows it is not already there.

How can I do this conceptually? Just the locations of the awaits and the like would be helpful.

I currently have the following:

  1. Get the list of addresses.
  2. For each address, add (await the web page download, then continue processing) to a list of Task.
  3. For each element of the list, await it, and then add the result to the database.

However, this seems to run essentially serially.

How should this be designed?

2 Answers

3

I would set up a pipeline using TPL Dataflow. You post the addresses and the actors are:

  1. Web page download
  2. Processing
  3. Add to DB

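Use async wherever you can (as long as the operation is truly asynchronous) and set a high MaxDegreeOfParallelism on each block, so that TPL can decide by itself how many items to process concurrently up to that limit. The default value is 1, which makes a block handle one item at a time.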
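The three stages listed above could be wired together roughly as follows. This is only a sketch under stated assumptions: `PipelineSketch`, `RunAsync`, the three delegates and the default limit of 8 are hypothetical names and values, not part of the question. The delegates stand in for the real download, computation and "query, then insert if missing" steps.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PipelineSketch
{
    // All parameters are illustrative placeholders for the asker's own code.
    public static async Task RunAsync(
        IEnumerable<string> addresses,
        Func<string, Task<string>> downloadAsync,   // e.g. url => httpClient.GetStringAsync(url)
        Func<string, string> compute,               // the time-consuming computation
        Func<string, Task> saveIfMissingAsync,      // query the database, insert if absent
        int maxParallelism = 8)                     // arbitrary value, tune for your workload
    {
        var parallel = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallelism };
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // 1. Web page download and 2. processing may each run several items at once.
        var download = new TransformBlock<string, string>(downloadAsync, parallel);
        var process = new TransformBlock<string, string>(compute, parallel);

        // 3. Add to DB. Left at the default parallelism of 1 so that the
        // "check, then insert" step handles one result at a time in this process.
        var save = new ActionBlock<string>(saveIfMissingAsync);

        download.LinkTo(process, link);
        process.LinkTo(save, link);

        foreach (var address in addresses)
        {
            download.Post(address);
        }

        // Signal that no more addresses are coming, then wait for the last stage to drain.
        download.Complete();
        await save.Completion;
    }
}
```

The only await is at the very end: each result flows to the next block as soon as it is ready, so nothing waits for the whole batch. With `PropagateCompletion` set, completing the first block eventually completes the last one.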

6 Comments

Do you mean use the primitives that it provides (ActionBlock, BufferBlock, etc.) or something else? If you mean the primitives, can you please explain which ones? I am not at all familiar with TPL.
Well, yes. The easiest way is to use ActionBlock for all of them. You only need to give it a method (WebPageDownload for example) and remember to "send" the result to the next ActionBlock at the end. The more "correct" way is to also use TransformBlocks.
Is there a nice way of having a TransformBlock later in the chain post to a TransformBlock that is earlier in the chain?
@soandos There's no technical problem with it. It just needs to have a reference to the earlier block. But if you do that it stops being a simple pipeline and you're in danger of a continuous loop.
But what am I looping on? Is download.InputCount != 0 && download.OutputCount != 0 && process.OutputCount != 0 && process.InputCount != 0 (the looping ones) correct?
1

I would get the downloads/processing running in parallel and await them all to complete. The code would look something like this:

// get a collection of "hot" Tasks running in parallel;
// ToList() forces the lazy Select to start every Task now, exactly once
var tasks = myCollection.Select(x => DownloadAndProcessAsync(x)).ToList();

// await the completion of all Tasks
await Task.WhenAll(tasks);
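For illustration, the per-item method might look something like the sketch below. The body is an assumption, since the answer only names `DownloadAndProcessAsync`; the class name `WhenAllSketch`, the helper `RunAllAsync` and the three delegate parameters are hypothetical stand-ins for the real download, computation and database code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Handles a single item from start to finish: download, compute, store.
    public static async Task DownloadAndProcessAsync(
        string address,
        Func<string, Task<string>> downloadAsync,   // e.g. a => httpClient.GetStringAsync(a)
        Func<string, string> compute,               // the time-consuming computation
        Func<string, Task> saveIfMissingAsync)      // query the database, insert if absent
    {
        string page = await downloadAsync(address);

        // Task.Run moves the synchronous computation onto a thread-pool thread.
        string result = await Task.Run(() => compute(page));

        await saveIfMissingAsync(result);
    }

    // Starts every item immediately and returns a Task that completes when all are done.
    public static Task RunAllAsync(
        IEnumerable<string> addresses,
        Func<string, Task<string>> downloadAsync,
        Func<string, string> compute,
        Func<string, Task> saveIfMissingAsync)
    {
        var tasks = addresses
            .Select(a => DownloadAndProcessAsync(a, downloadAsync, compute, saveIfMissingAsync))
            .ToList();

        return Task.WhenAll(tasks);
    }
}
```

Because the database step lives inside the per-item method, each item is stored as soon as its own download and computation finish; `Task.WhenAll` only tells the caller when the whole batch is done. Note that this sketch starts every download at once, with no throttling.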

2 Comments

There's no point in waiting for all of the tasks to complete before moving on with what's already finished.
The method inside the lambda was intended to download and perform all processing for a single item. Updated the method name to reflect that.
