
So I'm making a program that is a kind of web crawler. It downloads the HTML of a page, searches it for a specific piece of text using a regex, and adds each match to a list.

To achieve this, I used async HTTP requests. Each GET request is sent asynchronously, and the parsing is performed on the HTML that comes back.

My issue, which may well be a simple one, is that the program doesn't run smoothly. It sends a bunch of requests, pauses for a couple of seconds, then increments the count of parsed items all at once (although the counter is programmed to increment once every time an item is added), so that, for example, it jumps from 53 to 69 instead of showing 54, 55, 56, ...

Sorry for being a newb, but I taught myself all this stuff and some experienced advice would go a long way.

Thanks.

  • stackoverflow.com/questions/1732348/… Commented May 17, 2012 at 3:14
  • This is for a specific site where the resulting HTML always has the same form and only the values change, so regex works fine. Commented May 17, 2012 at 3:57
  • But just out of curiosity, is there another, more efficient way of doing it? Commented May 17, 2012 at 3:58

1 Answer


That sounds like the expected behavior.

The slowest part of your task is downloading the pages over the network.

Your program starts downloading a bunch of pages at once, waits for them to arrive, then parses them all almost instantly.
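To make that concrete, here is a minimal sketch that reproduces the pattern without touching the network. Everything in it is an assumption for illustration: `FakeDownloadAsync` stands in for your real GET request, the HTML and the regex are made up, and `async`/`await` needs a compiler and framework that support it (.NET 4.5 or later), which may not match your setup. Pages are given latencies in groups of eight, so the counter still increments once per item, but the increments arrive in bursts.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

public static class BurstDemo
{
    // Hypothetical stand-in for an HTTP GET: waits, then returns fake HTML.
    static async Task<string> FakeDownloadAsync(int id, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return "<span class=\"item\">value-" + id + "</span>";
    }

    // Starts every download at once and parses each page when it arrives.
    public static async Task<string[]> CrawlAsync(int pageCount, int baseLatencyMs)
    {
        var pattern = new Regex("<span class=\"item\">(.*?)</span>");
        var results = new ConcurrentBag<string>();
        int parsed = 0;

        var tasks = Enumerable.Range(0, pageCount).Select(async id =>
        {
            // Pages 0-7 share one latency, pages 8-15 the next, and so on,
            // which mimics responses coming back in groups.
            int latencyMs = baseLatencyMs * (1 + id / 8);
            string html = await FakeDownloadAsync(id, latencyMs);

            // Parsing is near-instant compared with the wait above.
            Match m = pattern.Match(html);
            if (m.Success)
            {
                results.Add(m.Groups[1].Value);
                // Thread-safe increment: one step per item added.
                int n = Interlocked.Increment(ref parsed);
                Console.WriteLine("parsed " + n);
            }
        }).ToArray();

        await Task.WhenAll(tasks);
        return results.ToArray();
    }

    public static void Main()
    {
        string[] items = CrawlAsync(16, 500).GetAwaiter().GetResult();
        Console.WriteLine("total: " + items.Length);
    }
}
```

When this runs, the "parsed" lines should show up in clumps separated by pauses, even though each item is counted individually. If your display only refreshes now and then, a whole clump can look like a single jump.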


4 Comments

In that case, can I give priority to the main thread somehow? That is, the thread that is queuing the async requests into the ThreadPool? I need this because the main thread also makes a request of its own each time 20 async requests have been made. What's happening is that this request gets backlogged behind all the already queued ThreadPool requests, which blocks the whole program while it waits for the response.
@user1115071: Consider using the TPL, which is already optimized for this.
Please forgive my ignorance as I've never used the TPL. Should I be using it for all threads, or only for the main ones I mentioned?
Use Parallel.For* or Task or LINQ AsParallel() and don't use threads or the threadpool directly at all.
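
A rough sketch of what the last comment suggests, using `Parallel.ForEach` and PLINQ's `AsParallel()` instead of queuing work on the ThreadPool by hand. The names `TplSketch`, `FakeDownload`, and the example URLs are hypothetical, and the download is simulated with a short sleep rather than a real request. The cap on parallelism is an assumption about what might keep the queue from backing up; it is not something the thread confirms.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TplSketch
{
    // Hypothetical blocking download: sleeps instead of using the network.
    static string FakeDownload(string url)
    {
        Thread.Sleep(50);
        return "<html>" + url + "</html>";
    }

    // Parallel.ForEach with an upper bound on concurrent downloads.
    public static List<string> WithParallelForEach(IEnumerable<string> urls, int maxParallel)
    {
        var pages = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(urls, options, url => pages.Add(FakeDownload(url)));
        return pages.ToList();
    }

    // The same work expressed as a PLINQ query.
    public static List<string> WithAsParallel(IEnumerable<string> urls, int maxParallel)
    {
        return urls.AsParallel()
                   .WithDegreeOfParallelism(maxParallel)
                   .Select(url => FakeDownload(url))
                   .ToList();
    }

    public static void Main()
    {
        var urls = Enumerable.Range(1, 20)
                             .Select(i => "http://example.invalid/page" + i)
                             .ToList();
        Console.WriteLine(WithParallelForEach(urls, 4).Count + " pages via Parallel.ForEach");
        Console.WriteLine(WithAsParallel(urls, 4).Count + " pages via AsParallel");
    }
}
```

Both helpers block the calling thread until every page is done, and neither guarantees the order of the results. For work that mostly waits on the network, `Task`-based asynchronous requests are generally a better fit than `Parallel.ForEach`, which is aimed at CPU-bound loops.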
