
I'm a newbie with asyncio and aiohttp. Recently I've been practicing to understand how the event loop actually works.

While practicing sending requests to several URLs simultaneously, I ran into a problem. As I understand it, create_task puts the coroutine on the event loop, and await lets the event loop switch to other tasks until the awaited task is done. But the result below surprised me: the first part of blockmain behaves synchronously (blocking), while the second part works the way I expect (the way I thought async/await and asyncio are supposed to work). I'm not sure whether I have misunderstood async/await and asyncio here. If someone knows what's going on, please give me a detailed answer; it's really bothering me.

Sorry for my poor English.

Here is my code:

import asyncio
from aiohttp import ClientSession, ClientTimeout

urls = [
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=1&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=2&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=3&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=4&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=5&jobsource=2018indexpoc&ro=0',
'http://www.httpbin.org:12345/',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=6&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=7&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=8&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=9&jobsource=2018indexpoc&ro=0',
'https://www.104.com.tw/jobs/search/?keyword=python&order=1&page=10&jobsource=2018indexpoc&ro=0']

async def fetch_(link):
    # loop = asyncio.get_event_loop()
    # print(asyncio.all_tasks(loop))
    async with ClientSession(timeout=ClientTimeout(total=10)) as session:
        async with session.get(link) as response:
            html_body = await response.text()
            print(f"{link} is done")

async def blockmain():
    # ========================= the following two lines do not run concurrently (not what I expected)
    for link in urls:
        await asyncio.create_task(fetch_(link))
    
    # second part
    # ========================= the following lines run concurrently, as I expected
    # loop 1
    tasks = [asyncio.create_task(fetch_(link)) for link in urls]
    for t in tasks:
        await t
    # loop 2
    tasks = [asyncio.create_task(fetch_(link)) for link in urls]
    for t in tasks:
        await t

asyncio.run(blockmain())

I want to know why the program runs synchronously (blocking) when I await asyncio.create_task inside the for loop, but runs concurrently when I await the tasks only after creating them all.

Thanks.


1 Answer


In the first case you are not running the tasks concurrently.

for link in urls:
    await asyncio.create_task(fetch_(link))

The call asyncio.create_task(fetch_(link)) schedules the coroutine fetch_ as a task. The await keyword suspends the current task (blockmain) and waits for that fetch_ task to complete. Those are the only two tasks at that point. When the fetch_ task finishes, the main task continues: it goes through the loop again with a new value of link, and the process repeats. You never have two fetch_ tasks running at the same time, since you await each task as soon as you create it, so there is no useful concurrent execution.
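
To make the timing visible, here is a minimal sketch that reproduces your first pattern. It does not use aiohttp; fake_fetch and the one-second sleep are stand-ins for the real request, so the total time is roughly the sum of the individual waits:

import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(1)          # stand-in for the network wait in fetch_
    print(f"request {i} done")

async def sequential():
    start = time.perf_counter()
    for i in range(3):
        # awaiting immediately means only this task and sequential() exist,
        # so each request finishes before the next one is even created
        await asyncio.create_task(fake_fetch(i))
    print(f"sequential: {time.perf_counter() - start:.1f}s")   # ~3 seconds

asyncio.run(sequential())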

In the second case you get concurrent execution, since you create all the tasks before you await for the first time. The instances of fetch_ take turns, switching from one task to another each time one of the tasks needs to await something.
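
The same sketch with your second pattern shows the difference: every task is created, and therefore started, before the first await, so the total time is roughly the longest single wait:

import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(1)          # stand-in for the network wait
    print(f"request {i} done")

async def concurrent():
    start = time.perf_counter()
    tasks = [asyncio.create_task(fake_fetch(i)) for i in range(3)]   # all start now
    for t in tasks:
        await t                     # the other tasks keep running while we wait on this one
    print(f"concurrent: {time.perf_counter() - start:.1f}s")         # ~1 second

asyncio.run(concurrent())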

However, the code for your second case is longer than it needs to be. See the documentation for the asyncio.gather function. You could replace each three-line loop with a single line like this (note the *: gather takes the awaitables as separate positional arguments, not as a single iterable):

await asyncio.gather(*(fetch_(link) for link in urls))

The gather function automatically creates tasks and awaits until they are all finished.
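
One more point, since one of your URLs (the httpbin one on port 12345) seems intended to fail: by default the first exception raised by any task propagates out of gather (and likewise out of your await t loop) and stops blockmain. If that is not what you want, gather accepts return_exceptions=True, which hands failures back as ordinary result values. A sketch reusing fetch_ and urls from your question (fetch_all is just a name I made up):

async def fetch_all():
    # return_exceptions=True returns failures as values instead of raising them
    # out of the await, so one timeout cannot abort the remaining fetches
    results = await asyncio.gather(
        *(fetch_(link) for link in urls),
        return_exceptions=True,
    )
    for link, result in zip(urls, results):
        if isinstance(result, Exception):
            print(f"{link} failed: {result!r}")

asyncio.run(fetch_all())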


Comments

Thanks for your answer. So even though I await in the first part, there are only ever two tasks at a time, and it blocks on the task I'm awaiting because there is no other task to switch to, right? Another question: when I repeat the second-part code twice, it keeps working concurrently in the second round. I'm really confused about why it can continue to the second round without waiting for that "httpbin" timeout. Thanks
100% right about the first part. In the second part, the first await you perform is on the first task. While you await the completion of the first one, all the other tasks continue to run because they have all been started. After the first task finishes you await on the second task, etc. If one of the tasks takes a long time, your for loop gets stuck at that point but, once again, all the tasks continue. You can't exit from the loop until every task is finished. Most programmers use gather() instead of a for loop but it is more or less the same. Is that what you were asking?
My question is that I run the second part twice. All the URLs except httpbin complete. Is the reason the second for loop can run while the first httpbin is still awaiting that control switches back to the blockmain task? code tasks = [asyncio.create_task(fetch_(link)) for link in urls] for t in tasks: await t tasks = [asyncio.create_task(fetch_(link)) for link in urls] for t in tasks: await t Really, thanks for your response.
I only see two for loops. I don't understand "I run the second part twice." The second for loop doesn't even start until all the tasks launched by the first one are finished, because you await each task inside the first for loop. Probably I am not understanding you. Maybe it would help if you would edit the question to add more information, or post a different question. Also it's hopeless to try to read Python code in a comment because the indentation gets lost.
I've updated my question above. If I run the second part, it can run the second for loop while it is awaiting the first loop's "httpbin" timeout, but why can that happen? Thanks.