I built an API wrapper module in Python with aiohttp that allows me to significantly speed up the process of making multiple GET requests and retrieving data. Every data response is turned into a pandas DataFrame.
Using asyncio I do something that looks like this:
import asyncio

from custom_module import CustomAioClient

id_list = ["123", "456"]


async def main():
    client = CustomAioClient()

    tasks = []
    for id in id_list:
        task = asyncio.ensure_future(client.get_latest_value(id=id))
        tasks.append(task)

    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Close the session
    await client.close_session()
    return responses


if __name__ == "__main__":
    asyncio.run(main())
This returns a list of pandas DataFrames, one time series per id in id_list, which I want to save as CSV files. I am a bit confused about how to proceed here.
Obviously I could just iterate over the list and save each DataFrame one by one, but this seems highly inefficient to me. Is there a way to improve things here?
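For reference, continuing the first snippet, the straightforward version I have in mind is roughly this (assuming main() returns the gathered DataFrames in the same order as id_list; the C:/ path is just where I happen to save things):

# Save each DataFrame only after all of them have been downloaded.
responses = asyncio.run(main())
for id, df in zip(id_list, responses):
    df.to_csv(f"C:/{id}.csv")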
Edit
I did the following to save the files, and it is much faster than iterating over the URLs one by one, getting the data, and saving it. I doubt whether this fully makes use of the asynchronous functionality, though.
import asyncio

from custom_module import CustomAioClient


async def fetch(client: CustomAioClient, id: str):
    df = await client.get_latest_value(id=id)
    df.to_csv(f"C:/{id}.csv")
    print(df)


async def main():
    client = CustomAioClient()
    id_list = ["123", "456"]

    tasks = []
    for id in id_list:
        task = asyncio.ensure_future(fetch(client=client, id=id))
        tasks.append(task)

    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Close the session
    await client.close_session()


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
Would it be better to add something like client.get_and_save(id) to the wrapper, so that, well, the getting-and-saving is done within that same async task?
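For what it's worth, my rough idea for such a method is sketched below; get_and_save itself, its save_dir argument, and the use of asyncio.to_thread are my own assumptions rather than anything CustomAioClient provides today. The thought behind asyncio.to_thread (Python 3.9+) is that to_csv is blocking disk I/O, so pushing it into a worker thread should keep the event loop free to drive the other downloads:

import asyncio

from custom_module import CustomAioClient


async def get_and_save(client: CustomAioClient, id: str, save_dir: str = "C:/"):
    # Hypothetical helper: fetch the DataFrame via the existing wrapper method...
    df = await client.get_latest_value(id=id)
    # ...then offload the blocking to_csv call to a worker thread so the event
    # loop is not stalled while the file is written to disk.
    await asyncio.to_thread(df.to_csv, f"{save_dir}{id}.csv")
    return df

Scheduling get_and_save(client=client, id=id) in main() instead of fetch() would then keep the download and the write inside one task per id. Is that the idiomatic way to handle the saving, or is there a better pattern?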