Does anyone know which Python library would provide the fastest read speeds from an Azure Analysis Services database? Long story short, I have access to an external Azure Analysis Services database (a Power BI Premium dataset) and I need to copy it to a local environment at least every morning. I'm copying it into an MS SQL database, and it's about 4 GB in total, so it's not that big.
My current workflow uses pandas, but the larger tables, which have 40 columns and several hundred thousand rows, take HALF AN HOUR EACH just to read into a pandas DataFrame. I've tested my internet connection, tried other devices, increased my processing power, etc., but none of that has made a dent. Maybe I'm going about it wrong, but if anyone has any suggestions I'd love to hear them.
If you have any non-Python suggestions, I'd love to hear those as well. However, I work for a non-profit where programming work conditions are less than ideal, so the goal is a cheap, simple method.
Here's what my current process looks like:
from pyadomd import Pyadomd
import pandas as pd

with Pyadomd(connectionString) as azure_conn:
    # Azure Analysis Services databases are queried with DAX or MDX
    DAX_query = f"""EVALUATE '{table}'"""
    with azure_conn.cursor().execute(DAX_query) as cur:
        # This is the line that takes half an hour: Pyadomd's fetchone()
        # yields rows one at a time, so the whole table is pulled here.
        data = pd.DataFrame(cur.fetchone(), columns=[i.name for i in cur.description])

# I then write to the local database, which takes around a minute.
data.to_sql(name=table, con=sql_conn, if_exists='replace', index=False)
I've been able to speed up the process by changing the query to pull the entire table as a single string with a set separator (~) and then splitting it back apart in pandas, but, again, that's less than ideal and was still quite slow.
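For reference, the single-string version looked roughly like this. It's a simplified sketch: the column names are placeholders, and the CONCATENATEX / UNICHAR(10) construction shows the general shape of the query rather than the exact one I used.

from io import StringIO
import pandas as pd
from pyadomd import Pyadomd

# Placeholder column names; the real tables have ~40 columns.
columns = ["Col1", "Col2"]
# Build one "~"-separated string per row (DAX's & operator
# implicitly converts values to text).
row_expr = ' & "~" & '.join(f"'{table}'[{c}]" for c in columns)
blob_query = f"""
EVALUATE
ROW("blob", CONCATENATEX('{table}', {row_expr}, UNICHAR(10)))
"""

with Pyadomd(connectionString) as azure_conn:
    with azure_conn.cursor().execute(blob_query) as cur:
        blob = cur.fetchall()[0][0]  # one row, one column: the whole table

data = pd.read_csv(StringIO(blob), sep="~", header=None, names=columns)

This was faster than the row-by-row fetch, but still slow, and it obviously breaks if a value ever contains "~" or a newline.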