My code looks like this. I use pd.DataFrame.from_records to fill the data into the DataFrame, but it takes Wall time: 1h 40min 30s to process the request and load the 22 million rows from the SQL table into the df.
# I skipped some of the code, since there are no problems with the query itself; the extract is fast
import pandas as pd

cur = con.cursor()

def db_select(query):  # takes the query text and returns the result as a DataFrame
    cur.execute(query)
    col = [column[0].lower() for column in cur.description]  # parse the column headers
    df = pd.DataFrame.from_records(cur, columns=col)  # fill the data into the DataFrame
    return df
Then I pass the SQL query to the function:
frame = db_select("select * from table")
How can I optimize this code to speed up the process?
Try pd.read_sql, or dd.read_sql_table() in dask (pandas' big-data big brother) instead of pandas: pip install dask and import dask.dataframe as dd.
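For pd.read_sql, something like this should work. This is a minimal sketch that assumes con is your existing DB-API connection (or a SQLAlchemy engine); the chunked variant is optional and the chunk size is just an example value:

import pandas as pd

# read_sql executes the query and builds the DataFrame in one call,
# so you skip the manual cursor iteration and header parsing entirely
frame = pd.read_sql("select * from table", con)

# if the full result does not fit comfortably in memory,
# read it in pieces and concatenate
chunks = pd.read_sql("select * from table", con, chunksize=1_000_000)
frame = pd.concat(chunks, ignore_index=True)

For dask, note that dd.read_sql_table takes a SQLAlchemy URI rather than an open cursor, plus an indexed column to partition the reads on. The URI, table name, index_col, and npartitions below are placeholders you would replace with your own:

import dask.dataframe as dd

ddf = dd.read_sql_table(
    "table",                                   # table name
    "postgresql://user:password@host/dbname",  # SQLAlchemy URI (placeholder)
    index_col="id",                            # indexed column dask splits the reads on
    npartitions=32,                            # number of parallel partitions to read
)
frame = ddf.compute()  # back to a plain pandas DataFrame, if it fits in memory

The speedup from dask comes from reading the partitions in parallel instead of pulling all 22 million rows through a single cursor, so it helps most when index_col is a real indexed column in the database.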