7

Pandas gets ridiculously slow when loading more than 10 million records from a SQL Server DB using pyodbc and mainly the function pandas.read_sql(query,pyodbc_conn). The following code takes up to 40-45 minutes to load 10-15 million records from SQL table: Table1

Is there a better and faster method to read SQL Table into pandas Dataframe?

import pyodbc
import pandas

server = <server_ip> 
database = <db_name> 
username = <db_user> 
password = <password> 
port='1443'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';PORT='+port+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = conn.cursor()

data = pandas.read_sql("select * from Table1", conn) #Takes about 40-45 minutes to complete
4
  • 1
    check with chunk Commented Nov 19, 2018 at 21:08
  • Does rows = cursor.execute("select * from Table1").fetchall() take a similar amount of time? Commented Nov 20, 2018 at 19:57
  • @W-B chunk does not help with the time issue. Still takes a lot of time to read. Commented Nov 26, 2018 at 17:17
  • @GordThompson Thank you. I tried using execute() and fetchall() takes decent amount of time to read the pyodbc cursor object but takes forever to convert it into pandas Dataframe. Please see link Commented Nov 26, 2018 at 17:19

1 Answer 1

1

I had a same problem with even more number of rows, ~50 M Ended up writing a SQL query and stored them as .h5 files.

sql_reader = pd.read_sql("select * from table_a", con, chunksize=10**5)

hdf_fn = '/path/to/result.h5'
hdf_key = 'my_huge_df'
store = pd.HDFStore(hdf_fn)
cols_to_index = [<LIST OF COLUMNS THAT WE WANT TO INDEX in HDF5 FILE>]

for chunk in sql_reader:
    store.append(hdf_key, chunk, data_columns=cols_to_index, index=False)

# index data columns in HDFStore
store.create_table_index(hdf_key, columns=cols_to_index, optlevel=9, kind='full')
store.close()

This way, we'll be able to read them faster than a Pandas.read_csv

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.