I have a table with 146 columns and approximately 8 million rows of sparse data, stored locally in a PostgreSQL database. My goal is to select the whole dataset at once, store it in a pandas dataframe and perform some calculations.

So far I have read about server-side cursors in many threads, but I guess I'm doing something wrong, as I don't see any improvement in memory usage. The documentation is also quite limited.
My code so far is the following:
```python
import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder credentials

cur = conn.cursor('testCursor')   # named cursor, i.e. server-side
cur.itersize = 100000
cur.execute("select * from events")

data = cur.fetchall()             # pulls the whole result set at once
df = pd.DataFrame(data)

conn.commit()
conn.close()
```
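From the docs, my understanding is that `itersize` only takes effect when you iterate over the cursor itself rather than calling `fetchall()`, so maybe the intended pattern is something like this (untested sketch on my part, same `events` table):

```python
# Sketch of iterating the named cursor directly; if I understand the
# docs, itersize controls how many rows each network round trip fetches.
cur = conn.cursor('testCursor')
cur.itersize = 100000
cur.execute("select * from events")

for row in cur:
    pass  # process one row at a time here
```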
I also tried using fetchmany() or fetchone() instead of fetchall(), but I don't know how to scroll through the results. I guess I could use something like this for fetchone(), but I don't know how to handle fetchmany():
```python
row = cur.fetchone()
while row:
    # process row here
    row = cur.fetchone()
```
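For fetchmany(), my best guess is a loop like the one below, where the batch size of 100000 is just an assumption on my part:

```python
while True:
    rows = cur.fetchmany(100000)  # next batch; empty list when exhausted
    if not rows:
        break
    # process the batch of rows here
```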
Lastly, in the case of fetchone() and fetchmany(), how can I concatenate the results into a single dataframe without consuming all of my memory? Just to note that I have 16 GB of RAM available.
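For completeness, this is the kind of chunked approach I had in mind: collect each batch as a small dataframe and concatenate once at the end (taking the column names from `cur.description` is my assumption). I suspect it still ends up holding everything in memory at once, which is exactly what I'm trying to avoid:

```python
import pandas as pd

chunks = []
while True:
    rows = cur.fetchmany(100000)
    if not rows:
        break
    chunks.append(pd.DataFrame(rows))

# a single concat at the end avoids repeatedly copying a growing dataframe
df = pd.concat(chunks, ignore_index=True)
df.columns = [col.name for col in cur.description]
```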