I am trying to run a query on a table that has about 10 million rows. Basically, I am running SELECT * FROM events and writing the result to a CSV file.
Here is the code:
import csv
import json
import os
import sys

import psycopg2


def create_server_connection():
    # Connection parameters are stored as a JSON string in an environment variable
    DB_CONNECTION_PARAMS = os.environ["DB_REPLICA_CONNECTION"]
    json_object = json.loads(DB_CONNECTION_PARAMS)
    try:
        conn = psycopg2.connect(
            database=json_object["PGDATABASE"],
            user=json_object["PGUSER"],
            password=json_object["PGPASSWORD"],
            host=json_object["PGHOST"],
            port=json_object["PGPORT"],
        )
    except psycopg2.OperationalError as e:
        print('Unable to connect!\n{0}'.format(e))
        sys.exit(1)
    return conn


with create_server_connection() as connection:
    cursor = connection.cursor()
    cursor.itersize = 20000
    cwd = os.getcwd()

    # Read the query from a .sql file and run it
    query = open(sql_file_path, mode='r').read()
    print(query)
    cursor.execute(query)

    # Write every row of the result to a CSV file
    with open(file_name, 'w', newline='') as fp:
        a = csv.writer(fp)
        for row in cursor:
            a.writerow(row)
However, this whole process uses a lot of memory. I am running it as an AWS Batch job, and it exits with this error: OutOfMemoryError: Container killed due to memory usage
Is there a way to reduce memory usage?
Why not use COPY to export to CSV directly? There's no faster way than having the database itself export the data.

Use a server-side cursor (a named cursor). The default batch size is, I think, 2K rows, but it can be changed with the itersize attribute.
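If the goal is only to dump the query result to a file, psycopg2 can hand the CSV formatting to Postgres itself via copy_expert, as the first comment suggests. Here is a minimal sketch that reuses the question's create_server_connection, sql_file_path and file_name; the export_query_to_csv helper name and the HEADER option are my own choices, and it assumes the .sql file contains a single SELECT (the trailing semicolon is stripped because COPY (...) does not accept one):

def export_query_to_csv(connection, sql_file_path, file_name):
    # Wrap the SELECT in COPY (...) TO STDOUT so the server produces the CSV
    # and psycopg2 just streams it into the local file.
    query = open(sql_file_path, mode='r').read().strip().rstrip(';')
    copy_sql = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(query)
    with connection.cursor() as cursor, open(file_name, 'w', newline='') as fp:
        cursor.copy_expert(copy_sql, fp)


with create_server_connection() as connection:
    export_query_to_csv(connection, sql_file_path, file_name)

Because the rows are never materialised as Python objects, memory use stays roughly flat regardless of table size, and the export is usually much faster than looping in Python.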
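If you would rather keep the row-by-row loop, the second comment is the key point: in psycopg2, itersize only applies to named (server-side) cursors, while a default client-side cursor pulls the entire result set into memory when execute() runs, which is the likely cause of the container being killed. A rough sketch under that assumption, reusing the question's variables; the cursor name 'events_export' is arbitrary:

import csv

with create_server_connection() as connection:
    # Passing a name makes psycopg2 create a server-side cursor, so rows are
    # fetched in batches of `itersize` instead of all at once on execute().
    with connection.cursor(name='events_export') as cursor:
        cursor.itersize = 20000
        cursor.execute(open(sql_file_path, mode='r').read())
        with open(file_name, 'w', newline='') as fp:
            writer = csv.writer(fp)
            for row in cursor:
                writer.writerow(row)

The 2K figure in the comment matches psycopg2's documented default for itersize (2000 rows); setting it to 20000 as in the original code simply trades a little memory for fewer round trips.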