
I have ~100,000 to 1,000,000 rows to insert into an Oracle 18c database. I'm quite new to Oracle and to this order of magnitude of data. I reckon there must be some optimal way to do it, but for now I've only managed to implement a line-by-line insertion:

def insertLines(connection, table_name, column_names, rows):
    cursor = connection.cursor()
    if table_exists(connection, table_name):
        for row in rows:
            sql = 'INSERT INTO {} ({}) VALUES ({})'.format(table_name, column_names, row)
            cursor.execute(sql)
    cursor.close()

Is there some clear way in Oracle to batch the rows and get better performance using cx_Oracle (the Python Oracle library)?

EDIT: I read the data from a CSV file.

4 Comments
  • what about pandas and dataframes? Commented Mar 20, 2019 at 23:33
  • this is what you want --> cx-oracle.readthedocs.io/en/latest/… Also never, never use string interpolation when inserting data. Commented Mar 20, 2019 at 23:34
  • @MAhsan Sadly I don't know pandas. I'm reading from a CSV file, should I focus on a pandas implementation? Commented Mar 21, 2019 at 11:33
  • No harm in trying it; it has an easy-to-use read_csv method, and then a to_sql method, which requires setting up an engine, for which you could use cx_Oracle. Of course, if you have 100k+ rows it will take a bit of time, but it could do it all in one go. Here is a handy link for creating the engine connection: gist.github.com/DGrady/7fb5c2214f247dcff2cb5dd99e231483 Commented Mar 21, 2019 at 15:07
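For reference, a minimal sketch of the pandas route suggested in that comment, assuming a CSV with no header row and a hypothetical two-column target table test (id, name); to_sql needs a SQLAlchemy engine, which can be built on top of cx_Oracle (the file name and connection details are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials/DSN -- replace with your own.
engine = create_engine("oracle+cx_oracle://user:password@host:1521/?service_name=service")

# Read the CSV (assumed to have no header row) and append it to the table
# in chunks so memory use stays bounded.
df = pd.read_csv("testsp.csv", header=None, names=["id", "name"])
df.to_sql("test", engine, if_exists="append", index=False, chunksize=10000)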

3 Answers


If your data is already in Python, then use executemany(). With this many rows you would probably still make multiple executemany() calls, each inserting a batch of records.

The latest release of cx_Oracle (which has been renamed to python-oracledb) runs in a 'Thin' mode by default that bypasses the Oracle Client libraries, which in many cases makes data loads faster. The usage and functionality of executemany() is unchanged in the new release. Install it with something like python -m pip install oracledb. Here's the current documentation for Executing Batch Statements and Bulk Loading. Also see the upgrading documentation.

Here's an example using the python-oracledb namespace. If you still use cx_Oracle, change the import to import cx_Oracle as oracledb:

import oracledb
import csv

# Connect and open a cursor here, for example:
# con = oracledb.connect(user="user", password="password", dsn="host/service")
# cursor = con.cursor()

# Predefine the memory areas to match the table definition.
# This can improve performance by avoiding memory reallocations.
# Here, one parameter is passed for each of the columns.
# "None" is used for the ID column, since the size of NUMBER isn't
# variable.  The "25" matches the maximum expected data size for the
# NAME column
cursor.setinputsizes(None, 25)

# Adjust the number of rows to be inserted in each iteration
# to meet your memory and performance requirements
batch_size = 10000

with open('testsp.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    sql = "insert into test (id,name) values (:1, :2)"
    data = []
    for line in csv_reader:
        data.append((line[0], line[1]))
        if len(data) % batch_size == 0:
            cursor.executemany(sql, data)
            data = []
    if data:
        cursor.executemany(sql, data)
    con.commit()

There is a full sample at samples/load_csv.py.

As pointed out by others:

  • Avoid using string interpolation in statements because it is a security risk; it is also generally a scalability problem. Use bind variables. Where you do need string interpolation for things like column names, make sure you sanitize the values.
  • If the data is already on disk, then using something like SQL*Loader or Data Pump will be better than reading it into cx_Oracle and then sending it to the DB.

2 Comments

Thank you for the information! I'm reading from a CSV, and I have it on my disk. What's the difference between SQL*Loader and Data Pump?
sqlldr reads data from CSV files and inserts it into a table. Exactly what you want. Data Pump is a generic term for the expdp and impdp utilities: expdp exports data and metadata (essentially a bunch of CREATE and INSERT commands) and writes it to a proprietary binary file; impdp reads that binary file and executes the CREATE and INSERT commands to import the data and metadata into a database. Both are fully documented at docs.oracle.com/database/121/SUTIL/toc.htm

I don't know what format you have the data in, but SQL*Loader (sqlldr) is a command-line utility specifically created for loading large amounts of data into Oracle.
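A minimal sketch of that approach, reusing the hypothetical test (id, name) table and testsp.csv file from the first answer; the control file name and credentials are placeholders:

-- load_test.ctl (hypothetical control file)
LOAD DATA
INFILE 'testsp.csv'
APPEND
INTO TABLE test
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(id, name)

Then run it from the command line with something like:

sqlldr userid=user/password@host/service control=load_test.ctl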

2 Comments

Then Data Loader would probably be the best way to load your data.
SQL Loader would be the 2nd best option

The optimal way in terms of performance and ease would be to create an external table over your CSV file and then use SQL to do the insert.
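For example, a minimal sketch of that approach driven from Python with python-oracledb, assuming a DBA has already created a directory object (here called data_dir, a placeholder) pointing at the folder that holds the CSV, and reusing the hypothetical test (id, name) table and testsp.csv file from the first answer:

import oracledb

# Placeholder credentials/DSN -- replace with your own.
con = oracledb.connect(user="user", password="password", dsn="host/service")
cursor = con.cursor()

# External table that exposes the CSV file as if it were a regular table.
cursor.execute("""
    create table test_ext (
        id   number,
        name varchar2(25)
    )
    organization external (
        type oracle_loader
        default directory data_dir
        access parameters (
            records delimited by newline
            fields terminated by ','
            missing field values are null
        )
        location ('testsp.csv')
    )
    reject limit unlimited""")

# The load itself is a single set-based INSERT ... SELECT.
cursor.execute("insert into test (id, name) select id, name from test_ext")
con.commit()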

