
I have ~100,000 to 1,000,000 rows to insert into an Oracle 18c database. I'm quite new to Oracle and to this order of magnitude of data. I reckon there must be some optimal way to do it, but for now I've only managed to implement a line-by-line insertion:

def insertLines(connection, table_name, column_names, rows):
    cursor = connection.cursor()
    if table_exists(connection, table_name):
        for row in rows:
            sql = 'INSERT INTO {} ({}) VALUES ({})'.format(table_name, column_names, row)
            cursor.execute(sql)
    cursor.close()

Is there some clear way in Oracle to batch the rows and get better performance using cx_Oracle (the Python Oracle library)?

EDIT: I read the data from a CSV file.

4 Comments
  • what about pandas and dataframes? Commented Mar 20, 2019 at 23:33
  • this is what you want --> cx-oracle.readthedocs.io/en/latest/… Also never, never use string interpolation when inserting data. Commented Mar 20, 2019 at 23:34
  • @MAhsan Sadly I don't know pandas. I'm reading from a CSV file, should I focus on a pandas implementation? Commented Mar 21, 2019 at 11:33
  • No harm in trying it; it has an easy-to-use read_csv method, and then a to_sql method, which requires setting up an engine, for which you could use cx_Oracle. Of course, if you have 100k+ rows it will take a bit of time, but it could do it all in one go. Here is a handy link for creating the engine connection: gist.github.com/DGrady/7fb5c2214f247dcff2cb5dd99e231483 Commented Mar 21, 2019 at 15:07
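For reference, a minimal sketch of the pandas route suggested in that comment, assuming a CSV with no header row and a hypothetical two-column target table test (id, name); to_sql needs a SQLAlchemy engine, which can be built on top of cx_Oracle (the file name and connection details are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials/DSN -- replace with your own.
engine = create_engine("oracle+cx_oracle://user:password@host:1521/?service_name=service")

# Read the CSV (assumed to have no header row) and append it to the table
# in chunks so memory use stays bounded.
df = pd.read_csv("testsp.csv", header=None, names=["id", "name"])
df.to_sql("test", engine, if_exists="append", index=False, chunksize=10000)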

3 Answers


If your data is already in Python, then use executemany(). With this many rows you would probably still make multiple executemany() calls, each inserting a batch of records.

The latest release of cx_Oracle (which has been renamed to python-oracledb) runs in a 'Thin' mode by default that bypasses the Oracle Client libraries, which in many cases makes data loads faster. The usage and functionality of executemany() is unchanged in the new release. Install it with something like python -m pip install oracledb. Here's the current documentation for Executing Batch Statements and Bulk Loading. Also see the upgrading documentation.

Here's an example using the python-oracledb namespace. If you still use cx_Oracle, change the import to import cx_Oracle as oracledb:

import oracledb
import csv

# Connect and open a cursor here, for example:
# con = oracledb.connect(user="user", password="password", dsn="host/service")
# cursor = con.cursor()

# Predefine the memory areas to match the table definition.
# This can improve performance by avoiding memory reallocations.
# Here, one parameter is passed for each of the columns.
# "None" is used for the ID column, since the size of NUMBER isn't
# variable.  The "25" matches the maximum expected data size for the
# NAME column
cursor.setinputsizes(None, 25)

# Adjust the number of rows to be inserted in each iteration
# to meet your memory and performance requirements
batch_size = 10000

with open('testsp.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    sql = "insert into test (id,name) values (:1, :2)"
    data = []
    for line in csv_reader:
        data.append((line[0], line[1]))
        if len(data) % batch_size == 0:
            cursor.executemany(sql, data)
            data = []
    if data:
        cursor.executemany(sql, data)
    con.commit()

There is a full sample at samples/load_csv.py.

As pointed out by others:

  • Avoid using string interpolation in statements because it is a security risk; it is also generally a scalability problem. Use bind variables. Where you do need string interpolation for things like column names, make sure you sanitize the values.
  • If the data is already on disk, then using something like SQL*Loader or Data Pump will be better than reading it into cx_Oracle and then sending it to the DB.

2 Comments

Thank you for the information! I'm reading from a CSV, and I have it on my disk. What's the difference between SQL*Loader and Data Pump?
sqlldr reads data from CSV files and inserts it into a table. Exactly what you want. Data Pump is a generic term for the expdp and impdp utilities: expdp exports data and metadata (essentially a bunch of CREATE and INSERT commands) and writes it to a proprietary binary file; impdp reads that binary file and executes the CREATE and INSERT commands to import the data and metadata into a database. Both are fully documented at docs.oracle.com/database/121/SUTIL/toc.htm

I don't know what format you have the data in, but SQL*Loader (sqlldr) is a command-line utility specifically created for loading large amounts of data into Oracle.
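A minimal sketch of that approach, reusing the hypothetical test (id, name) table and testsp.csv file from the first answer; the control file name and credentials are placeholders:

-- load_test.ctl (hypothetical control file)
LOAD DATA
INFILE 'testsp.csv'
APPEND
INTO TABLE test
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(id, name)

Then run it from the command line with something like:

sqlldr userid=user/password@host/service control=load_test.ctl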

2 Comments

Then Data Loader would probably be the best way to load your data.
SQL Loader would be the 2nd best option

The optimal way in terms of performance and ease would be to create an external table over your CSV file and then use SQL to do the insert.
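For example, a minimal sketch of that approach driven from Python with python-oracledb, assuming a DBA has already created a directory object (here called data_dir, a placeholder) pointing at the folder that holds the CSV, and reusing the hypothetical test (id, name) table and testsp.csv file from the first answer:

import oracledb

# Placeholder credentials/DSN -- replace with your own.
con = oracledb.connect(user="user", password="password", dsn="host/service")
cursor = con.cursor()

# External table that exposes the CSV file as if it were a regular table.
cursor.execute("""
    create table test_ext (
        id   number,
        name varchar2(25)
    )
    organization external (
        type oracle_loader
        default directory data_dir
        access parameters (
            records delimited by newline
            fields terminated by ','
            missing field values are null
        )
        location ('testsp.csv')
    )
    reject limit unlimited""")

# The load itself is a single set-based INSERT ... SELECT.
cursor.execute("insert into test (id, name) select id, name from test_ext")
con.commit()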

