
I have this locally defined python function that works fine when inserting data into a redshift table:

def _insert_data(table_name, values_list):
    insert_base_sql = f"INSERT INTO {table_name} VALUES"
    insert_sql = insert_base_sql + ','.join([str(row) for row in values_list])
    # ... run local helper code to execute insert_sql ...

The values_list is a list of tuples, each with the same number of elements as columns in table_name (even though I don't explicitly assert/check that in this function). However, I'm failing to find a way to insert a NULL value for a smallint column. Here's the schema for the table in question (no DEFAULT values were assigned to columns at table creation):

 schemaname |      tablename      |    column    |         type          | encoding | distkey | sortkey | notnull
------------+---------------------+--------------+-----------------------+----------+---------+---------+---------
 public     | table               | col1         | bigint                | lzo      | t       |       1 | f
 public     | table               | col2         | date                  | lzo      | f       |       2 | f
 public     | table               | col3         | smallint              | lzo      | f       |       3 | f
 public     | table               | col4         | smallint              | lzo      | f       |       4 | f
 public     | table               | col5         | double precision      | none     | f       |       0 | f
 public     | table               | col6         | bigint                | lzo      | f       |       0 | f
 public     | table               | col7         | character varying(48) | bytedict | f       |       0 | f

I'm specifically trying to insert NULL values for col3 and col4; I've tried creating tuples with '' and 'NULL' but run into this error: Error: invalid input syntax for integer: "NULL".

For what it's worth, here's what a scrubbed row ends up looking like in the INSERT statement: ('bigint_value', 'dt_value', 'NULL', 'NULL', 'double_value', 'bigint_value', 'string_name')
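
For reference, here is a minimal sketch of what that string building produces (using the scrubbed values and a hypothetical table name) - the 'NULL' ends up inside quotes, so it reaches Redshift as a character literal rather than the NULL keyword:

# a scrubbed row, exactly as it goes into the join above
row = ('bigint_value', 'dt_value', 'NULL', 'NULL', 'double_value', 'bigint_value', 'string_name')
insert_sql = "INSERT INTO my_table VALUES" + str(row)  # my_table is a stand-in for the real table
print(insert_sql)
# INSERT INTO my_table VALUES('bigint_value', 'dt_value', 'NULL', 'NULL', 'double_value', 'bigint_value', 'string_name')
# The smallint columns receive the quoted string 'NULL', not the keyword NULL,
# hence: invalid input syntax for integer: "NULL"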

4 Comments
  • Have you tried using the value of None in the place where you want the NULLs to be? Commented Nov 27, 2017 at 23:11
  • Yup - that was my first thought too; resulted in this error: Error: column "none" does not exist in table_name (rows looked like ('bigint_value', 'dt_value', None, None, 'double_value', 'bigint_value', 'string_name')) Commented Nov 27, 2017 at 23:30
  • To insert null use the keyword NULL with no quotes. Commented Nov 28, 2017 at 0:37
  • Yup - @alecxe's answer below helped. Moving away from ','.join([str(row) for row in values_list]) & using placeholders instead worked. Commented Nov 29, 2017 at 18:32

1 Answer


The approach you are taking is dangerous in itself. Building queries with string concatenation and formatting is error-prone and insecure: it leaves your query open to SQL injection attacks.

Instead, properly parameterize your query, passing the list of parameters as a separate argument to cursor.executemany(). Here is one, not particularly pretty, way to generate the placeholders:

# one "%s" placeholder per column in a row (values_list is a list of row tuples)
placeholders = ", ".join(["%s"] * len(values_list[0]))
query = f"""
    INSERT INTO 
        {table_name} 
    VALUES
        ({placeholders})
"""
cursor.executemany(query, values_list)

(Note that the table name cannot be parameterized, so sanitize and validate it separately.)
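
If you are on psycopg2, one way to handle the table name part is to compose the statement with the psycopg2.sql module instead of plain string formatting. A rough sketch, assuming an open cursor and the same values_list of row tuples:

from psycopg2 import sql

# Build the statement with a properly quoted identifier for the table name;
# only the values go through parameter binding.
query = sql.SQL("INSERT INTO {} VALUES ({})").format(
    sql.Identifier(table_name),
    sql.SQL(", ").join(sql.Placeholder() * len(values_list[0])),
)
cursor.executemany(query, values_list)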

Note the use of executemany(): it executes the prepared statement once for every tuple in values_list.

But, if you are using psycopg2, there is an even better way to insert multiple records into the table - execute_values() - take a look at this answer.
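
For completeness, here is a rough sketch of the execute_values() variant, assuming psycopg2, an open connection conn with cursor cursor, and a hypothetical table name. Note the None entries for col3 and col4, which the driver sends as SQL NULL:

from psycopg2.extras import execute_values

# Made-up rows matching the schema above; None becomes NULL for the smallint columns.
rows = [
    (1, '2017-11-27', None, None, 1.5, 42, 'first'),
    (2, '2017-11-28', 3, None, 2.5, 43, 'second'),
]

# execute_values expands the single %s into a multi-row VALUES list.
execute_values(cursor, "INSERT INTO my_table VALUES %s", rows)
conn.commit()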

And, coming back to your initial question: if you take this approach, None parameter values are automatically converted to SQL NULL by the database driver.


2 Comments

Noted re: sql injection concerns; I'll correct for that. I initially tried using None as well, but ran into this error: Error: column "none" does not exist in table_name
@ScottBorden Right - I suspect you were not using parameterized queries, so None ended up in the query as the bare word None with no quotes around it and was therefore interpreted as a column name rather than a column value. Please give this approach a try.
