I'm using psycopg2 to interact with a PostgreSQL database. I have a function that may insert into any number of columns in a table, from a single column to all of them. My question is: how would one properly and dynamically construct this query?

At the moment I am using string formatting and concatenation, and I know this is the absolute worst way to do this. Consider the code below where, in this case, my unknown number of columns (i.e. the keys of a dict) is in fact 2:

dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}

def createMyQuery(user_ids, dictOfUnknownLength):
    fields, values = list(), list()

    for key, val in dictOfUnknownLength.items():
        fields.append(key)
        values.append(val)

    fields = str(fields).replace('[', '(').replace(']', ')').replace("'", "")
    values = str(values).replace('[', '(').replace(']', ')')

    query = f"INSERT INTO myTable {fields} VALUES {values} RETURNING someValue;"
    return query

# Resulting query:
# INSERT INTO myTable (key1, key2) VALUES (3, 'myString') RETURNING someValue;

This provides a correctly formatted query but is of course prone to SQL injections and the like and, as such, is not an acceptable method of achieving my goal.

In other queries I am using the recommended methods of query construction when handling a known number of variables (%s and separate argument to .execute() containing variables) but I'm unsure how to adapt this to accommodate an unknown number of variables without using string formatting.

How can I elegantly and safely construct a query with an unknown number of specified insert columns?

1 Answer

To add to your worries, the current approach using .replace() is prone to edge cases where fields or values contain [, ], or '. Those characters are replaced wherever they occur, which can corrupt your query.
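As a small illustration of that edge case, here is a sketch using a made-up value, `a[1]`, that contains square brackets. It applies the same `.replace()` chain as the question's code:

```python
# Hypothetical value containing brackets, chosen only to show the problem.
values = [3, "a[1]"]

# Same replacement chain as in the question.
naive = str(values).replace('[', '(').replace(']', ')')

# The brackets inside the string literal were rewritten too,
# so the text that reaches the database is no longer "a[1]".
print(naive)
```

The outer list brackets become parentheses as intended, but so do the brackets inside the data itself.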

You could use .join() to join a variable number of items in your list. Then build the query with one %s placeholder per value after VALUES and pass your arguments to .execute().

Note: You may also want to consider the case where the number of fields is not equal to the number of values.

import psycopg2


conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()

dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}


def createMyQuery(user_ids, dictOfUnknownLength):
    # Directly assign keys/values.
    fields, values = list(dictOfUnknownLength.keys()), list(dictOfUnknownLength.values())

    if len(fields) != len(values):
        # Raise an error? SQL won't work in this case anyways...
        pass

    # Stringify the fields and values.
    fieldsParam = ', '.join(fields) # "key1, key2"
    valuesParam = ', '.join(['%s'] * len(values)) # "%s, %s"

    # "INSERT ... (key1, key2) VALUES (%s, %s) ..."
    query = 'INSERT INTO myTable ({}) VALUES ({}) RETURNING someValue;'.format(fieldsParam, valuesParam)

    # .execute('INSERT ... (key1, key2) VALUES (%s, %s) ...', [3, 'myString'])
    cur.execute(query, values) # Anti-SQL-injection: pass placeholder
                               # values as second argument.
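
One caveat with the code above: the values go through placeholders, but the column names are still interpolated with .format(), so they are only safe if the dict keys are trusted. Below is a minimal sketch of one way to guard against that, assuming you know the table's columns in advance. `ALLOWED_COLUMNS` and `build_insert` are hypothetical names for illustration, not part of psycopg2. psycopg2 also ships a `psycopg2.sql` module (`sql.SQL`, `sql.Identifier`) for composing identifiers, which may be a better fit if the column list is not known up front.

```python
# Hypothetical allow-list: the columns myTable is assumed to have.
ALLOWED_COLUMNS = {"key1", "key2", "key3"}


def build_insert(allowed_columns, row):
    """Return (query, params) for inserting `row`, a dict of column -> value."""
    if not row:
        raise ValueError("row must contain at least one column")

    # Reject any key that is not a known column name, so nothing
    # untrusted is ever formatted into the SQL text.
    unknown = set(row) - set(allowed_columns)
    if unknown:
        raise ValueError(f"unexpected column names: {sorted(unknown)}")

    fields = list(row)  # dicts keep insertion order (Python 3.7+)
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))

    query = (
        f"INSERT INTO myTable ({columns}) "
        f"VALUES ({placeholders}) RETURNING someValue;"
    )
    # Values stay separate from the query text and go to .execute().
    return query, [row[name] for name in fields]


query, params = build_insert(ALLOWED_COLUMNS, {"key1": 3, "key2": "myString"})
```

The result would then be passed along as `cur.execute(query, params)`, so the driver still handles quoting of the values.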

2 Comments

I believe this approach will lead to column names being treated as strings and thus encapsulated within ' ' and therefore being invalid SQL syntax?
Ensuring that len(fields)==len(values) is a very good point, thanks.
