I'm working with the following code in Python, calling a PostgreSQL query with subprocess:

import subprocess
claimer_name = 'a_name'
startdate = '2014-04-01'
enddate = '2018-04-01' 

data = subprocess.check_output(['/usr/bin/psql -U user_name "SELECT c.asset_id, c.video_id,
c.claim_id, c.claim_date FROM db.claim c JOIN db.claim_history h ON c.claim_id = h.claim_id JOIN
db.users_email e ON LOWER(e.email) = LOWER(h.email) JOIN m.auth_user u ON e.user_id = u.id WHERE
h.list_order = 1 AND c.claim_origin = ‘Descriptive Search’ AND c.claim_date >= \"%s\" AND    
c.claim_date < \"%s\" AND concat(u.first_name, concat(chr(32),
u.last_name)) = \"%s\""' % (startdate, enddate, claimer_name)], shell=True)

How can I escape the single quotes around 'Descriptive Search'? Running this code as-is gives the error Only ASCII characters are allowed in an identifier.

I have tried:

  1. [''Descriptive Search'']
  2. [\'Descriptive Search\']
  3. [""Descriptive Search""]
  4. [concat('Descriptive', concat(chr(32), 'Search'))]

and assigning a variable: i = 'Descriptive Search', and then c.claim_origin = \"%s\".

However, these attempts yield the same ASCII characters error. Using string-formatting works fine for my other variables (startdate, enddate, claimer_name) and I'm stumped as to why it doesn't work for the string 'Descriptive Search'.

Using PostgreSQL 9.3.

Any help or points in the right direction would be great; thanks!

  • Egad. There are sooo many possible points of failure. Please please do yourself a favour and use a PostgreSQL driver library for Python, something like Psycopg2. Going through shell is bound to give you a headache, if it hasn't already. And while you're at it, read about little Bobby Tables. Commented Sep 16, 2014 at 1:58
  • +1 for showing your code and the exact error text. Good on you. In future please include the PostgreSQL version too, but +1 good question. Commented Sep 16, 2014 at 3:09
  • Thanks for the note, @CraigRinger! will do for next time. Commented Sep 16, 2014 at 3:12
  • @Daniel Ah, I see you did add a version tag. Better to just put it in the text unless it's a particularly version-specific issue. Yes, I know, stack overflow is full of weird things like that. Commented Sep 16, 2014 at 3:14
  • Gotcha, @CraigRinger; thanks for the pro tip. Commented Sep 16, 2014 at 3:15

1 Answer

There are so many things wrong with this.

  • You should be using psycopg2 rather than trying to shell out to psql to talk to the database;

  • Because you're not using a proper database binding you can't use query parameters (placeholders) properly, so you have to handle escaping in literals yourself to avoid SQL injection risks and quoting bugs;

  • When invoking commands via subprocess, avoid using the shell if at all possible. It's another point of possible failure, and completely unnecessary in this case;

  • Long strings should generally be """ quoted in Python to avoid the need to escape nested "s;

  • The expression concat(u.first_name, concat(chr(32), u.last_name)) is needlessly contorted. Just write u.first_name || ' ' || u.last_name or format('%s %s', u.first_name, u.last_name);

  • You're using "double quotes" to quote literals you substitute in, which is invalid SQL. They'll get treated as identifiers, per the documentation. So c.claim_date < \"%s\" will fail with an error like column "2018-04-01" does not exist;

  • You're using typographic (curly) single quotes, not apostrophes, when quoting ‘Descriptive Search’. At a guess you've edited the code in a word processor, not a programmer's text editor. You want apostrophes, 'Descriptive Search', when quoting literals in SQL.
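If you do have to call psql rather than use a driver, the points about the shell and triple-quoted strings could look roughly like the sketch below. The function names build_psql_argv and run_query are hypothetical, and the query is a trimmed-down illustration with no user-supplied values in it. This does not solve parameter escaping; it only removes the shell layer of quoting.

```python
import subprocess

# Fixed query with no substituted values; the literal uses ASCII apostrophes.
# Triple quotes mean nothing inside the SQL needs backslash escaping.
QUERY = """
SELECT c.claim_id, c.claim_date
FROM db.claim c
WHERE c.claim_origin = 'Descriptive Search'
"""


def build_psql_argv(user):
    """Return the argument list for psql (hypothetical helper)."""
    # Each list element is handed to psql as one argument, unchanged.
    return ["/usr/bin/psql", "-U", user, "-c", QUERY]


def run_query(user):
    # No shell=True: the program is executed directly, so shell quoting
    # rules never touch the query text.
    return subprocess.check_output(build_psql_argv(user))
```

Calling run_query("user_name") would then run psql directly; it is not called here because it needs a reachable database.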

Because you used typographic single quote characters (U+2018 and U+2019) instead of apostrophes (U+0027) to quote the literal string Descriptive Search, PostgreSQL didn't recognise it as a literal and tried to parse it as an identifier. However, ‘ isn't a legal character in an unquoted identifier, so it reported the error you show.

See the documentation on identifiers and literals.
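A quick way to spot this kind of problem before the query reaches the database is to scan the SQL text for non-ASCII characters. This is a minimal sketch assuming Python 3; find_non_ascii is a hypothetical helper name, not part of any library.

```python
import unicodedata


def find_non_ascii(sql):
    """Return (offset, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(sql)
        if ord(ch) > 127
    ]


# The fragment from the question, with the curly quotes written as escapes
# so they are visible in the source.
query = "c.claim_origin = \u2018Descriptive Search\u2019"

for offset, ch, name in find_non_ascii(query):
    print(offset, "U+%04X" % ord(ch), name)
```

For the fragment above this reports the two quotation marks; the same fragment written with apostrophes yields an empty list.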

Here's what you should have done:

import psycopg2
import datetime
claimer_name = 'a_name'
startdate = datetime.date(2014, 4, 1)
enddate = datetime.date(2018, 4, 1)

conn = psycopg2.connect("user=user_name")
curs = conn.cursor()
curs.execute("""
    SELECT 
        c.asset_id,
        c.video_id,
        c.claim_id,
        c.claim_date
    FROM db.claim c 
         JOIN db.claim_history h ON c.claim_id = h.claim_id 
         JOIN db.users_email e ON LOWER(e.email) = LOWER(h.email) 
         JOIN m.auth_user u ON e.user_id = u.id 
    WHERE h.list_order = 1 
      AND c.claim_origin = 'Descriptive Search'
      AND c.claim_date >= %s 
      AND c.claim_date < %s
      AND u.first_name || ' ' || u.last_name = %s
    """, (startdate, enddate, claimer_name)
)
results = curs.fetchall()

Pay particular attention to the fact that I did not use Python's % string-formatting operator above. The %s entries are placeholders that are substituted properly by psycopg2, which receives the values as a separate tuple; see passing parameters to SQL queries.
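To see why passing the values separately matters, here is a small runnable sketch that uses the standard-library sqlite3 module as a stand-in, so it works without a PostgreSQL server. Note the assumptions: sqlite3 writes placeholders as ? where psycopg2 uses %s, and the claim table and claims_by_origin function are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical two-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (claim_id INTEGER, claim_origin TEXT)")
conn.execute("INSERT INTO claim VALUES (1, 'Descriptive Search')")
conn.execute("INSERT INTO claim VALUES (2, 'O''Brien upload')")


def claims_by_origin(origin):
    # The value is passed separately from the SQL text, so quotes inside it
    # cannot end the literal or change the statement.
    cur = conn.execute(
        "SELECT claim_id FROM claim WHERE claim_origin = ?", (origin,)
    )
    return [row[0] for row in cur.fetchall()]


print(claims_by_origin("Descriptive Search"))
print(claims_by_origin("O'Brien upload"))    # apostrophe needs no escaping
print(claims_by_origin("x' OR '1'='1"))      # injection attempt matches nothing
```

The value containing an apostrophe is matched as ordinary data, and the injection-style string is compared literally rather than being interpreted as SQL.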

1 Comment

Thanks for all of this, @CraigRinger! It all makes sense and is very helpful.
