0

I'm using a python workflow that makes frequent SQL calls, with the SQL calls stored in standalone files and formatted using psycopg2 identifiers. I'm trying to modify these calls using dictionary objects, but can't get the SQL calls to format correctly.

An example header of an SQL file would be:

with my_table as (
  SELECT DISTINCT ON ("column_A")
    left_table."column_b",
    left_table."column_c",
    {column_identifier}
    ...

In my Python script I call the following:

col_info = my_dictionary.entry
column_identifier = " ".join([f'left_table."{x}",' for x in col_info])

This looks exactly right, but if I include this in a call

sql_call = py2.sql.SQL(sql).format(
  column_identifier=py2.sql.Identifier(column_identifier),
  ...
  )

I receive a formatting error where the parsed columns all look like this:

"left_table.""column_d"", "

If I use Literal instead of Identifier then the SQL call transacts but produces a nonsensical output. I cannot use AsIs because other Identifier calls are used.

The only potential solution I have seen so far is where the entire SQL call is contained within f""" ... """ but since there are multiple, long SQL calls, including them all in the Python script would be very unwieldy. Being able to format these simple lines would be a highly preferable solution.

Thanks

5
  • Take a look at how they do it in : stackoverflow.com/questions/46770201/… Commented Mar 21, 2023 at 14:56
  • Why wouldn't they look like "left_table.""column_d"" you did 'left_table."{x}",' which double quoted x and then py2.sql.Identifier() which double quoted it again. Look at docs here sql, in particular class psycopg2.sql.Identifier ... Multiple strings can be passed to the object to represent a qualified name, i.e. a dot-separated sequence of identifiers. . There are plenty of examples to show how to do this. Commented Mar 21, 2023 at 18:12
  • It's still unclear to me how to prevent the double quotes around the column names. For example, escaping isn't an option (.\"{x}\") Commented Mar 22, 2023 at 15:06
  • Don't do this column_identifier = " ".join([f'left_table."{x}",' for x in col_info]). Look at the examples in the sql module link above and add the table/column names directly to the sql.SQL().format(). The table/column names will end up double quoted because that is what sql.Identifier() does. The result if done correctly will be "column_d" not ""column_d"". Commented Mar 22, 2023 at 16:53
  • I've done that elsewhere, but here I'm looking for a solution where I can generate an Identifier where the number of columns called changes between scripts, which is why left_table is part of the call Commented Mar 22, 2023 at 19:46

1 Answer 1

0

A rough draft of what I think you are looking for:

import psycopg2
from psycopg2 import sql
con = psycopg2.connect("dbname=test host=localhost  user=aklaver")

left_table = 'left_tbl'
table_name = 'some_tbl'
column_list = ['a', 'b', 'c']


q = sql.SQL("SELECT {} FROM {} JOIN {} ...").format(sql.SQL(', ').join(
            [sql.Identifier(left_table, col)  for col in column_list]),
            sql.Identifier(left_table), sql.Identifier(table_name))
print(q.as_string(con))

SELECT "left_tbl"."a", "left_tbl"."b", "left_tbl"."c" FROM "left_tbl" JOIN "some_tbl" ...

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.