statement/query cache vs Python db libraries and bound parameters?

Question

As far as I know, all databases and access libraries support preparing statements and binding variables (e.g. PostgreSQL, ODBC, MySQL). The Python DB-API seems to imply that database libraries should be implemented using bound variables internally, yet the two I've checked do not.

MySQLdb uses string interpolation internally (from the implementation of cursor.execute(..)):

query = query % tuple([db.literal(item) for item in args])

and the _mysql.c implementation uses:

r = mysql_real_query(&(self->connection), query, len);

instead of the mysql_stmt_* functions.
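To make the client-side approach concrete, here is a minimal sketch of what such interpolation amounts to. The `literal` and `merge_query` functions are hypothetical stand-ins written for illustration, not MySQLdb's actual code; the real `db.literal` handles many more types and does charset-aware escaping through the client library.

```python
def literal(value):
    """Hypothetical, simplified stand-in for db.literal: render a Python
    value as SQL literal text. Real drivers cover far more cases."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool first, since bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape backslashes and single quotes, then wrap in quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def merge_query(query, args):
    """Merge arguments into the query text on the client, mirroring
    `query % tuple([db.literal(item) for item in args])`."""
    return query % tuple(literal(item) for item in args)


merged = merge_query(
    "SELECT * FROM users WHERE id = %s AND name = %s",
    (42, "O'Brien"),
)
print(merged)
```

The server only ever receives the finished string, so it has no placeholders to bind and sees a new statement text for every distinct set of values.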

In the psycopg2 library all execute paths seem to end up in _psyco_curs_execute, which calls _psyco_curs_merge_query_args, which merges "together a query string and its arguments." (cf. code).

Bound parameters are supposed to be both faster and more secure, so why do these libraries do string formatting instead? With values merged into the text, most queries will be unique and the query/statement caches will be of little use. Should I dramatically reduce their sizes (to avoid the cache-maintenance overhead)?
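The cache concern can be illustrated with a small sketch. It uses the standard-library `sqlite3` module purely as a stand-in because it binds parameters natively and needs no server; the `users` table and its columns are made up for the example. The point is only that interpolation yields one distinct statement text per value, while a bound parameter keeps the text constant.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, "user%d" % i) for i in range(100)],
)

interpolated_texts = set()  # statement texts a cache would see with interpolation
bound_texts = set()         # statement texts a cache would see with bound parameters

for user_id in range(100):
    # Client-side interpolation: the value becomes part of the SQL text.
    interpolated_texts.add("SELECT name FROM users WHERE id = %d" % user_id)

    # Bound parameter: the SQL text is identical on every iteration.
    sql = "SELECT name FROM users WHERE id = ?"
    bound_texts.add(sql)
    row = conn.execute(sql, (user_id,)).fetchone()
    assert row[0] == "user%d" % user_id

distinct_interpolated = len(interpolated_texts)
distinct_bound = len(bound_texts)
print(distinct_interpolated, distinct_bound)
```

A cache keyed on statement text would hold 100 entries in the first case and a single entry in the second.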

I don't see where it is stated that it should be implemented via prepared statements.
– Daniel Roseman, Commented Oct 3, 2015 at 18:09
I said "seems to imply", based on this paragraph: "A reference to the operation will be retained by the cursor. If the same operation object is passed in again, then the cursor can optimize its behavior. This is most effective for algorithms where the same operation is used, but different parameters are bound to it (many times)." Are you saying the DB-API says it shouldn't be implemented with bound parameters?
– thebjorn, Commented Oct 3, 2015 at 18:11

Pavel Stehule · Accepted Answer · 2015-10-04 11:34:45Z

1

The overhead of prepared statements is relatively small in Postgres. The statement cache is not shared between processes; it exists per process only, so the implementation is pretty simple. Cache overhead should therefore not be an argument for any decision. There are other considerations with server-side prepared statements:

blind optimization (or semi-blind optimization for >= 9.3): the plan is built without knowing the actual parameter values
slightly higher protocol overhead
but it is 100% safe against SQL injection

Now, almost all interfaces provide safe client-side prepared statements, and that is probably the better choice in almost all cases. The exceptions are heavily repeated statements, usually some INSERTs or UPDATEs.

Checking which method is used is simple: you can log all queries on the PostgreSQL side by setting log_min_duration_statement = 0, and you will then see in postgresql.log whether the queries arrive with or without bind parameters (placeholders).

