
I work on a system that downloads data from a cloud service into a local database (PostgreSQL, MySQL, ...). I'm currently having a performance issue with PostgreSQL: inserting the data takes a very long time.

The number of columns and the size of the data may vary. In a sample project, I have a table with approx. 170 columns. There is one unique index - but even after dropping the index, the insert speed did not change.

I'm using the JDBC driver to connect to the database, and I'm inserting data in batches of 250 rows (using NamedParameterJdbcTemplate).

It took approx. 18 seconds to insert the data into Postgres. The same data set on MySQL took just a second. That's a huge difference - where does it come from? Is the Postgres JDBC driver that slow? Can it be configured somehow to make it faster? Am I missing something else? Any other ideas on how to make it faster?

I made a sample project which is available on GitHub - https://github.com/varad/postgresql-vs-mysql. Everything happens in the LetsGo class in the "run" method.
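For context, here is a minimal sketch of the kind of batch insert described above (the table name, column names and row source are hypothetical; the real code is in the linked project):

import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;

import javax.sql.DataSource;
import java.util.List;
import java.util.Map;

public class BatchInsertSketch {

    private final NamedParameterJdbcTemplate template;

    public BatchInsertSketch(DataSource dataSource) {
        this.template = new NamedParameterJdbcTemplate(dataSource);
    }

    // Inserts one batch of rows; each map holds column name -> value.
    public void insertBatch(List<Map<String, Object>> rows) {
        SqlParameterSource[] batch = rows.stream()
                .map(MapSqlParameterSource::new)
                .toArray(SqlParameterSource[]::new);
        template.batchUpdate(
                "INSERT INTO sample_table (col1, col2) VALUES (:col1, :col2)",
                batch);
    }
}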

  • could you enable more debugging to see what is actually slow (insert, commit, connecting)? For driver-side debugging in Java, use loglevel=2 (jdbc.postgresql.org/documentation/80/connect.html); for server-side logging see drupal.org/node/560192. (See the sketch below these comments.) Commented Jun 21, 2016 at 10:37
  • Which versions are we talking about? Did you try with a transaction? Commented Jun 21, 2016 at 10:43
  • Also, did you try reversing the order and inserting on MySQL first to see what happens (like letsGo.run(Type.MYSQL); letsGo.run(Type.POSTGRES);)? And how are you measuring the times? Commented Jun 21, 2016 at 10:46
  • would you please do this: create table t1 as select * from your_table limit 250; then pg_dump --inserts -t t1 to a file, and run the file in psql while measuring the time (with \timing on) - this will give you the expected speed of inserting 250 rows on your machine. Then create the index etc. and measure again. Commented Jun 21, 2016 at 11:27
  • When enabling statement logging, I can see that the statement is prepared approx. 20,000 (yes, 20 thousand) times before the actual inserts start. Once the actual inserts start, they take only approx. 1 second, but the 20,000 prepare statements take ~30 seconds on my computer. I don't know whether this is caused by Spring or by some configuration option. Commented Jun 24, 2016 at 10:41
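As a side note on the first comment's suggestion, a rough sketch of turning on client-side driver logging (connection details are placeholders; loglevel=2 is the DEBUG level from the linked 8.0-era driver documentation):

import java.sql.Connection;
import java.sql.DriverManager;

public class DriverLoggingSketch {
    public static void main(String[] args) throws Exception {
        // loglevel=2 enables DEBUG output in older (pre-42.x) driver versions,
        // which logs every statement the driver sends, including prepares
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/testdb?loglevel=2",
                "user", "password")) {
            conn.createStatement().execute("SELECT 1");
        }
    }
}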

3 Answers


It seems that this is a combination of a Spring "bug" and a driver "bug".

Spring tries to determine the data type of a column each time setValue() is called. It does this by calling PreparedStatement.getParameterMetaData().

This apparently causes a "prepare" statement to be sent to the database, which in itself is quite fast (never more than 1 ms on my laptop), but as it is called for each column of each row, it adds up to a lot of time (it's called for every non-null value, which results in approx. 23,000 calls).

To some extent this is more a Spring bug than a driver bug, because not caching the parameter meta data doesn't really make sense (at least in my opinion). The MySQL JDBC driver does not support getParameterMetaData(), and Spring knows this, so the "bug" doesn't show up with MySQL because Spring never calls that method there.

I am not sure whether the Postgres JDBC driver's behavior can be classified as a bug, but it sure would be nice if the driver cached that meta data after the first call.

Spring can be convinced not to obtain the parameter meta data through the property spring.jdbc.getParameterType.ignore.

So by putting:

System.setProperty("spring.jdbc.getParameterType.ignore", "true");

before the line:

LetsGo letsGo = new LetsGo();

this behaviour is disabled.

The property must be set before Spring is initialized.
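Putting it together, the start of main() would look roughly like this (assuming the entry point from the sample project, with Type.POSTGRES as mentioned in the comments above):

public static void main(String[] args) {
    // must run before any Spring initialization
    System.setProperty("spring.jdbc.getParameterType.ignore", "true");

    LetsGo letsGo = new LetsGo();
    letsGo.run(Type.POSTGRES);
}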

When I do that with your sample project, the insert runs in 500ms on my laptop.


Edit

After seeing the comment regarding the pgjdbc-ng driver, I dug into the sources of both the "official" driver and the NG driver. The NG driver does cache the parameter meta data after the first call, whereas the official driver does not, which explains why the NG driver is so much faster (even without disabling the call in Spring).
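For illustration, the optimization amounts to something like this hypothetical memoizing wrapper (not actual NG driver code):

import java.sql.ParameterMetaData;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class CachingStatement {
    private final PreparedStatement delegate;
    private ParameterMetaData cached;

    CachingStatement(PreparedStatement delegate) {
        this.delegate = delegate;
    }

    // the first call goes to the database; later calls reuse the result,
    // so the per-column, per-row lookups become cheap
    synchronized ParameterMetaData getParameterMetaData() throws SQLException {
        if (cached == null) {
            cached = delegate.getParameterMetaData();
        }
        return cached;
    }
}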


4 Comments

Thanks! I believe this is a perfect answer. Though if I set the property I get a "DataIntegrityViolationException" - not sure why. Did you have to change anything else besides adding the property?
@varad: no, that was the only thing I changed.
Interesting, I'm getting "22P02: invalid input syntax for integer". But I guess that's some other kind of issue.
I figured that out. The code that was parsing my XML and converting the values to Java types was setting an Integer on a Text field, and hence it failed. So it's not related to this Stack Overflow question at all. The good news is that with that parameter enabled it is really fast! Thanks for investigating the issue, it was really helpful!

Try using the pgjdbc-ng driver and then compare your results.

It is available here: http://impossibl.github.io/pgjdbc-ng/
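For reference, a minimal connection sketch (note the driver's jdbc:pgsql: URL scheme, which differs from the official driver's jdbc:postgresql:; host and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class NgDriverSketch {
    public static void main(String[] args) throws Exception {
        // pgjdbc-ng registers itself with DriverManager automatically
        try (Connection conn = DriverManager.getConnection(
                "jdbc:pgsql://localhost:5432/testdb", "user", "password")) {
            System.out.println(conn.getMetaData().getDriverName());
        }
    }
}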

1 Comment

Wow, with this driver it was super fast. Just half a second!

I hope you are using a DB connection pool. You can try C3P0. Spring's JdbcTemplate doesn't provide a connection pool implementation.
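A rough sketch of wiring a C3P0 pool under JdbcTemplate (URL and credentials are placeholders):

import com.mchange.v2.c3p0.ComboPooledDataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class PooledTemplateSketch {
    public static JdbcTemplate createTemplate() throws Exception {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setDriverClass("org.postgresql.Driver"); // throws if the class is missing
        ds.setJdbcUrl("jdbc:postgresql://localhost:5432/testdb");
        ds.setUser("user");
        ds.setPassword("password");
        ds.setMaxPoolSize(10); // reuse connections instead of opening one per batch
        return new JdbcTemplate(ds);
    }
}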

