
On our application server, the DevOps team is using SQL_ASCII encoding in the Postgres (9.4) DB.

A 3rd-party application is inserting surnames with accented characters (e.g. Núñez) into the Employee table.

My Java (8) application is a Spring (4.3.15) WebApp using MyBatis (3.2.4).

When my application reads such Surnames out of the SQL_ASCII db, I get:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0xe3 0xa1 0x54
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:616)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:466)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:459)
    at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:93)
    at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:93)
    at jdk.internal.reflect.GeneratedMethodAccessor78.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:55)
    at com.sun.proxy.$Proxy98.execute(Unknown Source)

If I try changing the client_encoding via:

SET client_encoding = 'SQL_ASCII';

Then I get this error:

org.postgresql.util.PSQLException: The server's client_encoding parameter was changed to LATIN1. The JDBC driver requires client_encoding to be UTF8 for correct operation.
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1950)

How can I "safely" read these characters from the DB?

2 Answers


You are lost. An SQL_ASCII database is not encoding-aware; it treats all bytes (except the zero byte) equally. No encoding conversion takes place in the database.

So unless the data happens to be encoded as UTF-8, which it isn't (according to the error message), you cannot use it with the JDBC driver.

You will have to dump the database and restore it (using the appropriate -E option) into a different database (v13) with a proper encoding. Any encoding inconsistencies will have to be fixed manually during this process.
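As a sketch of that dump-and-restore, assuming the stored bytes are really LATIN1 (as the error bytes suggest) and using placeholder database names `olddb` and `newdb`, it could look like this:

```shell
# Dump the SQL_ASCII database. -E labels the dump as LATIN1; the server
# performs no conversion from SQL_ASCII, so the bytes pass through as-is.
pg_dump -E LATIN1 olddb > olddb.sql

# Create the target database with a proper encoding.
createdb -E UTF8 -T template0 newdb

# Restore; the LATIN1-labelled dump is converted to UTF-8 on the way in.
# ON_ERROR_STOP surfaces any rows whose bytes are not valid LATIN1.
psql --set ON_ERROR_STOP=on -d newdb -f olddb.sql
```

If the third-party application actually mixed several encodings into the table, the restore will stop at the offending rows, which then have to be fixed by hand.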

This question will provide additional insight.


3 Comments

Thanks, but I'm a bit bewildered. Any thoughts on why Postgres considers it acceptable that the driver cannot read any DB with an encoding other than UTF-8?
But the PostgresJDBC driver insists the client encoding is UTF-8. What if the DB was LATIN1 - could we read from it via the PG JDBC driver?
Yes, that would work just fine. That's why I recommended dumping and restoring the data into a database with a proper encoding.

allowEncodingChanges=true

Can you try setting allowEncodingChanges=true (and also characterEncoding) in the JDBC connection URL?
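A minimal sketch of what such a URL could look like; the host, port, and database name are placeholders, and whether this workaround is actually safe for your data is a separate question:

```java
public class PgUrlDemo {

    /** Builds a pgjdbc URL with the allowEncodingChanges workaround enabled. */
    static String buildUrl(String host, int port, String db) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + db
                + "?allowEncodingChanges=true";
    }

    public static void main(String[] args) {
        // Placeholder connection details; pass this URL to DriverManager.getConnection(...)
        System.out.println(buildUrl("localhost", 5432, "employees"));
        // -> jdbc:postgresql://localhost:5432/employees?allowEncodingChanges=true
    }
}
```

With this parameter set, the driver no longer aborts the connection when client_encoding changes, so a subsequent `SET client_encoding = ...` is tolerated rather than rejected.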

allowEncodingChanges = boolean

When using the V3 protocol the driver monitors changes in certain server configuration parameters that should not be touched by end users. The client_encoding setting is set by the driver and should not be altered. If the driver detects a change it will abort the connection. There is one legitimate exception to this behaviour though, using the COPY command on a file residing on the server's filesystem. The only means of specifying the encoding of this file is by altering the client_encoding setting. The JDBC team considers this a failing of the COPY command and hopes to provide an alternate means of specifying the encoding in the future, but for now there is this URL parameter. Enable this only if you need to override the client encoding when doing a copy.

Ref: Chapter 3. Initializing the Driver

The bytes in the error message are 1110 0011, 1010 0001, 0101 0100 (0xE3, 0xA1, 0x54).

If the stored data was encoded in ISO-8859-1, these decode to ã, ¡, T.

When this byte stream is read as UTF-8, the leading bits 1110 in the first byte indicate a 3-byte UTF-8 character (counting itself).

So the next 2 bytes should start with the bits 10, but the third byte starts with 01.

By default, the JDBC driver decodes this stream as UTF-8 and fails on the invalid byte sequence.

Had the third byte started with the bits 10, the code would have run without error, but it could incorrectly map all 3 bytes onto a single Unicode code point (assuming the original encoding was not UTF-8 and the corresponding code point has a valid character representation).
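The analysis above can be reproduced in plain Java. The byte values are taken from the error message; everything else is standard library:

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    public static void main(String[] args) {
        // The three bytes from the error message: 0xE3 0xA1 0x54
        byte[] raw = { (byte) 0xE3, (byte) 0xA1, 0x54 };

        // In ISO-8859-1 every byte maps to exactly one character,
        // so decoding can never fail.
        String latin1 = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(latin1); // ã¡T

        // In UTF-8, 0xE3 (1110 0011) announces a 3-byte sequence, so the next
        // two bytes must be continuation bytes (10xx xxxx). 0xA1 is one, but
        // 0x54 (0101 0100) is not, so the lenient String constructor substitutes
        // the Unicode replacement character U+FFFD for the malformed sequence.
        String utf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(utf8.contains("\uFFFD")); // true
    }
}
```

The driver uses a strict decoder rather than this lenient constructor, which is why it throws instead of silently inserting replacement characters.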

2 Comments

This might work, but as per the text you've quoted it's not the intended purpose of the parameter and would definitely qualify as a 'Kludge'. Isn't there an endorsed way to read these characters from an SQL_ASCII DB in Java?
On a different note, any byte with the MSB set to 1 is left to interpretation in ASCII. When the server character set is SQL_ASCII, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters.
