
I am getting a decoding error when using pd.read_sql. I am querying an Oracle database with the cx_Oracle library.

I have tried passing the encoding parameter to the connect call, as below:

cx_Oracle.connect(user=user_name, password=pwd, dsn=dsn_tns, encoding="UTF-8")

These are the encoding options I have tried, and the error I get each time pd.read_sql runs:

  1. With encoding="UTF-8", the error is: 'utf-8' codec can't decode byte 0xc3 in position 34: unexpected end of data

  2. With encoding="UTF-8", nencoding="UTF-8", the error is: 'utf-8' codec can't decode byte 0xc3 in position 34: unexpected end of data

  3. With encoding="UTF-16", nencoding="UTF-16", the error is: ORA-29275: partial multibyte character
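The first two errors can be reproduced without a database. This is a minimal illustration, not the poster's code: in UTF-8, byte 0xC3 is the first byte of a two-byte character such as "á" (C3 A1), so if the bytes stop right after 0xC3 the decoder reports "unexpected end of data". ORA-29275 ("partial multibyte character") names the same kind of problem from the Oracle side. The sample string comes from the comments on this page; the cut point is arbitrary.

```python
# Reproduces the "unexpected end of data" UnicodeDecodeError:
# a UTF-8 multibyte sequence cut off right after its lead byte.
text = "plynárenský"
encoded = text.encode("utf-8")  # b'plyn\xc3\xa1rensk\xc3\xbd'

truncated = encoded[:5]  # keeps the lead byte 0xC3 but drops its continuation byte 0xA1

message = None
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'utf-8' codec can't decode byte 0xc3 in position 4: unexpected end of data
```

In the real error the offset is 34 instead of 4; it is simply where the incomplete character sits in the byte string being decoded.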

The NLS_CHARACTERSET is AL32UTF8.

If anyone has faced and resolved this issue, please suggest a fix.

Thanks

  • Which Python version? What does your code look like? What happens if you remove all encoding parameters? Python 3 strings are Unicode, so there shouldn't be any need to specify an encoding. Using UTF-16 when the database field is UTF-8 only guarantees an error. Commented Sep 9, 2020 at 6:59
  • What does the input look like, and where did it come from? It's quite possible it already contains invalid characters. If you loaded some single-byte text from a file and passed it as-is to the database, bytes with values > 127 will generally not form valid UTF-8 sequences. Commented Sep 9, 2020 at 7:01
  • @PanagiotisKanavos: Python version is 3.7. Removing the encoding parameters still gives the UTF-8 error. Commented Sep 9, 2020 at 8:42
  • @PanagiotisKanavos: The input was loaded into the Oracle DB from an existing database, not from a file. These are some sample data. Commented Sep 9, 2020 at 8:50
  • BTW, that error means the data is not UTF-8. You'd have to post a CREATE TABLE statement and INSERT statements that create a table with a field in that specific encoding, with test data, so people can fully reproduce the problem. Perhaps the problem is a hard-coded non-UTF-8 encoding in the Oracle Home? Commented Sep 9, 2020 at 9:37
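The comment about single-byte text can also be checked in plain Python. This sketch assumes the suspect bytes were originally Latin-1 (ISO-8859-1), which is only one possibility; it encodes the same sample string both ways and uses the `start` attribute of `UnicodeDecodeError` to locate the first byte that is not valid UTF-8. `first_invalid_utf8_offset` is a hypothetical helper, not part of cx_Oracle or pandas.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None.

    Hypothetical helper for inspecting suspect bytes.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first offending byte
    return None


sample = "plynárenský"

utf8_bytes = sample.encode("utf-8")      # b'plyn\xc3\xa1rensk\xc3\xbd'
latin1_bytes = sample.encode("latin-1")  # b'plyn\xe1rensk\xfd' (one byte per character)

print(first_invalid_utf8_offset(utf8_bytes))    # None: valid UTF-8
print(first_invalid_utf8_offset(latin1_bytes))  # 4: 0xE1 is not followed by a continuation byte
```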

2 Answers


If you have corrupt data, try something like what the cx_Oracle documentation suggests in its "Querying Corrupt Data" section:

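import cx_Oracle

# For string columns, ask cx_Oracle to replace undecodable bytes instead of
# raising UnicodeDecodeError. (Newer cx_Oracle versions spell the keyword
# encoding_errors; encodingErrors is the older, deprecated name.)
def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.STRING:
        return cursor.var(defaultType, size, arraysize=cursor.arraysize,
                          encodingErrors="replace")

cursor = connection.cursor()  # connection: an open cx_Oracle connection
cursor.outputtypehandler = OutputTypeHandler

cursor.execute("select column1, column2 from SomeTableWithBadData")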
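The cx_Oracle documentation ties `encodingErrors` to the error-handling values accepted by Python's built-in `bytes.decode()`, so the effect of `"replace"` can be previewed without a database. The sketch below assumes a Latin-1 source as the "bad" case. It also illustrates the concern raised in the comments: every byte sequence that cannot be decoded becomes U+FFFD, so the original character is lost rather than recovered.

```python
# What errors="replace" does to bytes that are not valid UTF-8:
# each undecodable sequence becomes U+FFFD (the replacement character).
good = "plynárenský".encode("utf-8")
bad = "plynárenský".encode("latin-1")  # assumption: a non-UTF-8 (Latin-1) source

decoded_good = good.decode("utf-8", errors="replace")
decoded_bad = bad.decode("utf-8", errors="replace")

print(decoded_good)  # plynárenský (unchanged)
print(decoded_bad)   # plyn\ufffdrensk\ufffd: "á" and "ý" are gone
```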

4 Comments

I don't think it is corrupt data, but rather data with non-English characters, like "惠州市", "Филиа", "plynárenský". So I should not be replacing them; I should be able to read them as is.
@ashasasidharan: this page you're reading is UTF-8, and yet you didn't have to do anything to post those non-English characters.
The documentation link suggests passing encoding_errors="replace". When I tried it, this failed with the error "'encoding_errors' is an invalid keyword argument for this function". Switching to the camelCase encodingErrors="replace", as shown above, fixed the error.
Thanks for the tip. The camelCase name was deprecated a while back, so new code using the latest driver should use encoding_errors.

First, determine which character sets your database uses:

SELECT value
FROM nls_database_parameters
WHERE parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

Then set the encoding and nencoding parameters of connect() so that the client matches the server (AL32UTF8 on the server corresponds to UTF-8 on the client).
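One way to make that matching explicit is a small lookup from the NLS_CHARACTERSET value returned by the query to a Python codec name. This is a hypothetical helper with a deliberately short table covering a few common Oracle character sets; any name it does not know raises an error instead of being guessed.

```python
import codecs

# Hypothetical, deliberately partial mapping from Oracle NLS_CHARACTERSET
# values to Python codec names. Extend it only with names you have verified.
ORACLE_TO_PYTHON_CODEC = {
    "AL32UTF8": "utf-8",
    "WE8ISO8859P1": "iso-8859-1",
    "WE8MSWIN1252": "cp1252",
}


def python_codec_for(nls_characterset: str) -> str:
    """Return the Python codec name for an Oracle character set, or raise ValueError."""
    try:
        name = ORACLE_TO_PYTHON_CODEC[nls_characterset.upper()]
    except KeyError:
        raise ValueError(f"no known Python codec for {nls_characterset!r}") from None
    return codecs.lookup(name).name  # normalized codec name, e.g. 'utf-8'


print(python_codec_for("AL32UTF8"))  # utf-8
```

The point is only to make the server-to-client correspondence visible and to fail loudly on a character set the table does not cover.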

If your columns use national character types (NCHAR, NVARCHAR2, NCLOB, etc.), you need to set the nencoding parameter as well. But you did not post the column data types or your query.

cx_Oracle.connect(user=user_name, password=pwd, dsn=dsn_tns, encoding="UTF-8", nencoding="UTF-8")

If your server really is AL32UTF8 and cx_Oracle still gives you a decode error with encoding set to UTF-8, then, as the other answer says, you have corrupt data. Test this by querying a different, smaller set of rows.
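To narrow down which rows are affected, one option (an assumption, not something stated in this answer) is to fetch the column's raw bytes instead of a decoded string, for example by selecting UTL_RAW.CAST_TO_RAW(column1) so the driver returns `bytes`, and then check each value in Python. The sketch below covers only the Python half, on hard-coded sample rows; `find_undecodable_rows` and the row IDs are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_undecodable_rows(
    rows: Iterable[Tuple[int, bytes]], encoding: str = "utf-8"
) -> List[Tuple[int, int, str]]:
    """Return (row_id, byte_offset, reason) for every value that fails to decode."""
    problems = []
    for row_id, raw in rows:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            problems.append((row_id, exc.start, exc.reason))
    return problems


# Hypothetical sample: row 2 is Latin-1, row 3 is UTF-8 cut off mid-character.
sample_rows = [
    (1, "惠州市".encode("utf-8")),
    (2, "plynárenský".encode("latin-1")),
    (3, "Филиа".encode("utf-8")[:3]),
]

for problem in find_undecodable_rows(sample_rows):
    print(problem)
# (2, 4, 'invalid continuation byte')
# (3, 2, 'unexpected end of data')
```

Rows that decode cleanly are skipped, so the output lists only the values worth inspecting further.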
