I have a Spark DataFrame that must be saved to PostgreSQL. I think I have the right Python statement except for the encoding options, since I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 95: ordinal not in range(128)

My current statement is:

df.write.jdbc(url=jdbc_url, table='{}.{}'.format(schema_name, table_name), mode='overwrite', properties=properties)
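
For context, here is a minimal sketch of how the pieces around that call might fit together. The host, port, credentials and the helper names `build_jdbc_url` and `write_to_postgres` are hypothetical placeholders, not taken from my real setup; only `jdbc_url`, `schema_name`, `table_name`, `properties` and the database name `db_test` come from this question. The `u'...'` literal for the table name is a precaution I am assuming is sensible under Python 2, where formatting non-ASCII text into a byte string triggers an implicit ASCII encode.

```python
def build_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL (host, port and database are placeholders)."""
    return 'jdbc:postgresql://{}:{}/{}'.format(host, port, database)


def write_to_postgres(df, jdbc_url, schema_name, table_name, properties):
    """Write a DataFrame to schema_name.table_name, replacing existing contents."""
    # A unicode format string avoids an implicit ASCII encode in Python 2
    # if either name ever contains non-ASCII characters.
    target = u'{}.{}'.format(schema_name, table_name)
    df.write.jdbc(url=jdbc_url, table=target, mode='overwrite',
                  properties=properties)
    return target


# Hypothetical connection settings, for illustration only.
properties = {
    'user': 'spark_user',
    'password': 'secret',
    'driver': 'org.postgresql.Driver',
}
jdbc_url = build_jdbc_url('localhost', 5432, 'db_test')
```

With a real DataFrame `df`, the call would be `write_to_postgres(df, jdbc_url, schema_name, table_name, properties)`.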

It seems that by default PySpark tries to encode the DataFrame as ASCII, so I should specify the correct encoding (UTF-8). How can I do that?

I've tried option("charset", "utf-8"), option("encoding", "utf-8") and many other combinations I've seen on the Internet. I've also tried adding "client_encoding": "utf8" to the properties passed to jdbc. Nothing seems to work.

Any help would be really appreciated.

Additional info:

  • Python 2.7
  • Spark 1.6.2

EDIT 1

My database is UTF-8 encoded:

$ sudo -u postgres psql db_test -c 'SHOW SERVER_ENCODING'
 server_encoding 
-----------------
 UTF8
(1 row)

EDIT 2

I noticed that another error was hidden in the logs alongside this one: the PostgreSQL driver was complaining that the table I wanted to create already existed! I dropped the table from PostgreSQL and everything worked like a charm :) Unfortunately, I could not fully understand how the two errors were related. Maybe the existing table used ASCII encoding and was somehow incompatible with the data I was trying to save?
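
One possible link between the two errors, offered as a guess and not something I have confirmed: the UnicodeEncodeError is raised by Python itself whenever a unicode string containing a character such as u'\xf3' is encoded as ASCII, so it may have come from handling the text of the hidden driver error and not from the DataFrame contents. The sketch below only reproduces that mechanism. The message text is a made-up example, and the helper name `encode_ascii` is hypothetical.

```python
from __future__ import print_function


def encode_ascii(text):
    """Try to encode text as ASCII; return (encoded, error)."""
    try:
        return text.encode('ascii'), None
    except UnicodeEncodeError as exc:
        return None, exc


# Hypothetical error text containing u'\xf3'; not the actual message from my logs.
message = u'ERROR: la relaci\xf3n ya existe'

encoded, error = encode_ascii(message)
print(type(error).__name__)  # UnicodeEncodeError
print(error.reason)          # ordinal not in range(128)

# Escaping instead of failing keeps the underlying message readable.
safe = message.encode('ascii', 'backslashreplace')
print(safe)
```

If this guess is right, no JDBC encoding option would help, because the failure happens on the Python side while the message is being converted.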

  • Does this post give any hint? stackoverflow.com/questions/9942594/… Commented Nov 28, 2017 at 7:20
  • I've added a second edit. It explains that the issue was fixed, though I still do not know why :) Commented Nov 30, 2017 at 8:35

1 Answer

You should check the encoding of your PostgreSQL database:

psql my_database -c 'SHOW SERVER_ENCODING'

If that is not a multibyte encoding, then you may need to change it to one. See this thread for changing the DB encoding:

Also, this official documentation might be helpful: https://www.postgresql.org/docs/9.3/static/multibyte.html

1 Comment

Thanks for answering. The encoding of the database is UTF-8 (I've edited my question with the result of the command).
