I have to download data from a PostgreSQL server (which we don't control) to CSV for some non-critical analysis; basically we're looking for tables that contain a specific string in any row or column. I decided to use pandas read_sql_table for this, but on some tables I keep getting UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7: invalid start byte. After researching other SO questions I changed the client encoding to UTF8, but the error still happens. The server encoding is SQL_ASCII.
An oversimplified version of my script would look like this:
import pandas
from sqlalchemy import create_engine

ENCODING = 'utf8'
conn_str = f"postgresql+psycopg2://{config['DBUSER']}:{config['DBPASS']}@{config['DBHOST']}/{config['DBNAME']}"
engine = create_engine(conn_str, client_encoding=ENCODING, pool_recycle=36000)
conn = engine.connect()

# Confirm which encodings the session is actually using
server = conn.execute("SHOW SERVER_ENCODING").fetchone()
print("Server Encoding ", server.server_encoding)
client = conn.execute("SHOW CLIENT_ENCODING").fetchone()
print("Client Encoding ", client.client_encoding)

# This is where the UnicodeDecodeError is raised
df = pandas.read_sql_table(VIEWNAME, conn, SCHEMA)
Outputs:
Server Encoding SQL_ASCII
Client Encoding UNICODE
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7: invalid start byte
As far as I can tell the issue is in the underlying SQLAlchemy/psycopg2 connection, so I would like to solve it at the connection level. If that's not possible, I could get away with downloading only the non-problematic rows of each table, but there doesn't seem to be any built-in support for that.
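The kind of connection-level workaround I have in mind looks roughly like this (untested sketch, reusing conn_str, VIEWNAME and SCHEMA from above; I'm picking latin-1 only because it maps every byte value, so the decode step itself can never fail, even though some characters may come out wrong):

# Sketch: with an SQL_ASCII server no conversion happens server-side,
# so client_encoding only controls how psycopg2 decodes the raw bytes.
# 'latin1' maps every byte 0x00-0xFF, so decoding cannot raise.
import pandas
from sqlalchemy import create_engine

engine = create_engine(conn_str, client_encoding='latin1', pool_recycle=36000)
with engine.connect() as conn:
    df = pandas.read_sql_table(VIEWNAME, conn, SCHEMA)

Would something along these lines be a reasonable approach, or is there a cleaner way?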
The problem is the server encoding: SQL_ASCII. Per the Localization chapter of the PostgreSQL docs, "this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding." This means you need to find out which client encoding was actually used to put the data into the table. Your best bet is one of the Windows code pages.
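If the data was loaded from Windows clients, WIN1252 (CP1252) is the usual first guess; in that code page 0xa0 is a non-breaking space. A minimal sketch, reusing the connection details from your question (WIN1252 is only a guess here, so inspect the resulting text and try other Windows code pages such as WIN1250 or WIN1251 if it looks mangled):

# With an SQL_ASCII server the bytes are passed through as stored;
# client_encoding tells psycopg2 which codec to decode them with.
import pandas
from sqlalchemy import create_engine

engine = create_engine(conn_str, client_encoding='WIN1252', pool_recycle=36000)
with engine.connect() as conn:
    df = pandas.read_sql_table(VIEWNAME, conn, SCHEMA)

Note that a decode that doesn't raise isn't proof the encoding is right; you still have to eyeball the strings to confirm they make sense.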