Python Unicode error on Linux but not Windows

Question

I followed some guides to piece together this bit of Python:

import requests
import sys
from bs4 import BeautifulSoup

url = requests.get(sys.argv[1])

html = BeautifulSoup(url.content,'html.parser')

for br in html.find_all("br"):
    br.replace_with(" ")

for tr in html.find_all('tr'):
    data = []   

    for td in tr.find_all('td'):
        data.append(td.text.strip())

    if data:
        print("{}".format(','.join(data)))

On Windows it works as I expect it to.

On Linux I get:

Traceback (most recent call last):
  File "html2csv.py", line 19, in <module>
    print("{}".format(','.join(data)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 4: ordinal not in range(128)

What do I need to change in my script to prevent this? I read that you can ignore problem characters, but some say this isn't the proper way to do it. I'm not sure how to apply any of the solutions I found to what I have.

Are you sure you are using the same version of Python on both systems (Python 2.x vs. Python 3.x)?
– Arount, commented Feb 4, 2020 at 11:58
Thanks, this was the issue. "python script.py" defaulted to 2.7. I needed to run "python3 script.py".
– Chris, commented Feb 4, 2020 at 13:59

Chris · Accepted Answer · 2020-02-04 13:58:50Z

1

Sorry for wasting your time.

I was running:

python script.py

which defaults to Python 2.7.

What I needed to run was:

python3 script.py
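
One way to catch this mix-up early is a version guard at the top of the script. The following is only a minimal sketch and not part of the original script; the helper name is_python3 is made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or newer."""
    return version_info[0] >= 3


# Stop with a readable message instead of failing later with UnicodeEncodeError.
if not is_python3():
    sys.exit("This script needs Python 3; run it with python3 instead of python.")
```

With a guard like this, running the script under Python 2.7 exits immediately with the message above rather than reaching the print call.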

Fernando · Answer · 2020-02-04 13:24:11Z

0

I had the same issue; it seems that editing code on MS Windows leaves some ghost characters (I guess you can configure your IDE not to do so).

Try adding # -*- coding: utf-8 -*- at the top of your script file, as here:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The encoding declaration above must be on the first or second line of the file.

# import ipdb; ipdb.set_trace()

import json
import os, sys

class CSV_LOADER():
    """
    Script that handles batch credentials (in CSV format), both locally and
    to remote machines.

...

AKX · Answer · 2020-02-04 13:27:30Z

0

Your Python I/O encoding is probably set to ASCII for some reason (likely due to misconfigured system locale settings), so everything printed to standard output (and read from standard input) is treated as ASCII.

Set the PYTHONIOENCODING environment variable to utf-8 before running your script (or better yet, ensure your system's locale settings are correct).
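
To illustrate why the output encoding matters, here is a small sketch under stated assumptions: the sample string and the helper name utf8_writer are made up for illustration, and an in-memory buffer stands in for standard output. It shows that the character from the traceback (U+00B0, the degree sign) cannot be encoded as ASCII but goes through a UTF-8 text wrapper without trouble.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # newline="\n" disables platform-specific newline translation.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


# Hypothetical cell value containing the degree sign from the traceback.
sample = u"21\xb0C"

# ASCII has no representation for U+00B0, so encoding fails.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The same text written through a UTF-8 wrapper succeeds.
sink = io.BytesIO()
out = utf8_writer(sink)
out.write(sample + u"\n")
out.flush()
encoded = sink.getvalue()
```

In a real Python 3 script, the same wrapper could presumably be applied to sys.stdout.buffer, or sys.stdout.reconfigure(encoding="utf-8") could be used on Python 3.7 and later. Setting PYTHONIOENCODING or fixing the locale, as described above, avoids touching the script at all.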
