0

I followed some guides to piece together this bit of Python:

import requests
import sys
from bs4 import BeautifulSoup

url = requests.get(sys.argv[1])

html = BeautifulSoup(url.content,'html.parser')

for br in html.find_all("br"):
    br.replace_with(" ")

for tr in html.find_all('tr'):
    data = []   

    for td in tr.find_all('td'):
        data.append(td.text.strip())

    if data:
        print("{}".format(','.join(data)))

In Windows it works as I expect it to.

In Linux I get

Traceback (most recent call last):
  File "html2csv.py", line 19, in <module>
    print("{}".format(','.join(data)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 4: ordinal not in range(128)

What do I need to change in my script to prevent this? I have read that you can ignore the problem characters, but some say this isn't the proper way to do it. I'm not sure how to apply any of the solutions I found to what I have.

2
  • 2
    Are you sure you are using the same version of Python on both systems (Python 2.x vs. Python 3.x)? Commented Feb 4, 2020 at 11:58
  • 1
    Thanks, this was the issue. "python script.py" defaulted to 2.7; I needed to run "python3 script.py". Commented Feb 4, 2020 at 13:59

3 Answers

1

Sorry for wasting your time.

I was running

python script.py

which defaults to Python 2.7 on my Linux system. What I needed to run was

python3 script.py
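
A guard at the top of the script can make this kind of mix-up obvious instead of surfacing as an encoding error later. The following is only a minimal sketch; the helper name `is_python3` and the wording of the message are made up for illustration and are not part of the original script.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or later.

    Hypothetical helper for illustration; it only looks at the major version.
    """
    return version_info[0] >= 3


# Stop early with a clear message if the script was started with Python 2.
if not is_python3():
    sys.exit("This script needs Python 3; run it as: python3 script.py")
```

With this in place, starting the script with a Python 2 interpreter should exit immediately with the message above rather than failing at the first non-ASCII character.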

0

I had the same issue. It seems that editing code on MS Windows can leave some ghost characters in the file (I guess you can configure your IDE not to do so).

Try adding # -*- coding: utf-8 -*- at the top of your script file, as shown here:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# import ipdb; ipdb.set_trace()

import json
import os, sys

class CSV_LOADER():
    """
    Script that handles batch credentials (in CSV format), both locally and
    to remote machines.

...

0

Your Python IO encoding is probably set to ASCII for some reason (most likely because of misconfigured system locale settings), so everything printed to standard output is encoded as ASCII, and everything read from standard input is decoded as ASCII.

Set the PYTHONIOENCODING environment variable to utf-8 before running your script (or better yet, ensure your system's locale settings are correct).
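
To check whether this applies to you, it may help to look at the encoding Python has chosen for standard output and test whether it can represent the character from the traceback. This is a rough sketch; the helper names `stdout_encoding` and `can_encode` are invented for illustration.

```python
import sys


def stdout_encoding(stream=None):
    """Return the encoding name of the stream, falling back to 'ascii'.

    Hypothetical helper; the fallback covers streams with no encoding set.
    """
    stream = sys.stdout if stream is None else stream
    return getattr(stream, "encoding", None) or "ascii"


def can_encode(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# u"\xb0" is the degree sign reported in the traceback.
print("stdout encoding:", stdout_encoding())
print("can print degree sign:", can_encode(u"\xb0", stdout_encoding()))
```

If the second line reports False, running the script as `PYTHONIOENCODING=utf-8 python3 script.py` in the shell is one way to try the suggestion above without touching the system locale.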
