I followed some guides to piece together this bit of Python:
import requests
import sys
from bs4 import BeautifulSoup

url = requests.get(sys.argv[1])
html = BeautifulSoup(url.content, 'html.parser')

for br in html.find_all("br"):
    br.replace_with(" ")

for tr in html.find_all('tr'):
    data = []
    for td in tr.find_all('td'):
        data.append(td.text.strip())
    if data:
        print("{}".format(','.join(data)))
On Windows it works as I expect it to.
On Linux I get:
Traceback (most recent call last):
  File "html2csv.py", line 19, in <module>
    print("{}".format(','.join(data)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 4: ordinal not in range(128)
What do I need to change in my script to prevent this? I have read that you can ignore the problem characters, but some say that isn't the proper way to do it. I'm not sure how to apply any of the solutions I found to what I have.
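To make the two options concrete, here is a minimal sketch that is not taken from the script itself. It assumes a made-up row containing the degree sign (u'\xb0') from the traceback, and the helper name encode_line is purely illustrative. It compares strict ASCII encoding, ASCII encoding with errors="ignore", and explicit UTF-8 encoding.

```python
# Hypothetical row standing in for the scraped table cells;
# u"\xb0" is the degree sign reported in the traceback.
row = [u"25\xb0C", u"ok"]
line = u",".join(row)


def encode_line(text, encoding="utf-8", errors="strict"):
    """Turn a text string into bytes explicitly (illustrative helper)."""
    return text.encode(encoding, errors)


# Strict ASCII encoding fails on the degree sign, like the print() call does.
try:
    encode_line(line, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# errors="ignore" silently drops the character, so data is lost.
dropped = encode_line(line, "ascii", "ignore")

# UTF-8 can represent the character, so nothing is lost.
kept = encode_line(line, "utf-8")

print(ascii_failed)
print(repr(dropped))
print(repr(kept))
```

The "ignore" variant avoids the exception only by discarding the degree sign, which is presumably why it is described as not the proper fix. Encoding to UTF-8 keeps the character, provided whatever reads the output also expects UTF-8.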