I am trying to delete empty lines from a text file, but when I use this code:
import io
with open("outprint6.csv", "r") as f:
    for line in f:
        cleanedLine = line.strip()
        if cleanedLine: # is not empty
            print(cleanedLine)
            f = io.open('eliminado', 'a')
            f.write(unicode(cleanedLine, 'ascii'))
            f.write(u'\n')
            f.close()
I got this error:
'utf8' codec can't decode byte 0xfa in position 21: invalid start byte.
How can I fix it? I found some answers here, but none of them work in this case. (I am really new to programming...)
The code removes the empty lines correctly, but I can't write the processed text to a new CSV file. The text is in Spanish, and the errors occur when writing accented letters (í, ó, etc.).
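A minimal sketch of one way to do the clean-up, assuming the input file is Latin-1 encoded (the byte 0xfa in the error message suggests this, but it is an assumption). The function name `remove_blank_lines` and the `src_encoding` parameter are made up for illustration. `io.open` with an `encoding` argument decodes on read and encodes on write, so no `unicode(...)` call is needed:

```python
import io

def remove_blank_lines(src_path, dst_path, src_encoding="latin-1"):
    """Copy src_path to dst_path, dropping empty or whitespace-only lines."""
    # src_encoding is an assumption about the input file; change it if the
    # file was written with a different encoding.
    with io.open(src_path, "r", encoding=src_encoding) as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            cleaned = line.strip()
            if cleaned:  # skip empty lines
                dst.write(cleaned + u"\n")
```

It could be called as `remove_blank_lines("outprint6.csv", "eliminado")`. Opening the output file once, outside the loop, also avoids reopening it for every line.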
I retrieved the Twitter data using this code:
import tweepy
import json
import io
# Authentication details. To obtain these visit dev.twitter.com
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# This is the listener, responsible for receiving data
class StdOutListener(tweepy.StreamListener):
    def on_data(self, data):
        print '1'
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        if not decoded['text'].startswith('RT'):
            try:
                # Also, we convert UTF-8 to ASCII ignoring all bad characters sent by users
                tweet = '@%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; ""[%s]""; %s' % (decoded['user']['id'], decoded['user']['location'], decoded['user']['followers_count'], decoded['user']['created_at'], decoded['user']['utc_offset'], decoded['user']['time_zone'], decoded['coordinates'], decoded['place'], decoded['id'], decoded['created_at'], decoded['text'].encode('ascii', 'ignore'), decoded['retweet_count'])
                print tweet
                f = io.open('outprint6.csv', 'a')
                f.write(tweet)
                f.write(u'\n')
                f.close()
            except:
                pass
    def on_error(self, status):
        print status
# if __name__ == '__main__':
l = StdOutListener()
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
print "Showing all new tweets for"
# There are different kinds of streams: public stream, user stream, multi-user streams
# In this example follow #programming tag
# For more details refer to https://dev.twitter.com/docs/streaming-apis
stream = tweepy.Stream(auth, l)
stream.filter(locations=[-81.397882,-4.972829,-75.288231,0.762316])
The tweet's 'text' field is converted to ASCII, but when I write the data to the new CSV file, I get the error above.
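One possible explanation, sketched here under assumptions: only the 'text' field is stripped to ASCII, so other fields such as the user location can still carry accented letters into the row. Also, `io.open` without an `encoding` argument uses the platform's preferred encoding, which may not be UTF-8. The helpers `build_row` and `append_row` below are hypothetical, reduced to two fields, and written for Python 3:

```python
import io

def build_row(location, text):
    """Join two fields like the listener does, stripping only `text` to ASCII."""
    ascii_text = text.encode("ascii", "ignore").decode("ascii")
    return u"%s; %s" % (location, ascii_text)

def append_row(path, row, encoding="utf-8"):
    """Append one row, naming the file encoding instead of relying on the default."""
    with io.open(path, "a", encoding=encoding) as out:
        out.write(row + u"\n")

# The location keeps its accented letter; only the text field loses it.
row = build_row(u"Per\xfa", u"canci\xf3n")
```

Writing with an explicit encoding means the file can later be read back with the same, known encoding.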
0xfa is the Latin-1 encoding for ú.
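A small check of that byte, as a sketch: decoding 0xfa as Latin-1 gives ú, while decoding it as UTF-8 fails because 0xfa is not a valid UTF-8 start byte. The variable names are only for illustration:

```python
raw = b"\xfa"

# Latin-1 maps every byte to the code point with the same number.
as_latin1 = raw.decode("latin-1")

# The same byte on its own is not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In UTF-8 the same character takes two bytes.
utf8_bytes = as_latin1.encode("utf-8")
```

So a file containing a lone 0xfa byte was most likely not written as UTF-8, and it needs to be read with the encoding it was actually written in.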