I am trying to delete the empty lines from a text file, but when I use this code:

import io
with open("outprint6.csv", "r") as f:
    for line in f:
        cleanedLine = line.strip()
        if cleanedLine: # is not empty
            print(cleanedLine)

            f = io.open('eliminado', 'a')
            f.write(unicode(cleanedLine, 'ascii'))
            f.write(u'\n')
            f.close()

I got this error:

'utf8' codec can't decode byte 0xfa in position 21: invalid start byte.

How can I fix it? I found some answers here, but nothing works in this case. (I am really new to programming...)

The code does remove the empty lines, but I can't write the processed text to a new CSV file. The text is in Spanish, and the errors seem to occur when writing accented letters (í, ó, etc.).

I retrieved the Twitter data using this code:

import tweepy
import json
import io

# Authentication details. To  obtain these visit dev.twitter.com
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

# This is the listener, responsible for receiving data
class StdOutListener(tweepy.StreamListener):
  def on_data(self, data):
    print '1'
    # Twitter returns data in JSON format - we need to decode it first
    decoded = json.loads(data)

    if  not decoded['text'].startswith('RT'):

        try:
            # Also, we convert UTF-8 to ASCII ignoring all bad characters sent by users
            tweet = '@%s; %s; %s; %s; %s; %s; %s; %s; %s; %s; ""[%s]""; %s' % (decoded['user']['id'], decoded['user']['location'], decoded['user']['followers_count'], decoded['user']['created_at'], decoded['user']['utc_offset'], decoded['user']['time_zone'], decoded['coordinates'], decoded['place'], decoded['id'], decoded['created_at'], decoded['text'].encode('ascii', 'ignore'), decoded['retweet_count'])

            print tweet
            f = io.open('outprint6.csv', 'a')
            f.write(tweet)
            f.write(u'\n')
            f.close()           

        except:
            pass    

  def on_error(self, status):
    print status       

# if __name__ == '__main__':
l = StdOutListener()
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

print "Showing all new tweets for"

#There are different kinds of streams: public stream, user stream, multi-user streams
# In this example follow #programming tag
# For more details refer to https://dev.twitter.com/docs/streaming-apis
stream = tweepy.Stream(auth, l)
stream.filter(locations=[-81.397882,-4.972829,-75.288231,0.762316])

The tweet's 'text' field is encoded as 'ascii', but I still get the problem when writing it to the new CSV file...

  • Can you post the full traceback, including which line fails? Also, can you provide a small sample of input data that exhibits the issue? Commented Mar 7, 2017 at 2:50
  • This is most likely to occur when the data you are reading is not actually UTF-8. The file you're using probably has some other encoding; you just need to use that. Commented Mar 7, 2017 at 2:52
  • 0xfa is the Latin-1 encoding of ú. Commented Mar 7, 2017 at 3:02
  • The encoding is 'ascii', but even using it, I got the error... Commented Mar 7, 2017 at 3:07
  • Why are you converting what is almost certainly UTF-8 to something else anyway? There's no reason for that at all. In your data retrieval, I mean. Commented Mar 7, 2017 at 3:12
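
Following the comments above, one possible approach is to decode the file with the encoding it was actually written in and write UTF-8 out. This is only a sketch: it assumes the input is Latin-1 (which the 0xfa byte suggests, although cp1252 is also possible), and remove_empty_lines and its parameters are hypothetical names, not part of the original post. It uses io.open with an explicit encoding so that it should behave the same on Python 2 and Python 3.

```python
import io

# The byte from the error message is valid Latin-1 (u-acute),
# but it cannot start a UTF-8 sequence.
assert b"\xfa".decode("latin-1") == u"\u00fa"


def remove_empty_lines(src_path, dst_path,
                       src_encoding="latin-1", dst_encoding="utf-8"):
    """Copy src_path to dst_path without blank lines, re-encoding the text.

    src_encoding is an assumption about the input file; change it if the
    file was written with a different encoding.
    Returns the number of lines written.
    """
    kept = 0
    # io.open decodes on read and encodes on write, so the loop only
    # ever handles unicode text, never raw bytes.
    with io.open(src_path, "r", encoding=src_encoding) as src:
        with io.open(dst_path, "w", encoding=dst_encoding) as dst:
            for line in src:
                cleaned = line.strip()
                if cleaned:  # skip empty or whitespace-only lines
                    dst.write(cleaned + u"\n")
                    kept += 1
    return kept
```

With the file names from the question, this would be called as remove_empty_lines("outprint6.csv", "eliminado"). When io.open is called without an encoding argument, it falls back to the platform's preferred encoding, so passing encoding="utf-8" explicitly when the tweets are first written would likely avoid the mismatch in the first place.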
