
I'm trying to remove accents from data in a CSV file. I use the remove_accents function (see below), but for that I need to read and write my CSV files as UTF-8. However, I get the error: 'encoding' is an invalid keyword argument for this function
I've read that I may have to use Python 3 and run the script with python3 ./myscript.py. Is this the right way to do it? Or is there another way to remove accents without having to install Python 3? Any help would be much appreciated.

#!/usr/bin/env python

import re
import string
import csv
import unicodedata

def remove_accents(data):
    return ''.join(x for x in unicodedata.normalize('NFKD', data) if \
    unicodedata.category(x)[0] == 'L').lower()


reader=csv.reader(open('infile.csv', 'r', encoding='utf-8'), delimiter='\t')
writer=csv.writer(open('outfile.csv', 'w', encoding='utf-8'), delimiter=',')

for line in reader:
    if line[0] != '':
        person=re.split(' ',line[0])

        first_name = person[0].strip().upper()
        first_name1=unicode(first_name)
        first_name2=remove_accents(first_name1)
        if len(person) == 2:
            last_name=person[1].strip().upper()
            line[0]=last_name
        line[15]=first_name2

    writer.writerow(line)
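For illustration, here is what remove_accents actually does when run under Python 3, where every str is already Unicode and the unicode() call is not needed. Keeping only characters whose category starts with 'L' also drops hyphens, spaces and digits, not just accents. The strip_combining_marks helper is a hypothetical alternative that removes only combining marks (category 'Mn') after NFKD decomposition. Which behaviour is preferable depends on the data.

```python
import unicodedata

def remove_accents(data):
    # Same logic as the question: decompose with NFKD, then keep only
    # characters whose Unicode category starts with 'L' (letters).
    return ''.join(x for x in unicodedata.normalize('NFKD', data)
                   if unicodedata.category(x)[0] == 'L').lower()

def strip_combining_marks(data):
    # Hypothetical variant: drop only combining marks (category 'Mn'),
    # so hyphens, spaces and digits are kept.
    return ''.join(x for x in unicodedata.normalize('NFKD', data)
                   if unicodedata.category(x) != 'Mn')

print(remove_accents('JEAN-FRAN\u00c7OIS'))        # jeanfrancois
print(strip_combining_marks('JEAN-FRAN\u00c7OIS'))  # JEAN-FRANCOIS
```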

1 Answer

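In Python 2, the built-in open() does not accept an encoding argument (that keyword was added in Python 3), so you need to use codecs.open() if you want to specify an encoding. Also, take a look at unidecode, a third-party package that transliterates Unicode text to plain ASCII.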
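A minimal sketch of codecs.open() with an explicit encoding. The file name is hypothetical and lives in a temporary directory. codecs.open() exists in both Python 2 and 3, but this snippet is written to run under Python 3, where the built-in open() also accepts encoding= directly. One caveat for Python 2: its csv module does not support Unicode input, so passing it a codecs.open() stream can still fail on non-ASCII characters.

```python
import codecs
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), 'names.txt')

# codecs.open() encodes text to UTF-8 on write and decodes it on read.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(u'Fran\xe7ois\n')

with codecs.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# The raw bytes on disk are UTF-8, so the cedilla character takes two bytes.
with open(path, 'rb') as f:
    raw = f.read()

print(repr(text))
print(raw)
```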

1 Comment

Thanks for answering. I now get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 24: ordinal not in range(128)
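This UnicodeEncodeError is most likely caused by Python 2's csv module, which does not support Unicode input. The unicode strings coming from codecs.open() get implicitly encoded as ASCII, and that fails on u'\xa0' (a no-break space). One common Python 2 workaround is to open the files in binary mode and call .decode('utf-8') / .encode('utf-8') on each field yourself. Running the script under Python 3 also avoids the problem, because there open() takes encoding= and csv works on str.

Below is a minimal Python 3 sketch of the question's loop. It uses in-memory files so it runs standalone. convert is a hypothetical helper name. The empty-row check is an added guard, because csv.reader yields [] for blank lines. As in the question, rows are assumed to have at least 16 columns.

```python
import csv
import io
import unicodedata

def remove_accents(data):
    # Same logic as the question's function; in Python 3 every str is
    # already Unicode, so no unicode() conversion is needed.
    return ''.join(x for x in unicodedata.normalize('NFKD', data)
                   if unicodedata.category(x)[0] == 'L').lower()

def convert(infile, outfile):
    # Same row logic as the question: tab-separated in, comma-separated out.
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter=',')
    for line in reader:
        if line and line[0] != '':  # guard: blank lines come back as []
            person = line[0].split(' ')
            first_name = remove_accents(person[0].strip().upper())
            if len(person) == 2:
                line[0] = person[1].strip().upper()
            line[15] = first_name
        writer.writerow(line)

# In-memory demo: one row with 16 tab-separated columns.
src = io.StringIO('Jos\xe9 Garc\xeda' + '\t' * 15 + '\n')
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())

# With real files (names from the question):
# with open('infile.csv', 'r', encoding='utf-8', newline='') as fin, \
#      open('outfile.csv', 'w', encoding='utf-8', newline='') as fout:
#     convert(fin, fout)
```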