I am trying to read each line of a csv file and get a "line contains NULL byte" error.

reader = csv.reader(open(mycsv, 'rU'))
for line in reader:
     print(line)


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: line contains NULL byte

Using the check below, I confirmed that the file contains null bytes.

if '\0' in open(mycsv).read():
     print("have null byte")

What's the best way to work around this? Should I replace '\0' on every line? I need to process this kind of file daily, and each one has about 400,000 lines (1 GB) of data. I assume a replace would slow this down substantially.

1 Answer

Try wrapping the reader in a generator that skips any row that raises csv.Error:

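import csv

def mycsv_reader(csv_reader):
    # Yield rows one at a time, skipping any row that raises csv.Error.
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # Handle the bad row however you want; here it is skipped.
            continue
        except StopIteration:
            # End of input. On Python 3.7+ a StopIteration that escapes a
            # generator is turned into RuntimeError, so return explicitly.
            return

if __name__ == '__main__':
    # mycsv is the file path from the question.
    # 'rU' was removed in Python 3.11; there, use open(mycsv, newline='').
    reader = mycsv_reader(csv.reader(open(mycsv, 'rU')))
    for line in reader:
        print(line)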
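
If skipping the bad rows would lose data you need, another option is to strip the NUL characters as the file is read. This is only a minimal sketch, assuming the NULs are stray padding rather than meaningful data. `strip_nul` is a hypothetical helper name, and the in-memory sample stands in for the real file:

```python
import csv
import io

def strip_nul(lines):
    """Yield each line with NUL characters removed, one line at a time."""
    for line in lines:
        yield line.replace('\0', '')

# Small in-memory stand-in for the real file (assumed sample data).
sample = io.StringIO('a,b\x00,c\n1,2,3\n')

# csv.reader accepts any iterable of strings, so the filter can sit in between.
rows = list(csv.reader(strip_nul(sample)))
print(rows)
```

Because the replace runs line by line inside a generator, the whole file is never held in memory. For a real file on Python 3, pass the open file object instead of the sample, for example csv.reader(strip_nul(open(mycsv, newline=''))).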

2 Comments

That works and gets me through the file. I am just wondering why I am getting these null bytes. Are they perhaps used instead of commas as separators? f.count('\x00') returns 1926 of them.
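
One way to start narrowing down where the null bytes come from is to count them in binary chunks and check whether the file begins with a UTF-16 byte order mark, since UTF-16 text is a common source of NUL bytes. This is a rough sketch under that assumption, not a diagnosis. `inspect_nuls` is a hypothetical helper, and the temporary file is made-up sample data:

```python
import codecs
import os
import tempfile

def inspect_nuls(path, chunk_size=1 << 20):
    """Return (starts_with_utf16_bom, nul_count), reading in binary chunks."""
    with open(path, 'rb') as f:
        head = f.read(2)
        has_bom = head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
        nul_count = head.count(b'\x00')
        # Read fixed-size chunks until read() returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            nul_count += chunk.count(b'\x00')
    return has_bom, nul_count

# Demo on a tiny UTF-16 file (assumed sample, not the asker's data).
tmp = tempfile.NamedTemporaryFile(delete=False, suffix='.csv')
tmp.write('a,b\n'.encode('utf-16'))
tmp.close()
result = inspect_nuls(tmp.name)
os.remove(tmp.name)
print(result)
```

In a UTF-16 file holding mostly ASCII text, roughly every second byte is NUL. A count as low as 1926 in about 1 GB therefore makes UTF-16 unlikely, and it is also far too few for the NULs to be column separators across 400,000 lines.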
