I have a CSV file containing binary fields, some of which contain NULL values, and reading it with csv.reader(f) raises an error.

I've tried various solutions found on the web, but the same error still pops up. I managed to read the file line by line and split each line on commas, but some fields also contain commas, so I'm wondering how I can read the file and extract the columns. An example row is below:

212344408,"cp233.net","net","cp233","clientTransferProhibited,ClientDeleteProhibited","ENAME TECHNOLOGY CO., LTD.",1331,"DNS1.IIDNS.COM","DNS2.IIDNS.COM","2017-02-14","2018-02-14","2017-02-14","WANG MIN CHUN","wangminchun","WANG MIN CHUN","wangminchun","[email protected]","QUANZHOUSHIANXIXIANCHANGKENGXIANGHUAMEICUN","QUAN ZHOU HI","FU,JIAN","362421","CN","+86.59523128184","+86.59523128184","%^^<AD>!^S\0<A8>E<98><AC>/^<A5><A0><C9>7","WANG MIN CHUN","WANG MIN CHUN","[email protected]","WANG MIN CHUN","WANG MIN CHUN","[email protected]",0,"2017-03-14 21:33:15","2017-03-12 20:44:02",0,"whois_zone_snr","2017-03-14 21:33:15",\N

I would appreciate any suggestions.

  • Why is there a \N at the end? Commented Apr 6, 2017 at 19:59
  • Could you show your csv object configuration? Commented Apr 6, 2017 at 20:01
  • @SatishGarg: that's a common representation for a NUL byte. Commented Apr 6, 2017 at 20:02
  • Is this Python 2 or 3? Have you tried the reader = csv.reader(line.translate({0: None}) for line in f) approach (i.e. simply removing the NUL bytes)? Commented Apr 6, 2017 at 20:03
  • Possible duplicate of "Line contains NULL byte" in CSV reader (Python) Commented Apr 6, 2017 at 20:06
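
The NUL-stripping suggestion in the comments above can be sketched as follows for Python 3. This is only a minimal sketch under stated assumptions: the helper name read_csv_without_nuls is hypothetical, and the latin-1 encoding mentioned in the comment is an assumption, chosen because it maps every byte to a character so that binary fields do not raise a decoding error.

```python
import csv
import io


def read_csv_without_nuls(f):
    """Yield parsed rows from text file object f, dropping NUL characters first.

    Hypothetical helper: the name is an assumption for illustration.
    """
    # str.translate with {0: None} deletes every "\x00" character from a line.
    cleaned = (line.translate({0: None}) for line in f)
    yield from csv.reader(cleaned)


# Small in-memory demonstration; a real file would be opened with
# open(path, newline="", encoding="latin-1") (the encoding is an assumption).
sample = io.StringIO('1,"FU,JIAN","ab\x00cd"\n2,"x","y"\n', newline="")
rows = list(read_csv_without_nuls(sample))
print(rows)
```

Note that the quoted comma in "FU,JIAN" stays inside a single column, which is what a plain split on commas gets wrong.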

2 Answers

3

Pandas worked well in my case: it read the file and skipped the rows that were broken by unusual characters.

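import pandas as pd

# filename is the path to the CSV file; header is the list of column names.
# Note: pandas 1.3 deprecated warn_bad_lines/error_bad_lines in favour of on_bad_lines="warn".
df = pd.read_csv(filename, verbose=True, warn_bad_lines=True, error_bad_lines=False, names=header)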

1 Comment

Pandas is the way to go. The call above creates a DataFrame, which is a much more permissive structure, so you should not run into the same errors as with the csv module.
0

This works fine on your example. I even replaced one string with NULL, and it was handled without problems.

test.csv:

212344408,"cp233.net","net","cp233","clientTransferProhibited,ClientDeleteProhibited","ENAME TECHNOLOGY CO., LTD.",1331,"DNS1.IIDNS.COM","DNS2.IIDNS.COM","2017-02-14","2018-02-14","2017-02-14","WANG MIN CHUN","wangminchun","WANG MIN CHUN","wangminchun","[email protected]","QUANZHOUSHIANXIXIANCHANGKENGXIANGHUAMEICUN","QUAN ZHOU HI","FU,JIAN","362421","CN","+86.59523128184","+86.59523128184","%^^<AD>!^S\0<A8>E<98><AC>/^<A5><A0><C9>7","WANG MIN CHUN","WANG MIN CHUN","[email protected]","WANG MIN CHUN","WANG MIN CHUN","[email protected]",0,"2017-03-14 21:33:15","2017-03-12 20:44:02",0,"whois_zone_snr","2017-03-14 21:33:15",\N
212344408,NULL,"net","cp233","clientTransferProhibited,ClientDeleteProhibited","ENAME TECHNOLOGY CO., LTD.",1331,"DNS1.IIDNS.COM","DNS2.IIDNS.COM","2017-02-14","2018-02-14","2017-02-14","WANG MIN CHUN","wangminchun","WANG MIN CHUN","wangminchun","[email protected]","QUANZHOUSHIANXIXIANCHANGKENGXIANGHUAMEICUN","QUAN ZHOU HI","FU,JIAN","362421","CN","+86.59523128184","+86.59523128184","%^^<AD>!^S\0<A8>E<98><AC>/^<A5><A0><C9>7","WANG MIN CHUN","WANG MIN CHUN","[email protected]","WANG MIN CHUN","WANG MIN CHUN","[email protected]",0,"2017-03-14 21:33:15","2017-03-12 20:44:02",0,"whois_zone_snr","2017-03-14 21:33:15",\N

code:

import csv
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

If that's not the behaviour you're experiencing, could you provide a line where it fails?
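
To find a line where it fails, one option is to scan the raw bytes for NUL and report the affected line numbers. This is a rough sketch rather than a definitive diagnostic: the function name find_nul_lines is hypothetical, and it assumes that lines are separated by "\n". It may also be that the pasted example parses cleanly because it contains the two printable characters \0 rather than a real NUL byte, but that is only a guess.

```python
def find_nul_lines(data: bytes):
    """Return 1-based numbers of the lines in data that contain a NUL byte.

    Hypothetical helper: the name is an assumption for illustration.
    """
    return [
        lineno
        for lineno, line in enumerate(data.split(b"\n"), start=1)
        if b"\x00" in line
    ]


# In-memory demonstration; for a real file, read it in binary mode:
#     with open("test.csv", "rb") as f:
#         data = f.read()
sample = b'1,"a","b"\n2,"c\x00d","e"\n3,"f","g"\n'
print(find_nul_lines(sample))
```

Reading in binary mode avoids any decoding step, so the check works regardless of what the binary fields contain.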
