0

I am writing/storing data in MySql via python. If columns data in MySql contains "NAN" then how can I handle them. At the moment I know how to deal with empty spaces or None values but here I got stuck. the data looks like this

# file no 2
# some description line
#"Time","col.no","dat.va","all_val","graph_da","tem","vel","acc", "all.valu","rand"

"2016-02-25 14:50:00",0,12.94,20.88,17.87,"NAN",14,14,13.9,"NAN"
"2016-02-25 15:00:00",1,13.48,20.86,18,"NAN",13.4,13,13.2,"NAN"
"2016-02-25 15:10:00",2,13.89,20.84,18.07,18.03,12.8,12,12.6,13 
"2016-02-25 21:30:00",3,15.76,20.18,17.96,18.25,0.09,0,0.07,0.16
"2016-02-25 21:40:00",4,15.72,,17.91,,0.08,"NAN",0.09,0.18
"2016-02-25 21:50:00",5,15.66,20.07,17.86,18.17,0.03,"NAN",0.05,0.09
"2016-02-25 23:30:00",6,14.98,19.26,17.12,17.44,"NAN","NAN","NAN","NAN"

there are several thousands line which includes randomly this "NAN" therefore It would be great if somebody give me logic to deal, e.g whenever it comes across NAN, it put NAN or none. my code to deal with empty spaces and None is like this.

with open(filepath) as f:
    lines = f.readlines()
max_lines = len(lines)
for k, line in enumerate(lines):
    if k >= (int(skip_header_line) + int(index_line_number)):
        data_tmp = line.strip().split(',')

        strDate = data_tmp[0].replace("\"", "")
    strDate = strDate.split('.')[0]   
    timestamp = datetime.datetime.strptime(strDate, '%Y-%m-%d %H:%M:%S') 
    ts = calendar.timegm(timestamp.timetuple())           
  #  _ts = ts * 1000000 

        data_buffer = [ts] + data_tmp[1:]                                                                    
        data_buffer = [ts] + data_tmp[1:]
        data_buffer = [v if v is not "" and v is not " " else None for v in data_buffer]
        print data_buffer
        cursor.execute(add_data, data_buffer)
        cnx.commit()
        with open(marker_file, "w") as f:
            f.write(" ".join([ str(item[0]), str(data_tmp[0]), str(max_lines),
                   str(k-int(skip_header_line)+1) ]))
cursor.close()
cnx.close()

Any help or tip would be greatly appreciated because I try my best to find the solution but I found nothing useful . Thanx in advance for help.

6
  • 1
    Why not simply pre-process the entire csv or dat file, and replace "NaN" with empty_string? You will not even need to write a python script for that, just a simple text editor search-and-replace will work. When the data are imported to MySQL, they will be interpreted as null values, just like the null values that appear on line 4 in the sample data you provided in your question. Commented Dec 7, 2016 at 22:30
  • @dreftymac thanx for the useful tips. I want exactly this thing so that in pre process it replace "NAN" with empty string, but I am unable to make the relevant changes. Can you show where I can make those changes. I would be really grateful for this help. Commented Dec 7, 2016 at 22:35
  • 1
    This question already has an answer on stackoverflow. See e.g., stackoverflow.com/a/4128192/42223. Take a look at that first then come back here if you still need help. Commented Dec 7, 2016 at 22:48
  • @dreftymac thanx for help. I will give it a look and will come if I need help. thanx for the tip. Commented Dec 7, 2016 at 22:50
  • this does not have to be this complex. Please take a look at the following python script. github.com/dreftymac/public.lab/blob/master/temp/throwaway.py Commented Dec 8, 2016 at 0:23

1 Answer 1

2

Don't write your own CSV parsing code. Use the module.

Apart from that you can use lambda functions to filter lists, as shown below. There's one function that removes empty lines and comment lines from the csv, and one that converts "NAN" to None on each row.

import csv
import calendar
from datetime import datetime

with open('foo.csv', newline='') as csvfile:
    data_lines = lambda l: l[0] != '#' and l.strip()
    NAN_2_None = lambda i: None if i == "NAN" else i

    csvreader = csv.reader(filter(data_lines, csvfile), delimiter=',', quotechar='"')
    for row in csvreader:
        dt = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        row[0] = calendar.timegm(dt.timetuple())
        data = map(NAN_2_None, row)

        # cursor.execute(add_data, data)
        print(list(data))

prints

[1456411800, '0', '12.94', '20.88', '17.87', None, '14', '14', '13.9', None]
[1456412400, '1', '13.48', '20.86', '18', None, '13.4', '13', '13.2', None]
[1456413000, '2', '13.89', '20.84', '18.07', '18.03', '12.8', '12', '12.6', '13 ']
[1456435800, '3', '15.76', '20.18', '17.96', '18.25', '0.09', '0', '0.07', '0.16']
[1456436400, '4', '15.72', '', '17.91', '', '0.08', None, '0.09', '0.18']
[1456437000, '5', '15.66', '20.07', '17.86', '18.17', '0.03', None, '0.05', '0.09']
[1456443000, '6', '14.98', '19.26', '17.12', '17.44', None, None, None, None]

However, it will probably be faster to bulk-load the csv with MySQL, think about pre-processing the file with a simple tool like sed and use that approach instead.

Sign up to request clarification or add additional context in comments.

4 Comments

I am extremely sorry for late reply. I was stuck in some other things. I applied your solution now and its works perfectly fine. I am accepting your answer. But I am thankful that you point this Bulk insertion the csv with MySql. I will give a try to use sed and do some research. I will come back if I am stuck with bulk insertion.
Okay, glad to hear that this worked for you. Follow-ups please as new questions, not as comments here.
Well... I can't stop you, but it's not nice to poke people directly. Also there's no guarantee that I could even help you with a mySQL issue. It's much better if you ask a self-contained question that stands on its own. People will come and try to answer it.
Thanks for the useful tips . I will follow these guidelines.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.