Parsing a CSV file in Python with the csv module

Question

I'm trying to parse a CSV file, but I seem to be missing something basic and can't get it right. Each row of the CSV contains a string in {} holding several parameters in random order, as in the example below.

Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,
"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},
"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},
"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488;  UserID : 36875},

My code is the following but I get an error that I don't understand:

counter = 0

mappedLines = {}

import csv
with open ('test.csv', 'r') as f:
    reader = csv.reader (f)

    for line in reader:
        counter = counter + 1
        lineDict = {}
        line = line.replace("{","")
        line = line.replace("}","")
        line = line.strip()
        fieldPairs = line.split(";")

        for pair in fieldPairs:
            fields = pair.split(":")
            key = fields[0].strip()
            value = fields[1].strip()
            lineDict[key] = value

        mappedLines[counter] = lineDict

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
       if key in lineSets:
           output_line = output_line + lineSets[key] + ","
       else:
           output_line += ","
    print output_line[0:len(output_line) - 1]

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)

Here's the Traceback:

Traceback (most recent call last):
    File "testV3.py", line 14, in <module>
      line = line.replace("{","")
AttributeError: 'list' object has no attribute 'replace'

EDIT:

I'm now trying to add a write function that saves the output to a new CSV file, using the code below. The resulting CSV contains only the headers, written in a column.

import csv


def printfields(keys, linesets):
    output_line = ""
    for key in keys:
        if key in linesets:
            output_line += linesets[key] + ","
        else:
            output_line += ","
    print output_line


def csv_writer(reader, path):
    """
    write reader to a csv file path
    """
    with open(path, "wd") as csv_file:
        writer = csv.writer(csv_file, delimiter=",")
        for line1 in line:
            if line1 in path
            writer.writerow(line1)

if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel", "targetUID"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            lineDict = {
                pair.split()[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict
        path = "output.csv"
        csv_writer(reader, path)

    for key in sorted(mappedLines.keys()):
        linesets = mappedLines[key]
        printfields(fields, linesets)

I have solved your original question, and IMHO your EDIT qualifies as a standalone question. If you agree, would you move the additional edit into a new question and mark this question as answered?
– dopstar, Commented Jan 4, 2015 at 14:50
Hi @dopstar, thanks for your help, comments and recommendations on using Stack Overflow. As you probably noticed, I'm still learning good practices for getting help from the community. Your help is much appreciated! I have now created a new post, stackoverflow.com/questions/27815100/…, including my edits so you can answer it. Thanks!
– mmarboeuf, Commented Jan 7, 2015 at 8:24

Burhan Khalid · Accepted Answer · 2015-01-04 08:49:07Z

1

line is a list containing the cells of the current row. To access one of them, use a loop:

for cell in line:
    cell.replace(...)
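
Note that str.replace returns a new string and leaves the original untouched, so the result has to be stored somewhere. A minimal sketch of the two-loop idea in Python 3 syntax, assuming one data row from the question's sample; io.StringIO stands in for test.csv, and the names sample and cleaned_rows are illustrative only:

```python
import csv
import io

# Header plus one data row copied from the question; io.StringIO stands in
# for the real test.csv file.
sample = (
    "Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,\n"
    '"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,'
    "Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},\n"
)

cleaned_rows = []
for row in csv.reader(io.StringIO(sample)):   # outer loop: one list per row
    cleaned = []
    for cell in row:                          # inner loop: one string per cell
        # str.replace returns a new string, so keep the result
        cleaned.append(cell.replace("{", "").replace("}", "").strip())
    cleaned_rows.append(cleaned)

print(cleaned_rows[1][8])
```

Cells without braces pass through unchanged apart from the surrounding whitespace being trimmed.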

edited Jan 4, 2015 at 8:49

Burhan Khalid

answered Dec 18, 2014 at 7:43

user1907906

4 Comments

mmarboeuf Over a year ago

Hi, thanks for your feedback, but I'm sorry, I still don't get it. I already use a loop and I don't know what to do with yours. Could you explain in more detail how I should add or replace a loop? Thanks! M.

user1907906 Over a year ago

As I wrote: line is a list, not a string. You can't use replace on it. If you want to change the cell content, you must use two loops: one for the rows, and one for the cells in a row.

martineau Over a year ago

@mmarboeuf: line[8] is the only cell/field with the { and } characters in it -- so you could also do something like line[8].replace(...) rather than loop over each of them.

Burhan Khalid Over a year ago

That should be cell.replace :)
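
Following martineau's suggestion, only the Params cell needs cleaning. A hedged sketch in Python 3, assuming Params is always at index 8 as in the sample header; the helper name params_cell and the constant PARAMS_INDEX are made up for illustration:

```python
import csv
import io

PARAMS_INDEX = 8  # position of "Params" in the sample header (an assumption)

def params_cell(row):
    """Return the Params cell of a csv.reader row without braces or padding."""
    return row[PARAMS_INDEX].strip().strip("{}").strip()

# Two data rows copied from the question; io.StringIO stands in for test.csv.
sample = (
    "Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,\n"
    '"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},\n'
    '"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,'
    "1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488;  UserID : 36875},\n"
)

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
cells = [params_cell(row) for row in reader]
print(cells)
```

If the column order can change, looking the column up by name with csv.DictReader is less fragile than a fixed index.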

dopstar · Accepted Answer · 2014-12-18 11:00:39Z

0

I have rearranged and modified your code. It now uses csv.DictReader, and neither the counter variable nor the range function in the for loop is needed any more.

import csv


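def printFields(keys, lineSets):
    # Build one comma-separated line; keys missing from the row stay empty.
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line += lineSets[key] + ","
        else:
            output_line += ","
    print(output_line)  # the parenthesized form also runs on Python 3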


if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            lineDict = {
                # split on ':' for key and value so "key:value" without spaces also parses
                pair.split(':')[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict

    for key in sorted(mappedLines.keys()):
        lineSets = mappedLines[key]
        printFields(fields, lineSets)
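
The brace stripping and splitting can also be pulled into a small helper, which makes it easier to test on its own. A sketch in Python 3, assuming every parameter has the form key : value and that pairs are separated by ; as in the sample data. parse_params is a hypothetical name, not part of the csv module, and io.StringIO stands in for test.csv:

```python
import csv
import io

def parse_params(raw):
    """Parse a cell like '{ UserID : 36875; tabName : QuickAndEasy}' into a dict."""
    result = {}
    for pair in raw.strip().strip("{}").split(";"):
        if ":" not in pair:
            continue  # empty piece, e.g. from '{}' or a trailing ';'
        key, value = pair.split(":", 1)  # split once so a value may contain ':'
        result[key.strip()] = value.strip()
    return result

# Two data rows copied from the question.
sample = (
    "Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,\n"
    '"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},\n'
    '"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,'
    "Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},\n"
)

mapped = [parse_params(row["Params"]) for row in csv.DictReader(io.StringIO(sample))]
print(mapped)
```

Rows whose Params cell is just {} come back as an empty dict, so the later lookup by field name still works for them.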

answered Dec 18, 2014 at 11:00

dopstar

4 Comments

mmarboeuf Over a year ago

Thanks! That worked like a charm. I'm still struggling to write the output to a CSV file, though.

mmarboeuf Over a year ago

So now I'm trying to write the output to a CSV file, but I only get the headers in the file.

dopstar Over a year ago

Would you then upvote this answer as useful at the very least and thereafter create a separate question for your additional part?

mmarboeuf Over a year ago

I think I've done what you suggested. Let me know if not. Thanks.
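
For the follow-up about writing the result to a file, one possible approach is csv.DictWriter, which fills keys missing from a row with restval (an empty string by default). A sketch in Python 3, assuming the parsed rows are already available as a list of dicts; parsed_rows holds values copied from the question's sample, and io.StringIO stands in for a file opened with open('output.csv', 'w', newline=''):

```python
import csv
import io

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

# Parsed Params of the three sample rows from the question.
parsed_rows = [
    {},
    {"UserID": "36875", "tabName": "QuickAndEasy"},
    {"RecipeID": "1488", "UserID": "36875"},
]

out = io.StringIO()  # stand-in for the real output file
writer = csv.DictWriter(out, fieldnames=fields, restval="", lineterminator="\n")
writer.writeheader()            # header row, written once
writer.writerows(parsed_rows)   # one line per dict; missing keys become ""
print(out.getvalue())
```

A dict containing a key that is not listed in fields raises ValueError unless extrasaction="ignore" is passed to DictWriter.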

Mohit Thakur · Accepted Answer · 2014-12-18 13:46:08Z

0

You can use the following statement to remove the "{" and "}" from a list of strings:

line = ",".join(line).replace("{","").replace("}","").split(",")
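
An alternative that avoids joining and re-splitting the row, which would also break up cells that contain a comma such as the timestamp, is to clean each cell separately. A minimal sketch, with a hand-written, shortened row standing in for one produced by csv.reader:

```python
# A shortened row as csv.reader would return it; the values are copied from
# the second sample line in the question.
line = ["Dec 03, 2014 01:30 AM", "2", "HomeTab", "{ UserID : 36875; tabName : QuickAndEasy}"]

# Clean every cell individually; cells without braces come back unchanged.
line = [cell.replace("{", "").replace("}", "") for cell in line]
print(line)
```

The list keeps its length and order, so column positions stay valid after cleaning.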

edited Dec 18, 2014 at 13:46

Mohit Thakur

answered Dec 18, 2014 at 9:39

Pradip Das

1 Comment

user1907906 Over a year ago

Why parse the file as CSV when you then lump all the cells together again?
