I'm trying to parse a CSV file, but I seem to be missing something basic and can't get it right. Each row of the CSV contains a string in {} holding several parameters in random order, as in the example below.

Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,
"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},
"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},
"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488;  UserID : 36875},

My code is the following but I get an error that I don't understand:

counter = 0

mappedLines = {}

import csv
with open ('test.csv', 'r') as f:
    reader = csv.reader (f)

    for line in reader:
        counter = counter + 1
        lineDict = {}
        line = line.replace("{","")
        line = line.replace("}","")
        line = line.strip()
        fieldPairs = line.split(";")

        for pair in fieldPairs:
            fields = pair.split(":")
            key = fields[0].strip()
            value = fields[1].strip()
            lineDict[key] = value

        mappedLines[counter] = lineDict

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
       if key in lineSets:
           output_line = output_line + lineSets[key] + ","
       else:
           output_line += ","
    print output_line[0:len(output_line) - 1]

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)

Here's the Traceback:

Traceback (most recent call last):
    File "testV3.py", line 14, in <module>
      line = line.replace("{","")
AttributeError: 'list' object has no attribute 'replace'

EDIT:

I'm now trying to add a write function that saves the output to a new CSV file, using the code below. The resulting CSV contains only the headers, written in a column.

import csv


def printfields(keys, linesets):
    output_line = ""
    for key in keys:
        if key in linesets:
            output_line += linesets[key] + ","
        else:
            output_line += ","
    print output_line


def csv_writer(reader, path):
    """
    write reader to a csv file path
    """
    with open(path, "wd") as csv_file:
        writer = csv.writer(csv_file, delimiter=",")
        for line1 in line:
            if line1 in path
            writer.writerow(line1)

if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel", "targetUID"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            lineDict = {
                pair.split()[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict
        path = "output.csv"
        csv_writer(reader, path)

    for key in sorted(mappedLines.keys()):
        linesets = mappedLines[key]
        printfields(fields, linesets)
2
  • I have solved your original question, and IMHO your EDIT qualifies as a standalone question. If you agree, would you move the additional edit into a new question and mark this question as answered? Commented Jan 4, 2015 at 14:50
  • Hi @dopstar, thanks for your help, comments, and recommendations on using Stack Overflow. As you probably noticed, I'm still learning good practices for getting help from the community. Your help is much appreciated! I have now created a new post, stackoverflow.com/questions/27815100/…, including my edits so you can answer it. Thanks! Commented Jan 7, 2015 at 8:24

3 Answers

1

line is a list containing the cells of the current row. To access one of them, use a loop:

for cell in line:
    cell.replace(...)
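
One detail worth spelling out: str.replace returns a new string and leaves the original cell unchanged, so the result has to be stored somewhere. The following is a minimal sketch, assuming Python 3 and using one row from the question held in memory instead of test.csv; the names sample and cleaned_rows are only illustrative.

```python
import csv
import io

# One row from the question, held in memory instead of reading test.csv.
sample = '"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},\n'

cleaned_rows = []
for line in csv.reader(io.StringIO(sample)):
    # line is a list of strings, one per cell. replace() returns a new
    # string, so collect the cleaned cells in a new list.
    cleaned = [cell.replace("{", "").replace("}", "").strip() for cell in line]
    cleaned_rows.append(cleaned)

# The Params cell (index 8 in this sample) no longer has braces.
print(cleaned_rows[0][8])
```

Cells that contain no braces pass through the loop unchanged apart from the whitespace strip.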

4 Comments

Hi, thanks for your feedback, but I'm sorry, I still don't get it. I already use a loop and I don't know what to do with yours. Could you explain in more detail how I should add or replace another loop? Thanks! M.
As I wrote: line is an array, not a string. You can't use replace on it. If you want to change the cell content, you must use two loops: one for the rows, and one for the cells in a row.
@mmarboeuf: line[8] is the only cell/field with the { and } characters in it -- so you could also do something like line[8].replace(...) rather than loop over each of them.
That should be cell.replace :)
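
The single-cell approach from the comment above could look roughly like this. It is a sketch under assumptions: Python 3, the sample data from the question held in memory, and the Params column sitting at index 8 as it does in that sample. PARAMS_INDEX and parsed are illustrative names.

```python
import csv
import io

# Header plus two rows from the question, held in memory.
sample = (
    'Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,\n'
    '"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},\n'
    '"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,'
    '1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488;  UserID : 36875},\n'
)

PARAMS_INDEX = 8  # assumed position of the Params column in this sample

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

parsed = []
for line in reader:
    # Only the Params cell contains braces, so only that cell is cleaned.
    params = line[PARAMS_INDEX].replace("{", "").replace("}", "").strip()
    line_dict = {}
    for pair in params.split(";"):
        if ":" not in pair:
            continue  # empty Params cell such as {}
        key, value = pair.split(":", 1)
        line_dict[key.strip()] = value.strip()
    parsed.append(line_dict)

print(parsed)
```

The check for ":" is what keeps an empty {} cell from raising an IndexError or ValueError when the pair is split.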
0

I have rearranged and modified your code. It now uses csv.DictReader, so neither the counter variable nor the range call in the final for loop is needed any more.

import csv


def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line += lineSets[key] + ","
        else:
            output_line += ","
    print output_line


if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            lineDict = {
                pair.split(':')[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict

    for key in sorted(mappedLines.keys()):
        lineSets = mappedLines[key]
        printFields(fields, lineSets)
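
Note that printFields as written leaves a trailing comma on every line, whereas the version in the question sliced it off. One possible variation, assuming Python 3, builds the line with str.join and dict.get instead; format_fields is a hypothetical name, not part of the code above.

```python
def format_fields(keys, line_sets):
    """Return the values for keys as one comma-separated string.

    Keys missing from line_sets produce an empty field.
    """
    return ",".join(line_sets.get(key, "") for key in keys)


fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
example = {"UserID": "36875", "tabName": "QuickAndEasy"}

print(format_fields(fields, example))
```

Because join only places commas between items, there is nothing to trim afterwards.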

4 Comments

Thanks! That worked like a charm. I'm still struggling to write the output to a CSV file, though.
So now I'm trying to write the output to a CSV file, but I only get the headers in the CSV file.
Would you then upvote this answer as useful at the very least and thereafter create a separate question for your additional part?
I think I've done what you suggested. Let me know if not. Thanks.
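
For the follow-up about writing the output, one way to get the parsed rows into a CSV file is csv.DictWriter. This is only a sketch, assuming Python 3 and a mappedLines-style dictionary like the one built in the answer above; write_params and the sample data are illustrative, and the sketch writes to an in-memory buffer so it can run without touching the disk.

```python
import csv
import io


def write_params(mapped_lines, fields, out_file):
    """Write one CSV row per parsed line, with one column per field name."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=fields,
        restval="",             # blank cell when a row lacks a field
        extrasaction="ignore",  # skip keys that are not listed in fields
    )
    writer.writeheader()
    for key in sorted(mapped_lines):
        writer.writerow(mapped_lines[key])


fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
# Hand-built stand-in for the mappedLines dictionary from the answer.
mapped_lines = {
    2: {},
    3: {"UserID": "36875", "tabName": "QuickAndEasy"},
    4: {"RecipeID": "1488", "UserID": "36875"},
}

buffer = io.StringIO()
write_params(mapped_lines, fields, buffer)
output_text = buffer.getvalue()
print(output_text)
```

To write a real file instead of a buffer, the same function can be called inside `with open("output.csv", "w", newline="") as csv_file:`; the newline="" argument is what the csv module documentation recommends for files passed to a writer in Python 3.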
0

You can use the following statement to remove the "{" and "}" from a list of strings:

line = ",".join(line).replace("{","").replace("}","").split(",")

1 Comment

Why parse the file as CSV when you then lump all the cells together again?
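
The concern in the comment can be shown on the question's own data: joining the cells with commas and splitting again also splits the quoted timestamp, which contains a comma. A small sketch, assuming Python 3 and one sample row held in memory; rejoined and per_cell are illustrative names.

```python
import csv
import io

# One row from the question; the quoted timestamp contains a comma.
sample = '"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},\n'
line = next(csv.reader(io.StringIO(sample)))

# Join-and-split: the comma inside the timestamp becomes a cell boundary.
rejoined = ",".join(line).replace("{", "").replace("}", "").split(",")

# Per-cell cleaning keeps the cell boundaries found by the csv module.
per_cell = [cell.replace("{", "").replace("}", "") for cell in line]

print(len(line), len(rejoined), len(per_cell))
print(rejoined[:2])
print(per_cell[0])
```

Cleaning each cell separately keeps the column positions stable, so later indexing by column still points at the right data.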
