Extract string from each line of a text file and save the output in csv rows

Question

I am trying to extract the values of srcintf, dstintf, srcaddr, dstaddr, action, schedule, service and logtraffic from a text file and save them to a CSV file, one row per block.

The input file looks like this:

edit 258
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "all"
    set dstaddr "10.2.22.1/32"
    set action accept
    set schedule "always"
    set service "selling_soft_01"
    set logtraffic all
next
edit 184
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "10.1.1.1/32"
    set schedule "always"
    set service "HTTPS"
    set logtraffic all
next
edit 124
    set srcintf "Untrust"
    set dstintf "Trust"
    set srcaddr "Any"
    set dstaddr "172.16.77.1/32"
    set schedule "always"
    set service "ping"
    set logtraffic all
    set nat enable
next

This is my first time programming (as you can see from my code) but maybe you can understand more about what I am trying to do. See code below.

import csv

text_file = open("fwpolicy.txt", "r")

lines = text_file.readlines()

mycsv = csv.writer(open('output.csv', 'w'))

mycsv.writerow(['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat'])

n = 0
for line in lines: 
    n = n + 1
n = 0
for line in lines: 
    n = n + 1
    if "set srcintf" in line:
            srcintf = line
    else    srcintf = 'not set'
    if "set dstintf" in line:            
        dstintf = line
    else    dstintf  = 'not set'
    if "set srcaddr" in line:           
        srcaddr = line
    else    srcaddr = 'not set'
    if "set dstaddr" in line:
            dstaddr = line
    else    dstaddr = 'not set'
    if "set action" in line:            
        action = line
    else    action = 'not set'
    if "set schedule" in line:
            schedule = line
    else    schedule = 'not set'
    if "set service" in line:
            service = line
    else    service = 'not set'
    if "set logtraffic" in line:
            logtraffic = line
    else    logtraffic = 'not set'
    if "set nat" in line:
            nat = line
    else    nat = 'not set'            

        mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])

Expected results(CSV file):

srcintf,dstintf,srcaddr,dstaddr,schedule,service,logtraffic,nat
"Untrust","Trust","all","10.2.22.1/32","always","selling_soft_01",all,,

Actual results:

Traceback (most recent call last):
  File "parse.py", line 45, in <module>
    mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])
NameError: name 'srcintf' is not defined

Please show real and correctly indented code. This does not even contain the colons after else statements! (use copy/paste and Ctrl-K for code formatting...)
– Serge Ballesta, Commented Jun 12, 2019 at 8:32

doctorlove · Accepted Answer · 2019-06-12 09:01:00Z

1

You are writing a row to the csv for every line in your file. You should only write the row when you reach the word next, so check for that before the write; by then all the values for that row have been collected.

When you get that far, you will notice you have set each variable to the whole line, rather than to the value that follows the keyword. For example, with the line

 set srcintf "Untrust"

your code

 if "set srcintf" in line: srcintf = line
 else srcintf = 'not set'

will give srcintf the value set srcintf "Untrust". Try to split the string to find the actual value?

... something like this:

import csv

# newline='' lets the csv module control line endings in the output file
with open("fwpolicy.txt", "r") as text_file, \
        open('output.csv', 'w', newline='') as out_file:
    mycsv = csv.writer(out_file)
    mycsv.writerow(['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
                    'service', 'logtraffic', 'nat'])
    for line in text_file:
        if "edit" in line:
            # start of a new block: reset every value
            [srcintf, dstintf, srcaddr, dstaddr, action, schedule,
             service, logtraffic, nat] = ['not set']*9
        elif 'next' in line:
            # end of the block: write the collected values as one row
            mycsv.writerow([srcintf, dstintf, srcaddr, dstaddr, schedule, service, logtraffic, nat])
        elif "set srcintf" in line:
            srcintf = line.split()[2]  # third word is the value (quotes are kept)
        elif "set dstintf" in line:
            dstintf = line.split()[2]
        elif "set srcaddr" in line:
            srcaddr = line.split()[2]
        elif "set dstaddr" in line:
            dstaddr = line.split()[2]
        elif "set action" in line:
            action = line.split()[2]  # collected, but not part of the header above
        elif "set schedule" in line:
            schedule = line.split()[2]
        elif "set service" in line:
            service = line.split()[2]
        elif "set logtraffic" in line:
            logtraffic = line.split()[2]
        elif "set nat" in line:
            nat = line.split()[2]

The important thing is to fill all the values for a row, and only write when you have them. The repetition can be made neater, but hopefully this helps with the idea of a state machine - see where you are at in the file to decide whether to collect values, start a new lot or write a row.

edited Jun 12, 2019 at 9:01

answered Jun 12, 2019 at 8:32

doctorlove


2 Comments

newpi99 Over a year ago

Thank you. I will look into "split". Hopefully I will get it right.

doctorlove Over a year ago

I've added a version that's similar to your code, but with the split in place. The dictionary version above from Thomas is much cleaner though

Serge Ballesta · Accepted Answer · 2019-06-12 09:08:57Z

1

Here is how to do that with a DictWriter

import csv

with open("fwpolicy.txt", "r") as text_file, open('output.csv', 'w', newline='') as out_file:

    fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule',
                  'service', 'logtraffic', 'nat']

    # extrasaction='ignore' drops keys such as 'action' that are not in fieldnames;
    # QUOTE_NONE writes the values exactly as they appear in the input, quotes included
    mycsv = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction='ignore',
                           quotechar=None, quoting=csv.QUOTE_NONE)
    mycsv.writeheader()

    row = {}
    for line in text_file:
        words = line.strip().split(maxsplit=2)
        if not words:
            continue  # skip blank lines
        if 'set' == words[0]:
            row[words[1]] = words[2]
        elif 'next' == words[0]:
            print(row)
            mycsv.writerow(row)
            row = {}
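
If missing fields should show up as 'not set' rather than as empty cells, one option is the restval argument of csv.DictWriter, which supplies the value for any fieldname absent from a row. The short sketch below writes to an in-memory buffer and uses a shortened, made-up field list purely for illustration.

```python
import csv
import io

# Shortened field list for the illustration only.
fieldnames = ['srcintf', 'dstintf', 'nat']

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames,
                        restval='not set',       # used for keys missing from a row
                        extrasaction='ignore')   # silently drop keys not in fieldnames
writer.writeheader()
# 'nat' is missing and 'action' is extra in this sample row.
writer.writerow({'srcintf': 'Untrust', 'dstintf': 'Trust', 'action': 'accept'})

result = out.getvalue()
print(result)
```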

answered Jun 12, 2019 at 9:08

Serge Ballesta


Thomas Kimber · Accepted Answer · 2019-06-13 11:26:35Z

0

Here's how I'd approach this:

import csv

fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat']

defaults = {'srcintf' : "not set", 'dstintf': "not set", 'srcaddr': "not set", 
            'dstaddr': "not set", 'schedule': "not set", 'service': "not set", 
            'logtraffic': "not set", 'nat': "not set"}

with open("structured_content.txt", "r") as text_file:
    lines = text_file.read()

with open('output.csv', 'w', newline='') as out_file:
    mycsv = csv.DictWriter(out_file, fieldnames)
    mycsv.writeheader()
    for block in lines.split("next"):
        csv_row = {}
        for p in [(s.strip()) for s in block.replace("\n", " ").split("set")]:
            s = p.split()
            if len(s)==2:
                csv_row[s[0]]=s[1]  # n.b. this includes "action" and "edit" fields, which need stripping out
        if not csv_row:
            continue  # skip the empty chunk that follows the final "next"
        csv_write_row = {}
        for k,v in csv_row.items():
            print ( "key=",k,"value=",v )
            if k in fieldnames: # a filter to only include fields in the "fieldnames" list
                print ( k , " is in the list - attach its value to the output dictionary")
                csv_write_row[k]=v
        for k,v in defaults.items(): 
            if k not in csv_write_row.keys(): # pad-out the output row with any default values not lifted from the file
                print ( k , " is not in the list - write a default out")
                csv_write_row[k]=v
        mycsv.writerow(csv_write_row)

What I'm aiming to do here is take advantage of the structure of your file, using split to break the text into repeating chunks. Converting your file to csv is then just a matter of aligning your chunks (and nested chunks) to the csv format. csv.DictWriter provides a useful interface for saving your content row by row.

If you want to set defaults for values that aren't there, I'd do that with a dictionary containing fieldname keys, and default (missing) values. You could then "wash" your prepared csv_write_row with these defaults in the case they're not present.
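A compact way to sketch that "wash" step is to filter the parsed values down to the known field names and then lay them over a copy of the defaults. The function name wash and the sample parsed dictionary below are made up for illustration.

```python
fieldnames = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr',
              'schedule', 'service', 'logtraffic', 'nat']

# Every field starts out as "not set".
defaults = dict.fromkeys(fieldnames, "not set")

def wash(csv_row):
    """Return a row holding exactly the known fields (hypothetical helper).

    Values found in csv_row win; anything missing falls back to the default.
    """
    filtered = {k: v for k, v in csv_row.items() if k in defaults}
    return {**defaults, **filtered}

# Sample of what one parsed block might contain, including unwanted keys.
parsed = {'edit': '124', 'srcintf': '"Untrust"', 'nat': 'enable', 'action': 'accept'}
washed = wash(parsed)
print(washed)
```

Because the defaults come first in the merge, the result keeps the column order of fieldnames, and changing a default later only means editing the dictionary.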

edited Jun 13, 2019 at 11:26

answered Jun 12, 2019 at 8:44

Thomas Kimber


6 Comments

newpi99 Over a year ago

Thank you very much Thomas! I mean, I understand maybe 30% of whats going on in that code but it works. Now I am going to study what you did... thank you again.

Thomas Kimber Over a year ago

Just editing the answer to build the defaults from a dictionary of default values. The advantage is that in the future, if you want to change them, it's just a dict edit, not a code change. Also, you're welcome!

newpi99 Over a year ago

Hi Thomas. I am still in the process of understanding your code and I have a question; hopefully you can help me. Is there a way to see what's going on inside csv_write_row and debug from there? I tried changing the "k, v" variables and running the script again to see what changes, but I was wondering if there is a way to debug in real time, step by step. Are there tools that can help me with that? (Maybe an IDE can do that for me? Do you use one?) Thank you, and thanks to the others who helped me.

Thomas Kimber Over a year ago

Sure, it's not an IDE as such, but I code in a Jupyter Notebook when I'm trying things out. I think some IDEs will give you steps and breakpoints, but I tend to just insert judiciously placed print statements (see edit), run my code, and then inspect whether what's printed matches what I expected. Each of those k,v loops iterates over the key (k) and value (v) pairs of the referenced dictionary. In the first, this is used to construct the output based on the fieldnames list, and in the second, to pad out anything that might be missing.

newpi99 Over a year ago

Thank you, Thomas. Even though I don't use the proper terms(I apologize), you can understand what I am trying to ask.


Sebastien D · Accepted Answer · 2019-06-12 08:51:31Z

0

Here is a way to do it:

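import pandas as pd

keys = ['srcintf', 'dstintf', 'srcaddr', 'dstaddr', 'schedule', 'service', 'logtraffic', 'nat']

with open('fwpolicy.txt') as text_file:
    lines = text_file.readlines()

records = []
record = {}  # values collected for the current block
for line in lines:

    found_key = [key for key in keys if key in line]

    if len(found_key) >0:
        # drop the quotes and keep the word that follows "set <key>"
        value = line.strip().replace('"', '').split(" ")[2: ]
        record[found_key[0]] = value[0]

    if 'next' in line:
        records.append(record)
        record = dict()

# columns=keys keeps the column order fixed; missing values are left empty
pd.DataFrame(records, columns=keys).to_csv('output.csv', index=False)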

edited Jun 12, 2019 at 8:51

answered Jun 12, 2019 at 8:46

Sebastien D
