1

I have a log file that I would like to parse and plot using matplotlib. After skipping the first 6 lines, I have data of interest. e.g. my log file looks like this:

# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350
# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317

I want to make a plot of the Classif Err vs Test set error for each Epoch.

My first attempt at this:

import numpy
from numpy import *
from pylab import *

f1 = open('log.txt', 'r')
FILE = f1.readlines()
f1.close()

for line in FILE:
    line = line.strip()
    if ('Epoch' in line):
        epoch += line.split('Epoch = ')
    elif('Test set error' in line):
        test_err += line.split('Test set error = ')

I see this error:

Traceback (most recent call last):
  File "logfileparse.py", line 18, in <module>
    epoch += line.split('Epoch = ')
NameError: name 'epoch' is not defined
6
  • Read the error carefully: "epoch" is undefined. To concatenate onto epoch, it first has to be initialized, e.g. epoch = [] or "" or anything you want. Commented May 10, 2014 at 11:53
  • Why not use line.split(' ') to create a list of all the words, then grab the parts you are interested in using list indexing? Commented May 10, 2014 at 11:56
  • @Shahinism ok. I see `'2014-05-09', '18:35:59,131', '-', 'root', '-', 'INFO', '-', 'Test', 'set', 'error', '=', '16.0433'`, which are different values. Commented May 10, 2014 at 12:10
  • Sorry about my bad suggestion! As @iamsudip said, your problem here is that you have not defined the variable epoch in your code. You can define it as a string just before your for loop, like epoch="", and I think everything will work just fine. Commented May 10, 2014 at 12:13
  • What exact output do you want? Commented May 10, 2014 at 12:17

4 Answers

1

I guess you need to collect the epoch, the classification error, and the test set error together in order to plot them. Assuming the test error line always comes right after the line with 'Epoch', try this:

import re

data_points = []
# raw string, so the backslashes reach the regex engine untouched
ep = r'Epoch = (\d+), batch = \d+, Classif Err = (\d+\.?\d+)'

with open('file.txt') as f:
    for line in f:
        epoch = re.findall(ep, line)
        if epoch:
            error_line = next(f)  # grab the next line, which is the error line
            error_value = error_line[error_line.rfind('=')+1:]
            # list(...) so each entry is a real list on Python 3 as well
            data_points.append(list(map(float, epoch[0] + (error_value,))))

Now data_points is a list of lists. In each inner list, the first value is the epoch, the second is the Classif Err value, and the third is the test set error.

The regular expression will return a list with a tuple:

>>> re.findall(ep, i)
[('1', '52.926')]

Here i is your first line

To grab the error value, find the last = sign and take the remaining characters:

>>> i2 = '# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317'
>>> i2[i2.rfind('=')+1:]
' 37.2317'

I used map(float, epoch[0]+(error_value,)) to convert the values from strings to floats (wrapped in list() so that it also gives a list on Python 3):

>>> list(map(float, re.findall(ep, i)[0]+(i2[i2.rfind('=')+1:],)))
[1.0, 52.926, 37.2317]
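
If the test error line is not guaranteed to follow the epoch line directly, the pairing could instead be done by remembering the last epoch line seen. The following is only a sketch under that assumption, not the code above: the name parse_log, the two patterns, and the in-memory sample list are made up for illustration and are based solely on the two log lines shown in the question.

```python
import re

# Hypothetical patterns, written against the two sample lines in the question.
EPOCH_RE = re.compile(r'Epoch = (\d+), batch = \d+, Classif Err = (\d+(?:\.\d+)?)')
TEST_RE = re.compile(r'Test set error = (\d+(?:\.\d+)?)')


def parse_log(lines):
    """Return a list of (epoch, classif_err, test_err) tuples."""
    data_points = []
    pending = None  # last epoch line seen that has no test error yet
    for line in lines:
        epoch_match = EPOCH_RE.search(line)
        if epoch_match:
            pending = (int(epoch_match.group(1)), float(epoch_match.group(2)))
            continue
        test_match = TEST_RE.search(line)
        if test_match and pending is not None:
            data_points.append(pending + (float(test_match.group(1)),))
            pending = None
    return data_points


# The two sample lines from the question stand in for the log file.
sample = [
    '# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350',
    '# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317',
]
print(parse_log(sample))  # [(1, 52.926, 37.2317)]
```

Because parse_log accepts any iterable of lines, an open file object could be passed in place of the sample list.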

1 Comment

Thanks for this tip. I was able to plot the values like this:

epochs = [x[0] for x in data_points]
cerr = [x[1] for x in data_points]
terr = [x[2] for x in data_points]
plot(epochs, cerr, label='class_err')
plot(epochs, terr, label='test_err')
xlabel('epochs')
ylabel('error')
title('Classification Error and Test Error')
pylab.legend(loc='upper right')
grid(True)
show()
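
As a side note, the three columns can also be pulled out of data_points in one step with zip. This is just a sketch: the second row of values below is invented purely to show the shape, and the unpacking assumes data_points is not empty.

```python
# Illustrative values in the shape data_points has in the answer:
# [epoch, classification error, test set error] per entry.
# The second row is made up for this example.
data_points = [[1.0, 52.926, 37.2317], [2.0, 41.5, 30.25]]

# zip(*rows) transposes the rows into columns.
epochs, cerr, terr = (list(column) for column in zip(*data_points))

print(epochs)  # [1.0, 2.0]
print(cerr)    # [52.926, 41.5]
print(terr)    # [37.2317, 30.25]
```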
1

Looking at your code more closely, I saw a second problem beyond the undefined epoch variable: line.split() returns a list, so if epoch is a string, epoch += line.split(...) tries to concatenate a list to a string and fails. I tried out a version that works and ended up with something like this:

epoch = []
with open('log.txt') as f1:
    for line in f1:
        # split on whitespace; this also drops the trailing newline
        line_list = line.split()
        if 'Epoch' in line_list:
            # keep everything from 'Epoch' to the end of the line
            epoch_index = line_list.index('Epoch')
            epoch.append(' '.join(line_list[epoch_index:]))
        elif 'Test' in line_list:
            # 'Test set error' spans three words, so look for the first one
            error_index = line_list.index('Test')
            epoch.append(' '.join(line_list[error_index:]))
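
To make the list versus string point concrete, here is a small sketch of what += does in each case. The line below is a shortened copy of the sample epoch line from the question, and the variable names are chosen only for this illustration.

```python
# Shortened copy of the sample epoch line from the question.
line = '# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216'

parts = line.split('Epoch = ')  # str.split always returns a list
print(parts)

as_list = []
as_list += parts                # list += list extends the list in place
print(len(as_list))             # 2

as_string = ""
try:
    as_string += parts          # str += list is not allowed
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
else:
    failed = False
```

So initializing epoch as a list avoids the error, but += then adds both halves of the split line rather than just the value after 'Epoch = '.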

2 Comments

might pay to actually explain what your code is doing and
Here I just want to extract some keywords and append the string after that as a list argument to the epoch list! It is just what I found in the question explanation. But I think there is enough code here to answer the actual question :D
1

This will find Epoch and its value, appending it to a list.

epoch = []       # define both lists before the loop
test_error = []
with open('log.txt', 'r') as f:  # use with to open files, as it closes the file automatically
    for line in f:
        if "Epoch" in line:
            epoch.append(line[line.find("Epoch ="):].split(',')[0])
        elif 'Test set error' in line:
            # strip() removes the trailing newline
            test_error.append(line[line.find("Test set error ="):].split(',')[0].strip())
print(epoch)
['Epoch = 1']
print(test_error)
['Test set error = 37.2317']

This uses the index of "Epoch =" to slice the string, splits on ',' and appends the first element, "Epoch = ...", to the epoch list.
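
The lists built this way hold strings such as 'Epoch = 1', which still need converting to numbers before plotting. One possible way to do that is sketched below. The helper name value_of is hypothetical, and the two lists simply repeat the output shown above.

```python
def value_of(label):
    """Turn a string such as 'Epoch = 1' into the float after the '='."""
    name, sep, value = label.partition('=')
    if not sep:
        raise ValueError("no '=' in %r" % label)
    return float(value)  # float() ignores surrounding whitespace


# Same contents as the printed output above.
epoch = ['Epoch = 1']
test_error = ['Test set error = 37.2317']

print([value_of(s) for s in epoch])       # [1.0]
print([value_of(s) for s in test_error])  # [37.2317]
```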


0

You never initialize the variable epoch. It is important that you do so before this line:

epoch += line.split('Epoch = ')
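
For example, the loop from the question might be adjusted along these lines. This is a sketch, not a drop-in fix: the two sample lines from the question stand in for log.txt, and taking element [1] of the split assumes the marker text occurs exactly once per line.

```python
# The two sample lines from the question stand in for log.txt here.
FILE = [
    '# 2014-05-09 17:51:50,473 - root - INFO - Epoch = 1, batch = 216, Classif Err = 52.926, lg(p) -1.0350\n',
    '# 2014-05-09 17:51:53,749 - root - INFO - Test set error = 37.2317\n',
]

epoch = []     # initialized before the loop, so appending works
test_err = []

for line in FILE:
    line = line.strip()
    if 'Epoch' in line:
        # text after 'Epoch = ' up to the next comma, e.g. '1'
        epoch.append(int(line.split('Epoch = ')[1].split(',')[0]))
    elif 'Test set error' in line:
        # text after 'Test set error = ', e.g. '37.2317'
        test_err.append(float(line.split('Test set error = ')[1]))

print(epoch)     # [1]
print(test_err)  # [37.2317]
```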
