I am new to regular expressions and Python. I have data stored in a log file that I need to extract using a regular expression. Below is the format:

#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
     0         1000         0.01         0.03         0.02
     4         1000       177.69       177.88       177.79
     8         1000       175.90       176.07       176.01
    16         1000       181.51       181.73       181.60
    32         1000       199.64       199.81       199.72
    64         1000       228.10       228.27       228.19
   128         1000       278.70       278.90       278.75
   256         1000       388.26       388.49       388.39
   512         1000       593.49       593.82       593.63
  1024         1000      1044.27      1044.90      1044.59

  • What is the desired output? What have you tried so far? Commented Apr 16, 2013 at 11:55
  • How is this file formatted? Tab-separated? CSV? Commented Apr 16, 2013 at 11:57
  • docs.python.org/2/library/csv.html Commented Apr 16, 2013 at 12:03
  • So... since when is SO a platform for requesting unpaid work? Commented Apr 16, 2013 at 13:03

4 Answers

You can use either str.split() or a regex to get a specific column; split is cleaner in this case:

import re

with open("input") as input_file:
    for line in input_file:
        # skip blank lines and the "#bytes ..." header
        if not line.strip() or line.startswith("#"):
            continue
        # using split to get the 4th column (t_max[usec])
        print(line.split()[3])
        # using a regex to get the 4th column
        print(re.match(r'^\s*(?:\S+\s+){3}(\S+)', line).group(1))
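To check that the two approaches agree, here is a minimal sketch that wraps each one in a function and runs both on a sample line from the question. The helper names column_by_split and column_by_regex are hypothetical, not part of the answer above; both return None instead of raising when the line has too few fields.

```python
import re


# Hypothetical helpers: return the zero-based column `index` of a
# whitespace-separated line, or None if the line has too few fields.
def column_by_split(line, index):
    fields = line.split()
    return fields[index] if index < len(fields) else None


def column_by_regex(line, index):
    # Skip `index` fields (each followed by whitespace), then capture the next one
    match = re.match(r"^\s*(?:\S+\s+){%d}(\S+)" % index, line)
    return match.group(1) if match else None


sample = "   4         1000       177.69       177.88       177.79\n"
print(column_by_split(sample, 3))
print(column_by_regex(sample, 3))
```

Both calls pick out the t_max[usec] value of that row. The split version is easier to read, while the regex version is only worth it if you later need to validate the field format at the same time.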

If you need to use regular expressions, then this script does the trick:

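import re

# one number: an integer or a decimal, captured as a group
number_pattern = r'(\d+(?:\.\d+)?)'
# five numbers separated by whitespace, optional whitespace at both ends
line_pattern = r'^\s*%s\s*$' % r'\s+'.join([number_pattern] * 5)

with open('data', 'r') as f:
    for line in f:
        match = re.match(line_pattern, line)
        if match is not None:  # the header and blank lines don't match
            print(match.groups())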
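If you want the values as numbers rather than strings, one option is to give each group a name and convert the values after matching. This is a minimal sketch assuming the five-column layout shown in the question; the group names (bytes, repetitions, t_min, t_max, t_avg) and the parse_line helper are hypothetical choices for illustration.

```python
import re

# One named group per column; re.VERBOSE lets the pattern span several lines.
LINE_RE = re.compile(
    r"""^\s*
    (?P<bytes>\d+)\s+
    (?P<repetitions>\d+)\s+
    (?P<t_min>\d+(?:\.\d+)?)\s+
    (?P<t_max>\d+(?:\.\d+)?)\s+
    (?P<t_avg>\d+(?:\.\d+)?)
    \s*$""",
    re.VERBOSE,
)


def parse_line(line):
    """Return a dict of numbers for a data row, or None for any other line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Byte counts and repetitions are integers; timings are microseconds
    return {
        "bytes": int(row["bytes"]),
        "repetitions": int(row["repetitions"]),
        "t_min": float(row["t_min"]),
        "t_max": float(row["t_max"]),
        "t_avg": float(row["t_avg"]),
    }


print(parse_line("  1024         1000      1044.27      1044.90      1044.59\n"))
print(parse_line("#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]\n"))
```

The header line starts with "#", so the first \d+ fails and parse_line returns None for it.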

You just need the pattern (\S+), which matches each run of non-whitespace characters:

import re

# each run of non-whitespace characters is one field
pattern = re.compile(r'(\S+)')
with open('data.txt', 'r') as f:
    for line in f:
        print(pattern.findall(line))
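Note that findall returns every field as a string, including the column names on the header line. Here is a minimal sketch that skips the header and converts each data row to floats. The numeric_rows helper is hypothetical, and it assumes the header is the only line whose first field starts with "#", as in the question.

```python
import re

FIELD = re.compile(r"\S+")


def numeric_rows(lines):
    """Yield each data row as a list of floats, skipping blank lines and '#' headers."""
    for line in lines:
        fields = FIELD.findall(line)
        if not fields or fields[0].startswith("#"):
            continue
        yield [float(x) for x in fields]


sample = [
    "#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]\n",
    "   0         1000         0.01         0.03         0.02\n",
    "   4         1000       177.69       177.88       177.79\n",
]
for row in numeric_rows(sample):
    print(row)
```

Because numeric_rows accepts any iterable of lines, you can pass it an open file object directly.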

You can also do it the other way around, splitting on whitespace instead of matching the fields:

import re

whitespace = re.compile(r'\s+')
with open('data.txt', 'r') as f:
    for line in f:
        # strip first so leading/trailing whitespace doesn't produce empty fields
        print(whitespace.split(line.strip()))
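For comparison, Python's built-in str.split() with no arguments already splits on runs of whitespace and ignores leading and trailing whitespace, so the compiled pattern is not strictly needed. This small check runs both on a line from the question:

```python
import re

whitespace = re.compile(r"\s+")
line = "  1024         1000      1044.27      1044.90      1044.59\n"

by_regex = whitespace.split(line.strip())
# No separator argument: split on any whitespace run, ignore both ends
by_str = line.split()
print(by_regex == by_str)
print(by_str)
```

One difference is blank lines. whitespace.split("".strip()) returns [''], while "".split() returns [], so the built-in version is slightly easier to filter.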

You could use the genfromtxt function from numpy instead:

>>> import numpy as np
>>> a = np.genfromtxt("yourlogfile.dat",skip_header=1)

a will be a 2-D array of floats holding all your data, with one row per line of the file and one column per field.
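Each column can then be pulled out by index. Here is a minimal sketch that reads three of the question's rows from an in-memory string via io.StringIO, in place of yourlogfile.dat. The excerpt and the variable names message_sizes and t_avg are for illustration only.

```python
import io

import numpy as np

text = """#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0         1000         0.01         0.03         0.02
   4         1000       177.69       177.88       177.79
   8         1000       175.90       176.07       176.01
"""

# skip_header=1 drops the column-name line; everything else is parsed as float
a = np.genfromtxt(io.StringIO(text), skip_header=1)
print(a.shape)  # (rows, columns)

message_sizes = a[:, 0]  # #bytes column
t_avg = a[:, 4]          # t_avg[usec] column
print(message_sizes)
print(t_avg)
```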