Regular expression to extract data from a table in python

Question

I am new to regular expressions and Python. I have data stored in a log file that I need to extract using a regular expression. Below is the format:

#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
   0         1000         0.01         0.03         0.02
   4         1000       177.69       177.88       177.79
   8         1000       175.90       176.07       176.01
  16         1000       181.51       181.73       181.60
  32         1000       199.64       199.81       199.72
  64         1000       228.10       228.27       228.19
  28         1000       278.70       278.90       278.75
  256         1000       388.26       388.49       388.39
  512         1000       593.49       593.82       593.63
  1024         1000      1044.27      1044.90      1044.59

So... since when is SO a platform for requesting unpaid hiring? – Niklas R, Commented Apr 16, 2013 at 13:03

perreal · Accepted Answer · 2013-04-16 12:07:10Z

3

You can use split or a regex to get a specific column. Split is cleaner in this case:

import re

with open("input") as input_file:
    for line in input_file:
        if not line.strip():
            continue  # skip blank lines, which have no 4th column
        # using split to get the 4th column (t_max[usec])
        print(line.split()[3])
        # using a regex to get the 4th column
        match = re.match(r'^\s*(?:\S+\s+){3}(\S+)', line)
        if match:
            print(match.group(1))
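Instead of hard-coding index 3, one option is to look the column up by its name in the header row. This is a minimal sketch, assuming the header is the first line and that its names may carry a leading '#' (as in '#bytes #repetitions'); column_by_name is a hypothetical helper, not a library function:

```python
def column_by_name(lines, name):
    """Return the values of the named column as strings.

    Hypothetical helper: assumes lines[0] is the header row and that
    column names may start with '#', as in '#bytes #repetitions'.
    """
    header = [field.lstrip('#') for field in lines[0].split()]
    index = header.index(name)  # raises ValueError if the name is unknown
    # skip blank lines so split() never returns an empty list
    return [line.split()[index] for line in lines[1:] if line.strip()]


sample = [
    "#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]",
    "   0         1000         0.01         0.03         0.02",
    "   4         1000       177.69       177.88       177.79",
]
print(column_by_name(sample, "t_max[usec]"))  # ['0.03', '177.88']
```

With a real file, pass the result of open("input").readlines() as lines.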

answered Apr 16, 2013 at 12:07

perreal


ganzogo · Accepted Answer · 2013-04-16 12:08:41Z

0

If you need to use regular expressions, then this script does the trick:

import re

# one number: an integer or a decimal such as 177.69
number_pattern = r'(\d+(?:\.\d+)?)'
# five numbers separated by whitespace, optionally indented
line_pattern = r'^\s*%s\s*$' % r'\s+'.join([number_pattern] * 5)

with open('data', 'r') as f:
    for line in f:
        match = re.match(line_pattern, line)
        if match is not None:
            print(match.groups())
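A variation on the same idea uses named groups, so each field can be read by name and converted to a number. This is a sketch; the group names (bytes, repetitions, t_min, t_max, t_avg) are hypothetical labels derived from the header, with '[usec]' dropped because group names must be valid identifiers:

```python
import re

# Hypothetical group names taken from the header row.
ROW = re.compile(
    r'^\s*(?P<bytes>\d+)\s+(?P<repetitions>\d+)'
    r'\s+(?P<t_min>\d+(?:\.\d+)?)'
    r'\s+(?P<t_max>\d+(?:\.\d+)?)'
    r'\s+(?P<t_avg>\d+(?:\.\d+)?)\s*$'
)


def parse_row(line):
    """Return a dict for a data row, or None for the header or blank lines."""
    match = ROW.match(line)
    if match is None:
        return None
    # the first two columns are integers, the timings are floats
    return {key: int(value) if key in ('bytes', 'repetitions') else float(value)
            for key, value in match.groupdict().items()}


print(parse_row("  16         1000       181.51       181.73       181.60"))
# {'bytes': 16, 'repetitions': 1000, 't_min': 181.51, 't_max': 181.73, 't_avg': 181.6}
```

The header line starts with '#', so it does not match and parse_row returns None for it.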

answered Apr 16, 2013 at 12:08

ganzogo


Muayyad Alsadi · Accepted Answer · 2013-04-16 12:14:40Z

0

You just need (\S+):

import re

pattern = re.compile(r'(\S+)')
with open('data.txt', 'r') as f:
    for l in f:
        # findall returns every run of non-whitespace characters on the line
        print(pattern.findall(l))
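(\S+) also matches the header words. If only the numbers are wanted, a narrower pattern can be given to findall; because the header in this log contains no digits, findall then returns an empty list for it, which makes it easy to skip. A sketch under that assumption (NUMBER and numeric_rows are hypothetical names):

```python
import re

# an integer or a decimal number such as 177.69
NUMBER = re.compile(r'\d+(?:\.\d+)?')


def numeric_rows(lines):
    """Yield each line's numbers as floats, skipping lines with no numbers."""
    for line in lines:
        values = NUMBER.findall(line)  # no capture groups, so whole matches are returned
        if values:
            yield [float(v) for v in values]


sample = [
    "#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]",
    "   8         1000       175.90       176.07       176.01",
]
print(list(numeric_rows(sample)))  # [[8.0, 1000.0, 175.9, 176.07, 176.01]]
```

Note that this would also pick up digits embedded in words, so it relies on the header containing none.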

You can also do it the other way, by splitting on whitespace:

import re

whitespace = re.compile(r'\s+')
with open('data.txt', 'r') as f:
    for l in f:
        # strip() removes leading/trailing whitespace so split() yields no empty edge fields
        print(whitespace.split(l.strip()))
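For whitespace-separated data like this, the compiled \s+ split gives the same result as calling str.split() with no argument, which already splits on runs of whitespace and drops leading and trailing whitespace. A quick check on one line from the sample:

```python
import re

whitespace = re.compile(r'\s+')
line = "  1024         1000      1044.27      1044.90      1044.59\n"

via_regex = whitespace.split(line.strip())
via_str = line.split()  # no argument: split on runs of whitespace, drop empty fields
print(via_regex == via_str)  # True
print(via_str)  # ['1024', '1000', '1044.27', '1044.90', '1044.59']
```

One difference: for a blank line, whitespace.split('') returns [''] while ''.split() returns [].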

answered Apr 16, 2013 at 12:14

Muayyad Alsadi


Lee · Accepted Answer · 2013-04-16 12:59:36Z

0

You could use the genfromtxt function from NumPy instead:

>>> import numpy as np
>>> a = np.genfromtxt("yourlogfile.dat", skip_header=1)

a will be a two-dimensional array of floats holding all your data, one row per line of the file.
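If NumPy is not available, a rough standard-library stand-in is to skip comment and blank lines and convert each remaining line to floats; zip(*rows) then turns the rows into columns. This is only a sketch (load_table is a hypothetical helper) and does not replicate genfromtxt's handling of missing values, dtypes, or comments that appear mid-line:

```python
def load_table(lines):
    """Return the numeric rows of a whitespace-separated table as lists of floats.

    Hypothetical helper: blank lines and lines starting with '#' (after
    leading spaces) are skipped, loosely like genfromtxt's comments='#'.
    """
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue
        rows.append([float(field) for field in stripped.split()])
    return rows


sample = [
    "#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]",
    "   0         1000         0.01         0.03         0.02",
    "   4         1000       177.69       177.88       177.79",
]
rows = load_table(sample)
columns = list(zip(*rows))  # columns[3] holds the t_max[usec] values
print(columns[3])  # (0.03, 177.88)
```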

answered Apr 16, 2013 at 12:59

Lee
