I'm trying to parse this ifconfig output. I have seen another example on Stack Overflow that used this same code to build a nested list. However, when I do the same thing, I only get the first match. I would also like to add the RX and TX packet counts to the list, and that does not seem to work either.

Ifconfig output

Mg0_RSP0_CPU0_0 Link encap:Ethernet  HWaddr 70:e4:22:32:53:42
          inet addr:20.200.130.1  Mask:255.255.0.0
          inet6 addr: fe80::72e4:22ff:fe32:5342/64 Scope:Link
          UP RUNNING NOARP MULTICAST  MTU:1514  Metric:1
          RX packets:147918 errors:0 dropped:0 overruns:0 frame:0
          TX packets:119226 errors:0 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:1000
          RX bytes:103741434 (98.9 MiB)  TX bytes:5320623 (5.0 MiB)

Tg0_0_0_7_0 Link encap:Ethernet  HWaddr 78:ba:f9:35:66:46
          inet addr:13.13.13.1  Mask:255.255.255.0
          inet6 addr: fe80::7aba:f9ff:fe35:6646/64 Scope:Link
          UP RUNNING NOARP MULTICAST  MTU:1514  Metric:1
          RX packets:26 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5058 errors:0 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:1000
          RX bytes:1832 (1.7 KiB)  TX bytes:454625 (443.9 KiB)

Script

import re

# Compile the pattern once, outside the loop.
ma = re.compile(r"^(\S+).*?inet addr:(\S+).*?Mask:(\S+)", re.MULTILINE | re.DOTALL)

c = []
for paragraph in if_config_output.split('\n\n'):

    result = ma.match(paragraph)

    if result is not None:

        interface = result.group(1)
        ip = result.group(2)
        mask = result.group(3)

        #print("interface:", interface)
        #print("ip:", ip)
        #print("mask:", mask)

        c.append([interface, ip, mask])

print(c)

In [145]: c
Out[145]: [['Mg0_RSP0_CPU0_0', '1.83.53.27', '255.255.0.0']]
  • I ran your code (on Windows) and got the records for both interfaces. Commented Feb 18, 2016 at 0:11

2 Answers

I tested your code and at first got only one result, the second one:

>>> ['Tg0_0_0_7_0', '13.13.13.1', '255.255.255.0']

Then I looked closely at your regex. It appears you might have an additional newline before the second paragraph, as I had before my first. re.match anchors at the start of the string and \S cannot match a newline, so the match fails for that paragraph. If I am right about why you are getting a single result, you could fix it by adding \s? to the beginning of your regex:

\s?^(\S+).*?inet addr:(\S+).*?Mask:(\S+)
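
To make the suspected cause concrete, here is a minimal sketch. The `paragraph` string is a hypothetical, shortened interface block with a stray leading newline, and the names `strict` and `lenient` are only for illustration. It compares the original pattern, the `\s?` variant, and the alternative of stripping the paragraph before matching.

```python
import re

FLAGS = re.MULTILINE | re.DOTALL

# Pattern from the question, written as a raw string.
strict = re.compile(r"^(\S+).*?inet addr:(\S+).*?Mask:(\S+)", FLAGS)
# Variant that tolerates one leading whitespace character.
lenient = re.compile(r"\s?^(\S+).*?inet addr:(\S+).*?Mask:(\S+)", FLAGS)

# Hypothetical paragraph that starts with a stray newline.
paragraph = (
    "\nTg0_0_0_7_0 Link encap:Ethernet  HWaddr 78:ba:f9:35:66:46\n"
    "          inet addr:13.13.13.1  Mask:255.255.255.0\n"
)

# match() only tries position 0, where \S+ meets the newline and fails.
strict_result = strict.match(paragraph)
# \s? consumes the newline, then ^ matches at the start of the next line.
lenient_result = lenient.match(paragraph)
# Stripping the paragraph first lets the original pattern match unchanged.
stripped_result = strict.match(paragraph.strip())

print(strict_result)
print(lenient_result.groups())
print(stripped_result.groups())
```

Stripping each paragraph before matching is arguably the more robust choice, since `\s?` only absorbs a single whitespace character.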

Or, if this is simply a matter of retrieving the interface and IP, you might use a simpler and possibly faster split.
I'll even time both with timeit, in case someone is curious:

import timeit
import re

if_config_output = """
Mg0_RSP0_CPU0_0 Link encap:Ethernet  HWaddr 70:e4:22:32:53:42
          inet addr:20.200.130.1  Mask:255.255.0.0
          inet6 addr: fe80::72e4:22ff:fe32:5342/64 Scope:Link
          UP RUNNING NOARP MULTICAST  MTU:1514  Metric:1
          RX packets:147918 errors:0 dropped:0 overruns:0 frame:0
          TX packets:119226 errors:0 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:1000
          RX bytes:103741434 (98.9 MiB)  TX bytes:5320623 (5.0 MiB)

Tg0_0_0_7_0 Link encap:Ethernet  HWaddr 78:ba:f9:35:66:46
          inet addr:13.13.13.1  Mask:255.255.255.0
          inet6 addr: fe80::7aba:f9ff:fe35:6646/64 Scope:Link
          UP RUNNING NOARP MULTICAST  MTU:1514  Metric:1
          RX packets:26 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5058 errors:0 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:1000
          RX bytes:1832 (1.7 KiB)  TX bytes:454625 (443.9 KiB)
"""

# Raw string so that \s and \S are not treated as string escapes.
ma = re.compile(r"^\s?(\S+).*?inet addr:(\S+).*?Mask:(\S+)", re.MULTILINE|re.DOTALL)

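def split(paragraph):
    """ ugly, but faster """
    # Strip first so a leading newline does not shift the line indexes.
    lines = paragraph.strip().split("\n")
    interface = lines[0].split(" Link ")[0]
    # "inet addr:IP  Mask:MASK" -> ['inet', 'addr', IP, 'Mask', MASK]
    fields = lines[1].replace(":", " ").split()
    ip, mask = fields[2], fields[4]
    return [interface, ip, mask]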

def regex(paragraph):

    result = ma.match(paragraph)
    if result:
        result = ma.match(paragraph)
        interface = result.group(1)
        ip = result.group(2)
        mask = result.group(3)
        return [interface, ip, mask]

def test_split():
    c = []
    for paragraph in if_config_output.split('\n\n'):
        c.append(split(paragraph))
    return len(c)

def test_regex():
    c = []
    for paragraph in if_config_output.split('\n\n'):
        c.append(regex(paragraph))
    return len(c)

print ("split", timeit.timeit(stmt=test_split, number=100000))
print ("regex", timeit.timeit(stmt=test_regex, number=100000))

results

$ python --version
Python 2.7.3
$ python test.py
('split', 3.096487045288086)
('regex', 5.066282033920288)
$ python3 --version
Python 3.2.3
$ python3 test.py
split 4.155041933059692
regex 4.875624895095825
$ python3 test.py
split 4.787220001220703
regex 5.695119857788086

Anyone with Python 3.5 care to join?

Huh, strangely inconclusive.

results from repl.it/languages/python3 (Python 3.4.0)
split 1.2351078800020332
regex 1.3363793969983817

results from ideone.com (Python 2.7.9) 
('split', 0.9004449844360352)
('regex', 0.7017428874969482)

and from ideone.com (Python 3.4.3+)
split 1.2050538789480925
regex 1.7611852046102285 
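
The question also asks for the RX and TX packet counts. As a hedged sketch only (the function name `parse_ifconfig` and the extended pattern are assumptions, not part of the original code), the same regex can be given two more groups. Each paragraph is stripped before matching so that a leading newline cannot break the match.

```python
import re

# Hypothetical extension of the pattern: two extra groups for the packet counters.
PATTERN = re.compile(
    r"^(\S+).*?inet addr:(\S+).*?Mask:(\S+)"
    r".*?RX packets:(\d+).*?TX packets:(\d+)",
    re.MULTILINE | re.DOTALL,
)

def parse_ifconfig(output):
    """Return [interface, ip, mask, rx_packets, tx_packets] for each block."""
    rows = []
    for paragraph in output.split("\n\n"):
        # strip() removes stray leading/trailing newlines around a block.
        result = PATTERN.match(paragraph.strip())
        if result:
            interface, ip, mask, rx, tx = result.groups()
            rows.append([interface, ip, mask, int(rx), int(tx)])
    return rows

sample = """
Mg0_RSP0_CPU0_0 Link encap:Ethernet  HWaddr 70:e4:22:32:53:42
          inet addr:20.200.130.1  Mask:255.255.0.0
          inet6 addr: fe80::72e4:22ff:fe32:5342/64 Scope:Link
          UP RUNNING NOARP MULTICAST  MTU:1514  Metric:1
          RX packets:147918 errors:0 dropped:0 overruns:0 frame:0
          TX packets:119226 errors:0 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:1000
          RX bytes:103741434 (98.9 MiB)  TX bytes:5320623 (5.0 MiB)

Tg0_0_0_7_0 Link encap:Ethernet  HWaddr 78:ba:f9:35:66:46
          inet addr:13.13.13.1  Mask:255.255.255.0
          inet6 addr: fe80::7aba:f9ff:fe35:6646/64 Scope:Link
          UP RUNNING NOARP MULTICAST  MTU:1514  Metric:1
          RX packets:26 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5058 errors:0 dropped:0 overruns:0 carrier:3
          collisions:0 txqueuelen:1000
          RX bytes:1832 (1.7 KiB)  TX bytes:454625 (443.9 KiB)
"""

rows = parse_ifconfig(sample)
for row in rows:
    print(row)
```

Interfaces without an `inet addr` line (for example, one that has no IPv4 address) simply produce no row in this sketch, because the pattern requires every field to be present.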

6 Comments

Thanks. I tried your regex suggestion but wound up with the following output: Out[147]: [['Mg0_RSP0_CPU0_0', '1.83.53.27', '255.255.0.0']]
Do you get the data from ifconfig directly, or do you keep the results in a string? Add the following to your loop to examine what you actually get, and post it here: print (">{}<".format(paragraph[:5]))
To answer your question: I execute the ifconfig command on my Linux box and store the output in a variable (if_config_output). Here is its type: >>> In [170]: type(if_config_output) Out[170]: str
This is strange to me. When I run the code on my machine with the copied ifconfig output in a variable, I get all the matches. But if I run the whole code, where the system executes ifconfig and stores the output in a variable, I get only one group of matches.
Then most probably the cause is, as I wrote in the answer, that your regex stops on a newline character. The output you copy from the screen differs because you don't copy the leading newline. If you want me to help with subprocess or whatever you are using to get the output, you'll need to update the question with that code.

Your measurement is incorrect: the regex function calls ma.match(paragraph) twice, so its timing includes a redundant match.

    result = ma.match(paragraph)
    if result:
        result = ma.match(paragraph)

Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
split 0.569942316
regex 0.643881852
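
For anyone who wants to check the effect of the duplicate call on their own machine, here is a rough sketch. The helper names `regex_twice` and `regex_once`, the single-paragraph input, and the iteration count are assumptions for illustration. Timings vary between machines and Python versions, so no expected numbers are given.

```python
import re
import timeit

ma = re.compile(r"^\s?(\S+).*?inet addr:(\S+).*?Mask:(\S+)", re.MULTILINE | re.DOTALL)

# One interface block, shortened from the ifconfig output in the question.
paragraph = (
    "Mg0_RSP0_CPU0_0 Link encap:Ethernet  HWaddr 70:e4:22:32:53:42\n"
    "          inet addr:20.200.130.1  Mask:255.255.0.0\n"
    "          inet6 addr: fe80::72e4:22ff:fe32:5342/64 Scope:Link\n"
)

def regex_twice(paragraph):
    """Matches twice, as in the benchmarked function."""
    result = ma.match(paragraph)
    if result:
        result = ma.match(paragraph)
        return [result.group(1), result.group(2), result.group(3)]

def regex_once(paragraph):
    """Matches once and reuses the match object."""
    result = ma.match(paragraph)
    if result:
        return list(result.groups())

# Both variants must return the same fields; only the cost differs.
print(regex_once(paragraph))
print("twice", timeit.timeit(lambda: regex_twice(paragraph), number=10000))
print("once", timeit.timeit(lambda: regex_once(paragraph), number=10000))
```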
