2

I believe this is a three-step process, so please bear with me. I'm currently reading shell output that is saved to a file, and the output looks like this:

Current Output:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 123.345.789:1234        0.0.0.0:*               LISTEN      23044/test          
tcp        0      0 0.0.0.0:5915            0.0.0.0:*               LISTEN      99800/./serv    
tcp        0      0 0.0.0.0:1501            0.0.0.0:*                           -    

I'm trying to access each column's data by its header value. I was able to do this in PowerShell, but I'm not sure how to achieve it in Python.

Expected Output:

Proto,Recv-Q,Send-Q,Local Address,Foreign Address,State,PID/Program name
tcp,0,0,123.345.789:1234,0.0.0.0:*,LISTEN,23044/test
tcp,0,0,0.0.0.0:5915,0.0.0.0:*,LISTEN,99800/./serv
tcp,0,0,0.0.0.0:1501,0.0.0.0:*,,-

proto = data["Proto"]
for p in proto:
    print(p)

Output: tcp tcp tcp

What I've tried:

Where do I begin? I've tried splitting, replacing and translating. I also tried a regex but couldn't quite figure it out. This is the kind of result I get:

Proto,Recv-Q,Send-Q,Local,Address,,,,,,,,,,,Foreign Address,,,,,,,,,State,,,,,, PID/Program,name    
tcp,,,,,,,,0,,,,,,0 123.345.789:1234,,,,,,,,0.0.0.0:*,,,,,,,,,,,,,,,LISTEN,,,,,,23021/java,,,,,,,,  
tcp,,,,,,,,0,,,,,,0 0.0.0.0:5915,,,,,,,,,,,,0.0.0.0:*,,,,,,,,,,,,,,,LISTEN,,,,,,99859/./statserv    
tcp,,,,,,,,0,,,,,,0 0.0.0.0:1501,,,,,,,,,,,,0.0.0.0:*,,,,,,,,,,,,,,,LISTEN,,,,,,-       

Since some of the headers contain a space (for example "Local Address"), it's difficult to map the columns correctly.

Looking for the best way to approach this.

Thank you.

3 Answers

2

Answer updated to handle missing State value

Skip the first row, indicate that there is no header, assign header names and then split on one or more spaces.

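import pandas as pd

# sim_txt is the path (or file-like object) holding the saved output;
# row_fixer is defined further down and must exist before this runs.
df = pd.read_csv(sim_txt, skiprows=1, header=None, sep=r'\s+',
                 names=['Proto','cv-Q','Send-Q','Local Address','Foreign Address','State','PID/Program name']
                ).apply(row_fixer, axis=1)
print(df)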

  Proto  cv-Q  Send-Q     Local Address Foreign Address   State  PID/Program name
0   tcp     0       0  123.345.789:1234       0.0.0.0:*  LISTEN        23044/test
1   tcp     0       0      0.0.0.0:5915       0.0.0.0:*  LISTEN      99800/./serv
2   tcp     0       0      0.0.0.0:5916       0.0.0.0:*     NaN      99801/./serv
3   tcp     0       0      0.0.0.0:1501       0.0.0.0:*  LISTEN                 -

df.to_csv('output.csv', index=None)

The above depends on the following function. It looks for a NaN in the last column of the row, which indicates that the State value is missing. When that situation is found, the value that landed in the State column is moved to the last column and State is set to NaN. (Note: this function detects NaNs by leveraging the fact that NaN != NaN):

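import numpy as np

def row_fixer(x):
    # NaN is the only value that is not equal to itself, so this is True
    # only when the last column (PID/Program name) came out empty.
    if x.iat[-1] != x.iat[-1]:
        xc = x.copy()
        # Move the misplaced value into the last column and blank out State.
        xc.iat[-1] = xc.iat[-2]
        xc.iat[-2] = np.nan  # np.NaN was removed in NumPy 2.0
        return xc
    return x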

The example above is based on the following example data:

Proto  cv-Q  Send-Q     Local Address Foreign Address   State  PID/Program name
  tcp     0       0  123.345.789:1234       0.0.0.0:*  LISTEN        23044/test
  tcp     0       0      0.0.0.0:5915       0.0.0.0:*  LISTEN      99800/./serv
  tcp     0       0      0.0.0.0:5916       0.0.0.0:*              99801/./serv
  tcp     0       0      0.0.0.0:1501       0.0.0.0:*  LISTEN                 -
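
The NaN check inside row_fixer can look odd at first. As a small standalone sketch (standard library only, not part of the pandas code above), this shows why comparing a value with itself works as a NaN test; math.isnan is the more explicit spelling for plain floats:

```python
import math

nan = float("nan")

# NaN is the only float that compares unequal to itself.
is_nan_by_compare = nan != nan      # self-comparison trick used in row_fixer
is_nan_by_math = math.isnan(nan)    # explicit equivalent for floats

# Ordinary values, including strings such as "LISTEN", equal themselves,
# so the same comparison leaves complete rows untouched.
state = "LISTEN"
state_is_nan = state != state

print(is_nan_by_compare, is_nan_by_math, state_is_nan)
```

The self-comparison has the advantage of working on a row that mixes strings and floats, where math.isnan would raise a TypeError on the string values.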

2 Comments

Thanks for the answer. This works well; however, I forgot to mention that the "State" column sometimes has a missing value, meaning "LISTEN" is sometimes absent. This causes the data in the neighbouring columns to be shifted.
@Jona Updated my answer to handle that situation
2

You are post-processing the output of the netstat command. netstat itself is just reformatting the information in /proc/net/tcp, which you can also read. As with the netstat output, you may need to make your own header line, but the data lines are all space separated. A simple line.split() should do it.

If you still want to use netstat, as I said, just throw away the header line and use split. You know what the columns are.

for ln in output:
    # split() with no arguments treats any run of whitespace as one separator
    fields = ln.split()
    print(','.join(fields))
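
If a plain split() leaves rows with a missing State one field short, one possible alternative is to slice every line at the offsets where the header names start. The sketch below is only an illustration under assumptions: the sample text is generated with format widths chosen to mimic the layout in the question, and parse_fixed_width is a hypothetical helper name, not something netstat or the standard library provides.

```python
COLUMNS = ["Proto", "Recv-Q", "Send-Q", "Local Address",
           "Foreign Address", "State", "PID/Program name"]

# Sample text built with format widths that mimic the layout in the question
# (the widths are an assumption for this sketch).
FMT = "{:<5} {:>6} {:>6} {:<23} {:<23} {:<11} {}"
sample_rows = [
    ("tcp", "0", "0", "123.345.789:1234", "0.0.0.0:*", "LISTEN", "23044/test"),
    ("tcp", "0", "0", "0.0.0.0:5915", "0.0.0.0:*", "LISTEN", "99800/./serv"),
    ("tcp", "0", "0", "0.0.0.0:1501", "0.0.0.0:*", "", "-"),  # State missing
]
lines = [FMT.format(*COLUMNS)] + [FMT.format(*row) for row in sample_rows]


def parse_fixed_width(lines, columns):
    """Slice each data line at the offsets where the header names begin."""
    header, *data = lines
    starts = [header.index(name) for name in columns]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    rows = []
    for line in data:
        if not line.strip():
            continue  # skip blank lines
        rows.append({name: line[start:end].strip()
                     for name, start, end in zip(columns, starts, ends)})
    return rows


rows = parse_fixed_width(lines, COLUMNS)
proto = [row["Proto"] for row in rows]
print(proto)
print(rows[2])
```

Because the columns are found by position, an empty State simply yields an empty string and the PID/Program name stays in its own column. The trade-off is that a value wider than its column (a very long address, say) would spill into the next slice, so this only fits output that keeps its alignment.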

2 Comments

Hey Tim, appreciate the reply. Believe it or not, after trying to follow this I am still unable to match up the data with the headers. Even after ignoring the headers, I am still left with countless spaces in between. Is there any other way of attempting this?
Remember that ln.split(' ') and ln.split() do two VERY different things. My guess is you are doing the first, and that would produce the results you describe. Passing no parameters to split treats a SERIES of whitespace as a single unit.
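
A quick way to see the difference described in this comment, using one data line copied from the question (a minimal sketch, nothing beyond built-in str methods):

```python
line = "tcp        0      0 0.0.0.0:5915            0.0.0.0:*               LISTEN      99800/./serv"

# Every single space is a separator, so each extra space yields an empty string.
by_single_space = line.split(" ")

# Runs of whitespace collapse into one separator, leaving only the real fields.
by_whitespace = line.split()

print(len(by_single_space), len(by_whitespace))
print(",".join(by_whitespace))
```

Joining by_single_space with commas would give the long runs of commas shown in the question, while by_whitespace gives one clean value per field.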
1

Split each line on runs of two or more whitespace characters using a regex.

import re

for ln in testset:
    # Strip the newline and edge whitespace, then split on 2+ whitespace characters.
    splitted = re.split(r'\s{2,}', ln.strip())
    print(splitted)
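
One caveat worth checking against the layout in the question: some neighbouring fields there are separated by a single space only (the Send-Q value and the local address, for instance), so splitting on two or more whitespace characters can leave them merged. A minimal sketch on one sample line from the question, compared with a plain split():

```python
import re

line = "tcp        0      0 0.0.0.0:5915            0.0.0.0:*               LISTEN      99800/./serv\n"

# Split on runs of two or more whitespace characters.
by_regex = re.split(r"\s{2,}", line.rstrip("\n"))

# Split on any run of whitespace.
by_split = line.split()

print(by_regex)   # the Send-Q value and the local address stay in one field
print(by_split)   # one entry per field
```

The two-or-more rule does keep multi-word headers such as "Local Address" together, so which variant fits depends on whether the header line or the data lines are being split.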

1 Comment

Thank you for taking the time and trying to help out! Appreciate it, Freeman.
