Read rows in text file based on different column headers using Python

Question

I have text file which contains data for two tables with two column headers. I want to read each line and save lines or rows in two files for each table. I want lines after (number amount) columns header in one file and (code volume dim) column header in other. Or, I want to separate data for similar table.

First few lines of the file. Number of rows in tables may vary.

    number amount
    10 34
    23 65
    54 07

    code volume dim
    1 56 34
    23 57 565

    number amount
    40 674
    73 2365

    code volume dim
    341 3456 6534
    23 0957 908565

    number amount
    210 4534
    2343 4565

same pattern repeats.

Are number same in all file? Are all numbers integers?

jezrael
– jezrael

2016-01-28 10:05:58 +00:00
Commented Jan 28, 2016 at 10:05 — jezrael
– jezrael, Commented Jan 28, 2016 at 10:05

jezrael · Accepted Answer · 2016-01-28 10:06:52Z

If text number is same in all file, and text code too, you can use read_csv and then filter columns rows by subset by contains and isnull, notnull:

import pandas as pd
import numpy as np
import io

temp=u""" number amount
    10 34
    23 65
    54 07

    code volume dim
    1 56 34
    23 57 565

    number amount
    40 674
    73 2365

    code volume dim
    341 3456 6534
    23 0957 908565

    number amount
    210 4534
    2343 4565"""

#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep="\s+", index_col=None, skip_blank_lines=True, names=['a', 'b', 'c'])
print df
         a       b       c
0   number  amount     NaN
1       10      34     NaN
2       23      65     NaN
3       54      07     NaN
4     code  volume     dim
5        1      56      34
6       23      57     565
7   number  amount     NaN
8       40     674     NaN
9       73    2365     NaN
10    code  volume     dim
11     341    3456    6534
12      23    0957  908565
13  number  amount     NaN
14     210    4534     NaN
15    2343    4565     NaN

df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))]

df1 = df[df.c.isnull()]
df1 = df1[['a', 'b']].reset_index(drop=True)
df1.columns = ['number','amount']

#if all numbers are integers
df1 = df1.astype(int)
print df1
   number  amount
0      10      34
1      23      65
2      54       7
3      40     674
4      73    2365
5     210    4534
6    2343    4565

df2 = df[df.c.notnull()].reset_index(drop=True)
df2.columns = ['code','volume', 'dim']
#if all numbers are integers
df2 = df2.astype(int)
print df2
   code  volume     dim
0     1      56      34
1    23      57     565
2   341    3456    6534
3    23     957  908565

Collectives™ on Stack Overflow

Read rows in text file based on different column headers using Python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related