2

I have text file which contains data for two tables with two column headers. I want to read each line and save lines or rows in two files for each table. I want lines after (number amount) columns header in one file and (code volume dim) column header in other. Or, I want to separate data for similar table.

First few lines of the file. Number of rows in tables may vary.

    number amount
    10 34
    23 65
    54 07

    code volume dim
    1 56 34
    23 57 565

    number amount
    40 674
    73 2365

    code volume dim
    341 3456 6534
    23 0957 908565

    number amount
    210 4534
    2343 4565

same pattern repeats.

1
  • Are number same in all file? Are all numbers integers? Commented Jan 28, 2016 at 10:05

1 Answer 1

1

If text number is same in all file, and text code too, you can use read_csv and then filter columns rows by subset by contains and isnull, notnull:

import pandas as pd
import numpy as np
import io

temp=u""" number amount
    10 34
    23 65
    54 07

    code volume dim
    1 56 34
    23 57 565

    number amount
    40 674
    73 2365

    code volume dim
    341 3456 6534
    23 0957 908565

    number amount
    210 4534
    2343 4565"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep="\s+", index_col=None, skip_blank_lines=True, names=['a', 'b', 'c'])
print df
         a       b       c
0   number  amount     NaN
1       10      34     NaN
2       23      65     NaN
3       54      07     NaN
4     code  volume     dim
5        1      56      34
6       23      57     565
7   number  amount     NaN
8       40     674     NaN
9       73    2365     NaN
10    code  volume     dim
11     341    3456    6534
12      23    0957  908565
13  number  amount     NaN
14     210    4534     NaN
15    2343    4565     NaN
df = df[~((df.a.str.contains('number')) | (df.a.str.contains('code')))]

df1 = df[df.c.isnull()]
df1 = df1[['a', 'b']].reset_index(drop=True)
df1.columns = ['number','amount']

#if all numbers are integers
df1 = df1.astype(int)
print df1
   number  amount
0      10      34
1      23      65
2      54       7
3      40     674
4      73    2365
5     210    4534
6    2343    4565

df2 = df[df.c.notnull()].reset_index(drop=True)
df2.columns = ['code','volume', 'dim']
#if all numbers are integers
df2 = df2.astype(int)
print df2
   code  volume     dim
0     1      56      34
1    23      57     565
2   341    3456    6534
3    23     957  908565
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.