Python: How to read file and store certain columns in array

Question

I am reading a whitespace-separated dataset from a file. I need to store all columns except the last one in the array data, and the last column in the array target.

Can you guide me on how to proceed?

That's what I have so far:

with open(filename) as f:
    data = f.readlines()

Or should I be reading line by line?

PS: The columns also have different data types.

Edit: Sample Data

faban       1   0   0.288   withspy
faban       2   0   0.243   withoutspy
simulated   1   0   0.159   withoutspy
faban       1   1   0.189   withoutspy

If you're going to do some sort of analysis later, you can probably also look at pandas (pandas.pydata.org). It provides functionality to read in data from CSV files. You can then separate the columns and play around with the data in the way you wish.
– Akshay Damle, Commented Jan 4, 2016 at 7:43

Mike Müller · Accepted Answer · 2016-01-04 07:39:01Z

9

This would work:

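data = []
target = []
with open('faban.txt') as fobj:
    for line in fobj:
        row = line.split()       # split on any run of whitespace
        if not row:              # skip blank lines (row[-1] would raise IndexError)
            continue
        data.append(row[:-1])    # every column except the last
        target.append(row[-1])   # last column only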

Now:

>>> data
[['faban', '1', '0', '0.288'],
 ['faban', '2', '0', '0.243'],
 ['simulated', '1', '0', '0.159'],
 ['faban', '1', '1', '0.189']]

>>> target
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']
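Every value above is still a string. Since the question notes that the columns have different types, one option is to convert each field right after splitting. A minimal sketch, assuming the column layout of the sample data (str, int, int, float, then a string label); CONVERTERS and parse_lines are hypothetical names used only for illustration:

```python
# Assumed feature column types, inferred from the sample data: name, int, int, float.
CONVERTERS = (str, int, int, float)


def parse_lines(lines):
    """Split whitespace-separated lines into typed feature rows and string labels."""
    data = []
    target = []
    for line in lines:
        row = line.split()
        if not row:  # skip blank lines
            continue
        # Apply one converter per feature column; the last column stays a string.
        data.append([conv(value) for conv, value in zip(CONVERTERS, row[:-1])])
        target.append(row[-1])
    return data, target


sample = """\
faban       1   0   0.288   withspy
faban       2   0   0.243   withoutspy
simulated   1   0   0.159   withoutspy
faban       1   1   0.189   withoutspy
"""

data, target = parse_lines(sample.splitlines())
print(data)
print(target)
```

With a real file, the open file object can be passed directly, e.g. `with open('faban.txt') as fobj: data, target = parse_lines(fobj)`, since iterating a file yields lines. Note that zip stops at the shorter input, so a row with fewer fields than expected is not reported as an error here.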

answered Jan 4, 2016 at 7:39

Mike Müller

86k · 21 gold badges · 174 silver badges · 165 bronze badges


timgeb · Accepted Answer · 2016-01-04 08:18:25Z

4

I think NumPy has a clean, easy solution here.

>>> import numpy as np
>>> data, target = np.array_split(np.loadtxt('file', dtype=str), [-1], axis=1)

results in:

>>> data.tolist()
[['faban', '1', '0', '0.288'], 
 ['faban', '2', '0', '0.243'], 
 ['simulated', '1', '0', '0.159'], 
 ['faban', '1', '1', '0.189']]
>>> target.flatten().tolist()
['withspy', 'withoutspy', 'withoutspy', 'withoutspy']

edited Jan 4, 2016 at 8:18

answered Jan 4, 2016 at 7:56

timgeb

79.2k · 20 gold badges · 129 silver badges · 150 bronze badges


Anton Protopopov · Accepted Answer · 2016-01-04 09:29:54Z

3

You could do that with pandas: use read_table to read your data, iloc to select columns, values to get a NumPy array from the DataFrame, and the tolist method to convert that array to a list:

import pandas as pd

# sep=r'\s+' treats any run of whitespace as one separator
# (equivalent to delim_whitespace=True, which is deprecated since pandas 2.2)
df = pd.read_table('path_to_your_file', sep=r'\s+', header=None)
print(df)
#            0  1  2      3           4
# 0      faban  1  0  0.288     withspy
# 1      faban  2  0  0.243  withoutspy
# 2  simulated  1  0  0.159  withoutspy
# 3      faban  1  1  0.189  withoutspy

# All columns except the last, as a list of row lists
data = df.iloc[:, :-1].values.tolist()
# Last column only, as a flat list
target = df.iloc[:, -1].tolist()

print(data)
# [['faban', 1, 0, 0.28800000000000003],
#  ['faban', 2, 0, 0.243],
#  ['simulated', 1, 0, 0.159],
#  ['faban', 1, 1, 0.18899999999999997]]

print(target)
# ['withspy', 'withoutspy', 'withoutspy', 'withoutspy']

answered Jan 4, 2016 at 9:29

Anton Protopopov

31.9k · 13 gold badges · 93 silver badges · 96 bronze badges

1 Comment

Skippy le Grand Gourou Over a year ago

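read_table was deprecated in pandas 0.24 (the deprecation was reverted in 0.25); an equivalent is pd.read_csv('path_to_your_file', sep=r'\s+', header=None). A tab separator (sep='\t') would not split this space-aligned data. As a bonus, note that you can name the columns with names=['foo', 'bar', 'baz', 'qux', 'target'] (one name per column).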
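The read_csv route with named columns can be sketched as follows. The column names are illustrative placeholders, not part of the original data, and io.StringIO stands in for a real file path. Because the sample is aligned with runs of spaces, a regex separator r'\s+' is used rather than a tab:

```python
import io

import pandas as pd

sample = """\
faban       1   0   0.288   withspy
faban       2   0   0.243   withoutspy
simulated   1   0   0.159   withoutspy
faban       1   1   0.189   withoutspy
"""

# Hypothetical column names, one per column in the file.
names = ["source", "a", "b", "score", "target"]

# sep=r"\s+" treats any run of whitespace as one separator.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=names)

data = df.drop(columns="target").values.tolist()  # every column except "target"
target = df["target"].tolist()                    # the label column

print(df.dtypes)
print(data)
print(target)
```

Naming the label column lets you select it by name rather than by position, and pandas infers a separate dtype for each column (string, integer, float), which addresses the mixed-type concern in the question.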

Tom · Accepted Answer · 2016-01-04 07:48:17Z

0

The following works nicely:

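with open('<FILE>') as f:                  # closes the file when done
    data = f.read().splitlines()
out = []
for l in data:
    fields = l.split()                     # split on any run of whitespace
    if fields:                             # skip blank lines, e.g. a trailing empty line
        out.append(fields)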

out will then have the format [['faban', '1', '0', '0.288', 'withspy'], [...], ...] (note: all elements are strings).
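To get from out to the separate data and target lists the question asks for, each row can be unpacked with Python's starred assignment. A short sketch, using a literal out that mirrors the format above:

```python
# Rows in the format produced above: every field is a string.
out = [
    ['faban', '1', '0', '0.288', 'withspy'],
    ['faban', '2', '0', '0.243', 'withoutspy'],
    ['simulated', '1', '0', '0.159', 'withoutspy'],
    ['faban', '1', '1', '0.189', 'withoutspy'],
]

data = []
target = []
for row in out:
    *features, label = row  # all fields but the last go to features; the last to label
    data.append(features)
    target.append(label)

print(data)
print(target)
```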

answered Jan 4, 2016 at 7:48

Tom

414 · 1 gold badge · 7 silver badges · 18 bronze badges
