Read datafile with python to an array

Question

This question is very simple, but I haven't been able to solve it for hours: I have a data file that contains two columns of data, separated by a tab. I want to read and process them with Python. allData contains the data, but how can I access parts of it?

with open( "file.txt", "r" ) as mergeData:
    allData = mergeData.read()

print allData

Tim Pietzcker · Accepted Answer · 2013-09-27 21:19:36Z

3

The most flexible way would be to use the csv module:

import csv
with open("file.txt", "rb") as infile:
    reader = csv.reader(infile, delimiter="\t")
    allData = list(reader)
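The "rb" mode above matches Python 2. As a minimal sketch of the same idea on Python 3, the file is opened in text mode with newline="" instead, as the csv documentation recommends. The sample data and the temporary file are assumptions made only so the snippet runs on its own:

```python
import csv
import os
import tempfile

# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("1\t5.0\n2\t6.5\n3\t7.25\n")
    path = tmp.name

# Python 3: text mode plus newline="" takes the place of "rb".
with open(path, newline="") as infile:
    reader = csv.reader(infile, delimiter="\t")
    allData = list(reader)  # one list of strings per line

os.remove(path)
print(allData)
```

Individual values can then be reached by index, for example allData[0][1] for the second field of the first line.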

Note that all the elements will be strings. If you want to convert, say, the first column to an int and the second column to a float, you could do something like

    allData = [(int(first), float(second)) for first,second in reader]
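One detail worth spelling out: a csv.reader can only be consumed once, so the comprehension is meant to replace the list(reader) line rather than follow it. A small sketch, assuming hypothetical sample data held in an io.StringIO in place of the real file:

```python
import csv
import io

# Hypothetical sample standing in for the contents of file.txt.
sample = "1\t5.0\n2\t6.5\n3\t7.25\n"

reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t")

# Convert while reading: first column to int, second column to float.
allData = [(int(first), float(second)) for first, second in reader]

# The reader is now exhausted, so a second pass yields nothing.
leftover = list(reader)

print(allData)
print(leftover)
```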

To split it up into two sequences of floats, one for each column (tuples, strictly speaking), use zip() together with the argument unpacking operator (*):

    first, second = zip(*((float(x), float(y)) for x,y in reader))
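To make the transposition concrete, here is a small sketch that uses a hypothetical in-memory list of rows in place of the reader. Note that zip() hands back tuples, so list() is needed if real lists are wanted:

```python
# Hypothetical rows, shaped like what csv.reader would yield for a two-column file.
rows = [["1", "5.0"], ["2", "6.5"], ["3", "7.25"]]

# zip(*pairs) turns a sequence of (x, y) pairs into one tuple of x values
# and one tuple of y values.
first, second = zip(*((float(x), float(y)) for x, y in rows))

# Convert to lists if the columns need to be modified later.
first_list = list(first)
second_list = list(second)

print(first)
print(second)
```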

edited Sep 27, 2013 at 21:19

answered Sep 27, 2013 at 20:19

Tim Pietzcker

4 Comments

user2003965 Over a year ago

This works just fine so far. To check my understanding: now everything is in the allData file. How can I split my data up into two files of floats?

Tim Pietzcker Over a year ago

allData is a list, not a file. At which point do you want to split that?

user2003965 Over a year ago

Sure, you're right, it's a list. I want two lists, each containing one column.

Tim Pietzcker Over a year ago

@user2003965: OK, that's a bit more complicated. See my edit.

zero323 · Accepted Answer · 2013-09-27 20:23:58Z

2

Short and simple:

with open( "file.txt", "r" ) as mergeData:
    allData = [line.strip().split('\t') for line in mergeData]

The csv module mentioned by @TimPietzcker is nice, but in Python 2 it doesn't handle Unicode.
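One caveat with the split-based approach: strip() with no arguments removes all leading and trailing whitespace, tabs included, so an empty first or last field disappears. A sketch with hypothetical lines, comparing it with rstrip("\n"), which only removes the line ending:

```python
# Hypothetical lines: the second has an empty last field, the third an empty first field.
lines = ["1\t5\n", "2\t\n", "\t7\n"]

# strip() also eats the leading/trailing tab, so the empty field is lost.
with_strip = [line.strip().split("\t") for line in lines]

# rstrip("\n") keeps the tabs, so every line still has two fields.
with_rstrip = [line.rstrip("\n").split("\t") for line in lines]

print(with_strip)
print(with_rstrip)
```

If the data never contains empty fields, both variants give the same result.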

answered Sep 27, 2013 at 20:23

zero323

1 Comment

martineau Over a year ago

I'm not sure, but I think the Python 3 csv module handles Unicode.
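A quick way to check this on Python 3 is to read a file that contains non-ASCII text. The sketch below assumes UTF-8 encoding and uses hypothetical sample data written to a temporary file:

```python
import csv
import os
import tempfile

# Hypothetical UTF-8 sample with non-ASCII characters in the first column.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".txt", delete=False
) as tmp:
    tmp.write("Zürich\t1\nKøbenhavn\t2\n")
    path = tmp.name

# On Python 3 the csv module works on str, so decoding is handled by open().
with open(path, encoding="utf-8", newline="") as infile:
    rows = list(csv.reader(infile, delimiter="\t"))

os.remove(path)
print(rows)
```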

martineau · Accepted Answer · 2013-09-28 07:45:07Z

-1

The csv module is a good choice for reading files of delimited data fields. The following creates a list of lists, each of which contains the data read from the corresponding column in the data file. It can also easily be adapted to any number of columns of data:

import csv

NUM_COLS = 2
columns = [[] for _ in range(NUM_COLS)]
with open("datafile.txt", "rb") as infile:
    for row in csv.reader(infile, delimiter="\t"):
        for i, col in enumerate(row):
            columns[i].append(col)

for col in columns:
    print col

Sample tab-delimited input file:

1	5
2	6
3	7
4	8

Output produced:

['1', '2', '3', '4']
['5', '6', '7', '8']
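As an alternative sketch for Python 3, the same column lists can be built by transposing the rows with zip(), without fixing the number of columns in advance. The sample input here is an assumption, reconstructed from the output shown above, and zip() silently truncates to the shortest row if the rows differ in length:

```python
import csv
import io

# Hypothetical input, inferred from the output shown above.
sample = "1\t5\n2\t6\n3\t7\n4\t8\n"

rows = csv.reader(io.StringIO(sample, newline=""), delimiter="\t")

# zip(*rows) regroups the values by column; list() makes each column mutable.
columns = [list(col) for col in zip(*rows)]

for col in columns:
    print(col)
```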

edited Sep 28, 2013 at 7:45

answered Sep 27, 2013 at 21:19

martineau

mjhalwa · Accepted Answer · 2013-09-28 05:46:38Z

-2

why not:

fp = open("file.txt","r")
mylist = fp.readlines()   # get list of lines.
fp.close()  # i forgot that line [EDIT]
for i in range(len(mylist)):
    mylist[i] = mylist[i].strip()   #get rid of ' ' and '\n' and such
    mylist[i] = mylist[i].split('separator') # splits line into list of elements in the line

mylist should then be a 2D array / list of your lines, with the single elements of each line. Replace 'separator' with the character or string that separates the elements in each line.

edited Sep 28, 2013 at 5:46

answered Sep 27, 2013 at 20:23

mjhalwa

16 Comments

Tim Pietzcker Over a year ago

This does not work (look at the .strip() line...) and is quite unpythonic and inefficient.

mjhalwa Over a year ago

@Tim: Yeah, sorry, I am currently programming in C++, so that is where the ';' came from ^^. But why should that be unpythonic, if it uses lists, a variable type that is typical of Python? And it is also more basic than using a library for such a simple task.

zero323 Over a year ago

1. A file object is iterable, so it can be used instead of mylist. 2. Lists are iterable too, so you don't need indexes. 3. If you really need indexes, most of the time enumerate is cleaner, if not more effective.

Tim Pietzcker Over a year ago

It's still wrong (I didn't even notice the ;s). Strings are immutable. Building a list of strings first, then chopping that into a list of a list of strings is very slow and memory-inefficient. Plus the problems outlined by @zero323.

zero323 Over a year ago

Don't forget about split('separator').
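The points raised in these comments could be put together roughly as follows. This is only a sketch: the sample data, the tab separator and the two-field check are assumptions, and io.StringIO stands in for the real open("file.txt"):

```python
import io

# Hypothetical sample standing in for the contents of file.txt.
sample = "1\t5\n2\t6\n"
separator = "\t"  # the actual delimiter, not the literal string 'separator'

mylist = []
with io.StringIO(sample) as fp:  # with a real file: open("file.txt")
    # A file object is iterable, and enumerate supplies line numbers if needed.
    for lineno, line in enumerate(fp, start=1):
        fields = line.rstrip("\n").split(separator)
        if len(fields) != 2:
            raise ValueError("line %d: expected 2 fields, got %d" % (lineno, len(fields)))
        mylist.append(fields)

print(mylist)
```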
