0

This question is really simple, but I haven't managed to get it done for hours: I have a data file that contains two columns of data, separated by a tab. I want to read and process them with Python. allData contains the data, but how can I access parts of it?

with open( "file.txt", "r" ) as mergeData:
    allData = mergeData.read()

print allData

4 Answers

3

The most flexible way would be to use the csv module:

import csv
with open("file.txt", "rb") as infile:
    reader = csv.reader(infile, delimiter="\t")
    allData = list(reader)

Note that all the elements will be strings. If you want to convert, say, the first column to an int and the second column to a float, you could replace the list(reader) line with something like

    allData = [(int(first), float(second)) for first,second in reader]

To split it up into two lists of floats, one for each column, use zip() together with the tuple unpacking operator (*):

    first, second = zip(*((float(x), float(y)) for x,y in reader))
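For reference, here is a minimal self-contained sketch of the zip(*...) approach, assuming Python 3 (the snippets above are Python 2). The io.StringIO object and its sample values are made up and only stand in for open("file.txt", newline=""). Note that a csv reader can be consumed only once, so this reads the rows in a single pass rather than after list(reader).

```python
import csv
import io

# Hypothetical stand-in for open("file.txt", newline=""); sample values are made up.
sample = io.StringIO("1\t5\n2\t6\n3\t7\n4\t8\n")

reader = csv.reader(sample, delimiter="\t")

# Convert each row to a pair of floats in one pass over the reader.
pairs = [(float(x), float(y)) for x, y in reader]

# zip(*pairs) transposes the list of rows into one tuple per column.
first, second = zip(*pairs)

# zip() yields tuples; wrap them in list() if mutable lists are needed.
first, second = list(first), list(second)

print(first)
print(second)
```

If the file is empty, zip(*pairs) produces nothing and the unpacking into first and second raises a ValueError, so real code may want to check for that case.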

4 Comments

This works just fine so far. For my understanding: now everything is in the allData file. How can I split my data up into two files of floats?
allData is a list, not a file. At which point do you want to split that?
Sure, you're right, it's a list. I want two lists, each containing one column.
@user2003965: OK, that's a bit more complicated. See my edit.
2

Short and simple:

with open( "file.txt", "r" ) as mergeData:
     allData = [line.strip().split('\t') for line in mergeData]

The csv module mentioned by @TimPietzcker is nice, but it doesn't handle Unicode (at least in Python 2).
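As a hedged aside, assuming Python 3 is available: there the csv module reads text-mode files, so Unicode data works as long as the file is opened with the right encoding. The sketch below writes a small made-up UTF-8 sample to a temporary file so that it is self-contained; the file contents and the choice of UTF-8 are assumptions for illustration.

```python
import csv
import os
import tempfile

# Write a small UTF-8 sample file (made-up data) so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".txt", delete=False
) as tmp:
    tmp.write("Zürich\t1.5\nKraków\t2.5\n")
    path = tmp.name

# Python 3: open in text mode with an explicit encoding.
# newline="" is what the csv documentation recommends for files passed to csv.reader.
with open(path, encoding="utf-8", newline="") as infile:
    rows = list(csv.reader(infile, delimiter="\t"))

os.remove(path)  # clean up the temporary sample file
print(rows)
```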

1 Comment

I'm not sure, but I think the Python 3 csv module handles Unicode.
-1

The csv module is a good choice for reading in files of delimited data fields. The following creates a list of lists, each of which will contain the data read from the corresponding column in the data file. It can also easily be adapted to any number of columns of data:

import csv

NUM_COLS = 2
columns = [[] for _ in range(NUM_COLS)]
with open("datafile.txt", "rb") as infile:
    for row in csv.reader(infile, delimiter="\t"):
        for i, col in enumerate(row):
            columns[i].append(col)

for col in columns:
    print col

Sample tab-delimited input file:

1   5
2   6
3   7
4   8

Output produced:

['1', '2', '3', '4']
['5', '6', '7', '8']
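One possible way to adapt this to any number of columns is to size the list from the first row instead of a constant. This is only a sketch, assuming Python 3 and assuming every row has no more fields than the first one (a longer row would raise an IndexError). The io.StringIO object stands in for open("datafile.txt", newline="") and holds the same sample values as above.

```python
import csv
import io

# Hypothetical stand-in for open("datafile.txt", newline=""); same sample values as above.
sample = io.StringIO("1\t5\n2\t6\n3\t7\n4\t8\n")

columns = []
for row in csv.reader(sample, delimiter="\t"):
    if not columns:
        # Create one empty list per field found in the first non-empty row.
        columns = [[] for _ in row]
    for i, value in enumerate(row):
        columns[i].append(value)

for col in columns:
    print(col)
```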

-2

why not:

fp = open("file.txt","r")
mylist = fp.readlines()   # get list of lines.
fp.close()  # i forgot that line [EDIT]
for i in range(len(mylist)):
    mylist[i] = mylist[i].strip()   #get rid of ' ' and '\n' and such
    mylist[i] = mylist[i].split('separator') # splits line into list of elements in the line

mylist is then a 2D list: one entry per line, each holding the individual elements of that line. Replace 'separator' with the character or string that separates the elements in each line (for a tab-separated file, '\t').

16 Comments

This does not work (look at the .strip() line...) and is quite unpythonic and inefficient.
@Tim: yeah, sorry, I'm currently programming in C++, which is where the ';' came from. But why would that be unpythonic, when it uses lists, a type that is typical of Python? It is also more basic than using a library for such a simple task.
1. A file object is iterable, so it can be used instead of mylist. 2. Lists are iterable too, so you don't need indexes. 3. If you really need indexes, most of the time enumerate is cleaner, if not more effective.
It's still wrong (I didn't even notice the ;s). Strings are immutable. Building a list of strings first, then chopping that into a list of a list of strings is very slow and memory-inefficient. Plus the problems outlined by @zero323.
Don't forget about split('separator').