
I have a very simple question: what is the most efficient way to read the different entries of a .txt file with Python?

Suppose I have a text file like this:

42017     360940084.621356  21.00  09/06/2015  13:08:04
42017     360941465.680841  29.00  09/06/2015  13:31:05
42017     360948446.517761  16.00  09/06/2015  15:27:26
42049     361133954.539315  31.00  11/06/2015  18:59:14
42062     361208584.222483  10.00  12/06/2015  15:43:04
42068     361256740.238150  19.00  13/06/2015  05:05:40

In C I would do:

/* fp is the FILE * returned by fopen; fscanf returns the number of fields it converted */
while (fscanf(fp, "%d %lf %f %d/%d/%d %d:%d:%d", &id, &t0, &score, &day, &month, &year, &hour, &minute, &second) == 9) { /* ...some instruction... */ }

What would be the best way to do something like this in Python, storing every value in a separate variable (since I have to work with those variables throughout the code)?

Thanks in advance!

3 Answers

I feel like muddyfish's answer is good; here is another (maybe lighter) way:

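import time

with open(file_name) as f:
    for line in f:
        identifier, t0, score, date, hour = line.split()
        # split() returns strings, so convert the numeric columns
        identifier, t0, score = int(identifier), float(t0), float(score)

        # You can also get a time.struct_time from the date and time columns
        timer = time.strptime(date + " " + hour, "%d/%m/%Y %H:%M:%S")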
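If you want something that mirrors the question's fscanf format string more literally, a regular expression with one capture group per conversion specifier is one option. This is a minimal sketch, not part of the answer above: LINE_RE and scan are hypothetical names, the sample lines come from the question, and the pattern assumes unsigned decimal numbers as in that data.

```python
import io
import re

# One capture group per conversion in the C format
# "%d %lf %f %d/%d/%d %d:%d:%d"; \s+ stands in for the spaces.
# Assumption: numbers are unsigned decimals, as in the sample data.
LINE_RE = re.compile(
    r"\s*(\d+)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+"
    r"(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+)"
)


def scan(f):
    """Yield (id, t0, score, day, month, year, hour, minute, second) per line."""
    for line in f:
        if not line.strip():
            continue  # fscanf also skips blank lines (they are just whitespace)
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"unexpected line: {line!r}")
        g = m.groups()
        yield (int(g[0]), float(g[1]), float(g[2]), *map(int, g[3:]))


sample = io.StringIO(
    "42017     360940084.621356  21.00  09/06/2015  13:08:04\n"
    "42049     361133954.539315  31.00  11/06/2015  18:59:14\n"
)
for identifier, t0, score, day, month, year, hour, minute, second in scan(sample):
    print(identifier, t0, score, day, month, year, hour, minute, second)
```

Unlike the split() approach, each value comes out already converted to int or float, and a malformed line raises an error instead of silently unpacking the wrong fields.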

3 Comments

Note that id is the name of a built-in function (not a reserved word), so shadowing it is best avoided. If you want to use it as a variable name, use id_ = value instead.
Thanks FunkySayu! I also ended up with something similar... Since I need each single entry (day, month, year, etc.), I was wondering whether there is a faster way, or whether I have to call line.split("/") and line.split(":") again?
The point is that I have to work with each single entry (e.g. do arithmetic with t0 and with the different days and months), so I need to store the data in separate variables.
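On the follow-up question in these comments: a single strptime call already splits the date and the time, and the parsed object exposes each field as an integer, so a second split("/") or split(":") should not be needed. This sketch swaps in datetime.datetime.strptime instead of time.strptime (an editorial choice, not from the answer) because datetime objects also support subtraction, which helps with arithmetic on days and months; the two hard-coded lines are taken from the question.

```python
from datetime import datetime

# Two sample lines from the question; in practice, iterate over the open file.
rows = [
    "42017     360940084.621356  21.00  09/06/2015  13:08:04",
    "42049     361133954.539315  31.00  11/06/2015  18:59:14",
]

stamps = []
for line in rows:
    identifier, t0, score, date, hour = line.split()
    t0 = float(t0)  # numeric columns still need explicit conversion
    # One strptime call parses date and time together.
    ts = datetime.strptime(date + " " + hour, "%d/%m/%Y %H:%M:%S")
    # Individual fields are plain int attributes; no second split needed.
    day, month, year = ts.day, ts.month, ts.year
    stamps.append(ts)

# datetime objects support arithmetic directly (the result is a timedelta).
elapsed = stamps[1] - stamps[0]
print(day, month, year, elapsed)
```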

I would look up the str.split() method.

For example, you could use:

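with open(file_name) as f:
    for line in f:
        # split() without an argument treats runs of spaces as one separator
        data = dict(zip(("id", "t0", "score", "date", "time"), line.split()))
        instructions()  # placeholder for your own processing of data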
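The dict values produced this way are all strings. A minimal sketch that converts each column as it is read, assuming the five-column layout from the question; Record, CONVERTERS and read_records are hypothetical names introduced only for illustration. Note that split() with no argument (rather than split(" ")) is what collapses the runs of spaces in this file.

```python
import io
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record type: one typed field per column."""
    id_: int  # trailing underscore avoids shadowing the built-in id
    t0: float
    score: float
    date: str
    time: str


# Converter applied to each column, in the same order as the fields.
CONVERTERS = (int, float, float, str, str)


def read_records(f):
    for line in f:
        parts = line.split()  # no argument: runs of spaces count as one separator
        if not parts:
            continue  # skip blank lines
        yield Record(*(conv(p) for conv, p in zip(CONVERTERS, parts)))


sample = io.StringIO(
    "42017     360940084.621356  21.00  09/06/2015  13:08:04\n"
    "42062     361208584.222483  10.00  12/06/2015  15:43:04\n"
)
for rec in read_records(sample):
    print(rec.id_, rec.t0, rec.score, rec.date, rec.time)
```

A NamedTuple keeps attribute access (rec.t0) like separate variables while still grouping each line's values together.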


Depending on what you want to do with the data, pandas may be something to look into:

import pandas as pd

with open(file_name) as infile:
    df = pd.read_fwf(infile, header=None, parse_dates=[[3, 4]], 
        date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M:%S'))

The double list [[3, 4]], together with the date_parser argument, reads columns 3 and 4 (0-indexed) as a single datetime column. You can then access individual parts of that column with

>>> df['3_4'].dt.hour
0    13
1    13
2    15
3    18
4    15
5     5
dtype: int64

If you don't like the '3_4' key, pass the parse_dates argument as a dict instead:

parse_dates={'timestamp': [3, 4]}

read_fwf is for reading fixed-width columns, which your data seems to adhere to. Alternatively, there are functions such as read_csv, read_table and many more.
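As a sketch of the read_csv alternative: treating any run of whitespace as the delimiter and building the timestamp with pd.to_datetime afterwards sidesteps the date_parser argument, which is deprecated in recent pandas releases (as is combining columns through parse_dates). The column names here are an assumption chosen to match the question, and an in-memory copy of the sample stands in for the real file.

```python
import io

import pandas as pd

# The sample lines from the question; in practice pass the file name instead.
SAMPLE = """\
42017     360940084.621356  21.00  09/06/2015  13:08:04
42017     360941465.680841  29.00  09/06/2015  13:31:05
42017     360948446.517761  16.00  09/06/2015  15:27:26
42049     361133954.539315  31.00  11/06/2015  18:59:14
42062     361208584.222483  10.00  12/06/2015  15:43:04
42068     361256740.238150  19.00  13/06/2015  05:05:40
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    names=["id", "t0", "score", "date", "time"],
)

# Combine the date and time columns explicitly instead of via parse_dates.
df["timestamp"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M:%S"
)
df = df.drop(columns=["date", "time"])

print(df["timestamp"].dt.hour.tolist())
```

The .dt accessor then works as shown above, e.g. df["timestamp"].dt.day or df["timestamp"].dt.month.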

(This answer is pretty much a duplicate of this SO answer, but since this question here is more general, I've put this here as another answer, not as a comment.)
