Parsing a text file into a list in Python

Question

I'm completely new to Python, and I'm trying to read in a txt file that contains a combination of words and numbers. I can read in the txt file just fine, but I'm struggling to get the string into a format I can work with.

import matplotlib.pyplot as plt
import numpy as np
from numpy import loadtxt

f= open("/Users/Jennifer/Desktop/test.txt", "r")

lines=f.readlines()

Data = []

list=lines[3]
i=4
while i<12:
        list=list.append(line[i])
        i=i+1

print list

f.close()

I want a list that contains all the elements in lines 3-12 (starting from 0), which is all numbers. When I do print lines[1], I get the data from that line. When I do print lines, or print lines[3:12], I get each character preceded by \x00. For example, the word "Plate" becomes: ['\x00P\x00l\x00a\x00t\x00e. Using lines = [line.strip() for line in f] gets the same result. When I try to put individual lines together in the while loop above, I get the error "AttributeError: 'str' object has no attribute 'append'."

How can I get a selection of lines from a txt file into a list? Thank you so much!!!

Edit: The txt file looks like this:

BLOCKS= 1 Plate: Phosphate Noisiness Assay 2000x 1.3 PlateFormat Endpoint Absorbance Raw FALSE 1 1 650 1 12 96 1 8
Temperature(°C) 1 2 3 4 5 6 7 8 9 10 11 12
21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227
0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197
0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451
0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235
0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191
0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273
0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038
0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062

~End Original Filename: 2013-08-06 Phosphate Noisiness; Date Last Saved: 8/6/2013 7:00:55 PM

Update: I used this code:

f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()

first_twelve = file_list[3:11]

data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]

print data

to get this result: [' 21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227 ', ' 0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197 ', ' 0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451 ', ' 0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235 ', ' 0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191 ', ' 0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273 ', ' 0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038 ', ' 0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062 ']

Which is (correct me if I'm wrong, very new to Python!) a list of lists, which I should be able to work with. Thank you so much to everyone who responded!!!

Elyase--I included it in an edit above. I'm new to Stack Overflow too, is there a better way to include it?
– Rachel Rose, Commented Aug 19, 2013 at 15:39

Peter Foti · Accepted Answer · 2013-08-19 00:18:15Z

6

When you write lines = f.readlines(), a list of lines is returned to you. When you then say lines[3], you get a single line (the fourth one, since indexing starts at 0), and that line is a string. That's why you're ending up with individual characters.

All you need to do is say

files = open("Your File.txt")

file_list =  files.readlines()

first_twelve = file_list[0:12] #returns a list with the first 12 lines

Once you've got the first_twelve list you can do whatever you want with it.

To print each line you would do:

for each_line in first_twelve:
    print each_line

That should work for you.
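A note on why printing the whole list can look different from printing each line: printing a list shows the repr of every element, so escape sequences such as \t, \n or \x00 appear literally, while printing a single string writes the characters themselves. The following is only a minimal sketch, written in Python 3 syntax (where print is a function), and the sample strings are made up for illustration:

```python
# Hypothetical sample lines, standing in for what readlines() returns.
first_lines = ["Plate:\tAssay\n", "0.4977\t0.5074\n"]

# Printing the list uses repr() on each element, so escapes stay visible.
list_view = str(first_lines)
print(list_view)

# Printing one element writes the actual characters (a real tab and newline).
for each_line in first_lines:
    print(each_line, end="")

# Either way, the elements are strings, not numbers.
element_types = [type(item).__name__ for item in first_lines]
print(element_types)
```

So the list holds one string per line; turning those strings into numbers is a separate conversion step.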


2 Comments

Rachel Rose Over a year ago

Thank you so much Peter! However, when I print first_twelve, I still get it by character (printing each line in the for loop works great). I think I might just be misunderstanding how Python works here...is the array an array of characters or an array of numbers?

Peter Foti Over a year ago

Is your .txt file not separated by \n's?

dawg · Accepted Answer · 2013-08-20 01:58:08Z

You have the line list=lines[3] in your source code.

Two issues here.

1. Don't use list as a variable name. You silently shadowed the built-in list constructor when you did that.
2. When you take one item from a list, lines[3], you now have only that object -- in this case a string. When you try to append to it you can't -- it isn't a list.

You can demonstrate your bug easily in the console:

>>> li=['1']
>>> li.append('2')
>>> li
['1', '2']
>>> st='1'
>>> st.append('2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'append'
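For comparison, here is a sketch of how the loop from the question could be written so that it appends to a real list. It is in Python 3 syntax, the name selected and the sample lines are hypothetical, and it also shows that append returns None, so its result should not be assigned back to the variable:

```python
# Hypothetical stand-in for the result of f.readlines().
lines = ["header\n", "info\n", "labels\n", "1 2\n", "3 4\n", "5 6\n"]

selected = []                      # start from an empty list, not from lines[3]
i = 3
while i < len(lines):
    selected.append(lines[i])      # append changes the list in place
    i += 1

# append returns None, so `x = x.append(...)` would throw the list away.
result_of_append = [].append("x")

print(selected)
print(result_of_append)
```

The loop produces the same result as the slice lines[3:], which is usually the shorter way to write it.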

Other comments, in general, on your code.

Assume you have a text file called '/tmp/test.txt' that contains this text:

Line 1
Line 2
...
Line 19

Reading the contents of that file is as simple as this:

with open('/tmp/test.txt', 'r') as fin:
    lines=fin.readlines()

If you want a subset of the lines, you can use a slice:

subset=lines[3:12]
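A quick check of what that slice contains may help: the start index is included and the stop index is excluded, so lines[3:12] covers indices 3 through 11, which is nine lines. A small sketch using the 19-line example file described above (built in memory here rather than read from disk):

```python
# Recreate the example content: "Line 1" through "Line 19".
lines = ["Line %d\n" % n for n in range(1, 20)]

subset = lines[3:12]   # indices 3..11; index 12 is not included

print(len(subset))     # number of lines selected
print(subset[0])       # first selected line
print(subset[-1])      # last selected line
```

To include index 12 as well, the slice would need to be lines[3:13].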

If you want to process each line for something, like strip the carriage return, use the file object as an iterator:

with open('/tmp/test.txt', 'r') as fin:
    lines=[]
    for line in fin:
        lines.append(line.strip())

For your specific problem of having NULs in the data, perhaps you are reading a binary file masquerading as text? You need to post an example of the file.
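One common cause of a NUL byte next to every character is a text file saved as UTF-16 and then read as 8-bit text. Whether that applies to this file is an assumption; it cannot be confirmed without the file itself. If it does apply, decoding on read removes the NULs without any replace calls. A minimal sketch in Python 3 syntax, using a temporary file with made-up contents as a stand-in:

```python
import io
import os
import tempfile

def read_utf16_lines(path):
    """Return the lines of a UTF-16 text file with line endings stripped."""
    with io.open(path, "r", encoding="utf-16") as fin:
        return [line.rstrip("\r\n") for line in fin]

# Build a small UTF-16 sample file (hypothetical stand-in for test.txt).
sample = u"Plate:\tAssay\n0.4977\t0.5074\n"
tmp = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
tmp.close()
with io.open(tmp.name, "w", encoding="utf-16") as fout:
    fout.write(sample)

# Read as raw bytes, the file is full of NULs...
with open(tmp.name, "rb") as raw:
    raw_bytes = raw.read()

# ...but decoded as UTF-16 it is ordinary text.
decoded_lines = read_utf16_lines(tmp.name)
os.remove(tmp.name)
print(decoded_lines)
```

The "utf-16" codec relies on the byte order mark at the start of the file; a file without one may need "utf-16-le" or "utf-16-be" instead.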

Edit

Your file contains Unicode characters (right after 'Temperature'), which may be some of the odd characters you are seeing. If you are only interested in the lines with numbers, you can ignore them.

You do not YET have a list of lists, but it is easy to get:

data=[]                               # will hold the lines of the file
with open(ur_file,'rU') as fin:       
    for line in fin:                  # for each line of the file
        line=line.strip()             # remove CR/LF
        if line:                      # skip blank lines
            data.append(line)

print data                            # list of STRINGS separated by spaces
matrix=[map(float,line.split()) for line in data[3:10]]  # convert the strings..
print matrix                          # NOW you have a list of list of floats...
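The code above is Python 2. In Python 3, map returns an iterator rather than a list, so the same idea is usually written with a nested list comprehension. A sketch under that assumption; the function name rows_to_floats is hypothetical, and the sample rows are the first three values of two rows from the file in the question:

```python
def rows_to_floats(text_lines):
    """Convert whitespace-separated numeric lines into a list of lists of floats."""
    matrix = []
    for line in text_lines:
        fields = line.split()          # splits on any run of spaces or tabs
        if fields:                     # skip blank lines
            matrix.append([float(field) for field in fields])
    return matrix

# Sample rows with mixed separators and line endings, plus one blank line.
sample_rows = ["0.488\t0.4742\t0.5011\r\n", "\n", " 0.4882 0.4913 0.4941 \n"]
matrix = rows_to_floats(sample_rows)
print(matrix)
```

float raises ValueError on non-numeric text, so header lines need to be sliced off before calling it.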

Thanks so much Drewk! I really appreciate the general comments on the code, it's immensely helpful. If you don't mind, could you take a look at what I posted above and let me know if that's a better way of going about it? Thank you!!

chapter3 · Accepted Answer · 2013-08-20 15:02:14Z

1

The tweak below might help you get rid of the \x00 characters embedded in your data:

f = open("/Users/Jennifer/Desktop/test.txt", "r")

lines = f.readlines()
lines = [x.replace('\x00','') for x in lines]

l = []                      # create the list once, before the loop
for i in range(3,12):
    l.append(lines[i])      # collect lines 3 through 11

I am not sure if your data has other delimiters (say comma or space) to separate the numbers. If so, a simple split will help to convert the line into a list:

line = '123.00,456.00,789.00'

l = line.split(',')  # list will become ['123.00','456.00','789.00']
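If the separator is spaces or tabs rather than commas, the choice between split(' ') and split() with no argument matters: the first cuts at every single space and leaves empty strings behind, while the second treats any run of whitespace as one separator. A short sketch with a made-up line:

```python
# Hypothetical line with a leading space, a double space, and a trailing space.
line = " 0.488 0.4742  0.5011 "

by_space = line.split(" ")     # cuts at every single space, leaving '' entries
by_whitespace = line.split()   # cuts on runs of whitespace, no '' entries

print(by_space)
print(by_whitespace)
```

With split() there is nothing to filter out before converting the fields to float.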

Edit

Continue from Rachel's updated code:

f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()

first_twelve = file_list[3:11]

data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]

items = []
for dataline in data:
    items += dataline.split(' ')
items = [float(x) for x in items if len(x) > 0]  # remove dummy items left in the list

print items

edited Aug 20, 2013 at 15:02

answered Aug 19, 2013 at 0:13


1 Comment

Rachel Rose Over a year ago

Thank you so, so much Toruk! I used this to make a list of lists (I think) which I should be able to use, thank you!

AnupamChugh · Accepted Answer · 2020-04-10 14:49:53Z

0

Using readlines() is memory-inefficient: it loads every line of the file into memory at once. Instead, iterate over the file object, which yields one line at a time:

[i.split() for i in open('filename.txt')]
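If only a range of lines is needed, the same line-by-line iteration can be combined with itertools.islice so that reading stops once the last wanted line has been consumed, and a with block makes sure the file is closed. This is a sketch rather than a drop-in solution; the function name read_line_range and the temporary file contents are hypothetical:

```python
import itertools
import os
import tempfile

def read_line_range(path, start, stop):
    """Return the split fields of lines start..stop-1 (0-based) of a text file."""
    with open(path) as fin:
        # islice skips the first `start` lines and stops iterating at `stop`.
        return [line.split() for line in itertools.islice(fin, start, stop)]

# Build a 20-line sample file: line n contains "n" and "2n".
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write("".join("%d %d\n" % (n, n * 2) for n in range(20)))
tmp.close()

rows = read_line_range(tmp.name, 3, 6)
os.remove(tmp.name)
print(rows)
```

For the file in the question, read_line_range(path, 3, 12) would correspond to lines[3:12].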
