Reading text files into lists in Python

Question

Instead of defining documents like this ...

documents = ["the mayor of new york was there", "machine learning can be useful sometimes","new york mayor was present"]

... I want to read the same three sentences from two different txt files with the first sentence in the first file, and sentence 2 and 3 in the second file.

I have come up with this code:

# read txt documents
import glob
import os

os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"): # read all txt files in working directory
    with open(file, "r") as file_content: # close the file once it has been read
        lines = file_content.read().splitlines()
    for line in lines:
        documents.append(line)

But the documents resulting from the two strategies seem to be in different format. I want the second strategy to produce the same output as the first.

... what is wrong? Please try to be specific with your problem statements.
– juanpa.arrivillaga, Commented Mar 25, 2017 at 23:38
My point was that instead of writing "the documents resulting from the two strategies seem to be in different format" you should instead show the output.
– juanpa.arrivillaga, Commented Mar 25, 2017 at 23:45
Also, doing this: lines = file_content.read().splitlines() is not necessary. You can iterate directly over the file handle, and it iterates over lines. So just for line in file_content: would be sufficient (although you'll get the trailing newlines). Likely, you just want documents.append(file_content.read()), and you don't have to iterate over the file at all...
– juanpa.arrivillaga, Commented Mar 25, 2017 at 23:48
Possible duplicate of combine multiple text files into one text file using python
– OneCricketeer, Commented Mar 26, 2017 at 0:35
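
To make the comparison concrete, here is a minimal sketch that writes the three sentences into two temporary files and reads them back by iterating over each file object, as the comment above suggests. The file names doc1.txt and doc2.txt and the temporary directory are assumptions for illustration; the question does not name its files. glob.glob makes no promise about ordering, so the sketch sorts the matches, and it strips the trailing newline from each line so the result can be compared with the hard-coded list.

```python
import glob
import os
import tempfile

expected = [
    "the mayor of new york was there",
    "machine learning can be useful sometimes",
    "new york mayor was present",
]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; the question does not name its files.
    with open(os.path.join(tmp, "doc1.txt"), "w") as fh:
        fh.write(expected[0] + "\n")
    with open(os.path.join(tmp, "doc2.txt"), "w") as fh:
        fh.write(expected[1] + "\n" + expected[2] + "\n")

    documents = []
    # sorted() gives a deterministic file order; glob's own order is arbitrary.
    for path in sorted(glob.glob(os.path.join(tmp, "*.txt"))):
        with open(path) as fh:
            for line in fh:  # iterating a file object yields one line at a time
                documents.append(line.rstrip("\n"))  # drop the trailing newline

print(documents == expected)
```

If the two lists still differ on real data, printing repr(documents) usually shows why: stray blank lines, leftover whitespace, or files read in an unexpected order.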

OneCricketeer · Accepted Answer · 2017-03-26 00:32:16Z

1

If I understand your code correctly, this is nearly equivalent and more performant (it avoids reading the entire file into a string and then splitting it into a list). Note that lines read this way keep their trailing newline.

os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"): # read all txt files in working directory
    documents.extend( line for line in open(file) )

Or maybe even one line.

documents = [ line for line in open(file) for file in glob.glob("*.txt") ]

answered Mar 26, 2017 at 0:32

OneCricketeer


1 Comment

C S Over a year ago

you need to reverse the order of the "for"s in the list comprehension
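
The comment refers to the order of the for clauses: in a list comprehension they are evaluated left to right, like nested loops, so the loop over files has to come before the loop over lines. Below is a sketch of the reordered one-liner, run against two temporary files whose names are made up for illustration. It swaps the bare open(file) for pathlib.Path.read_text() so each file is closed promptly and the newlines are dropped; that substitution is an assumption of this sketch, not something the answer proposes.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files standing in for the asker's text_data folder.
    Path(tmp, "a.txt").write_text("the mayor of new york was there\n")
    Path(tmp, "b.txt").write_text(
        "machine learning can be useful sometimes\nnew york mayor was present\n"
    )

    # Outer loop (files) first, inner loop (lines) second.
    documents = [
        line
        for path in sorted(glob.glob(os.path.join(tmp, "*.txt")))
        for line in Path(path).read_text().splitlines()
    ]

print(documents)
```

With the clauses in the original order, Python would try to evaluate open(file) before file had been bound by the second for clause.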

Darkstarone · Accepted Answer · 2017-03-26 00:26:56Z

0

Instead of .read().splitlines(), you can use .readlines(). This returns the file's lines as a list, which can then be added to documents. Unlike splitlines(), each line keeps its trailing newline.
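
A small sketch of how the two calls differ, using a temporary file with a hypothetical name (sample.txt) rather than the asker's data. It only illustrates the newline handling; it is not a claim about which form the asker should prefer.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write("machine learning can be useful sometimes\nnew york mayor was present\n")

    with open(path) as fh:
        with_newlines = fh.readlines()  # each item still ends with "\n"
    with open(path) as fh:
        without_newlines = fh.read().splitlines()  # line endings removed

# Stripping the newline from each readlines() item gives the same list.
stripped = [line.rstrip("\n") for line in with_newlines]
print(stripped == without_newlines)
```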

edited Mar 26, 2017 at 0:26

Darkstarone


answered Mar 25, 2017 at 23:41

K.Land_bioinfo


1 Comment

K.Land_bioinfo Over a year ago

I am new to stack overflow, @juanpa.arrivillaga. What I meant was that the contents of the list that .readlines() creates could be further appended to documents, but I see that your most recent comment answered what I was trying to explain. Thank you.

Raymond Hettinger · Accepted Answer · 2017-03-26 00:42:00Z

0

... I want to read the same three sentences from two different txt files with the first sentence in the first file, and sentence 2 and 3 in the second file.

Translating the requirements directly gives:

with open('somefile1.txt') as f1:
    lines_file1 = f1.readlines()
with open('somefile2.txt') as f2:
    lines_file2 = f2.readlines()
documents = lines_file1[0:1] + lines_file2[1:3]

FWIW, given the kind of work you're doing, the fileinput module may be helpful.
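
As a rough sketch of what that could look like: fileinput.input() accepts a sequence of file names and yields their lines one after another, so several files can be read in a single loop. The temporary directory and the file contents below are assumptions for illustration, with the first sentence in one file and the other two in the second; the names somefile1.txt and somefile2.txt are reused from the snippet above.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical contents: sentence 1 in the first file, 2 and 3 in the second.
    first = os.path.join(tmp, "somefile1.txt")
    second = os.path.join(tmp, "somefile2.txt")
    with open(first, "w") as fh:
        fh.write("the mayor of new york was there\n")
    with open(second, "w") as fh:
        fh.write("machine learning can be useful sometimes\nnew york mayor was present\n")

    # fileinput.input() chains the files and yields their lines in order.
    with fileinput.input(files=[first, second]) as stream:
        documents = [line.rstrip("\n") for line in stream]

print(documents)
```

Because the file names are passed explicitly, the order of the lines follows the order of the list rather than whatever order a glob happens to return.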

Hope this gets you back in business :-)

answered Mar 26, 2017 at 0:42

Raymond Hettinger
