Instead of defining documents like this ...
documents = ["the mayor of new york was there", "machine learning can be useful sometimes","new york mayor was present"]
... I want to read the same three sentences from two different .txt files, with the first sentence in the first file and sentences 2 and 3 in the second file.
I have come up with this code:
import glob
import os

# read txt documents
os.chdir('text_data')
documents = []
for file in glob.glob("*.txt"):  # read all txt files in working directory
    with open(file, "r") as file_content:  # closes the file when done
        lines = file_content.read().splitlines()
    for line in lines:
        documents.append(line)
But the documents lists produced by the two strategies seem to be in different formats. I want the second strategy to produce exactly the same output as the first.
Rather than saying that the "documents resulting from the two strategies seem to be in different format", you should show the output.

Also, lines = file_content.read().splitlines() is not necessary. You can iterate directly over the file handle, which yields its lines one at a time, so just for line in file_content: would be sufficient (although each line will keep its trailing newline).

Likely, you just want documents.append(file_content.read()), and then you don't have to iterate over the file at all...
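The options mentioned in the comment behave differently, which a small sketch can show, assuming a hypothetical file doc2.txt holding sentences 2 and 3 on separate lines: iterating over the handle keeps the trailing "\n" on each line, rstrip("\n") removes it, and read() returns the whole file as a single string.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file containing sentences 2 and 3 on separate lines
    path = os.path.join(tmp, "doc2.txt")
    with open(path, "w") as f:
        f.write("machine learning can be useful sometimes\nnew york mayor was present\n")

    # Iterating over the file handle yields lines, each still ending in "\n"
    with open(path) as file_content:
        raw_lines = [line for line in file_content]

    # Stripping the newline gives the same entries as read().splitlines()
    with open(path) as file_content:
        stripped = [line.rstrip("\n") for line in file_content]

    # read() returns the whole file as one string
    with open(path) as file_content:
        whole = file_content.read()

print(raw_lines)    # ['machine learning can be useful sometimes\n', 'new york mayor was present\n']
print(stripped)     # ['machine learning can be useful sometimes', 'new york mayor was present']
print(repr(whole))  # 'machine learning can be useful sometimes\nnew york mayor was present\n'
```

Note that with documents.append(file_content.read()), the second file contributes a single entry containing both sentences (plus the trailing newline), so the result has two elements rather than the three in the hard-coded list.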