I have a dataset with 5 folders, and each folder contains 100 .txt files. In the code below, I loop through every file and remove certain words from it using my StopWords.txt file.
After removing the words, I append the output to a single file (filteredtext.txt). However, I want the output to have exactly the same structure as my dataset (5 folders with 100 .txt files each).
This is my code.
import re
import os

# Load the stop words (whitespace-separated) into a set for fast lookup
with open("StopWords.txt", encoding='utf-8') as stopwordfile:
    stop_words = set(stopwordfile.read().split())

# Walk the dataset folder and filter every file in it
for path, _, files in os.walk("sinhala-set1"):
    for file_name in files:
        filepath = os.path.join(path, file_name)
        print(f"Checking --> {filepath}")
        with open(filepath, encoding='utf-8') as file1:
            words = file1.read().split()
        # Every kept word from every file is appended to the same output file
        with open('filteredtext.txt', 'a', encoding='utf-8') as appendFile:
            for r in words:
                if r not in stop_words:
                    appendFile.write(" " + r)
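One possible way to get the mirrored layout is to compute each folder's path relative to the dataset root with os.path.relpath and recreate it under a separate output root with os.makedirs. This is a minimal sketch, assuming UTF-8 files and whitespace tokenization as in the code above; the helper names load_stop_words and filter_tree, and the output folder name sinhala-set1-filtered, are hypothetical. The output root should live outside sinhala-set1, otherwise os.walk may descend into the filtered copies.

```python
import os


def load_stop_words(stopword_path):
    """Read whitespace-separated stop words into a set for fast lookup."""
    with open(stopword_path, encoding="utf-8") as f:
        return set(f.read().split())


def filter_tree(src_root, dst_root, stop_words):
    """Write a filtered copy of every file under src_root to the same
    relative path under dst_root, dropping words found in stop_words."""
    for path, _, files in os.walk(src_root):
        # Folder of the current files relative to the dataset root, e.g. "folder1"
        rel_dir = os.path.relpath(path, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(out_dir, exist_ok=True)
        for file_name in files:
            in_path = os.path.join(path, file_name)
            out_path = os.path.join(out_dir, file_name)
            with open(in_path, encoding="utf-8") as f:
                words = f.read().split()
            kept = [w for w in words if w not in stop_words]
            # "w" rather than "a": re-running overwrites instead of growing the file
            with open(out_path, "w", encoding="utf-8") as out:
                out.write(" ".join(kept))


# Hypothetical usage with the paths from the question:
# filter_tree("sinhala-set1", "sinhala-set1-filtered", load_stop_words("StopWords.txt"))
```

Like the original loop, this joins the kept words with single spaces, so line breaks in the source files are not preserved.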
'filteredtext.txt' for writing every time, why not do something like os.path.join(path, 'filtered_' + file_name)? Have you tried anything like that?
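The comment's suggestion, writing each result next to its source file under a prefixed name, could look roughly like the sketch below. The function name filter_in_place and its prefix parameter are hypothetical, and the stop-word set is assumed to be loaded as in the question's code. Files that already carry the prefix are skipped so that a second run does not produce filtered_filtered_ files.

```python
import os


def filter_in_place(dataset_root, stop_words, prefix="filtered_"):
    """Write a filtered copy of each file into the same folder as the
    original, named prefix + original file name."""
    for path, _, files in os.walk(dataset_root):
        for file_name in files:
            # Skip outputs from an earlier run
            if file_name.startswith(prefix):
                continue
            with open(os.path.join(path, file_name), encoding="utf-8") as f:
                words = f.read().split()
            kept = [w for w in words if w not in stop_words]
            out_path = os.path.join(path, prefix + file_name)
            with open(out_path, "w", encoding="utf-8") as out:
                out.write(" ".join(kept))
```

The trade-off is that filtered and original files end up mixed in the same folders, so any later pass over the dataset has to skip the prefixed files; a separate output root avoids that.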