I've tried a number of solutions, but none of them work. I've figured out how to get the converted data into a CSV column, but all the text ends up in one cell, and I haven't figured out how to add the file names as a column.

    import pandas as pd
    import os
    from striprtf.striprtf import rtf_to_text

    dir_path = 'C:\\Users\\mairi\\Desktop\\testing txt to excel\\'

    def getFiles():
        list = []
        FileNames = []
        for path in os.listdir(dir_path):
            if os.path.isfile(os.path.join(dir_path,path)):
                with open (os.path.join(dir_path,path)) as file:
                    text = file.read()
                    rtfText = rtf_to_text(text,encoding='utf-8')
                    for text in rtfText:
                        list.append(rtfText)
                        FileNames.append(os.path.basename(rtfText))
        return list
        return FileNames

    list = getFiles()
    FileNames = getFiles()
    Data = pd.DataFrame(columns: 'list','FileNames')
    NewPath = 'C:\\Users\\mairi\\Desktop\\testing txt to excel\\NEW\\'
    Data.to_csv(os.path.join(NewPath,r'Data.csv'), index = False, header = False)

I've spent a few days searching Stack Overflow for a solution, and now I seem to be getting duplicate file data in each row.
It's possible that I need to create an empty DataFrame before the function, but I haven't got that working yet. I think my main issues are:
- Separating the text so each new line is a new cell
- Keeping file content distinct so there aren't duplicate rows
- Adding the file names as a column
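
The three points above can be sketched without pandas or striprtf at all, which may help isolate the logic. Note that looping over a string with `for text in rtfText` visits one character at a time, so appending the whole text inside that loop repeats it once per character; splitting on line breaks is probably what is wanted instead. The function name `lines_to_rows` and the sample dictionary are hypothetical stand-ins for whatever `rtf_to_text` returns per file.

```python
def lines_to_rows(texts):
    """Turn {filename: full_text} into a list of (filename, line) tuples."""
    rows = []
    for filename, text in texts.items():
        # splitlines() splits on line breaks, not on individual characters
        for line in text.splitlines():
            stripped = line.strip()
            if stripped:  # skip blank lines
                rows.append((filename, stripped))
    return rows


# Hypothetical sample data standing in for converted RTF text
sample = {
    "Filename_1": "Data line 1\nData line 2\n",
    "Filename_2": "Data line 1\nData line 2\nData line 3\n",
}
rows = lines_to_rows(sample)
```

Keeping the file name and the line together in one tuple means the two columns cannot drift out of step, which is harder to guarantee with two separate lists.
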
Ideally, the output would look like this:
| Filename | Data |
|---|---|
| Filename_1 | Data line 1 |
| Filename_1 | Data line 2 |
| Filename_2 | Data line 1 |
| Filename_2 | Data line 2 |
| Filename_2 | Data line 3 |
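
One way a table of this shape might be produced in pandas is to start with one row per file and then use `DataFrame.explode` to fan each file's lines out into their own rows. This is only a sketch under assumptions: the `texts` dictionary is hypothetical sample data standing in for the `rtf_to_text` output keyed by file name, and the column names are taken from the table above.

```python
import pandas as pd

# Hypothetical converted text per file (stand-in for rtf_to_text output)
texts = {
    "Filename_1": "Data line 1\nData line 2\n",
    "Filename_2": "Data line 1\nData line 2\nData line 3\n",
}

# One row per file: the file name and its full text
df = pd.DataFrame(list(texts.items()), columns=["Filename", "Data"])

# Replace each text with a list of its lines, then give every line its own row
df["Data"] = df["Data"].map(str.splitlines)
df = df.explode("Data", ignore_index=True)

# Drop rows from empty files (NaN after explode) and blank lines
df = df.dropna(subset=["Data"])
df = df[df["Data"].str.strip().ne("")].reset_index(drop=True)

# With no path, to_csv returns the CSV as a string;
# passing a file path as the first argument writes the file instead
csv_text = df.to_csv(index=False)
```

Leaving `header` at its default keeps the `Filename` and `Data` column names in the CSV; passing `header=False` would omit them.
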
Thank you for any help :)