I ran a PowerShell script that output a bunch of text files.
The text files look like this:
This is my aText.txt
Clark Kent
Dolly Parten
Charlie Brown
Gary Numan
They are just text files with names and no header. I want to convert them to CSV files, so I turned to Python and wrote this code:
import os
import pandas as pd

# Raw strings, so the backslashes are not treated as escape characters
folder = r'\path\text'
csvFolder = r'\path\csv'

for filename in os.listdir(folder):
    if filename.endswith('.txt'):
        file_path = os.path.join(folder, filename)
        csvpath = os.path.join(csvFolder, filename)
        # if file is empty
        if os.stat(file_path).st_size == 0:
            df = pd.DataFrame()
        # for other files
        else:
            df = pd.read_csv(file_path, header=0, names=None)
        csv_path = os.path.splitext(csvpath)[0] + '.csv'
        df.to_csv(csv_path, index=False)

print("Text files have been converted to csv")
When I ran it, it gave me an error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
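A byte 0xff at position 0 is often the start of a byte order mark (BOM) rather than real text, so one way to narrow this down is to look at the first few bytes of a file. The following is only a minimal sketch: `sniff_bom` is a hypothetical helper name, and it recognises only the BOMs defined in Python's `codecs` module.

```python
import codecs
from typing import Optional


def sniff_bom(raw: bytes) -> Optional[str]:
    """Return a codec name if raw starts with a known BOM, else None.

    sniff_bom is a hypothetical helper, not part of pandas or the stdlib.
    """
    # UTF-32 marks are checked first because the UTF-32 LE BOM
    # (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return None


# Bytes FF FE followed by 'C' 00 are what a UTF-16 LE file starting with "C" looks like.
print(sniff_bom(b"\xff\xfeC\x00"))
print(sniff_bom(b"Clark Kent"))
```

To apply it to a real file, open the file in binary mode (`'rb'`), read the first four bytes, and pass them to the helper. A result of `None` only means no BOM was found, not that the file is plain ASCII.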
I did some research but didn't find anything for pandas, only for the csv module. One of the responses suggested this:
df = pd.read_csv(file_path, encoding='cp1252', header=0, names=None)
I tried it and the program ran, but the CSV files were filled with strange characters. On a test folder of text files I created by hand, the code ran fine and the output was good. With the text files created by PowerShell, the code runs with no error messages, but the output is wrong.
Here is an example of what I am seeing in the csv files after the conversion:
¿ Ã Ÿâ
The problem seems to be in the else branch, since that is where the file is actually read. I printed df:
df = pd.read_csv(file_path, encoding='cp1252', header=0, names=None)
print("This is df: ", df)
This is the sample output:
This is df: ÿþA
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
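The `ÿþ` at the start of that output is how cp1252 displays the bytes FF FE, which is the UTF-16 little-endian BOM. That would fit Windows PowerShell's `Out-File` and `>` redirection, which write UTF-16 by default. The following is a minimal sketch under that assumption, not a confirmed fix: `read_names` and the column label `name` are made up for illustration, and `header=None` is used because the files have no header row.

```python
import os
import tempfile

import pandas as pd


def read_names(file_path: str) -> pd.DataFrame:
    """Read a headerless, one-name-per-line UTF-16 text file (assumed encoding)."""
    # encoding="utf-16" consumes the BOM and uses it to pick the byte order.
    # header=None keeps the first name as data; "name" is a made-up column label.
    return pd.read_csv(file_path, encoding="utf-16", header=None, names=["name"])


# Self-check with a throwaway file that mimics the assumed PowerShell output.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "aText.txt")
    with open(sample, "wb") as f:
        f.write("Clark Kent\r\nDolly Parten\r\n".encode("utf-16"))
    df = read_names(sample)
    print(df["name"].tolist())
```

If this reads the real files correctly, `df.to_csv(csv_path, index=False, header=False)` would write the names back out without the made-up column label.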