python code
#!python3
import sys
import os.path
import codecs
if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]
with codecs.open(file_name, 'rb', errors='ignore') as file:
    file_contents = file.readlines()
    for line_content in file_contents:
        print(type(line_content))
        line_content = codecs.decode(line_content)
        print(line_content)
        print(type(line_content))
File content : Log.txt
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
Output:
python3 file_convert.py Log.txt
<class 'bytes'>
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
<class 'str'>
I tried all of the methods below:
line_content = line_content.decode('UTF-8')
line_content = line_content.decode()
line_content = codecs.decode(line_content, 'UTF-8')
Is there any other way to handle this?
The line_content variable still holds the byte data and only the type changes to str, which is kind of confusing.
The line_content variable doesn't hold the byte data, it holds the ASCII representation of the byte data that was in your file. If you print(repr(line_content)) you'll see another level of quotes around it because it's a str, and if you print(line_content) before calling decode() on it you'll see that it's all ASCII bytes (e.g. there are no null bytes in it).
... 'b' flag, it really does return bytes and not a string when you read it. ASCII is not involved.
... Bytes objects to the file. Maybe the answer is to fix what wrote that file.
... bytes object, but if the file is ASCII, it's still ASCII data. That is to say, "\x00" is not 0, it's 92 120 48 48. I'm not clear on what OP is trying to get -- do they want to get the actual bytes represented by that string, such that \x00 becomes 0? If so, ast.literal_eval might be the easiest way.
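If the goal is to recover the real bytes, the ast.literal_eval suggestion could be sketched roughly as below. This assumes, as the output in the question suggests, that Log.txt contains the text of a bytes literal (the characters b, ', \, x, 0, 3 and so on) rather than the raw bytes themselves. The helper name parse_bytes_literal is hypothetical and only used for illustration.

```python
import ast


def parse_bytes_literal(line: bytes) -> bytes:
    """Turn the ASCII text of a bytes literal back into real bytes.

    Hypothetical helper: assumes `line` was read from a file opened in
    binary mode and looks like  b'\\x03\\x00...'  when printed.
    """
    text = line.decode("ascii").strip()  # bytes -> str, still just text
    value = ast.literal_eval(text)       # safely evaluate the literal
    if not isinstance(value, bytes):
        raise ValueError("line is not a bytes literal: " + text[:20])
    return value


# One line as it appears in the file (raw literal, so backslashes stay as text).
raw_line = rb"b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'"

# The first "\x00" in the file is four ASCII characters, not a null byte.
escaped_null = list(raw_line[6:10])  # [92, 120, 48, 48]

recovered = parse_bytes_literal(raw_line)
print(type(recovered))
print(recovered[0])           # first real byte, the integer 3
print(b"\x00" in recovered)   # real null bytes are present now
```

ast.literal_eval only evaluates Python literals, so it does not execute arbitrary code the way eval would. If the file is under your control, fixing whatever wrote the repr of the bytes into it, as suggested in the comments, is probably the cleaner route.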