Convert bytes object to string object in python

Question

python code

#!python3

import sys
import os.path
import codecs

if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]

with codecs.open(file_name, 'rb', errors='ignore') as file:
    file_contents = file.readlines()

for line_content in file_contents:
    print(type(line_content))
    line_content = codecs.decode(line_content)
    print(line_content)
    print(type(line_content))

File content: Log.txt

b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'

Output:

python3 file_convert.py Log.txt
<class 'bytes'>
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
<class 'str'>

I tried all the below methods

line_content = line_content.decode('UTF-8')
line_content = line_content.decode()
line_content = codecs.decode(line_content, 'UTF-8')

Is there any other way to handle this?
The line_content variable still holds the byte data and only the type changes to str, which is kind of confusing.

The line_content variable doesn't hold the byte data, it holds the ASCII representation of the byte data that was in your file. If you print(repr(line_content)) you'll see another level of quotes around it because it's a str, and if you print(line_content) before calling decode() on it you'll see that it's all ASCII bytes (e.g. there are no null bytes in it).
– Samwise, Commented Apr 20, 2022 at 2:24
@Samwise because the file was opened in binary mode with the 'b' flag, it really does return bytes and not a string when you read it. ASCII is not involved.
– Mark Ransom, Commented Apr 20, 2022 at 2:30
Is Log.txt literally the string you posted? Then somebody saved what looks like Python bytes objects to the file. Maybe the answer is to fix what wrote that file.
– tdelaney, Commented Apr 20, 2022 at 2:32
It returns a bytes object, but if the file is ASCII, it's still ASCII data. That is to say, "\x00" is not 0, it's 92 120 48 48. I'm not clear on what OP is trying to get: do they want the actual bytes represented by that string, such that \x00 becomes 0? If so, ast.literal_eval might be the easiest way.
– Samwise, Commented Apr 20, 2022 at 2:33
@Samwise if you look at all the bytes, it's clear it's not ASCII. In fact it looks to me like intermixed binary data and text, and turning the whole thing into text will be nearly impossible unless you have a detailed file specification.
– Mark Ransom, Commented Apr 20, 2022 at 2:38

tdelaney · Accepted Answer · 2022-04-20 03:00:18Z

3

The data in Log.txt is the string representation of a Python bytes object. That is odd, but we can deal with it. Since it's a bytes literal, evaluate it, which converts it to a real Python bytes object. There is still the question of what its encoding is.
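
As a rough illustration of that distinction, here is a minimal sketch. The sample is a shortened, made-up line, not the full content of Log.txt, and the variable names are only for illustration. UTF-8 is a guess, as it is in the rest of this answer.

```python
import ast

# Text as it would come out of the file: the characters b, ', \, x, 0, 3, ...
# (shortened, hypothetical sample, not the full line from Log.txt).
text = r"b'\x03\x00\xc3\x8aRb:US\r\n1'"

# In the text, the first escape is four separate characters.
print(len(text), repr(text[2:6]))

# Evaluating the literal yields the bytes object that the text describes.
raw = ast.literal_eval(text)
print(type(raw), len(raw), raw[0])

# Only now does decode() turn real bytes into a str (UTF-8 is assumed).
decoded = raw.decode("utf-8")
print(type(decoded), repr(decoded))
```

Calling decode() directly on the text read from the file cannot do this, because those bytes are already plain ASCII characters such as the backslash and the letter x.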

I don't see any advantage to using codecs.open. That's a way to read Unicode files in Python 2.7 and is not usually needed in Python 3. Guessing UTF-8, your code would be

#!python3

import sys
import os
import ast

if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]

# Open in text mode: the file holds the text of a bytes literal, not raw bytes.
with open(file_name) as file:
    file_contents = file.readlines()

for line_content in file_contents:
    print(type(line_content))
    # Evaluate the literal to get real bytes, then decode (UTF-8 is a guess).
    line_content = ast.literal_eval(line_content).decode("utf-8")
    print(line_content)
    print(type(line_content))
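
If the file might contain blank lines, lines that are not bytes literals, or bytes that are not valid UTF-8, a more defensive variant could look like the sketch below. The helper name decode_bytes_literal and the choice to return None for unusable lines are assumptions made for illustration, not part of the code above.

```python
import ast


def decode_bytes_literal(line, encoding="utf-8"):
    """Return the text described by a line holding a bytes literal, or None.

    Hypothetical helper: the name and the None return value for unusable
    lines are illustrative assumptions.
    """
    line = line.strip()
    if not line:
        return None  # blank line, nothing to evaluate
    try:
        value = ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    if not isinstance(value, bytes):
        return None  # a valid literal, but not a bytes literal
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return value.decode(encoding, errors="replace")
```

Whether replacing undecodable bytes is acceptable depends on the data. If the content really is mixed binary and text, as suggested in the comments on the question, decoding the whole line as one encoding may not be meaningful.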

edited Apr 20, 2022 at 3:00

answered Apr 20, 2022 at 2:38

tdelaney

1 Comment

Balaji Over a year ago

This solution worked for me. Thanks

danangjoyoo · Answer · 2022-04-20 02:26:34Z

-1

I think it's a list, not a string. Whenever you see a byte string that starts with \ (backslash), it's potentially a list.

Try this:

decoded_line_content = list(line_content)
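
For reference, a small sketch of what list() produces when applied to a bytes object. The sample value is made up for illustration and is not taken from Log.txt.

```python
# Hypothetical sample value for illustration.
raw = b"US\r\n"

# list() on a bytes object yields the integer value of each byte.
as_list = list(raw)
print(as_list)  # [85, 83, 13, 10]

# decode() is what produces a str from bytes.
as_text = raw.decode("utf-8")
print(repr(as_text))  # 'US\r\n'
```

So list() gives a list of integers (or of single characters, if called on a str) and does not by itself decode anything.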

answered Apr 20, 2022 at 2:26

danangjoyoo
