bytes.decode() raises an error when I try to decode a line of output read from a subprocess pipe (stdout=subprocess.PIPE). The error message is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 8: invalid start byte

0x84 should be the letter 'ä'. The line that fails reads as follows:

b' Datentr\x84ger in Laufwerk C: ist System'

I can't nail it down. I already checked the encoding using sys.stdout.encoding, which is utf-8.

import subprocess
import re

# Start cmd.exe and send it a single "dir" command via stdin.
prc = subprocess.Popen(["cmd.exe"], shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
# communicate() writes the input, closes stdin, and returns (stdout, stderr).
outp, _ = prc.communicate(b"dir\n")

regex = re.compile(r"^.*(\d\d:\d\d).*$")

for line in outp.splitlines():
    match = regex.match(line.decode('utf-8'))  # <--- decode fails here.
    if match:
        print(match.groups())

2 Answers

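cmd.exe does not write UTF-8 to the pipe. It encodes its output using the console's OEM code page, which on a German Windows installation is typically cp850; in that code page, 0x84 is 'ä'. (It is not ISO-8859-15, where 0x84 is a control character.) So the bytes that come through the pipe need to be decoded with that code page, even though Python's own sys.stdout uses utf-8.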
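
As a minimal sketch of that decoding step, here is the failing line from the question decoded with code page 850. The choice of cp850 is an assumption (it is common on Western European Windows installations, but the question does not state the actual code page):

```python
# The raw line from the question, as it arrives through the pipe.
raw_line = b' Datentr\x84ger in Laufwerk C: ist System'

# Assumption: the console's OEM code page is 850.
# In cp850 (and also cp437) the byte 0x84 maps to 'ä'.
decoded_line = raw_line.decode('cp850')
print(decoded_line)
```

In the question's loop, this would mean replacing line.decode('utf-8') with line.decode('cp850'), or with whichever code page the target machine actually uses.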

1 Comment

Is there a way to get this information automatically? I've tried sys.getdefaultencoding() and locale.getpreferredencoding(), but both return something different. I'd like to make sure the script runs on any Windows.
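
One possible way to look this up at runtime, sketched under assumptions: on Windows, the Win32 functions GetConsoleOutputCP and GetOEMCP report the console's output code page and the system OEM code page, and both can be called through ctypes. The helper name console_encoding is hypothetical, and whether a given child process really honors this code page when writing to a pipe should be verified on the target system.

```python
import locale
import sys


def console_encoding():
    """Best-effort guess of the encoding console programs write to a pipe.

    Hypothetical helper for illustration; not part of any library.
    """
    if sys.platform == "win32":
        import ctypes  # imported lazily; ctypes.windll exists only on Windows

        kernel32 = ctypes.windll.kernel32
        # Output code page of the attached console; 0 means no console is attached.
        codepage = kernel32.GetConsoleOutputCP()
        if codepage == 0:
            # Fall back to the system-wide OEM code page.
            codepage = kernel32.GetOEMCP()
        return f"cp{codepage}"
    # On other platforms, fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


print(console_encoding())
```

The returned name could then be passed to line.decode(...) in place of the hard-coded 'utf-8'.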

If you don't know the encoding, the simplest way to avoid the exception is to pass the errors parameter to bytes.decode(). The undecodable bytes are then escaped rather than recovered as the original characters, e.g.:

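import subprocess
# Note: this relies on an external echo executable, as found on POSIX systems.
p = subprocess.run(['echo', b'Evil byte: \xe2'], stdout=subprocess.PIPE)
# Undecodable bytes are turned into backslash escapes instead of raising.
p.stdout.decode(errors='backslashreplace')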

Output:

'Evil byte: \\xe2\n'

The list of possible values can be found here: https://docs.python.org/3/library/codecs.html#codecs.register_error
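
To illustrate how some of the built-in handlers differ, here is a small sketch that decodes a fragment of the line from the question as UTF-8 with three of them. The variable names are only for illustration:

```python
raw = b'Datentr\x84ger'  # 0x84 is not a valid UTF-8 start byte

# 'replace' substitutes U+FFFD, 'ignore' drops the byte,
# 'backslashreplace' keeps it visible as an escape sequence.
replaced = raw.decode('utf-8', errors='replace')
ignored = raw.decode('utf-8', errors='ignore')
escaped = raw.decode('utf-8', errors='backslashreplace')

# ascii() keeps the printout safe on any terminal encoding.
print(ascii(replaced))
print(ascii(ignored))
print(ascii(escaped))
```

None of these handlers recovers the 'ä'; they only prevent the UnicodeDecodeError, so decoding with the correct code page is preferable whenever it is known.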
