Python convert strings of bytes to byte array

Question

For example, given an arbitrary string, which could be characters or just random bytes:

string = '\xf0\x9f\xa4\xb1'

I want to output:

b'\xf0\x9f\xa4\xb1'

This seems so simple, but I could not find an answer anywhere. Of course, just typing b followed by the string literal will do, but I want to do this at runtime, or from a variable containing the string of bytes.

If the given string were AAAA or some other known characters, I could simply do string.encode('utf-8'), but I am expecting the string of bytes to be random. Doing that to '\xf0\x9f\xa4\xb1' (random bytes) produces the unexpected result b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'.

There must be a simpler way to do this?

Edit:

I want to convert the string to bytes without using an encoding

Do you want to convert the string to bytes? It is not clear what the desired solution is... if you know it is a byte string without the b, you can do some string formatting. If you need it in bytes, you can call bytes(string). Does this help: stackoverflow.com/questions/606191/convert-bytes-to-a-string ?
– Scott Skiles, Commented Aug 8, 2018 at 20:06
The bytes function takes in a string and an encoding. Since the bytes I'm expecting are random, I don't want to pick an encoding for it.
– AznBoyStride, Commented Aug 8, 2018 at 20:13

tripleee · Accepted Answer · 2020-12-28 11:57:50Z

5

The Latin-1 character encoding trivially (and unlike every other encoding supported by Python) encodes every code point in the range 0x00-0xff to a byte with the same value.

byteobj = '\xf0\x9f\xa4\xb1'.encode('latin-1')
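One way to check the claim above is to encode every code point from 0x00 to 0xff and compare the result against the raw byte values. This is only a sketch; the variable names are illustrative and do not come from the answer.

```python
# Build a string containing every code point from U+0000 to U+00FF.
all_chars = ''.join(chr(i) for i in range(256))

# Latin-1 maps each of these code points to the byte with the same value.
encoded = all_chars.encode('latin-1')
assert encoded == bytes(range(256))

# The round trip back to str is lossless as well.
assert encoded.decode('latin-1') == all_chars

# Code points above 0xff cannot be represented in Latin-1 and raise an error.
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError:
    outside_range_fails = True
else:
    outside_range_fails = False
```

The last part shows the limit of the approach: it only works when every character in the string is at most 0xff, which is the case for a "string of bytes" like the one in the question.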

You say you don't want to use an encoding, but the alternatives which avoid it seem far inferior.

The UTF-8 encoding is unsuitable because, as you already discovered, code points above 0x7f map to a sequence of multiple bytes (up to four bytes) none of which are exactly the input code point as a byte value.
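To illustrate, here is a small comparison of the two encodings applied to the string from the question. The variable names are only for illustration.

```python
s = '\xf0\x9f\xa4\xb1'  # four code points: U+00F0, U+009F, U+00A4, U+00B1

utf8 = s.encode('utf-8')      # each code point above 0x7f becomes two bytes here
latin1 = s.encode('latin-1')  # each code point becomes the byte of the same value

print(utf8)    # b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'
print(latin1)  # b'\xf0\x9f\xa4\xb1'
print(len(utf8), len(latin1))  # 8 4
```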

Omitting the argument to .encode() (as in a now-deleted answer) does not avoid choosing an encoding: in Python 3 the default is UTF-8, which has the problem described above. Relying on an implicit default is also fragile in general; other APIs, such as open() without an encoding argument, pick a system-dependent encoding (usually UTF-8 on most systems except Windows, where the choice is typically much less predictable).

edited Dec 28, 2020 at 11:57

answered Dec 28, 2020 at 11:45

tripleee


AznBoyStride · Accepted Answer · 2018-08-08 20:26:37Z

3

I found a working solution

import struct

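def convert_string_to_bytes(string):
    # Pack each character's code point (must be 0-255) into a single byte.
    result = b''  # avoid shadowing the built-in name `bytes`
    for ch in string:
        result += struct.pack("B", ord(ch))
    return result

string = '\xf0\x9f\xa4\xb1'

print(convert_string_to_bytes(string))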

output: b'\xf0\x9f\xa4\xb1'
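As a possible alternative to the loop above, the built-in bytes constructor accepts an iterable of integers in range(256), so the same conversion can be sketched without struct. The function name string_to_bytes is hypothetical and not part of the original answer.

```python
def string_to_bytes(string):
    # bytes() accepts an iterable of ints in range(256);
    # it raises ValueError if any code point is above 0xff.
    return bytes(ord(ch) for ch in string)

s = '\xf0\x9f\xa4\xb1'
result = string_to_bytes(s)
print(result)  # b'\xf0\x9f\xa4\xb1'

# For strings limited to this range, the result matches Latin-1 encoding.
assert result == s.encode('latin-1')
```

Both this and the struct version treat each character as a single byte value, so they fail on any character above 0xff rather than silently producing extra bytes.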

answered Aug 8, 2018 at 20:26

AznBoyStride

1 Comment

Sadique Khan Over a year ago

b'\'\\x1e\\x03\\xcd\\xb6\\x93:\\x87\\xfc\\xcfp\\xfc\\xb7\\xba\\x8a\\x0es\\x81P\\xe1\\x1b\\n4a\\xe4"\\xdfA\\x8e\\x8a\\x15\\x18\\xb8\\x12\\xfcB/\\xea\\x83\\xd4\\x1dd\\xb8\\x14\\xd3\\xb9\\xfa\\x97B\\xfe\\x89\\xe1\\xff\\xbe\\x02\\xedY\\xc9pk\\\'\\xf8\\x1d9\\x1a\'' output is like this
