5

For example, given an arbitrary string, which could contain ordinary characters or just random bytes:

string = '\xf0\x9f\xa4\xb1'

I want to output:

b'\xf0\x9f\xa4\xb1'

This seems so simple, but I could not find an answer anywhere. Of course, just typing the b prefix in front of the string literal will do. But I want to do this at runtime, or from a variable containing the string of bytes.

If the given string were AAAA or some other known characters, I could simply do string.encode('utf-8'), but I am expecting the string of bytes to be random. Doing that to '\xf0\x9f\xa4\xb1' (random bytes) produces the unexpected result b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'.

There must be a simpler way to do this?

Edit:

I want to convert the string to bytes without using an encoding.

4
  • Do you want to convert the string to bytes? It is not clear what the desired solution is... if you know it is a byte string without the b, you can do some string formatting. If you need it in bytes, you can call bytes(string). Does this help: stackoverflow.com/questions/606191/convert-bytes-to-a-string ? Commented Aug 8, 2018 at 20:06
  • Yes, I simply want to convert the string to bytes. Commented Aug 8, 2018 at 20:07
  • Okay, I see your problem. You might need to use a raw string. Commented Aug 8, 2018 at 20:11
  • The bytes function takes in a string and an encoding. Since the bytes I'm expecting are random, I don't want to pick an encoding for it. Commented Aug 8, 2018 at 20:13

2 Answers

5

The Latin-1 character encoding trivially (and unlike every other encoding supported by Python) encodes every code point in the range 0x00-0xff to a byte with the same value.

byteobj = '\xf0\x9f\xa4\xb1'.encode('latin-1')
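A minimal sketch of that property, assuming Python 3. The helper name to_raw_bytes is hypothetical (it is not part of the standard library); it only wraps str.encode('latin-1') and reports code points that do not fit in a single byte.

```python
def to_raw_bytes(text):
    """Hypothetical helper: map each code point 0x00-0xff to the same byte value."""
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError as exc:
        # Code points above 0xff have no single-byte equivalent in Latin-1.
        raise ValueError('string contains code points above 0xff') from exc


s = '\xf0\x9f\xa4\xb1'
raw = to_raw_bytes(s)

# The conversion is lossless in both directions for this range.
assert raw == b'\xf0\x9f\xa4\xb1'
assert raw.decode('latin-1') == s

# All 256 possible values map one-to-one.
all_chars = ''.join(chr(i) for i in range(256))
assert to_raw_bytes(all_chars) == bytes(range(256))
```

Anything outside 0x00-0xff (for example '\u20ac') raises an error rather than being silently altered, which is probably what you want when the input is supposed to be raw byte values.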

You say you don't want to use an encoding, but the alternatives which avoid it seem far inferior.

The UTF-8 encoding is unsuitable because, as you already discovered, code points above 0x7f map to a sequence of multiple bytes (two bytes for 0x80-0xff, up to four bytes in general), so the output is no longer one byte per input code point.
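To make the difference concrete, here is a small comparison of the two encodings on the string from the question, assuming Python 3. The variable names are illustrative only.

```python
s = '\xf0\x9f\xa4\xb1'

utf8 = s.encode('utf-8')      # each code point in 0x80-0xff becomes two bytes
latin1 = s.encode('latin-1')  # each code point becomes exactly one byte

print(len(utf8), utf8)      # 8 b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'
print(len(latin1), latin1)  # 4 b'\xf0\x9f\xa4\xb1'

# The four Latin-1 bytes happen to be valid UTF-8 for a single character,
# which is likely where the original byte string came from.
print(latin1.decode('utf-8') == '\U0001f931')  # True
```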

Omitting the argument to .encode() (as in a now-deleted answer) does not help either: in Python 3, str.encode() defaults to UTF-8, so it produces the same multi-byte output. Platform-dependent guessing does apply elsewhere, for example to open() without an encoding argument, where Windows will typically choose a legacy code page, which is much more unpredictable.


3

I found a working solution:

import struct

def convert_string_to_bytes(string):
    # Pack each character's code point (must be 0-255) into a single unsigned byte.
    result = b''
    for ch in string:
        result += struct.pack("B", ord(ch))
    return result

string = '\xf0\x9f\xa4\xb1'

print(convert_string_to_bytes(string))

output: b'\xf0\x9f\xa4\xb1'
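For what it's worth, the same per-character conversion can probably be written without struct, since the bytes constructor accepts an iterable of integers in the range 0-255. This is only a sketch, and the function name to_bytes_by_ordinal is made up for illustration.

```python
def to_bytes_by_ordinal(text):
    """Hypothetical helper: build bytes from each character's code point."""
    # bytes() raises ValueError if any integer is outside range(0, 256).
    return bytes(ord(ch) for ch in text)


s = '\xf0\x9f\xa4\xb1'
result = to_bytes_by_ordinal(s)
print(result)  # b'\xf0\x9f\xa4\xb1'
```

For strings whose code points all fall in 0x00-0xff, this gives the same bytes as s.encode('latin-1'), and it avoids repeatedly concatenating bytes objects inside a loop.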

1 Comment

b'\'\\x1e\\x03\\xcd\\xb6\\x93:\\x87\\xfc\\xcfp\\xfc\\xb7\\xba\\x8a\\x0es\\x81P\\xe1\\x1b\\n4a\\xe4"\\xdfA\\x8e\\x8a\\x15\\x18\\xb8\\x12\\xfcB/\\xea\\x83\\xd4\\x1dd\\xb8\\x14\\xd3\\xb9\\xfa\\x97B\\xfe\\x89\\xe1\\xff\\xbe\\x02\\xedY\\xc9pk\\\'\\xf8\\x1d9\\x1a\'' output is like this
