19

I have searched many times online and I have not been able to find a way to convert my binary string variable, X

X = "1000100100010110001101000001101010110011001010100"

into a UTF-8 string value.

I have found that some people are using methods such as

b'message'.decode('utf-8')

However, this method has not worked for me: 'b' is reported as nonexistent, and I am not sure how to replace 'message' with a variable. I also do not understand how this method works. Is there a better alternative?

So how could I convert a binary string into a text string?

EDIT: I also do not mind ASCII decoding

CLARIFICATION: Here is specifically what I would like to happen.

def binaryToText(z):
    # Some code to convert binary to text
    return (something here);
X="0110100001101001"
print binaryToText(X)

This would then yield the string...

hi
7
  • Since ASCII is effectively a subset of UTF-8 you'll find that your string X is already a UTF8 string. What is your expected output? Commented Nov 11, 2016 at 22:43
  • +mhawke I am looking for a returned value of a UTF-8 string. The binary is initially a string, and I want to be able to convert that binary, into a UTF-8 string. Please ask me if you need more clarification! Commented Nov 11, 2016 at 22:46
  • Are you using Python 2 or 3? Why did you tag BOTH? In Python 3, strings are utf by default. Commented Nov 11, 2016 at 22:48
  • +juanpa.arrivillaga I have the flexibility to use both, dependent upon which option is best for me to use. I can accept solutions for both versions. Commented Nov 11, 2016 at 22:50
  • Well, if you use Python 3, all strings are unicode, so that seems to be the most straightforward solution... Commented Nov 11, 2016 at 22:57

6 Answers

17

It looks like you are trying to decode ASCII characters from a binary string representation (bit string) of each character.

You can take each block of eight characters (a byte), convert that to an integer, and then convert that to a character with chr():

>>> X = "0110100001101001"
>>> print(chr(int(X[:8], 2)))
h
>>> print(chr(int(X[8:], 2)))
i

Assuming that the values encoded in the string are ASCII this will give you the characters. You can generalise it like this:

def decode_binary_string(s):
    return ''.join(chr(int(s[i*8:i*8+8],2)) for i in range(len(s)//8))

>>> decode_binary_string(X)
'hi'

If you want to keep it in the original encoding you don't need to decode any further. Usually you would convert the incoming string into a Python unicode string and that can be done like this (Python 2):

def decode_binary_string(s, encoding='UTF-8'):
    byte_string = ''.join(chr(int(s[i*8:i*8+8],2)) for i in range(len(s)//8))
    return byte_string.decode(encoding)
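In Python 3, `str` has no `decode` method, so the Python 2 version above raises `AttributeError: 'str' object has no attribute 'decode'`. A minimal sketch of a Python 3 equivalent, assuming the input contains only '0' and '1' and that any trailing partial byte should be ignored (as in the original), packs the values into a `bytes` object before decoding:

```python
def decode_binary_string(s, encoding='utf-8'):
    # Only whole 8-character blocks are used; a trailing partial block is ignored.
    usable = len(s) - len(s) % 8
    # Turn each block into an integer 0..255 and pack the values into bytes.
    byte_string = bytes(int(s[i:i + 8], 2) for i in range(0, usable, 8))
    # Decode the raw bytes into a str using the requested encoding.
    return byte_string.decode(encoding)

print(decode_binary_string("0110100001101001"))  # hi
```

Because the bytes are decoded as a whole, multi-byte UTF-8 sequences also work, which a per-character `chr()` call would not handle.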

3 Comments

Could you also add the reverse code? For converting string to binary. That would be great :)
@Dan: ''.join([bin(ord(c))[2:].rjust(8,'0') for c in 'hi'])
I'm way, way late to this solution but I'm curious. When I run the last of the code snippets above I get 'str' object has no attribute 'decode'. I bring this up because this solution appears perfect for what I need but the encoding (or rather decoding) part doesn't seem to work.
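For the reverse direction requested in the comments, here is a minimal Python 3 sketch. The function name `text_to_bits` is hypothetical; it encodes the text to bytes first, so non-ASCII characters become several 8-bit groups instead of a single over-long one:

```python
def text_to_bits(text, encoding='utf-8'):
    # Encode to bytes, then render every byte as exactly eight binary digits.
    return ''.join(format(byte, '08b') for byte in text.encode(encoding))

print(text_to_bits('hi'))  # 0110100001101001
```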
6

To convert bits given as a "01"-string (binary digits) into the corresponding text in Python 3:

>>> bits = "0110100001101001"
>>> n = int(bits, 2)
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
'hi'

For a Python 2/3 solution, see Convert binary to ASCII and vice versa.
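One caveat with sizing the result from `n.bit_length()` is that leading zero bytes in the input are dropped. A sketch that sizes the result from the length of the bit string instead (the name `bits_to_text` is hypothetical, and a non-empty string of '0'/'1' characters is assumed):

```python
def bits_to_text(bits, encoding='utf-8'):
    n = int(bits, 2)
    # Size the result from the input length (rounded up to whole bytes),
    # so leading zero bytes survive the round trip.
    n_bytes = (len(bits) + 7) // 8
    return n.to_bytes(n_bytes, 'big').decode(encoding)

print(bits_to_text("0110100001101001"))  # hi
```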


1

In Python 2, an ascii-encoded (byte) string is also a utf8-encoded (byte) string. In Python 3, a (unicode) string must be encoded to utf8-encoded bytes. The decoding example was going the wrong way.

>>> X = "1000100100010110001101000001101010110011001010100"
>>> X.encode()
b'1000100100010110001101000001101010110011001010100'

Strings containing only the digits '0' and '1' are a special case and the same rules apply.

1 Comment

So how could I decode X? X.decode() does not seem to work.
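One observation about X from the question: it has 49 digits, which is not a multiple of 8 but is a multiple of 7. Assuming (and this is only an assumption about how X was produced) that each character was written as a 7-bit ASCII code, a sketch with a configurable group width decodes it. The name `decode_bit_groups` is hypothetical:

```python
def decode_bit_groups(bits, width=8):
    # Split the bit string into fixed-width groups and map each group
    # to the character with that code point.
    return ''.join(chr(int(bits[i:i + width], 2))
                   for i in range(0, len(bits), width))

X = "1000100100010110001101000001101010110011001010100"
print(decode_bit_groups(X, width=7))  # DEFAULT
```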
0

Provide the optional base argument to int to convert:

>>> x = "1000100100010110001101000001101010110011001010100"
>>> int(x, 2)
301456912901716


0
# Simple, not elegant; used for a CTF challenge and did the trick

# Binary input, separated into bytes
binary = "01000011 01010100 01000110 01111011 01000010 01101001 01110100 01011111 01000110 01101100 01101001 01110000 01110000 01101001 01101110 01111101"
# Add each item to a list at spaces
binlist = binary.split(" ")
# List to Hold Characters
chrlist = []
# Loop to convert
for i in binlist:
    chrlist.append(chr(int(i,2)))
# Print the list as a joined string
print("".join(chrlist))
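The same idea can be written more compactly in Python 3 by building a `bytes` object and decoding it. This is a sketch assuming every whitespace-separated group is a valid 8-bit ASCII code:

```python
binary = ("01000011 01010100 01000110 01111011 01000010 01101001 01110100 "
          "01011111 01000110 01101100 01101001 01110000 01110000 01101001 "
          "01101110 01111101")

# split() with no argument tolerates repeated spaces and newlines.
text = bytes(int(group, 2) for group in binary.split()).decode('ascii')
print(text)  # CTF{Bit_Flippin}
```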

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
-1

Working code for Python 3:

Binstr = '00011001 00001000'
s = []
# Split on spaces and convert each group of bits to an integer, then to a character
for i in Binstr.split(' '):
    s.append(chr(int(i, 2)))
print(''.join(s))

1 Comment

Code syntax is invalid
