
I have a string that I get from a function:

>>> example = Some_function()

Some_function returns a very long string that mixes ASCII and non-ASCII characters, for example 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.
My problem is that when I try to convert this string to bytes, I get an error saying that \ud919 cannot be encoded with UTF-8. I tried:

>>> further=bytes(example,encoding='utf-8')

Note: I cannot simply ignore \ud919. Is there a way to solve this, or how can I convert 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123' to 'gn1\ud123a\ud123\ud123\ud123\\ud919\ud123\ud123' so that \ud919 is treated as a plain string rather than as a Unicode character?


2 Answers


It depends on the Python version.

Python 2.x:

print type(unicode_string), repr(unicode_string)

Python 3.x:

print(type(unicode_string), ascii(unicode_string))
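A small Python 3 sketch of what ascii() gives for the question's sample string (the variable name shown is illustrative). ascii() returns a printable representation in which every non-ASCII character, surrogates included, appears as an escape sequence, so it never raises the way UTF-8 encoding does.

```python
example = 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'

# ascii() works like repr() but escapes all non-ASCII characters,
# so lone surrogates come out as literal \uXXXX text.
shown = ascii(example)
print(type(example), shown)
```

Keep in mind that the result is a representation: it is wrapped in quotes, which makes it suitable for display or logging rather than as the bare escaped string.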


1 Comment

Hello and welcome to SO! Please read the tour, and How do I write a good answer?

\ud919 is a surrogate character; one does not simply encode it with UTF-8. Use the surrogatepass error handler:

>>> 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.encode('utf-8', 'surrogatepass')
b'gn1\xed\x84\xa3a\xed\x84\xa3\xed\x84\xa3\xed\x84\xa3\xed\xa4\x99\xed\x84\xa3\xed\x84\xa3'
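A sketch of the full round trip with the same handler. The bytes written for lone surrogates are not valid UTF-8 under the standard, so decoding them back requires 'surrogatepass' as well, and other strict UTF-8 decoders will likely reject them.

```python
example = 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'

# 'surrogatepass' makes the UTF-8 codec write each lone surrogate as its
# 3-byte form instead of raising UnicodeEncodeError.
data = example.encode('utf-8', 'surrogatepass')

# The same handler is needed on the way back; a strict decode rejects these bytes.
restored = data.decode('utf-8', 'surrogatepass')
assert restored == example
```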
