How to convert a Unicode string to bytes in Python [duplicate]

Question

I have a string which I get from a function

>>> example = Some_function()

Some_function returns a very long string that mixes ASCII and non-ASCII Unicode characters, like 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.
My problem is that when I try to convert this string to bytes, I get an error saying that \ud919 cannot be encoded by UTF-8. I tried:

>>> further=bytes(example,encoding='utf-8')

Note: I cannot ignore this \ud919. Is there a way to solve this problem? Alternatively, how can I convert 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123' to 'gn1\ud123a\ud123\ud123\ud123\\ud919\ud123\ud123', so that \ud919 is treated as plain text rather than as a Unicode escape?

abidlatif · Accepted Answer · 2021-02-07 13:08:09Z

0

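It depends on the Python version. To see exactly which code points the string contains, print its type and its escaped representation:
Python 2.x: print type(unicode_string), repr(unicode_string)
Python 3.x: print(type(unicode_string), ascii(unicode_string))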
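As a rough sketch of that inspection step in Python 3, the snippet below prints the escaped form of the string from the question and lists the positions of any surrogate code points (U+D800 to U+DFFF). The helper name find_surrogates is hypothetical and used here only for illustration.

```python
def find_surrogates(text):
    """Return (index, escaped code point) pairs for each surrogate in text.

    Hypothetical helper: surrogates occupy the range U+D800..U+DFFF.
    """
    return [
        (i, f"\\u{ord(ch):04x}")
        for i, ch in enumerate(text)
        if 0xD800 <= ord(ch) <= 0xDFFF
    ]


example = 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'

# ascii() escapes every non-ASCII character, so the output is safe to print.
print(type(example), ascii(example))

# Only \ud919 falls in the surrogate range; \ud123 is an ordinary character.
print(find_surrogates(example))
```

Note that \ud123 is not a surrogate, which is why only \ud919 triggers the encoding error.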


1 Comment

Tomer Shetah Over a year ago

Hello and welcome to SO! Please read the tour, and How do I write a good answer?

Alderven · Accepted Answer · 2021-02-07 13:27:32Z

0

\ud919 is a lone surrogate code point, so the strict UTF-8 codec refuses to encode it. Pass the surrogatepass error handler:

>>> 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.encode('utf-8', 'surrogatepass')
b'gn1\xed\x84\xa3a\xed\x84\xa3\xed\x84\xa3\xed\x84\xa3\xed\xa4\x99\xed\x84\xa3\xed\x84\xa3'
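A minimal sketch of two ways to handle the surrogate, assuming Python 3 and the example string from the question. The surrogatepass handler keeps the code point but produces bytes that a strict UTF-8 decoder rejects, so decoding needs the same handler. The standard backslashreplace handler is an alternative that may be closer to what the question asks for, since it writes the surrogate out as the literal text \ud919.

```python
example = 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'

# Strict UTF-8 rejects the lone surrogate \ud919.
try:
    example.encode('utf-8')
except UnicodeEncodeError as exc:
    print('strict encoding failed:', exc.reason)

# Option 1: surrogatepass encodes the surrogate as if it were a normal
# code point. Decode with the same handler to get the original string back.
raw = example.encode('utf-8', 'surrogatepass')
restored = raw.decode('utf-8', 'surrogatepass')
print(restored == example)

# Option 2: backslashreplace substitutes the six ASCII characters \ud919
# for the surrogate, so the result is valid UTF-8 that any decoder accepts.
escaped = example.encode('utf-8', 'backslashreplace')
print(escaped)
```

Which option fits depends on who reads the bytes: surrogatepass round-trips only within code that also uses surrogatepass, while backslashreplace is portable but no longer distinguishes the surrogate from a literal backslash sequence.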
