As a French user of Python 2.7, I'm trying to properly print strings containing accents such as "é", "è", "à", etc. in the Python console.

I already know the trick of putting u before a string literal, for example:

print(u'Université')

which properly prints the last character.

Now, my question is: how can I do the same for a string that is stored as a variable?

Indeed, I know that I could do the following:

mystring = u'Université'
print(mystring)

but the problem is that the value of mystring is bound to be passed into a SQL query (using psycopg2), and therefore I can't afford to store the u inside the value of mystring.

So how could I do something like "print the unicode value of mystring"?

  • u'...' creates an object of type unicode. Your mystring object is already such an object; it's not the print() function that turns it into something else. Commented Oct 8, 2018 at 9:45
  • Most SQL database adapters can give you Unicode string objects directly, no need to convert. For str objects (byte strings), you need to decode from bytes to Unicode. See nedbatchelder.com/text/unipain.html for a great article on the subject. Commented Oct 8, 2018 at 9:46
  • Then also read joelonsoftware.com/2003/10/08/… and the Python Unicode HOWTO. Commented Oct 8, 2018 at 9:46
  • I don't really see why this is closed as too broad. We have the same question for "raw strings", and that one isn't closed. Commented Oct 8, 2018 at 9:48
  • does it mean that if I do mystring = u'Université' and then I send a query like "INSERT INTO mytable VALUES "+mystring+";" the value passed to SQL will be understood as 'Université'? Commented Oct 8, 2018 at 9:49

1 Answer

The u prefix is not part of the value; it is only a type indicator on the literal. To convert a byte string (a str in Python 2) into a Unicode string, you need to know its encoding:

unicodestring = mystring.decode('utf-8')  # or 'latin-1' or ... whatever

and to print it you typically (in Python 2) need to convert back to whatever the system accepts on the output filehandle:

print(unicodestring.encode('utf-8'))  # or 'latin-1' or ... whatever
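
As a rough illustration of the decode/encode round trip, here is a minimal sketch. It assumes the bytes are UTF-8 encoded (substitute whatever encoding your data actually uses), and the names raw_bytes, text and encoded are made up for the example. Escapes are used instead of a literal "é" so the snippet does not depend on the source file's encoding.

```python
# Assumption: the incoming byte string is UTF-8; use the real encoding of your data.
raw_bytes = b'Universit\xc3\xa9'      # the bytes a Python 2 str would hold

# Bytes -> Unicode text. The result is a unicode object (str in Python 3).
text = raw_bytes.decode('utf-8')

# The u prefix is only literal syntax; it is not stored in the value.
first_char = text[0]                  # u'U', not u'u'

# "é" is two bytes in UTF-8 but a single character once decoded.
byte_length = len(raw_bytes)
char_length = len(text)

# Unicode text -> bytes again, e.g. before writing to an output that expects UTF-8.
encoded = text.encode('utf-8')
```

Decoding with the wrong codec (for example 'latin-1' on UTF-8 data) would not raise an error here, but would give two wrong characters in place of "é", which is why knowing the encoding matters.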

Python 3 clarifies the situation (though it does not necessarily simplify it) by keeping Unicode strings and what are now called bytes objects strictly separate.
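
To make that separation concrete, the sketch below tries to concatenate text with non-ASCII bytes and reports what happens. The helper try_concat is hypothetical, written only for this example. Python 3 refuses the mix with a TypeError, whereas Python 2 attempts an implicit ASCII decode and fails with a UnicodeDecodeError.

```python
import sys

def try_concat(text, raw):
    """Hypothetical helper: return the name of the exception raised when
    text and raw bytes are concatenated, or 'ok' if it succeeds."""
    try:
        text + raw
    except Exception as exc:          # broad on purpose: we only report the type
        return type(exc).__name__
    return 'ok'

# Unicode text on the left, UTF-8 bytes for "é" on the right.
outcome = try_concat(u'Universit', b'\xc3\xa9')

# Decoding explicitly first works the same way on both versions.
joined = u'Universit' + b'\xc3\xa9'.decode('utf-8')
```

Either way, the explicit decode is the portable fix: once both operands are Unicode text, the concatenation behaves identically on Python 2 and Python 3.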

1 Comment

print(mystring.decode('utf-8')) works like a charm for displaying accented characters properly in the console. Thanks.
