Python, Unicode, and the Windows console

Question

When I try to print a string in a Windows console, sometimes I get an error that says UnicodeEncodeError: 'charmap' codec can't encode character ..... I assume this is because the Windows console cannot handle all Unicode characters.

How can I work around this? For example, how can I make the program display a replacement character (such as ?) instead of failing?

What version of Python are you on? I've seen references that this was broken in 2.4.3 and fixed in 2.4.4. – Stu, Commented Aug 7, 2008 at 22:30

Blairg23 · Accepted Answer · 2021-05-03 18:53:07Z

Update: Python 3.6 implements PEP 528 (Change Windows console encoding to UTF-8): the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.

I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.

The error means that the Unicode characters you are trying to print can't be represented in the current (chcp) console character encoding. The code page is often an 8-bit encoding such as cp437, which can represent only ~0x100 (256) characters out of ~1M Unicode characters:

>>> u"\N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0: character maps to <undefined>
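To check up front whether a string fits a given code page, you can simply try encoding it. A small sketch; the helper name representable is made up for illustration, and cp437/cp1252 are the OEM and ANSI code pages mentioned elsewhere on this page:

```python
def representable(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

euro = "\N{EURO SIGN}"
print(representable(euro, "cp437"))   # False: cp437 has no euro sign
print(representable(euro, "cp1252"))  # True: cp1252 maps it to byte 0x80
print(euro.encode("cp437", errors="replace"))  # b'?'
```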

I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?

The Windows console does accept Unicode characters, and it can even display them (BMP only) if a suitable font is configured. The WriteConsoleW() API should be used, as suggested in @Daira Hopwood's answer. It can be called transparently, i.e., you don't need to (and should not) modify your scripts if you use the win-unicode-console package:

T:\> py -m pip install win-unicode-console
T:\> py -m run your_script.py
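To show roughly what calling WriteConsoleW() looks like, here is a minimal ctypes sketch. It is not the win-unicode-console implementation; it assumes Python 3, takes the WriteConsoleW() path only when stdout is attached to a real Windows console, and otherwise falls back to sys.stdout.write() (for example when output is redirected). The function name write_console is made up for illustration, and the sketch does not handle partial writes of very large strings.

```python
import sys

def write_console(text):
    """Write text with WriteConsoleW when stdout is a real Windows console.

    Returns True if WriteConsoleW was used, False if it fell back to sys.stdout.write().
    """
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.GetConsoleMode.restype = wintypes.BOOL
        kernel32.WriteConsoleW.argtypes = [wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
                                           ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
        kernel32.WriteConsoleW.restype = wintypes.BOOL

        STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        mode = wintypes.DWORD()
        # GetConsoleMode fails when stdout is redirected to a file or pipe
        if handle and kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            sys.stdout.flush()  # keep ordering with text already buffered by print()
            # nNumberOfCharsToWrite counts UTF-16 code units, not Python characters
            units = len(text.encode("utf-16-le")) // 2
            written = wintypes.DWORD()
            if not kernel32.WriteConsoleW(handle, text, units, ctypes.byref(written), None):
                raise ctypes.WinError(ctypes.get_last_error())
            return True
    sys.stdout.write(text)
    return False
```

On Python 3.6+ this is unnecessary for the console itself, since print() already goes through the same API.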

See What's the deal with Python 3.4, Unicode, different languages and Windows?

Is there any way I can make Python automatically print a ? instead of failing in this situation?

If replacing all unencodable characters with ? is enough in your case, then you could set the PYTHONIOENCODING environment variable:

T:\> set PYTHONIOENCODING=:replace
T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
[?]

In Python 3.6+, the encoding specified by the PYTHONIOENCODING environment variable is ignored for interactive console buffers unless the PYTHONLEGACYWINDOWSSTDIO environment variable is also set to a non-empty string.
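On Python 3.7 and later there is also an in-process counterpart to PYTHONIOENCODING=:replace: TextIOWrapper.reconfigure() can change the error handler of an already-open text stream. The sketch below uses an in-memory cp437 stream as a stand-in for a legacy console so the effect is visible; in a real script you would call sys.stdout.reconfigure(errors="replace") instead.

```python
import io

# In-memory stand-in for a text stream whose encoding is cp437
stream = io.TextIOWrapper(io.BytesIO(), encoding="cp437")

stream.reconfigure(errors="replace")  # same effect as PYTHONIOENCODING=:replace
stream.write("[\N{EURO SIGN}]")
stream.flush()
print(stream.buffer.getvalue())  # b'[?]'

# In a real script (Python 3.7+):
# import sys
# sys.stdout.reconfigure(errors="replace")
```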

edited May 3, 2021 at 18:53

Blairg23

answered Aug 24, 2015 at 7:35

jfs

3 Comments

JinSnow Over a year ago

"the default console on Windows will now accept all Unicode characters" BUT you need to configure the console: right-click the title bar of the window (cmd or Python IDLE), and under Defaults/Font choose "Lucida Console". (Japanese and Chinese don't work for me, but I should survive without them...)

jfs Over a year ago

@Guillaume: the answer contains the phrase in bold about Windows console: "if the corresponding font is configured." This answer doesn't mention IDLE but you don't need to configure the font in it (I see Japanese and Chinese characters just fine in IDLE by default. Try print('\u4E01'), print('\u6b63')).

Mark Tolonen Over a year ago

@Guillaume You can even get Chinese if you install the language pack in Windows 10. It added console fonts that support Chinese.

alvas · Accepted Answer · 2016-01-04 17:18:53Z

Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!

Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):

PrintFails - Python Wiki

Here's a code excerpt from that page:

$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line'
UTF-8
<type 'unicode'> 2
Б
Б

$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line' | cat
None
<type 'unicode'> 2
Б
Б

There's some more information on that page, well worth a read.
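For Python 3, a rough counterpart of the wrapping trick is sketched below. It is an illustration under assumptions, not code from the wiki page: it uses an in-memory buffer instead of sys.stdout.buffer, assumes cp437 as the console's OEM code page, and adds errors="replace" so unencodable characters become ?. The last line shows the mismatch described in the comments: bytes produced with the ANSI code page (cp1252) but displayed by a cp437 console come out as mojibake.

```python
import codecs
import io

raw = io.BytesIO()  # stand-in for sys.stdout.buffer
# Wrap the byte stream in a writer for the console's OEM code page (assumed cp437);
# errors="replace" turns characters cp437 lacks into '?'
writer = codecs.getwriter("cp437")(raw, errors="replace")
writer.write("\u0411 \N{EURO SIGN} \u00e9\n")  # Cyrillic Be and the euro sign are not in cp437
print(raw.getvalue())  # b'? ? \x82\n'

# Encoding with the ANSI code page but displaying with the OEM one garbles text
garbled = "\u00d8\u00a7\u00a9".encode("cp1252").decode("cp437")  # 'Ø§©' -> '╪º⌐'
```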

edited Jan 4, 2016 at 17:18

alvas

answered Aug 7, 2008 at 22:32

Lasse V. Karlsen

4 Comments

0xC0000022L Over a year ago

The link is dead and the gist of the answer wasn't quoted. -1

user2357112 Over a year ago

When I try the given advice about wrapping sys.stdout, it prints the wrong things. For example, u'\u2013' becomes û instead of an en-dash.

Lasse V. Karlsen Over a year ago

@user2357112 You will have to post a new question about that. Unicode and system console is not necessarily the best combination, but I don't know enough about this, so if you need a definite answer, post a question here on SO about it.

jfs Over a year ago

The link is dead. The code example is wrong for the Windows console, where the (OEM) code page such as cp437 differs from the Windows ANSI code page such as cp1252. The code does not fix the UnicodeEncodeError: 'charmap' codec can't encode character error and may lead to mojibake, e.g., Ø§© is silently replaced with ╪º⌐.

Daira-Emma Hopwood · Accepted Answer · 2022-10-19 08:26:35Z

Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.

So, upgrade to recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and just dropping support for Python 2.x. Note that there has been no security support for any version of Python before 3.7 (including Python 2.7) since December 2021.

If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console , which is based on, and uses the same APIs as the code in the answer that was previously linked here. (That link does include some information on Windows font configuration but I doubt it still applies to Windows 8 or later.)

Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind-of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.

edited Oct 19, 2022 at 8:26

answered Jan 9, 2011 at 5:07

Daira-Emma Hopwood

2 Comments

jfs Over a year ago

The win-unicode-console Python package (based on your code) lets you avoid modifying your script if it prints Unicode directly: run it with the py -mrun your_script.py command.

Daira-Emma Hopwood Over a year ago

The answer has been updated. There is no need to include anything else from the previously linked answer.

Giampaolo Rodolà · Accepted Answer · 2012-05-19 18:48:28Z

If you're not interested in getting a reliable representation of the bad character(s), you might use something like this (works with Python >= 2.6, including 3.x):

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(u"\N{EM DASH}")

The bad character(s) in the string will be converted into a representation that is printable by the Windows console.
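As the comments point out, re-decoding UTF-8 bytes with the console's code page produces mojibake. A variant that stays within the question's request (print ? for whatever the stream can't show) is to encode and decode with the stream's own encoding and errors="replace". This is a Python 3 sketch, not the answer author's code; the file parameter is an addition so it can be tried against any text stream.

```python
import io
import sys

def safeprint(s, file=None):
    """Print s; characters the stream's encoding can't represent become '?'."""
    if file is None:
        file = sys.stdout
    try:
        print(s, file=file)
    except UnicodeEncodeError:
        encoding = getattr(file, "encoding", None) or "ascii"
        # Encode and decode with the SAME encoding, so only the bad characters change
        print(s.encode(encoding, errors="replace").decode(encoding), file=file)

# Try it against an in-memory stream that behaves like a cp437 console
out = io.TextIOWrapper(io.BytesIO(), encoding="cp437", newline="\n")
safeprint("caf\u00e9 \N{EM DASH} ok", file=out)
out.flush()
print(out.buffer.getvalue())  # b'caf\x82 ? ok\n'
```

Characters the code page does have (é here) survive, unlike with an ASCII-only replacement.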

answered May 19, 2012 at 18:48

Giampaolo Rodolà

3 Comments

jfs Over a year ago

.encode('utf8').decode(sys.stdout.encoding) leads to mojibake e.g., u"\N{EM DASH}".encode('utf-8').decode('cp437') -> ΓÇö

CODE-REaD Over a year ago

Simply print(s.encode('utf-8')) may be a better way to avoid these errors. Instead, you get \xNN output for unprintable characters, which was enough for my diagnostic messages.

Martijn Pieters Over a year ago

This is enormously, spectacularly wrong. Encoding to UTF-8 and then decoding as an 8-bit charset will a) often fail, since not all code pages have characters for all 256 byte values, and b) always misinterpret the data, producing a Mojibake mess instead.

sorin · Accepted Answer · 2013-01-12 20:45:44Z

The below code will make Python output to console as UTF-8 even on Windows.

The console displays the characters well on Windows 7 but not on Windows XP; still, at least it works, and most importantly you get consistent output from your script on all platforms. You'll be able to redirect the output to a file.

The code below was tested with Python 2.6 on Windows.


#!/usr/bin/python
# -*- coding: UTF-8 -*-

import codecs, sys

reload(sys)
sys.setdefaultencoding('utf-8')

print sys.getdefaultencoding()

if sys.platform == 'win32':
    try:
        import win32console
    except ImportError:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        sys.exit(-1)
    # win32console implementation of SetConsoleCP does not return a value,
    # so verify the result with GetConsoleCP / GetConsoleOutputCP
    # CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if win32console.GetConsoleCP() != 65001:
        raise Exception("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if win32console.GetConsoleOutputCP() != 65001:
        raise Exception("Cannot set console output codepage to 65001 (UTF-8)")

sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"

edited Jan 12, 2013 at 20:45

answered Jan 6, 2010 at 13:38

sorin

4 Comments

endolith Over a year ago

Is there a way to avoid this by just using a different console?

0xC0000022L Over a year ago

@sorin: Why do you first import win32console outside a try and later you do it conditionally inside a try? Isn't that kind of pointless (the first import)

Jaykul Over a year ago

For what it's worth, the one provided by David-Sarah Hopwood works (I didn't get this one to even run because I haven't bothered installing the win32 extensions module)

Martijn Pieters Over a year ago

Don't change the system default encoding; fix your Unicode values instead. Changing the default encoding can break libraries that rely on the, you know, default behaviour. There is a reason you have to force a module reload before you can do this.

c97 · Accepted Answer · 2018-10-02 22:11:03Z

Just enter this at the command line before running the Python script:

chcp 65001 & set PYTHONIOENCODING=utf-8

answered Oct 2, 2018 at 22:11

c97

mike rodent · Accepted Answer · 2023-03-16 08:55:03Z

Like Giampaolo Rodolà's answer, but even dirtier: I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles.

For the moment I just wanted something that would mean my program would NOT CRASH, which I understood, and which didn't involve importing too many exotic modules (in particular I'm using Jython, so half the time a Python module turns out not to be available).

from __future__ import print_function  # needed for print(..., end='') on Jython / Python 2

def pr(s):
    try:
        print(s)
    except UnicodeEncodeError:
        for c in s:
            try:
                print(c, end='')
            except UnicodeEncodeError:
                print('?', end='')
                # if a logger is available (a proper one will handle any and all Unicode):
                # logger.error(f"encoding problem with character |{c}| in string |{s}|, ord(c) |{ord(c)}|, c.encode('utf-8') |{c.encode('utf-8')}|")
        print()  # end the line, as print(s) would have done

NB "pr" is shorter to type than "print" (and quite a bit shorter to type than "safeprint")...!
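The commented-out logging idea can also be done without looping over characters: a custom codec error handler registered with codecs.register_error() is called for each run of unencodable characters, so it can record them and substitute ?. The handler name "log_and_replace" and the problems list are made up for this sketch.

```python
import codecs

problems = []  # records (unencodable text, start index) for later logging

def log_and_replace(exc):
    """Codec error handler: remember the bad text and substitute '?' per character."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    problems.append((bad, exc.start))
    return "?" * len(bad), exc.end

codecs.register_error("log_and_replace", log_and_replace)

encoded = "a\N{EM DASH}b".encode("cp437", errors="log_and_replace")
print(encoded)          # b'a?b'
print(ascii(problems))  # [('\u2014', 1)]
```

On Python 3.7+, the same handler could be attached to the console stream with sys.stdout.reconfigure(errors="log_and_replace").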

Matthew Estock · Accepted Answer · 2022-11-22 19:00:52Z

TL;DR:

print(yourstring.encode('ascii','replace').decode('ascii'))

I ran into this myself, working on a Twitch chat (IRC) bot. (Python 2.7 latest)

I wanted to parse chat messages in order to respond...

msg = s.recv(1024).decode("utf-8")

but also print them safely to the console in a human-readable format:

print(msg.encode('ascii','replace').decode('ascii'))

This corrected the issue of the bot throwing UnicodeEncodeError: 'charmap' errors and replaced the non-ASCII characters with ?.

edited Nov 22, 2022 at 19:00

answered Jul 1, 2018 at 15:52

Matthew Estock

1 Comment

Wok Over a year ago

This is the solution I lean towards for my simple use case, where i) the erroneous characters don't matter, and ii) I would rather have them replaced by ? than mojibake. However, you need to call decode() after your call to encode(), hence my edit. Otherwise, you get bytes instead of str.

Kinjal Dixit · Accepted Answer · 2015-12-16 07:53:43Z

Kind of related to the answer by J. F. Sebastian, but more direct.

If you are having this problem when printing to the console/terminal, then do this:

>set PYTHONIOENCODING=UTF-8

answered Dec 16, 2015 at 7:53

Kinjal Dixit

1 Comment

jfs Over a year ago

set PYTHONIOENCODING=UTF-8 may lead to mojibake if the console uses say cp437. cp65001 has issues. To print Unicode to Windows console, use Unicode API (WriteConsoleW()) as suggested in my answer where PYTHONIOENCODING is used only to replace characters that can't be represented in the current OEM code page with ?. PYTHONIOENCODING can be used for output to a file.

J. Does · Accepted Answer · 2017-05-11 20:08:34Z

Python 3.6, Windows 7: there are several ways to launch Python. You could use the Python console (which has a Python logo on it) or the Windows console (which says cmd.exe on it).

I could not print UTF-8 characters in the Windows console. Printing them threw this error:

OSError: [WinError 87] The parameter is incorrect
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf8'>
OSError: [WinError 87] The parameter is incorrect

After trying and failing to understand the answer above, I discovered it was only a settings problem. Right-click the title bar of the cmd console window and, on the Font tab, choose Lucida Console.

answered May 11, 2017 at 20:08

J. Does

Akshay · Accepted Answer · 2018-01-17 05:07:50Z

For Python 2 try:

print unicode(string, 'unicode-escape')

For Python 3 try:

import os
string = "002 Could've Would've Should've"
os.system('echo ' + string)

Or try win-unicode-console:

pip install win-unicode-console
py -mrun your_script.py

edited Jan 17, 2018 at 5:07

Akshay

answered Aug 24, 2017 at 18:00

shubaly

Puddle · Accepted Answer · 2024-10-08 06:43:30Z

Here's a simple way to run the script as UTF-8 (I use Python 3.4.1):

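import os, sys, ctypes, subprocess

if sys.stdout.encoding.lower() not in ("cp65001", "utf-8", "utf8"):  # not UTF-8?
    if ctypes.windll.kernel32.SetConsoleOutputCP(65001):              # not IDLE? change code page to UTF-8
        # run this script again with the same interpreter, now under UTF-8, then
        # exit the original instance (cp850 / cp1252) with the child's exit code
        sys.exit(subprocess.call([sys.executable, __file__] + sys.argv[1:]))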

print("ö ☺ ☻ – ‖")
input("Press enter to close...")

If launched in IDLE, then SetConsoleOutputCP returns 0 and we have no encoding issue to solve.
If launched directly, then it sets UTF-8 and runs a new instance of the script.
(so the initial launch acts as a UTF-8 launcher for itself)

You can also use os.system("chcp 65001 >nul") to set UTF-8 (>nul suppresses the output).
But to also support IDLE, you need to detect IDLE and skip that call, or it will open a console window.

You can also use SetConsoleTitleW() to change the console title (e.g., to replace "C:\Windows\py.exe"):

ctypes.windll.kernel32.SetConsoleTitleW(os.path.basename(__file__)) # "test.py"
os.system('python "%s"'%__file__) # you can set the title before running the script

Csa77 · Accepted Answer · 2020-06-21 17:35:04Z

The cause of your problem is NOT that the Windows console is unwilling to accept Unicode (it has done so by default since, I guess, Windows 2000). It is the default system encoding. Try this code and see what it gives you:

import sys
sys.getdefaultencoding()

If it says ascii, there's your cause ;-) You have to create a file called sitecustomize.py and put it on the Python path (I put it under /usr/lib/python2.5/site-packages, but that is different on Windows: it is c:\python\lib\site-packages or something similar), with the following contents:

import sys
sys.setdefaultencoding('utf-8')

and perhaps you might want to specify the encoding in your files as well:

# -*- coding: UTF-8 -*-
import sys,time

Edit: more info can be found in the excellent Dive into Python book.

edited Jun 21, 2020 at 17:35

Csa77

answered Aug 11, 2008 at 17:58

Bartosz Radaczyński

6 Comments

Jon Cage Over a year ago

setdefaultencoding() is no longer in sys (as of v2.0 according to the module docs).

Bartosz Radaczyński Over a year ago

I cannot prove it right now, but I know that I've used this trick on a later version - 2.5 on Windows.

Bartosz Radaczyński Over a year ago

OK, after quite a while I have found out that: "This function is only intended to be used by the site module implementation and, where needed, by sitecustomize. Once used by the site module, it is removed from the sys module’s namespace."

Bartosz Radaczyński Over a year ago

Actually, you can set the Windows console to UTF-8: run chcp 65001 and it will be Unicode.

Martijn Pieters Over a year ago

To make it absolutely clear: it is a very bad idea to change the default encoding. This is akin to splinting your broken leg and walking on as if nothing happened, rather than having a doctor set the bone properly. All code handling Unicode text should do so consistently instead of relying on implicit encoding / decoding.

Wok · Accepted Answer · 2022-11-20 15:18:19Z

Nowadays, the Windows console does not encounter this error, unless you redirect the output.

Here is an example Python script scratch_1.py:

s = "∞"

print(s)

If you run the script as follows, everything works as intended:

python scratch_1.py

∞

However, if you run the following, then you get the same error as in the question:

python scratch_1.py > temp.txt

Traceback (most recent call last):
  File "C:\Users\Wok\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\scratch_1.py", line 3, in <module>
    print(s)
  File "C:\Users\Wok\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u221e' in position 0: character maps to <undefined>

To solve this issue with the suggestion present in the original question, i.e. by replacing the erroneous characters with question marks ?, one can proceed as follows:

s = "∞"

try:
    print(s)
except UnicodeEncodeError:
    output_str = s.encode("ascii", errors="replace").decode("ascii")

    print(output_str)

It is important:

to call decode(), so that the type of the output is str instead of bytes,
with the same encoding, here "ascii", to avoid the creation of mojibake.
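Besides replacing characters, another option when output is redirected is to write it as UTF-8, e.g. with python -X utf8 (Python 3.7+) or PYTHONIOENCODING=utf-8, or from inside the script as sketched below. The helper name use_utf8_when_redirected is hypothetical, the sketch needs Python 3.7+ for reconfigure(), and an in-memory cp1252 stream stands in for the redirected sys.stdout. Whatever reads the file afterwards must then decode it as UTF-8.

```python
import io

def use_utf8_when_redirected(stream):
    """Switch a text stream to UTF-8 if it is not attached to a terminal."""
    if not stream.isatty():
        stream.reconfigure(encoding="utf-8")

# Stand-in for sys.stdout redirected to a file with the ANSI code page
redirected = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
use_utf8_when_redirected(redirected)
redirected.write("\u221e")
redirected.flush()
print(redirected.buffer.getvalue())  # b'\xe2\x88\x9e'

# In a real script: use_utf8_when_redirected(sys.stdout)
```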

answered Nov 20, 2022 at 15:18

Wok

1 Comment

Karl Knechtel Over a year ago

This is caused by an unrelated issue. Any file on any system can use any text encoding, but not all text encodings can represent all characters. The default is platform dependent, and is not necessarily anything to do with the terminal output. The canonical question for this problem is What encoding does open() use by default?.

CODE-REaD · Accepted Answer · 2016-05-24 16:19:05Z

James Sulak asked,

Is there any way I can make Python automatically print a ? instead of failing in this situation?

Other solutions recommend we attempt to modify the Windows environment or replace Python's print() function. The answer below comes closer to fulfilling Sulak's request.

Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:

In place of: print(text)
substitute: print(str(text).encode('utf-8'))

Instead of throwing an exception, Python now displays unprintable Unicode characters as \xNN hex codes, e.g.:

Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir

Instead of

Halmalo n’était plus qu’un point noir

Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.

Note: The str() call above is needed because otherwise encode() causes Python to reject a Unicode character as a tuple of numbers.
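If escape sequences are acceptable, as in these diagnostic messages, the backslashreplace error handler is a possible alternative (not this answer's method): the result stays a str with no b'...' wrapper, and it shows one escape per code point rather than one per UTF-8 byte. A sketch using the example sentence above:

```python
text = "Halmalo n\u2019\u00e9tait plus qu\u2019un point noir"
escaped = text.encode("ascii", errors="backslashreplace").decode("ascii")
print(escaped)  # Halmalo n\u2019\xe9tait plus qu\u2019un point noir
```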

Itachi · Accepted Answer · 2021-07-24 07:16:22Z

The issue is that the Windows default encoding is set to cp1252 and needs to be set to UTF-8 (check the PEP).

Check default encoding using:

import locale 
locale.getpreferredencoding()

You can override the locale settings:

import os
if os.name == "nt":
    import _locale
    _locale._gdl_bak = _locale._getdefaultlocale
    _locale._getdefaultlocale = (lambda *args: (_locale._gdl_bak()[0], 'utf8'))

Code referenced from a Stack Overflow link.
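Before patching private modules such as _locale, it can help to see which encodings are actually in play. The hypothetical helper below only collects values from public APIs. sys.flags.utf8_mode requires Python 3.7+ and reflects UTF-8 Mode (PEP 540), which can be switched on with python -X utf8 or PYTHONUTF8=1 as a supported alternative to overriding the locale.

```python
import locale
import sys

def encoding_report():
    """Collect the encodings that affect printing and file I/O (Python 3.7+)."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),    # None if stdout was replaced
        "stdout_errors": getattr(sys.stdout, "errors", None),
        "preferred": locale.getpreferredencoding(False),    # locale encoding used by open() by default
        "filesystem": sys.getfilesystemencoding(),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }

for key, value in encoding_report().items():
    print(f"{key:14} {value}")
```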

answered Jul 24, 2021 at 7:16

Itachi
