On 64-bit Debian Linux 6:

Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxint
9223372036854775807
>>> sys.maxunicode
1114111

On 64-bit Windows 7:

Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxint
2147483647
>>> sys.maxunicode
65535

Both operating systems are 64-bit, and both have sys.maxunicode. According to Wikipedia, there are 1,114,112 code points in Unicode. Is sys.maxunicode on Windows wrong?

And why do they have different sys.maxint?

  • On my 32-bit machines: when using Linux, both Python 2 and Python 3 return 1114111 for sys.maxunicode; using Windows I get 65535 for sys.maxunicode. Commented Nov 17, 2011 at 9:34
  • Also, sys.maxint has disappeared from Python 3... Commented Nov 17, 2011 at 9:37
  • "Why" questions are not really well suited for StackOverflow, but perhaps @RaymondHettinger can shed some light on this. Commented Nov 17, 2011 at 10:32

2 Answers

I don't know what your question is, but sys.maxunicode is not wrong on Windows.

See the docs:

sys.maxunicode

An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode characters are stored as UCS-2 or UCS-4.

Python on Windows uses UCS-2, so the largest code point is 65,535 (supplementary-plane characters are encoded as two 16-bit code units, a "surrogate pair").
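The surrogate-pair arithmetic can be sketched in a few lines. This is a minimal illustration, not the interpreter's internals; on a wide build (or any Python 3) sys.maxunicode is always 1114111, but the same pairing is visible when encoding to UTF-16:

```python
# Surrogate-pair arithmetic as defined by the Unicode standard,
# using U+10120 (a supplementary-plane character) as an example.
cp = 0x10120

offset = cp - 0x10000            # 20-bit offset into the supplementary planes
high = 0xD800 + (offset >> 10)   # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate
print(hex(high), hex(low))       # -> 0xd800 0xdd20

# The same pair appears when encoding to UTF-16:
# one character becomes two 16-bit code units (4 bytes).
units = '\U00010120'.encode('utf-16-be')
print(len(units))                # -> 4
```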

About sys.maxint: it marks the point at which Python 2 switches from plain integers (123) to long integers (12345678987654321L). Python for Windows evidently uses a 32-bit C long, while Python for Linux uses a 64-bit one. In Python 3 this has become irrelevant, because the plain and long integer types were merged into a single int type; accordingly, sys.maxint is gone from Python 3.
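The two boundaries can be checked directly. A hedged sketch: in Python 3 the closest analogue is sys.maxsize (the largest container index), and int itself has no upper bound:

```python
import sys

# The sys.maxint values shown above are the largest signed C long:
# 2**31 - 1 with a 32-bit long, 2**63 - 1 with a 64-bit long.
print(2**31 - 1)   # -> 2147483647          (the Windows build above)
print(2**63 - 1)   # -> 9223372036854775807 (the Linux build above)

# Python 3 ints are unbounded: arithmetic silently exceeds any word size.
big = 2**100
print(big > sys.maxsize)   # -> True
```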


8 Comments

I would also add that sys.maxunicode has no relation whatsoever to sys.maxint.
As I understand it, "surrogate pairs" apply only to UTF-16; UCS-2 is simply incapable of representing characters past 65535.
@TimPietzcker: I would like to add a pointer to the documentation about supplementary character planes: "Any Unicode character can be encoded [with \Uxxxxxxxx], but characters outside the Basic Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is compiled to use 16-bit code units (the default). Individual code units which form parts of a surrogate pair can be encoded using this escape sequence." (docs.python.org/reference/lexical_analysis.html#string-literals).
@KeithThompson: it looks like Python can encode characters outside of the Basic Multilingual Plane (BMP) even when it has sys.maxunicode==65535: print repr(u"\U00010120") correctly returns the original input string representation. So it looks like Python is using UCS-2 internally, with a convention that allows it to represent characters outside of the BMP. In fact, if you look at the internal representation with u"\U00010120".encode('unicode_internal').encode('hex'), you see that Python uses the special code unit 0xd800, which lies in the surrogate range (U+D800–U+DFFF) and is guaranteed never to be assigned to a character.
Is UCS-2 "with a convention that allows it to represent characters outside the BMP" just a way to describe UTF-16, or does Python's convention differ from UTF-16?

Regarding the difference in sys.maxint, see What is the bit size of long on 64-bit Windows?. Python 2.x uses the C long type internally to store a small integer.
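You can inspect the platform's C long width with ctypes. A small sketch: the result is 4 bytes on 64-bit Windows (LLP64) and typically 8 on 64-bit Linux (LP64), which is exactly why Python 2's sys.maxint differed between them:

```python
from ctypes import c_long, sizeof

# Width of the platform's C long, which backed Python 2's plain int type.
bits = sizeof(c_long) * 8 - 1   # one bit is reserved for the sign
print(2**bits - 1)              # matches sys.maxint on that platform's Python 2
```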

