I want to use a larger variety of Unicode symbols for variable names in my Python 3 scripts. What characters are acceptable to use in Python 3 variable names?

I recently started using Unicode symbols (such as Greek and Asian symbols) for code obfuscation.

  • Just out of curiosity, why? Is 元亀 better than genki as a variable name? Commented Jun 11, 2013 at 12:22
  • I could then easily distinguish variables. Commented Jun 11, 2013 at 12:26
  • That sounds like something you should cover with naming conventions, unless you can guarantee you'll never have a maintainer or contributor who doesn't understand one of the languages you use. Commented Jun 11, 2013 at 12:32
  • I know that using odd symbols is not customary, but if we keep programming traditionally, then we keep getting traditional programs. We need to think outside the box. Commented Jun 11, 2013 at 12:50
  • @DevynCollierJohnson There are other ways to break traditions which don't affect readability. Commented Jun 25, 2013 at 7:39

2 Answers

According to PEP 3131, the first character of an identifier needs to belong to ID_Start, the rest to ID_Continue, defined as follows:

ID_Start is defined as all characters having one of the general categories uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), the underscore, and characters carrying the Other_ID_Start property. XID_Start then closes this set under normalization, by removing all characters whose NFKC normalization is not of the form ID_Start ID_Continue* anymore.

ID_Continue is defined as all characters in ID_Start, plus nonspacing marks (Mn), spacing combining marks (Mc), decimal number (Nd), connector punctuations (Pc), and characters carrying the Other_ID_Continue property. Again, XID_Continue closes this set under NFKC-normalization; it also adds U+00B7 to support Catalan.
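A quick way to explore these rules is to ask Python itself. The sketch below is only an illustration, not part of the PEP: the helper name `describe` and the sample characters are my own choices. It uses `str.isidentifier()` to test whether a character may start or continue an identifier, `unicodedata.category()` to show its general category, and `exec` to show that the parser NFKC-normalizes identifiers.

```python
import unicodedata


def describe(ch):
    """Return (general category, may start an identifier, may continue one)."""
    category = unicodedata.category(ch)
    can_start = ch.isidentifier()
    # Prefix a known-good start character to test the "continue" position.
    can_continue = ("a" + ch).isidentifier()
    return category, can_start, can_continue


# Sample characters chosen for illustration: letters, a digit, U+00B7,
# an emoji, and an operator symbol.
for ch in ["λ", "元", "_", "5", "\u00b7", "🍉", "%"]:
    print(repr(ch), describe(ch))

# Identifiers are NFKC-normalized while parsing, so a name written with the
# "fi" ligature (U+FB01) ends up stored under the plain ASCII spelling.
ligature_name = "\ufb01le"
normalized = unicodedata.normalize("NFKC", ligature_name)
namespace = {}
exec(ligature_name + " = 1", namespace)
print(normalized, "file" in namespace)
```

Letters such as `λ` and `元` pass both checks, a digit such as `5` is only accepted after the first position, and symbols such as `🍉` or `%` fall in categories that neither set includes.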

That's a long list (currently around 120,000 characters). Fortunately, there is a helpful project on GitHub that contains the list and a script to generate it.
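If you would rather build the list locally, here is a minimal sketch that does not rely on that project. It assumes that `str.isidentifier()` reflects the rules of the interpreter you run it on; the function name `identifier_chars` is hypothetical. The exact counts depend on the Unicode version bundled with your Python, so none are quoted here.

```python
import sys


def identifier_chars():
    """Split all code points into identifier-start and continue-only characters."""
    start, continue_only = [], []
    for code_point in range(sys.maxunicode + 1):
        ch = chr(code_point)
        if ch.isidentifier():
            # Valid as the first character (and therefore anywhere).
            start.append(ch)
        elif ("a" + ch).isidentifier():
            # Valid only after the first character, e.g. digits.
            continue_only.append(ch)
    return start, continue_only


start, continue_only = identifier_chars()
print(len(start), "characters may start an identifier")
print(len(continue_only), "more characters may appear after the first one")
```

The loop visits every code point once, so it takes a moment to run, but it always matches the interpreter you are actually using.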


10 Comments

Where can I find the list of symbols that match \w?
Why are useful characters, like 🍉 (watermelon), not included?
It is really frustrating that we can use Glagolitic characters or Viking runes to start our variable names, yet we cannot use pretty common symbols that you can type on most mobile devices and input on most computers. I get that we shouldn't start variable names with numbers or math symbols with special meaning, but I think emojis would make damn good variable names in plenty of cases.
@VickiB: That's impossible because % is the modulo operator and thus can't be part of a variable name, just like you can't use + or -.

Cyrillic letters are allowed, but I don't know whether they would work on every machine.

I wrote a short script to demonstrate Unicode support for Cyrillic.

If it prints "Всем привет!" to the console, then your computer supports Cyrillic identifiers.

# Cyrillic test

# This program tests whether your computer
# handles Cyrillic script correctly


привет = "Всем привет!"

def скажи_привет(мой_привет):
    print(мой_привет)

скажи_привет(привет)
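As far as I understand it, whether the names are accepted is decided by the language, not by the machine; what can differ between machines is whether the console is able to display the output. The sketch below separates the two checks. The helper `console_can_show` is a hypothetical name, and falling back to UTF-8 when no encoding is reported is an assumption of mine.

```python
import sys

# The identifiers used in the script above.
names = ["привет", "скажи_привет", "мой_привет"]
all_valid = all(name.isidentifier() for name in names)


def console_can_show(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Assumption: fall back to UTF-8 if stdout does not report an encoding.
encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print("identifiers valid:", all_valid)
print("console encoding:", encoding)
print("can encode greeting:", console_can_show("Всем привет!", encoding))
```

If the last line reports `False`, the identifiers are still fine; only the `print` call would fail on that console.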

1 Comment

Why wouldn’t it work? See the accepted answer from 2013: stackoverflow.com/a/17043983/735926
