2

I have a string and want to check whether it can be used as a valid variable name without causing a syntax error. For example:

def variableName(string):
    #if string is valid variable name:
        #return True
    #else:
        #return False

input >>> variableName("validVariable")
output >>> True
input >>> variableName("992variable")
output >>> False

I would rather not use str.isidentifier(). I want to write a function of my own.

3 Answers

7

The regular expression at the end of this answer is correct only for "old-style" (Python 2.7, ASCII-only) identifiers. In Python 3, the built-in check is:

"validVariable".isidentifier()
#True
"992variable".isidentifier()
#False

Since you changed your question after I posted the answer, consider writing a regular expression:

import re
re.match(r"[_a-z]\w*$", yourstring, flags=re.I)
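
A minimal sketch of how that pattern could be wrapped in the function from the question, assuming only ASCII identifiers matter. The use of re.fullmatch and the re.ASCII flag are additions for this sketch, not part of the one-liner above: fullmatch avoids "$" also matching just before a trailing newline, and re.ASCII keeps \w from matching non-ASCII word characters in Python 3. The sketch does not reject keywords such as def.

```python
import re

# ASCII-only identifier pattern: a letter or underscore, then letters,
# digits, or underscores. re.ASCII restricts \w to [a-zA-Z0-9_].
_ASCII_IDENTIFIER = re.compile(r"[_a-z]\w*", flags=re.I | re.ASCII)


def variableName(string):
    """Return True if string looks like an ASCII-only identifier."""
    # fullmatch requires the whole string to match, so a trailing
    # newline is rejected (unlike match with "$").
    return _ASCII_IDENTIFIER.fullmatch(string) is not None


print(variableName("validVariable"))  # True
print(variableName("992variable"))    # False
```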

9 Comments

OP says: "I would not like to use the .isidentifier(). I want to make a function of my own." So your solution isn't answering the question, I think. Forgive me if I'm wrong.
Yes. @BOi is correct. Your answer does not answer my question. I do not want to use .isidentifier(); I want to create my own function.
@RadheKrishna You changed your question after I posted my answer. Consider using regular expressions, then. (I modified the answer.)
You can simplify the second part of your regular expression to \w, which matches letters, digits, and _.
@DyZ >>> Ä = 1 >>> print(Ä) 1 (Python 3 extends identifiers to many non-ASCII characters)
5

In Python 3, a valid identifier can contain characters outside the ASCII range. Since you don't want to use str.isidentifier, you can write your own version of it in Python.

Its specification can be found here: https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes

Implementation:

import keyword
import re
import unicodedata


def is_other_id_start(char):
    """
    Item belongs to Other_ID_Start in
    http://unicode.org/Public/UNIDATA/PropList.txt
    """
    return bool(re.match(r'[\u1885-\u1886\u2118\u212E\u309B-\u309C]', char))


def is_other_id_continue(char):
    """
    Item belongs to Other_ID_Continue in
    http://unicode.org/Public/UNIDATA/PropList.txt
    """
    return bool(re.match(r'[\u00B7\u0387\u1369-\u1371\u19DA]', char))


def is_xid_start(char):
    # ID_Start is defined as all characters having one of
    # the general categories uppercase letters(Lu), lowercase
    # letters(Ll), titlecase letters(Lt), modifier letters(Lm),
    # other letters(Lo), letter numbers(Nl), the underscore, and
    # characters carrying the Other_ID_Start property. XID_Start
    # then closes this set under normalization, by removing all
    # characters whose NFKC normalization is not of the form
    # ID_Start ID_Continue * anymore.

    category = unicodedata.category(char)
    return (
        category in {'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'} or
        is_other_id_start(char)
    )


def is_xid_continue(char):
    # ID_Continue is defined as all characters in ID_Start, plus
    # nonspacing marks (Mn), spacing combining marks (Mc), decimal
    # number (Nd), connector punctuations (Pc), and characters
    # carrying the Other_ID_Continue property. Again, XID_Continue
    # closes this set under NFKC-normalization; it also adds U+00B7
    # to support Catalan.

    category = unicodedata.category(char)
    return (
        is_xid_start(char) or
        category in {'Mn', 'Mc', 'Nd', 'Pc'} or
        is_other_id_continue(char)
    )


def is_valid_identifier(name):
    # All identifiers are converted into the normal form NFKC
    # while parsing; comparison of identifiers is based on NFKC.
    name = unicodedata.normalize(
        'NFKC', name
    )

    # check if it's a keyword
    if keyword.iskeyword(name):
        return False

    # The identifier syntax is <XID_Start> <XID_Continue>*.
    # An empty string is not an identifier (and name[0] would raise).
    if not name or not (is_xid_start(name[0]) or name[0] == '_'):
        return False

    return all(is_xid_continue(char) for char in name[1:])

if __name__ == '__main__':
    # From goo.gl/pvpYg6
    assert is_valid_identifier("a") is True
    assert is_valid_identifier("Z") is True
    assert is_valid_identifier("_") is True
    assert is_valid_identifier("b0") is True
    assert is_valid_identifier("bc") is True
    assert is_valid_identifier("b_") is True
    assert is_valid_identifier("µ") is True
    assert is_valid_identifier("𝔘𝔫𝔦𝔠𝔬𝔡𝔢") is True

    assert is_valid_identifier(" ") is False
    assert is_valid_identifier("[") is False
    assert is_valid_identifier("©") is False
    assert is_valid_identifier("0") is False

You can also consult the CPython and PyPy implementations for reference.
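
To make the NFKC step more concrete, here is a small sketch of what normalization does to two of the test strings above. It only uses unicodedata.normalize from the standard library. The variable names are illustrative.

```python
import unicodedata

# The micro sign (U+00B5) normalizes to Greek small letter mu (U+03BC),
# which has general category Ll and is therefore a valid start character.
micro = "\u00b5"
normalized_micro = unicodedata.normalize("NFKC", micro)
print(hex(ord(normalized_micro)))              # 0x3bc
print(unicodedata.category(normalized_micro))  # Ll

# Mathematical Fraktur letters normalize to plain ASCII letters.
fraktur = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢"
print(unicodedata.normalize("NFKC", fraktur))  # Unicode
```

This is why the identifier check runs on the normalized name rather than on the raw input.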


0

You could use a regular expression.

For example:

isValidIdentifier = re.match("[A-Za-z_](0-9A-Za-z_)*",identifier)

Note that this only checks for ASCII alphanumeric characters and the underscore. The actual standard supports other characters. See here: https://www.python.org/dev/peps/pep-3131/

You may also need to exclude reserved words such as def, True, False, ... see here: https://www.programiz.com/python-programming/keywords-identifier
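
One way to exclude reserved words, sketched with the standard library's keyword module. The helper name is_reserved_word is made up for this example. The exact keyword list depends on the Python version you run.

```python
import keyword


def is_reserved_word(name):
    """Return True if name is a keyword in the running Python version."""
    return keyword.iskeyword(name)


print(is_reserved_word("def"))    # True
print(is_reserved_word("True"))   # True
print(is_reserved_word("value"))  # False
```

The full list for the running interpreter is available as keyword.kwlist.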

2 Comments

Your regular expression is malformed. I think you meant [0-9A-Za-z_] instead of (0-9A-Za-z_).
Strangely enough it works in IDLE (MacOS). Anyhow DyZ had already provided that answer (with a proper regexp). I didn't notice it before posting.
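
A likely reason the malformed pattern appears to work, shown as a small sketch: re.match only needs a prefix of the string to match, and the parenthesized group (which matches the literal text "0-9A-Za-z_") may repeat zero times. So any string starting with a letter or underscore yields a match, whatever follows. The names malformed and corrected are illustrative.

```python
import re

malformed = re.compile("[A-Za-z_](0-9A-Za-z_)*")
corrected = re.compile("[A-Za-z_][0-9A-Za-z_]*")

# Only the first character is consumed; the group repeats zero times.
print(malformed.match("abc1").group(0))      # a
# An invalid name still produces a (truthy) match object.
print(malformed.match("a-b!") is not None)   # True
# A leading digit is rejected, so simple tests seem to pass.
print(malformed.match("992variable"))        # None

# With a proper character class and fullmatch, the whole string is checked.
print(corrected.fullmatch("abc1") is not None)  # True
print(corrected.fullmatch("a-b!") is not None)  # False
```

Even with the corrected character class, re.match alone would still accept "a-b!" because it stops after the first character; anchoring the pattern or using re.fullmatch avoids that.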
