
I want to check whether a string contains only A-Z, a-z, 0-9, underscore and dash (_ -).

Any other special characters, such as !"#\%, should not be allowed.

How can I write the regular expression?

And should I use match or ?

My strings look like these: QOIWU_W QWLJ2-1

5 Answers


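Yes, re.match seems like a good match (pardon the pun). As for the regular expression, how about something like this: '^[A-Za-z0-9_-]*$'? The trailing $ matters: re.match only anchors at the start of the string, so without it the pattern would match a prefix of any string. Putting the - last in the character class also makes it unambiguously a literal dash rather than part of a range.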
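A minimal sketch of why the anchoring matters, assuming Python 3.4 or later (where re.fullmatch is available). The helper name is_valid is hypothetical, not part of the original answer.

```python
import re

# Allowed: ASCII letters, digits, underscore and dash.
# Putting '-' last in the class makes it a literal dash, not a range.
ALLOWED = re.compile(r'[A-Za-z0-9_-]*')

def is_valid(s):
    # fullmatch succeeds only if the whole string matches the pattern
    return ALLOWED.fullmatch(s) is not None

# match only anchors at the start, so the unanchored pattern
# "matches" the prefix 'bad' even though '!' is not allowed.
print(bool(ALLOWED.match('bad!')))  # True (misleading)
print(is_valid('bad!'))             # False
print(is_valid('QWLJ2-1'))          # True

# $ also matches just before a trailing newline; fullmatch does not.
print(bool(re.match(r'^[A-Za-z0-9_-]*$', 'abc\n')))  # True
print(is_valid('abc\n'))                             # False
```

Because the pattern uses *, the empty string is accepted; use + instead if empty input should be rejected. The class also excludes the space, so 'QOIWU_W QWLJ2-1' fails unless a space is added.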


1 Comment

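You can also use [\w-] instead of [A-Za-z0-9-_], though in Python 3 you should pass re.ASCII, because \w otherwise also matches non-ASCII letters and digits (such as é).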
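A short sketch of the [\w-] variant, assuming Python 3, where \w in a str pattern is Unicode-aware by default and re.ASCII restricts it to [a-zA-Z0-9_]. The variable names are illustrative only.

```python
import re

# Without a flag, \w accepts any Unicode word character (e.g. 'é').
unicode_pat = re.compile(r'[\w-]+')
# With re.ASCII, \w is equivalent to [a-zA-Z0-9_].
ascii_pat = re.compile(r'[\w-]+', re.ASCII)

print(bool(unicode_pat.fullmatch('café-1')))  # True
print(bool(ascii_pat.fullmatch('café-1')))    # False
print(bool(ascii_pat.fullmatch('QWLJ2-1')))   # True
```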

Using re does no harm, but out of scientific curiosity, another approach that avoids re entirely is to use sets:

>>> valid = set('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_ ')
>>> def test(s):
...    return set(s).issubset(valid)
... 
>>> test('ThiS iS 4n example_sentence that should-pass')
True
>>> test('ThiS iS 4n example_sentence that should fail!!')
False

For conciseness, the testing function could also be written:

>>> def test(s):
...    return set(s) <= valid

EDIT: Some timings, out of curiosity (T is presumably timeit.Timer; times are in seconds, and repeat() runs three rounds for each implementation):

>>> T(lambda : re.match(r'^[a-zA-Z0-9-_]*$', s)).repeat()
[1.8856699466705322, 1.8666279315948486, 1.8670001029968262]
>>> T(lambda : set(y) <= valid).repeat()
[3.595816135406494, 3.568570852279663, 3.564558982849121]
>>> T(lambda : all([c in valid for c in y])).repeat()
[6.224508047103882, 6.2116711139678955, 6.209425926208496]
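The session above does not show how T, s or y were defined. Below is a self-contained sketch of a comparable benchmark, assuming Python 3.6+ and calling timeit.repeat directly; the sample string is hypothetical, and the numbers it prints depend on the machine and Python version, so none are claimed here.

```python
import re
import string
import timeit

# Hypothetical sample input; the original post does not show its test strings.
sample = 'QOIWU_W QWLJ2-1' * 10

# Same allowed set as the set-based answer above (it includes the space).
valid = frozenset(string.ascii_letters + string.digits + '_- ')
pattern = re.compile(r'[A-Za-z0-9_ -]*')

candidates = {
    'regex fullmatch': lambda: pattern.fullmatch(sample) is not None,
    'set subset': lambda: set(sample) <= valid,
    'all() generator': lambda: all(c in valid for c in sample),
}

for name, fn in candidates.items():
    # Best of 3 rounds of 20,000 calls each; lower is faster.
    best = min(timeit.repeat(fn, number=20_000, repeat=3))
    print(f'{name}: {best:.3f} s')
```

Taking the minimum of the rounds is the usual way to reduce noise from other processes; the generator passed to all() also stops at the first invalid character, unlike the list comprehension in the session above.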

2 Comments

You don't need the list calls to get the sets of characters.
@MichaelJ.Barber - Thank you, fixed (and it took off 1 sec from the timings...)

You can use the regular expression module.

import re
if re.match(r'^[a-zA-Z0-9_-]*$', testString):
    # successful match
    pass

3 Comments

What kind of Python has that syntax?
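@Oliver Thank you, but I guess ^ and $ are required in PHP, not in Python.
@manxing Not so. ^ and $ mark the start and end of the string. With re.match the ^ is redundant, since match already anchors at the start, but the $ is still needed; without it the pattern matches a prefix of any string.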

No need for a regexp.

import string

# build a string containing all valid characters
match=string.ascii_letters + string.digits + '_' + '-' + ' '

In [25]: match
Out[25]: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_- '

test='QOIWU_W QWLJ2-'

In [22]: all([c in match for c in test])
Out[22]: True

In [23]: test2='abc ;'

In [24]: all([c in match for c in test2])
Out[24]: False

1 Comment

The time for `in` is linear in the length of the string being searched, so it wasn't a major surprise. Thanks for the benchmark, though!
import re
re.search(r'[^a-zA-Z0-9_-]', your_string) is None

re.search() returns a match object if it finds any character outside the allowed set (letters, digits, underscore and dash) and None otherwise, so the expression is True exactly when the string contains only allowed characters.
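A minimal sketch of the same idea with a precompiled pattern, assuming the allowed set is ASCII letters, digits, underscore and dash (no space). is_safe is a hypothetical helper name.

```python
import re

# Matches any single character that is NOT allowed.
FORBIDDEN = re.compile(r'[^A-Za-z0-9_-]')

def is_safe(s):
    # search scans the whole string; None means no forbidden character exists
    return FORBIDDEN.search(s) is None

print(is_safe('QWLJ2-1'))   # True
print(is_safe('QOIWU_W!'))  # False
```

A single-character class is enough here: search reports the first forbidden character either way, so a trailing + adds nothing. Note that the empty string counts as safe.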
