How can I write a regex that matches only letters?
24 Answers
Use a character set: [a-zA-Z] matches one letter from A–Z in lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist of one or more letters and nothing else (^ and $ mark the beginning and end of the string respectively).
If you want to match letters other than A–Z, you can either add them to the character set, as in [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode character property class \p{L}, which describes the Unicode characters that are letters.
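A minimal Python sketch of the whole-string check described above, assuming the standard `re` module; the helper name `only_ascii_letters` is made up for illustration. It uses `re.fullmatch` rather than `^...$` because in Python `$` also matches just before a trailing newline.

```python
import re

# Hypothetical helper: True only if the whole string is one or more ASCII letters.
def only_ascii_letters(text: str) -> bool:
    # fullmatch anchors at both ends, so no ^ or $ is needed
    return re.fullmatch(r"[a-zA-Z]+", text) is not None

print(only_ascii_letters("Hello"))    # True
print(only_ascii_letters("Hello1"))   # False
print(only_ascii_letters(""))         # False
# With ^...$ a trailing newline slips through, because $ matches before it
print(bool(re.match(r"^[a-zA-Z]+$", "Hello\n")))  # True
print(only_ascii_letters("Hello\n"))  # False
```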
14 Comments
\p{L} matches anything that is a Unicode letter if you're interested in alphabets beyond the Latin one
8 Comments
\p as "Printable character".
\p{L}\p{M}*+ to cover letters made up of multiple codepoints, e.g. a letter followed by accent marks. As per regular-expressions.info/unicode.html
u after regex to detect the unicode group: /\p{Letter}/gu
Depending on your meaning of "character":
[A-Za-z] - all letters (uppercase and lowercase)
[^0-9] - all non-digit characters
6 Comments
[A-Za-z] are letters then the language that's being used must be specified
The closest option available is
[\u\l]+
which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use
[a-zA-Z]+
as other users suggest
2 Comments
You would use
/[a-z]/gi
- [] matches any one character from the set given inside the brackets
- a-z covers the entire (ASCII) alphabet
- g searches globally throughout the whole string
- i ignores case, so both upper and lowercase letters match
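For comparison, a rough Python equivalent of /[a-z]/gi, assuming the standard `re` module (`re.findall` plays the role of g, `re.IGNORECASE` the role of i). One caveat noted in the Python documentation: on Unicode strings, [a-z] combined with `IGNORECASE` also matches a few non-ASCII letters such as the Kelvin sign (U+212A); adding `re.ASCII` restricts it to the 52 ASCII letters. The sample text is made up.

```python
import re

text = "Ab1 \u212a"  # \u212a is the Kelvin sign, which lowercases to "k"

# findall returns every match, like the g flag; IGNORECASE is like the i flag
unicode_hits = re.findall(r"[a-z]", text, re.IGNORECASE)
ascii_hits = re.findall(r"[a-z]", text, re.IGNORECASE | re.ASCII)

print(unicode_hits)  # three items: 'A', 'b' and the Kelvin sign U+212A
print(ascii_hits)    # ['A', 'b']
```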
1 Comment
In Python, I have found the following to work:
[^\W\d_]
This works because we are creating a new character class (the []) which excludes (^) any character from the class \W (everything NOT in [a-zA-Z0-9_]), also excludes any digit (\d) and also excludes the underscore (_).
That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ bits. You might ask, wouldn't it just be easier to write [a-zA-Z] then, instead of [^\W\d_]? You would be correct if dealing only with ASCII text, but when dealing with Unicode text:
\W
Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].
(quoted from the Python re module documentation)
That is, we are taking everything considered to be a word character in Unicode, removing everything considered to be a digit character in Unicode, and also removing the underscore.
For example, the following code snippet
import re
regex = r"[^\W\d_]"  # raw string, so \W and \d reach the regex engine intact
test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
re.findall(regex, test_string)
Returns
['A', 'B', 's', 'f', 'a']
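To see the difference from [a-zA-Z], here is a small sketch on non-ASCII input; the sample string is made up for illustration. With the default Unicode matching the class picks up accented and CJK letters, while `re.ASCII` reduces it to A-Z and a-z.

```python
import re

letters = r"[^\W\d_]+"
# \u00e9 and \u00ef are the precomposed forms of é and ï
sample = "caf\u00e9 na\u00efve 123 汉字 snake_case"

unicode_runs = re.findall(letters, sample)
ascii_runs = re.findall(letters, sample, re.ASCII)

print(unicode_runs)  # ['café', 'naïve', '汉字', 'snake', 'case']
print(ascii_runs)    # ['caf', 'na', 've', 'snake', 'case']
```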
5 Comments
çéàñ. Your regex is less readable than \p{L}
re module doesn't support Unicode properties. You have to use the re.UNICODE flag for Unicode support. Hence the [^\W\d_] pattern, which is the closest thing for "any letter" in Python's regex engine.
Java:
String s = "abcdef";
// matches() tests the whole string, so no ^ or $ is needed
if (s.matches("[a-zA-Z]+")) {
    System.out.println("string only contains letters");
}
2 Comments
ŹŻŚĄ
The regular expression that a few people have written as "/^[a-zA-Z]$/i" is not correct: without a quantifier it matches only a single-letter string, and the /i (case-insensitive) flag is redundant because the class already lists both cases. To find every run of letters inside a longer string, use /g (global) so the search does not stop after the first match, and drop the ^ and $ anchors, which are only needed when the whole string must consist of letters.
/[a-zA-Z]+/g
- [a-zA-Z] match a single character present in the list below
- Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
- a-z a single character in the range between a and z (case sensitive)
- A-Z a single character in the range between A and Z (case sensitive)
- g modifier: global. All matches (don't return on first match)
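In Python terms, the g flag corresponds roughly to `re.findall` or `re.finditer`, which return every run of letters instead of stopping at the first. A sketch assuming the standard `re` module and a made-up sample string:

```python
import re

sample = "abc 123 Def-ghi"

# search stops at the first run of letters
first = re.search(r"[a-zA-Z]+", sample)
print(first.group())  # abc

# findall keeps going, like /[a-zA-Z]+/g
all_runs = re.findall(r"[a-zA-Z]+", sample)
print(all_runs)  # ['abc', 'Def', 'ghi']

# finditer also reports where each run starts
positions = [(m.start(), m.group()) for m in re.finditer(r"[a-zA-Z]+", sample)]
print(positions)  # [(0, 'abc'), (8, 'Def'), (12, 'ghi')]
```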
Comments
/[a-zA-Z]+/
Super simple example. Regular expressions are extremely easy to find online.
Comments
For PHP, the following will work fine:
'/^[a-zA-Z]+$/'
1 Comment
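Just use \w or [:alpha:]. \w is an escape sequence that matches only symbols which might appear in words (letters, digits and the underscore); [:alpha:] is a POSIX class that matches letters only and has to be written inside a bracket expression, as in [[:alpha:]].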
4 Comments
\w may not be a good solution in all cases. At least in PCRE, \w can match other characters as well. Quoting the PHP manual: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w."
\w means match letters and numbers
1 Comment
So, I've been reading a lot of the answers, and most of them don't take exceptions into account, like letters with accents or diaeresis (á, à, ä, etc.).
I made a function in TypeScript that should carry over to any language that supports regular expressions. This is my personal implementation for my use case. What I basically did is add a range of letters for each kind of accent that I wanted to accept. I also convert the char to upper case before applying the RegExp, which saves me some work.
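function isLetter(char: string): boolean {
  // Anchored so the whole input must be letters. À-Ö and Ø-Ü skip the
  // multiplication sign × (U+00D7), which sits between the two ranges.
  return /^[A-ZÀ-ÖØ-Ü]+$/.test(char.toUpperCase());
}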
If you want to add another range of letters with another kind of accent, just add it to the regex. Same goes for special symbols.
I implemented this function with TDD and I can confirm this works with, at least, the following cases:
character | isLetter
${'A'} | ${true}
${'e'} | ${true}
${'Á'} | ${true}
${'ü'} | ${true}
${'ù'} | ${true}
${'û'} | ${true}
${'('} | ${false}
${'^'} | ${false}
${"'"} | ${false}
${'`'} | ${false}
${' '} | ${false}
2 Comments
firstLetter-lastLetter). To make sure that it works, you can implement a test that checks your use cases.
If you mean any letters in any character encoding, then a good approach might be to delete non-letters like spaces \s, digits \d, and other special characters like:
[!@#\$%\^&\*\(\)\[\]:;'",\. ...more special chars... ]
Or negate the classes above to describe letters directly:
\S \D and [^ ..special chars..]
Pros:
- Works with all regex flavors.
- Easy to write, sometimes saves a lot of time.
Cons:
- Long, sometimes not perfect, but character encoding can be broken as well.
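A rough Python sketch of the delete-the-non-letters idea, assuming the standard `re` module. The list of special characters is only a sample and would need extending for real input; the helper name is hypothetical.

```python
import re

# Whitespace, digits and a hand-picked (incomplete) set of special characters
NON_LETTERS = re.compile(r"""[\s\d!@#$%^&*()\[\]:;'",.]""")

def strip_non_letters(text: str) -> str:
    # Delete everything in the class; whatever is left is treated as a letter
    return NON_LETTERS.sub("", text)

print(strip_non_letters("Hi, there! 42 (ok)"))  # Hithereok
print(strip_non_letters("a-b"))                 # a-b ('-' is not in the sample list)
```

The second call shows the "sometimes not perfect" drawback: any character missing from the list is silently kept.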
Comments
You can try this regular expression: [^\W\d_] or [a-zA-Z].
4 Comments
[^\W|\d] means
[^\W|\d] means not \W and not | and not \d. It has the same net effect since | is part of \W but the | does not work as you think it does. Even then that means it accepts the _ character. You are probably looking for [^\W\d_]
_. But "NOT" | is equal than "AND", so [^\W|\d] means : NOT \W AND NOT \d
[^ab] means not a and not b. [^a|b] means not a and not | and not b. To give a second example [a|b|c|d] is exactly the same as [abcd|||] which is exactly the same as [abcd|] - all of which equate to ([a]|[b]|[c]|[d]|[|]) the | is a literal character, not an OR operator. The OR operator is implied between each character in a character class, putting an actual | means you want the class to accept the | (pipe) character.
Lately I have used this pattern in my forms to check names of people, containing letters, blanks and special characters like accent marks.
pattern="[A-zÀ-ú\s]+"
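A quick Python check, assuming only the standard library, of which non-letters fall inside the A-z and À-ú ranges used in the pattern above. The helper name is made up.

```python
# List the characters inside a code point range that are not letters
def non_letters_in_range(first: str, last: str) -> list[str]:
    return [chr(c) for c in range(ord(first), ord(last) + 1)
            if not chr(c).isalpha()]

print(non_letters_in_range("A", "z"))            # ['[', '\\', ']', '^', '_', '`']
print(non_letters_in_range("\u00c0", "\u00fa"))  # ['×', '÷']
```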
1 Comment
A-z matches more than just letters, as well as À-ú
JavaScript
If you want to return matched letters:
('Example 123').match(/[A-Z]/gi) // Result: ["E", "x", "a", "m", "p", "l", "e"]
If you want to replace matched letters with stars ('*') for example:
('Example 123').replace(/[A-Z]/gi, '*') // Result: "******* 123"
1 Comment
/\p{Letter}/gu ref: developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…
/^[A-Za-z]+$/.test('asd')
// true
/^[A-Za-z]+$/.test('asd0')
// false
/^[A-Za-z]+$/.test('0asd')
// false
1 Comment
All answers here have caveats and a Western/US bias, which in this case is avoidable. So here is an overview of the downsides of the solutions given here, followed by an international solution:
- Matching with [a-zA-Z]+ is mostly fine for English (and maybe Italian). Note, though, that English text does use accents by preserving them in foreign words (like café), names (like José) or alternative spellings (like zoölogy).
- Explicit extensions of this character list like [a-zA-ZäÄöÖüÜß]+ for German are possible but suffer from the same problem as the English solution.
- \p{L} has been mentioned here and there; it matches "all Unicode letters" (from all languages).
If your regex flavor allows it, use Unicode Scripts. Every Unicode character is assigned to a 'script', like Latin, Thai or Hiragana. To match a character using this property, use this syntax (The link contains a list with many more scripts): \p{Latin} \p{Hiragana} \p{Greek} ...
There is also \p{L} for all letters: 汉字äひら
So where is this syntax available? Check your language/library.
- RE2, PCRE and PCRE2 support it. That covers languages like PHP, R and Go.
- In Python, the standard-library module re does not support it, but the third-party regex module does.
- Java has the \p-feature, but bare script names are not accepted: \p{Latin} or \p{Arabic} don't work, while \p{IsLatin} or \p{script=Latin} (Java 7 and later) and \p{L} do.
- JavaScript has the \p-feature, but it needs to be enabled with the v or u suffix to the regex, like /\p{L}/u. Bare script names like \p{Latin} are rejected; scripts are written as \p{Script=Latin} or \p{sc=Latin}.
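For Python's standard library, which has no \p support, the nearest built-in check is probably `str.isalpha()`: it is documented as true for characters whose Unicode general category is one of the Letter categories (Lu, Ll, Lt, Lm, Lo), which is what \p{L} describes. A small sketch; the helper name is hypothetical.

```python
def letters_only(text: str) -> bool:
    # isalpha() is False for the empty string and for any non-letter character
    return text.isalpha()

print(letters_only("汉字\u00e4ひら"))  # True (\u00e4 is ä)
print(letters_only("abc123"))          # False

kept = [ch for ch in "a1_ β-字" if ch.isalpha()]
print(kept)  # ['a', 'β', '字']
```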
Comments
pattern = /[a-zA-Z]/
# Single-quoted test strings keep "#$" from being read as interpolation
puts "[a-zA-Z]: #{pattern.match('mine blossom')}" # OK
puts "[a-zA-Z]: #{pattern.match('456')}"
puts "[a-zA-Z]: #{pattern.match('')}"
puts "[a-zA-Z]: #{pattern.match('#$%^&*')}"
puts "[a-zA-Z]: #{pattern.match('#$%^&*A')}" # OK
1 Comment
The answers here either do not cover all possible letters, or are incomplete.
Complete regex to match ONLY unicode LETTERS, including those made up of multiple codepoints:
^(\p{L}\p{M}*)+$
(based on @ZoFreX comment)
Test it here: https://regex101.com/r/Mo5qdq/1
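Python's re module cannot run this pattern, so here is a hedged sketch of the same idea using `unicodedata`: every character must be a letter (general category L*) or a combining mark (category M*) that follows a letter. The function name is made up, and this is an approximation of the regex rather than a drop-in replacement.

```python
import unicodedata

def letters_with_marks(text: str) -> bool:
    """Rough emulation of ^(\\p{L}\\p{M}*)+$ using Unicode general categories."""
    seen_letter = False
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("L"):
            seen_letter = True
        elif category.startswith("M") and seen_letter:
            continue  # a mark attached to an earlier letter
        else:
            return False  # digit, punctuation, space, or a mark with no base letter
    return seen_letter  # the empty string has no letter, so it fails

print(letters_with_marks("cafe\u0301"))  # True  (e + combining acute accent)
print(letters_with_marks("\u0301cafe"))  # False (mark before any letter)
print(letters_with_marks("abc1"))        # False
```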
characters? ASCII? Kanji? Iso-XXXX-X? UTF8?
regex? Perl? Emacs? Grep?
/\p{L}+/u