Regex to match only letters

Question

How can I write a regex that matches only letters?

What's your definition of characters? ASCII? Kanji? Iso-XXXX-X? UTF8? — Ivo Wetzel
– Ivo Wetzel, Commented Sep 1, 2010 at 12:10
I have noticed that \p{L} for a letter and /u flag for the Unicode matches any letter in my regex i.e. /\p{L}+/u — MaxZoom
– MaxZoom, Commented Sep 26, 2019 at 16:59

Gumbo · Accepted Answer · 2010-09-01 12:17:11Z

615

Use a character set: [a-zA-Z] matches one letter from A–Z in lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of a string respectively).

If you want to match letters other than A–Z, you can either add them to the character set, as in [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode character property class \p{L}, which describes the Unicode characters that are letters.
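As a rough illustration of the difference between the unanchored and the anchored form, here is a minimal Python sketch. The helper name is_ascii_letters is made up for this example, and re.fullmatch is used as a stand-in for wrapping the pattern in ^ and $.

```python
import re

def is_ascii_letters(text: str) -> bool:
    """Return True if text is one or more ASCII letters and nothing else."""
    # fullmatch requires the whole string to match, much like ^[a-zA-Z]+$
    return re.fullmatch(r"[a-zA-Z]+", text) is not None

# An unanchored search finds a run of letters anywhere in the string.
first_run = re.search(r"[a-zA-Z]+", "123abc456").group()
print(first_run)

print(is_ascii_letters("abc"))    # letters only
print(is_ascii_letters("abc1"))   # contains a digit
print(is_ascii_letters("Größe"))  # contains non-ASCII letters
```

Only the first of the three calls to is_ascii_letters returns True; the last one shows why the ASCII-only set is a poor fit for text with umlauts.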

edited Sep 1, 2010 at 12:17

answered Sep 1, 2010 at 12:09

Gumbo

657k112 gold badges792 silver badges852 bronze badges


14 Comments

Joachim Sauer Over a year ago

That's a very ASCII-centric solution. This will break on pretty much any non-english text.

Gumbo Over a year ago

@Joachim Sauer: It will rather break on languages using non-latin characters.

Ivo Wetzel Over a year ago

Already breaks on 90% of German text, don't even mention French or Spanish. Italian might still do pretty well though.

Joachim Sauer Over a year ago

that depends on what definition of "latin character" you choose. J, U, Ö, Ä can all be argued to be latin characters or not, based on your definition. But they are all used in languages that use the "latin alphabet" for writing.

Radu Simionescu Over a year ago

\p{L} matches all the umlauts, cedillas, accented letters, etc., so you should go with that.


RobV · Accepted Answer · 2010-09-01 12:10:31Z

285

\p{L} matches anything that is a Unicode letter if you're interested in alphabets beyond the Latin one

answered Sep 1, 2010 at 12:10

RobV

28.8k11 gold badges84 silver badges122 bronze badges

8 Comments

Philip Potter Over a year ago

not in all regex flavours. For example, vim regexes treat \p as "Printable character".

Philip Potter Over a year ago

this page suggests only java, .net, perl, jgsoft, XML and XPath regexes support \p{L}. But major omissions: python and ruby (though python has the regex module).

Jörg W Mittag Over a year ago

@Philip Potter: Ruby supports Unicode character properties using that exact same syntax.

ZoFreX Over a year ago

I think this should be \p{L}\p{M}*+ to cover letters made up of multiple codepoints, e.g. a letter followed by accent marks. As per regular-expressions.info/unicode.html

jave.web Over a year ago

JavaScript needs u after regex to detect the unicode group: /\p{Letter}/gu


António Almeida · Accepted Answer · 2014-02-08 01:28:49Z

75

Depending on your meaning of "character":

[A-Za-z] - all letters (uppercase and lowercase)

[^0-9] - all non-digit characters

edited Feb 8, 2014 at 1:28

António Almeida

10.2k8 gold badges62 silver badges72 bronze badges

answered Sep 1, 2010 at 12:12

Kristof Mols

3,5752 gold badges41 silver badges51 bronze badges

6 Comments

Nike Over a year ago

I meant letters. It doesn't appear to be working though. preg_match('/[a-zA-Z]+/', $name);

Kristof Mols Over a year ago

[A-Za-z] is just the declaration of the characters you can use. You still need to declare how many times this declaration has to be used: [A-Za-z]{1,2} (to match 1 or 2 letters) or [A-Za-z]{1,} (to match 1 or more letters; the same as [A-Za-z]+)

phuclv Over a year ago

well à, á, ã, Ö, Ä... are letters too, so are অ, আ, ই, ঈ, Є, Ж, З, ﺡ, ﺥ, ﺩא, ב, ג, ש, ת, ... en.wikipedia.org/wiki/Letter_%28alphabet%29

Catalina Chircu Over a year ago

@phuclv: Indeed, but that depends on the encoding, and the encoding is part of the settings of the program (either the default config or the one declared in a config file of the program). When I worked on different languages, I used to store that in a constant, in a config file.

phuclv Over a year ago

@CatalinaChircu encoding is absolutely irrelevant here. Encoding is a way to encode a code point in a character set in binary, for example UTF-8 is an encoding for Unicode. Letters OTOH depends on the language, and if one says [A-Za-z] are letters then the language that's being used must be specified


blue note · Accepted Answer · 2014-10-17 11:50:04Z

39

The closest option available is

[\u\l]+

which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use

[a-zA-Z]+

as other users suggest

answered Oct 17, 2014 at 11:50

blue note

29.5k10 gold badges83 silver badges110 bronze badges

2 Comments

Nyerguds Over a year ago

Won't match any special characters though.

Eric Soyke Over a year ago

For a long time I had been using [A-z]+ but just noticed this allows a few special characters like ` and [ to slip in. [a-zA-Z]+ is indeed the way to go.
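The [A-z] pitfall mentioned in the comment above can be checked directly. This is a small Python sketch; the variable names are illustrative only.

```python
import re

# Characters that sit between 'Z' (0x5A) and 'a' (0x61) in ASCII.
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)

# [A-z] spans that whole gap, so it accepts these symbols too.
loose = re.compile(r"[A-z]+")
strict = re.compile(r"[a-zA-Z]+")

print(bool(loose.fullmatch("foo_bar")))   # underscore slips through
print(bool(strict.fullmatch("foo_bar")))  # rejected
```

The six extra characters are the brackets, backslash, caret, underscore and backtick, which is why the two explicit ranges are the safer spelling.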

Zachiah · Accepted Answer · 2024-01-08 18:50:18Z

29

You would use

/[a-z]/gi

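[] defines a character class: it matches any one of the characters listed inside
a-z covers the entire English alphabet (lowercase)
g matches globally, throughout the whole string
i makes the match case-insensitive, so uppercase letters are covered as well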

edited Jan 8, 2024 at 18:50

Zachiah

2,65713 silver badges32 bronze badges

answered Apr 4, 2016 at 10:01

Scott

2993 silver badges2 bronze badges

1 Comment

Amine KOUIS Over a year ago

For non Latin letter. Here is the regex that worked for me.

Eric Salina · Accepted Answer · 2020-08-20 20:27:50Z

26

In python, I have found the following to work:

[^\W\d_]

This works because we are creating a new character class (the []) which excludes (^) any character from the class \W (everything NOT in [a-zA-Z0-9_]), also excludes any digit (\d) and also excludes the underscore (_).

That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ bits. You might ask, wouldn't it just be easier to write [a-zA-Z] then, instead of [^\W\d_]? You would be correct if dealing only with ASCII text, but when dealing with unicode text:

\W

Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].

^ from the python re module documentation

That is, we are taking everything considered to be a word character in unicode, removing everything considered to be a digit character in unicode, and also removing the underscore.

For example, the following code snippet

import re
regex = r"[^\W\d_]"  # raw string, so \W and \d are not treated as string escapes
test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
re.findall(regex, test_string)

Returns

['A', 'B', 's', 'f', 'a']
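To see the Unicode behaviour described above, the same character class can be run with and without the ASCII flag. This is only a sketch with an arbitrary sample string; it assumes Python 3, where str patterns use Unicode matching by default.

```python
import re

text = "café Ж1_x"

# Default (Unicode) matching: accented and Cyrillic letters count as word characters.
unicode_letters = re.findall(r"[^\W\d_]+", text)

# With re.ASCII, \W and \d fall back to their ASCII-only definitions.
ascii_letters = re.findall(r"[^\W\d_]+", text, flags=re.ASCII)

print(unicode_letters)  # ['café', 'Ж', 'x']
print(ascii_letters)    # ['caf', 'x']
```

In ASCII mode the class behaves like [a-zA-Z], which is the point the answer makes about why the longer spelling is worth it for Unicode text.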

answered Aug 20, 2020 at 20:27

Eric Salina

3913 silver badges7 bronze badges

5 Comments

Toto Over a year ago

What about non Latin letter? For example çéàñ. Your regex is less readable than \p{L}

Frederic Over a year ago

Clever answer. Works perfectly for accented letters as well.

Thegerdfather Over a year ago

@Toto Python's re module doesn't support Unicode properties. You have to use the re.UNICODE flag for Unicode support. Hence the [^\W\d_] pattern, which is the closest thing for "any letter" in Python's regex engine.

Amine KOUIS Over a year ago

@Toto Here is the regex that works for non Latin letter in Python.

CCC Over a year ago

how is this simpler than [a-zA-Z]?

Wiktor Stribiżew · Accepted Answer · 2018-04-15 19:39:55Z

18

Java:

String s= "abcdef";

if(s.matches("[a-zA-Z]+")){
     System.out.println("string only contains letters");
}

edited Apr 15, 2018 at 19:39

Wiktor Stribiżew

631k41 gold badges502 silver badges632 bronze badges

answered Mar 22, 2017 at 17:25

Udeshika Sewwandi

2432 silver badges2 bronze badges

2 Comments

karoluS Over a year ago

it doesn't include diacritic signs such as ŹŻŚĄ

Dimitar Bogdanov Over a year ago

^ or any Cyrillic letters

Yogesh Chauhan · Accepted Answer · 2016-09-13 07:05:40Z

17

The regular expression that a few people have written as "/^[a-zA-Z]$/i" is not quite right. The /i flag only makes the match case-insensitive, which is redundant when both a-z and A-Z are listed, and without a quantifier the pattern matches just a single letter. Use /g (global) instead of /i so that matching does not stop after the first match, and drop ^ and $ if you want to find letters anywhere in the string rather than require the whole string to consist of letters.

/[a-zA-Z]+/g

[a-zA-Z]+ matches one or more characters from the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
g modifier: global. All matches (don't return on first match)

answered Sep 13, 2016 at 7:05

Yogesh Chauhan

3584 silver badges11 bronze badges

Comments

Scott Radcliff · Accepted Answer · 2010-09-01 12:12:41Z

14

/[a-zA-Z]+/

Super simple example. Regular expressions are extremely easy to find online.

http://www.regular-expressions.info/reference.html

answered Sep 1, 2010 at 12:12

Scott Radcliff

1,5511 gold badge9 silver badges13 bronze badges

Comments

Rohit Dubey · Accepted Answer · 2013-11-14 16:22:08Z

13

For PHP, following will work fine

'/^[a-zA-Z]+$/'

answered Nov 14, 2013 at 16:22

Rohit Dubey

1,29415 silver badges16 bronze badges

1 Comment

Amine KOUIS Over a year ago

But, this will not work for Latin character. Check it here.

Amal · Accepted Answer · 2014-06-08 19:53:42Z

11

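Just use \w or [[:alpha:]]. \w is an escape sequence that matches only the characters that may appear in words (letters, digits, and the underscore); [:alpha:] is a POSIX class that matches letters and has to be written inside a bracket expression, as in [[:alpha:]].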

edited Jun 8, 2014 at 19:53

Amal

76.8k18 gold badges133 silver badges154 bronze badges

answered May 28, 2014 at 13:33

Agaspher

4943 silver badges10 bronze badges

4 Comments

Amal Over a year ago

\w may not be a good solution in all cases. At least in PCRE, \w can match other characters as well. Quoting the PHP manual: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.".

V-SHY Over a year ago

Words include characters other than letters.

Eugen Konkov Over a year ago

\w means match letters and numbers

y_159 Over a year ago

how to match words with only alphabet characters?

Tomáš Nedělka · Accepted Answer · 2017-06-27 11:44:42Z

9

Use character groups

\D

Matches any character except digits 0-9

^\D+$

See example here

answered Jun 27, 2017 at 11:44

Tomáš Nedělka

2192 silver badges2 bronze badges

1 Comment

DaveMongoose Over a year ago

This will also match whitespace, symbols, etc. which does not seem to be what the question is asking for.
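The objection in the comment above is easy to reproduce. The following Python sketch compares \D+ with a letters-only class on a few arbitrary sample strings.

```python
import re

samples = ["abc", "a b!", "abc1"]
for s in samples:
    # \D only excludes digits; spaces and punctuation still match.
    non_digit = re.fullmatch(r"\D+", s) is not None
    # [^\W\d_] keeps word characters minus digits and underscore, i.e. letters.
    letters = re.fullmatch(r"[^\W\d_]+", s) is not None
    print(f"{s!r}: \\D+ -> {non_digit}, letters only -> {letters}")
```

The string "a b!" passes the \D+ check but fails the letters-only check, which is the gap between "not a digit" and "a letter".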

Javi Marzán · Accepted Answer · 2021-03-26 14:43:24Z

8

So, I've been reading a lot of the answers, and most of them don't take exceptions into account, like letters with accents or diaeresis (á, à, ä, etc.).

I made a function in typescript that should be pretty much extrapolable to any language that can use RegExp. This is my personal implementation for my use case in TypeScript. What I basically did is add ranges of letters with each kind of symbol that I wanted to add. I also converted the char to upper case before applying the RegExp, which saves me some work.

function isLetter(char: string): boolean {
  return char.toUpperCase().match('[A-ZÀ-ÚÄ-Ü]+') !== null;
}

If you want to add another range of letters with another kind of accent, just add it to the regex. Same goes for special symbols.

I implemented this function with TDD and I can confirm this works with, at least, the following cases:

    character | isLetter
    ${'A'}    | ${true}
    ${'e'}    | ${true}
    ${'Á'}    | ${true}
    ${'ü'}    | ${true}
    ${'ù'}    | ${true}
    ${'û'}    | ${true}
    ${'('}    | ${false}
    ${'^'}    | ${false}
    ${"'"}    | ${false}
    ${'`'}    | ${false}
    ${' '}    | ${false}

edited Mar 26, 2021 at 14:43

answered Aug 21, 2020 at 13:19

Javi Marzán

1,35018 silver badges24 bronze badges

2 Comments

Vadim Aidlin Over a year ago

what about Arabic, Hebrew, Japanese etc.?

Javi Marzán Over a year ago

@VadimAidlin then you need to add it to the RegExp string like in the provided code.(firstLetter-lastLetter). To make sure that it works, you can implement a test that checks your use cases.

Amal · Accepted Answer · 2014-06-08 19:53:19Z

6

If you mean any letters in any character encoding, then a good approach might be to delete non-letters like spaces \s, digits \d, and other special characters like:

[!@#\$%\^&\*\(\)\[\]:;'",\. ...more special chars... ]

Or negate the classes above to describe letters directly:

\S \D and [^  ..special chars..]

Pros:

Works with all regex flavors.
Easy to write, sometimes save lots of time.

Cons:

Long, sometimes not perfect, but character encoding can be broken as well.

edited Jun 8, 2014 at 19:53

Amal

76.8k18 gold badges133 silver badges154 bronze badges

answered Dec 12, 2013 at 12:48

Sławomir Lenart

8,6335 gold badges51 silver badges64 bronze badges

Comments

Motlab · Accepted Answer · 2014-07-29 07:33:48Z

5

You can try this regular expression: [^\W\d_] or [a-zA-Z].

edited Jul 29, 2014 at 7:33

answered Jul 25, 2014 at 13:27

Motlab

711 silver badge3 bronze badges

4 Comments

OGHaza Over a year ago

That is not what [^\W|\d] means

OGHaza Over a year ago

[^\W|\d] means not \W and not | and not \d. It has the same net effect since | is part of \W but the | does not work as you think it does. Even then that means it accepts the _ character. You are probably looking for [^\W\d_]

Motlab Over a year ago

I agree with you, it accepts the _. But "NOT" | is equivalent to "AND", so [^\W|\d] means: NOT \W AND NOT \d

OGHaza Over a year ago

[^ab] means not a and not b. [^a|b] means not a and not | and not b. To give a second example [a|b|c|d] is exactly the same as [abcd|||] which is exactly the same as [abcd|] - all of which equate to ([a]|[b]|[c]|[d]|[|]) the | is a literal character, not an OR operator. The OR operator is implied between each character in a character class, putting an actual | means you want the class to accept the | (pipe) character.
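The point about | inside a character class can be verified with a few findall calls. This is a Python sketch; the variable names are only for illustration.

```python
import re

# Inside [...], | is an ordinary character, not alternation.
with_pipe = re.findall(r"[a|b]", "a|b|c")
without_pipe = re.findall(r"[ab]", "a|b|c")
print(with_pipe)     # ['a', '|', 'b', '|']
print(without_pipe)  # ['a', 'b']

# [^\W|\d] still lets the underscore through; [^\W\d_] does not.
keeps_underscore = re.findall(r"[^\W|\d]", "a_1|")
letters_only = re.findall(r"[^\W\d_]", "a_1|")
print(keeps_underscore)  # ['a', '_']
print(letters_only)      # ['a']
```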

cblnpa · Accepted Answer · 2020-02-11 18:27:47Z

5

Lately I have used this pattern in my forms to check names of people, containing letters, blanks and special characters like accent marks.

pattern="[A-zÀ-ú\s]+"

answered Feb 11, 2020 at 18:27

cblnpa

5372 gold badges10 silver badges22 bronze badges

1 Comment

Toto Over a year ago

You should have a look at an ASCII table. A-z matches more than just letters, and so does À-ú
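For the À-ú half of that remark, one way to see what the range really covers is to list the code points in it that Python does not classify as letters. This is a sketch that relies on str.isalpha() as the definition of "letter", which is an assumption of the example rather than part of the original answer.

```python
# Code points from À (U+00C0) to ú (U+00FA) that str.isalpha() rejects.
start, end = ord("\u00c0"), ord("\u00fa")
non_letters = [chr(c) for c in range(start, end + 1) if not chr(c).isalpha()]
print(non_letters)  # ['×', '÷']
```

So a pattern using À-ú also accepts the multiplication and division signs.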

Predrag Davidovic · Accepted Answer · 2020-07-10 10:42:12Z

2

JavaScript

If you want to return matched letters:

('Example 123').match(/[A-Z]/gi) // Result: ["E", "x", "a", "m", "p", "l", "e"]

If you want to replace matched letters with stars ('*') for example:

('Example 123').replace(/[A-Z]/gi, '*') // Result: "******* 123"

edited Jul 10, 2020 at 10:42

answered Jul 10, 2020 at 10:25

Predrag Davidovic

1,5762 gold badges18 silver badges24 bronze badges

1 Comment

jave.web Over a year ago

For letters beyond english: /\p{Letter}/gu ref: developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…

jarraga · Accepted Answer · 2020-08-16 16:56:35Z

2

/^[A-z]+$/.test('asd')
// true

/^[A-z]+$/.test('asd0')
// false

/^[A-z]+$/.test('0asd')
// false

answered Aug 16, 2020 at 16:56

jarraga

4486 silver badges9 bronze badges

1 Comment

ndrwnaguib Over a year ago

Hello @jarraga. Welcome to SO, did you read how to answer a question?. It should assist the clearance of your answer, and hence avoid down voting.

Zoidbergseasharp · Accepted Answer · 2023-09-22 14:49:02Z

2

[A-Za-zÀ-ÿ]

Note: western bias.

edited Sep 22, 2023 at 14:49

answered Sep 22, 2023 at 14:41

Zoidbergseasharp

4,6681 gold badge19 silver badges22 bronze badges

Comments

julaine · Accepted Answer · 2025-09-30 10:08:35Z

All answers here have caveats and a Western/US bias, which in this case is avoidable. So here is an overview of the downsides of the solutions given here, followed by an international solution:

matching with [a-zA-Z]+ is mostly fine for English (and maybe Italian). Note though, that English text does use accents by preserving them in foreign words (like café), names (like José) or alternative spellings (like zoölogy).

Explicit extensions of this character list like [a-zA-ZäÄöÖüÜß]+ for German are possible but suffer from the same problem as the English solution.

\p{L} has been mentioned here and there; it matches "all Unicode letters" (from all languages).

If your regex-flavor allows it, use Unicode Scripts. Every unicode character is assigned to a 'script', like Latin, Thai or Hiragana. To match a character using this property, use this syntax (The link contains a list with many more scripts): \p{Latin} \p{Hiragana} \p{Greek} ...

There is also \p{L} for all letters: 汉字äひら

So where is this syntax available? Check your language/library.

RE2, PCRE and PCRE2 support it. That covers languages like PHP, R and GoLang.

In Python, the stdlib-regex-module re does not support it, but the third-party regex-module does.

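Java has the \p feature, but bare script names are not accepted: \p{Latin} or \p{Arabic} don't work, while \p{L} will. Scripts are written with an Is prefix or the script keyword instead, e.g. \p{IsLatin} or \p{script=Latin}.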

JavaScript has the \p feature, but it needs to be enabled with the v or u flag on the regex, like /\p{L}/u. The bare form \p{Latin} is unsupported; scripts are written as \p{Script=Latin} (or \p{sc=Latin}).

Snm Maurya · Accepted Answer · 2014-06-30 05:36:26Z

1

pattern = /[a-zA-Z]/

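puts "[a-zA-Z]: #{pattern.match("mine blossom")}" # OK, finds a letter

puts "[a-zA-Z]: #{pattern.match("456")}" # no match

puts "[a-zA-Z]: #{pattern.match("")}" # no match

# single quotes keep the #$ sequence literal
puts "[a-zA-Z]: #{pattern.match('#$%^&*')}" # no match

puts "[a-zA-Z]: #{pattern.match('#$%^&*A')}" # OK, finds a letter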

answered Jun 30, 2014 at 5:36

Snm Maurya

1,15510 silver badges13 bronze badges

1 Comment

The Witness Over a year ago

And what about for instance, “Zażółć gęslą jaźń”?

Bersan · Accepted Answer · 2023-08-17 08:55:42Z

1

The answers here either do not cover all possible letters, or are incomplete.

Complete regex to match ONLY unicode LETTERS, including those made up of multiple codepoints:

^(\p{L}\p{M}*)+$

(based on @ZoFreX comment)

Test it here: https://regex101.com/r/Mo5qdq/1
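Python's standard re module has no \p{...} support, so the pattern above cannot be used there as written. As a rough equivalent, the same rule (a letter, followed by any number of combining marks, repeated over the whole string) can be sketched with unicodedata general categories. The function name is_letters_with_marks is hypothetical and this is an approximation, not the regex engine's own implementation.

```python
import unicodedata

def is_letters_with_marks(text: str) -> bool:
    """Approximate a whole-string match of (letter + optional marks), repeated."""
    if not text:
        return False
    # First letter of the general category: "L" for letters, "M" for marks.
    kinds = [unicodedata.category(ch)[0] for ch in text]
    # The string must start with a letter; after that only letters and marks may follow.
    return kinds[0] == "L" and all(k in ("L", "M") for k in kinds)

print(is_letters_with_marks("Zażółć"))   # precomposed letters
print(is_letters_with_marks("e\u0301"))  # 'e' followed by a combining acute accent
print(is_letters_with_marks("\u0301e"))  # mark with no preceding letter
print(is_letters_with_marks("abc1"))     # contains a digit
```

The first two calls return True and the last two return False. With the third-party regex module the original pattern could be used directly instead.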

edited Aug 17, 2023 at 8:55

answered Jul 28, 2023 at 20:02

Bersan

3,6013 gold badges29 silver badges40 bronze badges

Comments

Erfan Eghterafi · Accepted Answer · 2023-08-16 15:34:59Z

0

This one works for me. It accepts only Unicode letters, combining marks and spaces (numbers, special characters, emojis, etc. are rejected).

// notice: unicode: true
RegExp(r"^[\p{L}\p{M} ]*$", unicode: true)

answered Aug 16, 2023 at 15:34

Erfan Eghterafi

5,9951 gold badge40 silver badges48 bronze badges

Comments

Alan Moore · Accepted Answer · 2016-05-24 03:48:52Z

-2

Pattern pattern = Pattern.compile("^[a-zA-Z]+$");

if (pattern.matcher("a").find()) {

   // ...do something...
}

edited May 24, 2016 at 3:48

Alan Moore

75.6k13 gold badges109 silver badges161 bronze badges

answered May 23, 2016 at 23:26

Fikreselam Elala

2152 silver badges2 bronze badges
