Regex to capture string except last 2 letters in PCRE

Question

I have strings such as test01, abcd02, and xyz05. The last 2 characters of each string are always digits. From those strings, I want a regular expression that captures test, abcd, and xyz. How can I capture them?

Something like (test|abcd|xyz)\d\d. What stopped you from writing this yourself?
– Luatic, Commented Sep 6, 2023 at 17:54
@Luatic I just gave those strings as an example; sorry if I was not clear. What I need is to capture the word without its last 2 characters.
– DevOpsWorld, Commented Sep 6, 2023 at 17:55
So (.*)..$ will capture everything before the last 2 characters.
– Barmar, Commented Sep 6, 2023 at 17:56
Obviously it works, it's trivial. What problem were you having coming up with this?
– Barmar, Commented Sep 6, 2023 at 19:16

Patrick Janser · Accepted Answer · 2023-09-07 08:44:49Z

2

A few questions:

Could your string have more or fewer than 2 digits?
If it's fixed at two digits, why not just drop the last 2 characters instead of using a regular expression?
Is it because you have to validate the input? For instance, what about "# @123"?

If you have to check that the string ends with digits, then don't use the (.*)..$ solution proposed in the comments: . matches any character, so you'll get, for example, "Hel" out of "Hello". It has the same effect as just truncating your string.
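To make the point concrete, here is a minimal sketch comparing the pattern from the comments with plain truncation. The helper name dropLastTwo and the sample strings are only illustrative, and the comparison assumes single-line inputs of at least two characters.

```javascript
// Pattern from the comments: capture everything before the last two characters.
const loose = /(.*)..$/;

// Hypothetical helper for comparison: plain truncation, no regex involved.
function dropLastTwo(text) {
  return text.slice(0, -2);
}

const samples = ['test01', 'Hello', '!#@123'];

for (const sample of samples) {
  const match = loose.exec(sample);
  const captured = match ? match[1] : null;
  // For these inputs both approaches give the same result,
  // whether or not the last two characters are digits.
  console.log(`"${sample}": regex -> "${captured}", slice -> "${dropLastTwo(sample)}"`);
}
```

Neither approach checks what the last two characters actually are, which is why a stricter pattern is worth considering when the input has to be validated.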

I would personally be more precise and also take only words into consideration, to avoid matching something like "12345" or "!#@123".

I would suggest this:

/^(\p{L}+)\d+$/u

Explanation:

The u flag at the end is for Unicode, so that special characters such as emojis or accented letters are handled correctly when you don't know what your input text will contain.
With PCRE, you can use Unicode character classes. \p{L}, which means Letter, matches a letter in any language. It is similar to \w but stricter: \w also matches digits and the underscore. Note that \p{L} matches one code point at a time, so a letter written as a base character plus a combining mark would additionally need \p{M}.
If the end of your string must be digits, then you can use \d+. If it really has to be exactly 2 digits, then replace it with \d{2}.
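As a rough illustration of how \w and \p{L} differ, the sketch below tests both against a few made-up inputs. It reflects JavaScript behaviour, where \w is always ASCII-only; in PCRE, what \w matches depends on the Unicode options in effect, so treat this as an assumption-laden demo rather than a statement about every engine.

```javascript
const wordChars = /^\w+$/;        // ASCII letters, digits and underscore in JavaScript
const lettersOnly = /^\p{L}+$/u;  // Unicode letters only (needs the u flag in JavaScript)

// Illustrative inputs; 'vid\u00e9o' is "vidéo" with a precomposed é.
const inputs = ['test', 'test01', 'snake_case', 'vid\u00e9o'];

for (const input of inputs) {
  console.log(`${input}: \\w+ -> ${wordChars.test(input)}, \\p{L}+ -> ${lettersOnly.test(input)}`);
}
```

Because \p{L} rejects digits, the capture group in ^(\p{L}+)\d+$ cannot swallow part of the trailing number, which a \w+ group could do after backtracking.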

const strings = [
  'test01',  // Ok
  'abcd02',  // Ok
  'test123', // More than 2 digits, perhaps ok also?
  'vidéo05', // Accented chars in the word, ok or not?
  '123456',  // Only digits => should it match? maybe not!
  '####03',  // Not word chars before the digits... hmm, no match.
  'Hello'    // No digits at all... no match.
];

const regex = /^(\p{L}+)\d+$/u;

strings.forEach(string => {
  const match = regex.exec(string);
  if (match) {
    console.log(`Word found in "${string}" is "${match[1]}"`);
  }
  else {
    console.log(`Does NOT match "${string}"`);
  }
});

With PCRE you'll get the same: https://regex101.com/r/bvY3dg/1
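If exactly two trailing digits are required, as the question suggests, the \d{2} variant mentioned above could be wrapped like this. The function name wordBeforeTwoDigits is hypothetical and only serves the example.

```javascript
// Hypothetical helper: returns the leading word, or null when the input
// is not "one or more letters followed by exactly two digits".
function wordBeforeTwoDigits(text) {
  const match = /^(\p{L}+)\d{2}$/u.exec(text);
  return match ? match[1] : null;
}

console.log(wordBeforeTwoDigits('test01'));  // "test"
console.log(wordBeforeTwoDigits('xyz05'));   // "xyz"
console.log(wordBeforeTwoDigits('test123')); // null, three digits
console.log(wordBeforeTwoDigits('####03'));  // null, no letters before the digits
console.log(wordBeforeTwoDigits('Hello'));   // null, no digits at all
```

Returning null for non-matching input keeps validation and extraction in one step, so the caller can tell a rejected string apart from a captured word.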

edited Sep 7, 2023 at 8:44

answered Sep 7, 2023 at 7:27

Patrick Janser


4 Comments

halfer Over a year ago

An exceptionally good answer for an exceptionally lazy question. I don't have your tolerance, but this is good stuff - well done.

Patrick Janser Over a year ago

@halfer Thanks a lot! Yes, you are totally right, I probably shouldn't have replied to the rather lazy question. He probably didn't even see my answer, haha. I'll be less tolerant in the future, as you say!

halfer Over a year ago

It's actually fine - since you have some upvotes I think the question won't be deleted. But some questions are so terrible that folks will vote to delete, and any good answers will vanish with them.

halfer Over a year ago

It perhaps is not a requirement that any given question author responds to or appreciates an answer, since good ones will be appreciated by future readers anyway.

DevOpsWorld (edited by halfer) · Accepted Answer · 2023-09-15 20:51:21Z

0

This regex worked for me:

(.*)..$

https://regex101.com/r/MF7S5A/1

edited Sep 15, 2023 at 20:51

halfer


answered Sep 15, 2023 at 20:48

DevOpsWorld


1 Comment

halfer Over a year ago

That pattern is too generous: the two dots before the end anchor will match any characters, not just digits.
