I'm working with SAP Information Steward and creating a rule that requires names to be in title case (i.e. each word is capitalized).

I've formulated the following rule:

BEGIN
IF(match_regex($name, '(^(\b[A-Z]\w*\s*)+$)', null)) RETURN TRUE;
ELSE RETURN FALSE;
END

Although the rule runs successfully, it appears to accept inputs that should be identified as 'FALSE'. Please see the attached screenshot.

'TesT Name' and 'TEST NAME' should be FALSE but are instead passing under this regex.

Any help/guidance with the regex would be very useful.

  • \w matches both cases. Change it to [a-z]. Commented Feb 18, 2019 at 14:34
  • ^[A-Z][a-z]*(\s+[A-Z][a-z]*)*$ (demo) should do. Commented Feb 18, 2019 at 14:37
  • @Wiktor Stribizew That worked! Thank you so much - it worked like a charm Commented Feb 18, 2019 at 14:44
  • Can there be digits in the names? Or underscores? Commented Feb 18, 2019 at 14:49
  • @WiktorStribiżew I'm guessing in my scenario yes - both digits and underscores can be present. Would the regex then become this: ^[A-Z][a-z0-9_\-]*(\s+[A-Z][a-z0-9_\-]*)*$ Commented Feb 18, 2019 at 14:54

2 Answers

The (^(\b[A-Z]\w*\s*)+$) regex matches an entire string that consists of:

  • ^ - start of string
  • (\b[A-Z]\w*\s*)+ - 1 or more occurrences (due to (...)+) of
    • \b - a word boundary
    • [A-Z] - an uppercase ASCII letter
    • \w* - 0 or more letters/digits/underscores
    • \s* - 0+ whitespaces
  • $ - end of string.

As you see, it allows trailing whitespace, and \w matches what [A-Za-z0-9_] matches, i.e. it matches both lower- and uppercase letters.
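To see this behaviour outside Information Steward, here is a minimal sketch in Python. It assumes Python's re module treats these constructs the same way as the engine behind match_regex, which I have not verified; the helper name passes_original is made up for illustration.

```python
import re

# The pattern from the question. re.ASCII keeps \w equal to [A-Za-z0-9_]
# (Python's default \w also matches non-ASCII letters and digits).
ORIGINAL = re.compile(r'(^(\b[A-Z]\w*\s*)+$)', re.ASCII)


def passes_original(name: str) -> bool:
    """Hypothetical helper: True when the question's pattern accepts the name."""
    return ORIGINAL.search(name) is not None


# Print which sample names the original pattern lets through.
for sample in ['Test Name', 'TesT Name', 'TEST NAME', 'Test Name  ', 'test name']:
    print(repr(sample), passes_original(sample))
```

Of these samples, only 'test name' is rejected: \w* consumes the uppercase letters inside 'TesT' and 'TEST', and the final \s* consumes the trailing spaces.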

You want to allow only lowercase letters, digits, - and _ after the initial uppercase letter of each word. You may use

^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$

Details

  • ^ - start of string anchor
  • [A-Z][a-z0-9_-]* - an uppercase letter followed by 0+ lowercase letters, digits, _ or - chars
  • (\s+[A-Z][a-z0-9_-]*)* - zero or more occurrences of:
    • \s+ - 1 or more whitespaces
    • [A-Z][a-z0-9_-]* - an uppercase letter followed by 0+ lowercase letters, digits, _ or - chars
  • $ - end of string.
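The corrected pattern can be checked the same way. This is only a sketch in Python, assuming its re module agrees with match_regex on these constructs; is_title_case is an illustrative name, not part of Information Steward.

```python
import re

# The corrected pattern from this answer, unchanged.
TITLE_CASE = re.compile(r'^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$')


def is_title_case(name: str) -> bool:
    """Hypothetical helper: True when every word starts with A-Z and
    continues only with lowercase letters, digits, '_' or '-'."""
    # fullmatch avoids Python's quirk of letting $ match before a final newline.
    return TITLE_CASE.fullmatch(name) is not None


# Print the verdict for a few sample names.
for sample in ['Test Name', 'TesT Name', 'TEST NAME', 'Test Name ',
               'Anne-marie Smith_2', 'Mary-Jane']:
    print(repr(sample), is_title_case(sample))
```

Here 'Test Name' and 'Anne-marie Smith_2' pass, while 'TesT Name', 'TEST NAME' and the trailing-space variant fail. Note that 'Mary-Jane' also fails, because an uppercase letter is only accepted at the start of a whitespace-separated word.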

I would write your regex as:

^[A-Z]\w*(?:\s+[A-Z]\w*)*$

This matches a single word starting with a capital letter, followed by zero or more repetitions of: one or more whitespace characters and another word starting with a capital letter.

I phrase a matching word as starting with [A-Z] followed by \w*, meaning zero or more word characters. This allows for things like A to match.

Edit:

Based on the comments above, if you want some other character class to represent what follows the initial uppercase letter, then do that instead:

^[A-Z][something]*(?:\s+[A-Z][something]*)*$

where [something] is your character class.
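As a rough sketch of that substitution, the template can be built from whatever character class you pick. This is Python rather than Information Steward rule syntax, and title_case_pattern and its tail_class parameter are names invented for this example.

```python
import re


def title_case_pattern(tail_class: str) -> re.Pattern:
    """Hypothetical helper: fill the [something] slot of the template.

    tail_class is a regex fragment matching one character that may follow
    the initial uppercase letter, e.g. r'\w' or '[a-z0-9_-]'.
    """
    word = f'[A-Z]{tail_class}*'
    # re.ASCII keeps \w limited to [A-Za-z0-9_] if the caller passes it in.
    return re.compile(rf'^{word}(?:\s+{word})*$', re.ASCII)


loose = title_case_pattern(r'\w')           # same as the first pattern above
strict = title_case_pattern('[a-z0-9_-]')   # class suggested in the comments

# Compare how the two variants treat a name with an inner capital.
print(loose.fullmatch('TesT Name') is not None)
print(strict.fullmatch('TesT Name') is not None)
```

With \w as the class, 'TesT Name' is still accepted; with [a-z0-9_-] it is rejected, so the choice of class is what decides the mixed-case inputs.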

3 Comments

Thank you so much for your fast response. It makes a lot more sense. But I still get the same results when I apply the rule to my test variables. Could it possibly just be the software I'm using?
@Z.K.I ^[A-Z]\w*(?:\s+[A-Z]\w*)*$ is basically the same pattern as yours, but does not allow trailing whitespaces.
@Z.K.I If you want to allow for trailing whitespace, I can modify my pattern. I assumed that you would not want this.
