How to write a regex in title case - SAP Information Steward

Question

I'm working with SAP Information Steward and creating a rule that requires names to be in title case (i.e. each word is capitalized).

I've formulated the following rule:

BEGIN
IF (match_regex($name, '(^(\b[A-Z]\w*\s*)+$)', null)) RETURN TRUE;
ELSE RETURN FALSE;
END

Although the rule validates successfully, it accepts inputs that should be identified as 'FALSE'. Please see the attached screenshot.

'TesT Name' and 'TEST NAME' should be FALSE but are instead passing under this regex.

Any help/guidance with the regex would be very useful.

@Wiktor Stribiżew That worked! Thank you so much - it worked like a charm.
– Z.K.I, Commented Feb 18, 2019 at 14:44
@WiktorStribiżew I'm guessing in my scenario yes - both digits and underscores can be present. Would the regex then become this: ^[A-Z][a-z0-9_\-]*(\s+[A-Z][a-z0-9_\-]*)*$
– Z.K.I, Commented Feb 18, 2019 at 14:54

Wiktor Stribiżew · Accepted Answer · 2019-02-18 19:04:11Z

The (^(\b[A-Z]\w*\s*)+$) regex matches an entire string that consists of:

^ - start of string
(\b[A-Z]\w*\s*)+ - 1 or more occurrences (due to (...)+) of
- \b - a word boundary
- [A-Z] - an uppercase ASCII letter
- \w* - 0 or more letters/digits/underscores
- \s* - 0+ whitespaces
$ - end of string.

As you see, it allows trailing whitespace, and \w matches what [A-Za-z0-9_] matches, i.e. it matches both lower- and uppercase letters.
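To make this concrete, here is a minimal sketch that runs the original pattern against a few inputs. It uses Python's re module as a stand-in for Information Steward's match_regex, which is an assumption: the two engines may differ in details. The re.ASCII flag is set so that \w means exactly [A-Za-z0-9_], and the names ORIGINAL and accepts are hypothetical helpers for illustration.

```python
import re

# The pattern from the question. re.ASCII restricts \w to [A-Za-z0-9_].
ORIGINAL = re.compile(r'(^(\b[A-Z]\w*\s*)+$)', re.ASCII)

def accepts(pattern, name):
    """Return True when the anchored pattern matches the string."""
    return pattern.search(name) is not None

# Only the all-lowercase input is rejected; mixed case, all caps,
# and trailing whitespace all pass because of \w* and \s*.
for name in ['Test Name', 'TesT Name', 'TEST NAME', 'Test Name ', 'test name']:
    print(repr(name), accepts(ORIGINAL, name))
```

Under these assumptions, 'TesT Name' and 'TEST NAME' pass for the reason given above: \w* consumes uppercase letters as readily as lowercase ones.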

You want to match only lowercase letters after each initial uppercase letter, also allowing digits and the - and _ chars. You may use

^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$

See the regex demo.

Details

^ - start of string anchor
[A-Z][a-z0-9_-]* - an uppercase letter followed by 0+ lowercase letters, digits, _ or - chars
(\s+[A-Z][a-z0-9_-]*)* - zero or more occurrences of:
- \s+ - 1 or more whitespaces
- [A-Z][a-z0-9_-]* - an uppercase letter followed by 0+ lowercase letters, digits, _ or - chars
$ - end of string.
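The corrected pattern can be checked the same way. This is a sketch only, again assuming Python's re module behaves like the engine behind match_regex for this pattern; TITLE_CASE and is_title_case are hypothetical names.

```python
import re

# The corrected pattern from this answer.
TITLE_CASE = re.compile(r'^[A-Z][a-z0-9_-]*(\s+[A-Z][a-z0-9_-]*)*$')

def is_title_case(name):
    """Return True when every whitespace-separated word starts with an
    uppercase letter and continues with lowercase letters, digits, _ or -."""
    return TITLE_CASE.search(name) is not None

print(is_title_case('Test Name'))   # True
print(is_title_case('A'))           # True: a single capital letter is a valid word
print(is_title_case('TesT Name'))   # False: uppercase T inside a word
print(is_title_case('TEST NAME'))   # False
print(is_title_case('Test Name '))  # False: trailing whitespace is not allowed
print(is_title_case('Mary-jane'))   # True
print(is_title_case('Mary-Jane'))   # False: uppercase after the hyphen
```

Note the last case: because the character class after the first letter contains only lowercase letters, a hyphenated name with a capital after the hyphen is rejected. Whether that is desirable depends on the data being validated.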

Tim Biegeleisen · Answer · 2019-02-18 15:47:22Z

I would write your regex as:

^[A-Z]\w*(?:\s+[A-Z]\w*)*$

This matches a single word starting with a capital letter, followed by zero or more repetitions of one or more whitespace characters and another word starting with a capital letter.

I phrase a matching word as starting with [A-Z] followed by \w*, meaning zero or more word characters. This allows for things like A to match.

Edit:

Based on the comments above, if you want some other character class to represent what follows the initial uppercase letter, then do that instead:

^[A-Z][something]*(?:\s+[A-Z][something]*)*$

where [something] is your character class.
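As a rough illustration of this template, the sketch below builds the pattern from a chosen character class and compares two choices. It uses Python's re module as a stand-in for match_regex (an assumption), and title_case_pattern is a hypothetical helper, not part of any SAP API.

```python
import re

def title_case_pattern(tail_class):
    """Fill the template ^[A-Z]X*(?:\\s+[A-Z]X*)*$ where X is the class
    allowed after each initial uppercase letter."""
    template = r'^[A-Z]{0}*(?:\s+[A-Z]{0}*)*$'
    # re.ASCII keeps \w limited to [A-Za-z0-9_].
    return re.compile(template.format(tail_class), re.ASCII)

loose = title_case_pattern(r'\w')             # the pattern as first posted
strict = title_case_pattern(r'[a-z0-9_-]')    # lowercase-only tail

for name in ['Test Name', 'TEST NAME', 'Test Name ']:
    print(repr(name),
          loose.search(name) is not None,
          strict.search(name) is not None)
```

With \w as the class, 'TEST NAME' still matches because \w includes uppercase letters; with [a-z0-9_-] it does not. Neither variant accepts trailing whitespace.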

edited Feb 18, 2019 at 15:47

answered Feb 18, 2019 at 14:35

3 Comments

Z.K.I Over a year ago

Thank you so much for your fast response. It makes a lot more sense. But I still get the same results when I apply the rule to my test variables. Could it possibly just be the software I'm using?

Wiktor Stribiżew Over a year ago

@Z.K.I ^[A-Z]\w*(?:\s+[A-Z]\w*)*$ is basically the same pattern as yours, but it does not allow trailing whitespace.

Tim Biegeleisen Over a year ago

@Z.K.I If you want to allow for trailing whitespace, I can modify my pattern. I assumed that you would not want this.
