regular expression for title case - Python

Question

I need to find a combination of 2 consecutive title case words.

This is my code so far,

import re

text='Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'

rex=r'[A-Z][a-z]+\s+[A-Z][a-z]+'

re.findall(rex,text)

This gives me,

['Moh Shai', 'This Is', 'Python Code', 'Needs Some']

However, I need all the combinations. Something like,

['Moh Shai', 'This Is', 'Python Code', 'Needs Some','Some Expertise']

Can someone please help?

If you can install a third-party module, the easiest way is with the regex module, which supports an overlapped=True flag on findall().
– kindall, Commented Apr 19, 2016 at 23:39
@kindall you are awesome. That works great! Can you please post an answer so I may accept?
– Md. Mohsin, Commented Apr 19, 2016 at 23:41

Right Of Zen · Accepted Answer · 2016-04-19 23:38:13Z

4

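You can wrap the pattern in a lookahead with a capturing group and use the re.finditer function to get the desired outcome. A lookahead consumes no characters, so the search resumes one character later and overlapping pairs are still found: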

import re

text='Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
rex=r'(?=([A-Z][a-z]+\s+[A-Z][a-z]+))'

matches = re.finditer(rex,text)
results = [match.group(1) for match in matches]

Now results will contain the information you need:

>>> results
['Moh Shai', 'This Is', 'Python Code', 'Needs Some', 'Some Expertise']

Edit: you don't strictly need the finditer function. Because the pattern contains exactly one capturing group, re.findall(rex,text) returns the captured strings directly, so you can replace the bottom two lines with your original re.findall(rex,text) call for the same effect.
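
To make the difference concrete, here is a small sketch that runs the question's pattern both ways. The variable names (pair, consuming, overlapping) are only illustrative and are not part of the answer above.

```python
import re

text = ('Hi my name is Moh Shai and This Is a Python Code '
        'with Regex and Needs Some Expertise')

# The two-word pattern from the question.
pair = r'[A-Z][a-z]+\s+[A-Z][a-z]+'

# Plain pattern: each match consumes both words, so after 'Needs Some'
# the word 'Some' is no longer available to start 'Some Expertise'.
consuming = re.findall(pair, text)

# Lookahead pattern: the match itself is zero-width, so the scan moves on
# one character at a time and the capturing group holds each pair.
overlapping = re.findall(r'(?=(' + pair + r'))', text)

print(consuming)
print(overlapping)
```

The second list contains everything in the first plus 'Some Expertise'.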

answered Apr 19, 2016 at 23:38

Right Of Zen


2 Comments

Uri Goren Over a year ago

This answer only identifies title case runs of exactly 2 words; it would fail on "The United States Of America".

Right Of Zen Over a year ago

Yes, as requested in the question.

Uri Goren · Accepted Answer · 2017-08-13 09:03:46Z

I came to this question by its title and was disappointed that the solution wasn't what I expected.

The accepted answer only works for titles of exactly 2 words.

This code returns all of the tokens that are in title case, without assuming anything about the number of words in the title:

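import re, collections

def title_case_to_token(c):
    # Wrap a matched run in <...>, joining its words with underscores; the first
    # character and the last two characters of the match are surrounding context.
    totoken = lambda s: s[0] + "<" + s[1:-2].replace(" ","_") + ">" + s[-2:]
    # Pad the input so runs at the very start or end can match, then strip the padding.
    tokenized = re.sub(r"([\s\.\,;]([A-Z][a-z]+[\s\.\,;])+[^A-Z])", lambda m: totoken(m.group(0))," " + c + " x")[1:-2]
    # Count how often each token occurs.
    tokens = collections.Counter(re.compile(r"<\w+>").findall(tokenized))
    return (tokens, tokenized)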

For example

text='Hi my name is Moh Shai and This Is a Python Code with Regex and Needs Some Expertise'
tokens, tokenized = title_case_to_token(text)

Value of tokens

Counter({'<Hi>': 1, '<Moh_Shai>': 1, '<This_Is>': 1, '<Python_Code>': 1, '<Regex>': 1, '<Needs_Some_Expertise>': 1})

Note that `Needs_Some_Expertise` is also caught by this regex, and it has 3 words

Value of tokenized

<Hi> my name is <Moh_Shai> and <This_Is> a <Python_Code> with <Regex> and <Needs_Some_Expertise>
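
If only the runs themselves are needed, without the tokenized string, a shorter pattern may be enough. This is a sketch rather than a drop-in replacement for the function above: the names RUN and title_case_runs are made up for illustration, and like the original it assumes a title case word is one capital followed by one or more lowercase ASCII letters.

```python
import re

# One title case word, followed by any number of further title case words
# separated by whitespace. The group is non-capturing so that findall
# returns whole matches.
RUN = re.compile(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b')

def title_case_runs(s):
    """Return every maximal run of consecutive title case words in s."""
    return RUN.findall(s)

text = ('Hi my name is Moh Shai and This Is a Python Code '
        'with Regex and Needs Some Expertise')

print(title_case_runs(text))
print(title_case_runs('The United States Of America'))
```

Each run is reported once at its full length, so a five-word title comes back as a single string.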

kindall · Accepted Answer · 2016-04-20 00:46:04Z

1

If you can install a third-party module, the easiest way is with the regex module, which supports an overlapped=True flag on findall().
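
A minimal sketch of that approach, assuming the third-party regex package is installed (pip install regex). The fallback branch is an addition for illustration only: it uses the standard library lookahead form so the snippet still runs without the package.

```python
import re

try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

text = ('Hi my name is Moh Shai and This Is a Python Code '
        'with Regex and Needs Some Expertise')
rex = r'[A-Z][a-z]+\s+[A-Z][a-z]+'

if regex is not None:
    # overlapped=True restarts the search one character after the start
    # of the previous match instead of after its end.
    results = regex.findall(rex, text, overlapped=True)
else:
    # Fallback (assumption, not from the answer): zero-width lookahead with re.
    results = re.findall(r'(?=(' + rex + r'))', text)

print(results)
```

With overlapped=True the original pattern can be used unchanged, with no lookahead wrapper.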

answered Apr 20, 2016 at 0:46

kindall
