2

I'm just starting with Python regular expressions. I've read many topics but can't adapt the solutions to my problem.

I've got a file like this one:

**** FILE.NAME ***
Fisrt sentence
    blablabla
    blablabla
    blablabla
    blablabla

Second sentence
    blablabla
    blablabla
    blablabla
    blablabla

I'm looking for a regular expression to extract the text blocks from my file, one at a time:

Fisrt sentence
    blablabla
    blablabla
    blablabla
    blablabla

Then:

Second sentence
    blablabla
    blablabla
    blablabla
    blablabla

with each sentence line separated from its blablabla block. I tried something like this, but it doesn't work:

^(\w+[^\n]*?)(.*)\n{2}

2 Answers

3

General rule of thumb: Don't use re when str methods suffice.

In this case you can call the .split() method on the blank line (two consecutive newlines) that separates the blocks:

s.split('\n\n')

returns

['Fisrt sentence\n    blablabla\n    blablabla\n    blablabla\n    blablabla',
 'Second sentence\n    blablabla\n    blablabla\n    blablabla\n    blablabla']
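
To also separate each sentence from its indented block, the same idea can be extended with str.partition. This is only a sketch: the function name split_blocks and the sample text are made up for illustration, and it assumes the file starts with a single "****" header line that should be skipped.

```python
def split_blocks(text):
    """Split text into (sentence, body) pairs using only str methods."""
    lines = text.splitlines()
    # Assumption: a leading "**** ... ***" line is a header, not part of a block.
    if lines and lines[0].startswith("****"):
        lines = lines[1:]
    body = "\n".join(lines).strip("\n")

    blocks = []
    for block in body.split("\n\n"):
        block = block.strip("\n")  # tolerate runs of more than one blank line
        if not block:
            continue
        # The first line is the sentence, the rest is the indented block.
        sentence, _, rest = block.partition("\n")
        blocks.append((sentence, rest))
    return blocks


sample = (
    "**** FILE.NAME ***\n"
    "Fisrt sentence\n"
    "    blablabla\n"
    "    blablabla\n"
    "\n"
    "Second sentence\n"
    "    blablabla\n"
    "    blablabla\n"
)

for sentence, rest in split_blocks(sample):
    print(repr(sentence), repr(rest))
```

Note that a plain s.split('\n\n') keeps the header line inside the first element, which is why the sketch drops it first.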

0

You may use

re.findall(r'^(\w.*)\n([\s\S]*?)(?:\n{2,}|\Z)', text, re.M)

The pattern matches:

  • ^ - start of a line (due to re.M, the ^ matches line start positions)
  • (\w.*) - Group 1: a word char followed by any 0+ chars other than line break chars
  • \n - a newline
  • ([\s\S]*?) - Group 2: any 0+ chars, as few as possible
  • (?:\n{2,}|\Z) - either two or more newlines (\n{2,}) or (|) end of string (\Z).
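
A minimal runnable sketch of the pattern above. The helper name extract_blocks and the sample text are made up for illustration. Because \Z in Python matches only at the very end of the string, a final newline in the file ends up in the last Group 2, so the sketch strips it.

```python
import re

# Same pattern as above, compiled once with re.M so ^ matches at each line start.
BLOCK_RE = re.compile(r'^(\w.*)\n([\s\S]*?)(?:\n{2,}|\Z)', re.M)


def extract_blocks(text):
    """Return (sentence, body) pairs found by the pattern."""
    # rstrip removes the trailing newline that the last block keeps
    # when the text ends with "\n".
    return [(sentence, body.rstrip("\n"))
            for sentence, body in BLOCK_RE.findall(text)]


sample = (
    "**** FILE.NAME ***\n"
    "Fisrt sentence\n"
    "    blablabla\n"
    "    blablabla\n"
    "\n"
    "Second sentence\n"
    "    blablabla\n"
    "    blablabla\n"
)

for sentence, body in extract_blocks(sample):
    print(repr(sentence), repr(body))
```

The "**** FILE.NAME ***" line is skipped on its own here, since it does not start with a word character. The pattern in the question most likely fails because, without re.M, ^ only matches at the start of the string, and without re.S, . cannot cross a newline, so (.*) never spans the indented lines.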
