I have a string:

05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Fangede RLI på hans mobil. Ring igen kl. 15 19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Gik på svarer igen og lagt besked til RLI at ringe tilbage. 12-11-2014 09:38 - KH - (KH) Igangværende - Opringning - 13-11 00:00 12-11-2014 09:32 - KH - (KH) Igangværende - Opringning - 15-10 00:00 Forsøgt RLI igen og lagt besked om han vil ringe. 14-10-2014 13:14 - KH - (KH) Igangværende - Opringning - 15-10 00:00 14-10-2014 13:10 - KH - (KH) Igangværende - Opringning - 14-10 00:00 Lagt besked til RLI at ringe 14-10-2014 13:06 - KH - (KH) Igangværende - Opringning - 14-10 00:00 test

I need to split this string into pieces so that each piece starts with a date. For this purpose, I tried using a regex like:

import re
match = re.search(r'\d{2}-\d{2}-\d{4}', text)
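For reference, re.search stops at the first match, and re.findall with the same pattern returns every date but only the date text itself, not the entry that follows it. A quick check on a shortened copy of the string (only the first two entries are kept):

```python
import re

# Shortened sample of the input string (first two entries only)
text = ("05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 "
        "Fangede RLI på hans mobil. Ring igen kl. 15 "
        "19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 "
        "Gik på svarer igen og lagt besked til RLI at ringe tilbage.")

date_pattern = r'\d{2}-\d{2}-\d{4}'

# re.search returns one Match object, for the first date only
match = re.search(date_pattern, text)
print(match.group())  # 05-01-2015

# re.findall returns every date, but just the matched date strings
print(re.findall(date_pattern, text))  # ['05-01-2015', '19-11-2014']
```

Note that "13-11 00:00" is not matched, because the pattern requires a four-digit year.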

But this code only finds the dates, and I can't get any further. I need pieces such as:

first_piece: 05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Fangede RLI på hans mobil. Ring igen kl. 15

second_piece: 19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Gik på svarer igen og lagt besked til RLI at ringe tilbage.

and so on.

Could you please give me some insight into how to extract these substrings?

Thanks in advance.

  • parsedatetime.Calendar().nlp(text) fails in this case. Commented Jun 23, 2015 at 23:02

3 Answers

Does this work?

import re
pieces = re.split(r' (?=\d{2}-\d{2}-\d{4})', text)
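A minimal runnable sketch of this split, using the first three entries of the question's string as sample data. The pattern consumes the single space before each date, while the lookahead (?=...) is zero-width, so the date itself stays at the start of the next piece:

```python
import re

text = ("05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 "
        "Fangede RLI på hans mobil. Ring igen kl. 15 "
        "19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 "
        "Gik på svarer igen og lagt besked til RLI at ringe tilbage. "
        "12-11-2014 09:38 - KH - (KH) Igangværende - Opringning - 13-11 00:00")

# Split on a space only when it is immediately followed by a dd-mm-yyyy date;
# the lookahead checks for the date without consuming it.
pieces = re.split(r' (?=\d{2}-\d{2}-\d{4})', text)

for piece in pieces:
    print(piece)
```

Entries with no free-text note (such as the third one here) still come out as their own piece, since the split only looks at where the next date begins.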

3 Comments

Thanks Marcus. It's great.
When I read the text from an Excel cell with xlrd or a similar library, I can't split it: it comes back as one whole piece. What could cause that? I even tried encoding/decoding it. In other words, in the case above match[0] gives the whole text, and match[1] and beyond don't exist. When I put the text directly into the code, there is no problem.
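The thread leaves this open. One possible cause, offered only as an assumption, is that the cell separates entries with a line break or a non-breaking space (U+00A0) instead of an ordinary space, so the literal space at the start of the pattern never matches. The sketch below uses a made-up cell value (cell_text is hypothetical, not read from a real workbook) and replaces the literal space with \s+:

```python
import re

# Hypothetical cell value: entries separated by a newline and by a
# non-breaking space (U+00A0) instead of a plain space.
cell_text = ("05-01-2015 12:27 - KH - Fangede RLI på hans mobil.\n"
             "19-11-2014 11:17 - KH - Gik på svarer igen.\u00a0"
             "12-11-2014 09:38 - KH - test")

# The original pattern needs a literal space before each date, so nothing splits
print(len(re.split(r' (?=\d{2}-\d{2}-\d{4})', cell_text)))  # 1

# In a Python 3 str pattern, \s matches Unicode whitespace,
# including '\n' and '\u00a0'
pieces = re.split(r'\s+(?=\d{2}-\d{2}-\d{4})', cell_text)
for piece in pieces:
    print(piece)
```

Printing repr() of the real cell value would show which separator is actually present before choosing a fix.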
Marcus has the right answer, but there's a small detail missing from it.

Test file multiple_dates.py

import re

test_string = u"05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Fangede RLI på hans mobil. Ring igen kl. 15 19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Gik på svarer igen og lagt besked til RLI at ringe tilbage. 12-11-2014 09:38 - KH - (KH) Igangværende - Opringning - 13-11 00:00 12-11-2014 09:32 - KH - (KH) Igangværende - Opringning - 15-10 00:00 Forsøgt RLI igen og lagt besked om han vil ringe. 14-10-2014 13:14 - KH - (KH) Igangværende - Opringning - 15-10 00:00 14-10-2014 13:10 - KH - (KH) Igangværende - Opringning - 14-10 00:00 Lagt besked til RLI at ringe 14-10-2014 13:06 - KH - (KH) Igangværende - Opringning - 14-10 00:00 test"

groups = re.split(r' (?=\d{2}-\d{2}-\d{4})', test_string)

for group in groups:
    print(group)

If I run this example with Python 2.7, I get:

 python multipe_dates.py 
  File "multipe_dates.py", line 3
SyntaxError: Non-ASCII character '\xc3' in file multipe_dates.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

With Python 3 it works out of the box, since Python 3 reads source files as UTF-8 by default:

python3 multipe_dates.py 
05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Fangede RLI på hans mobil. Ring igen kl. 15
19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 Gik på svarer igen og lagt besked til RLI at ringe tilbage.
12-11-2014 09:38 - KH - (KH) Igangværende - Opringning - 13-11 00:00
12-11-2014 09:32 - KH - (KH) Igangværende - Opringning - 15-10 00:00 Forsøgt RLI igen og lagt besked om han vil ringe.
14-10-2014 13:14 - KH - (KH) Igangværende - Opringning - 15-10 00:00
14-10-2014 13:10 - KH - (KH) Igangværende - Opringning - 14-10 00:00 Lagt besked til RLI at ringe
14-10-2014 13:06 - KH - (KH) Igangværende - Opringning - 14-10 00:00 test

If you add

# -*- coding: utf-8 -*- 

to the top of your .py file, it will work in Python 2 as well.

1 Comment

Yes. When IDLE suggested adding that, I edited my code. Thanks.

You could use this pattern:

(\d\d-\d\d-\d{4}.*?)(?=\d\d-\d\d-\d{4}|$)
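A sketch of how this pattern might be used from Python with re.findall (the answer itself only gives the pattern). Because the pattern has one capturing group, re.findall returns that group's text for each match. The lazy .*? stops just before the next date, so each piece keeps the trailing space; strip() removes it:

```python
import re

text = ("05-01-2015 12:27 - KH - (KH) Igangværende - Opringning - 13-11 00:00 "
        "Fangede RLI på hans mobil. Ring igen kl. 15 "
        "19-11-2014 11:17 - KH - (KH) Igangværende - Opringning - 13-11 00:00 "
        "Gik på svarer igen og lagt besked til RLI at ringe tilbage.")

# Group 1: a date plus as little text as possible, up to the next date or the end
pattern = r'(\d\d-\d\d-\d{4}.*?)(?=\d\d-\d\d-\d{4}|$)'

# findall returns the group-1 text of each match; strip the trailing space
pieces = [piece.strip() for piece in re.findall(pattern, text)]

for piece in pieces:
    print(piece)
```

Unlike the re.split approach, this drops any text that appears before the first date, since every match must start with one.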
