In my Python script, I have a list of strings like this:

birth_year = ["my birth year is *","i born in *","i was born in *"]

I want to match an input sentence against the patterns in this list and get the birth year as output.

The input sentence is like:

Example1: My birth year is 1994.
Example2: I born in 1995

The output will be:

Example1: 1994
Example2: 1995

I have tried several regex-based approaches, but none of them handled all of these cases.

  • If it's just a matter of extracting numerals, you can use re.findall(r'(\d+)', val)[0]. Commented Mar 13, 2020 at 5:21

3 Answers

If you change birth_year to a list of regexes you could match more easily with your input string. Use a capturing group for the year.

Here's a function that does what you want:

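import re

def match_year(birth_year, text):
    # Try each pattern in turn and stop at the first one that matches.
    for s in birth_year:
        m = re.search(s, text, re.IGNORECASE)
        if m:
            # Keep whatever precedes the match (e.g. "Example1: ") and append the captured year.
            output = f'{text[:m.start(0)]}{m[1]}'
            print(output)
            break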

Example:

birth_year = [r"my birth year is (\d{4})", r"i born in (\d{4})", r"i was born in (\d{4})"]

match_year(birth_year, "Example1: My birth year is 1994.")
match_year(birth_year, "Example2: I born in 1995")

Output:

Example1: 1994
Example2: 1995

You need at least Python 3.6 for f-strings.
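
If you would rather keep the original wildcard list from the question, the patterns can be derived from it instead of being rewritten by hand. The following is only a sketch: template_to_regex and extract_year are hypothetical helper names, and it assumes every "*" stands for a four-digit year.

```python
import re

# Original wildcard templates from the question.
birth_year = ["my birth year is *", "i born in *", "i was born in *"]

def template_to_regex(template):
    """Hypothetical helper: escape the literal text, then turn '*' into a 4-digit capture group."""
    # re.escape() turns '*' into '\*', which is then swapped for the capture group.
    return re.escape(template).replace(r"\*", r"(\d{4})")

def extract_year(templates, sentence):
    """Return the first captured year as a string, or None if no template matches."""
    for template in templates:
        m = re.search(template_to_regex(template), sentence, re.IGNORECASE)
        if m:
            return m.group(1)
    return None

print(extract_year(birth_year, "My birth year is 1994."))  # 1994
print(extract_year(birth_year, "I born in 1995"))          # 1995
```

Returning the year rather than printing it lets the caller decide what to do when nothing matches.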

1 Comment

@Hiral Can you give me an example? A year is not a float value. Maybe you should ask a new question?

str1 = "My birth year is 1994."
str2 = str1.replace('My birth year is ', '')

You can try something like this: replace the part of the sentence you don't need with an empty string, so that only the year (plus any trailing punctuation) remains.

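For the code you shared, you can do something like this:

examples = ["My birth year is 1994.", "I born in 1995"]  # sample inputs from the question
for x in examples:
   for y in birth_year:
      prefix = y.rstrip('*')  # drop the '*' placeholder from the template
      start = x.lower().find(prefix)  # find() returns -1 when the substring is absent
      if start != -1:
         print(x[start + len(prefix):].rstrip('.'))  # str is immutable, so slice out a new string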

I think the above code might work


If you can guarantee that those strings always contain exactly one 4-digit number, and that it is the year of birth, I'd say just use a regex to grab the 4 digits surrounded by non-digits. Rather dumb, but hey, it works with your data.

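import re

# The last example is Russian for "I was born in 1976"; the pattern does not depend on the language.
examples = ["My birth year is 1993.", "I born in 1995", "я родился в 1976м году"]
for s in examples:  # "s" rather than "str", to avoid shadowing the built-in type
    # \D* on both sides means the 4 digits must be the only digits in the string.
    y = int(re.findall(r"^\D*(\d{4})\D*$", s)[0])
    print(y)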
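
The loop above raises an IndexError as soon as a sentence has no match. A slightly more defensive variant could look like the sketch below. Note that it swaps in a different technique: re.search with lookarounds instead of anchoring the whole string, so it returns the first standalone 4-digit number even when the sentence contains other numbers. find_year and YEAR_RE are hypothetical names.

```python
import re

# A run of exactly four digits that is not part of a longer digit sequence.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")

def find_year(sentence):
    """Hypothetical helper: return the first standalone 4-digit number as an int, or None."""
    m = YEAR_RE.search(sentence)
    return int(m.group()) if m else None

print(find_year("My birth year is 1993."))  # 1993
print(find_year("I born in 1995"))          # 1995
print(find_year("no year here"))            # None
print(find_year("id 123456"))               # None, since the digits form a longer number
```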
