First of all, your code won't work, for a couple of reasons:
- The module name is all lowercase, so the import only works when written like this:
import wikipedia
- The summary method accepts a single string (in your case, a name), so you would have to call it once for every name in your set
All of this aside, let's try to achieve what you're trying to do:
import wikipedia as wp
import re
# First thing we see (at least for pages provided) is that dates all share the same format:
# For those who are no longer with us: 31 October 1950 – 31 March 2016
# For those who are still alive: 17 November 1944
# So we have to build regex patterns to find those
# First is the months pattern, since it's quite a big one
MONTHS_PATTERN = r"January|February|March|April|May|June|July|August|September|October|November|December"
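# Next we build our date pattern; curly braces are doubled so the f-string keeps them as literal regex quantifiers
# The year is exactly four digits (\d{,4} would also accept a missing year)
DATE_PATTERN = re.compile(fr"\d{{1,2}}\s({MONTHS_PATTERN})\s\d{{4}}")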
# Declare our set of names, great choice of architects BTW :)
names = ('Zaha Hadid', 'Rem Koolhaas')
# Since we're trying to get birthdays and dates of death, we will create a dictionary for storing values
lifespans = {}
# Iterate over them in a loop
for name in names:
    lifespan = {'birthday': None, 'deathday': None}
    try:
        summary = wp.summary(name)
        # First we find the first date in summary, since it's most likely to be the birthday
        first_date = DATE_PATTERN.search(summary)
        if first_date:
            # If we've found a date, assume it's the birthday
            bday = first_date.group()
            lifespan['birthday'] = bday
            # Let's check whether the person is no longer with us
            LIFESPAN_PATTERN = re.compile(fr"{bday}\s–\s{DATE_PATTERN.pattern}")
            lifespan_found = LIFESPAN_PATTERN.search(summary)
            if lifespan_found:
                lifespan['deathday'] = lifespan_found.group().replace(f"{bday} – ", '')
            lifespans[name] = lifespan
        else:
            print(f'No dates were found for {name}')
    except wp.exceptions.PageError:
        # Handle not found page, so that code won't break
        print(f'{name} was not found on Wikipedia')
# Print result
print(lifespans)
Output for the provided names:
{'Zaha Hadid': {'birthday': '31 October 1950', 'deathday': '31 March 2016'}, 'Rem Koolhaas': {'birthday': '17 November 1944', 'deathday': None}}
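If you want to try the extraction without hitting Wikipedia, here is a minimal sketch of the same idea as a standalone function. The function name extract_lifespan, the named groups and the two sample strings are my own inventions for illustration; the samples only imitate the date formats described above and are not real summaries.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Day, full month name, four-digit year, e.g. "31 October 1950".
DATE = rf"\d{{1,2}}\s(?:{MONTHS})\s\d{{4}}"
# A first date, optionally followed by an en dash and a second date.
LIFESPAN = re.compile(rf"(?P<birth>{DATE})(?:\s–\s(?P<death>{DATE}))?")


def extract_lifespan(summary):
    """Return the first date in summary and, if present, the date after an en dash."""
    match = LIFESPAN.search(summary)
    if match is None:
        return {"birthday": None, "deathday": None}
    # group("death") is None when the optional part did not match.
    return {"birthday": match.group("birth"), "deathday": match.group("death")}


# Hypothetical sample strings, shaped like the formats noted above.
print(extract_lifespan("Zaha Hadid (31 October 1950 – 31 March 2016) was an architect."))
print(extract_lifespan("Rem Koolhaas (born 17 November 1944) is an architect."))
```

Using one pattern with an optional second group avoids compiling a new regex per person, but it still just assumes that the first date in the text is the birthday.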
This approach is inefficient and has many flaws. For example, a page may contain dates that fit our regular expression yet are neither a birthday nor a date of death. It's also quite ugly (even though I've tried my best :) ), and you'd be better off parsing tags.
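To give a rough idea of what parsing tags could look like, here is a sketch that uses only the standard library. It assumes the page HTML marks the birth date with a span whose class is bday; I believe Wikipedia infoboxes do this, but I have not verified it for these pages, so treat it as an assumption. The BdayParser class and the sample_html fragment are made up for illustration, and you would still need to fetch the real page HTML yourself.

```python
from html.parser import HTMLParser


class BdayParser(HTMLParser):
    """Collect the text of every <span class="bday"> element."""

    def __init__(self):
        super().__init__()
        self._in_bday = False
        self.birthdays = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or hold several class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "bday" in classes:
            self._in_bday = True

    def handle_data(self, data):
        if self._in_bday:
            self.birthdays.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_bday = False


# Hypothetical fragment imitating an infobox cell; real markup may differ.
sample_html = '<td>(<span class="bday">1950-10-31</span>) 31 October 1950</td>'

parser = BdayParser()
parser.feed(sample_html)
print(parser.birthdays)
```

The upside is that you read a value the page explicitly labels as a birthday instead of guessing from the first date that happens to appear in the text.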
If you're not happy with the date format from Wikipedia, I suggest you look into datetime. Also, keep in mind that these regular expressions were written to fit those two specific pages; I did not research how else dates might be represented on Wikipedia. So, if you run into inconsistencies, I suggest you stick with parsing tags.
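As a small example of the datetime suggestion, this sketch turns a date string in the format found above into a date object. The helper name parse_wiki_date is hypothetical, and note that %B (full month name) depends on the locale, so this assumes English month names.

```python
from datetime import datetime


def parse_wiki_date(text):
    """Convert a string such as '31 October 1950' into a date object."""
    # %d = day of month, %B = full month name, %Y = four-digit year.
    # %B is locale-dependent; an English locale is assumed here.
    return datetime.strptime(text, "%d %B %Y").date()


birthday = parse_wiki_date("31 October 1950")
print(birthday.isoformat())
```

Once you have date objects you can reformat them with strftime or subtract them to work with ages, instead of carrying raw strings around.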