0
website_list = [
    'https://www.zillow.com/62347390?location=Chicago%2N%23253',
    'https://www.zillow.com/82983250?location=Boston%3B%53324',
    'https://www.zillow.com/12917837?location=Miami%7K%26345',
]

How does one create a python function (e.g. city_finder()) such that we get the following output when given website_list as input?

>>> city_finder(website_list)
['Chicago', 'Boston', 'Miami']
1
    You could use a simple regular expression like location=([^%]+) and grab the first group; see regex101.com/r/aSJxn7/1. (Commented Feb 18, 2018 at 6:48)

4 Answers

3

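The previous answers assume that the format of the URLs will not change. A regular expression or str.find() call written against one fixed layout can return the wrong substring if, for example, a % appears earlier in the URL than the location parameter.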

To handle changes in the URL format, use the urllib.parse module from the Python standard library.

Namely, use the urlparse() function, which parses a URL into its components. The component you want is the query component, which urlparse() exposes as a string (for example, 'location=Chicago%2N%23253'). parse_qs() can turn that string into a dictionary, but it also percent-decodes the values: 'Boston%3B%53324' becomes 'Boston;S324', which leaves no % to split on. Instead, split the raw query string into key=value pairs, pick the location pair, and extract the substring before the first %.

Here's a code snippet:

from urllib.parse import urlparse

def city_finder(links):
    cities = []
    for url in links:
        # .query is the raw, still percent-encoded query string
        for pair in urlparse(url).query.split('&'):
            key, _, value = pair.partition('=')
            if key == 'location':
                cities.append(value.split('%')[0])
    return cities
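
To see what urlparse() and parse_qs() return for one of the URLs in the question, here is a small check. It is only an illustration: it shows that the query attribute is a raw string, while parse_qs() percent-decodes the values, so the decoded value can differ from the text in the URL.

```python
from urllib.parse import urlparse, parse_qs

url = 'https://www.zillow.com/82983250?location=Boston%3B%53324'

parts = urlparse(url)
raw_query = parts.query        # the query component as a plain string
decoded = parse_qs(raw_query)  # dict of lists, with %XX escapes decoded

print(raw_query)  # location=Boston%3B%53324
print(decoded)    # {'location': ['Boston;S324']}
```

Because %3B decodes to ';' and %53 to 'S', splitting the decoded value on % would not isolate the city name for this URL.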

0

You can use str.find() to find the index of "location=" and of the first "%" in the URL, which in these URLs is the one following the name of the city. Use a list comprehension to loop through the URL list:

def city_finder(website_list):
    # 9 == len("location="), so the slice starts right after the "="
    return [site[site.find("location=")+9:site.find("%")] for site in website_list]
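
If a % could appear before the location parameter, one possible variant passes a start index to str.find() so that only a % after "location=" is considered. This is a sketch, not part of the original answer: the helper name city_after_key and the second URL (with %20 in its path) are made up for illustration.

```python
def city_after_key(url, key="location="):
    """Return the text between `key` and the next '%', or None if `key` is absent."""
    start = url.find(key)
    if start == -1:
        return None
    start += len(key)
    end = url.find("%", start)  # search only from `start` onward
    return url[start:] if end == -1 else url[start:end]

print(city_after_key('https://www.zillow.com/62347390?location=Chicago%2N%23253'))
# Hypothetical URL with an earlier '%' in the path:
print(city_after_key('https://www.zillow.com/a%20b?location=Boston%3B%53324'))
```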


0

Use the re module to find the text following location= in each item of website_list. Append each retrieved location to a list of cities and return it.

import re
website_list = [
    'https://www.zillow.com/62347390?location=Chicago%2N%23253',
    'https://www.zillow.com/82983250?location=Boston%3B%53324',
    'https://www.zillow.com/12917837?location=Miami%7K%26345',
]
regexp = re.compile(r"location=(.*)%")

def city_finder(website_list):
    city = []  # created inside the function so repeated calls do not accumulate
    for url in website_list:
        # The greedy (.*) runs to the last "%", so split again at the first one
        city.append(regexp.search(url).group(1).split('%')[0])
    return city

print(city_finder(website_list))

Outputs:  

['Chicago', 'Boston', 'Miami']
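
The extra split('%') is needed because (.*) is greedy. As a small illustration (not part of the original answer), compare the greedy pattern with a non-greedy (.*?) on one of the URLs:

```python
import re

url = 'https://www.zillow.com/62347390?location=Chicago%2N%23253'

# Greedy: captures up to the LAST '%' in the string
greedy = re.search(r'location=(.*)%', url).group(1)

# Non-greedy: stops at the FIRST '%' after 'location='
lazy = re.search(r'location=(.*?)%', url).group(1)

print(greedy)  # Chicago%2N
print(lazy)    # Chicago
```

With the non-greedy form, the capture group already holds the city and no further split is required.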


0

As per my comment, you could use

import re

website_list = [
    'https://www.zillow.com/62347390?location=Chicago%2N%23253',
    'https://www.zillow.com/82983250?location=Boston%3B%53324',
    'https://www.zillow.com/12917837?location=Miami%7K%26345',
]

def city_finder(lst):
    rx = re.compile(r'location=([^%]+)')
    # Bind each search result to `city` and keep only the URLs that matched
    return [city.group(1)
            for item in lst
            for city in [rx.search(item)]
            if city]

print(city_finder(website_list))

Which yields

['Chicago', 'Boston', 'Miami']
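
The trailing "if city" matters when a URL has no location parameter: rx.search() then returns None, and the entry is skipped rather than raising an AttributeError. A minimal sketch of that behaviour, where the second URL is a made-up example without the parameter:

```python
import re

rx = re.compile(r'location=([^%]+)')

urls = [
    'https://www.zillow.com/62347390?location=Chicago%2N%23253',
    'https://www.zillow.com/62347390',  # hypothetical: no location parameter
]

matches = [rx.search(u) for u in urls]      # second entry is None
cities = [m.group(1) for m in matches if m]  # None entries are filtered out

print(cities)  # ['Chicago']
```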
