
I am trying to web-scrape a house listing from a Remax page and save that information to a pandas DataFrame, but for some reason it keeps giving me a KeyError. Here is my code:

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.remax.ca/ab/calgary-real-estate/720-37-st-nw-wp_id251536557-lst'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
detail_title = soup.find_all(class_='detail-title')
details_t = pd.DataFrame(detail_title)

Here is the error I am getting:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-3be49b8e4cfc> in <module>
      6 soup = BeautifulSoup(response.text, 'html.parser')
      7 detail_title = soup.find_all(class_='detail-title')
----> 8 details_t = pd.DataFrame(detail_title)

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    449                 else:
    450                     mgr = init_ndarray(data, index, columns, dtype=dtype,
--> 451                                        copy=copy)
    452             else:
    453                 mgr = init_dict({}, index, columns, dtype=dtype)

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    144     # by definition an array here
    145     # the dtypes will be coerced to a single dtype
--> 146     values = prep_ndarray(values, copy=copy)
    147 
    148     if dtype is not None:

~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in prep_ndarray(values, copy)
    228         try:
    229             if is_list_like(values[0]) or hasattr(values[0], 'len'):
--> 230                 values = np.array([convert(v) for v in values])
    231             elif isinstance(values[0], np.ndarray) and values[0].ndim == 0:
    232                 # GH#21861

~/anaconda3/lib/python3.7/site-packages/bs4/element.py in __getitem__(self, key)
   1014         """tag[key] returns the value of the 'key' attribute for the tag,
   1015         and throws an exception if it's not there."""
-> 1016         return self.attrs[key]
   1017 
   1018     def __iter__(self):

KeyError: 0

Any help would be greatly appreciated!

2 Answers


You can try this. I assume that you want only the text within the <span> tags, but feel free to adapt my worked example.

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.remax.ca/ab/calgary-real-estate/720-37-st-nw-wp_id251536557-lst'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
detail_title = soup.find_all(class_='detail-title')

ls = []

for tag in detail_title:
    ls.append(tag.text)

df = pd.DataFrame(data=ls)

print(df)

Output

                           0
0            Property Type:
1             Property Tax:
2             Last Updated:
3        Property Sub Type:
4                  MLS® #:
5           Ownership-Type:
6               Year Built:
7                     sqft:
8              Date Listed:
9                 Lot Size:
10               Occupancy:
11             Subdivision:
12                 Heating:
13          Heating Source:
14          Full Bathrooms:
15          Half Bathrooms:
16                   Rooms:
17                Basement:
18    Basement Development:
19                Flooring:
20          Parking Spaces:
21                 Parking:
22                    Area:
23                Exterior:
24              Foundation:
25                    Roof:
26                   Faces:
27  Miscellaneous Features:
28         Lot Description:
29                   Condo:
30                Board ID:
31                   Suite:
32                Features:

Edit: print(type(detail_title)) gives <class 'bs4.element.ResultSet'>, which is not an accepted data type. From https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html:

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
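To see why the original call blows up: pandas tries to index each element, and indexing a Tag is an attribute lookup, so tag[0] raises KeyError: 0. A minimal sketch (inline HTML standing in for the remax page) showing both the failure and the fix:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline HTML standing in for the remax page (hypothetical markup).
html = ('<span class="detail-title">Property Type:</span>'
        '<span class="detail-title">Rooms:</span>')
soup = BeautifulSoup(html, 'html.parser')
detail_title = soup.find_all(class_='detail-title')

# Indexing a Tag does an *attribute* lookup, which is what pandas trips over:
try:
    detail_title[0][0]      # tag[0] -> self.attrs[0] -> KeyError: 0
except KeyError as e:
    print('KeyError:', e)

# Converting to a plain list of strings first works fine:
df = pd.DataFrame([t.text for t in detail_title])
print(df)                   # one column of detail titles
```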


detail_title does not contain something you can put in a DataFrame: it is a ResultSet of bs4.element.Tag objects (see what type(detail_title[0]) gives you). Try the following:

Step 1. Extract the column headings

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.remax.ca/ab/calgary-real-estate/720-37-st-nw-wp_id251536557-lst'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
detail_title = soup.find_all(class_='detail-title')

headings = [d.text for d in detail_title]
details_t = pd.DataFrame(columns = headings)

Step 2. Go up one level in the html and get the pairs of detail names and values. (The detail names are what you have already extracted in step 1). Write a helper function to return the value given a name.

details = soup.find_all(class_='detail-row ng-star-inserted')

def get_detail_value(detail_title, details):
    # Return the detail-value text of every row whose detail-title matches.
    return [d.find(class_='detail-value').text
            for d in details
            if d.find(class_='detail-title').text == detail_title]
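For example, with some stand-in markup using the same class names as the remax page (the real page's structure may differ), the helper pairs each title with its value:

```python
from bs4 import BeautifulSoup

# Stand-in HTML using the class names assumed above (hypothetical values).
html = '''
<div class="detail-row ng-star-inserted">
  <span class="detail-title">Year Built:</span>
  <span class="detail-value">1978</span>
</div>
<div class="detail-row ng-star-inserted">
  <span class="detail-title">Rooms:</span>
  <span class="detail-value">7</span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
details = soup.find_all(class_='detail-row ng-star-inserted')

def get_detail_value(detail_title, details):
    # Return the detail-value text of every row whose detail-title matches.
    return [d.find(class_='detail-value').text
            for d in details
            if d.find(class_='detail-title').text == detail_title]

print(get_detail_value('Year Built:', details))
```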

This is a bit awkward if you are only scraping one page. What you will probably want to do is run step 1 once to get the detail names, then run step 2 on all pages you want to scrape.

Step 3. For each page you scrape, append the found values of the details to the dataframe.

details_t = details_t.append({deet:get_detail_value(deet, details) for deet in details_t.columns}, ignore_index = True)
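Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on a current pandas the same step can be written with pd.concat. A sketch with made-up column names and values:

```python
import pandas as pd

# Empty frame whose columns play the role of the scraped headings.
details_t = pd.DataFrame(columns=['Year Built:', 'Rooms:'])

# One scraped row as a dict; in the real code this would be
# {deet: get_detail_value(deet, details) for deet in details_t.columns}
row = {'Year Built:': ['1978'], 'Rooms:': ['7']}

# pd.concat replaces the removed DataFrame.append.
details_t = pd.concat([details_t, pd.DataFrame([row])], ignore_index=True)
print(details_t)
```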

3 Comments

Thanks, this was really helpful!
@SushantDeshpande a gentle tip for you as a new user: stack overflow etiquette is to upvote all answers you found helpful and to put the green tick next to the one which most closely answered your question.
@butterflyknife Sushant Deshpande hasn't unlocked the voting option yet. I will help you. +1 from me.
