
I'm trying to write a script that scrapes a website with Python and BeautifulSoup and then writes the data into an Excel sheet.

It works up until the writing section, where I get a NotImplementedError. I looked it up and wrapped the write section of the code in a try/except block with pass. That silenced the error in the Python interpreter console window, but my Excel sheet was blank.

Here is what I have so far:

import requests, openpyxl
from bs4 import BeautifulSoup

wb = openpyxl.Workbook('RDWM_CRM.xls')
wb.create_sheet('Phone')
sheet = wb.get_sheet_by_name('Phone')

# nav to webpage I want to scrape
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=New%20York%2C%20NY&page=2"
r = requests.get(url)
soup = BeautifulSoup(r.content)

# for loop finds info then prints
for div in soup.find_all("div", {"class": "info"}):
    print (div.contents[0].text)
    print (div.contents[1].text)            

# for loop finds info then writes to excel cells
for div in soup.find_all("div", {"class": "info"}):
    sheet['A1'] = div.contents[0].text
    sheet['B1'] = div.contents[1].text

wb.save('RDWM_CRM.xls')

As I said above, even with the error suppressed I was getting a blank Excel sheet. Here is the traceback as it appears in the console:

Neptune Construction
Serving the New York Area.(866) 664-1759
>>> # for loop finds info then writes to excel cells
... for div in soup.find_all("div", {"class": "info"}):
...     sheet['A1'] = div.contents[0].text
...     sheet['B1'] = div.contents[1].text
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method
raise NotImplementedError
NotImplementedError
>>> wb.save('RDWM_CRM.xls')

This is the last piece of printed data, followed by the error.

Thanks for the help! I'm still getting a blank Excel sheet. Here is the code I'm using now. There are no errors, and it creates the new sheet named Phone, but that sheet is empty.

import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=Seattle%2C%20WA&page=4" # nav to webpage I want to scrape
r = requests.get(url)
soup = BeautifulSoup(r.content)

# create a dummy list of texts to write to excel file
divs = []

wb = Workbook() # open new workbook, use load_workbook if existing
ws = wb.create_sheet('Phone')
for div in divs:
    row = [div.contents[0].text, div.contents[1].text]  # construct a row: shown only for example purposes
    ws.append(row)          # could use ws.append(div) since each div is a list 

wb.save('RDWM_CRM.xlsx')     # save workbook, will overwrite if exists

Any help is appreciated!!

  • Please include the traceback, did the error happen in the wb.save? Commented Dec 22, 2015 at 23:56
  • No, in the second for loop, the one that should print. Commented Dec 23, 2015 at 0:04
  • Traceback (most recent call last): File "<stdin>", line 3, in <module> File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method raise NotImplementedError NotImplementedError >>> wb.save('RDWM_CRM.xls') Commented Dec 23, 2015 at 0:06
  • @user3429394 please edit your question and put the full text of the traceback there. Commented Dec 23, 2015 at 0:09
  • Nowhere in the second script do you call soup.find_all(), so divs stays an empty list and nothing is appended. Commented Dec 23, 2015 at 14:38

1 Answer

Apologies in advance if I don't completely understand your question, but there appear to be some issues with how openpyxl is being used.

Here is an example of how to write a worksheet with openpyxl that may be helpful:

from openpyxl import Workbook

# create a dummy list of texts to write to excel file
divs = [[chr(i)*8, chr(i+1)*8] for i in range(65, 75, 1)]

wb = Workbook()             # open new workbook, use load_workbook if existing
ws = wb.create_sheet(title="Example")
for div in divs:
    row = [div[0], div[1]]  # construct a row: shown only for example purposes
    ws.append(row)          # could use ws.append(div) since each div is a list 
wb.save('example.xlsx')     # save workbook, will overwrite if exists

The dummy list divs looks like this:

[['AAAAAAAA', 'BBBBBBBB'],
 ['BBBBBBBB', 'CCCCCCCC'],
 ['CCCCCCCC', 'DDDDDDDD'],
 ['DDDDDDDD', 'EEEEEEEE'],
 ['EEEEEEEE', 'FFFFFFFF'],
 ['FFFFFFFF', 'GGGGGGGG'],
 ['GGGGGGGG', 'HHHHHHHH'],
 ['HHHHHHHH', 'IIIIIIII'],
 ['IIIIIIII', 'JJJJJJJJ'],
 ['JJJJJJJJ', 'KKKKKKKK']]

And the Excel file 'example.xlsx' contains this worksheet, 'Example':

   A        B
1  AAAAAAAA BBBBBBBB
2  BBBBBBBB CCCCCCCC
3  CCCCCCCC DDDDDDDD
4  DDDDDDDD EEEEEEEE
5  EEEEEEEE FFFFFFFF
6  FFFFFFFF GGGGGGGG
7  GGGGGGGG HHHHHHHH
8  HHHHHHHH IIIIIIII
9  IIIIIIII JJJJJJJJ
10 JJJJJJJJ KKKKKKKK
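
If the sheet looks blank when you open it, one way to check what actually landed in the file is to read it back with load_workbook. This is only a minimal sketch using the same dummy data as above; the file name check_example.xlsx is an assumption for illustration. Keep in mind that Workbook() also creates a default empty sheet, so the data lives in the sheet returned by create_sheet, which is not necessarily the first tab Excel shows.

```python
from openpyxl import Workbook, load_workbook

# Same dummy data as in the example above
divs = [[chr(i) * 8, chr(i + 1) * 8] for i in range(65, 75)]

wb = Workbook()
ws = wb.create_sheet(title="Example")
for div in divs:
    ws.append(div)                  # one row per inner list
wb.save("check_example.xlsx")       # hypothetical file name

# Read the file back and collect the cell values row by row
wb_check = load_workbook("check_example.xlsx")
ws_check = wb_check["Example"]
rows_read = [[cell.value for cell in row] for row in ws_check.iter_rows()]

print(wb_check.sheetnames)          # includes the default sheet as well
print(len(rows_read))               # number of rows found in 'Example'
print(rows_read[0])                 # first row that was written
```

If rows_read comes back empty, the loop that calls ws.append never ran, which usually means the list being iterated was empty.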

You would construct a row something like this:

row = [div.contents[0].text, div.contents[1].text]

assuming that div.contents holds the elements you expect. Hope this helps. P.S. I am using openpyxl version 2.3.0.
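
Putting the two pieces together, the scrape-and-write loop might look roughly like the sketch below. It parses a small inline HTML snippet instead of the live page so that it runs without network access. The markup (an h2 followed by a p inside each div with class "info"), the second listing, the helper name extract_rows, and the output file name are all assumptions for illustration, not the real page structure. The traceback in the question points into write_only.py, which suggests the string passed to Workbook() was treated as a truthy mode flag and produced a write-only workbook. Calling Workbook() with no arguments and giving the file name only to save() should avoid that. Appending one row per div also avoids overwriting A1 and B1 on every iteration.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Hypothetical stand-in for r.content; the real page layout may differ
html = """
<div class="info"><h2>Neptune Construction</h2><p>(866) 664-1759</p></div>
<div class="info"><h2>Placeholder Roofing</h2><p>(555) 010-0000</p></div>
"""

def extract_rows(markup):
    """Return one [name, phone] list per div.info found in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for div in soup.find_all("div", {"class": "info"}):
        name = div.find("h2")       # assumed location of the business name
        phone = div.find("p")       # assumed location of the phone number
        if name is None or phone is None:
            continue                # skip listings missing either field
        rows.append([name.get_text(strip=True), phone.get_text(strip=True)])
    return rows

rows = extract_rows(html)

wb = Workbook()                     # no arguments: a normal, editable workbook
ws = wb.active                      # reuse the default sheet instead of adding one
ws.title = "Phone"
for row in rows:
    ws.append(row)                  # each call writes to the next empty row
wb.save("phone_sketch.xlsx")        # file name goes here, with an .xlsx extension
```

With the live page you would pass r.content to extract_rows instead of the inline string, after adjusting the two find calls to match the page's actual tags.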


4 Comments

Thank you for your help!
I'm still running into the excel sheet being blank, here is my revised code:
Were you able to copy the code I posted, run it on your system and output the excel file example.xlsx?
no... this is what I get when I copy and paste your code:
