I wrote the following code:

#!/usr/bin/python
#weather.scraper

from bs4 import BeautifulSoup
import urllib

def main():
    """weather scraper"""
    r = urllib.urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    table = soup.find_all("table", class_="responsive airport-history-summary-table")
    tr = soup.find_all("tr")
    td = soup.find_all("td")
    print table
            

if __name__ == "__main__":
    main()

When I print the table, I get all the HTML markup (td, tr, span, etc.) as well. How can I print only the content of the table cells (tr, td), without the HTML?
Thanks!

1 Answer

Call .get_text() (also available as .getText()) on an element to get its text content without the tags. Since find_all returns a list of elements, you have to pick one of them first (for example td[0]) or loop over them.
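As a minimal sketch of that point, the snippet below parses a small inline HTML string instead of the real page. The markup and the variable names are made up for illustration, and it is written so that it also runs on Python 3:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real weather page.
html = "<table><tr><td>Max <span>45</span></td><td>Min <span>30</span></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cells = soup.find_all("td")               # list-like ResultSet of Tag objects
first_text = cells[0].get_text()          # text of one cell, nested tags stripped
all_text = [c.get_text() for c in cells]  # text of every cell

print(first_text)
print(all_text)
```

Note that get_text() also flattens nested tags such as span, so only the visible text remains.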

Or, for example, you can loop over the rows:

for tr in soup.find_all("tr"):
    print '>>>> NEW row <<<<'
    print '|'.join([x.getText() for x in tr.find_all('td')])

The loop above prints each row on its own line, with the cells of that row separated by |.

Note that your code collects every td and tr in the whole document, but you probably want only those inside the table.

To look for elements inside the table, search from the table rather than from soup:

Use table.find('tr') instead of soup.find('tr'), so that BeautifulSoup looks for tr elements inside the table instead of in the whole HTML document.
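The difference between searching from soup and searching from a single table can be sketched like this. The two-table markup and the class names "nav" and "summary" are assumptions for illustration, not the real page structure:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the second one is of interest.
html = """
<table class="nav"><tr><td>menu</td></tr></table>
<table class="summary">
  <tr><td>Mean</td><td>36</td></tr>
  <tr><td>Max</td><td>45</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", class_="summary")  # a single Tag, not a list
rows_in_page = soup.find_all("tr")            # rows from every table
rows_in_table = table.find_all("tr")          # rows from this table only

# Text of each cell, grouped by row, for the chosen table.
scoped = [[td.get_text() for td in tr.find_all("td")] for tr in rows_in_table]
print(len(rows_in_page), len(rows_in_table))
print(scoped)
```

Because find returns one Tag, the follow-up find_all call is limited to that table's descendants.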

Your code, modified (following your comment that the page contains more than one table):

#!/usr/bin/python
#weather.scraper

from bs4 import BeautifulSoup
import urllib

def main():
    """weather scraper"""
    r = urllib.urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    tables = soup.find_all("table")

    for table in tables:
        print '>>>>>>> NEW TABLE <<<<<<<<<'

        trs = table.find_all("tr")

        for tr in trs:
            # for each row of current table, write it using | between cells
            print '|'.join([x.get_text().replace('\n','') for x in tr.find_all('td')])



if __name__ == "__main__":
    main()

5 Comments

Indeed, you are right about the tds in the table. But if I try tr = table.find('tr'), I get the following error: AttributeError: 'ResultSet' object has no attribute 'find'
Because it's a list. If you have just one table in the HTML, use soup.find('table'...) instead of soup.find_all('table'...).
But the complete HTML contains more tables, so I am narrowing my search by specifying a class. I am not quite sure what you mean...
@malina Check my answer; I've edited your code there.
Now it handles all tables on the web page.
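
For reference, the AttributeError from the comments can probably be reproduced and avoided along these lines. This is only a sketch on invented markup (the class names "other" and "summary" are assumptions), written so that it also runs on Python 3:

```python
from bs4 import BeautifulSoup

# Hypothetical page: several tables, one of which carries the class we want.
html = """
<table class="other"><tr><td>skip</td></tr></table>
<table class="summary"><tr><td>Max</td><td>45</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all always returns a ResultSet (a list), even when only one table matches.
tables = soup.find_all("table", class_="summary")

try:
    tables.find("tr")      # a ResultSet has no find method
    raised = False
except AttributeError:
    raised = True

# Pick one element of the list (or loop over it) before searching further.
first_row = tables[0].find("tr")
cells = [td.get_text() for td in first_row.find_all("td")]
print(raised, cells)
```

If exactly one table is expected, soup.find("table", class_="summary") returns that Tag directly and the indexing step is not needed.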
