Extract/decode CSS from HTML into Python

Question

Good afternoon all.

I am currently parsing this website: http://uk.easyroommate.com/results-room/loc/981238/pag/1

I want to get the URL of every advert listed on that page. However, the listing is generated by JavaScript. I can see the URLs perfectly well in Firebug in Firefox, but I have not found any way to get them via Python. I think it is doable, but I don't know how.

EDIT: Obviously I have tried modules like BeautifulSoup, but since the page is generated by JavaScript, BeautifulSoup on its own is of no use here.

Thank you in advance for your help.

Welcome to Stack Overflow! You will greatly increase your chances of getting an answer to your question if you include your input, what you have tried, your expected output vs. your actual output, and the full stack trace of any errors you receive. You can also read this guide.
– kylieCatt, Commented Jun 23, 2015 at 16:38
Thank you for reading my question and for the guide. However, I am facing a completely new problem now and I am hoping for some leads or q's ... which I won't find in the guide. Thanks anyway.
– Dirty_Fox, Commented Jun 23, 2015 at 16:42
Until your question is improved it will be very difficult to help you. What is dvert? How is it coded with CSS? CSS is not a programming language and it's very unlikely the content is added via CSS. What do you want to do with this data once you have it? What format do you need it in? We are not as familiar with your problem as you are, and we need all the details before we can help you.
– kylieCatt, Commented Jun 23, 2015 at 16:57
Thanks. Apologies for not being as clear as I should have been. All the advert URLs are produced by a piece of code that is most likely JavaScript or CSS (I am not an expert in programming, especially not in websites). I need a module/key/trick that can extract those URLs, so that I can then use them with urllib and BeautifulSoup to access all the information on the page describing each advert individually. But first I need those URLs from the "front page". I just need them in a unicode variable. Then I'll make my way through it. Could you help me better now? Thanks!
– Dirty_Fox, Commented Jun 23, 2015 at 17:06

Community · Accepted Answer · 2017-05-23 10:26:43Z

The ad listing is generated by JavaScript. BeautifulSoup, for example, only gives you this empty template placeholder:

<ul class="search-results" data-bind="template: { name: 'room-template', foreach: $root.resultsViewModel.Results, as: 'resultItem' }"></ul>

I would suggest looking at these two questions: "Getting html source when some html is generated by javascript" and "Python Scraping JavaScript using Selenium and Beautiful Soup".
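
To make the point concrete, here is a minimal sketch (standard library only, so it is a stand-in for BeautifulSoup rather than the same tool) that inspects the fragment quoted above. The class name `StaticPageProbe` is made up for illustration. It shows that the served HTML carries a `data-bind` template placeholder, which looks like a Knockout-style binding, and no `<a>` tags at all, so there is nothing for a static parser to extract until the JavaScript has run.

```python
from html.parser import HTMLParser


class StaticPageProbe(HTMLParser):
    """Record data-bind placeholders and count <a> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self.bindings = []   # (tag, data-bind value) pairs
        self.link_count = 0  # number of <a> start tags seen

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "data-bind" in attr_map:
            self.bindings.append((tag, attr_map["data-bind"]))
        if tag == "a":
            self.link_count += 1


# The fragment quoted above, as served before any JavaScript runs.
static_html = """<ul class="search-results" data-bind="template: { name: 'room-template', foreach: $root.resultsViewModel.Results, as: 'resultItem' }"></ul>"""

probe = StaticPageProbe()
probe.feed(static_html)

for tag, binding in probe.bindings:
    print("placeholder:", tag, "->", binding)
print("links found:", probe.link_count)
```

If a check like this reports placeholders but no links, that is a good hint the content is filled in client-side and a browser-driven tool such as Selenium is needed.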

answered Jun 23, 2015 at 17:10

Dušan Maďar

1 Comment

Dirty_Fox Over a year ago

Thanks. I'll have a look into it.

Dirty_Fox · Accepted Answer · 2015-06-24 10:22:51Z

Thanks to your lead, here is the solution. I hope it will help someone one day:

from selenium import webdriver
from bs4 import BeautifulSoup

# Let a real browser load the page so that its JavaScript runs
browser = webdriver.Firefox()
browser.get('http://uk.easyroommate.com/results-room/loc/981238/pag/1')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')
print(soup.prettify())
# You can now see the HTML generated by the JavaScript code and
# extract it as usual using BeautifulSoup

# Each advert link sits inside a div with these two classes
for el in soup.find_all('div', class_="listing-meta listing-meta--small"):
    print(el.find('a').get('href'))

Again, in my case I just wanted to extract those links, but once you have the rendered page source via Selenium, it is a piece of cake to use BeautifulSoup and get every item you want.
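
One caveat: `page_source` is read as soon as `get()` returns, and the JavaScript may not have filled in the listing by then. Below is a hedged sketch of a variant that waits explicitly for the links to appear. It assumes Selenium 4 with Firefox and geckodriver installed. The names `fetch_advert_links`, `absolutize`, `LINK_SELECTOR` and the 15-second timeout are my own choices, the CSS selector is derived from the class names used above, and the `urljoin` step only matters if the site serves relative hrefs (I have not checked).

```python
from urllib.parse import urljoin

PAGE_URL = "http://uk.easyroommate.com/results-room/loc/981238/pag/1"
# Assumption: same classes as in the BeautifulSoup loop, written as a CSS selector.
LINK_SELECTOR = "div.listing-meta.listing-meta--small a"


def absolutize(base_url, hrefs):
    """Resolve hrefs against the page URL, skipping empty or missing ones."""
    return [urljoin(base_url, href) for href in hrefs if href]


def fetch_advert_links(url=PAGE_URL, selector=LINK_SELECTOR, timeout=15):
    """Load the page in Firefox, wait for the advert links, return their URLs."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Block until at least one matching link exists, or raise TimeoutException.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        anchors = browser.find_elements(By.CSS_SELECTOR, selector)
        hrefs = [a.get_attribute("href") for a in anchors]
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()
    return absolutize(url, hrefs)
```

Calling `fetch_advert_links()` should then return a plain list of URL strings that can be passed on to urllib and BeautifulSoup one by one.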
