
I'm looking at the source code of this link as it was first received by the browser. The problem is that the DOM is manipulated by JavaScript (for example, the calendar).

How can I get the page after it has loaded, so I can access the JavaScript-generated calendar?

I wish to get this result

<table class="table-bordered daily">

I've tried this code with no luck

import requests
from bs4 import BeautifulSoup

# requests returns the HTML exactly as the server sends it; no JavaScript is executed
page = requests.get('https://www.matchi.se/facilities/abybadminton?date=2020-10-17&sport=')
soup = BeautifulSoup(page.content, 'html.parser')

# the JavaScript-generated calendar table is not in this HTML
for table in soup.find_all('table'):
    print(table)
  • To get the code after the JavaScript manipulation, you need to run a JavaScript interpreter. You cannot do that with requests, because it just downloads the data and hands it to you as is (the same as what the browser receives, except the browser then runs the scripts). What you want is a library that implements a whole browser or drives an existing one. Commented Oct 17, 2020 at 0:09
  • There might be something simpler out there, but one solution that I know of is to use Selenium. It will open the page in Chrome and then return the rendered HTML to you. Selenium also lets you interact with Chrome (a minimal sketch follows these comments). Commented Oct 17, 2020 at 0:56
  • Selenium looks easier, as you can then use classes to add in colour should you make it a graphic. You can reconstruct the available/booked division from one of the XHR requests the page makes, but it is a bit of a faff. Commented Oct 17, 2020 at 2:09
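
A minimal sketch of the Selenium route suggested in the comments (this assumes Chrome and a matching chromedriver are installed and on the PATH; the table.table-bordered.daily selector is taken from the question):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes Chrome + chromedriver are available locally
try:
    driver.get('https://www.matchi.se/facilities/abybadminton?date=2020-10-17&sport=')
    # wait until the JavaScript-generated calendar table appears in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'table.table-bordered.daily'))
    )
    # page_source now contains the HTML *after* the scripts have run
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for table in soup.find_all('table'):
        print(table)
finally:
    driver.quit()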

1 Answer


The page builds the calendar by making a request to an external URL via JavaScript. You can make the same request with requests to load this information directly:

import re
import requests
from bs4 import BeautifulSoup

date = '2020-10-17'

# load the main page to extract the IDs needed for the schedule request
main_url = 'https://www.matchi.se/facilities/abybadminton?date={date}&sport='
html_doc = requests.get(main_url.format(date=date)).text

# the sport and facility IDs are embedded in the page source
sport_id = re.search(r"var sport = '(.*?)'", html_doc).group(1)
facility_id = re.search(r'facilityId: "(.*?)"', html_doc).group(1)

# this is the URL the page's JavaScript calls to render the calendar
ajax_url = 'https://www.matchi.se/book/schedule'

params = {
    'wl': '',
    'facilityId': facility_id,
    'date': date,
    'sport': sport_id,
    'week': '',
    'year': ''
}

soup = BeautifulSoup(requests.get(ajax_url, params=params).content, 'html.parser')

# print occupied slots (each slot's title attribute contains HTML, so parse it again):
for td in soup.select('td.slot.red'):
    title = BeautifulSoup(td['title'], 'html.parser').get_text(strip=True, separator=' ')
    print(title)

Prints:

Booked Bana 1 11:00 - 12:00
Booked Bana 2 10:00 - 11:00
Booked Bana 2 11:00 - 12:00
Booked Bana 2 12:00 - 13:00
Booked Bana 3 12:00 - 13:00
Booked Bana 3 14:00 - 15:00
Booked Bana 4 11:00 - 12:00
Booked Bana 5 11:00 - 12:00
Booked Bana 5 14:00 - 15:00
Booked Bana 6 11:00 - 12:00
Booked Bana 6 12:00 - 13:00
Booked Bana 7 11:00 - 12:00
Booked Bana 7 12:00 - 13:00
Booked Bana 7 14:00 - 15:00
Booked Bana 7 15:00 - 16:00
Booked Bana 8 10:00 - 11:00
Booked Bana 9 14:00 - 15:00
Booked Bana 10 12:00 - 13:00
Booked Bana 10 15:00 - 16:00
Booked Bana 13 11:00 - 12:00
Booked Bana 14 10:00 - 11:00
Booked Bana 15 10:00 - 11:00
Booked Bana 15 18:00 - 19:00
Booked Bana 16 13:00 - 14:00

2 Comments

Works great! May I ask how you recognized that there was an external request to that specific URL?
@MasterSmack I looked in the Firefox developer tools -> Network tab. It lists all the requests the page makes.
