Retrieving JS script from a website using Python3

Question

I would like to scrape a website for its "raw" JavaScript code. For example, if I were to scrape this website, I would get a string containing:

This is just a small portion of the existing JS in the given link, but I would like to obtain the entire JS in a string or array of strings.

I have tried different approaches to obtain this data: using requests and selenium. Simply loading the HTML of the website doesn't seem to work, as the script tags don't seem to load.

Using selenium, I hoped this would work:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.udemy.com"

driver = webdriver.Chrome()
driver.get(url)

wait = WebDriverWait(driver, 10)
results = wait.until(EC.visibility_of_all_elements_located((By.TAG_NAME, "script")))

print(results)

Then using results I could get a string, but it doesn't work.

Another example for the JS Scripts chunks I'd like to get:

The red rectangle marks the JS scripts. As you can see, there is a lot of code, and I would like to get it in its "raw" form (without executing it).

My question is: How would I get the "raw" JS script as a string, and what is the most efficient way (time-wise) to do this?

undetected Selenium · Accepted Answer · 2020-01-20 14:17:55Z

You are looking for .get_attribute('innerHTML'). Also, do not use visibility_of_all_elements_located: script elements are never rendered, so they will never be visible. Wait with presence_of_all_elements_located instead.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.udemy.com"
driver = webdriver.Chrome()
driver.get(url)

# Script elements are never visible, so wait for their presence in the DOM
# rather than for their visibility.
wait = WebDriverWait(driver, 10)
script_tags = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//script")))

# Collect the inline source of every script element.
innerHTML_of_script_tag = []
for script in script_tags:
    source = script.get_attribute('innerHTML')
    innerHTML_of_script_tag.append(source)
    print(source)
    print("################################################################")

print("---------------------------------------------------------------------")
print("---------------------------------------------------------------------")
print(innerHTML_of_script_tag)
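
One likely reason some entries in innerHTML_of_script_tag come back empty: a script element that loads its code through a src attribute has no inline body, so its innerHTML is an empty string. The sketch below is not part of the original answer. It assumes you already have the page HTML as a string (for example driver.page_source) and uses Python's standard html.parser to separate inline script bodies from external script URLs. The names ScriptCollector, extract_scripts, and sample_html are hypothetical.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect inline script bodies and external script URLs from HTML text."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []  # raw text found between <script> and </script>
        self.external_srcs = []   # src attribute values; nothing is downloaded
        self._buffer = None       # list of text chunks while inside an inline script

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # External script: the code lives in another file.
            self.external_srcs.append(src)
        else:
            # Inline script: start buffering its text.
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            code = "".join(self._buffer).strip()
            if code:
                self.inline_scripts.append(code)
            self._buffer = None


def extract_scripts(html):
    """Return (inline_scripts, external_srcs) for an HTML string."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.inline_scripts, collector.external_srcs


# Hypothetical sample page, used only to illustrate the behaviour.
sample_html = """
<html><head>
<script src="/static/app.js"></script>
<script>var greeting = "hello";</script>
</head><body>
<script>console.log(greeting);</script>
</body></html>
"""

inline, external = extract_scripts(sample_html)
print(inline)
print(external)
```

The external list holds only URLs. Getting the code behind them takes one extra HTTP request per file.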

edited Jan 20, 2020 at 14:17

undetected Selenium

answered Jan 20, 2020 at 1:27

Jortega

8 Comments

Omer Hen Over a year ago

Thanks for answering! This code doesn't seem to get the entire JS script found in the website. Though it is able to obtain some script, some larger sections of it (as shown in the image in the question itself) are missing and are empty in innerHTML_of_script_tag as well as in the print statement. It's those parts that I am having trouble retrieving.

Jortega Over a year ago

@Omer Hen When I go to udemy.com, I do not see the large block of JavaScript from your screenshot. Do you interact with the page in any way before this block of JavaScript code appears?

Omer Hen Over a year ago

I added another image showing the large chunks of JS script. The given code doesn't really seem to capture all that. Apart from that, I simply load the website and go to the "inspect" option given by Chrome to view the source code. I don't perform any special operations or interact with the page.

Jortega Over a year ago

@Omer Hen Small update: interestingly, I can see the block you are looking for when I write driver.page_source to a text file, but not in the innerHTML attribute.

Omer Hen Over a year ago

That's what I'm trying to find a solution to. Also, when I printed driver.page_source I didn't see the large chunks of JS script, but only a few smaller chunks.
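
The thread does not settle where the missing chunks live. One possibility, stated here as an assumption, is that they sit in external files referenced by src attributes, which neither innerHTML nor the page HTML would contain. Under that assumption, a minimal sketch is to download each referenced file over plain HTTP. The helper names fetch_text and download_scripts and the User-Agent value are hypothetical. The fetch parameter exists so the download step can be swapped out, for example in tests.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10):
    """Download one URL and return its body decoded as text."""
    # Assumption: a browser-like User-Agent; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def download_scripts(page_url, srcs, fetch=fetch_text):
    """Map each absolute script URL to its raw source (None if the download fails)."""
    sources = {}
    for src in srcs:
        # Resolve relative paths such as "/static/app.js" against the page URL.
        absolute = urljoin(page_url, src)
        if absolute in sources:
            continue  # skip duplicates so each file is requested once
        try:
            sources[absolute] = fetch(absolute)
        except OSError:
            # URLError, HTTPError, and timeouts are all OSError subclasses.
            sources[absolute] = None
    return sources
```

With Selenium, the src values could be gathered by calling get_attribute('src') on each script element and skipping empty results. Whether this is faster than driving a browser depends on the site, but plain HTTP requests avoid browser startup and rendering.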
