Python Selenium accessing HTML source

Question

How can I get the HTML source in a variable using the Selenium module with Python?

I wanted to do something like this:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")
if "whatever" in html_source:
    pass  # Do something
else:
    pass  # Do something else

How can I do this? I don't know how to access the HTML source.

Write the following line before the if condition: html_source = browser.page_source (Abdul Majeed, commented Oct 23, 2014 at 13:21)

AutomatedTester · Accepted Answer · 2020-04-17 11:44:53Z

267

You need to access the page_source property:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")

html_source = browser.page_source
if "whatever" in html_source:
    pass  # do something
else:
    pass  # do something else

edited Apr 17, 2020 at 11:44

user3064538

answered Oct 23, 2011 at 15:08

AutomatedTester

3 Comments

5agado Over a year ago

Best answer so far! The most immediate and clear way to do this, much more compact than the other, still valid, alternative (find_element_by_xpath("//*").get_attribute("outerHTML")).

Yogeesh Seralathan Over a year ago

What if we need to get the page source after all the JavaScript has executed?

TheRookierLearner Over a year ago

Works only if the page has completely loaded. If the page loads indefinitely this property doesn't work.
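Both comments come down to timing: page_source is read at the moment you access it. Below is a minimal sketch of waiting until the browser reports that loading has finished before reading it, assuming the page signals this through document.readyState. The helper name wait_for_ready_source and its parameters are hypothetical; the driver argument only needs Selenium's execute_script() method and page_source property, so the loop can be tried with a stand-in object. Selenium's own WebDriverWait (in selenium.webdriver.support.ui) provides the same kind of polling.

```python
import time


def wait_for_ready_source(driver, timeout=10.0, poll_interval=0.25):
    """Poll document.readyState until it is "complete", then return page_source.

    `driver` is any object with Selenium-style execute_script() and page_source.
    Raises TimeoutError if the page never reports "complete" within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState;")
        if state == "complete":
            # Document and subresources have loaded; take the snapshot now.
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} s (last state: {state!r})")
        time.sleep(poll_interval)
```

With a real browser you would call browser.get("http://example.com") and then html_source = wait_for_ready_source(browser). Note that "complete" only means the document and its subresources have finished loading; content that scripts fetch afterwards may still be missing, so waiting for a specific element is often more reliable.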

Mobin Al Hassan · Accepted Answer · 2020-05-16 11:12:10Z

19

from bs4 import BeautifulSoup
from selenium import webdriver

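driver = webdriver.Chrome()
driver.get("http://example.com")  # navigate first; otherwise the browser is still on a blank page

# innerHTML of <body> only; "return document.documentElement.outerHTML;" gives the whole document
html_source_code = driver.execute_script("return document.body.innerHTML;")
html_soup: BeautifulSoup = BeautifulSoup(html_source_code, 'html.parser')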

Now you can use BeautifulSoup methods such as find_all() to extract data from html_soup.
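If you only need simple extraction and would rather not add the bs4 dependency, the standard library's html.parser module can walk the same string. A minimal sketch that collects link targets; LinkCollector and extract_links are hypothetical names, and html_source stands for whatever string you got from page_source or execute_script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> also matches.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_source):
    """Return the hrefs of all links in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html_source)
    parser.close()
    return parser.links
```

The Beautiful Soup equivalent is [a["href"] for a in html_soup.find_all("a", href=True)].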

answered May 16, 2020 at 11:12

Mobin Al Hassan

Dhiraj · Accepted Answer · 2018-11-20 07:23:17Z

8

driver.page_source returns the page source as a string, so you can check whether your text is present in it:

from selenium import webdriver
driver = webdriver.Firefox()
driver.get("some url")
if "your text here" in driver.page_source:
    print('Found it!')
else:
    print('Did not find it.')

If you want to store the page source in a variable, add the following line after driver.get:

var_pgsource = driver.page_source

and change the if condition to:

if "your text here" in var_pgsource:
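One caveat with the substring check: page_source is raw markup, so the test also matches text that appears only inside attributes, comments, or script blocks. A rough sketch, assuming you want to match text content only, that strips tags with the standard library's html.parser and skips script and style contents; text_content is a hypothetical helper name. For the text the browser actually renders, Selenium's driver.find_element(By.TAG_NAME, "body").text is the usual alternative.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Accumulate text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Attribute values and comments never reach handle_data.
        if not self._skip_depth:
            self.parts.append(data)


def text_content(html_source):
    """Return the document's text content, with fragments joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(part.strip() for part in parser.parts if part.strip())
```

Usage: if "your text here" in text_content(driver.page_source): ...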

edited Nov 20, 2018 at 7:23

answered Nov 19, 2018 at 14:54

Dhiraj

1 Comment

Nic3500 Over a year ago

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.

Milanka · Accepted Answer · 2013-02-19 13:23:36Z

5

With Selenium2Library (a Robot Framework library), you can use get_source():

import Selenium2Library
s = Selenium2Library.Selenium2Library()
s.open_browser("http://localhost:7080", "firefox")
source = s.get_source()

answered Feb 19, 2013 at 13:23

Milanka

1 Comment

JohnDotOwl Over a year ago

Can I set a delay and get the latest source? Some of the content is loaded dynamically with JavaScript.
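page_source generally reflects the browser's current DOM rather than the original HTTP response, so reading it again after scripts have run does return newer content. When you don't know which element to wait for, one heuristic is to poll until the source stops changing. A sketch under that assumption; wait_for_stable_source and its parameters are hypothetical. A page with a live clock or rotating ads may never settle (hence the timeout), and a page that pauses between requests can look settled too early.

```python
import time


def wait_for_stable_source(driver, timeout=10.0, poll_interval=0.5, stable_polls=2):
    """Return driver.page_source once it is unchanged for `stable_polls` polls in a row.

    `driver` is anything with a Selenium-style page_source attribute.
    Raises TimeoutError if the source is still changing after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    previous = driver.page_source
    unchanged = 0
    while unchanged < stable_polls:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page source still changing after {timeout} s")
        time.sleep(poll_interval)
        current = driver.page_source
        if current == previous:
            unchanged += 1      # same snapshot again: one step closer to "settled"
        else:
            unchanged = 0       # content changed: restart the count from this snapshot
            previous = current
    return previous
```

For example, after browser.get(url), html_source = wait_for_stable_source(browser, timeout=15).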

Asclepius · Accepted Answer · 2018-09-29 18:42:05Z

3

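Using the page source gives you the whole HTML document. So first decide which block or tag holds the data you need to retrieve, or the element you want to click, and locate that directly:

from selenium.webdriver.common.by import By

options = driver.find_elements(By.NAME, "XXX")
for option in options:
    if option.text == "XXXXXX":
        print(option.text)
        option.click()
        break  # clicking may change the page and invalidate the remaining elements

You can also locate elements by XPath, ID, link text, or CSS selector (By.XPATH, By.ID, By.LINK_TEXT, By.CSS_SELECTOR).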

edited Sep 29, 2018 at 18:42

Asclepius

answered Dec 16, 2013 at 11:18

Mahesh Reddy Atla

Peter Mortensen · Accepted Answer · 2013-04-20 09:21:11Z

1

To answer your question about getting the URL to use with urllib, execute this JavaScript code:

url = browser.execute_script("return window.location.href;")

WebDriver also exposes the same value directly as browser.current_url.

edited Apr 20, 2013 at 9:21

Peter Mortensen

answered Oct 25, 2011 at 21:29

Bob Evans

SysMurff · Accepted Answer · 2019-10-10 17:23:37Z

1

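You can simply use the WebDriver object and read the page source through its page_source property.

Try this code snippet :-)

from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# Selenium 4 takes the geckodriver path through a Service object
driver = webdriver.Firefox(service=Service('path/to/geckodriver'))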
driver.get('https://some-domain.com')
source = driver.page_source
if 'stuff' in source:
    print('found...')
else:
    print('not in source...')

answered Oct 10, 2019 at 17:23

SysMurff

1 Comment

Roman-Stop RU aggression in UA Over a year ago

How does this answer differ from stackoverflow.com/a/7866938/2231972?

user11555371 · Accepted Answer · 2024-12-13 16:39:38Z

1

Complete code:

from selenium import webdriver

# Initialize the WebDriver
driver = webdriver.Chrome()  # Use the appropriate WebDriver for your browser

# Navigate to the desired URL
driver.get("https://www.example.com/")

# Access the page's HTML source
html_source = driver.page_source

if "whatever" in html_source:
    pass  # do something
else:
    pass  # do something else

# Print the complete source code, if you want to see it
print(html_source)

# Close the WebDriver
driver.quit()

edited Dec 13, 2024 at 16:39

answered Dec 11, 2024 at 15:30

user11555371

Peter Mortensen · Accepted Answer · 2013-04-20 09:20:38Z

-8

I'd recommend getting the source with urllib and, if you're going to parse it, using something like Beautiful Soup.

import urllib.request

response = urllib.request.urlopen("http://example.com")  # Open the URL.
content = response.read()  # Read the source (as bytes) and save it to a variable.
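In Python 3, urlopen lives in urllib.request and the response body comes back as bytes. A minimal sketch that decodes it using the charset the server declares, falling back to UTF-8 as an assumption; fetch_html is a hypothetical name. As the comments on this answer point out, this fetches the raw HTTP response only: no JavaScript runs, so content a page builds in the browser will be missing.

```python
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download raw HTML with the standard library (no browser, no JavaScript)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage: html_source = fetch_html("http://example.com"), then if "whatever" in html_source: as in the question.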

edited Apr 20, 2013 at 9:20

Peter Mortensen

answered Oct 22, 2011 at 18:42

Griffin

5 Comments

user1008791 Over a year ago

Okay then do you know how I can get the URL within Selenium? I want to store the URL in a variable so I can access it with urllib.

Griffin Over a year ago

@user1008791 Does it matter? You're apparently letting the user type it in anyway using raw_input; just do the same but with urllib.

user1008791 Over a year ago

That was just to make an easy example; the URL will be changing a lot.

mpenkov Over a year ago

Selenium does many things that urllib doesn't (e.g. execution of JavaScript).

Dave Over a year ago

Using urllib here is pointless. Why? AutomatedTester has it right; that is what I do when scanning through HTML source to make sure we don't push development-environment code.