None of the methods here worked well on some websites: paragraphs generated by JavaScript resisted all of the approaches above. Here is what eventually worked for me, inspired by this answer and this one.
The idea is to load the page in a WebDriver and scroll to the end so the page's JavaScript generates/loads the rest of the content. Then send keystrokes to select all and copy the whole page to the clipboard:
```python
import time

import pyperclip
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Scrolls to the bottom and returns the current document height
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"

driver = webdriver.Chrome()
driver.get("https://www.lazada.com.ph/products/nike-womens-revolution-5-running-shoes-black-i1262506154-s4552606107.html?spm=a2o4l.seller.list.3.6f5d7b6cHO8G2Y&mp=1&freeshipping=1")

# Scroll down to the end of the page until its height stops growing,
# so all the JavaScript code has loaded its content
len_of_page = driver.execute_script(SCROLL_JS)
while True:
    last_count = len_of_page
    time.sleep(1)
    len_of_page = driver.execute_script(SCROLL_JS)
    if last_count == len_of_page:
        break

# Copy from the webpage: select all, then copy to the clipboard
# (find_element_by_tag_name was removed in Selenium 4; use find_element with By)
element = driver.find_element(By.TAG_NAME, "body")
element.send_keys(Keys.CONTROL, "a")
element.send_keys(Keys.CONTROL, "c")
alltext = pyperclip.paste()
alltext = alltext.replace("\n", " ").replace("\r", " ")  # cleaning the copied text
print(alltext)
```
It is slow, but nothing else worked.
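One caveat with the scroll loop: on an infinite-scroll page the height never stops growing, so the loop never ends. Below is a minimal sketch, assuming you want a hard cap on the number of scrolls. `scroll_until_stable` and its parameters are hypothetical names, not part of Selenium; the browser call is passed in as a function so the logic can be tried without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_and_get_height: Callable[[], int],
    pause: float = 1.0,
    max_rounds: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll repeatedly until the page height stops changing.

    `scroll_and_get_height` should scroll to the bottom and return the
    current document height. Gives up after `max_rounds` extra scrolls so
    infinite-scroll pages cannot hang the script. Returns the last height.
    """
    height = scroll_and_get_height()
    for _ in range(max_rounds):
        sleep(pause)  # give the page's JavaScript time to load more content
        new_height = scroll_and_get_height()
        if new_height == height:
            break  # nothing new was loaded: assume the page is complete
        height = new_height
    return height


if __name__ == "__main__":
    # Fake page that grows three times and then stops (no browser needed)
    heights = iter([1000, 2000, 3000, 4000, 4000])
    print(scroll_until_stable(lambda: next(heights), sleep=lambda s: None))
```

With Selenium you would pass something like `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")` as the first argument.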
UPDATE: A better method is to grab the page source AFTER scrolling to the end of the page and convert it to text with the inscriptis library:
```python
from inscriptis import get_text

# Run this after the scroll loop, so page_source includes the JS-generated content
text = get_text(driver.page_source)
```
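If installing inscriptis is not an option, a crude stand-in can be built on the standard library's `html.parser`. This is a swapped-in technique, not what inscriptis does: inscriptis tries to preserve the page layout, while this sketch only drops `<script>`/`<style>` contents and collapses whitespace (like the `replace("\n", " ")` cleanup above). `html_to_text` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of non-visible tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Collapse every run of whitespace (including \n and \r) into one space
    return " ".join(" ".join(collector.parts).split())


if __name__ == "__main__":
    print(html_to_text("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

Usage would be `text = html_to_text(driver.page_source)`, again after scrolling. Because text nodes are joined with spaces, a word split across inline tags (e.g. `<b>wor</b>ld`) comes out with a stray space.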
This still does not work with a headless driver (the page somehow detects that it is not really being displayed, so scrolling to the end does not make the JS load its content), but at least we no longer need the copy/paste hack, which prevents running multiple scripts on one machine because they all share the same clipboard.