
I have a problem with a span tag that has no id or class. My overall goal is to extract the text between "ITEM 1. BUSINESS" and "ITEM 1A. RISK FACTORS" from the link below. However, I can't figure out a way to find this part, because the span it is in has neither an id nor a class I can search for. Only the parent div the span is in has one: div = soup.find("div", {"id": "dynamic-xbrl-form"}).

Sadly, this code does not work: #text = unicodedata.normalize('NFKD', soup.get_text()).replace('\n', '')

Here is my approach:

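import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/934549/000093454919000017/actg2018123110-k.htm#s62CF0831C63E51C2BEF33F4163F1DE65'
raw = requests.get(url)
soup = BeautifulSoup(raw.content, 'html.parser')

# The span has no id or class, so there is nothing to put in place of ...
div = soup.find("span", {"id": ... })
print(div.text)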

Do you have any ideas or hints?

Thanks a lot Julius
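Because the span has no id or class, one option is to ignore the tag structure and search the page text for the two headings instead. A minimal sketch, assuming `html` is the HTML of the filing itself (not the JavaScript viewer page) and that the headings read roughly "ITEM 1. BUSINESS" and "ITEM 1A. RISK FACTORS"; `extract_section` is a hypothetical helper name:

```python
import re
import unicodedata

from bs4 import BeautifulSoup


def extract_section(html, start=r"ITEM\s+1\.\s+BUSINESS",
                    end=r"ITEM\s+1A\.\s+RISK\s+FACTORS"):
    """Return the text between the start and end headings, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Join all text nodes with spaces so adjacent spans don't run together.
    text = soup.get_text(" ")
    # NFKD maps compatibility characters such as non-breaking spaces to plain
    # equivalents; then collapse every whitespace run to a single space.
    text = re.sub(r"\s+", " ", unicodedata.normalize("NFKD", text))
    end_re = re.compile(end, re.IGNORECASE)
    best = None
    for match in re.finditer(start, text, re.IGNORECASE):
        stop = end_re.search(text, match.end())
        if stop is None:
            continue
        candidate = text[match.end():stop.start()].strip()
        # A table of contents may also list both headings; keep the longest
        # span, which is most likely the actual section body.
        if best is None or len(candidate) > len(best):
            best = candidate
    return best
```

If the `/ix?doc=` viewer URL is fetched with requests, `raw.content` is probably just the viewer page, which is the JavaScript problem the answers below describe, so this helper only helps once it is given the filing's own HTML.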

3 Answers


As @Gagan said, the content of this website is loaded by JavaScript, so you need to use Selenium.

Selenium controls a real browser, so unlike a plain HTTP request it sees the page after the JavaScript has run. I used ChromeDriver; if you haven't installed it yet, you can download it from

http://chromedriver.chromium.org/

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver_path = r'your driver path'
browser = webdriver.Chrome(service=Service(driver_path))
browser.get("https://www.sec.gov/ix?doc=/Archives/edgar/data/934549/000093454919000017/actg2018123110-k.htm#s62CF0831C63E51C2BEF33F4163F1DE65")
# Narrow the selector by id or class with span#id_name or span.class_name
datas = browser.find_elements(By.CSS_SELECTOR, "span")

for spans in datas:
    print(spans.text)

You can also get the full page source:

print(browser.page_source)

1 Comment

Hey, I tried the Selenium approach before as well, but with find_elements_by_xpath. With this particular link (sec.gov), however, I was not able to find any div with class "col-sm-12" or id "dynamic-xbrl-form", although a div with those attributes is clearly in the HTML. Specifically, I used this code: driver.find_element_by_xpath("//div[@id='dynamic-xbrl-form']"), but I only get "Unable to locate element" errors. Usually this gives me the right result. Sadly, the span I am actually looking for has neither an id nor a class!

The content of this page is loaded by JavaScript, and requests does not run JavaScript, so you cannot use BeautifulSoup on that response for this. Use Selenium for this purpose instead.
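The `/ix?doc=` link opens SEC's Inline XBRL viewer, and its `doc` query parameter holds the path of the filing the viewer loads. A minimal sketch, assuming that file is static HTML that can be downloaded and parsed without running JavaScript: build the direct URL from the parameter and fetch it with the standard library. `document_url` and `fetch_html` are hypothetical helper names, and because SEC asks automated clients to identify themselves with a descriptive User-Agent header, the `user_agent` value is a placeholder for your own details.

```python
from urllib.parse import parse_qs, urljoin, urlsplit
from urllib.request import Request, urlopen

SEC_BASE = "https://www.sec.gov"


def document_url(viewer_url):
    """Turn an /ix?doc=... viewer link into a direct link to the document it loads."""
    # urlsplit separates the #fragment, so the query holds only doc=...
    query = urlsplit(viewer_url).query
    doc_path = parse_qs(query)["doc"][0]
    return urljoin(SEC_BASE, doc_path)


def fetch_html(url, user_agent):
    """Download a page, identifying the client with the given User-Agent string."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `html = fetch_html(document_url(url), "Your Name your.email@example.com")` followed by `BeautifulSoup(html, "html.parser")` would parse the filing directly, without a browser.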

1 Comment

Thanks for the response. Do you know which function I need? Something with "find"?

In my case, I look up the span tag by its id; this solved it for me:

import requests
from bs4 import BeautifulSoup
URL = 'https://www.facebook.com/hackerv728'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
titles = soup.find_all('span', id='fb-timeline-cover-name')
for title in titles:
    print(title.text.strip())
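When the span has no id, as in the question, `find` can match on the tag's own text instead through its `string` argument. A minimal sketch, using a small made-up HTML snippet as a stand-in for the filing:

```python
import re

from bs4 import BeautifulSoup

# Made-up markup standing in for the filing; the spans have no id or class.
html = """
<div id="dynamic-xbrl-form">
  <span>ITEM 1. BUSINESS</span>
  <span>We license patents.</span>
  <span>ITEM 1A. RISK FACTORS</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"id": "dynamic-xbrl-form"})
# No id or class on the span, so match it by its own text instead.
heading = container.find(
    "span", string=re.compile(r"ITEM\s+1\.\s+BUSINESS", re.IGNORECASE)
)
# find_next("span") moves forward in document order to the following <span>.
body = heading.find_next("span")
print(body.get_text(strip=True))
```

Matching with `string=` only works when the span's text is a single string; if the heading is split across child tags, searching the page text is the more robust route.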

1 Comment

Thanks, but that did not work. Also, div.text is not a valid method.
