Extracting text from span

Question

I have a problem with a span tag that has no id or class. The larger goal is to extract the text between "ITEM 1. BUSINESS" and "ITEM 1A. RISK FACTORS" from the link below. However, I can't figure out how to find this part, because the span it is in has neither an id nor a class I can search for. Only its parent div has one: div = soup.find("div", {"id": "dynamic-xbrl-form"}).

This code does not work, sadly: #text = unicodedata.normalize('NFKD', soup.get_text()).replace('\n', '')

Here is my approach:

import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/934549/000093454919000017/actg2018123110-k.htm#s62CF0831C63E51C2BEF33F4163F1DE65'
raw = requests.get(url)
soup = BeautifulSoup(raw.content, 'html.parser')

div = soup.find("span", {"id": ... })
print(div.text)

Do you have any ideas or hints?

Thanks a lot, Julius

Omer Tekbiyik · Accepted Answer · 2020-01-23 12:13:16Z

1

As @Gagan said, the content of the website is loaded by JavaScript, so you need to use Selenium.

Selenium drives a real browser, so it can see content that a plain HTTP request misses. I used ChromeDriver; if you haven't installed it yet, you can download it from

http://chromedriver.chromium.org/

from selenium import webdriver
from selenium.webdriver.common.by import By

driver_path = r'your driver path'
# Note: newer Selenium releases take a Service object instead of executable_path.
browser = webdriver.Chrome(executable_path=driver_path)
browser.get("https://www.sec.gov/ix?doc=/Archives/edgar/data/934549/000093454919000017/actg2018123110-k.htm#s62CF0831C63E51C2BEF33F4163F1DE65")
# Use # for an id or . for a class name, e.g. span#id_name, span.class_name
datas = browser.find_elements(By.CSS_SELECTOR, "span")

for spans in datas:
    print(spans.text)
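
To narrow the span texts down to the part the question asks about, something along these lines might help. This is only a sketch: it assumes each heading appears as the full text of its own span, and section_between and the sample list are hypothetical names and data, not anything taken from the filing.

```python
import re
import unicodedata


def normalize(text):
    """Fold non-breaking spaces and repeated whitespace, then uppercase."""
    text = unicodedata.normalize("NFKD", text)
    return re.sub(r"\s+", " ", text).strip().upper()


def section_between(span_texts, start, end):
    """Return the non-empty span texts after `start` and before `end`."""
    start, end = normalize(start), normalize(end)
    collected = []
    inside = False
    for text in span_texts:
        key = normalize(text)
        if not inside:
            # Keep skipping until the start heading is seen.
            inside = key == start
        elif key == end:
            break
        elif key:
            collected.append(text.strip())
    return collected


# Hypothetical stand-in for [span.text for span in datas]
sample = [
    "PART I",
    "ITEM\u00a01. BUSINESS",
    "First paragraph.",
    "",
    "Second paragraph.",
    "ITEM 1A. RISK FACTORS",
    "Text of the next section.",
]
business = section_between(sample, "ITEM 1. BUSINESS", "ITEM 1A. RISK FACTORS")
print(business)
```

With real data you would pass [span.text for span in datas] instead of sample. A filing's table of contents may repeat both headings, in which case the first match could be the contents entry rather than the section itself, so the result is worth checking by eye.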

You can also get the full page source:

print(browser.page_source)


1 Comment

Heka Over a year ago

Hey, I tried the Selenium approach before as well, but with find_elements_by_xpath. However, with this particular link (sec.gov) I was not able to find any div with class "col-sm-12" or id "dynamic-xbrl-form", although the div with those attributes is clearly in the HTML. To be specific, I used this code: driver.find_element_by_xpath("//div[@id='dynamic-xbrl-form']"), but I only get "Unable to locate element" errors. Usually this gets me the right result. Sadly, the span I am actually looking for has neither an id nor a class!

Gagan T K · Accepted Answer · 2020-01-23 10:45:02Z

0

The content of this page is loaded by JavaScript, so you cannot use BeautifulSoup for this. Use Selenium instead.
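
One thing that might be worth checking first: the viewer URL in the question carries the filing's own path in its doc query parameter. Below is a minimal sketch that rebuilds a direct URL from it. It assumes (untested here) that this path serves the filing as ordinary HTML, and filing_url is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def filing_url(viewer_url):
    """Build the direct document URL from the viewer's `doc` parameter."""
    query = parse_qs(urlsplit(viewer_url).query)
    doc_path = query["doc"][0]
    # An absolute path passed to urljoin replaces the viewer's path,
    # query string and fragment while keeping scheme and host.
    return urljoin(viewer_url, doc_path)


viewer = (
    "https://www.sec.gov/ix?doc=/Archives/edgar/data/934549/"
    "000093454919000017/actg2018123110-k.htm"
    "#s62CF0831C63E51C2BEF33F4163F1DE65"
)
direct = filing_url(viewer)
print(direct)
```

If the assumption holds, the resulting URL could be fetched with requests.get and parsed with BeautifulSoup as in the question. If it does not, Selenium remains the way to go.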


1 Comment

Heka Over a year ago

Thanks for the response. Do you know which function I need? Something with "find"?

unknown · Accepted Answer · 2020-01-23 13:46:31Z

0

In my case, I look up the span tag by its id. This solved my problem:

import requests
from bs4 import BeautifulSoup
URL = 'https://www.facebook.com/hackerv728'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
titles = soup.find_all('span', id='fb-timeline-cover-name')
for title in titles:
    print(title.text.strip())

edited Jan 23, 2020 at 13:46

answered Jan 23, 2020 at 10:55


1 Comment

Heka Over a year ago

Thanks, but that did not work. Also, div.text is not a valid method.
