I am trying to scrape the table data shown below from a website using BeautifulSoup4 and Python. Screenshot of the table: https://i.sstatic.net/PfPOQ.png

So far my code is:

import requests
from bs4 import BeautifulSoup

url = "https://www.boerse-frankfurt.de/bond/xs0216072230"
content = requests.get(url)
soup = BeautifulSoup(content.text, 'html.parser')
# Tables whose class attribute is exactly "table widget-table"
tbody_data = soup.find_all("table", attrs={"class": "table widget-table"})
table1 = tbody_data[2]
table_body = table1.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    print(cols)

With this code, I get this result (screenshot: https://i.sstatic.net/C190u.png):

[Issuer, ]
[Industry, ]

I see the labels Issuer and Industry, but their values do not show up in my result. Any help would be appreciated. TIA

2 Answers

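You are not getting the full output because the second td of each row in that table (table number 6 on the page) is loaded dynamically via JavaScript, so it is not in the HTML that requests receives. You can let a real browser render the page with selenium and then read the table with pandas.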

from io import StringIO
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.boerse-frankfurt.de/bond/xs0216072230-fuerstenberg-capital-erste-gmbh-2-522'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # Give the JavaScript time to fill in the table
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # Close the browser once the rendered HTML is captured
# Newer pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(soup)))[5]  # index 5 = sixth table on the page
print(df)

Output:

0                            Issuer  Fürstenberg Capital Erste GmbH
1                          Industry       Industrial and bank bonds
2                            Market                     Open Market
3                        Subsegment                             NaN
4         Minimum investment amount                            1000
5                      Listing unit                         Percent
6                        Issue date                      04/04/2005
7                      Issue volume                        61203000
8                Circulating volume                        61203000
9                    Issue currency                             EUR
10               Portfolio currency                             EUR
11                First trading day                      27/06/2012
12                         Maturity                             NaN
13  Extraordinary cancellation type                     Call option
14  Extraordinary cancellation date                             NaN
15                     Subordinated                             Yes
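
To confirm that the value cells really are empty in the static HTML (and that the selector is not the problem), a quick check like the one below may help. It is only a sketch: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the `static_html` snippet, `TableRowParser`, and `rows_with_empty_values` are hypothetical names and data made up for illustration, not the real page source.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_with_empty_values(html):
    """Return the labels of two-cell rows whose second cell has no text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [row[0] for row in parser.rows if len(row) == 2 and not row[1]]


# Hypothetical sample mimicking what requests receives before JavaScript runs.
static_html = """
<table class="table widget-table"><tbody>
  <tr><td>Issuer</td><td> </td></tr>
  <tr><td>Industry</td><td></td></tr>
  <tr><td>Market</td><td>Open Market</td></tr>
</tbody></table>
"""

empty_labels = rows_with_empty_values(static_html)
print(empty_labels)
```

If the labels come back with empty values for the HTML returned by requests but the browser shows them filled in, the values are being added after page load, which is the situation described in this answer.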

Another solution, using just requests. Note that to obtain the result from the server you have to set the required headers (they can be seen in the browser's Developer Tools -> Network tab).

import requests

url = (
    "https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230"
)

headers = {
    "X-Client-TraceId": "d87b41992f6161c09e875c525c70ffcf",
    "X-Security": "d361b3c92e9c50a248e85a12849f8eee",
    "Client-Date": "2022-08-25T09:07:36.196Z",
}

data = requests.get(url, headers=headers).json()
print(data)

Prints:

{
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "subSegment": None,
    "cupon": 2.522,
    "interestPaymentPeriod": None,
    "firstAnnualPayDate": "2006-06-30",
    "minimumInvestmentAmount": 1000.0,
    "issuer": "Fürstenberg Capital Erste GmbH",
    "issueDate": "2005-04-04",
    "issueVolume": 61203000.0,
    "circulatingVolume": 61203000.0,
    "issueCurrency": "EUR",
    "firstTradingDay": "2012-06-27",
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "extraordinaryCancellation": None,
    "portfolioCurrency": "EUR",
    "subordinated": True,
    "flatNotation": {"originalValue": "01", "translations": {"others": "flat"}},
    "quotationType": {
        "originalValue": "2",
        "translations": {"de": "Prozentnotiert", "en": "Percent"},
    },
}
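
The nested response above mixes plain values with objects that carry an originalValue and a translations map. If a flat label/value view like the table on the page is wanted, something along these lines might work. This is a sketch under assumptions: `flatten_master_data` is a hypothetical helper, the `sample` dict is a subset copied from the output above, and the fallback order (requested language, then "others", then the raw code) is my own choice rather than anything the API documents.

```python
def flatten_master_data(data, lang="en"):
    """Map each field to a plain value, preferring the `lang` translation."""
    flat = {}
    for key, value in data.items():
        if isinstance(value, dict) and "translations" in value:
            translations = value["translations"]
            # Fall back to the language-neutral "others" entry, then the raw code.
            flat[key] = (
                translations.get(lang)
                or translations.get("others")
                or value.get("originalValue")
            )
        else:
            # Plain values (strings, numbers, booleans, None) pass through unchanged.
            flat[key] = value
    return flat


# Subset of the response printed above.
sample = {
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "subordinated": True,
}

flat = flatten_master_data(sample)
for field, value in flat.items():
    print(f"{field}: {value}")
```

Passing lang="de" would pick the German labels where they exist, for example "Freiverkehr" for the market field.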

7 Comments

+1 (again, faster typing). By the way (and unrelated to this answer), have you considered improving the bs4 documentation? Some things are undocumented, like pseudo-selectors, or under-documented, like next_siblings, the subtle differences between find/select_one, etc. You obviously know it by heart :)
@platipus_on_fire Oh, my first language isn't English, so it would probably look really funny :) About CSS pseudo-selectors: bs4 uses the soupsieve library, which has quite nice documentation (I have had it in my bookmarks for a long time ;) facelessuser.github.io/soupsieve/selectors/pseudo-classes
@AndrejKesely Thanks for your reply. I tried your code but got an empty result: {}. Then I tried the code below and it still returns an empty response {}: url = ( "https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230" ) headers = { "X-Client-TraceId": "118baf9d0dfe0d50efbc755822d39a36", "X-Security": "6dc1a08707798c575bbb35eb71f71dd2", "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "accept": "application/json, text/plain, */*" }
@YogitaNegi Your headers are missing the Client-Date header
@AndrejKesely It still is blank.
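
For anyone debugging the same empty {} response, a small pre-flight check on the headers can rule out the simplest cause. This is only a sketch: `missing_headers` and `REQUIRED_HEADERS` are hypothetical names, and treating the three headers from the answer as mandatory is an assumption based on this thread. Whether the server also validates the header values is not established here, so passing this check does not guarantee a non-empty result.

```python
# Header names taken from the answer's request; treating all three as
# mandatory is an assumption, not documented behaviour of the API.
REQUIRED_HEADERS = ("X-Client-TraceId", "X-Security", "Client-Date")


def missing_headers(headers, required=REQUIRED_HEADERS):
    """Return the required header names that are absent from `headers`.

    HTTP header names are case-insensitive, so the comparison is too.
    """
    present = {name.lower() for name in headers}
    return [name for name in required if name.lower() not in present]


# Headers from the comment that reported an empty {} response.
headers = {
    "X-Client-TraceId": "118baf9d0dfe0d50efbc755822d39a36",
    "X-Security": "6dc1a08707798c575bbb35eb71f71dd2",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    "accept": "application/json, text/plain, */*",
}

missing = missing_headers(headers)
print(missing)
```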