0

Basically, I want all the information from the 'Others' tab, from the first page to the last.

The website is a bit strange. I want to get every issuer and the other information under 'POST ISSUANCE'. This is what I tried:

driver.get('https://www.chinabondconnect.com/en/Primary/Primary-Information/Onshore.html')
wait = WebDriverWait(driver, 30)
driver.find_element_by_link_text('Others').click()
for i in range(1,20):
        pg = "tb2tr pg" + str(i)
        allitems = driver.find_element_by_xpath('//*[@id="td7"]/tbody/tr[@class=pg])')
        for i in range(len(allitems)):
            issuer = driver.find_element_by_xpath('(//tr[@class=pg]//td[1]//div[2]//div)').text
            print(issuer)

It says the XPath is not valid.

Could someone help with this?

Thanks!!

1
  • hi, better with selenium Commented Mar 2, 2021 at 10:40

4 Answers

1

Use find_elements() to get all the records and use get_attribute("textContent") to get the hidden node value.

for item in driver.find_elements_by_xpath("//table[@id='tb7']//tr[starts-with(@class,'tb2tr pg')]//td[1]/div[2]/div"):
    print(item.get_attribute("textContent"))

Output:

Central Huijin Investment Ltd.
Dongguan Rural Commercial Bank Co., Ltd.
Gemdale (Group) Co., Ltd.
Everbright Securities
China securities co ltd
Bank of China 
Jinan Rail Transit Group Co., Ltd.
Ping An Bank Co., Ltd.
Shaanxi Financial Holding Group Co., Ltd.
Bank of Suzhou Co., Ltd.
Chongqing Expressway Group Co., Ltd.
Shanghai World Expo Land Holdings Co., Ltd.
Beijing Capital Tourism Group Co., Ltd.
CMB Financial Leasing Co., Ltd.
Shaanxi Coal Industry Chemical Group Co., Ltd.
China Securities Co., Ltd.
Guangdong Electric Power Development Co., Ltd.
China Construction Bank 
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China Limited
China Securities Co., Ltd.
China Securities Co., Ltd.
China Bohai Bank
Shangrao Investment Holding Group SCP
China Securities Co., Ltd
Everbright Securities
Guangzhou Kaide Renewable Publicly Issued Corporate Bond
SCP/Guangzhou Development Zone Business Development Group
Qingdao City Investment Financial Holding Group Renewable Publicly Issued Corporate Bond
China Railway Construction Investment Group MTN
Qingdao Guoxin Development (Group) Co., Ltd.
China Securities Co., Ltd.
China Orient Asset Management Co., Ltd
    Datang International Power Generation Co.,Ltd.
Bank of China
Bank of China 
Datang International Power Generation Co.,Ltd. 
Hangzhou City Construction Investment Group Limited
YIBIN STATE OWNED ASSETS MANAGEMENT CO.,LTD.
China Railway Construction Investment Corporation
ABC Financial Leasing
Guangzhou Metro
Aluminum Corporation of China Limited
Fubon Bank
China Securities Co., Ltd.
Ganzhou Development Investment Holding Group
Shanghai rural Commercial Bank
Everbright Securities
ICBC Financial Leasing Co., Ltd
Shanghai Pudong Development Bank
China State Railway Group Co., Ltd.
China State Railway Group Co., Ltd.
CMB Financial Leasing
CMB Financial Leasing Co., Ltd.
Bank of China
Bank of China 
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China Limited
Industrial and Commercial Bank of China Limited
Bank of Communications Co.,Ltd.
Zhejiang State-owned Capital Operation Co., Ltd.
China Merchant Bank
China Merchants Bank
Bank of Communications Financial Leasing Co., Ltd.
CCB Financial Leasing Co., Ltd
Central Huijin Investment Ltd.
Central Huijin Investment Ltd.
China Securities Co., Ltd
Everbright Securities
Beijing Infrastructure Investment Co., LTD
Huishang Bank Corporation
Bank of Communication
China Nonferrous Metal Mining (Group) Co., Ltd
Everbright Securities
Industrial and Commercial Bank of China
Industrial and Commercial Bank of China Limited
China Securities Co., Ltd
China Everbright Bank Co., Ltd
Bank of China
... and so on
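
On Selenium 4 the find_elements_by_xpath helper is no longer available, so the same lookup has to go through find_elements(By.XPATH, ...). The following is a minimal sketch of that variant, not a tested scraper for this site: the function names issuer_xpath and scrape_issuers are hypothetical, and driver is assumed to be a WebDriver that has already opened the page and clicked the 'Others' tab.

```python
TABLE_ID = "tb7"                # table id taken from the answer above
ROW_CLASS_PREFIX = "tb2tr pg"   # row class prefix taken from the answer above


def issuer_xpath(table_id=TABLE_ID, class_prefix=ROW_CLASS_PREFIX):
    """Build the XPath from the answer above for the issuer cell of every row."""
    return (
        f"//table[@id='{table_id}']"
        f"//tr[starts-with(@class,'{class_prefix}')]"
        "//td[1]/div[2]/div"
    )


def scrape_issuers(driver):
    """Return the issuer names found on the page currently loaded in `driver`.

    Assumption: `driver` is a Selenium 4 WebDriver already on the 'Others' tab.
    """
    # Imported here so that issuer_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    cells = driver.find_elements(By.XPATH, issuer_xpath())
    # textContent is read instead of .text so that rows not currently
    # rendered on screen still return their text.
    return [(cell.get_attribute("textContent") or "").strip() for cell in cells]
```

Calling scrape_issuers(driver) should then give a list of strings comparable to the output above, provided the page structure is still the same.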

3 Comments

Hello, it works fine, thanks for your help!
May I ask why you use textContent rather than .text?
@Joyce : .text only works when the element is visible on the page. On the site you are scraping, you have to scroll before the element becomes visible, which is why .text gives an empty value. textContent retrieves the text of hidden nodes too, as long as they are present in the DOM. I hope that answers your question. Please mark this as accepted and vote for it. Thanks.
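
One way to act on this is a small fallback helper: try .text first and read textContent only when .text comes back empty. This is just a sketch; element_text is a hypothetical name, and the only thing it assumes about element is that it has a .text attribute and a get_attribute() method, as Selenium's WebElement does.

```python
def element_text(element):
    """Return the rendered text of an element, falling back to its DOM text.

    `.text` is empty for elements that are not displayed, so in that case
    the raw `textContent` attribute is used instead.
    """
    visible = element.text
    if visible and visible.strip():
        return visible.strip()
    raw = element.get_attribute("textContent")
    # get_attribute may return None if the attribute is missing.
    return (raw or "").strip()
```

The strip() calls are there because textContent keeps the whitespace from the HTML source, which is visible in a few of the indented lines in the output above.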
1
"//table[@id='tb7']/tbody//tr[starts-with(@class,'{}')]".format(pg)

Try this XPath for allitems, with find_elements rather than find_element. It grabs every tr in tb7 whose class starts with the "tb2tr pg" + str(i) value.

You could just use

for item in allitems:
    issuer = item.find_element_by_xpath('./td[1]/div[2]/div').get_attribute('textContent')
    print(issuer)

5 Comments

Hi, thanks for the help! But when I use for i in range(1,20): pg = "tb2tr pg" + str(i) allitems = driver.find_element_by_xpath("//table[@id='td7']/tbody//tr[starts-with(@class,'{}')]".format(pg)), it says it is unable to locate the element. I am not sure why.
I think the first page has an extra 'on' in its class (tb2tr pg1 on). I changed the loop to for i in range(2,20), but it still cannot find anything.
Thanks! But it returns blank. Does that mean the website cannot be scraped? Also, may I ask why you use tr[starts-with(@class,'{}')] and not tr[@class='{}']?
Hey, I tried with an example. There is nothing wrong with the website, but somehow I only get blank text back with for i in range(2,20): pg = "tb2tr pg" + str(i) driver.implicitly_wait(10) allitems = driver.find_elements_by_xpath("//table[@id='tb7']/tbody//tr[starts-with(@class,'{}')]".format(pg)) for item in allitems: issuer = item.find_element_by_xpath('./td[2]/div[2]/span').text print(issuer)
Use get_attribute('textContent') instead of text
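
On the starts-with question: an exact tr[@class='tb2tr pg1'] test fails as soon as the row carries an extra class such as the 'on' mentioned above, while a plain prefix test on "tb2tr pg1" would also match pg10 to pg19. A common XPath 1.0 idiom that avoids both problems is to match the page class as a whole space-separated token. The sketch below assumes the class values look like "tb2tr pgN" with an optional extra class, as the comments suggest; page_row_xpath and has_class_token are hypothetical helper names.

```python
def page_row_xpath(page, table_id="tb7"):
    """Build an XPath for the rows of one page, matching 'pgN' as a whole class token."""
    token = f"pg{page}"
    return (
        f"//table[@id='{table_id}']//tr["
        f"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')]"
    )


def has_class_token(class_attr, token):
    """Pure-Python equivalent of the XPath test, handy for checking the logic."""
    return token in class_attr.split()


# Exact match breaks on the extra 'on' class of the first page.
print("tb2tr pg1 on" == "tb2tr pg1")               # False
# A prefix match accepts page 10 when page 1 was requested.
print("tb2tr pg10".startswith("tb2tr pg1"))        # True
# Token matching handles both cases.
print(has_class_token("tb2tr pg1 on", "pg1"))      # True
print(has_class_token("tb2tr pg10", "pg1"))        # False
```

If all pages are wanted at once, the simpler starts-with(@class,'tb2tr pg') prefix from the first answer is enough and no per-page loop is needed.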
0

Correct me if I am wrong. I understand that you want to crawl the whole site, which means that when you click a link, it opens in a new window or tab. Selenium does not switch to a new window automatically; the driver stays focused on the original window until you tell it to switch. The way to solve this is:

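from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC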

# Start the driver
with webdriver.Firefox() as driver:
    # Open URL
    driver.get("https://seleniumhq.github.io")

    # Setup wait for later
    wait = WebDriverWait(driver, 10)

    # Store the ID of the original window
    original_window = driver.current_window_handle

    # Check we don't have other windows open already
    assert len(driver.window_handles) == 1

    # Click the link which opens in a new window
    driver.find_element(By.LINK_TEXT, "new window").click()

    # Wait for the new window or tab
    wait.until(EC.number_of_windows_to_be(2))

    # Loop through until we find a new window handle
    for window_handle in driver.window_handles:
        if window_handle != original_window:
            driver.switch_to.window(window_handle)
            break

    # Wait for the new tab to finish loading content

3 Comments

Hi, thanks, but the website already puts all the rows on one page, so I do not need to click through to a next page.
I see. Have you tried printing the element itself?
element = driver.find_element(By.TAG_NAME, "a")
0

# This waits for the page to load when it needs a lot of resources and reloads in your window.

import time
from selenium.webdriver.common.by import By

# XPath of an element that is only present once the page has loaded
loading = '//div[@values="elementVisible"]'

def createAudit(self):
    # Poll every 3 seconds until the marker element appears.
    # The cap of 10 attempts is arbitrary; the original loop ran only once.
    for attempt in range(10):
        if self.driver.find_elements(By.XPATH, loading):
            print("Page loaded correctly")
            time.sleep(3)
            break
        time.sleep(3)
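
The polling idea in this answer can also be written as a generic helper with an explicit timeout, so the script fails loudly instead of continuing on a page that never loaded. This is only a sketch: wait_until and its parameters are hypothetical names, not part of Selenium, which ships its own WebDriverWait for the same purpose.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call `predicate` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a driver at hand, the check from this answer would become wait_until(lambda: driver.find_elements(By.XPATH, loading)). The built-in alternative is WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, loading))).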
