I am trying to scrape a page whose content is loaded via AJAX. I'm just learning Python, so sorry if this is an easy question.

I use Selenium to load the page and grab a piece of its HTML. That part works exactly as I want, but I don't know how to parse the data.

I would like the data to look like this (ideally stored in a variable, because I then want to transfer it to a MySQL database):

Custom ID:
Name:
Ticket NO:
Rate:
Win:
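
To make the target shape concrete, here is a minimal sketch of how one record could be held in a variable. The class name `Ticket`, its field names, and the choice of `Decimal` for the amounts are assumptions for illustration only; the values are the placeholders from the HTML template plus a sample amount, not real data.

```python
from dataclasses import dataclass, astuple
from decimal import Decimal


@dataclass
class Ticket:
    """One message with a shared ticket (hypothetical field names)."""
    custom_id: str
    name: str
    ticket_no: str   # kept as a string: it is an identifier, not a quantity
    rate: Decimal
    win: Decimal


# Placeholder values, not scraped data.
example = Ticket("CUSTOM ID", "NAME", "TICKET NO", Decimal("8.00"), Decimal("51.04"))

# astuple() returns the values in field order, which suits a parameterised INSERT.
row = astuple(example)
print(row)
```

A tuple in field order like `row` can be passed as the parameters of a single INSERT statement, so a list of such tuples would cover a whole page of messages.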

Where the data sits in the HTML:

<li class="message">
    <div customid="CUSTOM ID">
        <span class="name nc-mark-user">NAME</span>
        <p>
            <span><img src="https://cht.sts.pl/assets/img/accepted.svg" width="15" height="15"> <span class="nc-ticket" onclick="serchTicketHandler('TICKET NO')">RATE / WIN zł</span></span>
        </p>
    </div>
</li>

My Python code:

import time
from selenium import webdriver
from bs4 import BeautifulSoup
from xml.dom import minidom


options = webdriver.ChromeOptions()
options.add_argument('headless')

browser = webdriver.Chrome(
            ("C:/Users/backu/Downloads/chromedriver_win32/chromedriver.exe"),
            chrome_options=options)

browser.get("https://www.sts.pl/pl/oferta/zaklady-live/")
time.sleep(1)
element = browser.find_element_by_class_name("nc-message-holder")

source = element.get_attribute('innerHTML')
print(source)

browser.close()

I don't know how to read this HTML now to extract the data I want.

Thank you in advance for any answers.

1 Answer

from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument('headless')

# Selenium 4 style; on Selenium 3, pass the driver path positionally and use chrome_options=options.
browser = webdriver.Chrome(
        service=Service("C:/Users/backu/Downloads/chromedriver_win32/chromedriver.exe"),
        options=options)

browser.get("https://www.sts.pl/pl/oferta/zaklady-live/")
sleep(1)  # crude fixed wait for the AJAX content to load
source = browser.page_source  # get the entire page source from the browser
browser.quit()  # the browser is no longer needed, so shut it down
soup = BeautifulSoup(source, 'html.parser')
try:
    # Get the message elements using a CSS selector.
    tags = soup.select('ul.nc-message-holder li.message')
    for tag in tags:
        div = tag.find('div')
        customer_id = div.get('customid')
        name = div.find('span').text
        # e.g. <span class="nc-ticket" onclick="serchTicketHandler('223461999015343335')">8.00 / 51.04 zł</span>
        ticket_tags = tag.select('span.nc-ticket')
        if ticket_tags:
            ticket = ticket_tags[0]
            # Strip the handler call so only the ticket number remains.
            ticket_num = ticket.get('onclick').replace("serchTicketHandler('", "").replace("')", "")
            rate_win = ticket.text
            if '/' in rate_win:
                rate, win = (part.strip() for part in rate_win.split('/', 1))
            else:
                rate = rate_win
                win = ''

            print('\n\ncustomerId ==>', customer_id)
            print('name ==>', name)
            print('ticketNum ==>', ticket_num)
            print('rate ==>', rate)
            print('win ==>', win)
except Exception as e:
    print(e)

Output:

customerId ==> c46654fa66765ae11bb34d7d99cf0a77
name ==> Wojciech W
ticketNum ==> 223461999016744267
rate ==> 100.00
win ==> 1340.24 zł


customerId ==> 7b071de240b730ad42cee50711dd8c72
name ==> Grzegorz P
ticketNum ==> 223461988025841282
rate ==> 15.94
win ==> 46.28 zł


customerId ==> 244950ab8485b7180c177a2b7b19b0ae
name ==> Michał J
ticketNum ==> 313441988030838257
rate ==> 12.00
win ==> 73967.98 zł


customerId ==> 9223e1c2f87afb02e6c704acb53308da
name ==> Piotr G
ticketNum ==> 313431999017162038
rate ==> 2.00
win ==> 430.40 zł


customerId ==> 4a8e2695fe71a084f69167ac987c7013
name ==> Dawid B
ticketNum ==> 313461988013246357
rate ==> 10.00
win ==> 1569.30 zł


customerId ==> 6b882a5ef93e0c3e52b81bbee0ba52af
name ==> Adrian P
ticketNum ==> 313441988034262951
rate ==> 2.00
win ==> 451268.63 zł


customerId ==> abd34ea0c7a9b0e07a53a78324cb7e0a
name ==> Michał D
ticketNum ==> 223461999013746135
rate ==> 10.00
win ==> 27.72 zł


customerId ==> bed4fc0ea1f21a7a9b1c6762d2302d09
name ==> Rafał Ż
ticketNum ==> 223461988021146803
rate ==> 607.40
win ==> 2150.26 zł
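
Since the question mentions moving the data to MySQL, here is a hedged sketch of the remaining steps: turning the scraped strings into clean values and inserting them with a parameterised query. The helper names `parse_ticket_no` and `parse_amounts` and the `tickets` table are hypothetical. The standard-library `sqlite3` module stands in for MySQL so the example runs anywhere; MySQL drivers such as mysql-connector-python and PyMySQL follow the same DB-API pattern but use `%s` placeholders instead of `?`.

```python
import re
import sqlite3
from decimal import Decimal

# Matches the digits inside serchTicketHandler('...') in the onclick attribute.
ONCLICK_RE = re.compile(r"serchTicketHandler\('(\d+)'\)")


def parse_ticket_no(onclick):
    """Return the ticket number from an onclick value, or None if absent."""
    match = ONCLICK_RE.search(onclick or "")
    return match.group(1) if match else None


def parse_amounts(text):
    """Split 'RATE / WIN zł' into two Decimals; win is None when missing."""
    cleaned = text.replace("zł", "").strip()
    rate, _, win = (part.strip() for part in cleaned.partition("/"))
    return Decimal(rate), (Decimal(win) if win else None)


# In-memory SQLite database as a stand-in for the MySQL target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets "
    "(custom_id TEXT, name TEXT, ticket_no TEXT, rate TEXT, win TEXT)"
)

# Sample strings in the same form as the ticket span quoted in the answer's code.
ticket_no = parse_ticket_no("serchTicketHandler('223461999015343335')")
rate, win = parse_amounts("8.00 / 51.04 zł")

# Parameterised insert: values are passed separately from the SQL text.
conn.execute(
    "INSERT INTO tickets VALUES (?, ?, ?, ?, ?)",
    ("CUSTOM ID", "NAME", ticket_no, str(rate), str(win)),
)
stored = conn.execute("SELECT ticket_no, rate, win FROM tickets").fetchall()
print(stored)
```

Passing the values as parameters rather than formatting them into the SQL string lets the driver handle quoting, which matters for names containing apostrophes or other special characters.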