How to select element using regex and an attribute

Question

I would like to scrape LinkedIn for personal use only (I need to get the posts from a friend's company page), and I'm using Selenium and BeautifulSoup for this.

I found that each post is a div with the ember-view class, but sponsored posts, which I don't want to scrape, also have this class. After more digging in the HTML, I found that I could select user posts by selecting all divs whose data-urn attribute has a value of the form urn:li:activity:XXXXXXXXXX.

However, XXXXXXXXXX is a different number in each post div. How can I select all divs with data-urn=urn:li:activity:XXXXXXXXXX, given that the number changes from div to div?

dabingsou · Accepted Answer · 2020-03-18 23:41:32Z


Another solution, using the simplified_scrapy library.

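from simplified_scrapy import SimplifiedDoc
# Sample markup: three activity posts and one div without a data-urn attribute
html = '''
<div>
  <div class="ember-view" data-urn="urn:li:activity:123">123</div>
  <div class="ember-view" data-urn=urn:li:activity:456>456</div>
  <div class="ember-view" data-urn=urn:li:activity:789>789</div>
  <div class="ember-view">other</div>
</div>
'''
doc = SimplifiedDoc(html)
# First way: match the data-urn attribute with a regex, restricted to div tags
# (raw strings keep \s and \d from being treated as string escapes)
divs = doc.getElementsByReg(r'data-urn[\s"=]+urn:li:activity:[\d]+', tag="div").text
print(divs)
# Second way: select div.ember-view, then keep those whose data-urn matches the regex
divs = doc.selects('div.ember-view').containsReg(r'urn:li:activity:[\d]+', attr="data-urn").text
print(divs)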

Result:

['123', '456', '789']
['123', '456', '789']
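
Since the question mentions BeautifulSoup, the same filter can probably be written with it directly. The following is a minimal sketch, not part of the original answer: it assumes the bs4 package is installed, reuses the sample markup above (with every attribute quoted), and the names activity_urn, posts and post_texts are made up for illustration. It passes a compiled regex as the attribute value to find_all, and also shows a CSS prefix selector as an alternative that needs no regex.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modelled on the answer above (not real LinkedIn HTML)
html = '''
<div>
  <div class="ember-view" data-urn="urn:li:activity:123">123</div>
  <div class="ember-view" data-urn="urn:li:activity:456">456</div>
  <div class="ember-view" data-urn="urn:li:activity:789">789</div>
  <div class="ember-view">other</div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# Keep only divs whose data-urn is "urn:li:activity:" followed by digits
activity_urn = re.compile(r"^urn:li:activity:\d+$")
posts = soup.find_all("div", attrs={"data-urn": activity_urn})
post_texts = [div.get_text(strip=True) for div in posts]
print(post_texts)

# Alternative: a CSS attribute-prefix selector, without a regex
css_texts = [
    div.get_text(strip=True)
    for div in soup.select('div[data-urn^="urn:li:activity:"]')
]
print(css_texts)
```

With Selenium, the string to parse would typically come from driver.page_source. The regex version is stricter, because it requires the suffix to be all digits, while the CSS version only checks the prefix.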
