I would like to scrape LinkedIn for personal use only (I need to get the posts from a friend's company page), and I'm using Selenium and BeautifulSoup for this.

I found that each post is a div and that they all have the ember-view class. However, sponsored posts, which I don't want to scrape, have this class too. Digging further into the HTML, I found that I could select user posts by selecting all divs whose data-urn attribute has the value urn:li:activity:XXXXXXXXXX.

However, XXXXXXXXXX is a different number in each post div. How can I select all divs with data-urn=urn:li:activity:XXXXXXXXXX, given that the number changes from one div to the next?

1 Answer

Another solution, using the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc
html='''
<div>
  <div class="ember-view" data-urn="urn:li:activity:123">123</div>
  <div class="ember-view" data-urn=urn:li:activity:456>456</div>
  <div class="ember-view" data-urn=urn:li:activity:789>789</div>
  <div class="ember-view">other</div>
</div>
'''
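doc = SimplifiedDoc(html)
# First way: match the data-urn attribute and its value with a regex
divs = doc.getElementsByReg(r'data-urn[\s"=]+urn:li:activity:[\d]+', tag="div").text
print(divs)
# Second way: select by class, then keep the divs whose data-urn matches the regex
divs = doc.selects('div.ember-view').containsReg(r'urn:li:activity:[\d]+', attr="data-urn").text
print(divs)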

Result:

['123', '456', '789']
['123', '456', '789']
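
Since the question already uses BeautifulSoup, the same filter can probably be written without an extra library. The sketch below is only an illustration under a few assumptions: the sample HTML is made up, the urn:li:sponsored:999 value is a hypothetical stand-in for a non-activity post, and the real page may structure its attributes differently. It shows two options: a CSS attribute-prefix selector, and a compiled regex passed to find_all.

```python
import re

from bs4 import BeautifulSoup

# Made-up sample markup; "urn:li:sponsored:999" is a hypothetical
# non-activity value used only to show that it gets filtered out.
html = """
<div>
  <div class="ember-view" data-urn="urn:li:activity:123">123</div>
  <div class="ember-view" data-urn="urn:li:activity:456">456</div>
  <div class="ember-view" data-urn="urn:li:sponsored:999">ad</div>
  <div class="ember-view">other</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: CSS selector, "^=" matches attribute values that start with the prefix.
by_css = soup.select('div[data-urn^="urn:li:activity:"]')

# Option 2: a compiled regex as the attribute filter; this also checks
# that the part after the prefix consists of digits only.
pattern = re.compile(r"^urn:li:activity:\d+$")
by_regex = soup.find_all("div", attrs={"data-urn": pattern})

css_texts = [div.get_text(strip=True) for div in by_css]
regex_texts = [div.get_text(strip=True) for div in by_regex]
print(css_texts)
print(regex_texts)
```

With Selenium, the soup would presumably be built from the browser's rendered markup instead, for example BeautifulSoup(driver.page_source, "html.parser"), where driver is your existing WebDriver instance. The prefix selector is the shorter option; the regex is stricter if other values could share the same prefix.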