
I'm trying to parse HTML and extract each CSS class name. The problem I'm running into is splitting the names apart when several appear in the same class attribute of a div.

import re

html = '<div class="col-xl-4 col-md-6"> <div class="card hover-translate-y-n3 hover-shadow-lg overflow-hidden"><div class="position-relative overflow-hidden">'

css = re.findall(r'(?:class=")([^"]*)', html)

Current Output: ['col-xl-4 col-md-6', 'card hover-translate-y-n3 hover-shadow-lg overflow-hidden', 'position-relative overflow-hidden']

Desired Output: ['col-xl-4', 'col-md-6', 'card', 'hover-translate-y-n3', 'hover-shadow-lg', 'overflow-hidden', 'position-relative', 'overflow-hidden']

  • First off, regex is generally not the correct tool to use for parsing HTML. Use an HTML parser instead. Second, why not just call [item.split() for item in current_results]? Commented Oct 15, 2022 at 0:44

1 Answer


You can flatten the result by calling split() on each matched string and collecting the pieces into a single list:

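import re

html = '<div class="col-xl-4 col-md-6"> <div class="card hover-translate-y-n3 hover-shadow-lg overflow-hidden"><div class="position-relative overflow-hidden">'
# Each match is the full value of one class attribute, e.g. 'col-xl-4 col-md-6'
css = re.findall(r'(?:class=")([^"]*)', html)
# Split every attribute value on whitespace and flatten into one list
css = [i for item in css for i in item.split()]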

Output:

In [1]: print([i for item in css for i in item.split()])
['col-xl-4', 'col-md-6', 'card', 'hover-translate-y-n3', 'hover-shadow-lg', 'overflow-hidden', 'position-relative', 'overflow-hidden']
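
As the comment on the question suggests, an HTML parser is usually a safer choice than a regex for this. The following is only a minimal sketch using the standard library's html.parser module; the class name ClassCollector is made up for illustration and is not part of the original answer. It assumes you only need the class attribute of start tags and want duplicates kept in document order.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Hypothetical helper: gathers class names from every start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute holds whitespace-separated names
                self.classes.extend(value.split())


html = '<div class="col-xl-4 col-md-6"> <div class="card hover-translate-y-n3 hover-shadow-lg overflow-hidden"><div class="position-relative overflow-hidden">'

collector = ClassCollector()
collector.feed(html)
collector.close()
print(collector.classes)
```

For the sample html above this should print the same flat list as the regex version. Unlike the regex, it also copes with single-quoted attributes and extra whitespace around the equals sign.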