
Is there a relatively easy way to get the combined attributes of one class using some sort of parser in Python, or do I have to come up with a regex instead?

.container_12, .container_16 {
    margin-left:auto;
    margin-right:auto;
    width:960px
}
.grid_1, .grid_2, .grid_3, .grid_4, .grid_5 {
    display:inline;
    float:left;
    margin-left:10px;
    margin-right:10px
}
.featured_container .container_12 .grid_4 a {
    color: #1d1d1d;
    float: right;
    width: 235px;
    height: 40px;
    text-align: center;
    line-height: 40px;
    border: 4px solid #141a20;
}

For the CSS snippet above, if I search for "container_12" it should return:

{
    margin-left:auto;
    margin-right:auto;
    width:960px
    color: #1d1d1d;
    float: right;
    width: 235px;
    height: 40px;
    text-align: center;
    line-height: 40px;
    border: 4px solid #141a20;
}

Duplicate attributes are fine, I will use a dictionary to store them afterwards, so it will not be a problem.

2 Answers


Here is a rough parser for your CSS:

import pyparsing as pp

# punctuation is important during parsing, but just noise afterwards; suppress it
LBRACE, RBRACE = map(pp.Suppress, "{}")

# read a ':' and any following whitespace
COLON = (":" + pp.Empty()).suppress()

obj_ref = pp.Word(".", pp.alphanums+'_') | pp.Word(pp.alphas, pp.alphanums+'_')
attr_name = pp.Word(pp.alphas, pp.alphanums+'-_')
attr_spec = pp.Group(attr_name("name") + COLON + pp.restOfLine("value"))

# each of your format specifications is one or more comma-delimited lists of obj_refs,
# followed by zero or more attr_specs in {}'s
# using a pp.Dict will auto-define an associative array from the parsed keys and values
spec = pp.Group(pp.delimitedList(obj_ref)[1,...]('refs')
                + LBRACE
                + pp.Dict(attr_spec[...])("attrs")
                + RBRACE)

# the parser will parse 0 or more specs    
parser = spec[...]

Parsing your css source:

# css_source is a string holding the CSS from the question
result = parser.parseString(css_source)
print(result.dump())

Gives:

[['.container_12', '.container_16', [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]], ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5', [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]], ['.featured_container', '.container_12', '.grid_4', 'a', [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]]]
[0]:
  ['.container_12', '.container_16', [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]]
  - attrs: [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]
    - margin-left: 'auto;'
    - margin-right: 'auto;'
    - width: '960px'
  - refs: ['.container_12', '.container_16']
[1]:
  ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5', [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]]
  - attrs: [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]
    - display: 'inline;'
    - float: 'left;'
    - margin-left: '10px;'
    - margin-right: '10px'
  - refs: ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5']
[2]:
  ['.featured_container', '.container_12', '.grid_4', 'a', [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]]
  - attrs: [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]
    - border: '4px solid #141a20;'
    - color: '#1d1d1d;'
    - float: 'right;'
    - height: '40px;'
    - line-height: '40px;'
    - text-align: 'center;'
    - width: '235px;'
  - refs: ['.featured_container', '.container_12', '.grid_4', 'a']

Using a defaultdict(dict) to accumulate attributes by referenced CSS object:

from collections import defaultdict
accum = defaultdict(dict)
for res in result:
    for name in res.refs:
        accum[name].update(res.attrs)

from pprint import pprint
pprint(accum['.container_12'])

Gives:

{'border': '4px solid #141a20;',
 'color': '#1d1d1d;',
 'float': 'right;',
 'height': '40px;',
 'line-height': '40px;',
 'margin-left': 'auto;',
 'margin-right': 'auto;',
 'text-align': 'center;',
 'width': '235px;'}

Comments

I tried using this exact solution; maybe I am doing something wrong, but would it be able to handle entire CSS files? I am calling it like this: temp_css_file = open(root + "/" + "style.css", "r"); temp_css_content = temp_css_file.read(); temp_css_file.close(); parse_css(temp_css_content, ".container_12"). The problem is that both prints come back empty. The only change I made was to wrap the code in a function. The file I pass in is valid, so the problem is not there.
I tried it on a short string and got the same result: test_text = ".container_12, .container_16 {margin-left:auto; margin-right:auto; width:960px}"; result = parser.parseString(test_text); print("<---- This is the parse result --->"); print(result.dump()). Output: <---- This is the parse result ---> []
The parser relies on the line breaks in your original CSS example to detect the end of each value, and your short text does not put each key:value; pair on its own line. Try test_text = ".container_12, .container_16 {\nmargin-left:auto;\n margin-right:auto;\n width:960px\n}". If you need to support multiple key:value pairs on the same line, you'll need to refine the definition of attr_spec to read only up to the next ';' or '}' instead of using restOfLine.
You can also try using searchString instead of parseString if you want to sift through a full CSS file looking for these. No guarantees on what other unwanted stuff this might match. This is far from a full CSS parser.
My files usually don't have several attributes on the same line; that was just an example of me trying to figure out why my file doesn't seem to work. This is one of the files I tested it with: pastebin. I am reading it in Python as shown in my comment above. Is there any conversion I should do to make it work with your parser?
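A sketch of the two suggestions above combined, assuming pyparsing is installed: attr_value reads only up to the next ';' or '}' instead of using restOfLine, so declarations may share a line, and scanString (the generator that searchString is built on) walks a whole file, skipping text the grammar does not match instead of stopping at the first mismatch. The helper name collect_attrs is hypothetical, and ignoring /* ... */ comments via cStyleComment is an addition not in the original answer. It is still far from a full CSS parser: #ids, pseudo-classes such as :hover, combinators such as > and quoted strings are not handled.

```python
import pyparsing as pp

# Punctuation drives the parse but is noise afterwards, so suppress it.
LBRACE, RBRACE, SEMI, COLON = map(pp.Suppress, "{};:")

obj_ref = pp.Word(".", pp.alphanums + "_") | pp.Word(pp.alphas, pp.alphanums + "_")
attr_name = pp.Word(pp.alphas, pp.alphanums + "-_")

# Unlike restOfLine, stop the value at the next ';' or '}', so several
# declarations may share one line; strip surrounding whitespace/newlines.
attr_value = pp.Regex(r"[^;}]+").setParseAction(lambda t: t[0].strip())
attr_spec = pp.Group(attr_name("name") + COLON + attr_value("value") + pp.Optional(SEMI))

spec = pp.Group(
    pp.delimitedList(obj_ref)[1, ...]("refs")
    + LBRACE
    + pp.Dict(attr_spec[...])("attrs")
    + RBRACE
)
# Assumption: skip /* ... */ comments so selectors inside them are not matched.
spec.ignore(pp.cStyleComment)


def collect_attrs(css_text, selector):
    """Merge the declarations of every rule whose selector list mentions `selector`."""
    merged = {}
    for tokens, _start, _end in spec.scanString(css_text):
        rule = tokens[0]
        # ParseResults' `in` operator tests result *names*, so compare with a plain list.
        if selector in rule.refs.asList() and rule.attrs:
            merged.update(rule.attrs.asDict())  # later rules win on repeated keys
    return merged
```

Called as collect_attrs(temp_css_content, ".container_12") on the text read from the file, this returns one merged dict, with later rules overriding earlier ones when a property repeats, which matches the dictionary behaviour the question asks for.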

You can use:

\.container_12\b[^{]*{([\s\S]*?)}

and the declarations you want will be in capture group 1 (\1), so iterate over the matches and process each one as needed.

https://regex101.com/r/AIR8W8/1
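A minimal Python sketch of that iteration, assuming the declarations from every matching rule should be merged into one dict with later rules overriding earlier ones. The helper name collect_attrs_regex is hypothetical; the pattern is the one above with the class name passed through re.escape and the braces escaped explicitly. Like any regex approach, it knows nothing about comments or quoted strings, so a commented-out rule or a value containing ';' or '}' will confuse it.

```python
import re


def collect_attrs_regex(css_text, class_name):
    """Merge the declarations of every rule whose selector contains .class_name."""
    # Same pattern as above; the class name is escaped so it is matched literally.
    pattern = re.compile(r"\." + re.escape(class_name) + r"\b[^{]*\{([\s\S]*?)\}")
    merged = {}
    for body in pattern.findall(css_text):  # body is capture group 1 (\1)
        for declaration in body.split(";"):
            name, sep, value = declaration.partition(":")
            if sep:  # skip pieces without a ':', e.g. the whitespace after the last ';'
                merged[name.strip()] = value.strip()
    return merged
```

Splitting on the first ':' only (via str.partition) keeps values such as url(http://...) intact.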

