I have a CSS file generated by some tool, and it's formatted like this:

@font-face {
    font-family: 'icomoon';
    src:url('fonts/icomoon.eot?4px1bm');
    src:url('fonts/icomoon.eot?#iefix4px1bm') format('embedded-opentype'),
        url('fonts/icomoon.woff?4px1bm') format('woff'),
        url('fonts/icomoon.ttf?4px1bm') format('truetype'),
        url('fonts/icomoon.svg?4px1bm#icomoon') format('svg');
    font-weight: normal;
    font-style: normal;
}

[class^="icon-"], [class*=" icon-"] {
    font-family: 'icomoon';
    speak: none;
    font-style: normal;
    font-weight: normal;
    font-variant: normal;
    text-transform: none;
    line-height: 1;

    /* Better Font Rendering =========== */
    -webkit-font-smoothing: antialiased;
    -moz-osx-font-smoothing: grayscale;
}

.icon-pya:before {
    content: "\e60d";
}
.icon-pyp:before {
    content: "\e60b";
}
.icon-tomb:before {
    content: "\e600";
}
.icon-right:before {
    content: "\e601";
}

I want to use a regular expression in Python to extract every CSS selector that starts with .icon-, together with its associated value, e.g.:

{key: '.icon-right:before', value: 'content: "\e601";'}

I only have basic knowledge of regular expressions, so I wrote this: \^.icon.*\, but it only matches the keys, not the values.

  • In which language will you apply this regex? Is it JavaScript? Commented May 31, 2014 at 3:51
  • Actually, in Python, but I think it shouldn't matter, right? Commented May 31, 2014 at 4:04
  • Hey Leo, did one of the answers help with the problem, or are you still wrestling with it? Commented Jun 1, 2014 at 22:03
  • Yep, both of your answers are detailed enough. Thanks :) Commented Jun 3, 2014 at 0:49

2 Answers


If you're using Python, this regex works:

(\.icon-[^\{]*?)\s*\{\s*([^\}]*?)\s*\}

Example:

>>> css = r"""
... /* ... etc ... */
... .icon-right:before {
...     content: "\e601";
... }
... """
>>> import re
>>> pattern = re.compile(r"(\.icon-[^\{]*?)\s*\{\s*([^\}]*?)\s*\}")
>>> re.findall(pattern, css)
[
    ('.icon-pya:before', 'content: "\\e60d";'),
    ('.icon-pyp:before', 'content: "\\e60b";'),
    ('.icon-tomb:before', 'content: "\\e600";'),
    ('.icon-right:before', 'content: "\\e601";')
]

You can then convert that to a dictionary easily:

>>> dict(re.findall(pattern, css))
{
    '.icon-right:before': 'content: "\\e601";',
    '.icon-pya:before': 'content: "\\e60d";',
    '.icon-tomb:before': 'content: "\\e600";',
    '.icon-pyp:before': 'content: "\\e60b";'
}

This is usually a more sensible data structure than a sequence of {'key': ..., 'value': ...} dictionaries - if you must have the latter, I'll assume you have enough Python to work out how to get it.
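For completeness, here is a minimal sketch of that conversion, assuming the same pattern as above. The sample css string and the names records, selector, and body are illustrative only.

```python
import re

# Sample input; a raw string keeps the backslash in "\e600" literal.
css = r"""
.icon-tomb:before {
    content: "\e600";
}
.icon-right:before {
    content: "\e601";
}
"""

pattern = re.compile(r"(\.icon-[^\{]*?)\s*\{\s*([^\}]*?)\s*\}")

# One dictionary per rule, in the order the rules appear in the input.
records = [
    {"key": selector, "value": body}
    for selector, body in pattern.findall(css)
]

for record in records:
    print(record)
```

Unlike the plain dict, this list keeps duplicate selectors if the same one appears more than once in the file.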

Okay, that was a pretty complex regex, so taking it piece by piece:

(\.icon-[^\{]*?)

This is the first capturing group, delimited by parentheses. Inside those, we've got \.icon-, followed by [^\{]*? - which is a sequence of 0 or more (*) but as few as possible (?) of anything but a '{' ([^\{]).

Then, there's a non-captured section:

\s*\{\s*

This means any amount of whitespace (\s*), followed by a '{' (\{), followed by any amount of whitespace (\s*).

Next, our second capturing group, again enclosed in parentheses:

([^\}]*?)

... which is 0 or more (*) but as few as possible (?) of anything but a '}' ([^\}]).

Finally, the last non-captured section:

\s*\}

... which is any amount of whitespace (\s*), followed by a '}' (\}).

In case you're wondering, the reason for using *? (0 or more but as few as possible, known as a non-greedy match) is so that the \s* that follows each group (any amount of whitespace) can consume as much whitespace as possible, which means the trailing whitespace won't end up inside the captured groups.
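To see the difference concretely, the sketch below runs the pattern against a rule with extra spaces, next to a greedy variant with the ? removed. The sample rule string and the names lazy and greedy are made up for illustration.

```python
import re

# A single rule with extra spaces before "{" and before "}".
rule = r'.icon-right:before   { content: "\e601";   }'

# Non-greedy groups, as in the pattern above.
lazy = re.compile(r"(\.icon-[^\{]*?)\s*\{\s*([^\}]*?)\s*\}")
# Same pattern with greedy groups (the ? after each * removed).
greedy = re.compile(r"(\.icon-[^\{]*)\s*\{\s*([^\}]*)\s*\}")

lazy_groups = lazy.search(rule).groups()
greedy_groups = greedy.search(rule).groups()

print(lazy_groups)
print(greedy_groups)
```

The greedy variant keeps the trailing spaces inside both groups, because [^\{]* and [^\}]* also match whitespace and grab it before the following \s* gets a chance to.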


1 Comment

Hi @zero, thanks for your detailed explanation. I still don't understand why you use \s*? instead of \s*, because I tried \s* and it works as well. Can you give me an example where only \s*? works and \s* doesn't?

With your current content, this regex would work:

(\.icon-[^\s{]+)\s*{\s*([^;]*;)

See demo (look at the substitutions at the bottom)

The name would get captured to Group 1, and the rule to Group 2.

To output in the format you specified, you have several options.

For instance, tweak the regex slightly and replace with

{key: '\1', value: '\2' }

This assumes only one rule per set of braces.
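One way that substitution could look in Python, as a sketch: the trailing \s*\} added to the pattern is an assumed version of the tweak (so that the closing brace is consumed as well), the braces are escaped for readability, and the sample css string and variable names are illustrative.

```python
import re

# Sample input; a raw string keeps the backslash in "\e600" literal.
css = r"""
.icon-tomb:before {
    content: "\e600";
}
.icon-right:before {
    content: "\e601";
}
"""

# Assumed tweak: the trailing \s*\} makes each match cover the whole rule,
# so the closing brace is replaced along with everything else.
rule = re.compile(r"(\.icon-[^\s{]+)\s*\{\s*([^;]*;)\s*\}")

# \1 and \2 in the replacement refer to Group 1 and Group 2.
converted = rule.sub(r"{key: '\1', value: '\2'}", css)
print(converted)
```

Only the first declaration inside each pair of braces ends up in Group 2, which matches the one-rule-per-block assumption above.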

A better option is to find all the matches, then for each match output the string you want, concatenating from the Group 1 and Group 2 captures.

Here is a start:

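import re

reobj = re.compile(r"(\.icon-[^\s{]+)\s*{\s*([^;]*;)")
for match in reobj.finditer(subject):  # subject holds the CSS text
    selector = match.group(1)  # Group 1: the selector
    rule = match.group(2)      # Group 2: the rule
    print("{key: '%s', value: '%s'}" % (selector, rule))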
