JavaScript Regex over CSS: How to deal with erroneous classes?

Question

I am working in Node.js, reading a CSS file and extracting the class names/ID names and their associated properties. For that purpose, I use the following regexes (data is the file content that I receive from the callback function):

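  // Steps 1-3: put a "~" marker after every "}" and split into one chunk per rule
  data = data.replace(/\}/gm,"}\n")
  data = data.replace(/[\r\n|\n|\r]*\}[\r\n|\n|\r]*/gm,"}~")
  data = data.split("~")
  var regex = /[\.#a-z][a-z0-9\-]*\{.*\}/gi
  var results = []
  var result
  for(var i = 0;i<data.length;i++)
  {
    // Step 4: strip all whitespace (including newlines) inside the chunk
    data[i] = data[i].replace(/([\r\n|\n|\r|\s]*)/gm,"")
    while ( (result = regex.exec(data[i])) ) {
      results.push(result[0]);
    }
  }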

Which reads the following file content:

@color:#ffeedd;
.circle{
background:red;
}

#big-circle{color:green;}#small-circle{
color:yellow;
}

mango{
  color:brown;
}

And gives the output as

[ '.circle{background:red;}',
  '#big-circle{color:green;}',
  '#small-circle{color:yellow;}',
  'mango{color:brown;}' ]

A brief of what I have done:

1. I divide the whole CSS file on every }, i.e. the closing bracket of a class, by adding a \n after each one: data = data.replace(/\}/gm,"}\n")
2. I replace every } together with the newlines around it by a } and a ~: data = data.replace(/[\r\n|\n|\r]*\}[\r\n|\n|\r]*/gm,"}~")
3. I split the data on the ~ to get an array of classes/ids: data = data.split("~")
4. I remove all whitespace inside each of the classes: data[i] = data[i].replace(/([\r\n|\n|\r|\s]*)/gm,"")

However, there is an issue. This works well only if every class is properly closed with a }. If there is ever an error, it does not work properly. My question is: what regex or step can I apply to ensure that such errors are caught and shown to the user (much like the lessc compiler does)? I guess it takes much more than simple bracket matching (which can be implemented using a stack).

For example:

@color:#ffeedd;
.circle{
background:red;
}

#big-circle{color:green;}#small-circle{
color:yellow;


mango{
  color:brown;
}

lessc gives the following error for this input:

ParseError: Unrecognised input. Possibly missing something in /less/style.less on line 13, column 1:
12 }
13
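
The stack-based bracket matching mentioned above could look roughly like the sketch below. findBraceErrors is a hypothetical helper, not part of any library. It assumes braces never appear inside comments or strings, and it only detects unbalanced braces, not missing colons or semicolons.

```javascript
// Hypothetical helper: report unbalanced braces with 1-based line/column.
// Assumption: no "{" or "}" inside comments or quoted strings.
function findBraceErrors(source) {
  const errors = [];
  const stack = [];          // positions of the "{" that are still open
  let line = 1;
  let column = 0;

  for (const ch of source) {
    if (ch === "\n") {       // new line: reset the column counter
      line++;
      column = 0;
      continue;
    }
    column++;
    if (ch === "{") {
      stack.push({ line, column });
    } else if (ch === "}") {
      if (stack.length === 0) {
        errors.push({ line, column, message: "Unexpected '}'" });
      } else {
        stack.pop();         // closes the most recently opened "{"
      }
    }
  }

  // Whatever is left on the stack was opened but never closed.
  for (const open of stack) {
    errors.push({ line: open.line, column: open.column, message: "Unclosed '{'" });
  }
  return errors;
}
```

For the broken file above, this returns a single "Unclosed '{'" entry pointing at line 6, column 39, which is the brace that opens #small-circle. A stack is used rather than a counter because LESS allows nested blocks, so the position of each open brace has to be remembered.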

Thanks

A regex works on the assumption that the input is valid. A regex can reject erroneous input by failing to match it, but it would not be able to identify the precise error.
– BoltClock, Commented Mar 7, 2016 at 6:16

JDB · Accepted Answer · 2016-03-07 18:28:15Z

Take the simple approach... use a LESS parser rather than trying to roll one yourself.

Regex is useful for matching well-known text formats. It's terrible for matching unknown formats. In fact, matching every possible malformed input is not possible. You have noticed one particular error and want to make your ad-hoc parser roll with it, but what about other possible errors? Are you going to try and catch all of them?

Too many brackets

.circle{{
    background:red;
}

or

.circle{
    background:red;
}
}

Forgotten semi-colons

.circle{
    background:red
    color: yellow
}

Forgotten colons

.circle{
    background red
}

Mixed-up programmers

.circle{
    background=red
}

or

.circle(
    background:red
)

The list of possible errors is nearly endless, but your regex is never going to be smart enough to catch them. Use a proper LESS parser (possibly the in-client version with error reporting turned on).
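
As a rough sketch of what delegating to the parser might look like in Node.js: the hypothetical wrapper checkLess below takes the less module as its first argument (assumed to be whatever require("less") returns) and relies on less.render returning a promise that rejects with an error object carrying message, line, column and filename. The wrapper name and the shape of the returned object are illustrative assumptions.

```javascript
// Hypothetical wrapper around less.render.
// `less` is assumed to be the module returned by require("less").
async function checkLess(less, source, filename = "input.less") {
  try {
    // Resolves with an object whose `css` property holds the compiled CSS.
    const output = await less.render(source, { filename });
    return { ok: true, css: output.css };
  } catch (err) {
    // On a parse error, pass along the parser's own diagnostics unchanged.
    return {
      ok: false,
      message: err.message,
      line: err.line,
      column: err.column,
      filename: err.filename,
    };
  }
}
```

It could then be called as checkLess(require("less"), data).then(console.log), where data is the file content. All of the error detection is left to the parser, so every case listed above is handled without any extra regex.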
