It turns out that both of these patterns, which previously worked,

"`([\n\A;]+)\/\*(.+?)\*\/`ism" => "$1",     // error
"`([\n\A;\s]+)//(.+?)[\n\r]`ism" =>"$1\n",  // error

now throw an error in PHP 7.3:

Warning: preg_replace(): Compilation failed: escape sequence is invalid in character class offset 4

CONTEXT: consider this snippet, which removes CSS comments from a string:

$buffer = ".selector {color:#fff; } /* some comment to remove*/";
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",     // 7.3 error
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n", // 7.3 error
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
//returns cleaned up $buffer value with pure css and no comments

Refer to: https://stackoverflow.com/a/1581063/1293658

Q1 - Any ideas what's wrong with the regex in this case? This thread seems to suggest it's simply a misplaced backslash: https://github.com/thujohn/twitter/issues/250

Q2 - Is this a PHP 7.3 bug or a problem with the REGEX sequence in this code?

  • What are you trying to match with \A? If you check your regex on regex101.com, you'll see that it doesn't even match the first character class. The regex would match with \w\s, but I don't really know if that is what you wanted to match. Commented Sep 7, 2019 at 1:52
  • You might want to place the regex in single quotes, to avoid PHP escape sequence interpretation. Commented Sep 7, 2019 at 4:06
  • Can you please extract a minimal reproducible example? Also, if it works with 7.2 but fails with 7.3, check the release notes. Maybe the code relies on a bug that was fixed. Commented Sep 7, 2019 at 7:02
  • @slepic, using single quotes would require additional steps in order to get newlines and carriage returns in there. In particular, just replacing double quotes with single quotes changes the string content in this case. Commented Sep 7, 2019 at 7:04
  • What if you add (*NO_JIT) at the start of the pattern? Commented Sep 7, 2019 at 7:21

1 Answer

Do not use zero-width assertions inside character classes.

  • ^, $, \A, \b, \B, \Z, \z, \G are anchors and (non-)word boundaries. They make no sense inside a character class because they do not match any character. Two of them take on a different meaning there: ^ is the negation mark when it comes right after the opening [ and a literal ^ anywhere else, and \b means a backspace character.

  • You can't use \R (= any line break) there either.
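
To make the two special cases above concrete, here is a minimal sketch. The helper name matches() is hypothetical and only wraps preg_match(); the patterns are in single quotes so that PHP passes the backslashes through to PCRE unchanged.

```php
<?php
// Hypothetical helper: true if $pattern matches somewhere in $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Outside a class, \b is a zero-width word boundary.
var_dump(matches('/\b/', 'a b'));      // bool(true)

// Inside a class, \b is the backspace character (0x08).
var_dump(matches('/[\b]/', 'a b'));    // bool(false)
var_dump(matches('/[\b]/', "\x08"));   // bool(true)

// ^ right after [ negates the class; anywhere else it is a literal caret.
var_dump(matches('/[^a]/', 'a'));      // bool(false)
var_dump(matches('/[a^]/', '^'));      // bool(true)
```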

The two patterns with \A inside a character class must be re-written as a grouping construct, (...), with an alternation operator |:

"`(\A|[\n;]+)/\*.+?\*/`s"=>"$1", 
"`(\A|[;\s]+)//.+\R`"=>"$1\n", 

I removed the redundant modifiers and capturing groups you are not using, and replaced [\r\n] with \R. The "`(\A|[\n;]+)/\*.+?\*/`s"=>"$1" can also be re-written in a more efficient way:

"`(\A|[\n;]+)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/`"=>"$1"

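As a rough way to try the rewritten patterns, the sketch below wraps them in a function. The function name strip_css_comments(), the $unrolled flag and the sample stylesheet are assumptions made for illustration, not part of the original code. The patterns are written in single quotes here so PHP leaves the backslashes for PCRE to interpret; the "$1\n" replacement stays in double quotes because it needs a real newline.

```php
<?php
// Hypothetical wrapper around the corrected patterns shown above.
function strip_css_comments(string $css, bool $unrolled = false): string
{
    // Block comments: either the lazy-dot version or the unrolled version.
    $block = $unrolled
        ? '`(\A|[\n;]+)/\*[^*]*\*+(?:[^/*][^*]*\*+)*/`'
        : '`(\A|[\n;]+)/\*.+?\*/`s';

    $rules = [
        $block                => '$1',    // keep the leading newline/semicolon
        '`(\A|[;\s]+)//.+\R`' => "$1\n",  // line comments up to the line break
    ];

    $result = preg_replace(array_keys($rules), array_values($rules), $css);
    if ($result === null) {
        // preg_replace() returns null when a pattern fails or an error occurs.
        throw new RuntimeException('preg_replace() failed');
    }
    return $result;
}

// Made-up sample input: each comment follows start of string, ";" or a newline.
$css = "/* header */\n.a { color: #fff;/* inline */ }\n// line note\n.b { margin: 0; }\n";

echo strip_css_comments($css);
var_dump(strip_css_comments($css) === strip_css_comments($css, true)); // bool(true)
```

Note that these patterns only remove a block comment that directly follows the start of the string, a newline or a semicolon. A comment preceded by a space, as in the question's sample `.selector {color:#fff; } /* some comment to remove*/`, is left in place, which may be what the last comment below ran into.
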
Note that in PHP 7.3, according to the Upgrade history of the bundled PCRE library table, the regex library is PCRE 10.32. See PCRE to PCRE2 migration:

Until PHP 7.2, PHP used the 8.x versions of the legacy PCRE library, and from PHP 7.3, PHP will use PCRE2. Note that PCRE2 is considered to be a new library although it's based on and largely compatible with PCRE (8.x).

According to this resource, the updated library is stricter about regex patterns and now treats user errors that were formerly tolerated as real errors:

  • Modifier S is now on by default. PCRE does some extra optimization.
  • Option X is disabled by default. It makes PCRE do more syntax validation than before.
  • Unicode 10 is used, while it was Unicode 7. This means more emojis, more characters, and more sets. Unicode regex may be impacted.
  • Some invalid patterns may be impacted.

In simple words, PCRE2 is stricter in its pattern validation, so after the upgrade some of your existing patterns may no longer compile.
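
One way to find affected patterns before they fail in production is to try compiling each one against an empty subject. This is only a sketch: the function name find_broken_patterns() is hypothetical, and it relies on preg_match() returning false (with a warning, suppressed here by @) when a pattern does not compile.

```php
<?php
// Hypothetical helper: returns the patterns that the current PCRE build rejects.
function find_broken_patterns(array $patterns): array
{
    $broken = [];
    foreach ($patterns as $pattern) {
        // preg_match() returns false on a compilation failure.
        if (@preg_match($pattern, '') === false) {
            $broken[] = $pattern;
        }
    }
    return $broken;
}

$patterns = [
    '`([\n\A;]+)/\*(.+?)\*/`ism',   // \A inside a class: rejected on PHP 7.3+
    '`(\A|[\n;]+)/\*.+?\*/`s',      // rewritten with alternation
];

print_r(find_broken_patterns($patterns));
```

Run under both the old and the new PHP version, the difference between the two result lists shows which patterns need rewriting.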

4 Comments

I see. So this regex will need to be carefully rewritten. I am not good at regular expressions; any suggestions for stripping out /*CSS comments*/? Other than that, I would say your answer here is "technically correct". From what I gather, "\A" (which I assume is "beginning of string") is the problem. I am then not sure how to also target "\n \r" newlines within /*CSS comments*/ if the segment began with a newline.
@ChristianŽagarskas I added the fixed patterns to the answer.
outstanding. worked perfectly. After playing with this for a few hours and studying what you have written here I can see I was quite a way off on what I thought needed to change... Thank you for this, my understanding of regular expression has increased. Cheers. (I will be ordering a copy of "Mastering Regular Expressions" based on your other linked comment.)
I don't see this working - see onlinephp.io/c/5c564 .... what am I doing wrong?
