I have a programming language that supports multi-line strings. Their syntax is as follows (EBNF):
longstring ::= '"""' {'"'} newline chars newline '"""' {'"'}
newline ::= '\n'
For regular (single-line) strings, it is:
string ::= '"' chars '"'
I am not providing the full grammar, just this overview; assume chars consists of A-Z, a-z, 0-9, _, whitespace and punctuation.
Long strings begin with """ (three double quotes), optionally followed by more double quotes. The closing delimiter starts on a new line and must contain the same number of double quotes as the opening sequence: if a string opens with 5 ", it must close with 5 ". This allows multi-line strings to be nested (for metaprogramming).
I could not find a way to express in EBNF that the number of double quotes in the closing delimiter must equal the number in the opening sequence, so I described that rule in prose above.
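To make the delimiter-matching rule concrete, here is a minimal Python sketch of a scanner that handles only string literals. It is a hypothetical illustration, not the language's real lexer: the function name `scan_strings`, the token kinds and the backslash-escape handling for regular strings (assumed here to mirror the `skip=/\\"/` in the syntax file) are all assumptions. Plain EBNF has no counterpart to a regex backreference, so the sketch captures the opening run of quotes and builds the closing pattern from it, which is the same idea as `\z(...)` and `\z1` in the syntax file.

```python
import re

# Opening delimiter: three or more double quotes immediately followed by
# a newline ('"""' {'"'} newline in the EBNF).
LONG_OPEN = re.compile(r'("{3,})\n')
# Regular single-line string; backslash escapes (e.g. \") are an assumption.
SHORT = re.compile(r'"(?:\\.|[^"\\\n])*"')


def scan_strings(src):
    """Return ("long" | "short", text) pairs for each string literal in src.

    Hypothetical helper for illustration only. Raises ValueError on an
    unterminated literal.
    """
    tokens = []
    pos = 0
    while True:
        pos = src.find('"', pos)
        if pos == -1:
            return tokens
        opener = LONG_OPEN.match(src, pos)
        if opener:
            quotes = opener.group(1)
            # Closer: a newline, then exactly len(quotes) double quotes.
            # A shorter run cannot match, and (?!") rejects a longer run,
            # so only a line starting with the same count ends the string.
            closer = re.compile(r'\n' + re.escape(quotes) + r'(?!")')
            end = closer.search(src, opener.end())
            if end is None:
                raise ValueError(f"unterminated long string at offset {pos}")
            # The body is kept raw: nothing inside it is scanned further.
            tokens.append(("long", src[opener.end():end.start()]))
            pos = end.end()
        else:
            short = SHORT.match(src, pos)
            if short is None:
                raise ValueError(f"unterminated string at offset {pos}")
            tokens.append(("short", short.group()))
            pos = short.end()


first = 'local a = """""\n"""\n""""";\nlocal b = 5;\n'
print(scan_strings(first))  # [('long', '"""')]

second = 'local a = """""\n"""\n"""";\nlocal b = 5;\n'
try:
    scan_strings(second)
except ValueError as err:
    print(err)  # unterminated long string at offset 10
```

The two usage strings are the examples from this question: the first yields one long string whose body is the `"""` line, while the second is rejected because no line starts with exactly five double quotes. The key design choice is that a run of three or more quotes followed by a newline commits to a long string instead of falling back to empty regular strings.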
Here is the part of my Vim syntax file that shows how I tried to define these syntax groups:
syn region mylangLongString start=/\z("\{3,}\)\r/ end=/\z1/ contains=@Spell
syn region mylangString start=/"/ skip=/\\"/ end=/"/ contains=@mylangSpecial,@Spell
hi def link mylangLongString String
hi def link mylangString String
This fails on the following example:
local a = """""
"""
""""";
local b = 5;
The string highlighting bleeds over to the end of the file; in this example it also covers the line where the local variable b is defined.
This happens because the regular string consumes double quotes before the long string can match, and the lines involved contain an odd number of double quotes. First an empty regular string is matched (2 double quotes), then a long string with a 3-double-quote delimiter is matched, and on the last line of a's definition 2 empty strings are matched (4 double quotes in total). That leaves a single unmatched double quote, which causes the bleed-over.
Another example is this:
local a = """""
"""
"""";
local b = 5;
This one is highlighted without any bleed-over even though it should not be accepted: the string opens with 5 double quotes, but no line closes it with exactly 5. First the regular string consumes 2 double quotes, then the long string matches the next 3 double quotes and is closed by the 3 double quotes on the line in between (6 in total), and finally the remaining 4 double quotes on the last line are matched as 2 empty regular strings. This is clearly not the desired behaviour.
Keep in mind that the entire contents of a multi-line string are highlighted as string contents: the region does not contain any other syntax groups, so it is effectively a raw string.
How can I resolve this? Is there a way to force the internal regex engine to check for multi-line strings first when syncing, instead of matching a regular string? As shown above, defining the long string syntax group before the regular string does not fix this.