I have a programming language that supports multi-line strings. The syntax is as follows (EBNF):

longstring ::= '"""' {'"'} newline chars newline '"""' {'"'}
newline ::= '\n'

Regular (single-line) strings are defined as: string ::= '"' chars '"'

I am not providing the full syntax, just this overview. We can imagine chars to be A-Z, a-z, 0-9, _, whitespace and punctuation.

Long strings begin with """ (3 double quotes), optionally followed by additional double quotes. The closing delimiter begins on a new line and must match the opening sequence of double quotes, so a string opened with 5 " must be closed with 5 ". This allows nested multi-line strings (for metaprogramming). I struggled to express in EBNF that the number of closing double quotes must equal the number of opening ones, so I have described it in words here.

Here is the small part of my syntax file that shows how I tried to define these syntax groups:

syn region mylangLongString start=/\z("\{3,}\)\r/ end=/\z1/ contains=@Spell
syn region mylangString start=/"/ skip=/\\"/ end=/"/ contains=@mylangSpecial,@Spell

hi def link mylangLongString           String
hi def link mylangString               String

Now this fails on this example:

local a = """""
"""
""""";
local b = 5;

The string highlighting bleeds over to the end of the file, in this example onto the line where the local variable b is defined. This is because there is an odd number of double quotes inside the long string and the regular string eats up double quotes: first a regular empty string is matched (2 double quotes), then a long string with a delimiter of 3 double quotes is matched, and on the last line of the definition of a, 2 empty strings are matched (4 double quotes in total). That leaves a single unmatched double quote, which causes the highlighting to bleed over.

Another example is this:

local a = """""
"""
"""";
local b = 5;

This works just fine even though it shouldn't. First the regular string eats up 2 double quotes, then the long string matches the next 3 double quotes together with the 3 double quotes on the line in between (6 in total, effectively closing the match), and the remaining 4 double quotes on the last line are matched as 2 empty regular strings. Obviously, this is not the desired behaviour.

Keep in mind that the entire contents of a multi-line string are highlighted as string contents. It does not contain any other syntax groups, so it is effectively a raw string.

How would I resolve this? Is there a way to force the internal regex engine to check for multi-line strings first when syncing, instead of matching a regular string? As shown above, defining the long string syntax group before the regular string does nothing to resolve this.

1 Answer

That's because the syntax items are matched in a different order than you expect.

First, it matches local a = "" (i.e. "short string") and only then """ ("long string").

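The easiest solution is to swap the order of the syntax declarations, so that mylangLongString is defined after mylangString. Also make sure to read :h syn-priority: when several match or region items start at the same position, the item defined last has priority. Yes, Vim handles this differently from other tools.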

By the way, the "short string" region should probably be declared oneline, I guess.
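
For illustration, here is a minimal, untested sketch of how the two regions from the question might look once the long string is defined last and the short string is restricted to one line. It assumes, as the grammar in the question describes, that the opening quotes are the last thing on their line and that the closing delimiter starts at the beginning of a line. The `$` and `^` anchors and the `"\@!` lookahead are assumptions added here and do not appear in the question's syntax file.

```vim
" Regular single-line string. Defined first, so it has the lower priority
" when both regions could start at the same double quote.
" 'oneline' keeps the region from running past the end of the line.
syn region mylangString start=/"/ skip=/\\"/ end=/"/ oneline contains=@mylangSpecial,@Spell

" Long string. Defined last, so it wins at a position where both start.
" \z(...\) captures the opening run of quotes and \z1 reuses it in the end
" pattern. In a Vim pattern \r matches a carriage return, so $ is used here
" to anchor at end of line (assumption: nothing follows the opening quotes).
" ^ and "\@! require the closing run to start a line and to have exactly
" the same number of quotes as the opening run.
syn region mylangLongString start=/\z("\{3,}\)$/ end=/^\z1"\@!/ contains=@Spell

hi def link mylangString     String
hi def link mylangLongString String
```

With these definitions the middle line of three double quotes in the first example should no longer close a string opened with five, and the second example should stay highlighted as an unterminated string, which is what the grammar asks for.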
