
I've got an EBNF rule that needs a Vim syntax definition:

header ::= 'header' ( '{' 'header1' ( '{' 'header2' nest_3? 'trailer2' '}' )? 'trailer1' '}' )? 'trailer'

Railroad diagram gets me:

header railroad diagram

And I'm struggling to get Vim script to process the syntax correctly, as well as to highlight it.

header 
header trailer
header { header1 trailer1 } trailer
header { header1  { header2 trailer2 } trailer1 } trailer
header {
    header1 {
        header2
        trailer2
    }  trailer1
} trailer

" All errors below
header { XYZ header1    { header2 XYZ trailer2 } XYZ trailer1 XYZ } trailer XYZ

3 Answers

2

This one looks absolutely straightforward to me.

syn region level1 start=/header/ end=/trailer/ contains=level2,error1
syn match error1 /anything except trailer, empty space or {/ contained
syn region level2 start=/{/ end=/}/ contains=level3,error2 contained
syn region level3 start=/header1/ ...
...

I haven't bothered to test it, but I'm pretty sure the only "error" is that it allows empty {} groups, which, I believe, is OK for a syntax highlighter.
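One way the elided levels might be filled in is sketched below. This is an illustration under assumptions, not the answerer's actual file: the names level4, level5, nestError, nestKeyword and nestBrace are hypothetical, a single shared error group stands in for error1/error2, and matchgroup is used so that contained items cannot match inside the start and end keywords.

```vim
" Sketch only: level4, level5, nestError, nestKeyword and nestBrace are
" hypothetical names, and the elided levels are guessed from the EBNF.

" Define the error match first: when two items start at the same column,
" the one defined last wins, so the region starts below take priority.
syn match  nestError /[^[:space:]{}]\+/ contained

" matchgroup highlights the delimiters separately and keeps contained
" items (including nestError) from matching inside them.
syn region level1 matchgroup=nestKeyword start=/\<header\>/  end=/\<trailer\>/  contains=level2,nestError
syn region level2 matchgroup=nestBrace   start=/{/           end=/}/            contained contains=level3,nestError
syn region level3 matchgroup=nestKeyword start=/\<header1\>/ end=/\<trailer1\>/ contained contains=level4,nestError
syn region level4 matchgroup=nestBrace   start=/{/           end=/}/            contained contains=level5,nestError
syn region level5 matchgroup=nestKeyword start=/\<header2\>/ end=/\<trailer2\>/ contained contains=nestError

hi def link nestKeyword Statement
hi def link nestBrace   Delimiter
hi def link nestError   Error
```

Under these assumptions, every XYZ between header and trailer in the error line is flagged, but the XYZ after the final trailer is not, because nestError is contained. Anything standing in for nest_3 would also be flagged until a group for it is added to level5's contains list.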

Also to note: (1) keepend doesn't belong here, as all regions are balanced. Likewise, don't add other attributes (skipwhite, excludenl, etc.) until you have read what they are used for. They don't work like a magic wand that fixes every error you made; quite the contrary, they are very specific "overrides" to be used only when necessary. (2) hi within a syntax file should always be hi default.
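To illustrate point (2), here is a minimal sketch, assuming the group name level1 from above and arbitrarily chosen link targets. With default, the link from the syntax file is ignored when a link or highlighting for the group already exists.

```vim
" In the user's vimrc or colorscheme (assumed to have run first and to
" still be in effect when the syntax file loads):
hi link level1 Type

" In the syntax file: 'default' leaves the existing link untouched,
" so level1 stays linked to Type rather than Statement.
hi default link level1 Statement
```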

6
  • I absolutely love terser Vim script statements, and this is it. While it is very useful, in a bigger syntax tree, header et al. often have their own combo list or, worse, complex ones in place of an ordinary keyword-like value. Thanks for the great tip! (I need to test and insert the screenshot result.) I just wanted to point out that the ones I’ve supplied are quite extensible and templatable, should header ever go compound-complex. Commented Sep 3, 2024 at 5:55
  • @JohnGreene You have a choice: either use different start(s) for same region, or create different regions for the same nesting level. They still may share some of inner regions, of course. Commented Sep 3, 2024 at 6:20
  • @JohnGreene But, in general, yes, there is a problem. It only works well while we can nest regions one inside another. But if we ever switch to "horizontal" mode, we need elaborate regexes and "nextgroup"(s) to keep things in order. In this case it makes sense to tolerate small errors, at least. Commented Sep 3, 2024 at 6:58
  • Please elaborate a bit on what "horizontal mode" is (at least for me). Commented Sep 3, 2024 at 15:32
  • @JohnGreene I simply meant "non-nested matches" vs. nested ones. BTW, Vim syntax can easily become slow, especially when using sophisticated look-behind regexes on very long lines and such. After all, it works on live data, which can run to dozens of megabytes. Vim is quite good at editing huge files, but many syntaxes are not, unfortunately. So never try to replace linters, compilers, etc. Just keep it fast, simple and maintainable. Commented Sep 3, 2024 at 17:06
1

If I break the EBNF down a bit further, such that:

top-level

block_1

enter image description here

By focusing on the lowest, deepest grouping (block_2), we get

" Inside nest 2 "
hi link   trailer2 Number
syn match trailer2 "trailer2" skipwhite contained  " end-group "

hi link   header2 Identifier
syn match header2 "header2" skipwhite contained
\ nextgroup=trailer2

Then next up, block_1

" Inside nest 1 "
hi link   trailer1 Number
syn match trailer1 "trailer1" skipwhite contained  " end-group  "

hi link    block_2 Normal
syn region block_2 start="{" end="}" excludenl skipnl skipempty keepend contained
\ contains=header2,trailer2,test_any
\ nextgroup=trailer1

hi link   header1 PreProc
syn match header1 "header1" skipnl skipempty skipwhite contained
\ nextgroup=
\    trailer1,
\    block_2,  " block_2 is optional  "

Finally, the top-level:

" top-level "
hi link   trailer Identifier
syn match trailer "trailer" skipwhite contained

hi link    block_1 Normal
"do not use `keepend` on top-level nest "
syn region block_1 start="{" end="}" skipnl skipwhite contained
\ contains=header1,trailer1,test_any
\ nextgroup=trailer

hi link     header Statement
syn match   header "^\s*header" skipwhite
\ nextgroup=
\    block_1,
\    trailer

and the end result is:

End results

Tips

I used DrChip's hilinks plugin and this pattern for troubleshooting:

" Debugging pattern "
hi link test_any Error
syn match test_any "\v[a-zA-Z0-9\s\-\_\"]"
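A plugin-free way to check which groups are active at a given position is sketched below. It relies only on the built-in synstack() and synIDattr() functions; the function name ShowSynStack and the <F10> mapping are hypothetical choices.

```vim
" Echo the names of all syntax groups at the cursor, outermost first.
" ShowSynStack and the <F10> key are illustrative, not from the answer.
function! ShowSynStack() abort
  echo map(synstack(line('.'), col('.')), 'synIDattr(v:val, "name")')
endfunction

nnoremap <F10> :call ShowSynStack()<CR>
```

With the cursor on header2 inside a nested block, this should list the enclosing region groups followed by the match group itself, which makes it easier to see where a contains= or nextgroup= chain breaks.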

Of course, I could be wrong (and have been), but here's what I've learned so far:

  • always add '\v' to any OR-combo list like '\v(opt1|opt2|opt3)' in syntax match
  • always add '\v' to any character collection like '\v[a-zA-Z0-9_]' in syntax match
  • place any 'contained' keyword at end of line (EOL) (readability)
  • never use a '?' as a lone operator in match statements
  • 'contains=' ordering MATTERS in cluster statements
  • 'region' seems to enjoy the 'keepend' option, but not at the top-level of nested regions.
  • ordering between 'contains=' and 'nextgroup=' statements, first one wins in a match or a cluster statement (but not in a region)
  • ordering between 'contains=' statements amongst themselves, first one wins
  • ordering within 'contains=' statements, last one wins
  • ordering within 'nextgroup=' statements, last one wins
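For the two '\v' items, a minimal sketch of what "very magic" changes may help. The demo_* group names are hypothetical. '\v' only alters which characters need a backslash, so the first two lines define the same alternation, and a bracket collection is written the same way with or without it.

```vim
" Default 'magic': grouping and alternation need backslashes.
syn match demo_magic     "\(opt1\|opt2\|opt3\)"
" 'Very magic' (\v): the same alternation with fewer backslashes.
syn match demo_verymagic "\v(opt1|opt2|opt3)"

" A bracket collection matches the same characters either way.
syn match demo_class     "[a-zA-Z0-9_]"
syn match demo_class_vm  "\v[a-zA-Z0-9_]"
```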
1
  • My answer is good for validation-like, configuration-concise, small content, or for catching esoteric errors in SLOW rolling-visual data streaming, but it requires a great deal of Vim script validation testing. Commented Sep 3, 2024 at 21:31
0

This is not an answer but a placeholder for graphics in response to Matt. I've transferred his answer here (and jiggled it some more for effect).

" Debugging pattern "
hi link test_any Error
syn match test_any "\v[ \^\(\)\{\}\,\;a-zA-Z0-9\s\-\_\"]"

syn match trailer 'trailer' contained
syn match trailer1 'trailer1' contained
syn match left_curly_brace '{' contained
syn match emptyspace '\w' contained

syn match error3 /anything except trailer, empty space or {/ contained
syn region level3 start=/header2/ end=/trailer2/ 
syn cluster error2 
\ contains=CONTAINED,trailer1,left_curly_brace,emptyspace 
syn region level3 start=/header1/ end=/trailer1/ 
syn region level2 start=/{/ end=/}/ contains=level3,@error2 contained
"syn cluster error1 
"\ contains=CONTAINED,trailer,left_curly_brace,emptyspace 
syn region level1 start=/\v^\s*header/ end=/trailer/ contains=level2,@error1

hi link level1 PreProc
hi link level2 Constant
hi link level3 Identifier
hi link error1 Error
hi link error2 Error
hi link error3 Error

Matt's result

Did more testing:

More testing of Matt's result

The breakdown of my answer is:

enter image description here

So, it looks like we have a choice: if one is in a hurry and has relatively stable (valid) content to work with, as is often the case with streaming content in data science, I would definitely go with Matt's approach.

On the other hand, if one is attempting to validate a configuration file, then I would go with my answer.

4
  • Of course, I erred in slapping this together; some lines were in the wrong category. Just ask me if you want a GitHub repo of the ftdetect/syntax files. I also use the DrChip debugger. Commented Sep 3, 2024 at 7:40
  • DrChip, over at github.com/kergoth/vim-hilinks Commented Sep 3, 2024 at 8:15
  • What do you mean "plowed his answer"? Commented Sep 3, 2024 at 20:35
  • "through here", country-bumpkin here. (edited answer) Commented Sep 3, 2024 at 20:40
