
I have just bought a book on regex to try to get my head around it, but I'm still really struggling with it. I am trying to create a Java regex that will validate a configuration string which:

  1. Can contain lowercase letters ([a-z])
  2. Can contain commas (,), but only between words
  3. Can contain colons (:), but they must be separated by words or an asterisk (*)
  4. Can contain hyphens (-), but they must be separated by words
  5. Can contain an asterisk (*), but if used it must be the only character before/between/after the colon
  6. Cannot contain spaces; 'words' are delimited by hyphens (-), commas (,), colons (:), or the end of the string

So for example the following would be true:

  1. foo:bar
  2. foo-bar:foo
  3. foo,bar:foo
  4. foo-bar,foo:bar,foo-bar
  5. foo:bar:foo,bar
  6. *:foo
  7. foo:*
  8. *:*:*

But the following would be false:

  1. foo :bar
  2. ,foo:bar
  3. foo-:bar
  4. -foo:bar
  5. foo,:bar-
  6. foo:bar,
  7. foo,*:bar
  8. foo-*:bar

This is what I have so far:

^[a-z-]|*[:?][a-z-]|*[:?][a-z-]|*
  • Have you tried something to accomplish this? Commented Sep 12, 2013 at 14:56
  • Try something and post your trials, and we're here to help you. Commented Sep 12, 2013 at 15:00
  • Converting my answer to a comment, as asked: this is not Java code, but here is a web service where you can test your regexps online: regexplanet.com/advanced/java/index.html. It's a life saver, or at least it saves a lot of time. Apart from your book, you should also keep in mind the Javadoc for the Pattern class: docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html Commented Sep 12, 2013 at 16:14
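As an offline complement to the online tester, here is a minimal sketch that uses only `java.util.regex.Pattern` to compile a candidate regex and print the syntax error, if there is one. The class and method names are made up for illustration. Fed the attempt from the question, it should be rejected: right after `|`, an unescaped `*` has nothing to repeat, so a literal asterisk has to be written as `\*` (`"\\*"` inside a Java string).

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCheck {
    // The attempt from the question, pasted into a Java string unchanged.
    static final String ATTEMPT = "^[a-z-]|*[:?][a-z-]|*[:?][a-z-]|*";

    // Hypothetical helper: true if the regex compiles, otherwise prints why not.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() report what went wrong and where.
            System.out.println(e.getDescription() + " at index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles(ATTEMPT));      // unescaped '*' right after '|'
        System.out.println(compiles("[a-z]+|\\*")); // escaped '*' is a literal asterisk
    }
}
```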

2 Answers


Here is a regex that will work for all your cases:

([a-z]+([,-][a-z]+)*|\*)(:([a-z]+([,-][a-z]+)*|\*))*

Here is a detailed analysis:

One of the basic structures used to build complicated regular expressions like this is actually pretty simple, and has the form text(separator text)*. A regex of that form will match:

  • one text
  • one text, a separator, and another text
  • one text, a separator, another text, another separator, and yet another text
  • and so on: each further repetition adds another separator and another text to the end.
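The `text(separator text)*` idea translates directly into string concatenation in Java. A minimal sketch; the helper name `separated` is hypothetical, and it wraps each piece in a non-capturing `(?:...)` group so that an alternation inside `text` cannot leak out into the rest of the pattern:

```java
import java.util.regex.Pattern;

public class SeparatedList {
    // Hypothetical helper: builds text(separator text)* with non-capturing groups.
    static String separated(String text, String separator) {
        return "(?:" + text + ")(?:" + separator + "(?:" + text + "))*";
    }

    public static void main(String[] args) {
        // Lowercase words separated by a comma or a hyphen.
        Pattern words = Pattern.compile(separated("[a-z]+", "[,-]"));
        for (String s : new String[] {"foo", "foo-bar", "foo,bar-baz", "foo-", ",foo"}) {
            // matches() requires the whole string to match, like ^...$
            System.out.println(s + " -> " + words.matcher(s).matches());
        }
    }
}
```

Because each call returns an ordinary regex string, the result can itself be passed back in as `text`, which is how the larger pattern in the breakdown is assembled.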

So here is a breakdown of the code:

  • [a-z]+([,-][a-z]+)* is an instance of the pattern I discussed above: the text here is [a-z]+, and the separator is [,-].
  • ([a-z]+([,-][a-z]+)*|\*) allows an asterisk to be matched instead.
  • ([a-z]+([,-][a-z]+)*|\*)(:([a-z]+([,-][a-z]+)*|\*))* is another instance of the pattern I discussed above: the text is ([a-z]+([,-][a-z]+)*|\*), and the separator is :.
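To check the breakdown against every example in the question, here is a minimal Java sketch. It builds the pattern from the two pieces described above (using non-capturing groups and keeping the `*` alternative inside the colon-separated part) and relies on `Matcher.matches()`, which only succeeds if the whole string matches, so no `^`/`$` is needed. Class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class ConfigRegex {
    // One "text": comma/hyphen-separated lowercase words, or a lone asterisk.
    static final String TEXT = "(?:[a-z]+(?:[,-][a-z]+)*|\\*)";
    // text(separator text)* with ':' as the separator.
    static final Pattern CONFIG = Pattern.compile(TEXT + "(?::" + TEXT + ")*");

    static boolean isValid(String s) {
        // matches() anchors at both ends of the input.
        return CONFIG.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] valid = {"foo:bar", "foo-bar:foo", "foo,bar:foo", "foo-bar,foo:bar,foo-bar",
                          "foo:bar:foo,bar", "*:foo", "foo:*", "*:*:*"};
        String[] invalid = {"foo :bar", ",foo:bar", "foo-:bar", "-foo:bar",
                            "foo,:bar-", "foo:bar,", "foo,*:bar", "foo-*:bar"};
        for (String s : valid) System.out.println(s + " -> " + isValid(s));
        for (String s : invalid) System.out.println(s + " -> " + isValid(s));
    }
}
```

If you use `find()` instead of `matches()`, add `^` and `$`; otherwise a valid substring, such as `foo:bar` inside `,foo:bar`, would be accepted.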

If you plan to use this as a component of an even larger regex, in which the group matches will be important, I would recommend making the internal parens non-capturing and placing capturing parens around the entire regex, like so:

((?:[a-z]+(?:[,-][a-z]+)*|\*)(?::(?:[a-z]+(?:[,-][a-z]+)*|\*))*)
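A short sketch of why the non-capturing form helps when embedding: with only the outer parens capturing, the group numbers of the surrounding regex stay predictable. The `scope=`/`user=` line format below is a made-up example, and the spec is written with the `*` alternative inside the colon-separated group, as in the breakdown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbeddedConfig {
    // Non-capturing form: only the outermost parens capture (group 1).
    static final String SPEC =
        "((?:[a-z]+(?:[,-][a-z]+)*|\\*)(?::(?:[a-z]+(?:[,-][a-z]+)*|\\*))*)";
    // Hypothetical larger format: "scope=<spec> user=<name>".
    static final Pattern LINE = Pattern.compile("scope=" + SPEC + " user=([a-z]+)");

    public static void main(String[] args) {
        Matcher m = LINE.matcher("scope=foo-bar:* user=alice");
        if (m.matches()) {
            // SPEC contributes exactly one group, so the user name is group 2.
            System.out.println(m.group(1));
            System.out.println(m.group(2));
        }
        // groupCount() reports the pattern's capturing groups without matching.
        System.out.println(LINE.matcher("").groupCount());
    }
}
```

With the fully capturing version, the user name would instead land in a later group number that shifts whenever the inner regex changes.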

4 Comments

Since the OP is learning regex, it would probably be beneficial to break this down and explain its different components.
If you don't mind breaking it down, that would be hugely helpful.
@RyanWH Finished breaking it down.
Thank you very much for taking the time to break that down in such a constructive manner; it was extremely useful. I did find, however, that your published solution did not quite work, for a couple of reasons. The first is most likely a typo, as it appears you have placed one too many brackets at the start that are not matched by a closing bracket. The other was that the second instance of the pattern did not allow for '*'. Just in case you are interested, I modified the brackets to match your explanation and now it works: ([a-z]+([,-][a-z]+)*|\\*)(:([a-z]+([,-][a-z]+)*|\\*))*

We rarely see somebody here who defines both positive and negative test cases. That makes life much easier.

Here's my regex with a 95% solution:

  • "(([a-z]+|\\*)[:,-])*([a-z]+|\\*)" (JAVA-Version)
  • (([a-z]+|\*)[:,-])*([a-z]+|\*) (plain regex)

It simply differentiates between words (a-z or *) and separators (one of :-,): the string must contain at least one word, and words must be separated by a separator. It works for all the positive cases, and for the negative cases except the last two.
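A quick Java sketch to see exactly which negative cases this pattern lets through (the class name is illustrative; the regex is the Java version above):

```java
import java.util.regex.Pattern;

public class NinetyFivePercent {
    // The second answer's pattern: words (letters or *) joined by any of : , -
    static final Pattern P = Pattern.compile("(([a-z]+|\\*)[:,-])*([a-z]+|\\*)");

    public static void main(String[] args) {
        String[] invalid = {"foo :bar", ",foo:bar", "foo-:bar", "-foo:bar",
                            "foo,:bar-", "foo:bar,", "foo,*:bar", "foo-*:bar"};
        for (String s : invalid) {
            // Any "true" printed here is a negative case the pattern fails to reject.
            System.out.println(s + " -> " + P.matcher(s).matches());
        }
    }
}
```

Both misses come from `*` being treated as an ordinary word, so it can be joined to other words with `,` or `-`, which rule 5 forbids.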

One remark: in real life, such a complex "syntax" would be implemented with a grammar definition tool like ANTLR (or, a few years ago, with lex/yacc or flex/bison). Regex can do it, but the result will not be easy to maintain.
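For comparison, a small hand-written check is often easier to read and maintain than one large regex. This is not ANTLR or a generated parser; it is a minimal sketch under the rules stated in the question: split on `:`, accept a segment that is exactly `*`, and otherwise require every `,`/`-`-separated piece to be one or more lowercase letters. Names are hypothetical.

```java
public class ConfigValidator {
    // Hypothetical hand-written check, as an alternative to one big regex.
    static boolean isValid(String s) {
        // limit -1 keeps empty pieces, so "foo:" yields ["foo", ""].
        for (String segment : s.split(":", -1)) {
            if (segment.equals("*")) continue;   // a lone asterisk is allowed
            for (String word : segment.split("[,-]", -1)) {
                if (!isWord(word)) return false; // empty piece or bad character
            }
        }
        return true;
    }

    // A word is one or more lowercase ASCII letters.
    static boolean isWord(String w) {
        if (w.isEmpty()) return false;
        for (char c : w.toCharArray()) {
            if (c < 'a' || c > 'z') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo-bar,foo:bar,foo-bar", "*:*:*", "foo,*:bar", "foo:bar,"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Each rule maps to one visible line, so adding or changing a rule does not require re-deriving a nested pattern.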

1 Comment

+1 for mentioning ANTLR, I had never heard of it but I will investigate it as it looks generally interesting.
