I'm working on a side project in which I need to parse a String into substrings.

I have a REST API whose payload contains a String parameter. The value of this String can follow any of the patterns listed below:

  1. [Name]
  2. [Name 1], [Name 2]
  3. [Name 1] and [Name 2]
  4. [Name 1], [Name 2] and [Name 3]
  5. [Name 1], [Name 2] and [Name 3], [Role]

Options I tried:

  • Including another parameter in the request payload that describes the format of the String value. For example, if a string value of pattern #4 is to be passed as input, here is the payload I would expect:

    {
    "Value" : "Name 1, Name 2 and Name 3",
    "Format": 4
    }

This puts the burden on the client to determine the format and set the Format value accordingly, which is not a good approach.

  • Somehow determine the format (for example, by counting the commas and occurrences of the AND keyword) and then use a regex dedicated to that format. For example, if the string contains at least one comma, an occurrence of the AND keyword, and a comma after the AND keyword, it could be pattern #5 (described in the list above). So use the regex pattern: ([a-zA-Z]+( [a-zA-Z]+)+),([a-zA-Z]+( [a-zA-Z]+)+),[a-zA-Z]+
    This approach does work, but it is far too rigid to be practical. For example, if the value contains 4 names rather than 3, the pattern no longer works.

Is there a more generic reg-ex pattern possible that could satisfy each of the aforementioned patterns?

1
  • 2
    "Is there a more generic reg-ex pattern possible that could satisfy each of the aforementioned patterns?" - Seems to me that "and" serves the exact same purpose as the comma in your patterns. Replace " and " with comma, split the string on commas. Commented Feb 22, 2022 at 9:31

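A minimal Java sketch of the approach suggested in the comment above, assuming the value has no square brackets and that no name contains a comma or the standalone word "and". The class and method names are hypothetical. Note that this only splits the value; it cannot tell a trailing role apart from a name.

```java
import java.util.ArrayList;
import java.util.List;

class NameSplitter {
    // Treats " and " as just another separator: normalize it to a comma,
    // then split on commas and trim each piece.
    static List<String> splitNames(String value) {
        String normalized = value.replaceAll("\\s+and\\s+", ",");
        List<String> parts = new ArrayList<>();
        for (String part : normalized.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [Name 1, Name 2, Name 3, Role]
        System.out.println(splitNames("Name 1, Name 2 and Name 3, Role"));
    }
}
```

Because the separator is handled generically, the same method works for any number of names.
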
2 Answers

2

Here is a generic regex pattern which covers all 5 types of inputs:

^\[.*?\](?:(?:,|\s+and\s+)\s*\[.*?\](?:\s+and\s+\[.*?\])*)*$

Explanation of regex:

^                    start of string
\[.*?\]              match [Name]
(?:
    (?:,|\s+and\s+)  match either comma or "and" separator
    \s*              optional whitespace
    \[.*?\]          another [Name 2]
    (?:
        \s+and\s+    "and" separator
        \[.*?\]      more [Name] terms
    )*               zero or more
)*                   zero or more
$                    end of string
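
As a rough illustration, the pattern could be used in Java to validate the whole value and then pull out the individual items. This sketch assumes the square brackets are literally present in the input, as the pattern requires. The class name, the method name, and the second helper pattern `\[(.*?)\]` are assumptions added for the example, not part of the pattern above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketedListParser {
    // The pattern from this answer, with backslashes doubled for a Java string literal.
    private static final Pattern WHOLE = Pattern.compile(
        "^\\[.*?\\](?:(?:,|\\s+and\\s+)\\s*\\[.*?\\](?:\\s+and\\s+\\[.*?\\])*)*$");

    // Helper pattern (an assumption): captures the text inside one [...] pair.
    private static final Pattern ITEM = Pattern.compile("\\[(.*?)\\]");

    // Validates the overall shape first, then collects each bracketed item in order.
    static List<String> parse(String value) {
        if (!WHOLE.matcher(value).matches()) {
            throw new IllegalArgumentException("Unrecognized format: " + value);
        }
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(value);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        // prints [Name 1, Name 2, Name 3, Role]
        System.out.println(parse("[Name 1], [Name 2] and [Name 3], [Role]"));
    }
}
```

One caveat: because `.*?` can also match bracket characters, the validation is fairly permissive, so it may be worth tightening if malformed input is a concern.
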

1

You could write a pattern that repeatedly matches everything between square brackets:

^\[[^\]\[]*](?:(?:,| and) \[[^\]\[]*])*$

In parts, the pattern matches:

  • ^ Start of string
  • \[[^\]\[]*] Match from [ to ], with no square brackets in between
  • (?: Non-capturing group
    • (?:,| and) Match either a comma or " and"; the space that follows in the pattern is then matched literally
    • \[[^\]\[]*] Match from [ to ], with no square brackets in between
  • )* Close the non-capturing group and optionally repeat it
  • $ End of string

In Java, with the backslashes doubled:

String regex = "^\\[[^\\]\\[]*](?:(?:,| and) \\[[^\\]\\[]*])*$";
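
A short sketch of how that string might be used for validation, assuming the input contains literal square brackets and exactly one space after each separator. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BracketListValidator {
    // Compile once and reuse; this is the Java string shown above.
    private static final Pattern LIST = Pattern.compile(
        "^\\[[^\\]\\[]*](?:(?:,| and) \\[[^\\]\\[]*])*$");

    // Returns true when the whole value is a separated list of [...] items.
    static boolean isValid(String value) {
        return LIST.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[Name 1], [Name 2] and [Name 3], [Role]")); // true
        System.out.println(isValid("[A], [B], [C] and [D]"));                   // true: four names also match
        System.out.println(isValid("[A],[B]"));                                 // false: no space after the comma
    }
}
```

Since the separator group repeats, the number of names does not need to be known in advance.
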
