1

So if I have simple regex such as:

"g{1,3}(a|e|i|o|u)"

I want my program to generate strings of

ga
ge
gi
go
gu
gga
gge
ggi
ggo
ggu
ggga
ggge
gggi
gggo
gggu

I would not use "g*(a|e|i|o|u)" for regex, as there can be infinite number of 'g's and there will be infinite number of strings.

Any recommendation on simple efficient algorithm to make this? I think I will be able to make these strings in a brute force way by using for/while loops, but I'm wondering if there is any methods I could use to make this algorithm work.

I googled how to create strings from regex and many people seemed to redirect to: https://code.google.com/p/xeger/ to use the library that is built, but I was wondering if I could get some suggestions to make my own for these simple regex.

3
  • 3
    regex doesn't create strings, it finds patterns in existing strings. Not sure how this would work from a regex. Commented Mar 28, 2013 at 18:37
  • 1
    @Daedalus: a regexp doesn't find patterns either. It merely specifies a pattern. While regexp libraries typically only implement an efficient way to say whether a given string matches a given regexp pattern, the concept leads the the question of how to generate the set of all matching strings just as naturally, and indeed that plays a bigger role in the theory behind the concept (regular languages). Commented Mar 28, 2013 at 18:48
  • @MichaelBorgwardt - Yes... I know what a regular expression actual is. My comment was targeted to the spirit of the OPs question, not the technical minutia and theory of formal languages. Some people just love to flex those C.S. credentials though :) Commented Mar 28, 2013 at 18:58

3 Answers 3

1

Xeger is open source. You could browse their code base for ideas.

EDIT:

Their code base looks very small, so shouldn't be too hard. It only generates random strings that will match, not all strings. It could still be a good starting point though.

Sign up to request clarification or add additional context in comments.

Comments

1

I created Debuggex, which generates random strings to give you an idea of what a regex does.

If you already have a parse tree for your regex, you can use the following logic to generate random matches:

OrTree.random:
    Choose a child randomly, return its random()

ConcatTree.random:
    For every child, call random()
    Return the concatenation of all the results

RepeatTree.random:
    Choose a valid random number of repetitions within min and max
    Call random() on your child that many times
    Return the concatenation of all the results

Literal.random:
    Return the literal

You can generate random strings even if you use the * operator. This is done by choosing a distribution from 0 to infinity from which to generate numbers, just like you use a uniform distribution for finite sets. For example:

InfiniteRepeatTree.random:
    Flip a coin until I get tails
    Call random on child() the number of times that the coin landed heads
    Return concatenation of the results

Hope that helps :)

Comments

0
char[] vowels = new char[] {'a','e','i','o','u'};
for (int i = 1; i <= 3; i++) {
    for (int j = 0; j < vowels.length; j++) {
         for (int k = 0; k < i; k++) {
             System.out.print("g");
         }
         System.out.println(vowels[j]);
    }
}

Not generic solution, but it works for your specific example

1 Comment

Yes, this was the 'brute force way' I was thinking, but yea. It only works for my specific example :D

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.