0

I am trying to make use questions like this one to devise a regexp that will match and give a function name and all parameters in a very simplified Python-like syntax like the following:

mycall(x, y, hello)

with the desired results:

  • function name: mycall
  • parameter 0: x
  • parameter 1: y
  • parameter 2: hello

Of course it should also match noparams(), and any number of parameters. As for my simplifications, I just need parameters names, I don't allow default parameters or something different from a list of comma separated names.

My tries with variants of "(\\s*)([A-Za-z0-9_])+\\(\\)" just to match a function name string with spaces at the beginning are failing, with this code:

    std::regex fnregexp(s);

    std::smatch pieces_match;

    if (std::regex_match(q, pieces_match, fnregexp))
    {
        std::cout << ">>>> '" << q << "'" << std::endl;

        for (size_t i = 0; i < pieces_match.size(); ++i)
        {
            std::ssub_match sub_match = pieces_match[i];
            std::string piece = sub_match.str();
            std::cout << "  submatch " << i << ": '" << piece << "'" << std::endl;
        }
    }

I have the following output for " hello()":

>>>> '     hello()'
  submatch 0: '     hello()'
  submatch 1: '     '
  submatch 2: 'o'

With this very basic syntax, is it possible to find name of the function and its parameters?

Cheers!

6
  • Not sure what exactly you want to parse: Function declarations like def foo(param): in Python or calls to that function? Commented May 19, 2017 at 14:33
  • A string that matches a Python-like function header, I can also omit def. Commented May 19, 2017 at 14:36
  • Can your input contain more than one function ? Commented May 19, 2017 at 14:46
  • No, just one function, nothing fancy for now (for complicated things I'd go with Boost Spirit, probably). Commented May 19, 2017 at 14:47
  • If there is no noise with the function, the simplest regex you could use is \w+. First match would be the function name, each other match would be the arguments Commented May 19, 2017 at 14:55

2 Answers 2

1

Use this for the conformance check:

^\\s*[A-Za-z_]\\w* *\\( *(?:[A-Za-z_]\\w* *(?:, *[A-Za-z_]\\w* *)*)?\\)$

and if it's ok use this for extracting the parts of signature:

\\w+

the first submatch is the function name, the others are parameters.

EDIT: The correct synthax for Python is [A-Za-z_][A-Za-z0-9_]*

Sign up to request clarification or add additional context in comments.

2 Comments

You can remove the lookahead, and it will do the same ;) Edit : Oh, you just did ^^
Yes, because originally the answer did not provide the first check, so I thought other kinds of checking were necessary.
1

Matching simple function declarations with regex is feasable. For more complicated things you have exactly the right idea in going with a real parser like Boost Spirit.

The bug in your question is a wrong closing parens in the regex. Compare:

"(\\s*)([A-Za-z0-9_])+\\(\\)" // yours
"(\\s*)([A-Za-z0-9_]+)\\(\\)" // correct

The capture group in your version captures only a single character. Because of how the regex engine works it is the last one matched: the o. The correct version includes the + in the group and captures hello as expected.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.