3

Is there any way to match a function block in JavaScript source code using regular expressions?

(Really I'm trying to find the opposite of that, but I figured this would be a good place to start.)

6 Answers

8

Contrary to what everyone else believes, I have a quite effective JavaScript solution. I've used it and it works well, as long as braces and parentheses are nested no more than three levels deep (counting the outermost pair):

function\s*([A-Za-z0-9_$]+)?\s*\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)\s*\{(?:[^}{]+|\{(?:[^}{]+|\{[^}{]*\})*\})*\}

https://regex101.com/r/zV2fO7/1

5

There are certain things that regular expressions just aren't very good at. That doesn't mean it's impossible to build an expression that will work, just that it's probably not a good fit. Among those things:

  • multi-line input
  • nesting

JavaScript function blocks tend to cover multiple lines, and you are going to want to find the matching "{" and "}" braces that mark the start and end of the block, which could be nested to an unknown depth. You also need to account for braces that appear inside comments. A regex will be painful for this.

That doesn't mean it's impossible, though. You might have additional information about the nature of the functions you're looking for. If you can do things like guarantee no braces in comments and limit nesting to a specific depth, you could still build an expression to do it. It'll be somewhat messy and hard to maintain, but at least within the realm of the possible.
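As a rough sketch of the depth-limited approach, the pattern can be generated rather than written by hand. The helper names `bracedBlockPattern` and `functionRegex` are made up for this illustration, and the sketch assumes exactly the guarantees mentioned above: no braces in comments or strings, a known maximum nesting depth, and a plain `function name(args)` header.

```javascript
// Hypothetical helper: returns a pattern for a brace-delimited block that
// allows at most `depth` levels of braces nested inside the outer pair.
function bracedBlockPattern(depth) {
  // Innermost level: a pair of braces with no braces inside.
  let pattern = "\\{[^{}]*\\}";
  for (let i = 0; i < depth; i++) {
    // Wrap the previous level: plain text, then any number of
    // (nested block + plain text) repetitions.
    pattern = "\\{[^{}]*(?:" + pattern + "[^{}]*)*\\}";
  }
  return pattern;
}

// Hypothetical helper: matches named functions whose argument list
// contains no parentheses (an assumption of this sketch).
function functionRegex(depth) {
  const header = "\\bfunction\\s+\\w+\\s*\\([^()]*\\)\\s*";
  return new RegExp(header + bracedBlockPattern(depth), "g");
}

const source =
  "function foo() { if (bar) { baz(); } }\n" +
  "function qux() { return 1; }";

// With depth 2 both functions are matched in full.
const matches = source.match(functionRegex(2));
console.log(matches);
```

If the real nesting exceeds the chosen depth, the function is simply not matched, which is the maintenance cost described above.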

4 Comments

How is multi-line input a problem?
It depends on the engine: some just don't support it, and others have bugs.
Isn't that more of a problem of the engines than regexes?
You could say that. I see it as more of an issue with the technology: multi-line is just one more thing you have to watch for.
5

Not really, no.

Function blocks aren't regular and so regular expressions aren't the right tool for the job. See, in order to capture a function block in JS, you need to count instances of { and balance them against instances of }, otherwise you're going to match too much or too little. Regular expressions can't do this kind of counting.

Just read in the file you're trying to look at and manage the nesting recursively. It's conceptually very easy to manage this way.
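A minimal sketch of that idea, assuming the source is already in a string and that no braces appear inside strings or comments. The function names `skipBlock` and `findFunctionBlocks` are invented for this example.

```javascript
// Returns the index just past the "}" that closes the "{" at index `open`,
// or -1 if the block is never closed.
function skipBlock(source, open) {
  let i = open + 1;
  while (i < source.length) {
    if (source[i] === "{") {
      // Nested block: recurse, then continue after its closing brace.
      i = skipBlock(source, i);
      if (i === -1) return -1;
    } else if (source[i] === "}") {
      return i + 1;
    } else {
      i++;
    }
  }
  return -1;
}

// Collects the text of every top-level function block.
// Functions nested inside another function are part of the outer block.
function findFunctionBlocks(source) {
  const blocks = [];
  // Find the keyword and everything up to the opening brace.
  const header = /\bfunction\b[^{]*\{/g;
  let m;
  while ((m = header.exec(source)) !== null) {
    const open = m.index + m[0].length - 1;
    const end = skipBlock(source, open);
    if (end === -1) break; // unbalanced braces: stop
    blocks.push(source.slice(m.index, end));
    header.lastIndex = end; // resume the search after this block
  }
  return blocks;
}

const code = "function foo() {\n    if(bar) {\n        baz();\n    }\n}\nfunction qux() { return 1; }";
console.log(findFunctionBlocks(code));
```

The regex here only locates the header; the brace balancing is done by the recursive scan.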

3

No, it is not possible. Regexes can't match nested pairs of characters. So something like this would fool it:

function foo() {
    if(bar) {
        baz();
    } // oops, regex would think this was end of function
}

However, you could create a fairly simple grammar to do it (in EBNF-ish form):

javascript_func
: "function" ID "(" ")" "{" body* "}"
| "function" ID "(" params ")" "{" body* "}"
;

params
: ID
| params "," ID
;

body
: [^{}]* // assume this is like a regex
| "{" body* "}"
;

Oh, this is also assuming you have some kind of lexer to strip out whitespace and comments.

3 Comments

Actually, a greedy regex would match the whole function. However, if another function followed it, it would be grabbed too.
Oh, and you can have nested functions too... (function definition within another)
@GalacticCowboy: I assumed the greediness problem would be fairly obvious, but you are correct.
3

Some regex engines do allow recursion. Say in PHP or PCRE you could get nested brackets like so:

{(?:[^{}]+|(?R))*+}

(?R) "pastes" the entire expression in its place. To capture functions, recursing into a subgroup is more useful:

function[^{]+({(?:[^{}]+|(?-1))*+})

And then we might want to filter out any comments breaking the brackets (needs sm flags):

function\s+\w+\s*\([^{]+({(?:[^{}]+\/\*.*?\*\/|[^{}]+\/\/.*?$|[^{}]+|(?-1))*+})

This should work for basic cases. But there are still strings containing '}', strings with escaped quotes, and other things to worry about.
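For comparison, here is a hedged sketch of how a hand-written scanner might deal with those cases, since JavaScript's own regex engine has no recursion. The name `findClosingBrace` is hypothetical. The sketch skips line comments, block comments, and single- or double-quoted strings with escapes; it does not handle template literals or regex literals, which would need more work.

```javascript
// Returns the index of the "}" matching the "{" at index `open`,
// ignoring braces inside comments and quoted strings, or -1 if none.
function findClosingBrace(source, open) {
  let depth = 0;
  let i = open;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: skip to the end of the line.
      const eol = source.indexOf("\n", i);
      i = eol === -1 ? source.length : eol + 1;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip to just past the closing "*/".
      const close = source.indexOf("*/", i + 2);
      i = close === -1 ? source.length : close + 2;
    } else if (ch === '"' || ch === "'") {
      // Quoted string: skip to the matching quote, honoring backslash escapes.
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === "\\") i++; // skip the escaped character
        i++;
      }
      i++; // move past the closing quote
    } else {
      if (ch === "{") {
        depth++;
      } else if (ch === "}") {
        depth--;
        if (depth === 0) return i;
      }
      i++;
    }
  }
  return -1;
}

const sample = [
  "function f() {",
  "  var s = \"}\"; // } in a comment",
  "  /* another } */",
  "  if (s) { g('\\'}'); }",
  "}",
].join("\n");

const closeIndex = findClosingBrace(sample, sample.indexOf("{"));
console.log(closeIndex === sample.length - 1);
```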

Here's a demo: https://regex101.com/r/fG4gO1/2

1 Comment

Really cool. I made some changes: function([^{]+)({(?:[^{}]+\/\*.*?\*\/|[^{}]+\/\/.*?$|[^{}]+|(?-1))*+})
2

After a day of fiddling with it for my own project, here is a regex that matches all named functions in a JS file and splits each one into function name, arguments, and body.

function\s+(?<functionName>\w+)\s*\((?<functionArguments>(?:[^()]+)*)?\s*\)\s*(?<functionBody>{(?:[^{}]+|(?-1))*+})

https://regex101.com/r/sXrHLI/1

1 Comment

Very nice! I tried an ES6 class, and it found the embedded functions (but not the methods, of course). It even worked for the minified code. I do see that extra open/close parens in comments are an issue. I may personally choose to avoid regex parsing for funcs.
