
I need to extract a particular function and its body as text from a JavaScript file and print it, using C#. The function name and the JS file are the input parameters. I tried using a regex but couldn't achieve the desired result. Here is the regex code:

public void getFunction(string jstext, string functionname)
{
    Regex regex = new Regex(@"function\s+" + functionname + @"\s*\(.*\)\s*\{");
    Match match = regex.Match(jstext);
}

Is there any other way I can do this?

  • I don't think you'll find javascript to be regular enough. I can't think of a way to do this without keeping track of the number of opened and closed brackets, but then you'll also need to take into account if an unmatched } appears within a string or a javascript regex. Unless you're willing to make heaps of assumptions about the function, you should probably look into existing javascript parsers for .net. Commented Sep 21, 2016 at 9:07
  • define: couldnt achieved the desired result. Commented Sep 21, 2016 at 9:07
  • Do you need to support functions like var functionName = function(x) { ... }? What about window['function' + name] = function(x) { ... }? Commented Sep 21, 2016 at 9:09
  • not to mention (function(x) { ... })("foo") - ie, IIFE Commented Sep 21, 2016 at 9:11
  • 1
    ok then - var myFunc = (function(x){ return function() { ... })("foo"); is a named function myFunc but you wont find it with anything like the regex above;) Commented Sep 21, 2016 at 9:13

1 Answer


This answer is based on the assumption you stated in the comments: that the C# function only needs to find function declarations, not any form of function expression.
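To make that assumption concrete, here is a minimal sketch of what a declaration-only start pattern does and does not match. The class `DeclarationPatternDemo` and the helper `IsDeclared` are hypothetical names made up for this illustration, and passing the name through `Regex.Escape` is my own addition.

```csharp
using System;
using System.Text.RegularExpressions;

static class DeclarationPatternDemo
{
    // Hypothetical helper: true if jstext contains a declaration of the form
    // "function <name>(<params>) {". Function expressions are not matched.
    public static bool IsDeclared(string jstext, string functionname)
    {
        string pattern = @"function\s+" + Regex.Escape(functionname) + @"\s*\([^)]*\)\s*{";
        return Regex.IsMatch(jstext, pattern);
    }

    static void Main()
    {
        // A plain declaration is found.
        Console.WriteLine(IsDeclared("function foo(a, b) { return a; }", "foo"));        // True
        // A function expression assigned to a variable is not.
        Console.WriteLine(IsDeclared("var foo = function(a, b) { return a; };", "foo")); // False
        // A longer name that merely starts with "foo" is not matched either.
        Console.WriteLine(IsDeclared("function foobar() {}", "foo"));                    // False
    }
}
```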

As I pointed out in the comments, JavaScript is too complex to be captured reliably by a regular expression. The only way to know you've reached the end of the function is when the brackets all match up, and even then you still need to take escape characters, comments, and strings into account.
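As a small illustration of why counting brackets alone is not enough, the sketch below counts every brace it sees and gets fooled by a closing brace inside a string literal. `NaiveBraceCountDemo` and `NaiveEnd` are hypothetical names used only for this example.

```csharp
using System;

static class NaiveBraceCountDemo
{
    // Hypothetical helper: counts every brace from openIndex on, with no regard
    // for strings or comments, and returns the index just past the brace that
    // brings the count back to zero (or -1 if that never happens).
    public static int NaiveEnd(string js, int openIndex)
    {
        int depth = 0;
        for (int i = openIndex; i < js.Length; i++)
        {
            if (js[i] == '{') depth++;
            else if (js[i] == '}' && --depth == 0) return i + 1;
        }
        return -1;
    }

    static void Main()
    {
        string js = "function f() { return \"}\"; }";
        int end = NaiveEnd(js, js.IndexOf('{'));
        // The brace inside the string literal is mistaken for the end of the body.
        Console.WriteLine(js.Substring(0, end)); // function f() { return "}
    }
}
```

The same kind of miscount happens with braces inside comments and regex literals, which is why those states need to be tracked as well.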

The only way I can think of to achieve this is to iterate through every single character, from the start of the function body until the brackets match up, and keep track of anything odd that comes along.

Such a solution is never going to be very pretty. I've pieced together an example of how it might work, but knowing how JavaScript is riddled with little quirks and pitfalls, I am convinced there are many corner cases not considered here. I'm also sure it could be made a bit tidier.

From my first experiments, the following should handle escape characters, multi-line and single-line comments, strings that are delimited by ", ' or `, and regular expressions (i.e. delimited by /).

This should get you pretty far, although I'm intrigued to see what exceptions people can come up with in comments:

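// Requires: using System; using System.Text; using System.Text.RegularExpressions;
private static string GetFunction(string jstext, string functionname) {

    // Locate the declaration header. Regex.Escape keeps characters such as $ in the name literal.
    var start = Regex.Match(jstext, @"function\s+" + Regex.Escape(functionname) + @"\s*\([^)]*\)\s*{");

    if(!start.Success) {
        throw new Exception("Function not found: " + functionname);
    }

    StringBuilder sb = new StringBuilder(start.Value);
    jstext = jstext.Substring(start.Index + start.Value.Length);
    var brackets = 1;
    var i = 0;

    var delimiters = "`/'\"";
    char? currentDelimiter = null;

    var isEscape = false;
    var isComment = false;
    var isMultilineComment = false;

    // Last non-whitespace symbol seen outside comments. Used to guess whether
    // a / is a division operator or the start of a regex.
    var previous = '{';

    while(brackets > 0 && i < jstext.Length) {
        var c = jstext[i];
        var next = i + 1 < jstext.Length ? jstext[i + 1] : '\0';
        var wasComment = isComment;

        if(isComment) {
            if(!isMultilineComment && c == '\n') {
                // Found termination of single-line comment
                isComment = false;
            } else if(isMultilineComment && c == '*' && next == '/') {
                // Found termination of multiline comment; the * is appended here, the / below
                isComment = false;
                isMultilineComment = false;
                sb.Append(c);
                c = next;
                i++;
            }
        } else if(isEscape) {
            // The current symbol is escaped, so it cannot close a string or regex
            isEscape = false;
        } else if(currentDelimiter != null) {
            // Inside a string or regex: only an escape symbol or the closing delimiter matters.
            // Not handled: a / inside a regex character class, or ${...} in template strings.
            if(c == '\\') {
                isEscape = true;
            } else if(c == currentDelimiter) {
                currentDelimiter = null;
            }
        } else if(c == '/' && (next == '/' || next == '*')) {
            // Found start of a comment block; take both symbols at once so that /*/ is
            // not read as a comment that opens and closes immediately
            isComment = true;
            isMultilineComment = next == '*';
            sb.Append(c);
            c = next;
            i++;
        } else if(delimiters.IndexOf(c) >= 0) {
            // Found a string or regex delimiter. Heuristic: a / that follows an identifier,
            // a number, a closing ) or ], or a string is treated as division, not a regex.
            var isDivision = c == '/' && (char.IsLetterOrDigit(previous) || "_$)]`'\"".IndexOf(previous) >= 0);
            if(!isDivision) {
                currentDelimiter = c;
            }
        } else if(c == '{') {
            // Not commented out, escaped or in a string, so it counts as a bracket
            brackets++;
        } else if(c == '}') {
            brackets--;
        }

        if(!wasComment && !isComment && !char.IsWhiteSpace(c)) {
            previous = c;
        }

        sb.Append(c);
        i++;
    }

    return sb.ToString();
}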



2 Comments

Thanks for the code. I've been trying to find a way to parse a JS file and get all its entries, and your answer is the only solution I have found on the whole web. Weird, but thanks again.
This is pretty much what I was looking for: I had to parse a JavaScript text file and remove all functions, retaining property values only (don't ask)...
