1

I've searched around a bit, but it's hard to tell if this exact question has been answered before. I know you'll let me know if this is a duplicate.

I have a regular expression that matches a series of one or more positive integers preceded by a backslash. For example: \12345 would match, but \1234f or 12345 would not match.

The regex I'm using is ^\\(\d+)$

When I test the expression using various testers it works. For example, see: http://regex101.com/r/cY2bI1/1

However, when I implement it in the following c# code, I fail to get a match.

The implementation:

public string ParseRawUrlAsAssetNumber(string rawUrl) {
    var result = string.Empty;
    const string expression = @"^\\([0-9]+)$";
    var rx = new Regex(expression);
    var matches = rx.Matches(rawUrl);
    if (matches.Count > 0)
    {
        result = matches[0].Value;
    }
    return result;
}

The failing test (NUnit):

[Test]
public void ParseRawUrlAsAssetNumber_Normally_ParsesTheUrl() {
    var f = new Forwarder();
    var validRawUrl = @"\12345";
    var actualResult = f.ParseRawUrlAsAssetNumber(validRawUrl);
    var expectedResult = "12345";
    Assert.AreEqual(expectedResult, actualResult);
}

The test's output:

Expected string length 5 but was 6. Strings differ at index 0.
Expected: "12345"
But was:  "\\12345"
-----------^

Any ideas?

Resolution:

Thanks everyone for the input. In the end I took the following route based on your recommendations, and it is passing tests now.

public string ParseRawUrlAsAssetNumber(string rawUrl)
{
    var result = string.Empty;
    const string expression = @"^\\([0-9]+)$";
    var rx = new Regex(expression);
    var matches = rx.Matches(rawUrl);
    if (matches.Count > 0)
    {
        result = matches[0].Groups[1].Value;
    }
    return result;
}

3 Answers 3

7

The problem is this line:

var rx = new Regex(Regex.Escape(expression));

By escaping your expression you're turning all your special regex characters into literals. Calling Regex.Escape(@"^\\(\d+)$") will return "\^\\\\\(\\d\+\)\$", which will only match the literal string "^\\(\d+)$"

Try just this:

var rx = new Regex(expression);

See MSDN: Regex.Escape for a full explanation and examples of how this method is intended to be used.


Given your updated question, it seems like you also have an issue here:

result = matches[0].Value;

This will return the entire matched substring, not just the first capture group. For that you'll have to use:

result = matches[0].Groups[1].Value;
Sign up to request clarification or add additional context in comments.

6 Comments

Oops, that was an old version of the code. I'll make an edit and show the current version (the problem still exists).
See my updated answer. Also, while it's fine to edit the question to provide more information, please try not to edit it in a way that largely invalidates the answers. If you must, post the updated code under the original code and clearly mark it as an edit.
Ah, very good. I wasn't aware that matches[0] would return the full substring. Thanks.
Good point about editing the post in that way. I'll avoid that in the future.
@SergeyBerezovskiy Yes, I noticed the error after posting my initial update. Please check again.
|
3

Don't escape pattern. Also simply use Regex.Match thus you are going to have single match here. Use Match.Success to check if input matched your pattern. And return group value - digits are in group of your matched expression:

public string ParseRawUrlAsAssetNumber(string rawUrl)
{            
    const string pattern = @"^\\(\d+)$";

    var match = Regex.Match(rawUrl, pattern);
    if (!match.Success)
        return String.Empty;

    return match.Groups[1].Value;
}

2 Comments

+1 For the clean re-write, but the if is not necessary either. Even if it fails to match .Groups[1].Value will be an empty string.
@p.s.w.g thanks! Didn't know about this trick with empty group returned by default. Very cool. Whole method can be written as return Regex.Match(rawUrl, @"^\\(\d+)$").Groups[1].Value
1

What if you try get the group result instead?

match.Groups[1].Value

When I get to a real computer I'll test, but seems like it should work

1 Comment

Yes, it should be the solution stackoverflow.com/questions/6375873/… match.Groups[0] is always the same as match.Value, which is the entire match. match.Groups[1] is the first capturing group in your regular expression.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.