1

I have URLs like this:

/domain.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext

Sometimes they look like this instead:

http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext

I need to extract segment 6 specifically, using C# and Regex. Regex is an absolute requirement, as I might want to extract segment 3 in the future just by changing some configuration.

5 Comments
  • .NET has a Uri class designed specifically for parsing URIs. Regular expressions don't strike me as the right tool for this job. Commented Apr 8, 2019 at 18:06
  • "Regex is an absolute requirement as I might want to extract segment 3 in the future just by changing some configuration." I'd argue that you can do that with almost any extraction method (e.g. string.Split). Commented Apr 8, 2019 at 18:10
  • Or, if you want a C# solution: dotnetfiddle.net/tSWDO6. Also note that you can put the regex in your configuration as needed. Commented Apr 8, 2019 at 18:21
  • new Uri(...).LocalPath.Split('/')[6] is far more reliable than Regex. Commented Apr 8, 2019 at 18:22
  • My requirement was C# + Regex. Regex was an absolute requirement. Uri would work in a fixed context. I'm not in such a context. Commented Apr 12, 2019 at 15:57

3 Answers

2

Built-in URL classes are generally preferable for parsing a URL, as explained in another answer, because they are well tested and handle the corner cases. Since you mentioned that you are limited to a regex solution, you can try the following.

Finding the sixth (or, in general, the Nth) segment can be done with this regex:

(?:([^/]+)/){7}

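It repeats the group 6+1 times (N+1 in general for the Nth segment, where the +1 accounts for the domain part of the URL). A repeated capturing group retains only its last captured value, which is available as group 1.

Here, ([^/]+) matches one or more characters other than / and captures them in group 1. The capture is followed by a literal /, and the whole non-capturing group (?:...) must match exactly 7 times in a row. For the http:// form, a match cannot start at the scheme, because the empty segment between the two slashes breaks the repetition, so the first match begins at the domain.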

C# code demo

using System;
using System.Text.RegularExpressions;

// 7 repetitions: the domain plus segments 1 to 6; group 1 keeps the last capture.
var pattern = "(?:([^/]+)/){7}";
var match = Regex.Match("/domain.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext", pattern);
Console.WriteLine("Segment: " + match.Groups[1].Value);
match = Regex.Match("http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext", pattern);
Console.WriteLine("Segment: " + match.Groups[1].Value);

This prints the sixth segment for both inputs:

Segment: segment6
Segment: segment6
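
Since the question asks for the segment number to be configurable, the repetition count can be built from a setting instead of being hard-coded. The following is only a minimal sketch, not part of the original answer: `SegmentExtractor` and `TryExtractSegment` are hypothetical names, and it assumes the same N+1 rule described above, where the extra repetition consumes the domain.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentExtractor
{
    // Hypothetical helper: tries to read the Nth segment (1-based) after the domain.
    public static bool TryExtractSegment(string url, int n, out string segment)
    {
        if (n < 1)
            throw new ArgumentOutOfRangeException(nameof(n), "Segment numbers start at 1.");

        // N + 1 repetitions: the first repetition consumes the domain part.
        string pattern = "(?:([^/]+)/){" + (n + 1) + "}";
        Match match = Regex.Match(url, pattern);

        // Group 1 holds the value captured by the last repetition.
        segment = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }
}
```

With n read from configuration, a call such as SegmentExtractor.TryExtractSegment(url, 3, out var segment) would target segment 3 without any change to the code. The method returns false when the URL has fewer than N segments after the domain.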
1 Comment

Thanks. That was it. I couldn't use Uri class. That's why I specified that Regex was an absolute requirement.
2

.NET has the class UriTemplate (Amy already mentioned Uri in the comments). Matching URLs involves many details (case sensitivity, trailing slash vs. no trailing slash, etc.) that can make finding a suitable regular expression overly complex.

UriTemplate can deal with a lot of those things out of the box. Maybe you can use it for a divide-and-conquer approach: let the template isolate the segment, then apply a regex to that segment only.

// UriTemplate is part of System.ServiceModel (.NET Framework).
Uri baseUri = new Uri("http://someother.com");
UriTemplate template
    = new UriTemplate("{segment1}/{segment2}/{segment3}/{segment4}/{segment5}/{segment6}/{segment7}/{filename}");
Uri fullUri
    = new Uri("http://someother.com/super1/kali2/fragi3/listig4/expi5/ali6/docious7/filename.ext");

// Match returns null when the URI does not fit the template.
UriTemplateMatch results = template.Match(baseUri, fullUri);

if (results != null && results.BoundVariables["segment6"] != null) {
    Console.WriteLine(results.BoundVariables["segment6"]);
    // Output: "ali6"
    // further regex matching can take place here
}

Have a look at the .NET reference documentation for more.
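
For comparison, the plain Uri approach suggested in the comments (LocalPath split on '/') could look like the sketch below. It is not part of the original answer: `UriSegmentReader` and `TryGetSegment` are hypothetical names, and it assumes the input is an absolute http URL. The scheme-less form from the question would need separate handling, which is not shown here.

```csharp
using System;

public static class UriSegmentReader
{
    // Hypothetical helper: tries to read the Nth path segment (1-based) of an absolute URL.
    public static bool TryGetSegment(string absoluteUrl, int n, out string segment)
    {
        segment = string.Empty;

        // Uri.TryCreate avoids an exception when the input cannot be parsed.
        if (!Uri.TryCreate(absoluteUrl, UriKind.Absolute, out Uri uri))
            return false;

        // LocalPath starts with '/', so index 0 of the split is an empty string
        // and index n lines up with segment n.
        string[] parts = uri.LocalPath.Split('/');
        if (n < 1 || n >= parts.Length)
            return false;

        segment = parts[n];
        return true;
    }
}
```

The segment number is still just an integer that can come from configuration, which is the point made in the comments about non-regex extraction methods.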

1

Try Regex: (?<=\.com)(?:\/([^\/]+))+\/[^\/.]+?\.\w+

