How to extract a specific URL segment with Regex & C#

Question

I have URLs like this:

/domain.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext

Sometimes they look like this:

http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext

I need to extract segment 6 specifically with C# and Regex. Regex is an absolute requirement as I might want to extract segment 3 in the future just by changing some configuration.

.NET has a Uri class designed specifically for parsing URIs. Regular expressions don't strike me as the right tool for this job.
– user47589, Commented Apr 8, 2019 at 18:06
"Regex is an absolute requirement as I might want to extract segment 3 in the future just by changing some configuration." I'd argue that you can do that with almost any extraction method (e.g. string.Split).
– user1781290, Commented Apr 8, 2019 at 18:10
Or, if you want a C# solution: dotnetfiddle.net/tSWDO6. Also note that you can put the regex in your configuration as needed.
– Rahul Sharma, Commented Apr 8, 2019 at 18:21
new Uri(...).LocalPath.Split('/')[6] is far more reliable than Regex.
– Dour High Arch, Commented Apr 8, 2019 at 18:22
My requirement was C# + Regex. Regex was an absolute requirement. Uri would work in a fixed context. I'm not in such a context.
– Metrics, Commented Apr 12, 2019 at 15:57

Pushpesh Kumar Rajwanshi · Accepted Answer · 2019-04-08 19:51:16Z

2

You should preferably use the URL-related classes to parse a URL, as explained in another answer, because the built-in functions are proven and well tested even for corner cases. But since you mentioned that you are limited to a regex solution, you can try the following.

The sixth (or, in general, the Nth) segment can be found with this regex:

(?:([^/]+)/){7}

The group is repeated 6+1 times (N+1 in general for the Nth segment, where the extra repetition matches the domain part of the URL). A repeated capturing group retains only its last captured value, which can be read from group 1.

Here, ([^/]+) matches one or more characters other than / and captures them in group 1. The / that follows is matched but not captured, and the whole unit must match exactly 7 times.

Regex Demo

C# code demo

var pattern = "(?:([^/]+)/){7}";
var match = Regex.Match("/domain.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext", pattern);
Console.WriteLine("Segment: " + match.Groups[1].Value);
match = Regex.Match("http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext", pattern);
Console.WriteLine("Segment: " + match.Groups[1].Value);

This prints the value of the sixth segment:

Segment: segment6
Segment: segment6
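
Since the question asks for the segment number to be configurable, here is a minimal sketch of building the same pattern from a number N. The helper name TryExtractSegment and the configuredSegment variable are hypothetical, not part of the answer above, and the sketch assumes N is at least 1 and that the URL has a host part before the segments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds (?:([^/]+)/){N+1} for a 1-based segment number N.
// The extra repetition accounts for the host part of the URL.
static bool TryExtractSegment(string url, int segmentNumber, out string segment)
{
    string pattern = "(?:([^/]+)/){" + (segmentNumber + 1) + "}";
    Match match = Regex.Match(url, pattern);

    // Group 1 keeps the value of its last repetition, i.e. the Nth segment.
    segment = match.Success ? match.Groups[1].Value : string.Empty;
    return match.Success;
}

// Assumption: in a real application this number would be read from configuration.
int configuredSegment = 3;

string url = "http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext";
if (TryExtractSegment(url, configuredSegment, out string found))
{
    Console.WriteLine("Segment: " + found); // prints "Segment: segment3"
}
```

Switching from segment 6 to segment 3 then only means changing the number, not the regex itself. The whole pattern string could be kept in configuration instead, as suggested in the comments on the question.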

edited Apr 8, 2019 at 19:51

answered Apr 8, 2019 at 19:33

Pushpesh Kumar Rajwanshi

1 Comment

Metrics Over a year ago

Thanks. That was it. I couldn't use Uri class. That's why I specified that Regex was an absolute requirement.

boris · Accepted Answer · 2019-04-08 18:51:22Z

.NET has the class UriTemplate (Amy already mentioned Uri in the comments). Matching URLs involves many details (case sensitivity, trailing slash vs. no trailing slash, etc.) that can make finding a suitable regular expression overly complex.

UriTemplate can deal with a lot of those things out-of-the-box. Maybe you can use that for a divide-and-conquer-like approach.

Uri baseUri = new Uri("http://someother.com");
// Each {name} placeholder binds one path segment to a variable.
UriTemplate template 
    = new UriTemplate("{segment1}/{segment2}/{segment3}/{segment4}/{segment5}/{segment6}/{segment7}/{filename}");
Uri fullUri 
    = new Uri("http://someother.com/super1/kali2/fragi3/listig4/expi5/ali6/docious7/filename.ext");

// Match returns null when fullUri does not fit the template.
UriTemplateMatch results = template.Match(baseUri, fullUri);

if (results != null && results.BoundVariables["segment6"] != null) {
    Console.WriteLine(results.BoundVariables["segment6"]);
    // Output: "ali6"
    // further regex matching can take place here
}

Have a look at the .NET reference documentation for more.

Matt.G · Accepted Answer · 2019-04-08 18:23:23Z

1

Try Regex: (?<=\.com)(?:\/([^\/]+))+\/[^\/.]+?\.\w+

Regex Demo

C# Demo
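
The linked demo is not reproduced here, so the following is only a sketch of how this pattern could be used from C#. It assumes the approach relies on .NET keeping every capture of a repeated group in Group.Captures, so segment 6 is the capture at index 5. The variable names are illustrative. Note that the lookbehind ties the pattern to hosts ending in .com.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"(?<=\.com)(?:\/([^\/]+))+\/[^\/.]+?\.\w+";
var input = "http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext";
var match = Regex.Match(input, pattern);

// Group 1 is captured once per directory segment; the file name is matched
// by the tail of the pattern and is not part of the captures.
CaptureCollection segments = match.Groups[1].Captures;

int wanted = 6; // 1-based segment number, could come from configuration
if (match.Success && segments.Count >= wanted)
{
    Console.WriteLine("Segment: " + segments[wanted - 1].Value); // prints "Segment: segment6"
}
```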

answered Apr 8, 2019 at 18:23

Matt.G
