How to obtain index of subpattern in JavaScript regexp?

Question

I wrote a regular expression in JavaScript to search for searchedUrl in a string:

var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';

var regexpStr = "url\\(\\s*"+searchedUrl+"\\s*\\)"; 
var regex = new RegExp(regexpStr , 'i');

var match = input.match(regex);
console.log(match); // returns an array

Output:

["url(  test  )", index: 5, input: "1234 url(  test  ) 5678"]

Now I would like to obtain the position of searchedUrl (in the example above, the position of test in 1234 url(  test  ) 5678).

How can I do that?

Pebbl · Accepted Answer · 2014-05-23 15:10:23Z

3

As far as I can tell, there is no built-in way to get the offset of a sub-match; you have to calculate it yourself, using either the lastIndex property of the RegExp or the index property of the match array returned by exec(). With index, you add the lengths of the groups leading up to your sub-match; with lastIndex, you subtract the lengths of your sub-match and of the groups that follow it. Either way, you have to wrap the part of the regular expression before (or after) the pattern you wish to locate in capturing groups.

lastIndex is only updated when the g (global) or y (sticky) flag is set, and it records the index just after the entire match. So if you want to use lastIndex, you'll need to work backwards from the end of your pattern.
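A minimal sketch of the lastIndex route, assuming the g flag is set and the pattern groups the target (test here) and everything after it. The function name offsetFromLastIndex is made up for illustration:

```javascript
// Sketch: locate a sub-match by working backwards from lastIndex.
// The g flag is required so that exec() updates lastIndex.
function offsetFromLastIndex(input) {
  const re = /url\(\s*(test)(\s*\))/gi;
  const m = re.exec(input);
  if (m === null) {
    return -1; // no match at all
  }
  // lastIndex points just past the whole match: step back over the
  // trailing group, then over the target itself.
  return re.lastIndex - m[2].length - m[1].length;
}

console.log(offsetFromLastIndex('1234 url(  test  ) 5678')); // 11
```

A fresh RegExp is created on every call here, so a stale lastIndex from a previous search cannot leak into the result.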

For more information on the exec() method, see here:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

The following succinctly shows the solution in operation:

var str = '---hello123';
var r = /([a-z]+)([0-9]+)/;
var m = r.exec( str );
alert( m.index + m[1].length ); // will give the position of 123

update

This would apply to your issue using the following:

var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';
var regexpStr = "(url\\(\\s*)("+searchedUrl+")\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = regex.exec(input);

Then to get the submatch offset you can use:

match.index + match[1].length

match[1] now contains "url(  " (url( plus the two spaces) thanks to the capturing group, so its length tells us how far test is from the start of the match.
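One caveat with building the pattern by string concatenation: real URLs often contain characters such as . or ? that have a special meaning in a regex. The sketch below escapes searchedUrl first, using the character class commonly recommended on MDN. escapeRegExp and findUrlOffset are hypothetical helper names, not built-ins:

```javascript
// Hypothetical helper: escape regex metacharacters so a literal string
// can be embedded safely in a RegExp.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: offset of searchedUrl inside url(...), or -1.
function findUrlOffset(input, searchedUrl) {
  const re = new RegExp('(url\\(\\s*)(' + escapeRegExp(searchedUrl) + ')\\s*\\)', 'i');
  const match = re.exec(input);
  return match === null ? -1 : match.index + match[1].length;
}

console.log(findUrlOffset('1234 url(  test  ) 5678', 'test'));        // 11
console.log(findUrlOffset('a url( img/logo.png ) b', 'img/logo.png')); // 7
console.log(findUrlOffset('a url( img/logoXpng ) b', 'img/logo.png')); // -1
```

Without escaping, the last call would match, because an unescaped . matches any character.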

update 2

Things get a little more complicated if the RegExp contains other groups before the pattern you want to locate. In that case, add together the lengths of every group that precedes it; this works as long as those groups, laid end to end, cover all of the matched text before your pattern.

var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec( s );

To get the offset of "it depends on your perspective", you would use:

m.index + m[1].length + m[2].length + m[3].length;

If you know the RegExp has portions that never change length, you could replace those with hard-coded numbers. However, it's probably best to keep the .length checks above, in case you (or someone else) ever change what your expression matches.
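The summing can be wrapped in a small helper. This is a sketch under the assumption that groups 1 to n-1 are not nested, sit end to end, and start at the beginning of the match; groupOffset is a hypothetical name:

```javascript
// Hypothetical helper (not a built-in): offset of capture group n in the
// input string. Assumes groups 1..n-1 are not nested, sit end to end, and
// start at the beginning of the match.
function groupOffset(match, n) {
  let offset = match.index;
  for (let i = 1; i < n; i++) {
    // A group that did not participate is undefined; it spans no text.
    offset += (match[i] || '').length;
  }
  return offset;
}

const s = '~- [This may or may not be random|it depends on your perspective] -~';
const r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
const m = r.exec(s);
if (m !== null) {
  const start = groupOffset(m, 4);
  console.log(start); // 34
  console.log(s.slice(start, start + m[4].length)); // it depends on your perspective
}
```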

edited May 23, 2014 at 15:10

answered Jun 11, 2013 at 8:55

Pebbl


7 Comments

Pebbl Over a year ago

@m.buettner -- is that a joke? Or have you actually got no clue how rude that was. I'm guessing the former.

Martin Ender Over a year ago

Rude? I think editing others' answers if you see room for improvement is definitely something people are supposed to do on SO (see meta.stackexchange.com/questions/120576, especially the second answer). And imho way too few people actually use this feature. Your original post contained a very neat solution in my opinion, but the answer seemed a bit detached from the question, because your example had nothing to do with the OP's input - so I just changed the example while leaving your text and adding a little snippet of additional explanation, to make the answer more useful to the OP.

Pebbl Over a year ago

@m.buettner yes, adding to an answer to improve it is good. To eradicate the original is not. As you can see from my edit -- which I had originally planned as a second iteration -- the above is the correct way to do things. By all means, if there was an issue in the code a small edit is fine, but it is far more polite to inform the original poster first, or let them make the change.

Martin Ender Over a year ago

Eradicate is a bit of an exaggeration, seeing that I merely changed the input and pattern used (none of your explanation) and added a little note and link for the OP to read up on what's actually going on with the capturing. Whether you planned the update or not, the point is that I didn't want to mess with you, but instead to improve your answer to make sure it gets upvoted and accepted. I could just as well have posted my own answer, which would at the time have been more applicable to the question and contained more background info. But whatever, I'll remember to ask for your permission next time.

Michael Geary Over a year ago

@m.buettner - Forgive me for jumping in here. You're technically correct, but may not have considered the social aspect. People often edit their own answers several times shortly after posting them - I just made the seventh edit to mine. If you also jump in with an edit, it's likely to cause an edit conflict when they try to save their own edit. It would be thoughtful to post a comment first: "Hey, I have an idea for improving your answer, mind if I just edit it in or would you like me to explain it here?" This is less of a problem if you edit an older answer, of course.


Qtax · Accepted Answer · 2013-06-11 08:57:58Z

2

JS doesn't have a direct way to get the index of a subpattern/capturing group. But you can work around that with some tricks. For example:

var reStr = "(url\\(\\s*)" + searchedUrl + "\\s*\\)";
var re = new RegExp(reStr, 'i');

var m = re.exec(input);
if(m){
    var index = m.index + m[1].length;
    console.log("url found at " + index);
}

answered Jun 11, 2013 at 8:57

Qtax


Shlomi Lachmish · Accepted Answer · 2021-09-20 08:17:06Z

2

You can add the d flag (hasIndices, added in ES2022) to the regex; exec() then returns an indices array holding the [start, end) positions of the whole match and of each capture group.

const input = '1234 url(  test  ) 5678';
const searchedUrl = 'test';

const regexpStr = "url\\(\\s*("+searchedUrl+")\\s*\\)"; 
const regex = new RegExp(regexpStr , 'id');

const match = regex.exec(input).indices[1];
console.log(match); // logs [11, 15]
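A slightly fuller sketch of the same idea, assuming an engine with ES2022 support (for example Node 16 or later). It guards against exec() returning null and reads the span through a named group; findUrlSpan and the group name url are illustrative choices:

```javascript
// Sketch using the d (hasIndices) flag. Each indices entry is a
// [start, end) pair, where end is exclusive.
function findUrlSpan(input, searchedUrl) {
  // Assumes searchedUrl contains no regex metacharacters.
  const regex = new RegExp('url\\(\\s*(?<url>' + searchedUrl + ')\\s*\\)', 'id');
  const match = regex.exec(input);
  if (match === null) {
    return null; // exec() returns null when nothing matches
  }
  return match.indices.groups.url; // same pair as match.indices[1]
}

console.log(findUrlSpan('1234 url(  test  ) 5678', 'test')); // [ 11, 15 ]
console.log(findUrlSpan('no url here', 'test'));             // null
```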

answered Sep 20, 2021 at 8:17

Shlomi Lachmish


2 Comments

Mr Lister Over a year ago

Note that this is fairly new: it is supported in Firefox since v88 and Chrome since v90.

Leo Orientis Over a year ago

Very good to know for parsing use cases!

Michael Geary · Accepted Answer · 2013-06-11 20:30:05Z

You don't need the index.

This is a case where providing just a bit more information would have gotten a much better answer. I can't fault you for it; we're encouraged to create simple test cases and cut out irrelevant detail.

But one important item was missing: what you plan to do with that index. In the meantime, we were all chasing the wrong problem. :-)

I had a feeling something was missing; that's why I asked you about it.

As you mentioned in the comment, you want to find the URL in the input string and highlight it in some way, perhaps by wrapping it in a <b></b> tag or the like:

'1234 url(  <b>test</b>  ) 5678'

(Let me know if you meant something else by "highlight".)

You can use character indexes to do that; however, there is a much easier way using the regular expression itself.

Getting the index

But since you asked, if you did need the index, you could get it with code like this:

var input = '1234 url(  test  ) 5678';
var url = 'test';

var regexpStr = "^(.*url\\(\\s*)"+ url +"\\s*\\)"; 
var regex = new RegExp( regexpStr , 'i' );

var match = input.match( regex );
var start = match[1].length;

This is a bit simpler than the code in the other answers, but any of them would work equally well. This approach works by anchoring the regex to the beginning of the string with ^ and putting all the characters before the URL in a group with (). The length of that group string, match[1], is your index.
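One detail worth knowing about this pattern: .* is greedy, so if the input contains several url(...) references, group 1 extends to the last one, while a lazy .*? stops at the first. Also, . does not match line terminators, so the sketch below uses [\s\S] to let the prefix span multiple lines. urlIndex is a hypothetical helper:

```javascript
// Sketch of the anchored-prefix technique: group 1 runs from the start of
// the string up to the URL, so its length is the URL's index.
function urlIndex(input, url, lazy) {
  // lazy = true finds the first occurrence, lazy = false the last one.
  // [\s\S] is used instead of . so the prefix can cross newlines.
  const prefix = lazy ? '[\\s\\S]*?' : '[\\s\\S]*';
  const regex = new RegExp('^(' + prefix + 'url\\(\\s*)' + url + '\\s*\\)', 'i');
  const match = regex.exec(input);
  return match === null ? -1 : match[1].length;
}

const twoUrls = 'url(test) and url( test )';
console.log(urlIndex(twoUrls, 'test', true));  // 4
console.log(urlIndex(twoUrls, 'test', false)); // 19
```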

Slicing and dicing

Once you know the starting index of test in your string, you could use .slice() or other string methods to cut up the string and insert the tags, perhaps with code something like this:

// Wrap url in <b></b> tag by slicing and pasting strings
var output =
    input.slice( 0, start ) +
    '<b>' + url + '</b>' +
    input.slice( start + url.length );

console.log( output );

That will certainly work, but it is really doing things the hard way.

Also, I left out some error handling code. What if there is no matching URL? match will be null, and match[1] will throw a TypeError. But instead of worrying about that, let's see how we can do it without any character indexing at all.
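For completeness, here is a sketch of the slicing approach with that error handling added. It assumes url contains no regex metacharacters, and it wraps the text actually found in the input, so the original casing survives the case-insensitive match. wrapUrlBySlicing is a hypothetical name:

```javascript
// Sketch: slicing approach with a null check.
function wrapUrlBySlicing(input, url) {
  const regex = new RegExp('^(.*url\\(\\s*)' + url + '\\s*\\)', 'i');
  const match = input.match(regex);
  if (match === null) {
    // String.prototype.match returns null when nothing matches,
    // so return the input unchanged instead of throwing.
    return input;
  }
  const start = match[1].length;
  const end = start + url.length;
  // Use input.slice rather than url so that, e.g., TEST stays TEST.
  return input.slice(0, start) + '<b>' + input.slice(start, end) + '</b>' + input.slice(end);
}

console.log(wrapUrlBySlicing('1234 url(  test  ) 5678', 'test')); // 1234 url(  <b>test</b>  ) 5678
```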

The easy way

Let the regular expression do the work for you. Here's the whole thing:

var input = '1234 url(  test  ) 5678';
var url = 'test';

var regexpStr = "(url\\(\\s*)(" + url + ")(\\s*\\))"; 
var regex = new RegExp( regexpStr , 'i' );

var output = input.replace( regex, "$1<b>$2</b>$3" );

console.log( output );

This code has three groups in the regular expression: one captures the URL itself, and the groups before and after it capture the rest of the matched text so we don't lose it. Then a simple .replace() and you're done!

You don't have to worry about any string lengths or indexes this way. And the code works cleanly if the URL isn't found: it returns the input string unchanged.
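Without the g flag, .replace() only wraps the first occurrence. If the input can reference the same URL several times, a sketch with the g flag added might look like this (highlightAll is an illustrative name, and url is again assumed to contain no regex metacharacters):

```javascript
// Sketch: the same capture-and-replace technique, with the g flag so every
// url(...) reference is wrapped, not just the first one.
function highlightAll(input, url) {
  const regex = new RegExp('(url\\(\\s*)(' + url + ')(\\s*\\))', 'gi');
  // $2 re-inserts the text that was matched, so its original casing is kept.
  return input.replace(regex, '$1<b>$2</b>$3');
}

console.log(highlightAll('url(test) and url( TEST )', 'test'));
// url(<b>test</b>) and url( <b>TEST</b> )
```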
