
I wrote a regular expression in JavaScript to search for searchedUrl in a string:

var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';

var regexpStr = "url\\(\\s*"+searchedUrl+"\\s*\\)"; 
var regex = new RegExp(regexpStr , 'i');

var match = input.match(regex);
console.log(match); // logs an array, or null if there is no match

Output:

["url(  test  )", index: 5, input: "1234 url(  test  ) 5678"]

Now I would like to obtain the position of searchedUrl (in the example above, that is the position of test in 1234 url(  test  ) 5678).

How can I do that?


4 Answers


As far as I could tell, it isn't possible to get the offset of a sub-match automatically; you have to do the calculation yourself, using either the lastIndex property of the RegExp or the index property of the match array returned by exec(). With index you add the lengths of the groups leading up to your sub-match; with lastIndex you subtract the lengths of the groups from your sub-match to the end of the pattern. This does mean you have to wrap the part of the regular expression before (or after) the pattern you wish to locate in capturing groups.

lastIndex is only updated when the regex has the g (global) or y (sticky) flag, and it records the index just after the entire match. So if you wish to use lastIndex, you'll need to work backwards from the end of your pattern.
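A minimal sketch of the lastIndex route, applied to the question's input. It assumes group 1 captures everything from the URL to the end of the pattern, so subtracting that group's length from lastIndex lands on the start of test:

```javascript
var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';

// The 'g' flag makes exec() set regex.lastIndex to the index just after the match.
// Group 1 captures everything from the URL to the end of the pattern.
var regex = new RegExp("url\\(\\s*(" + searchedUrl + "\\s*\\))", 'gi');
var match = regex.exec(input);

// Work backwards from the end of the match by the length of group 1 (-1 if no match).
var start = match ? regex.lastIndex - match[1].length : -1;
console.log(start); // 11
```

Note that a g-flagged regex keeps its lastIndex between calls, so reusing the same object on another string starts searching from that position unless you reset lastIndex to 0.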

For more information on the exec() method, see here:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

The following succinctly shows the solution in operation:

var str = '---hello123';
var r = /([a-z]+)([0-9]+)/;
var m = r.exec( str );
console.log( m.index + m[1].length ); // 8, the position of 123

Update

Applied to your problem, this looks like the following:

var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';
var regexpStr = "(url\\(\\s*)("+searchedUrl+")\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = regex.exec(input);

Then to get the submatch offset you can use:

match.index + match[1].length

match[1] now contains "url(  " (url( plus two spaces) thanks to the capturing group, and its length is the offset of test from the start of the match.

Update 2

Things are a little more complicated if the RegExp contains other groups before the pattern you want to locate. In that case you simply add together the length of each preceding group, which works as long as those groups are contiguous and cover everything from the start of the match up to your pattern.

var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec( s );

To get the offset of "it depends on your perspective", you would use:

m.index + m[1].length + m[2].length + m[3].length;

If you know that some portions of the RegExp never change length, you can replace those with hard-coded numbers. However, it's probably best to keep the .length checks above, in case you or someone else ever changes what your expression matches.
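The same summation can be written as a small loop. groupOffset below is a hypothetical helper, not a built-in; it assumes, as in the example above, that groups 1 to n-1 are contiguous and start exactly at m.index:

```javascript
// Hypothetical helper: offset of capture group n, assuming groups 1..n-1
// are contiguous and begin exactly at the start of the match.
function groupOffset(m, n) {
  var offset = m.index;
  for (var i = 1; i < n; i++) {
    // A group that did not participate is undefined; count it as length 0.
    offset += (m[i] || '').length;
  }
  return offset;
}

var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec(s);

var offset = groupOffset(m, 4);
console.log(offset); // 34
console.log(s.slice(offset, offset + m[4].length)); // "it depends on your perspective"
```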


7 Comments

@m.buettner -- is that a joke? Or have you actually got no clue how rude that was. I'm guessing the former.
Rude? I think editing others' answers if you see room for improvement is definitely something people are supposed to do on SO (see meta.stackexchange.com/questions/120576, especially the second answer). And imho way too few people actually use this feature. Your original post contained a very neat solution in my opinion, but the answer seemed a bit detached from the question, because your example had nothing to do with the OP's input - so I just changed the example while leaving your text and adding a little snippet of additional explanation, to make the answer more useful to the OP.
@m.buettner yes, adding to an answer to improve it is good. To eradicate the original is not. As you can see from my edit (which I had originally planned as a second iteration), the above is the correct way to do things. By all means, if there was an issue in the code a small edit is fine, but it is far more polite to inform the original poster first, or let them make the change.
Eradicate is a bit of an exaggeration, seeing that I merely changed the input and pattern used (none of your explanation) and added a little note and link for the OP to read up on what's actually going on with the capturing. Whether you planned the update or not, the point is that I didn't want to mess with you, but instead to improve your answer to make sure it gets upvoted and accepted. I could just as well have posted my own answer, which at the time would have been more applicable to the question and contained more background info. But whatever, I'll remember to ask for your permission next time.
@m.buettner - Forgive me for jumping in here. You're technically correct, but may not have considered the social aspect. People often edit their own answers several times shortly after posting them - I just made the seventh edit to mine. If you also jump in with an edit, it's likely to cause an edit conflict when they try to save their own edit. It would be thoughtful to post a comment first: "Hey, I have an idea for improving your answer, mind if I just edit it in or would you like me to explain it here?" This is less of a problem if you edit an older answer, of course.

Without the newer d flag, JS doesn't have a direct way to get the index of a subpattern/capturing group, but you can work around that with some tricks. For example:

var reStr = "(url\\(\\s*)" + searchedUrl + "\\s*\\)";
var re = new RegExp(reStr, 'i');

var m = re.exec(input);
if(m){
    var index = m.index + m[1].length;
    console.log("url found at " + index);
}
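One caveat that applies to all of these snippets: searchedUrl is spliced into the pattern unescaped, so characters such as . or ? in a real URL are treated as regex syntax. A sketch assuming a hypothetical escapeRegExp helper, along the lines of the one in MDN's regular expressions guide:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a RegExp
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// 'img/aXpng' comes first and would match an unescaped '.'
var input = 'url(img/aXpng) url(  img/a.png  )';
var searchedUrl = 'img/a.png';

var re = new RegExp("(url\\(\\s*)" + escapeRegExp(searchedUrl) + "\\s*\\)", 'i');
var m = re.exec(input);
var index = m ? m.index + m[1].length : -1;
console.log(index); // 21 (without escaping, the first url(...) would match at 4)
```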


You can add the d flag (hasIndices) to the regex to generate start and end indices for each capture group.

const input = '1234 url(  test  ) 5678';
const searchedUrl = 'test';

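const regexpStr = "url\\(\\s*(" + searchedUrl + ")\\s*\\)";
const regex = new RegExp(regexpStr, 'id');

// exec() returns null when nothing matches, so check before reading .indices
const result = regex.exec(input);
const match = result ? result.indices[1] : null;
console.log(match); // [11, 15]: start (inclusive) and end (exclusive) of group 1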
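If the string can contain several url(...) references, as a CSS file might, the d flag can be combined with g and String.prototype.matchAll() to collect every position. A sketch, assuming an engine with d flag support (see the version note in the comments):

```javascript
const input = 'url(test) and url( test )';
const searchedUrl = 'test';

// matchAll() requires the 'g' flag; 'd' adds an .indices array to each match
const regex = new RegExp("url\\(\\s*(" + searchedUrl + ")\\s*\\)", 'gid');

// m.indices[1] is [start, end] of group 1; keep only the start
const starts = Array.from(input.matchAll(regex), (m) => m.indices[1][0]);
console.log(starts); // [4, 19]
```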

2 Comments

Note that this is fairly new; in Firefox since v88 and Chrome v90.
Very good to know for parsing use cases!

You don't need the index.

This is a case where providing just a bit more information would have gotten a much better answer. I can't fault you for it; we're encouraged to create simple test cases and cut out irrelevant detail.

But one important item was missing: what you plan to do with that index. In the meantime, we were all chasing the wrong problem. :-)

I had a feeling something was missing; that's why I asked you about it.

As you mentioned in the comment, you want to find the URL in the input string and highlight it in some way, perhaps by wrapping it in a <b></b> tag or the like:

'1234 url(  <b>test</b>  ) 5678'

(Let me know if you meant something else by "highlight".)

You can use character indexes to do that, but there is a much easier way using the regular expression itself.

Getting the index

But since you asked, if you did need the index, you could get it with code like this:

var input = '1234 url(  test  ) 5678';
var url = 'test';

var regexpStr = "^(.*url\\(\\s*)"+ url +"\\s*\\)"; 
var regex = new RegExp( regexpStr , 'i' );

var match = input.match( regex );
var start = match[1].length;

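This is a bit simpler than the code in the other answers, and for this input it gives the same result. It works by anchoring the regex to the beginning of the string with ^ and putting all the characters before the URL in a group with (). The length of that group string, match[1], is your index. Two caveats: the greedy .* finds the last occurrence if the URL appears more than once, and because . does not match line breaks, multi-line input such as a CSS file would need the s flag (or [\s\S]* in place of .*).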

Slicing and dicing

Once you know the starting index of test in your string, you could use .slice() or other string methods to cut up the string and insert the tags, perhaps with code something like this:

// Wrap url in <b></b> tag by slicing and pasting strings
var output =
    input.slice( 0, start ) +
    '<b>' + url + '</b>' +
    input.slice( start + url.length );

console.log( output );

That will certainly work, but it is really doing things the hard way.

Also, I left out some error handling code. What if there is no matching URL? match will be null, so match[1] will throw a TypeError. But instead of worrying about that, let's see how we can do it without any character indexing at all.

The easy way

Let the regular expression do the work for you. Here's the whole thing:

var input = '1234 url(  test  ) 5678';
var url = 'test';

var regexpStr = "(url\\(\\s*)(" + url + ")(\\s*\\))"; 
var regex = new RegExp( regexpStr , 'i' );

var output = input.replace( regex, "$1<b>$2</b>$3" );

console.log( output ); // 1234 url(  <b>test</b>  ) 5678

This code has three groups in the regular expression, one to capture the URL itself, with groups before and after the URL to capture the other matching text so we don't lose it. Then a simple .replace() and you're done!

You don't have to worry about any string lengths or indexes this way. And the code works cleanly if the URL isn't found: it returns the input string unchanged.
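If the input can contain more than one url(...) reference to highlight, a sketch assuming you want all of them wrapped: adding the g flag makes .replace() substitute every match instead of only the first.

```javascript
var input = 'a url(test) b url( test ) c';
var url = 'test';

// 'g' replaces every occurrence; 'i' keeps the match case-insensitive
var regex = new RegExp("(url\\(\\s*)(" + url + ")(\\s*\\))", 'gi');
var output = input.replace(regex, "$1<b>$2</b>$3");

console.log(output); // a url(<b>test</b>) b url( <b>test</b> ) c
```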

1 Comment

I'm just highlighting a URL in an HTML/CSS file.
