I have a Regex that is able to detect URLs (Disclosure: I copied this Regex from the internet).

My goal is to split a string so that I get an array of substrings, each of which is either a complete URL or the text between URLs.

For example:

const detectUrls = // some magical Regex
const input = 'Here is a URL: https://google.com <- That was the URL to Google.';

console.log(input.split(detectUrls)); // This should output ['Here is a URL: ', 'https://google.com', ' <- That was the URL to Google.']

My current regex is as follows:

/(([a-z]+:\/\/)?(([a-z0-9\-]+\.)+([a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(:[0-9]{1,5})?(\/[a-z0-9_\-.~]+)*(\/([a-z0-9_\-.]*)(\?[a-z0-9+_\-.%=&]*)?)?(#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(\s+|$)/gi;

However, when I run the example code with my regex, I get a useless answer:

[ 'Here is a URL: ', 
  'https://google.com', 
  'https://', 
  'google.com', 
  'google.', 
  'com', 
  undefined, 
  undefined, 
  undefined, 
  undefined, 
  undefined, 
  undefined, 
  ' ', 
  '<- That was the URL to Google.',
]

Would anyone be able to point me in the right direction? Thanks in advance.

    In a regex, (...) is called a capture group, and your result array has one item for each capture group. A solution would be named capture groups, but browser support is probably bad (stackoverflow.com/questions/5367369/…). Instead of writing your own solution, why not reuse an existing one? (google.com/…) Commented Feb 26, 2019 at 14:29

2 Answers

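The reason you are getting extra entries is that split() adds the text matched by each capture group (the parts inside parentheses) to the result array, and undefined for any group that did not take part in the match.
To get the result you want, use non-capturing groups, (?:myRegex), for everything except the single outer group that wraps the whole URL.
I modified your regex accordingly, so it should work: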

/((?:[a-z]+:\/\/)?(?:(?:[a-z0-9\-]+\.)+(?:[a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(?::[0-9]{1,5})?(?:\/[a-z0-9_\-.~]+)*(?:\/(?:[a-z0-9_\-.]*)(?:\?[a-z0-9+_\-.%=&]*)?)?(?:#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(?:\s+|$)/gi
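To make the behaviour easier to see, here is a minimal sketch on a toy pattern rather than the URL regex. The sample string and the variable names are made up for illustration. It shows how split() treats no groups, several capturing groups, and one outer capturing group with a non-capturing group inside.

```javascript
// Toy delimiter (not the URL regex): a digit, optionally followed by "x".
const sample = 'a1b2xc';

// No groups: only the text between the matches is returned.
const noGroups = sample.split(/\dx?/);
// ['a', 'b', 'c']

// Two capturing groups: each group's text is spliced into the result.
// A group that did not take part in the match contributes undefined.
const capturing = sample.split(/(\d)(x)?/);
// ['a', '1', undefined, 'b', '2', 'x', 'c']

// One outer capturing group, inner group non-capturing:
// only the whole delimiter is kept between the pieces.
const outerOnly = sample.split(/(\d(?:x)?)/);
// ['a', '1', 'b', '2x', 'c']

console.log(noGroups, capturing, outerOnly);
```

The third form is the shape used for the URL regex: one capturing group around the whole URL, and (?:...) everywhere else.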

Tip: use an online tool such as https://regex101.com/ to test your regular expressions.
The answers to this related question also helped a bit:
Use of capture groups in String.split()
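One side effect worth knowing about: because the trailing (?:\s+|$) is part of the delimiter, the whitespace after the URL is consumed and dropped, so the last piece comes back as '<- That was the URL to Google.' without its leading space. A possible workaround is to replace that trailing group with a lookahead, which checks for whitespace or end of input without consuming it. The sketch below assumes a deliberately simplified URL pattern (simpleUrl is a made-up stand-in, not the full regex above) to keep the idea readable.

```javascript
// Simplified, hypothetical URL pattern, for illustration only:
// optional scheme, dotted host name, optional path.
// The lookahead (?=\s|$) requires whitespace or end of input after the URL
// but leaves it in place, so it stays in the next piece of the split.
const simpleUrl = /((?:[a-z]+:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:\/\S*)?)(?=\s|$)/i;

const input = 'Here is a URL: https://google.com <- That was the URL to Google.';
const parts = input.split(simpleUrl);

console.log(parts);
// ['Here is a URL: ', 'https://google.com', ' <- That was the URL to Google.']
```

The same change should carry over to the full regex by swapping its final (?:\s+|$) for (?=\s|$), though I have only checked it against the example input.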


1 Comment

Thanks for your answer and for linking to other resources that I can learn from. Your answer fixes my issue!

Try this:

var detectUrls = /(([a-z]+:\/\/)?(([a-z0-9\-]+\.)+([a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(:[0-9]{1,5})?(\/[a-z0-9_\-.~]+)*(\/([a-z0-9_\-.]*)(\?[a-z0-9+_\-.%=&]*)?)?(#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(\s+|$)/gi;

var input = "Here is a URL: https://google.com";

alert(input.match(detectUrls));

Working Fiddle: https://jsfiddle.net/as2pbe3m/

1 Comment

This only matches the pattern; the OP wants an array with the string split around the URLs.
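The difference between matching and splitting can be seen in a small sketch. It assumes a simplified, hypothetical URL pattern (urlPattern) and a made-up sample sentence, not the regex from the question. match() with the g flag returns only the URLs, while split() with one capturing group around the same pattern also keeps the text in between.

```javascript
// Simplified, hypothetical URL pattern, for illustration only.
const urlPattern = /(?:[a-z]+:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:\/\S*)?(?=\s|$)/gi;
const text = 'See https://google.com and example.org/docs for details.';

// match() with the g flag: just the matched URLs, surrounding text is lost.
const urlsOnly = text.match(urlPattern);
// ['https://google.com', 'example.org/docs']

// split() with a single capturing group around the same pattern:
// URLs and the text between them, in order.
const splitPattern = new RegExp('(' + urlPattern.source + ')', 'i');
const segments = text.split(splitPattern);
// ['See ', 'https://google.com', ' and ', 'example.org/docs', ' for details.']

console.log(urlsOnly, segments);
```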
