3

I have a string, for example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

And I want to search the string for the first URL that starts with youtube.com or youtu.be and store it in variable $first_found_youtube_url.

How can I do this efficiently?

I could use preg_match or strpos to look for the URLs, but I'm not sure which approach is more appropriate.

2
  • The appropriate thing to do would probably be to use a parser and parse it as what it is: HTML. Commented Dec 23, 2015 at 23:38
  • @adeneo Can you please elaborate? I don't understand your approach. English is not my first language and I don't pick up on technical terms that easily. Commented Dec 23, 2015 at 23:39

3 Answers

4

I wrote this function a while back. It uses a regex and returns an array of unique URLs. Since you want the first one, you can just use the first item in the array.

function getUrlsFromString($string) {
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);
    // Remove duplicates but keep document order, so index 0 is the
    // first URL found in the string.
    return array_values(array_unique($matches[0]));
}

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getUrlsFromString($html);
$first_found_youtube = $urls[0];

With a YouTube-specific regex:

function getYoutubeUrlsFromString($string) {
    // Dots are escaped so they match literally; [\w-] also covers the
    // "_" and "-" characters that can appear in video IDs.
    $regex = '#(https?:\/\/(?:www\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/)([\w-]+))#i';
    preg_match_all($regex, $string, $matches);
    // Remove duplicates but keep document order, so index 0 is the
    // first YouTube URL found in the string.
    return array_values(array_unique($matches[0]));
}

Example:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube = $urls[0];
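
Since the question only needs the first URL, a variant based on preg_match may be enough, because it stops at the first match instead of collecting every URL. The following is a minimal sketch, not part of the answer above: the helper name getFirstYoutubeUrl is hypothetical, the pattern is an assumption about which link forms should count, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the answer above): returns the first
// YouTube URL in document order, or null when none is found.
function getFirstYoutubeUrl(string $string): ?string {
    // Assumed pattern: optional "www.", either host, then the ID
    // characters (letters, digits, "_" and "-").
    $regex = '#https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+#i';
    // preg_match() stops at the first match instead of collecting all.
    if (preg_match($regex, $string, $match) === 1) {
        return $match[0];
    }
    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = getFirstYoutubeUrl($html);
echo $first_found_youtube_url, "\n";
```

Returning null rather than indexing into an empty array avoids an undefined-offset warning when the string contains no YouTube link.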

10 Comments

@HenrikPetterson try this https?:\/\/(?:www\.)?(?:youtube.com\/watch\?v=|youtu.be\/)(.*) - the only capture group contains the video ID
@TylerSebastian Please try the following to understand the problem: pastebin.com/6zS2fAus
Use (https?:\/\/(?:www\.)?(?:youtube.com\/watch\?v=|youtu.be\/)([a-zA-Z0-9]*)). regexr.com/3cfhn
Thanks @TylerSebastian - updated answer to include your regex with a rewritten function.
Make sure you make your regex case-insensitive (the i modifier, as in #...#i) in case you get capitalized links like HTTP://WWW.WEBSITE.NET.
1

You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
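// loadHTML() is an instance method, so create the document first.
$DOMD = new DOMDocument();
@$DOMD->loadHTML($html); // @ suppresses warnings about imperfect markup

foreach ($DOMD->getElementsByTagName("a") as $url)
{
    $href = $url->getAttribute("href");
    // These prefix checks only match https links with a "www." host.
    if (0 === stripos($href, "https://www.youtube.com/") || 0 === stripos($href, "https://www.youtu.be"))
    {
        $first_found_youtube_url = $href;
        break;
    }
}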

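Personally, I would probably compare the host returned by parse_url instead:

in_array(parse_url($url->getAttribute("href"), PHP_URL_HOST), array("youtube.com", "www.youtube.com", "youtu.be"), true)

This matches both http and https links, which is probably what you want, although strictly speaking it is not what the question asks for right now. The host has to be compared in full, because parse_url returns "www.youtube.com" for the first link in the example.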
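
The host comparison described above can be combined with the DOMDocument loop into a single function. This is only a sketch under stated assumptions: the name findFirstYoutubeLink and the list of accepted hosts are hypothetical, it assumes the DOM extension is enabled and PHP 7.1 or later, and it uses libxml_use_internal_errors rather than @ to silence parser warnings.

```php
<?php
// Hypothetical helper: returns the href of the first <a> element whose
// host is on the (assumed) list of YouTube hosts, or null if none is.
function findFirstYoutubeLink(string $html): ?string {
    $allowedHosts = ['youtube.com', 'www.youtube.com', 'youtu.be'];

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // parse_url() gives null or false when there is no usable host.
        if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
            return $href;
        }
    }
    return null;
}

$html = '<p>hello<a href="https://www.youtube.com/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';
$first_found_youtube_url = findFirstYoutubeLink($html);
echo $first_found_youtube_url, "\n";
```

Because only href attributes of anchor elements are inspected, URLs that appear in plain text or in other attributes are ignored, which may or may not be what you need.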


0

I think this will do what you are looking for. I have used preg_match_all simply because I find it easier to debug the regexes.

<?php

$html = '<p>hello<a href="https://www.youtu.be/watch?v=7HknMcG2qYo">world</a></p><p>hello<a href="https://youtube.com/watch?v=37373o">world</a></p>';

// youtu(?:\.be|be\.com) matches both youtu.be and youtube.com.
$pattern = '/https?:\/\/(?:www\.)?youtu(?:\.be|be\.com)\/[\w\-?=]*/i';
preg_match_all($pattern, $html, $matches);

// print_r($matches);
$first_found_youtube = $matches[0][0];
echo $first_found_youtube;
