Using perl:
$ export BASE_URL='https://www.sbs.com.au'
$ URL="$BASE_URL/ondemand/tv-series/la-unidad/season-1"
$ curl -s "$URL" | perl -lne '
BEGIN {
$/ = q(") # set perl's record separator, $/, to "
};
print if m=https:.*/la-unidad-s='
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep2/1839026755988
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep3/1839026755989
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep4/1839026755990
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep5/1839026755992
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep6/1839026755993
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-2/la-unidad-s2-ep1/2440941635730
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-2/la-unidad-s2-ep2/2440941635731
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-2/la-unidad-s2-ep3/2440941635732
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-2/la-unidad-s2-ep4/2440941635733
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-2/la-unidad-s2-ep5/2440941635819
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-2/la-unidad-s2-ep6/2440941635820
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-3/la-unidad-s3-ep1/2440941635824
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-3/la-unidad-s3-ep2/2440941635826
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-3/la-unidad-s3-ep3/2440941635827
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-3/la-unidad-s3-ep4/2440941635831
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-3/la-unidad-s3-ep5/2440941635834
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-3/la-unidad-s3-ep6/2440941635837
NOTE: As with anything that doesn't use a proper parser for HTML, XML, JSON, etc., using a simple regexp to extract data is fragile and could break whenever SBS makes even minor changes to their web site.
Old answer based on 'lynx -dump' or 'html2':
(I haven't deleted this because it's generically useful for people trying to extract links from actual HTML rather than JSON embedded in a JavaScript function)
Don't try to parse HTML with regexes alone. That is doomed to failure unless you're an expert in all things HTML, the HTML is extremely simple, and the web site never changes its format. Even then it will be fragile and prone to breaking. In short: just don't.
See also:
Parsing Html The Cthulhu Way and
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
You should use a language that has an HTML parsing library. Perl has several. Python does too. As do many other languages.
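As a minimal sketch of the parser-based approach, the following Python uses only the standard library's html.parser module to collect the href attribute of every <a> tag and keep those that match the episode pattern. The class name LinkCollector, the helper episode_links, and the SAMPLE_HTML string are made up for illustration. This assumes the links appear in ordinary anchor tags; if the page embeds them in JSON instead, this would find nothing.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


def episode_links(html, pattern=r"/la-unidad-s[0-9]"):
    """Return the hrefs in html that match pattern, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [link for link in parser.links if re.search(pattern, link)]


# Hypothetical sample input for illustration, not the real SBS page.
SAMPLE_HTML = """
<html><body>
<a href="/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987">Ep 1</a>
<a href="/ondemand/tv-series/la-unidad">Series page</a>
<a href='/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep2/1839026755988'>Ep 2</a>
</body></html>
"""

if __name__ == "__main__":
    for link in episode_links(SAMPLE_HTML):
        print(link)
```

Because the parser handles quoting and attribute order itself, the only regexp left is the filter on the already-extracted link, which is far less fragile than matching against raw markup.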
Alternatively, if you just want to extract a list of links, you could use the -dump option of text-mode web browsers like lynx or links.
e.g. first set up some variables for the URL:
$ BASE_URL='https://www.sbs.com.au'
$ URL="$BASE_URL/ondemand/tv-series/la-unidad/season-1"
Then fetch the URL and pipe the output into grep:
$ lynx -dump -listonly -nonumbers "$URL" |
grep '/la-unidad-s[0-9]'
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep2/1839026755988
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep3/1839026755989
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep4/1839026755990
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep5/1839026755992
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep6/1839026755993
Another option is to use html2 from the xml2 package (which doesn't seem to have a home page any more, but is packaged for Debian) to convert the HTML to a line-oriented format.
This is more complicated than using lynx, but you get full access to each individual HTML element, not just the links, in a line-oriented format suitable for processing with text-processing tools like sed, grep, and awk (and with perl and python too, without needing their HTML parser libraries). For example:
$ curl -s "$URL" |
html2 2>/dev/null |
sed -ne '/@href=.*\/la-unidad-s[0-9]/ {s:^.*/a/@href=::;p}'
/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987
/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep2/1839026755988
/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep3/1839026755989
/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep4/1839026755990
/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep5/1839026755992
/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep6/1839026755993
Note that, unlike lynx -dump, it doesn't prepend the base URL (https://www.sbs.com.au) to relative URLs in the HTML source; the URLs are printed exactly as they appear in the HTML. You can add that yourself with the previously defined $BASE_URL variable.
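One way to add the base URL is sketched here in Python with the standard library's urllib.parse.urljoin. The function name absolutize is hypothetical, and the two sample paths are taken from the episode URLs shown in this answer. Unlike plain string concatenation, urljoin leaves URLs that are already absolute unchanged.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.sbs.com.au"


def absolutize(paths, base=BASE_URL):
    """Resolve each path against base; absolute URLs pass through unchanged."""
    return [urljoin(base, path) for path in paths]


paths = [
    # A relative path, as printed by html2.
    "/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987",
    # An already absolute URL, as printed by lynx -dump.
    "https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep2/1839026755988",
]

for url in absolutize(paths):
    print(url)
```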
Or, if you export the BASE_URL variable so that it's in the environment and available to child processes (i.e. programs you run from your shell or shell script), you could do something like this using perl:
$ export BASE_URL='https://www.sbs.com.au'
$ URL="$BASE_URL/ondemand/tv-series/la-unidad/season-1"
$ curl -s "$URL" |
html2 2>/dev/null |
perl -lne '
if (m:/\@href=(.*/la-unidad-s[0-9].*):) {
print $ENV{BASE_URL} . $1;
}'
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep2/1839026755988
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep3/1839026755989
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep4/1839026755990
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep5/1839026755992
https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep6/1839026755993
grep -oP "(?<=start_string).*?(?=end_string)"
grep -oP '(?<=https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-).*?(?=")'
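To illustrate what the lookbehind/lookahead pattern in the grep -oP commands matches, here is a small Python sketch using the same regexp syntax. The sample line and the helper name between are made up for illustration. The match contains only the text between the two markers, so the part of the URL before "season-" is not included in the result.

```python
import re


def between(start, end, text):
    """Return every shortest run of text found between start and end."""
    # re.escape makes the markers literal; (?<=...) and (?=...) keep them out of the match.
    pattern = "(?<=" + re.escape(start) + ").*?(?=" + re.escape(end) + ")"
    return re.findall(pattern, text)


# Hypothetical sample line, not taken from the real page source.
line = 'x="https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-1/la-unidad-s1-ep1/1839026755987"'

start = "https://www.sbs.com.au/ondemand/tv-series/la-unidad/season-"
print(between(start, '"', line))
```

Like the other regexp-based commands here, this depends on the exact text surrounding the URL and breaks if the page format changes.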