Shell script - remove all before and after

Question

I need to find the next link when the Link header contains rel=next. The Link header can come back as different strings, for example:

Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;

would be http://mygithub.com/api/v3/organizations/20/repos?page=3

Link: <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="next", <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="last"

would be http://mygithub.com/api/v3/organizations/4/repos?page=2

I played with sed and parameter expansion, but I'm not that experienced and got stuck :)

"Shell" meaning you need to be compatible with /bin/sh, or is this running in bash, ksh, zsh, or another extended shell? If you're in a shell with native regex support, you should consider using that.
– Charles Duffy, Commented Oct 30, 2020 at 17:48
See the answers using BASH_REMATCH in "extract substring using regexp in plain bash". Using sed is generally best avoided when you're running it with only one line of input per invocation: it takes a lot of time to start up each copy, even though it's quite fast once it's running.
– Charles Duffy, Commented Oct 30, 2020 at 17:49
@shellter thanks. One question: how can I assign the value to a variable in the shell script? I have the string with the links in a variable named nextReposLink, and echo $nextReposLink prints the string with the mygithub links. I want to save the result of this command in a new variable:
$nextReposLink | awk '{for (i=0; i<=NF; i++){if ($i == "rel=next,"){print $(i-1);exit}}}' | sed -e 's/</ /' -e 's/>;/ /'
I tried something like the following, but it gives me a "bad substitution" error:
x="${echo $nextReposLink | awk '{for (i=0; i<=NF; i++){if ($i == \"rel=next,\"){print $(i-1);exit}}}'}"
– klind, Commented Nov 2, 2020 at 23:58
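For reference, here is a minimal bash sketch of the BASH_REMATCH approach Charles Duffy mentions. It assumes bash (not plain /bin/sh), and the function name next_link is only illustrative. The regex assumes each URL is wrapped in <...> and followed by ;, optional whitespace, and rel=next or rel="next", which covers both examples in the question.

```bash
#!/usr/bin/env bash
# next_link (hypothetical name): print the rel=next URL from a Link header
# using bash's built-in regex matching instead of sed/awk.
next_link() {
  local header=$1
  # Keep the regex in a variable so bash does not treat the quotes specially.
  # Group 1 captures the text between < and > that precedes rel=next or rel="next".
  local re='<([^>]*)>;[[:space:]]*rel="?next"?'
  if [[ $header =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # no next link, e.g. on the last page
  fi
}

link='Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;'
nextReposLink=$(next_link "$link")
echo "$nextReposLink"
```

The match starting at the rel=prev URL fails, so the regex engine moves on to the next < and captures the URL in front of rel=next. The matching itself needs no external sed or awk process.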

shellter · Accepted Answer · 2020-11-03 03:43:49Z

Please be aware that parsing HTML (or, as here, HTTP headers) with tools that don't understand the format is fraught with peril; you will see that this works and assume you can always get away with it. You'll spend hours trying to get the next level of complexity to work, when you should be studying how to use format-aware tools. Don't say we didn't warn you (-;, but

printf '%s\n' "<http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;" \
| awk '{
    for (i = 2; i <= NF; i++) {
       # accept rel=next or rel="next", with or without a trailing comma
       if ($i ~ /^rel="?next"?,?$/) {
         url = $(i-1)
         gsub(/[<>]/, "", url)   # drop the angle brackets
         sub(/;$/, "", url)      # drop the trailing semicolon
         print url
         exit                    # stop at the first next link
       }
    }
}'

produces the required output:

http://mygithub.com/api/v3/organizations/20/repos?page=3

To save the output of a pipeline into a variable, wrap it in a command substitution, in this case:

 nextReposLink=$( printf .... | awk '....' )
 #-------------^^--------------------------^

The items marked with ^ are the modern syntax for command substitution. The code inside $( ... ) is executed and its standard output, with trailing newlines removed, is substituted back into the command line; in an assignment like this one, it becomes the variable's value. (The original syntax for command substitution is backquotes, `cmds`, which works the same in the simple case var=`cmds`.) You can nest the modern $( ... ) form easily, whereas nesting backquotes requires a lot of escape-character fiddling, so avoid backquotes if you can.
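Putting this together for the variable from the question, here is a sketch that should work in bash or sh. It assumes the header text is already stored in a variable (called header here, a name chosen for illustration), and its awk test also accepts the quoted rel="next" form from the second example.

```bash
#!/usr/bin/env bash
# Assumption: the Link header text is already in a variable named "header".
header='Link: <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="next", <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="last"'

# $( ... ) runs the pipeline and captures its standard output (the trailing
# newline printed by awk is removed). ${echo ...} is parsed as parameter
# expansion instead, which is why it fails with "bad substitution".
nextReposLink=$(printf '%s\n' "$header" | awk '{
  for (i = 2; i <= NF; i++) {
    if ($i ~ /^rel="?next"?,?$/) {
      url = $(i-1)
      gsub(/[<>]/, "", url)
      sub(/;$/, "", url)
      print url
      exit
    }
  }
}')

echo "$nextReposLink"
```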

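Note that almost any s/regex/repl/ that sed can do, awk can do too, using the sub(/regex/, "repl", target) or gsub(/regex/, "repl", target) functions, where target is a variable or field (it defaults to $0). In this particular case no escaping is needed: < and > are ordinary characters in awk regular expressions, and in GNU awk \< and \> are word-boundary operators, so escaping them would change the meaning.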
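A side-by-side sketch of the same cleanup done with sed and with awk, applied to one <...>; token taken from the question's first example:

```bash
#!/usr/bin/env bash
s='<http://mygithub.com/api/v3/organizations/20/repos?page=3>;'

# sed: s/regex/repl/ replaces the first match; the g flag replaces every match.
via_sed=$(printf '%s\n' "$s" | sed -e 's/[<>]//g' -e 's/;$//')

# awk: gsub() replaces every match, sub() only the first; with no third
# argument both operate on the whole record, $0.
via_awk=$(printf '%s\n' "$s" | awk '{ gsub(/[<>]/, ""); sub(/;$/, ""); print }')

echo "$via_sed"
echo "$via_awk"
```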

Be sure to always double-quote variable expansions, e.g. echo "$nextReposLink".
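A small illustration of why the quotes matter; nargs is a throwaway helper (not a standard command) that just reports how many arguments it received. Unquoted expansions are also subject to filename globbing, and URLs like these contain ?, which is a glob character.

```bash
#!/usr/bin/env bash
# nargs: hypothetical helper that prints how many arguments it was given.
nargs() { echo "$#"; }

v='rel=next,   rel=last'
nargs $v     # unquoted: split on whitespace -> 2
nargs "$v"   # quoted: a single argument -> 1
echo $v      # prints: rel=next, rel=last   (the run of spaces is collapsed)
echo "$v"    # prints the value exactly as stored
```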

IHTH

Overcast · Accepted Answer · 2020-10-30 17:43:01Z

I put one of your Link header strings in a text file and was able to pull out the first URL with two cut commands.

[root@oelinux2 ~]# cat test
Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;

Then, using cut:

[root@oelinux2 ~]# cat test | cut -d "<" -f2 | cut -d ">" -f1
http://mygithub.com/api/v3/organizations/20/repos?page=1

That's one option if you just want the first URL in the string: it grabs what's between the first "<" and the following ">".

With cut, -d sets the delimiter and -f selects the field you want.

If you want a later URL in that string, change the field number (-f N) of the first cut and see what you get :)
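A sketch wrapping the two cuts in a function that takes the URL index; nth_url is a hypothetical name. Splitting on "<" leaves the text before the first URL (for example "Link: ") in field 1, so URL N is in field N+1. This is still positional, which is the limitation raised in the comments below.

```bash
#!/usr/bin/env bash
# nth_url HEADER N (hypothetical helper): print the Nth <...> URL in HEADER.
nth_url() {
  # Field 1 is whatever precedes the first "<", so URL N is field N+1;
  # the second cut keeps only the part before the closing ">".
  printf '%s\n' "$1" | cut -d '<' -f "$(( $2 + 1 ))" | cut -d '>' -f1
}

header='Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;'
nth_url "$header" 2
```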

2 Comments

klind Over a year ago

The next link will not always be in the same spot; as you can see, sometimes prev comes first. It's like I have to find the string 'rel="next"', then go backwards from there to the first > and then the <, and take what is between.

Overcast Over a year ago

Oh yeah, I see that. Perhaps the regex approach Charles Duffy suggested in his comment on your question is best there, since cut and awk as used here depend on a fixed field position. I'm sure you could do it with the right regex, but I'm no regex pro.
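The approach klind describes (find rel=next, then walk back to the < before it) can also be done with plain parameter expansion, without regex or external commands, so it should work in /bin/sh as well as bash. A sketch, assuming exactly one space between >; and rel=, as in both examples; next_link_sh is a hypothetical name.

```bash
#!/bin/sh
# next_link_sh (hypothetical name): POSIX sh only, using ${var%%pattern} and
# ${var##pattern}. Assumes exactly one space between ">;" and "rel=".
next_link_sh() {
  header=$1
  # %% removes the longest matching suffix, i.e. everything from the first
  # occurrence of the rel=next marker onward.
  case $header in
    *'>; rel=next'*)   before=${header%%'>; rel=next'*} ;;
    *'>; rel="next"'*) before=${header%%'>; rel="next"'*} ;;
    *) return 1 ;;     # no next link, e.g. on the last page
  esac
  # "before" now ends with "<URL"; ## strips the longest prefix ending in "<".
  printf '%s\n' "${before##*<}"
}

next_link_sh 'Link: <http://mygithub.com/api/v3/organizations/20/repos?page=1>; rel=prev, <http://mygithub.com/api/v3/organizations/20/repos?page=3>; rel=next, <http://mygithub.com/api/v3/organizations/20/repos?page=4>; rel=last, <http://mygithub.com/api/v3/organizations/20/repos?page=1>;'
next_link_sh 'Link: <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="next", <http://mygithub.com/api/v3/organizations/4/repos?page=2>; rel="last"'
```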
