I'm a beginner at bash scripting. I am trying to delete everything in my PDB file (test.pdb) before REMARK time 20.00, and everything from REMARK time 30.00 to the end of the file. I am using this sed command:

sed 's/^.*\(REMARK time 20.00.*REMARK time 30.00\).*$/\1/' test.pdb > end.pdb

Unfortunately, the above command generates an end.pdb file that is completely empty. Can you help me solve this problem?

My test.pdb file is as follows:

REMARK time   10.00                                       ENERGY     1.95686E+03
HELIX    1 H1  GLU    131  SER    146  1                                      15
HELIX    2 H2  GLU    278  SER    293  1                                      15
HELIX    3 H3  GLU    426  SER    441  1                                      15
HELIX    4 H4  GLU    574  SER    589  1                                      15
SHEET    1 B1  2 ALA    32  VAL    34  0
SHEET    2 B1  2 LYS    48  GLU    50 -1  N  GLU    50   O  ALA    32
SHEET    1 B2  2 ALA    32  LEU    35  0
SHEET    2 B2  2 TRP   123  ASP   126 -1  N  ASP   126   O  ALA    32
SHEET    1 B3  2 LEU    47  PRO    51  0
SHEET    2 B3  2 VAL    62  PHE    66 -1  N  PHE    66   O  LEU    47
SHEET    1 B4  2 ASN    58  ASN    68  0
SHEET    2 B4  2 ASP    71  LEU    81 -1  N  LEU    81   O  ASN    58
SHEET    1 B5  2 TRP    77  ASP    82  0
SHEET    2 B5  2 LYS    85  LEU    90 -1  N  LEU    90   O  TRP    77
SHEET    1 B6  2 ALA    84  LEU    90  0
SHEET    2 B6  2 LEU    93  VAL    99 -1  N  VAL    99   O  ALA    84
SHEET    1 B7  2 ASN    96  VAL    99  0
SHEET    2 B7  2 ASP   111  GLU   114 -1  N  GLU   114   O  ASN    96
SHEET    1 B8  2 HIS   107  VAL   113  0
SHEET    2 B8  2 ASP   119  LEU   125 -1  N  LEU   125   O  HIS   107
SHEET    1 B9  2 ALA   179  VAL   181  0
SHEET    2 B9  2 LYS   195  GLU   197 -1  N  GLU   197   O  ALA   179
SHEET    1 B*  2 ALA   179  LEU   182  0
........
ENDMDL
REMARK time   20.00                                       ENERGY     1.96641E+03
HELIX    1 H1  GLU    131  SER    146  1                                      15
HELIX    2 H2  GLU    278  SER    293  1                                      15
HELIX    3 H3  GLU    426  SER    441  1                                      15
HELIX    4 H4  GLU    574  SER    589  1                                      15
SHEET    1 B1  2 ALA    32  VAL    34  0
SHEET    2 B1  2 LYS    48  GLU    50 -1  N  GLU    50   O  ALA    32
SHEET    1 B2  2 ALA    32  LEU    35  0
SHEET    2 B2  2 TRP   123  ASP   126 -1  N  ASP   126   O  ALA    32
SHEET    1 B3  2 LEU    47  PRO    51  0
SHEET    2 B3  2 VAL    62  PHE    66 -1  N  PHE    66   O  LEU    47
SHEET    1 B4  2 ASN    58  ASN    68  0
SHEET    2 B4  2 ASP    71  LEU    81 -1  N  LEU    81   O  ASN    58
SHEET    1 B5  2 TRP    77  ASP    82  0
SHEET    2 B5  2 LYS    85  LEU    90 -1  N  LEU    90   O  TRP    77
SHEET    1 B6  2 ALA    84  LEU    90  0
SHEET    2 B6  2 LEU    93  VAL    99 -1  N  VAL    99   O  ALA    84
SHEET    1 B7  2 ASN    96  VAL    99  0
SHEET    2 B7  2 ASP   111  GLU   114 -1  N  GLU   114   O  ASN    96
SHEET    1 B8  2 HIS   107  VAL   113  0
SHEET    2 B8  2 ASP   119  LEU   125 -1  N  LEU   125   O  HIS   107
SHEET    1 B9  2 ALA   179  VAL   181  0
SHEET    2 B9  2 LYS   195  GLU   197 -1  N  GLU   197   O  ALA   179
SHEET    1 B*  2 ALA   179  LEU   182  0
SHEET    2 B*  2 TRP   270  ASP   273 -1  N  ASP   273   O  ALA   179
SHEET    1 B*  2 LEU   194  GLY   196  0
SHEET    2 B*  2 MET   211  PHE   213 -1  N  PHE   213   O  LEU   194
SHEET    1 B*  2 ASN   205  ASN   215  0
SHEET    2 B*  2 ASP   218  LEU   228 -1  N  LEU   228   O  ASN   205
SHEET    1 B*  2 TRP   224  ASP   229  0
....................
ENDMDL
REMARK time  100.00                                       ENERGY     1.95395E+03
HELIX    1 H1  GLU    131  SER    146  1                                      15
HELIX    2 H2  GLU    278  SER    293  1                                      15
HELIX    3 H3  GLU    426  SER    441  1                                      15
HELIX    4 H4  GLU    574  SER    589  1                                      15
SHEET    1 B1  2 ALA    32  VAL    34  0
SHEET    2 B1  2 LYS    48  GLU    50 -1  N  GLU    50   O  ALA    32
SHEET    1 B2  2 ALA    32  LEU    35  0
SHEET    2 B2  2 TRP   123  ASP   126 -1  N  ASP   126   O  ALA    32
SHEET    1 B3  2 LEU    47  PRO    51  0
SHEET    2 B3  2 VAL    62  PHE    66 -1  N  PHE    66   O  LEU    47
SHEET    1 B4  2 ASN    58  ASN    68  0
SHEET    2 B4  2 ASP    71  LEU    81 -1  N  LEU    81   O  ASN    58
SHEET    1 B5  2 TRP    77  ASP    82  0
SHEET    2 B5  2 LYS    85  LEU    90 -1  N  LEU    90   O  TRP    77
SHEET    1 B6  2 ALA    84  LEU    90  0
.......

CONECT   1131   1133   1132
CONECT   1133   1135   1134
CONECT   1135   1137   1136
CONECT   1137   1139   1138
CONECT   1139   1141   1140
CONECT   1141   1143   1142
CONECT   1143   1145   1144
CONECT   1145   1146
ENDMDL

In the end, I want to extract only the REMARK time 20.00 block, like this:

REMARK time   20.00                                       ENERGY     1.96641E+03
HELIX    1 H1  GLU    131  SER    146  1                                      15
HELIX    2 H2  GLU    278  SER    293  1                                      15
HELIX    3 H3  GLU    426  SER    441  1                                      15
HELIX    4 H4  GLU    574  SER    589  1                                      15
SHEET    1 B1  2 ALA    32  VAL    34  0
SHEET    2 B1  2 LYS    48  GLU    50 -1  N  GLU    50   O  ALA    32
SHEET    1 B2  2 ALA    32  LEU    35  0
SHEET    2 B2  2 TRP   123  ASP   126 -1  N  ASP   126   O  ALA    32
SHEET    1 B3  2 LEU    47  PRO    51  0
SHEET    2 B3  2 VAL    62  PHE    66 -1  N  PHE    66   O  LEU    47
SHEET    1 B4  2 ASN    58  ASN    68  0
SHEET    2 B4  2 ASP    71  LEU    81 -1  N  LEU    81   O  ASN    58
SHEET    1 B5  2 TRP    77  ASP    82  0
SHEET    2 B5  2 LYS    85  LEU    90 -1  N  LEU    90   O  TRP    77
SHEET    1 B6  2 ALA    84  LEU    90  0
SHEET    2 B6  2 LEU    93  VAL    99 -1  N  VAL    99   O  ALA    84
SHEET    1 B7  2 ASN    96  VAL    99  0
SHEET    2 B7  2 ASP   111  GLU   114 -1  N  GLU   114   O  ASN    96
SHEET    1 B8  2 HIS   107  VAL   113  0
SHEET    2 B8  2 ASP   119  LEU   125 -1  N  LEU   125   O  HIS   107
SHEET    1 B9  2 ALA   179  VAL   181  0
SHEET    2 B9  2 LYS   195  GLU   197 -1  N  GLU   197   O  ALA   179
SHEET    1 B*  2 ALA   179  LEU   182  0
SHEET    2 B*  2 TRP   270  ASP   273 -1  N  ASP   273   O  ALA   179
SHEET    1 B*  2 LEU   194  GLY   196  0
SHEET    2 B*  2 MET   211  PHE   213 -1  N  PHE   213   O  LEU   194
SHEET    1 B*  2 ASN   205  ASN   215  0
SHEET    2 B*  2 ASP   218  LEU   228 -1  N  LEU   228   O  ASN   205
SHEET    1 B*  2 TRP   224  ASP   229  0
SHEET    2 B*  2 LYS   232  LEU   237 -1  N  LEU   237   O  TRP   224
......
CONECT   1141   1143   1142
CONECT   1143   1145   1144
CONECT   1145   1146
ENDMDL
  • sed loads in pattern space only one input line at a time. So without saving it to the hold space, you'll never have the entire input to match against. Commented Aug 17, 2022 at 8:51
  • By the way, while this question is 100% on topic and welcome here, you might also be interested in our sister site: Bioinformatics. Commented Aug 17, 2022 at 10:55

2 Answers


The command you use edits text within a single line, and sed reads only one line at a time, so your pattern never sees both REMARK lines together. To extract a range of lines, use range addressing, /start_pattern/,/end_pattern/:

sed -n '/REMARK time *20\.00/,/ENDMDL/p' test.pdb

The -n option suppresses the default output, and the p command then prints everything from REMARK time *20.00 through the next ENDMDL. This seems to be what you want, rather than including the next REMARK time line. Address ranges are not greedy: the range ends at the first line that matches the end pattern, so it stops at the first ENDMDL after the start.

If your REMARK lines use tabs instead of spaces, you can use /REMARK time[[:space:]]*20\.00/. Note also that the . must be escaped as \. because an unescaped . matches any character.
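To see the range address at work without the full trajectory, here is a minimal sketch on a made-up, shortened stand-in for test.pdb (the three-line frames and the variable names are illustrative assumptions, not the real file). The second command is an optional variant that quits at ENDMDL, so sed does not read the rest of a large file.

```shell
# Hypothetical miniature of test.pdb: three frames, each closed by ENDMDL.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
REMARK time   10.00
HELIX    1 H1
ENDMDL
REMARK time   20.00
HELIX    2 H2
ENDMDL
REMARK time  100.00
HELIX    3 H3
ENDMDL
EOF

# Print from the "REMARK time 20.00" line through the first ENDMDL after it.
block=$(sed -n '/REMARK time *20\.00/,/ENDMDL/p' "$tmp")
printf '%s\n' "$block"

# Same range, but quit once ENDMDL has been printed (stops reading early).
block_q=$(sed -n '/REMARK time *20\.00/,/ENDMDL/{p;/ENDMDL/q;}' "$tmp")

rm -f "$tmp"
```

On the real file, redirect the output as in the question, for example by appending > end.pdb to the sed command.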

  • Thank you for your help! It works! Commented Aug 17, 2022 at 9:25
  • @skywalker If one of the answers here solved your issue, please take a moment and accept it by clicking on the checkmark on the left. That is the best way to express your thanks on the Stack Exchange sites. Commented Aug 17, 2022 at 9:36

Philippos already gave you a fine sed answer, but here's a different approach. You can use awk and set the input and output record separators (RS and ORS, respectively) to ENDMDL followed by a newline. RS is normally a \n character and is what defines a "line"; ORS is what is appended to every print call. That way, each block of lines ending in ENDMDL is treated as a single "line", so you can then tell awk to print any such "line" that contains the pattern you are looking for:

awk 'BEGIN{ RS=ORS="ENDMDL\n"}/REMARK\s*time\s*20\.00/' foo.pdb 

Running it on your file gives:

$ awk 'BEGIN{ RS=ORS="ENDMDL\n"}/REMARK\s*time\s*20\.00/' foo.pdb 
REMARK time   20.00                                       ENERGY     1.96641E+03
HELIX    1 H1  GLU    131  SER    146  1                                      15
HELIX    2 H2  GLU    278  SER    293  1                                      15
HELIX    3 H3  GLU    426  SER    441  1                                      15
HELIX    4 H4  GLU    574  SER    589  1                                      15
SHEET    1 B1  2 ALA    32  VAL    34  0
SHEET    2 B1  2 LYS    48  GLU    50 -1  N  GLU    50   O  ALA    32
SHEET    1 B2  2 ALA    32  LEU    35  0
SHEET    2 B2  2 TRP   123  ASP   126 -1  N  ASP   126   O  ALA    32
SHEET    1 B3  2 LEU    47  PRO    51  0
SHEET    2 B3  2 VAL    62  PHE    66 -1  N  PHE    66   O  LEU    47
SHEET    1 B4  2 ASN    58  ASN    68  0
SHEET    2 B4  2 ASP    71  LEU    81 -1  N  LEU    81   O  ASN    58
SHEET    1 B5  2 TRP    77  ASP    82  0
SHEET    2 B5  2 LYS    85  LEU    90 -1  N  LEU    90   O  TRP    77
SHEET    1 B6  2 ALA    84  LEU    90  0
SHEET    2 B6  2 LEU    93  VAL    99 -1  N  VAL    99   O  ALA    84
SHEET    1 B7  2 ASN    96  VAL    99  0
SHEET    2 B7  2 ASP   111  GLU   114 -1  N  GLU   114   O  ASN    96
SHEET    1 B8  2 HIS   107  VAL   113  0
SHEET    2 B8  2 ASP   119  LEU   125 -1  N  LEU   125   O  HIS   107
SHEET    1 B9  2 ALA   179  VAL   181  0
SHEET    2 B9  2 LYS   195  GLU   197 -1  N  GLU   197   O  ALA   179
SHEET    1 B*  2 ALA   179  LEU   182  0
SHEET    2 B*  2 TRP   270  ASP   273 -1  N  ASP   273   O  ALA   179
SHEET    1 B*  2 LEU   194  GLY   196  0
SHEET    2 B*  2 MET   211  PHE   213 -1  N  PHE   213   O  LEU   194
SHEET    1 B*  2 ASN   205  ASN   215  0
SHEET    2 B*  2 ASP   218  LEU   228 -1  N  LEU   228   O  ASN   205
SHEET    1 B*  2 TRP   224  ASP   229  0
....................
ENDMDL

Note that \s is a GNU awk extension, so some awk versions might not support it. In that case, you could try this instead:

awk 'BEGIN{ RS=ORS="ENDMDL\n"} $1=="REMARK" && $2=="time" && $3=="20.00"' foo.pdb 
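Both commands above rely on a multi-character RS, which not every awk implementation handles the same way. As a hedged alternative that is not part of the original answer, the sketch below keeps the default line-by-line records and uses a flag instead. The sample data and the variable names p and frame are illustrative assumptions.

```shell
# Hypothetical miniature of the PDB file: three frames, each closed by ENDMDL.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
REMARK time   10.00
HELIX    1 H1
ENDMDL
REMARK time   20.00
HELIX    2 H2
ENDMDL
REMARK time  100.00
HELIX    3 H3
ENDMDL
EOF

# On every "REMARK time" line, set the flag p to 1 only if field 3 is 20.00.
# The trailing bare "p" pattern prints each line while the flag is set.
frame=$(awk '$1 == "REMARK" && $2 == "time" { p = ($3 == "20.00") } p' "$tmp")
printf '%s\n' "$frame"

rm -f "$tmp"
```

Because the flag is reset on the next REMARK time line, the output ends with the ENDMDL that closes the 20.00 frame.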
