I (think I) am quite experienced in Perl, but I have a nasty problem I'm trying to solve. I have to match a string (whose format I cannot change, since it comes out of a piece of bioinformatics software) in this format:
[\+\-][0-9]+[ACGTacgt]+
This would be easy, except that the number of repeats of [ACGTacgt] is not simply "1 or more" but exactly the number given by [0-9]+,
so it can be
[...whatever...]+2ac[...whatever...]
+4acta
+3atg
etc..
Now, to test whether the regex works, I'm just playing with a substitution, and I tried the following:
$mystring =~ s/[\+\-]([0-9]+)[ACGTacgt]{\1}//g
Unfortunately the line above does not work, and I get an error complaining about unescaped braces. Indeed, if I put a literal number in place of \1, it works:
$mystring =~ s/[\+\-]([0-9]+)[ACGTacgt]{1}//g
I need it to work because the format might contain sequences like ac.,.+2caaa..a.c, from which I have to pick out exactly the +2ca, keeping it separate from the rest.
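To make the goal concrete, here is a minimal sketch of the intended behaviour using a code-evaluated replacement (the /e modifier) instead of a quantifier built from \1. The helper name strip_indels and the choice to leave malformed tokens untouched are assumptions made for illustration only; they are not part of the software's output format.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every token of the form +N or -N followed by
# exactly N bases, and returns the cleaned string plus the removed tokens.
sub strip_indels {
    my ($text) = @_;
    my @tokens;
    $text =~ s{([+-])([0-9]+)([ACGTacgt]+)}{
        my ($sign, $n, $bases) = ($1, $2, $3);
        if ($n > 0 && length($bases) >= $n) {
            # Record sign, count and only the first N bases.
            push @tokens, $sign . $n . substr($bases, 0, $n);
            # Put back whatever bases follow the declared count.
            substr($bases, $n);
        } else {
            # Assumption: a count that does not fit is left as it was.
            "$sign$n$bases";
        }
    }ge;
    return ($text, @tokens);
}

my ($clean, @found) = strip_indels('ac.,.+2caaa..a.c');
print "$clean\n";    # ac.,.aa..a.c
print "@found\n";    # +2ca
```

The pattern grabs the whole run of bases greedily, and the replacement code then decides how many of them belong to the token, so the count never has to appear inside a quantifier.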
Is it possible in one step, or is there a logical reason I'm missing right now for why it's not possible?
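One candidate for a single-step form is Perl's postponed regular subexpression, (??{ ... }), described in perlre: it runs code at match time and uses the returned string as a sub-pattern. The sketch below assumes that $1 is already set when the block runs, so the quantifier can be built from the captured count. It has only been checked against the sample string from this post, not against real output.

```perl
use strict;
use warnings;

my $mystring = 'ac.,.+2caaa..a.c';

# (??{ ... }) is evaluated during matching, after ([0-9]+) has captured
# the count into $1, so here it yields the sub-pattern [ACGTacgt]{2}.
(my $stripped = $mystring) =~ s/[+-]([0-9]+)(??{ "[ACGTacgt]{$1}" })//g;

print "$stripped\n";    # ac.,.aa..a.c
```

The code block is written literally inside the pattern here. If the pattern were instead assembled from variables at run time, perlre notes that "use re 'eval'" would be needed.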
Thanks for any help or suggestions!
berutti