I have the following lines in a file. Here is an example of one line:
NM_???? chr12 - 10 110 10 110 3 10,50,100, 20,60,110,
I have the following code to get the info out:
with open(infile, 'r') as fp:
    for line in fp:
        tokens = line.split()
        # drop the trailing comma, then split into individual coordinates
        exonstarts = tokens[8][:-1].split(',')
        exonends = tokens[9][:-1].split(',')
This gives me two lists of strings like these:
exonstarts = ['10', '50', '100']
exonends = ['20', '60', '110']
This line has 3 exons, although other lines in the file may have more or fewer than 3, so this must work for any number of exons. They go from:
10-20
50-60
100-110
So for each number in the start list there is one in the end list. This means that the first exon starts at exonstarts[0] and ends at exonends[0], the second starts at exonstarts[1] and ends at exonends[1], and so on.
How do I write the rest of this code so it pairs up the elements as such?
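A minimal sketch of one way to do the pairing, assuming the coordinates should end up as integers. `zip` walks both lists in step, so it does not care how many exons a line has. The helper name `parse_exons` is hypothetical and used only for illustration.

```python
def parse_exons(line):
    """Return a list of (start, end) integer pairs for one line of the file."""
    tokens = line.split()
    # Columns 8 and 9 hold comma-terminated lists such as "10,50,100,";
    # stripping the trailing comma avoids an empty final element.
    starts = tokens[8].rstrip(',').split(',')
    ends = tokens[9].rstrip(',').split(',')
    # zip pairs the i-th start with the i-th end
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


example = "NM_???? chr12 - 10 110 10 110 3 10,50,100, 20,60,110,"
pairs = parse_exons(example)
for start, end in pairs:
    print(f"{start}-{end}")
```

Note that `zip` stops at the shorter list, so a malformed line with mismatched counts would silently lose its extra coordinates.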
Update:
From this:
tokens = line.split()
exonstarts = tokens[8][:-1].split(',')
exonends = tokens[9][:-1].split(',')
zipped = list(zip(exonstarts, exonends))
I have another problem: I have a string that I want these pieces of. For example, I would want chr_string[10:20] + chr_string[50:60] + chr_string[100:110]. Is there an easy way to express this for any number of exons?
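One possible way to express this is to slice once per pair and join the results. This is only a sketch: it assumes the pairs have already been converted to integers (the values produced by `split(',')` are strings and cannot be used as slice indices directly), and that the file's coordinates match Python's 0-based, end-exclusive slicing. The name `join_exons` and the sample `chr_string` are made up for illustration.

```python
def join_exons(chr_string, pairs):
    """Concatenate chr_string[start:end] for every (start, end) pair."""
    return ''.join(chr_string[start:end] for start, end in pairs)


# Hypothetical stand-in for a real chromosome sequence (120 characters)
chr_string = ''.join(str(i % 10) for i in range(120))
pairs = [(10, 20), (50, 60), (100, 110)]

spliced = join_exons(chr_string, pairs)
print(len(spliced))
```

Using `''.join(...)` rather than repeated `+` keeps the code the same for any number of exons and avoids building many intermediate strings.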