I have a file with strings like the following:

NM_???? chr12 - 10 110 10 110 3 10,50,100, 20,60,110,

I am interested in the last two columns: the first is a comma-separated list of exon starts and the last is a comma-separated list of exon ends.

That said, I have done the following:

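with open(infile) as fp:
    for line in fp:
        tokens = line.split()
        # Drop the trailing comma, then convert each coordinate to int
        exonstarts = [int(x) for x in tokens[8].rstrip(',').split(',')]
        exonends = [int(x) for x in tokens[9].rstrip(',').split(',')]
        zipped = list(zip(exonstarts, exonends))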

Now I have a list that looks like this:

[(10, 20), (50, 60), (100, 110)]

I have another problem: I have a string that I want these pieces of. For example, I would want chr_string[10:20]+chr_string[50:60]+chr_string[100:110]. Is there an easy way to express this?

  • Do you want [10:20] or [10:21]? The stop index on a slice is non-inclusive. Commented Apr 28, 2012 at 0:43
  • You are correct, I would want [10:21]. Commented Apr 28, 2012 at 0:44
  • possible duplicate of Quick basic loop (python) Commented Apr 28, 2012 at 0:56

3 Answers

I think the most Pythonic way to say that is:

''.join(chr_string[a[0]:a[1]] for a in myList)
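
As a minimal sketch of how this handles any number of pairs, the snippet below wraps the same expression in a helper. The name `join_slices` and the sample `chr_string` are made up for illustration, and the pairs are assumed to be integers already.

```python
# Hypothetical 120-character sequence: the digits 0-9 repeated.
chr_string = "".join(str(i % 10) for i in range(120))
my_list = [(10, 20), (50, 60), (100, 110)]


def join_slices(sequence, intervals):
    """Concatenate sequence[start:end] for every (start, end) pair."""
    return "".join(sequence[start:end] for start, end in intervals)


# The generator walks the whole list, however long it is.
spliced = join_slices(chr_string, my_list)
print(spliced)
```

Each tuple holds exactly two values (start and end), while the list itself can have any length, including zero.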

6 Comments

Right, but I will not know how many elements are in the list beforehand, and your line of code seems to allow for only two elements. Is there a way to say this?
@PatrickCampbell Ken's list will account for each element in the list. It would not make sense to have more than two elements in each tuple.
This is nicer: ''.join(chr_string[start:end] for start, end in myList)
a[0] and a[1] are the start and end values for the characters you want out of chr_string. myList would be the list containing those tuples of values (like [(10, 20), (50, 60), (100, 110)]) and can be arbitrarily long.
Thanks, that's awesome. Helpful as always!
"".join(chr_string[slice(*exon_interval)] for exon_interval in zipped)
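
A small sketch of what `slice(*exon_interval)` does: it unpacks each (start, end) pair into a slice object, which indexes the string exactly as `chr_string[start:end]` would. The sample string and intervals are invented for illustration. The pairs must hold integers; string coordinates taken straight from `split(',')` would raise a `TypeError` here.

```python
# Hypothetical data: the alphabet makes the selected pieces easy to see.
chr_string = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
zipped = [(0, 3), (10, 13), (23, 26)]

# slice(*pair) builds slice(start, end) from each tuple.
exon_slices = [slice(*pair) for pair in zipped]
via_slice_objects = "".join(chr_string[s] for s in exon_slices)

# Equivalent form using tuple unpacking in the loop.
via_unpacking = "".join(chr_string[start:end] for start, end in zipped)

print(via_slice_objects)
```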

7 Comments

Joel, you have helped me with this so much over the past week. I have one more question for you and then this program should be fully functional. It follows on from the previous question you helped me answer. I need to subtract each exonend from the length of the chromosome to get the newstart, and then subtract each exonstart from the length of the chromosome to get the newend. I need to do this for each element, like above. Is there a way to do this?
This then involves the dictionary and the line of code where I said: ''.join(bc[base.upper()] for base in chr_string[newstart:newend])
@PatrickCampbell: can you show me an example? preferably under 20 characters long.
@PatrickCampbell: This sounds like yet another new question. Please read tinyurl.com/so-hints for some hint on getting the answer you are looking for first time!
Umm... this has to do with the reverse complement thing I was doing in a previous question. I want to subtract each of the exonends from the length of my string (this gives a position I would call newstart), and then subtract each exonstart from the length of the string to get the position I would call newend. I want the same thing here, going from newstart:newend for each pair and adding the pieces together. So it is the same type of thing as above, but with different positions on the string. I just need to do it in a similar way, but can't come up with something easy.
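
One possible reading of the coordinate flip described above, as a sketch only: it assumes half-open (start, end) pairs, a complement dictionary `bc` like the one mentioned in the thread (its contents here are a guess), and that the new positions index into the reverse-complemented string. The helper names are hypothetical.

```python
# Assumed complement table; the thread names the dictionary `bc` but
# does not show its contents.
bc = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Complement every base and reverse the order."""
    return "".join(bc[base.upper()] for base in reversed(seq))


def flip_intervals(intervals, length):
    """Map (start, end) to (length - end, length - start)."""
    return [(length - end, length - start) for start, end in intervals]


chr_string = "AACCGGTTACGTNACGTTGCA"  # made-up sequence
exons = [(2, 5), (8, 12), (15, 19)]

rc = reverse_complement(chr_string)
flipped = flip_intervals(exons, len(chr_string))

# Each flipped slice of rc is the reverse complement of the original exon.
pieces = [rc[newstart:newend] for newstart, newend in flipped]
joined = "".join(pieces)
```

Note that the flipped intervals come out in descending order along the reversed string; reversing the list before joining may be what is wanted, depending on the strand convention.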

To get a list by slicing chr_string (which I have fabricated) using these pairs:

>>> [chr_string[start:end + 1] for start,end in zip(exonstarts, exonends)]
['05060708091', '25262728293', '50515253545']

To join these together:

>>> ''.join(chr_string[start:end + 1] for start,end in zip(exonstarts, exonends))
'050607080912526272829350515253545'
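
For a self-contained version, the sketch below parses the sample line from the question, converts the coordinates to integers, and uses `end + 1` so the end position is included. The `chr_string` here is a stand-in (digits repeated), not the fabricated string used for the output above.

```python
line = "NM_???? chr12 - 10 110 10 110 3 10,50,100, 20,60,110,"
tokens = line.split()

# Strip the trailing comma before splitting, then convert to int.
exonstarts = [int(x) for x in tokens[8].rstrip(",").split(",")]
exonends = [int(x) for x in tokens[9].rstrip(",").split(",")]

# Stand-in sequence, 120 characters long.
chr_string = "".join(str(i % 10) for i in range(120))

# end + 1 makes the end coordinate inclusive.
spliced = "".join(
    chr_string[start:end + 1] for start, end in zip(exonstarts, exonends)
)
print(spliced)
```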
