Pulling parts from a string (python)

Question

I have a file with strings like the following:

NM_???? chr12 - 10 110 10 110 3 10,50,100, 20,60,110,

I am interested in the last two columns: the first is a comma-separated list of exon starts and the last is a comma-separated list of exon ends.

So far, I have done the following:

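with open(infile, 'r') as fp:
    for line in fp:
        tokens = line.split()
        # drop the trailing comma, then convert each coordinate to int
        exonstarts = [int(x) for x in tokens[8].rstrip(',').split(',')]
        exonends = [int(x) for x in tokens[9].rstrip(',').split(',')]
        zipped = list(zip(exonstarts, exonends))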
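A minimal sketch of the same parsing step as a reusable function, assuming whitespace-separated columns laid out like the sample line above (exon starts in the 9th field, exon ends in the 10th, each possibly ending with a comma). The name `parse_exons` is hypothetical. The `int` conversion matters because `str.split` returns strings, and slice indices have to be integers.

```python
def parse_exons(line):
    """Return a list of (start, end) integer pairs from one record line.

    Assumes exon starts and ends are the 9th and 10th whitespace-separated
    fields, each a comma-separated list that may end with a trailing comma.
    """
    tokens = line.split()
    # rstrip(',') removes the trailing comma so split() yields no empty string
    starts = [int(x) for x in tokens[8].rstrip(',').split(',')]
    ends = [int(x) for x in tokens[9].rstrip(',').split(',')]
    return list(zip(starts, ends))


line = "NM_???? chr12 - 10 110 10 110 3 10,50,100, 20,60,110,"
print(parse_exons(line))  # [(10, 20), (50, 60), (100, 110)]
```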

Now I have a list that looks like this:

[(10, 20), (50, 60), (100, 110)]

I have another problem: I have a string, chr_string, from which I want these pieces. For example, I would want chr_string[10:20] + chr_string[50:60] + chr_string[100:110]. Is there an easy way to express this?

Do you want [10:20] or [10:21]? The stop index on a slice is non-inclusive. – Joel Cornett, Commented Apr 28, 2012 at 0:43
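To illustrate the comment: Python slices are half-open, so s[start:stop] includes index start but stops just before stop. A quick check with a made-up string:

```python
s = "abcdefghij"  # made-up example string; 'a' is index 0

# s[2:5] takes indices 2, 3 and 4; index 5 is excluded
print(s[2:5])      # cde
# to include index 5 as well, the stop has to be one past it (end + 1)
print(s[2:5 + 1])  # cdef
# the length of a half-open slice is simply stop - start
print(len(s[2:5])) # 3
```

So if the exon ends in the file are already exclusive (one past the last base), [10:20] is what you want; if they are inclusive, you need end + 1.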

Ken · Accepted Answer · 2012-04-28 00:42:07Z

4

I think the most Pythonic way to say that is:

''.join(chr_string[a[0]:a[1]] for a in myList)
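A runnable version of the one-liner above. The values of chr_string and myList here are made up for illustration; in the question they come from the chromosome sequence and the parsed exon pairs. The generator produces one slice per tuple, so the list can be any length.

```python
# made-up stand-in for the chromosome sequence (120 characters)
chr_string = "ACGT" * 30
myList = [(10, 20), (50, 60), (100, 110)]

# one slice per (start, end) tuple, concatenated in list order
spliced = ''.join(chr_string[a[0]:a[1]] for a in myList)
print(len(spliced))  # 30
```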

answered Apr 28, 2012 at 0:42


6 Comments

Peter Hanson Over a year ago

Right, but I will not know beforehand how many elements are in the list, and your line of code only allows for two elements. Is there a way to say this?

garnertb Over a year ago

@PatrickCampbell Ken's code will account for each element in the list. It would not make sense to have more than two elements in each tuple.

Gary Kerr Over a year ago

This is nicer: ''.join(chr_string[start:end] for start, end in myList)

Ken Over a year ago

a[0] and a[1] are the start and end values for the characters you want out of chr_string. myList would be the list containing those tuples of values (like [(10, 20), (50, 60), (100, 110)]) and can be arbitrarily long.

Peter Hanson Over a year ago

Thanks, that's awesome. Helpful as always!


Joel Cornett · Accepted Answer · 2012-04-28 00:47:44Z

2

"".join(chr_string[slice(*exon_interval)] for exon_interval in zipped)
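This answer relies on the built-in slice type: slice(*(10, 20)) unpacks the tuple into slice(10, 20), and indexing with that object is equivalent to writing [10:20]. A small check with a made-up sequence whose characters are their own index mod 10:

```python
# made-up sequence: "0123456789" repeated, 120 characters in total
chr_string = "".join(str(i % 10) for i in range(120))
zipped = [(10, 20), (50, 60), (100, 110)]

# slice(*zipped[0]) builds slice(10, 20), the same as chr_string[10:20]
first = chr_string[slice(*zipped[0])]

result = "".join(chr_string[slice(*exon_interval)] for exon_interval in zipped)
print(result)  # 012345678901234567890123456789
```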

answered Apr 28, 2012 at 0:47


7 Comments

Peter Hanson Over a year ago

Joel, you have helped me so much with this over the past week. I have one more question for you, and then this program should be fully functional. It goes along with the previous question you helped me answer: I need to subtract each exonend from the length of the chromosome to get the new start, and then subtract each exonstart from the length of the chromosome to get the new end. Again, I need to do this for each element, as above. Is there a way to do this?

Peter Hanson Over a year ago

This then involves the dictionary and the line of code where I wrote: ''.join(bc[base.upper()] for base in chr_string[newstart:newend])

Joel Cornett Over a year ago

@PatrickCampbell: Can you show me an example, preferably under 20 characters long?

johnsyweb Over a year ago

@PatrickCampbell: This sounds like yet another new question. Please read tinyurl.com/so-hints for some hints on getting the answer you are looking for the first time!

Peter Hanson Over a year ago

Umm... this goes with the reverse-complement work I was doing in a previous question. I want to subtract each of the exonends from the length of my string (this gives me a position I would call newstart), and then subtract each exonstart from the length of the string to get the position I would call newend. I want the same thing as here: take newstart:newend for each pair of numbers and add the pieces together. So it is the same type of thing as above, but with different positions on the string. I just need to do it in a similar way, but can't come up with something easy.
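A sketch of the coordinate flip described in these comments, under assumptions the thread does not state: coordinates are 0-based and half-open (like Python slices), rc_string is the reverse complement of the whole chromosome, and bc is a complement dictionary whose contents here are a guess (the asker's real bc may differ). With chromosome length L, the forward interval [start, end) lands at [L - end, L - start) on the reverse-complement string, which is the newstart/newend rule described above.

```python
# hypothetical complement table; the asker's real `bc` may differ
bc = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}


def reverse_complement(seq):
    """Reverse the sequence and complement each base (case-insensitive)."""
    return ''.join(bc[base.upper()] for base in reversed(seq))


chr_string = "ACGTTGCAAGGCTTACCGATN" * 5  # made-up chromosome, 105 bases
L = len(chr_string)
zipped = [(10, 20), (50, 60)]  # made-up forward-strand exons

rc_string = reverse_complement(chr_string)

# newstart = L - exonend, newend = L - exonstart, for each exon
flipped = [(L - end, L - start) for start, end in zipped]

# reversed(flipped) puts the intervals in ascending order on rc_string
rc_exons = ''.join(rc_string[newstart:newend] for newstart, newend in reversed(flipped))

# same result as splicing on the forward strand, then reverse-complementing
print(rc_exons == reverse_complement(''.join(chr_string[s:e] for s, e in zipped)))  # True
```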


johnsyweb · Accepted Answer · 2012-04-28 01:00:15Z

1

To get a list by slicing chr_string (which I have fabricated) using these pairs:

>>> [chr_string[start:end + 1] for start,end in zip(exonstarts, exonends)]
['05060708091', '25262728293', '50515253545']

To join these together:

>>> ''.join(chr_string[start:end + 1] for start,end in zip(exonstarts, exonends))
'050607080912526272829350515253545'

answered Apr 28, 2012 at 1:00
