0

I have a text file in the following format:

AAAAATTTTTT
AAATTTTTTGGG
TTTDDDCCVVVVV

For each line, I am trying to count how many times the first character repeats consecutively at the start of the line, and how many times the last character repeats consecutively at the end.

I have written the following function:

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    startCount = 0
    endCount = 0

    for char in sequence:
        if char == start:
            startCount += 1
            if ( char != start):
                break

    for char in reversed(sequence):
        if char == end:
            endCount += 1
            if ( char != end):
                break

    return startCount, endCount

The function works when I call it on a single string. For example:

seq = "TTTDDDCCVVVVV"
a,b = getStartEnd(seq)
print a,b

But when I call it inside a for loop over the file, it gives the correct values only for the last line of the file.

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(str(line))
    print a, b

2 Answers

3

Because every line except the last ends with a newline, sequence[-1] is '\n' instead of the last letter.

Try the following, which strips trailing whitespace (including the newline):

with open("Test.txt", 'r') as f:
    for line in f:
        a, b = getStartEnd(line.rstrip())
        print a, b
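
To see why only the last line worked, it helps to look at what the loop actually passes to the function. This is a minimal sketch in Python 3 (print is a function there). io.StringIO stands in for Test.txt so the example runs without a file on disk, and last_char is a hypothetical helper, not part of the original code:

```python
import io

# Stand-in for Test.txt. The last line has no trailing newline, which is
# an assumption that matches the behaviour described in the question.
fake_file = io.StringIO("AAAAATTTTTT\nAAATTTTTTGGG\nTTTDDDCCVVVVV")

def last_char(line):
    """Return the character that getStartEnd would pick as `end`."""
    return line[-1]

raw_ends = []
stripped_ends = []
for line in fake_file:
    raw_ends.append(last_char(line))
    stripped_ends.append(last_char(line.rstrip()))

print(raw_ends)       # ['\n', '\n', 'V']
print(stripped_ends)  # ['T', 'G', 'V']
```

Without rstrip, the end character of the first two lines is the newline itself, so the trailing count has nothing to do with the letters.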

BTW, ( char != end ) in the following code is always False (the same goes for ( char != start )), so the break never runs:

for char in reversed(sequence):
    if char == end:
        endCount += 1
        if ( char != end): # always False because char == end
            break

Did you mean this?

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break
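
The difference only shows up when the first or last character also appears somewhere in the middle, for example in AAAATTAGGAGGG. A minimal sketch in Python 3; count_all and count_run are hypothetical names that just label the two loop shapes:

```python
def count_all(sequence):
    """Loop shape from the question: the inner break is unreachable."""
    end = sequence[-1]
    count = 0
    for char in reversed(sequence):
        if char == end:
            count += 1
            if char != end:  # never true here, so the loop never stops early
                break
    return count

def count_run(sequence):
    """Loop shape with else/break: stops at the first different character."""
    end = sequence[-1]
    count = 0
    for char in reversed(sequence):
        if char == end:
            count += 1
        else:
            break
    return count

seq = "AAAATTAGGAGGG"
print(count_all(seq))  # 5, every G in the string
print(count_run(seq))  # 3, only the trailing run
```

On the three sample lines both versions agree, because each letter there appears in a single run. That is why the bug is easy to miss.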

How about using itertools.takewhile:

import itertools

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count
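
A quick check of the takewhile version, as a sketch in Python 3. One assumption goes beyond the code above: a blank line would make sequence[0] raise IndexError, so the hypothetical wrapper safe_start_end returns (0, 0) for empty input.

```python
import itertools

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    # takewhile stops at the first character that differs, so only the run is counted
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count

def safe_start_end(sequence):
    """Hypothetical wrapper: treat an empty string as having no runs."""
    if not sequence:
        return 0, 0
    return getStartEnd(sequence)

print(safe_start_end("AAAATTAGGAGGG"))  # (4, 3)
print(safe_start_end("TTTDDDCCVVVVV"))  # (3, 5)
print(safe_start_end(""))               # (0, 0)
```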

2 Comments

Thank you very much. How about my code? Is it an elegant way to do this?
I want to count the run of the first character in the sequence. For example, in AAAATTAGGAGGG the starting letter A occurs 4 times in a row and the ending letter G occurs 3 times in a row.
1

Three things. First, in your function, you probably meant to break out of each loop using the following structure:

for char in sequence:
    if char == start:
        startCount += 1
    else:
        break

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break

Second, when you are looping through the lines in your file, you don't need to convert the lines to strings with the str function. They already are strings!

Third, each line includes its trailing newline character, '\n', which marks where one line ends and the next begins. To get rid of it, you can use the string method rstrip as follows:

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(line.rstrip())
    print a, b
file.close()

1 Comment

Fourth, use a with statement so the file is closed automatically.
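
A sketch of what that could look like, in Python 3. The with block closes the file even if the loop body raises, so the explicit file.close() is no longer needed. The temporary file stands in for Test.txt, and count_runs is a hypothetical name for the corrected function:

```python
import os
import tempfile

def count_runs(sequence):
    """Length of the leading and trailing runs (corrected else/break loops)."""
    start_count = 0
    for char in sequence:
        if char == sequence[0]:
            start_count += 1
        else:
            break
    end_count = 0
    for char in reversed(sequence):
        if char == sequence[-1]:
            end_count += 1
        else:
            break
    return start_count, end_count

# Write a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AAAAATTTTTT\nAAATTTTTTGGG\nTTTDDDCCVVVVV\n")
    path = tmp.name

results = []
with open(path) as f:  # closed automatically when the block exits
    for line in f:
        results.append(count_runs(line.rstrip()))

os.remove(path)  # clean up the throwaway file
print(results)  # [(5, 6), (3, 3), (3, 5)]
```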
