1

I am trying to find substring of very large string and getting memory error:

The code:

def substr(string):
    le = []
    st = list(string)
    for s in xrange(len(string)+1):
        for s1 in xrange(len(string)+1):
            le.append(''.join(st[s:s1]))

    cou = Counter(le)
    cou_val = cou.keys()
    cou_val.remove('')
    return le, cou_val

I am getting error as ile "solution.py", line 31, in substr le.append(''.join(st[s:s1])) MemoryError

How to tackle this problem?

1

3 Answers 3

1

Answer

I noticed that your code prints all the possible substrings of string in a certain order. I suggest that instead of storing all of them in an array, you use code to return just the substring that you want. I tested the subroutine below with 'a very long string' and it always returns the same value as if you were to get an indexed value from an array.

    string = 'a very long string'
    def substr2(string,i):
        return string[i//(len(string)+1):i%(len(string)+1)]

    print(substr2(string,10))

Solution

The way you order the arguments for your for loops (s,s1) work similarly to a number system. s1 increments by 1 until it gets to a given value, then it resets to 0 and s increments by 1, repeating the cycle. This is seen in a decimal system (e.g. 01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16 etc.)

The i//n div operator returns the integer value of i/n. (e.g. 14//10=1). The i%n mod operator returns the remainder value of i/n. (e.g. 14%10 is 4).

So if we were to, for example, increment i by 1 and define (s,s1) as [i//10,i%10], we would get:

[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[0,6],[0,7],[0,8],[0,9],[1,0],[1,1],[1,2] etc.

We can utilize these to produce the same answer as in your array.

PS. My first answer. Hope this helps!

Sign up to request clarification or add additional context in comments.

1 Comment

+1: Nice idea and nice answer :). You can also write the above as a generator by looping in substr2 and yeald instead of return.
0

It seems that you are running out of memory. When the string is too large the code you posted seems to be copying it over and over again into the le list. As @Rikka's link suggests, buffer/memoryview may be of use for you but I have never used it.

As a workaround to your solution/code I would suggest that instead of storing each substring in le, store the indexes as a tuple. Additionally, I don't think that st list is required (not sure tho if your way speeds it up) so the result would be (code not tested):

def substr(string):
    le = []

    for s in xrange(len(string)+1):
        for s1 in xrange(len(string)+1):
            # Skip empty strings
            if s!=s1:
                le.append((s, s1))

    cou = Counter(le)
    cou_val = cou.keys()
    cou_val.remove('')
    return le, cou_val

Now, an example of how you can use the substr is (code not tested):

myString = "very long string here"
matchString = "here"
matchPos = False
indexes, count = substr(myString)

# Get all the substrings without storing them simultaneously in memory
for i in indexes:
    # construct substring and compare
    if myString[i[0],i[1]]==matchString:
        matchPos = i
        break

After the above you have start and end positions of the 1st occurrence of "here" into your large string. I am not sure what you try to achieve but this can easily be modified to find all occurrences, count matches, etc - I just post it as example. I am also not sure why the Counter is there...

This approach should not present the memory error, however, it is a trade-off between memory and CPU and I expect it to be bit slower on runtime since every time you use indexes you have to re-construct every substring.

Hope it helps

Comments

-1

The solution: The error in memory is always caused by out of range.And the slice technique also has some rules. When the step is positive, just like 1, the first index must be greater than the second.And on the contrary, when negative, such as -1, the number of the index is shorter than the second, but it is actually the greater one.(-1 > -2) So in your program, the index s is greater than s1 when step is one, so you access a place you have not applied for it.And you know, that is MemoryError!!!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.