4

I've got the entire contents of a text file (at least a few KB) in a string, myStr.

Will the following code create a copy of the string (less the first character) in memory?

myStr = myStr[1:]

I'm hoping it just refers to a different location in the same internal buffer. If not, is there a more efficient way to do this?

Thanks!

Note: I'm using Python 2.5.

3
  • @Glenn: Thanks for the edit. I always forget to proof-read the title! Commented Mar 16, 2010 at 19:41
  • @Mike: Hah, I guess I'm over-optimizing. The files loaded could potentially be large (in theory) -- but currently the largest is 8KB :-) Commented Mar 16, 2010 at 22:44
  • A few KB is tiny, but if you're doing an algorithm like [s[0:n] for n in range(0, len(s))], you'll end up with O(n^2), where in-place slicing would give you O(n). You can always code around it, obviously; it's just extra work. Commented Mar 19, 2010 at 1:13
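To make the quadratic cost in the last comment concrete, here is a minimal sketch that counts how many characters the prefix slices hold in total. The sample string and variable names are made up for illustration.

```python
s = "abcdefghij"  # hypothetical 10-character input

# Each slice s[0:n] is a new string holding n characters.
prefixes = [s[0:n] for n in range(0, len(s))]

# Total characters across all prefixes: 0 + 1 + ... + (len(s) - 1)
copied = sum(len(p) for p in prefixes)

# Closed form n * (n - 1) / 2, which grows quadratically with len(s)
expected = len(s) * (len(s) - 1) // 2
```

For a 10-character string that is 45 characters of slices; doubling the input roughly quadruples that figure.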

4 Answers

4

At least in 2.6, slices of strings are always new allocations; string_slice() calls PyString_FromStringAndSize(). It doesn't reuse memory, which is a little odd, since with immutable strings it should be a relatively easy thing to do.

Short of the buffer API (which you probably don't want), there isn't a more efficient way to do this operation.
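For anyone curious what the buffer route looks like, here is a rough sketch. It uses memoryview, which as far as I know only exists from Python 2.7 onward (in 2.5 the closest built-in is buffer()), so treat it as an illustration rather than something to drop into 2.5 code. It also works on bytes-like objects, not on decoded unicode text. A bytearray is used here only so the sharing can be observed by mutating it.

```python
buf = bytearray(b"Xhello")      # hypothetical data; mutable so sharing is visible

copied = bytes(buf[1:])         # ordinary slice: a separate copy of the data
view = memoryview(buf)[1:]      # view: same underlying buffer, offset by one byte

buf[1] = ord("J")               # change the original in place

from_view = view.tobytes()      # reflects the change, because nothing was copied
# copied is unaffected, because it was a real copy
```

The view sees the modified byte while the slice does not, which is the difference between sharing the buffer and copying it.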


2 Comments

Thanks for the info. I'm actually using Python 2.5 (I've updated my question) but I doubt it's done differently. I'll just have to live with the duplication, I guess (I really need to remove that one character).
Can't you just read the first character out of the file, and not assign it to the string to begin with? See my answer, coming momentarily. edit: see benson's answer instead.
3

As with most garbage-collected languages, strings are created as often as needed, which is very often. The reason is that tracking substrings as described would make garbage collection more difficult.

What is the actual algorithm you are trying to implement? It might be possible to suggest ways to get better results if we knew a bit more about it.

As for an alternative, what is it you really need to do? Could you look at the problem differently, for example by keeping an integer index into the string? Could you use an array.array('u')?
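A minimal sketch of the integer-index idea: leave the string alone and pass a start offset to the methods that accept one. The sample text and the names start, pos and has_bang are made up for illustration.

```python
text = "#!header\nbody line one\n"  # hypothetical file contents

start = 1  # logical start of the data; the first character is skipped, not removed

# str.find and str.startswith both take an optional start position,
# so they can work on "everything after index 1" without a slice.
pos = text.find("body", start)          # index is relative to the full string
has_bang = text.startswith("!", start)  # tests the character at index 1
```

This only helps when the consumer accepts an offset; anything that needs an actual string object without the first character still requires the slice.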

1 Comment

I'm removing the BOM from a UTF-8 decoded file in memory, then sending the contents of this file into a templating engine (Jinja2), then writing the result to an HTML response. I just figured out a way that I'll only have to do this once per template file, though, so it's not really an issue anymore :-)
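For the BOM case specifically, a sketch along these lines may be enough. The byte string raw is a made-up stand-in for the file contents; the utf-8-sig codec is part of the standard library and, as far as I know, has been available since Python 2.5.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by some markup.
raw = codecs.BOM_UTF8 + u"<h1>hello</h1>".encode("utf-8")

# Plain utf-8 decoding keeps the BOM as the character U+FEFF,
# so it has to be stripped with a slice (one copy of the text).
text = raw.decode("utf-8")
if text.startswith(u"\ufeff"):
    text = text[1:]

# The utf-8-sig codec drops a leading BOM during decoding instead.
same = raw.decode("utf-8-sig")
```

Either way the decoded text is built once; the second form just avoids the extra slice when the decision to drop the BOM can be made at decode time.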
1

One (albeit slightly hacky) solution would be something like this:

f = open("test.c")
f.read(1)         # read and discard the first byte
myStr = f.read()  # read the rest of the file
print myStr

It will skip the first character, and then read the data into your string variable.

7 Comments

Actually, that will read the first byte, not necessarily the first character. In a UTF-8 encoded file, only the 128 US-ASCII characters are encoded in a single byte.
So read the first line, convert to unicode, and then strip the first character. Proceed more or less as above, converting to unicode as you go along. If you don't convert, then you're dealing with bytes.
I would use this technique, but at the time I'm reading it from file I don't know whether the BOM should be kept or not. When I later retrieve the contents (from a DB), I get the entire file back at once. A version of your technique has actually already been presented to me in the answer to another (related) question I asked earlier: stackoverflow.com/questions/2456380/…
Always use a context manager when dealing with files, i.e. with open("test.c") as f:
@Mike: I would have, but he said he was using 2.5, and I didn't want to muck about with the from __future__ import with_statement junk.
1

Depending on what you are doing, itertools.islice may be a suitable memory-efficient solution (should one become necessary).
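A small sketch of what that could look like, with a made-up sample string. Note that islice yields the characters one at a time rather than producing a string, and it still steps over the skipped items internally, so it only pays off when the consumer can work with an iterator.

```python
from itertools import islice

text = "xhello"  # hypothetical contents; the first character is unwanted

# Lazily iterate from index 1 to the end without building a new string.
rest = islice(text, 1, None)

# Consume the iterator; here we just count vowels as a stand-in for real work.
vowels = sum(1 for ch in rest if ch in "aeiou")
```

Joining the characters back together with "".join() would allocate a new string again, which defeats the purpose.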

2 Comments

Cool, I didn't know that module even existed!
Good find, then! itertools is constantly useful.
