1

I've seen a good number of posts (examples here and here) speaking about concatenation in Python, how best to do it ('+' vs ','), which is faster, etc. But I can't seem to find out why it matters. Roger Pate mentioned in the first example about passing multiple arguments vs one, but I'm still unclear.

So, why does concatenation matter? What is a use case where this would be critical?

2
  • 4
    , doesn't do concatenation in Python. Commented Feb 8, 2014 at 0:08
  • 3
    ^ this is where it is critical I guess... Commented Feb 8, 2014 at 0:10

2 Answers 2

6

Because in general +ing n strings will result in n-1 allocations of O(n) memory for the result. Concatenation via adjacency is done in the parser, and performs 1 allocation. Concatenation via for instance ''.join(iter(s)) will perform O(log(n)) allocations/copies of 2n total memory.

> a = ['a'] * 100000
> def concat(strings):
      c = ''
      for s in strings:
          c += s
      return c
> %timeit ''.join(a)       # precalculates necessary buffer size
1000 loops, best of 3: 1.07 ms per loop
> %timeit ''.join(iter(a)) # allocates exponentially larger buffers
1000 loops, best of 3: 1.94 ms per loop
> %timeit concat(a)        # allocates a new buffer n-1 times
100 loops, best of 3: 7.15 ms per loop
Sign up to request clarification or add additional context in comments.

5 Comments

I love this answer, especially for the timing information
great answer ... Im not sure the loop is identical to s=a+b+c... but awesome nonetheless (but i think that s=a+b+c... is essentially the same as ''.join(...))
@JoranBeasley I know that the JVM will optimize my concat method to use a mutable buffer, I'm not sure whether or not python performs the same optimization but it's certainly possible.
I had always assumed join() would walk the list once to sum the lengths of the items being joined and do one allocation, then walk again to assemble into the newly allocated space.
@RussellBorogove It seems you are correct! %timeit ''.join(a) -> 1.07 ms/loop, %timeit ''.join(iter(a)) -> 201 ms/loop
3

Strings are immutable objects in Python, so you cannot modify existing strings. That means that every concatenation of a string results in a new string object being created and two (the source objects) being thrown away. Memory allocation is expensive enough to make this matter a lot.

So when you know you need to concatenate multiple strings, store them in a list. And then at the end, just once, join that list using ''.join(list_of_strings). That way, a new string will only be created once.

Note that this also applies in other languages. For example Java and C# both have a StringBuilder type which is essentially the same. As long as you keep appending new string parts, they will just internally add that to a string, and only when you convert the builder into a real string, the concatenation happens—and again just once.

Also note, that this memory allocation overhead already happens when you just append a few strings in a single line. For example a + b + c + d will create three intermediary strings. You can see that, if you look at the byte code for that expression:

>>> dis.dis('a + b + c + d')
  1           0 LOAD_NAME                0 (a) 
              3 LOAD_NAME                1 (b) 
              6 BINARY_ADD           
              7 LOAD_NAME                2 (c) 
             10 BINARY_ADD           
             11 LOAD_NAME                3 (d) 
             14 BINARY_ADD           
             15 RETURN_VALUE         

Each BINARY_ADD concats the two previous values and creates a new object for the result on the stack. Note that for constant string literals, the compiler is smart enough to notice that you are adding constants:

>>> dis.dis('"foo" + "bar" + "baz"')
  1           0 LOAD_CONST               4 ('foobarbaz') 
              3 RETURN_VALUE         

If you do have some variable parts within that though—for example if you want to produce a nicely formatted output—then you are back to creating intermediary string objects. In that case, using str.format is a good idea, e.g. 'foo {} bar {} baz'.format(a, b).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.