Why does string concatenation matter in Python?

Question

I've seen a good number of posts (examples here and here) speaking about concatenation in Python, how best to do it ('+' vs ','), which is faster, etc. But I can't seem to find out why it matters. Roger Pate mentioned in the first example about passing multiple arguments vs one, but I'm still unclear.

So, why does concatenation matter? What is a use case where this would be critical?

, doesn't do concatenation in Python.

Ignacio Vazquez-Abrams
– Ignacio Vazquez-Abrams

2014-02-08 00:08:29 +00:00
Commented Feb 8, 2014 at 0:08 — Ignacio Vazquez-Abrams
– Ignacio Vazquez-Abrams, Commented Feb 8, 2014 at 0:08
^ this is where it is critical I guess...

OutFall
– OutFall

2014-02-08 00:10:49 +00:00
Commented Feb 8, 2014 at 0:10 — OutFall
– OutFall, Commented Feb 8, 2014 at 0:10

U2EF1 · Accepted Answer · 2014-02-08 00:41:19Z

6

Because in general +ing n strings will result in n-1 allocations of O(n) memory for the result. Concatenation via adjacency is done in the parser, and performs 1 allocation. Concatenation via for instance ''.join(iter(s)) will perform O(log(n)) allocations/copies of 2n total memory.

> a = ['a'] * 100000
> def concat(strings):
      c = ''
      for s in strings:
          c += s
      return c
> %timeit ''.join(a)       # precalculates necessary buffer size
1000 loops, best of 3: 1.07 ms per loop
> %timeit ''.join(iter(a)) # allocates exponentially larger buffers
1000 loops, best of 3: 1.94 ms per loop
> %timeit concat(a)        # allocates a new buffer n-1 times
100 loops, best of 3: 7.15 ms per loop

edited Feb 8, 2014 at 0:41

answered Feb 8, 2014 at 0:14

U2EF1

13.4k3 gold badges38 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

inspectorG4dget Over a year ago

I love this answer, especially for the timing information

Joran Beasley Over a year ago

great answer ... Im not sure the loop is identical to s=a+b+c... but awesome nonetheless (but i think that s=a+b+c... is essentially the same as ''.join(...))

U2EF1 Over a year ago

@JoranBeasley I know that the JVM will optimize my concat method to use a mutable buffer, I'm not sure whether or not python performs the same optimization but it's certainly possible.

Russell Borogove Over a year ago

I had always assumed join() would walk the list once to sum the lengths of the items being joined and do one allocation, then walk again to assemble into the newly allocated space.

U2EF1 Over a year ago

@RussellBorogove It seems you are correct! %timeit ''.join(a) -> 1.07 ms/loop, %timeit ''.join(iter(a)) -> 201 ms/loop

poke · Accepted Answer · 2014-02-08 00:27:08Z

Strings are immutable objects in Python, so you cannot modify existing strings. That means that every concatenation of a string results in a new string object being created and two (the source objects) being thrown away. Memory allocation is expensive enough to make this matter a lot.

So when you know you need to concatenate multiple strings, store them in a list. And then at the end, just once, join that list using ''.join(list_of_strings). That way, a new string will only be created once.

Note that this also applies in other languages. For example Java and C# both have a StringBuilder type which is essentially the same. As long as you keep appending new string parts, they will just internally add that to a string, and only when you convert the builder into a real string, the concatenation happens—and again just once.

Also note, that this memory allocation overhead already happens when you just append a few strings in a single line. For example a + b + c + d will create three intermediary strings. You can see that, if you look at the byte code for that expression:

>>> dis.dis('a + b + c + d')
  1           0 LOAD_NAME                0 (a) 
              3 LOAD_NAME                1 (b) 
              6 BINARY_ADD           
              7 LOAD_NAME                2 (c) 
             10 BINARY_ADD           
             11 LOAD_NAME                3 (d) 
             14 BINARY_ADD           
             15 RETURN_VALUE

Each BINARY_ADD concats the two previous values and creates a new object for the result on the stack. Note that for constant string literals, the compiler is smart enough to notice that you are adding constants:

>>> dis.dis('"foo" + "bar" + "baz"')
  1           0 LOAD_CONST               4 ('foobarbaz') 
              3 RETURN_VALUE

If you do have some variable parts within that though—for example if you want to produce a nicely formatted output—then you are back to creating intermediary string objects. In that case, using str.format is a good idea, e.g. 'foo {} bar {} baz'.format(a, b).

Collectives™ on Stack Overflow

Why does string concatenation matter in Python?

2 Answers 2

5 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related