-1

I'm analysing the complexity of string concatination for different methods and what I notice.

%%time
sent = ""
word = "hello "
num = 100000
for _ in range(num):
    sent += word

this method takes 20ms

%%time
sent_1 = "".join([word] * num)

the second method takes 4 ms and

%%time
sent_2 = ""
for _ in range(num//5):
    sent_2 = sent_2 + word + word + word + word + word

the third method takes 208 ms

sent==sent_1==sent_2 gives me True

can someone explain why I get such results

5
  • 2
    "can someone explain why I get such results" What needs explaining? What do you think the results should be like instead, and why? I notice you wrote "time complexity" in the title; did you try using different values of num, to assess the big-O complexity of each approach? Commented Mar 6, 2023 at 20:01
  • Repeated string concatenation (using the + symbol) in Python is very slow. Using .join() is much faster. It's that simple. Commented Mar 6, 2023 at 20:06
  • Strictly speaking, str values are immutable. But, CPython has an optimization that detects that sent + word is being assigned back to sent, and so updates sent in-place anyway. Commented Mar 6, 2023 at 20:07
  • @KarlKnechtel I thought that for _ in range(num): sent += word should give the same time as for _ in range(num//5): sent_2 = sent_2 + word + word + word + word + word Commented Mar 6, 2023 at 20:09
  • Ah, then the question is basically about the special-case optimization of += for strings. This is a common question, but I'm struggling to find a proper canonical. But see e.g. stackoverflow.com/questions/2791931. Commented Mar 6, 2023 at 23:10

1 Answer 1

1

sent2 + word + word + word + word + word creates multiple temporary str objects, roughly equivalent to

t1 = sent2 + word
t2 = t1 + word
t3 = t2 + word
t4 = t3 + word
sent_2= t4 + word

Creating all these objects (and copying the ever-growing values of one temporary value into another) results in a quadratic runtime.

''.join([word] * num), on the other hand, gets all the strings to concatenate at once in a single list, so it can allocate one result string and copy the substring in place in linear time.

The first example,

for _ in range(num):
    sent += word

should be equivalent to repeated concatenation. But, CPython in particular optimizes this by recognizing that nobody can access the value of sent while sent + word is being computed, so it "cheats" and updates sent in place, with no new string being allocated to assign back to sent. This means you still have a linear-time operation, like ''.join, but with extra overhead due to iterating over the string in Python rather than at the C level of the interpreter.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.