
Essentially, I need to write a faster replacement for insert() that inserts elements at particular positions in a list. The insertions are given as a list of pairs: [(index, value), (index, value), (index, value)]

For example, inserting 10,000 elements into a 1,000,000-element list this way takes about 2.7 seconds:

def do_insertions_simple(l, insertions):
    """Performs the insertions specified into l.
    @param l: list in which to do the insertions.  It is not modified.
    @param insertions: list of pairs (i, x), indicating that x should
        be inserted at position i.
    """
    r = list(l)
    for i, x in insertions:
        r.insert(i, x)
    return r

My assignment asks me to speed up the time taken to complete the insertions by 8x or more.

My current implementation:

def do_insertions_fast(l, insertions):
    """Implement here a faster version of do_insertions_simple """
    # insert insertions[x][i] at l[i]
    result = list(l)
    for x, y in insertions:
        result = result[:x] + list(y) + result[x:]
    return result

Sample input:

import string
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
insertions = [(0, 'a'), (2, 'b'), (2, 'b'), (7, 'c')]
r1 = do_insertions_simple(l, insertions)
r2 = do_insertions_fast(l, insertions)
print("r1:", r1)
print("r2:", r2)
assert_equal(r1, r2)

is_correct = False
for _ in range(20):
    l, insertions = generate_testing_case(list_len=100, num_insertions=20)
    r1 = do_insertions_simple(l, insertions)
    r2 = do_insertions_fast(l, insertions)
    assert_equal(r1, r2)
    is_correct = True

The error I'm getting while running the above code:

r1: ['a', 0, 'b', 'b', 1, 2, 3, 'c', 4, 5, 6, 7, 8, 9]
r2: ['a', 0, 'b', 'b', 1, 2, 3, 'c', 4, 5, 6, 7, 8, 9]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-54e0c44a8801> in <module>()
     12     l, insertions = generate_testing_case(list_len=100, num_insertions=20)
     13     r1 = do_insertions_simple(l, insertions)
---> 14     r2 = do_insertions_fast(l, insertions)
     15     assert_equal(r1, r2)
     16     is_correct = True

<ipython-input-7-b421ee7cc58f> in do_insertions_fast(l, insertions)
      4     result=list(l)
      5     for x,y in insertions:
----> 6       result = result[:x]+list(y)+result[x:]
      7     return result
      8     #raise NotImplementedError()

TypeError: 'float' object is not iterable

The file uses the nose framework to check my answers, so if there are any functions you don't recognize, they're probably from that framework.

I know that it is inserting the elements correctly; however, it keeps raising the error "'float' object is not iterable".

I've also tried a different method that did work (slicing the list, adding the element, appending the rest of the list, and then updating the list), but it was 10 times slower than insert().

I'm not sure how to continue

edit: I've been looking at the entire question wrong, for now I'll try to do it myself but if I'm stuck again I'll ask a different question and link that here

  • list(y) tries to iterate y to create a list out of it. You want to just do [y] to turn y into a single-element list. Commented Jan 24, 2020 at 22:40
    I'll tell you for certain right now that your method is not faster than insert :) Commented Jan 24, 2020 at 22:41
  • So then is there any other way that I can implement it? Commented Jan 24, 2020 at 22:42
  • At the moment, all you are doing is creating a bulkier and more expensive version of insert. It's key to realize how insert works. In CPython at least, lists are contiguous data structures, meaning insert is an expensive operation. If you understand C, see github.com/python/cpython/blob/master/Objects/listobject.c#L286 insert calls a method list_resize that sometimes performs a very expensive reallocation. HTH. Commented Jan 24, 2020 at 22:52
    If you want fast, arbitrary insertions, you need a different data structure (like a doubly-linked list). Commented Jan 24, 2020 at 23:04

3 Answers


From your question, emphasis mine:

I need to write a faster implementation as a replacement for insert() to insert an element in a particular position in a list

You won't be able to. If there were a faster way, the existing insert() function would already use it, and nothing you write in pure Python will come close to its speed.

What you can do is write a faster way to do multiple insertions.

Let's look at an example with two insertions:

>>> a = list(range(15))
>>> a.insert(5, 'X')
>>> a.insert(10, 'Y')
>>> a
[0, 1, 2, 3, 4, 'X', 5, 6, 7, 8, 'Y', 9, 10, 11, 12, 13, 14]

Since every insert shifts all values to the right of it, this in general is an O(m*(n+m)) time algorithm, where n is the original size of the list and m is the number of insertions.

Another way to do it is to build the result piece by piece, taking the insertion points into account:

>>> a = list(range(15))
>>> b = []
>>> b.extend(a[:5])
>>> b.append('X')
>>> b.extend(a[5:9])
>>> b.append('Y')
>>> b.extend(a[9:])
>>> b
[0, 1, 2, 3, 4, 'X', 5, 6, 7, 8, 'Y', 9, 10, 11, 12, 13, 14]

This is O(n+m) time, as all values are copied just once and there's no shifting. It's just somewhat tricky to determine the correct piece lengths, as earlier insertions affect later ones, especially if the insertion indexes aren't sorted (in which case it also takes O(m log m) additional time to sort them). That's why I had to use a[5:9] and a[9:] instead of a[5:10] and a[10:].

(Yes, I know, extend/append internally copy some more if the capacity is exhausted, but if you understand things enough to point that out, then you also understand that it doesn't matter :-)
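To make the piece-by-piece idea concrete, here is one way it could be coded (my sketch, not part of the original answer; the name do_insertions_batched is made up). It builds the result left to right, and falls back to an insert into the already-built prefix whenever an insertion point lies behind the current position, so it also handles unsorted insertion indexes:

```python
def do_insertions_batched(l, insertions):
    result = []
    prev = 0  # next index of l that has not been copied into result yet
    for i, x in insertions:
        if i <= len(result):
            # Insertion point falls inside the already-built prefix,
            # so insert into the (usually short) prefix directly.
            result.insert(i, x)
        else:
            take = i - len(result)   # original elements to copy before x
            result.extend(l[prev:prev + take])
            result.append(x)
            prev += take
    result.extend(l[prev:])          # copy the remaining tail once
    return result
```

For the example above, do_insertions_batched(list(range(15)), [(5, 'X'), (10, 'Y')]) produces the same list as the two insert calls; the `a[5:9]` piece corresponds to the second iteration's `take` of 4.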


8 Comments

This is almost certainly the intended solution. However, if the insertion indices aren't sorted then it's going to take O(n + m log m) time because you need to sort them, but also sorting them is going to be tricky because insertions are not commutative, so you actually need to insert at different indices if you do the insertions in a different order.
Right, I ignored the sorting time. I guess because I'm still hoping that they are sorted and they're just overlooking that in the problem specification. I'll mention it. And yes, the insertions affect each other, that's why my demo already slices to/from index 9, not 10. And it's what I meant by being tricky.
Indeed. I think your answer is good (so I upvoted), I just thought that part could be expanded on.
Yes, this is the right approach. I transcribed your ideas into code here if you wanted to incorporate it into your answer.
@kaya3 I expanded a bit on it now, also fixed the naive way's complexity (it's not O(nm) if m is much larger than n). Btw, I actually exploited that insertions affect each other in another case and I'm quite pleased with it :-)

One option is to use a different data structure, which supports faster insertions.

The obvious suggestion would be a binary tree of some sort. You can insert nodes into a balanced binary tree in O(log n) time, so long as you're able to find the right insertion point in O(log n) time. A solution to that is for each node to store and maintain its own subtree's cardinality; then you can find a node by index without iterating through the whole tree. Another possibility is a skip list, which supports insertion in O(log n) average time.

However, the problem is that you are writing in Python, so you have a major disadvantage trying to write something faster than the built-in list.insert method, because that's implemented in C, and Python code is a lot slower than C code. It's not unusual to write an O(log n) algorithm in Python that only beats the built-in O(n) implementation for very large n, and even n = 1,000,000 may not be large enough to win by a factor of 8 or more. This could mean a lot of wasted effort if you try implementing your own data structure and it turns out not to be fast enough.

I think the expected solution for this assignment will be something like Heap Overflow's answer. That said, there is another way to approach this question which is worth considering because it avoids the complications of working out the correct indices to insert at if you do the insertions out of order. My idea is to take advantage of the efficiency of list.insert but to call it on shorter lists.


If the data is still stored in Python lists, then the list.insert method can still be used to get the efficiency of a C implementation, but if the lists are shorter then the insert method will be faster. Since you only need to win by a constant factor, you can divide the input list into, say, 256 sublists of roughly equal size. Then for each insertion, insert it at the correct index in the correct sublist; and finally join the sublists back together again. The time complexity is O(nm), which is the same as the "naive" solution, but it has a lower constant factor.

To compute the correct insertion index we need to subtract the lengths of the sublists to the left of the one we're inserting in; we can store the cumulative sublist lengths in a prefix sum array, and update this array efficiently using numpy. Here's my implementation:

from itertools import islice, chain, accumulate
import numpy as np

def do_insertions_split(lst, insertions, num_sublists=256):
    n = len(lst)
    sublist_len = n // num_sublists
    lst_iter = iter(lst)
    sublists = [list(islice(lst_iter, sublist_len)) for i in range(num_sublists-1)]
    sublists.append(list(lst_iter))
    lens = [0]
    lens.extend(accumulate(len(s) for s in sublists))
    lens = np.array(lens)

    for idx, val in insertions:
        # could use binary search, but num_sublists is small
        j = np.argmax(lens >= idx)
        sublists[j-1].insert(idx - lens[j-1], val)
        lens[j:] += 1

    return list(chain.from_iterable(sublists))

It is not as fast as @iz_'s implementation (linked from the comments), but it beats the simple algorithm by a factor of almost 20, which is sufficient according to the problem statement. The times below were measured using timeit on a list of length 1,000,000 with 10,000 insertions.

simple -> 2.1252768037122087 seconds
iz -> 0.041302349785668824 seconds
split -> 0.10893724981304054 seconds

Note that my solution still loses to @iz_'s by a factor of about 2.5. However, @iz_'s solution requires the insertion points to be sorted, whereas mine works even when they are unsorted:

lst = list(range(1_000_000))
insertions = [(randint(0, len(lst)), "x") for _ in range(10_000)]

# uncomment if the insertion points should be sorted
# insertions.sort()

r1 = do_insertions_simple(lst, insertions)
r2 = do_insertions_iz(lst, insertions)
r3 = do_insertions_split(lst, insertions)

if r1 != r2: print('iz failed') # prints
if r1 != r3: print('split failed') # doesn't print

Here is my timing code, in case anyone else wants to compare. I tried a few different values for num_sublists; anything between 200 and 1,000 seemed to be about equally good.

from timeit import timeit

algorithms = {
    'simple': do_insertions_simple,
    'iz': do_insertions_iz,
    'split': do_insertions_split,
}
reps = 10

for name, func in algorithms.items():
    t = timeit(lambda: func(lst, insertions), number=reps) / reps
    print(name, '->', t, 'seconds')

2 Comments

Ok, that sounds like what I expected (and I guess you, too). If numpy and list.insert were equally fast, then about sqrt(n)=1000 should be best, but since shifting memory around is likely still faster, somewhat smaller numbers should be better.
Indeed, you caught me between edits, I was already going for it! I think sqrt(n) should be an overestimate also because we do argmax in the loop as well as += 1 on the array. But thinking about it, we could dynamically choose num_sublists to be some constant multiple of int(sqrt(n)), and then the time complexity is O(n + m sqrt(n)) which does beat the naive algorithm asymptotically.

list(y) attempts to iterate over y and build a list of its elements. If y is a number (a float, in your case), it isn't iterable, and you get the error you mentioned. You probably want a list literal containing just y instead: [y]
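A quick standalone demonstration of the difference (and of why the bug was hidden by the string-valued sample input):

```python
y = 3.5

# list() needs an iterable; a float is not one
try:
    list(y)
except TypeError as e:
    print(e)           # 'float' object is not iterable

# A list literal wraps the value instead of iterating over it
print([y])             # [3.5]

# Strings *are* iterable, which is why the sample input with
# single-character values like 'a' happened to work:
print(list("a"))       # ['a']
print(list("ab"))      # ['a', 'b'] -- not the same as ['ab']!
```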

3 Comments

Thank you! I changed list(y) to [y] and it's not throwing any errors anymore; however, like iz_ said, it's not faster than the insert() function
Without doing some major trickery, or mostly re-defining your problem, you're unlikely to get any faster than the built-in: insert. Any speed-up is going to come from taking a more efficient approach, not speeding up a slow one.
So is there a more efficient approach that I can take to speed it up? (Like Heap Overflow pointed out, if the question didn't explain it properly, I need a faster way to insert multiple elements, not a single one)
