0

I was reading about CPython’s implementation of list.pop(k) method and learned that it takes O(n-k) to remove the kth element of the list since it needs to move the posterior elements one position to the left. But couldn't it be faster when removing the first element if we simply shift the pointer of the list to the second element?

Heres the snippet of the current implementation:

    PyObject **items = self->ob_item;
    v = items[index];
    const Py_ssize_t size_after_pop = Py_SIZE(self) - 1;
    if (size_after_pop == 0) {
        Py_INCREF(v);
        list_clear(self);
        status = 0;
    }
    else {
        if ((size_after_pop - index) > 0) {
            memmove(&items[index], &items[index+1], (size_after_pop - index) * sizeof(PyObject *));
        }
        status = list_resize(self, size_after_pop);
    }

And here's what I have in mind:

    PyObject **items = self->ob_item;
    v = items[index];
    const Py_ssize_t size_after_pop = Py_SIZE(self) - 1;
    if (size_after_pop == 0) {
        Py_INCREF(v);
        list_clear(self);
        status = 0;
    }
    else {
        if ((size_after_pop - index) > 0) {
            if (index == 0) {
                self->ob_item = &items[1];    <------- here
            } else {
                memmove(&items[index], &items[index+1], (size_after_pop - index) * sizeof(PyObject *));
            }
        }
        status = list_resize(self, size_after_pop);
    }
3
  • 2
    You can do that, shift the pointer. But the problem is that then you loose the original pointer, which you need to pass to free. Commented Dec 29, 2024 at 11:30
  • 1
    Nothing's stopping you from writing your own data structure that keeps track of beginning and ending indexes like that. Commented Dec 29, 2024 at 14:41
  • @Shawn I'm not interested in implementing my own data structure, but to understand why the CPython implementation is the way it is Commented Dec 29, 2024 at 19:08

2 Answers 2

1

But what about removing the first element?

What about it?

So, why if given an implementation of a dynamic array in C we can’t simply shift the pointer of the first element to the second when removing the first?

We can, and such data structures are known. One of the most common variants is called a "circular buffer".

I can't speak specifically to why CPython uses a different approach, but the general tradeoffs involve code complexity vs. performance vs. memory use (vs. possibly other factors). The analysis of those usually involves judgement calls about the frequencies of various use cases, properties of different options with respect to those use cases, and long-term maintenance burden, among others. If there's a better answer in the CPython case than "it seemed like a good idea at the time", then it will be that Guido or whoever judged that a simple variation on dynamic arrays represented the best balance of performance, memory use, and maintainability.


Addendum in response to edited question

Your specific proposed variation is not feasible because it loses the pointer to start of the space allocated for the list elements. It is necessary to retain that pointer, somewhere, or at least to retain enough information to be able to recompute it. Otherwise,

  • the list could not be expanded beyond its current capacity leaking memory,

  • the list could not be garbage collected without leaking memory, and

  • the space previously used for the first element would become unusable. For example, it could not be leveraged to provide for an insert() at index 0.

The first two of those follow from the characteristics of C's dynamic memory allocation functions. realloc() and free() require as input the pointer to the beginning of the allocated region to operate on, as returned by a previous call to malloc(), calloc(), or realloc(). Only that pointer value will do, not, for example, a pointer into the middle of the allocated space.

The last follows because you don't know how much space there is for elements before the first.

Of course, you could retain the information needed to determine the beginning of the allocated region, but then you'd have changed to a different kind of data structure, such as the aforementioned circular buffer. With all the tradeoffs already described.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, I updated the question to make it more specific about the CPython implementation
@OsnielTeixeira, I have added some remarks directed to the proposed code variation you edited into the question. TL;DR: by itself, that variation has serious memory management problems, and adding enough extra data and code to fix that takes you into the realm of alternative data structures such as this answer already addressed.
0

We can. In fact, Perl does exactly this.

use Devel::Peek qw( Dump );

my @a = "a" .. "z";
Dump( @a, 0 );

for ( 1 .. 3 ) {
   shift @a;
   Dump( @a, 0 );
}

push @a, "!";
Dump( @a, 0 );
SV = PVAV(0x558c6cb91068) at 0x558c6cbbe1c0
  REFCNT = 1
  FLAGS = ()
  ARRAY = 0x558c6cbaff30             # Pointer to the buffer
  FILL = 25                          # Index to the last used element
  MAX = 25                           # Index to the last allocated element
  FLAGS = (REAL)
SV = PVAV(0x558c6cb91068) at 0x558c6cbbe1c0
  REFCNT = 1
  FLAGS = ()
  ARRAY = 0x558c6cbaff38 (offset=1)  # Pointer to the buffer (offset is calculated)
  ALLOC = 0x558c6cbaff30             # Actual start of buffer (existed but not shown earlier)
  FILL = 24                          # This had to be adjusted
  MAX = 24                           # This had to be adjusted   
  FLAGS = (REAL)
SV = PVAV(0x558c6cb91068) at 0x558c6cbbe1c0
  REFCNT = 1
  FLAGS = ()
  ARRAY = 0x558c6cbaff40 (offset=2)
  ALLOC = 0x558c6cbaff30
  FILL = 23
  MAX = 23
  FLAGS = (REAL)
SV = PVAV(0x558c6cb91068) at 0x558c6cbbe1c0
  REFCNT = 1
  FLAGS = ()
  ARRAY = 0x558c6cbaff48 (offset=3)
  ALLOC = 0x558c6cbaff30
  FILL = 22
  MAX = 22
  FLAGS = (REAL)
SV = PVAV(0x558c6cb91068) at 0x558c6cbbe1c0
  REFCNT = 1
  FLAGS = ()
  ARRAY = 0x558c6cbe75d0             # New buffer. All 20 remaining elements got copied.
  FILL = 23
  MAX = 48                           # We can no hold more. 
  FLAGS = (REAL)

2 Comments

The question was about the CPython implementation of list
You said CPython doesn't do it, and you asked if was something that could be done. The answer is yes, it's possible. I showed that it's not only possible, but it's being done elsewhere.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.