
The assumption is that we have a dictionary containing exactly one key/value pair. The objective is to extract the only key.

I can think of four ways to do this (there may be more).

import timeit


def func1(d):
    """
    Convert to a list and return the first element
    """
    return list(d)[0]


def func2(d):
    """
    Unpack
    """
    rv, *_ = d
    return rv


def func3(d):
    """
    Classic efficient approach
    """
    return next(iter(d))


def func4(d):
    """
    Appears to be faster than the classic approach
    """
    for key in d:
        return key


if __name__ == "__main__":
    d = {"foo": 0}
    for func in (func1, func2, func3, func4):
        assert func1(d) == func(d)
        duration = timeit.timeit(lambda: func(d), number=5_000_000)
        print(func.__name__, f"{duration=:.4f}s")

Output:

func1 duration=0.6322s
func2 duration=0.8306s
func3 duration=0.5505s
func4 duration=0.5040s

I have always understood that the implementation in func3() is optimal - and that's what I would normally use.

However, it appears that func4() is more efficient. This may be related to the Python version (3.13.5 in this case).

Why would func4() perform better than func3()? Are there optimisations in modern Python versions that would affect this?

  • I am not surprised: func4 is the only one without any allocation/object creation. Commented Jul 27 at 9:33
  • I can reproduce with 3.12.3 indeed. Basically, it is a bit the same solution either way (what a for loop does is exactly iter then next, and then nothing, since it has returned), so it is not surprising that both are more efficient than building a list or a tuple. But it is interesting that for is faster than iter/next, probably because of the slight overhead of having to perform some Python-level function calls, whereas for does that directly without calls (edit: and, yes, indeed, just seeing bruno's comment, also the object holding the iterator itself). Commented Jul 27 at 9:34
  • There is also the cost of looking up the symbols iter and next. Since these are global objects, it is not free. For me, it is not obvious which has a greater impact; see the sketch after these comments. Commented Jul 27 at 10:18
  • @bruno For dicts in 3.13.5? How does for key in d iterate without creating an iterator object? Commented Jul 28 at 20:40
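
The lookup cost raised in the comments can be separated from the call overhead with a small sketch (not part of the original question; with_globals, with_locals and with_for are hypothetical names) that binds iter and next to cheap local names via default arguments:

import timeit


def with_globals(d):
    # two global/builtin name lookups (next, iter) plus two Python-level calls
    return next(iter(d))


def with_locals(d, _iter=iter, _next=next):
    # same two calls, but the names resolve as fast local (default-arg) lookups
    return _next(_iter(d))


def with_for(d):
    # the for statement performs the iter/next protocol in C, with no name lookups
    for key in d:
        return key


if __name__ == "__main__":
    d = {"foo": 0}
    for func in (with_globals, with_locals, with_for):
        duration = timeit.timeit(lambda: func(d), number=5_000_000)
        print(func.__name__, f"{duration=:.4f}s")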

3 Answers


Indeed

I've been using this optimization for years, for example I optimized/simplified more_itertools.first from

def first(iterable, default=_marker):
    try:
        return next(iter(iterable))
    except StopIteration as e:
        ...  # (empty case handling here)

to:

def first(iterable, default=_marker):
    for item in iterable:
        return item
    ...  # (empty case handling here)

In the issue I showed that it was faster for various iterables (especially for empty ones). Here are benchmark results with various arguments (columns: time of the old implementation in ns, time of the proposal, time difference, arguments); a reproduction sketch follows the table:

137 118 -19 (0,),
140 121 -19 [0],
150 122 -28 "0",
200 174 -26 {0},
142 123 -19 {0: 0},
115 92 -23 iter((0,) * 10000),
114 92 -22 iter([0] * 10000),
113 92 -21 repeat(0),
140 118 -22 (x for x in repeat(0)),
220 196 -24 Infinite(),
458 124 -334 (), None
457 126 -331 [], None
454 124 -330 "", None
463 132 -331 set(), None
455 125 -330 {}, None
425 98 -327 iter(()), None
426 98 -328 iter([]), None
422 98 -324 repeat(None, 0), None
429 98 -331 (x for x in ()), None
711 506 -205 Empty(), None
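
A comparison along these lines can be reproduced with a small harness like the following (a sketch, not the original benchmark script; the empty-case handling here is a hypothetical stand-in for the elided code above, and the case list covers only a few of the listed arguments):

import timeit
from itertools import repeat

_marker = object()


def first_old(iterable, default=_marker):
    try:
        return next(iter(iterable))
    except StopIteration:
        # hypothetical empty-case handling, not the library's actual code
        if default is _marker:
            raise ValueError("first() called on an empty iterable") from None
        return default


def first_new(iterable, default=_marker):
    for item in iterable:
        return item
    # hypothetical empty-case handling, not the library's actual code
    if default is _marker:
        raise ValueError("first() called on an empty iterable")
    return default


cases = [((0,),), ([0],), ({0: 0},), (repeat(0),), ((), None), ([], None)]
for args in cases:
    for func in (first_old, first_new):
        per_call_ns = timeit.timeit(lambda: func(*args), number=1_000_000) * 1000
        print(f"{func.__name__}  {per_call_ns:6.0f} ns  {args}")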

Why

It's not really about modern optimizations. With return next(iter(d)) you load two globals and make two calls, all in Python. With for key in d: return key you get lower-level equivalents of iter and next, which is evidently so much faster that it's a win despite the additional storing and loading of the local variable key. It has been like this for a long time; all of this was already the case in Python 2.

(In the above case of more_itertools.first, the for way also saves entering the try block, although CPython has had "zero-cost" exception handling for a while when no exception is raised.)
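
To see where the difference comes from, one can disassemble the two shapes from the question (a sketch, not part of the original answer; the exact opcode names vary across CPython versions, but the overall picture does not):

import dis


def via_next_iter(d):
    # loads the globals/builtins `next` and `iter` and performs two calls
    return next(iter(d))


def via_for(d):
    # GET_ITER / FOR_ITER drive the iteration in C, with no name lookups
    # and no Python-level calls
    for key in d:
        return key


dis.dis(via_next_iter)
dis.dis(via_for)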

More: The for-break(-else) pattern

I've used it not just with return but more often with an unconditional break, for example in Optimize heapq.merge with for-break(-else) pattern?. That is not just for speed but also for shorter/nicer code, for example the initialization where an iterator it (or its __next__) is added to the heap if it's not empty:

Current:

    try:
        next = it.__next__
        h_append([next(), order * direction, next])
    except StopIteration:
        pass

Proposal:

    for value in it:
        h_append([value, order * direction, it])
        break
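
To make the -else part of the pattern concrete, here is a minimal sketch (first_or_default is a hypothetical helper, not code from the linked proposal): take the first value if there is one, otherwise fall back to a default, with no try/except StopIteration:

def first_or_default(iterable, default=None):
    for value in iterable:
        break            # got a first value, leave the loop immediately
    else:
        value = default  # the loop body never ran: the iterable was empty
    return value


assert first_or_default([1, 2, 3]) == 1
assert first_or_default([], default="empty") == "empty"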

More: nested fors over the same iterator

Yet another way I've been using for to quickly get just the first value with an iterator is nested loops over the same iterator, for example my more_itertools.all_equal proposal (you can find benchmarks there), which got adopted with small modifications:

def all_equal(iterable):
    groups = groupby(iterable)
    for first in groups:
        for second in groups:
            return False
    return True

In that case, the inner loop had an unconditional return. In other cases, my inner loop exhausts the iterator, for example in my improvement of more_itertools.mark_ends just last week:

def mark_ends__improved_mystyle(iterable):
    it = iter(iterable)
    for a in it:
        first = True
        for b in it:
            yield first, False, a
            a = b
            first = False
        yield first, True, a

The previous/alternative code achieved the "get the first value or do nothing" step with four lines instead of my one-line for a in it: plus nesting:

    try:
        b = next(it)
    except StopIteration:
        return
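
As a quick sanity check (not part of the original proposals; it assumes the two generators above are defined, with groupby imported from itertools for all_equal), both behave as expected on short inputs:

assert all_equal("aaaa") is True
assert all_equal("aab") is False
assert all_equal("") is True

assert list(mark_ends__improved_mystyle("abc")) == [
    (True, False, "a"),
    (False, False, "b"),
    (False, True, "c"),
]
assert list(mark_ends__improved_mystyle("x")) == [(True, True, "x")]
assert list(mark_ends__improved_mystyle("")) == []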

Btw

Your func1 can be optimized by changing list(d)[0] to [*d][0], which is faster because it likewise avoids loading and calling the global list.

And if you really assume that your dict has exactly one item as you said, then you can improve your func2 by changing rv, *_ = d to rv, = d.
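
Both suggestions can be dropped into the question's harness like this (a sketch; func1b and func2b are hypothetical names, and the timings will depend on the interpreter and version):

import timeit


def func1b(d):
    # [*d][0]: unpacking into a list display avoids looking up and calling
    # the global `list`
    return [*d][0]


def func2b(d):
    # rv, = d: single-target unpacking; as a bonus it raises ValueError
    # if the dict does not contain exactly one key
    rv, = d
    return rv


if __name__ == "__main__":
    d = {"foo": 0}
    for func in (func1b, func2b):
        duration = timeit.timeit(lambda: func(d), number=5_000_000)
        print(func.__name__, f"{duration=:.4f}s")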


It's worth mentioning that none of the four methods actually validates that there is only one key. Correctness is more important than performance. Unless this is deep in a nested loop and known to be a bottleneck (the question does not make that clear), I think a variant of func2 is appropriate:

def func2a(d):
    rv, = d
    return rv

This will (by design) throw if there is more than one key, and none of the other methods will. The traceback will look like

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 1)
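
A quick illustration of that difference (a sketch, not from the answer; func3 is the questioner's next(iter(d)) version) on a dict with two keys:

def func3(d):
    return next(iter(d))


def func2a(d):
    rv, = d
    return rv


d = {"foo": 0, "bar": 1}

print(func3(d))        # silently returns "foo", the first key in insertion order

try:
    func2a(d)
except ValueError as exc:
    print(exc)         # too many values to unpack (expected 1)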

2 Comments

Did you read the first sentence in my question? This has nothing to do with robust programming.
Typically I pride myself on my literacy. The single-key condition is an assumption. That means that if it's violated, something is very wrong, and the developer should learn of it. Overall, your question smells of an X/Y problem, and without more information I can only conclude that it's premature optimisation, especially if you want to have "nothing to do" with robust programming.

I copied your code and ran it using a recent PyPy3, namely:

Python 3.11.13 (413c9b7f57f5, Jul 03 2025, 18:03:56)
[PyPy 7.3.20 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]

I got the following results:

func1 duration=0.2187s
func2 duration=0.2582s
func3 duration=0.0314s
func4 duration=0.0316s

So func3 was marginally faster than func4, and func1/func2 are an order of magnitude slower than func3/func4. Most probably, implementation details cause these differences.

1 Comment

Interestingly, func2 can be made significantly faster on pypy3 by changing to rv, = d
