How to retrieve an element from a set without removing it?

Python Programming

Question or problem about Python programming:

Suppose the following:

>>> s = set([1, 2, 3])

How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it – something I can only be sure of after an asynchronous call to another host.

Quick and dirty:

>>> elem = s.pop()
>>> s.add(elem)

But do you know of a better way? Ideally in constant time.

How to solve the problem:

Solution 1:

Two options that don’t require copying the whole set:

for e in s:
    break
# e is now an element from s

Or…

e = next(iter(s))

But in general, sets don’t support indexing or slicing.

Solution 2:

Least code would be:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

Obviously this would create a new list which contains each member of the set, so not great if your set is very large.

Solution 3:

I wondered how the functions will perform for different sets, so I did a benchmark:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

enter image description here

This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.

However as long as one of the constant time approaches is used the performance difference will be negligible.


iteration_utilities (Disclaimer: I’m the author) contains a convenience function for this use-case: first:

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

I also included it in the benchmark above. It can compete with the other two “fast” solutions but the difference isn’t much either way.

Solution 4:

tl;dr

for first_item in muh_set: break remains the optimal approach in Python 3.x. Curse you, Guido.

y u do this

Welcome to yet another set of Python 3.x timings, extrapolated from wr.‘s excellent Python 2.x-specific response. Unlike AChampion‘s equally helpful Python 3.x-specific response, the timings below also time outlier solutions suggested above – including:

  • list(s)[0], John‘s novel sequence-based solution.
  • random.sample(s, 1), dF.‘s eclectic RNG-based solution.
Code Snippets for Great Joy

Turn on, tune in, time it:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()
Quickly Obsoleted Timeless Timings

Behold! Ordered by fastest to slowest snippets:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104
Faceplants for the Whole Family

Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.

Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but random really takes the awful-sauce cake. So much for the Random Number God.

I just wish the amorphous They would PEP up a set.get_first() method for us already. If you’re reading this, They: “Please. Do something.”

Solution 5:

To provide some timing figures behind the different approaches, consider the following code.
The get() is my custom addition to Python’s setobject.c, being just a pop() without removing the element.

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

The output is:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).

Hope this helps!