Comprehensions and Combinations

If you are working on a project in which you must enumerate, traverse, and/or test every possible combination of elements from one or more finite (or even infinite) collections, Python provides a variety of ways to do so. This article reviews some of the syntactically concise ways to do so, while also addressing some relevant memory utilization aspects. In particular, the focus is on comprehension syntax as foundational building block that can be employed in conjunction with functions and recursion.

Conventions for Terminology and Notation

In order to maintain consistency across examples and to keep outputs deterministic, this articles follows the conventions enumerated below.

  • Python lists are used to represent both collections that may have duplicates and sets that do not have duplicates. For example, the set $\{0, 1, 2\}$ is represented using [0, 1, 2].
  • When including multiple collections (e.g., $U$, $V$, and $W$) within a Cartesian product expression (e.g., $U \times V \times W$), the collections are called components of the Cartesian product.
  • When an output evaluates to an iterable, it is sometimes immediately consumed and turned into a list using the list function so that it can be reused throughout an example multiple times. In practice, this may not be necessary and may even be unnecessarily expensive in terms of both memory utilization and running time. This distinction is noted explicitly where applicable.
  • A single-letter variable x usually refers to individual elements in a collection, a variable such as xs usually refers to collections of elements, and a variable such as xss usually refers to a collection of collections.

Cartesian Products

One of the simplest scenarios is one in which it is necessary to generate every combination of elements from two or more collections, where each combination has one component from each collection. This corresponds to the Cartesian product of the collections. Python's comprehension syntax is a powerful language feature that, under the right circumstances, provides a way to implement such operations involving collections in a way that is concise and closely resembles widely used and recognized forms of mathematical notation. The example below demonstrates how this syntax can be used to build a Cartesian product of two lists.

In [1]:
[(x, y) for x in [0, 1, 2] for y in ["a", "b"]]
Out[1]:
[(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

Comprehension syntax can become difficult to manage and read as the number of component collections involved increases. In such cases, it may make sense to build a function so that the repetitive aspects of building the definition are handled programmatically. It is demonstrated at the end of this section how such a function can be defined. But first, some examples of how it can be used are illustrated using the product function found in the built-in itertools library.

In [2]:
from itertools import product
u = [0, 1, 2]
v = ["a", "b"]
list(product(u, v)) # Wrapped in `list` for display purposes.
Out[2]:
[(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

The product function takes any number of iterable arguments and creates an iterable result containing one tuple for every combination of elements from each of the arguments.

In [3]:
r = [False, True]
s = [0, 1, 2]
t = ['a', 'b', 'c', 'd']
p = list(product(r, s, t))
len(p) == 2*3*4 and len(p[0]) == 3
Out[3]:
True

Using the unpacking operator in conjunction with the list repetition operator (also available as a method), it is possible to concisely describe a Cartesian product of any number of instances of a finite collection. In the example below, a ten-dimensional discrete finite space is created (where each dimension is [0, 1, 2]).

In [4]:
s = [0, 1, 2]
p = list(product(*[s]*10))
len(p) == 3**10
Out[4]:
True

How much memory does an output of product consume? Because it is an iterable that generates data dynamically, it only takes about as much memory as the component collections provided as inputs to product.

In [5]:
import sys
s = [0, 1, 2, 3, 4, 5, 6, 7]
n = 10
p = product(*[s]*n)
(sys.getsizeof([s]*n), sys.getsizeof(p))
Out[5]:
(68, 72)

Now that it has been demonstrated how the function can be used, consider a recursive implementation of such a function. To understand how this can be accomplished, a concrete example may help. Suppose a Cartesian product p of two collections [False, True] and [0, 1] has already been built. How do you turn it into a Cartesian product of three collections? You can iterate over all combinations of elements from the third collection ["a", "b"] and from the Cartesian product p, concatenating each element-combination pair.

In [6]:
p = [(x, y) for x in [False, True] for y in [0, 1]]
q = [(z,) + t for z in ["a", "b"] for t in p]
q
Out[6]:
[('a', False, 0),
 ('a', False, 1),
 ('a', True, 0),
 ('a', True, 1),
 ('b', False, 0),
 ('b', False, 1),
 ('b', True, 0),
 ('b', True, 1)]

Below is a complete implementation of the function based on the above approach. Note that the base case is a collection containing a single tuple of length zero. The recursive case consists of a comprehension that prepends every element in the first collection to every tuple in the cartesian product of all the remaining collections.

In [7]:
def cart(xss):
    if len(xss) == 0:
        return [()]
    else:
        return [(x,) + ys for x in xss[0] for ys in cart(xss[1:])]

You can confirm that cart produces the same output as product. However, note that its result is generated in its entirety when the function is called.

In [8]:
c = cart([range(0, 100), range(0, 100)])
p = product(*[range(0, 100), range(0, 100)])
print(set(c) == set(p))
print(sys.getsizeof(c), sys.getsizeof(p))
True
43808 40

A variant of this function that uses memory more efficiently can be created by turning the original definition into a generator. One added benefit of this approach is that the component collections can now be generators themselves (and, thus, can potentially contain an unknown number or even infinitely many elements).

In [9]:
def cart(xss):
    if len(xss) == 0:
        yield ()
    else:
        for t in ((x,) + ys for x in xss[0] for ys in cart(xss[1:])):
            yield t

The generator variant of cart function is nearly identical to product; the only difference is the absence of argument unpacking.

In [10]:
list(cart([[0, 1, 2], ["a", "b"]]))
Out[10]:
[(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

Because this function is a generator, it consumes approximately as much memory as is needed to keep track of the two component collections.

In [11]:
c = cart([range(0, 1000), range(0, 1000)])
print(2 * sys.getsizeof(range(0, 1000)))
print(sys.getsizeof(c))
48
56

Power Sets

A closely related scenario is one in which it may be necessary to generate every subset of a finite set (also known as the power set). There are a number of approaches to building a power set. It is possible to use the Cartesian product as a building block by noting that every element in a set s of size len(s) is either absent (corresponding to False) or present (corresponding to True). Thus, you can first build a Cartesian product of len(s) instances of the set {False, True}.

In [12]:
s = {0, 1, 2}
p = list(product(*[[False, True]]*len(s)))
p
Out[12]:
[(False, False, False),
 (False, False, True),
 (False, True, False),
 (False, True, True),
 (True, False, False),
 (True, False, True),
 (True, True, False),
 (True, True, True)]

It is then possible to use the built-in zip function and a comprehension (which employs an if clause for filtering) to associate the boolean values that make up each of the tuples in the Cartesian product with the corresponding elements in the original set.

In [13]:
ss = [{x for (b, x) in zip(bs, s) if b} for bs in p]
print(len(ss) == 2**len(s))
ss
True
Out[13]:
[set(), {2}, {1}, {1, 2}, {0}, {0, 2}, {0, 1}, {0, 1, 2}]

Alternatively, the Python documentation provides a recipe for building power sets using the built-in itertools library.

In [14]:
from itertools import chain, combinations
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

The above variant yields a collection of tuples rather than sets, but otherwise produces the same collection of combinations.

In [15]:
list(powerset([0, 1, 2]))
Out[15]:
[(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]

The recursive definition below employs an approach that is nearly identical to that of the recursive definition of the Cartesian production function. In the recursive case, every subset of all the remaining elements (excluding the current first element) is included in the overall result. However, a second copy of each of these subsets is taken and then paired (as in the Cartesian product function) with the current first element. This accounts for all subsets that do include the first element and all subsets that do not include it.

In [16]:
def powerset(xs):
    if len(xs) == 0:
        return [tuple()]
    else:
        x = xs[0]
        yss = powerset(xs[1:])
        return yss + [(x,) + ys for ys in yss]

This approach yields the same results as the definition in the recipe, though it is worth noting that the exact implementation above leads to a different order. This distinction may be important in some applications (such as when performing a search for the largest subset that satisfies some criteria).

In [17]:
powerset([0, 1, 2])
Out[17]:
[(), (2,), (1,), (1, 2), (0,), (0, 2), (0, 1), (0, 1, 2)]

Can you apply the same technique used in the implementations of the cart function to turn the above recursive definition of powerset into a generator?

Further Reading

This article reviews how comprehensions, in concert with other built-in functions and Python features, can be used to assemble concise implementations of algorithms for generating common kinds of combinations of items from a collection. Many variants and relevant peripheral techniques can be found in a list of recipes found in the documentation for the built-in itertools library. This includes functions such as combinations, which can be used to obtain all subsets of a specific size. Other common set operations such as union and intersection are available as methods of the built-in set type, and the built-in collections library provides a variety of operations on collections (such as a class Counter for concisely implementing counting workflows). Popular third-party libraries such as SciPy contain specialized functions for common operations in combinatorics.