Marquette University, view of Wisconsin Avenue  

Module 22

Comprehension

Python supports elements of the "functional programming", style where all or almost all data manipulation works by defining and applying functions directly. An exact definition is difficult to give, since there are so many different approaches, but the current wikipedia article is quite good. Python however has reduced its functional programming footprint almost exclusively to comprehension, a very nifty way of defining lists and similar data structures.

List comprehension

A standard ploy in programming is to apply a function to all elements of a list (the map paradigm). We know already how to do it by creating a new list, walking through the original list, and using append to add the value of an original value to the new list. List comprehension at its simplest does this in a single expression. In

[x**2 for x in range(100)]

we generate a list of the first 100 squares. The first part x**2 is called the output expression. The for is a generator expression. xis the variable, which is taken from a list-like expression, in this case in range(100). By varying the output expression, we can already use list comprehension for the map paradigm.

An additional if condition can restrict the number of elements in the list-like expression. For example, to only take the square of even numbers, we say

[x**2 for x in range(100) if x%2 == 0]

By nesting comprehension, we can create powerful, but hard to read single-line construction of lists and similar data structures. As an example, here is a list of all composite numbers between 2 and 100

[i*j for i in range(2,51) for j in range(2,101) if i*j < 100]

In this expression, we simply generate all products of two numbers between 2 and 50 and add them to the list if the product is less than or equal to 100. If we execute this, we see that the resulting list contains many repetitions. This is clear, since a number like 24=2*2*2*3 appears as 2*12, 3*8, 4*6, 6*4, 8*3 and 12*2. We can avoid this by using set comprehension. What we do is we replace the square brackets of a list definition with the curly brackets of a set definition.

{i*j for i in range(2,51) for j in range(2,101) if i*j < 100}
We can now generate a list of all prime numbers between 1 and 100 as a set (or list) of all numbers not in the set of compound numbers.
{i for i in range(2,100) if i not in
{i*j for i in range(2,51) for j in range(2,51) if i*j < 100}}

This one line expression is elegant, but barely readable. A common maxime for good Python style is to keep the code readable, and this is an example where the code is not readable.

Dictionary Comprehension

What has been said about list / set comprehension applies as well to dictionaries. The output expression now has to be a key, value pair separated by colons. For example, if we have a dictionary and want to invert the role of keys and values, we can just define an inverted dictionary as

drev = {d[key]:key for key in d}

This works well, but if two keys have the same value, then the decision which key is selected as the new value for the old value as key is somewhat arbitrary, depending on the internal structure of the dictionary at run-time.

Filter and Map Paradigm

The map paradigm applies a function to all elements of the list, and we already have seen how this is done. The Filter paradigm selects elements from a list according to a filter, a function that evaluates to True or False. We can do so by using list comprehension or by using the function filter.