The impressive term partial function application stands for a programming technique that is not nearly as complicated as it sounds. In short, it refers to creating a new function from a given function by fixing one or more of its arguments.
Why write a tutorial about such a simple feature? The actual technique (the what and how) can be explained in a few words, so I'll go on to reflect on functional programming style in Python and on built-in functions (the why) that are old hat to seasoned Python coders. I'll also describe a real-world example (the where to use) in a rather free-form way. There's little working code in this tutorial. For those still exploring the vast Python-lands, this tutorial might offer some insight into where functional thinking and the titular partial function application can be useful, even though they seem rather superfluous at first.
The Good Old Log Example
To give an example I already used in a previous article of mine (shameless plug: Closures in Python), imagine we want to have a number of logging functions like log_info(), log_warning(), log_error() and so on. These simply print any message passed to them on the screen, but precede each message with a logging level like “info”, “warning” or “error”. There are a number of possible ways to construct them without code repetition. Two of them – using a class and using closures – are presented in the aforementioned article. Here is a third version:
from functools import partial
def log_template(level, message):
    print("{}: {}".format(level, message))
log_info = partial(log_template, "info")
log_warning = partial(log_template, "warning")
log_error = partial(log_template, "error")
Let's test it on the Python command line interpreter:
>>> log_info("test test test 1 2 3")
info: test test test 1 2 3
The usage of the partial function is as intuitive as the example suggests: Simply pass a function and then a number of arguments. The partial function will then return a new function. This new function behaves like the old function with some of its parameters fixed to the arguments that were passed. The fixing happens from left to right, so in our example, the level parameter was fixed, but message was not, because we only passed one argument. If we wanted, we could have written
log_info = partial(log_template, level="info")
as the partial function also takes keyword arguments. That way we can fix any parameter we want (and likewise leave any parameter free, even if it's to the left of a fixed parameter).
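A quick sketch of what fixing by keyword looks like in practice, reusing log_template from above (the names log_hello and the messages are invented for this illustration, and the caveat in the comments is worth knowing):

from functools import partial

def log_template(level, message):
    print("{}: {}".format(level, message))

# leave 'level' free although it comes before the fixed 'message'
log_hello = partial(log_template, message="hello")
log_hello("info")            # prints: info: hello

# when the *first* parameter is fixed by keyword, the remaining one
# must also be passed by keyword: a positional argument would bind
# to 'level' a second time and raise a TypeError
log_info = partial(log_template, level="info")
log_info(message="this works")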
The official documentation for this feature resides in the Python documentation. There's even a pure-Python implementation of the partial function showcased there, which behaves almost like the real version (but not quite, as we will see).
While all of this is nice and fun, it may leave you, dear reader, wondering
Why Would We Ever Need That?
As I said before, there are at least two other methods available for producing the same result as above. Both could easily be considered superior, because with them we don't need to import a module. Also, one might claim that with a little more typing, we could have fixed the level parameter just as well with a wrapper function:
def log_template(level, message):
    print("{}: {}".format(level, message))

def log_info(message):
    return log_template("info", message)

def log_warning(message):
    return log_template("warning", message)

def log_error(message):
    return log_template("error", message)
Note that the return is not really needed, because nothing (None) is returned anyway, but this mimics the behavior of the partial function better. A version with lambda expressions allows for less clutter, like so:
log_info = lambda message: log_template("info", message)
Is there a difference compared to using a partial object? Indeed there is, albeit a seemingly small one.
Examining Stack Traces
Let's consider a rather impractical function throw that takes one argument (that will be fixed later). This function corresponds to the log_template function above. However, all it does is raise an exception:
def throw(unused_arg):
    raise Exception
Not very useful, is it? When we execute it, we see
>>> throw(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in throw
Exception
And that is the whole purpose of this exercise: To see a stack trace. At the moment the exception is raised, there are two frames on our execution stack: One from the command line environment, and one from the throw function. This is very much expected and shouldn't surprise anybody.
Let's create a wrapper function that supplies our (unused) argument. This, too, has no purpose other than to show a stack trace. It corresponds to the wrapper versions of log_info and friends.
# assume 'throw' is defined as above
def throw_wrapper():
    return throw(True)
Let's watch it do nothing and fail:
>>> throw_wrapper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in throw_wrapper
  File "<stdin>", line 2, in throw
Exception
Again, that is very much as we anticipated. Apart from the command line stack frame, there are two functions on the execution stack: our throw function and the wrapper function throw_wrapper. Now we construct a partial object with the partial function:
from functools import partial
# again, assume that 'throw' is defined
throw_partial = partial(throw, True)
and watch it blow up mindlessly:
>>> throw_partial()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in throw
Exception
This is interesting – we have just one function on the stack: throw, even though we called the function throw_partial. And I'm going to give this away without torturing you with more dummy code: The showcase version from the Python documentation does not behave like the actual partial function either (and the documentation doesn't claim that it does; it's just there to explain the principle). It, too, shows two function frames on the execution stack.
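To see why, here is a simplified closure-based stand-in along the lines of the documentation's pure-Python version (a sketch, not the exact code from the docs). Since newfunc is an ordinary Python function, calling it adds a frame of its own:

def my_partial(func, *args, **keywords):
    # a simplified, closure-based stand-in for functools.partial
    def newfunc(*fargs, **fkeywords):
        newkeywords = {**keywords, **fkeywords}
        return func(*args, *fargs, **newkeywords)
    return newfunc

throw_my_partial = my_partial(throw, True)
# calling throw_my_partial() puts two frames on the stack besides the
# command line: one for newfunc and one for throw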
That example also demonstrates another (mis-)feature: The partialized (is that even a word?) function shows up in the trace under the same name as the old function, although it is obviously different. If we ever expose such a function as part of our API, we should remember to overwrite the __doc__ attribute of the new function to reflect its new signature and behavior. Note that I normally don't use docstrings in my tutorials because I want to keep the code as short as possible, but for any production code it is very important to document functions, at the very least any kind of public API.
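Overwriting the attribute is straightforward, since partial objects can carry attributes (the docstring text here is just an example, and log_template is assumed from the beginning of the article):

from functools import partial

log_info = partial(log_template, "info")
log_info.__doc__ = "Print a message to the screen, prefixed with 'info: '."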
So there we have one very practical advantage of using the partial function from functools over our own wrapper: We avoid constructing (and destructing) another stack frame, which is a costly thing to do if done very often (read: It's slow). Now we all know that "premature optimization is the root of all evil", but using the partial function is not much of an effort, really. If the partial version is no more complicated than a wrapper, we might as well use it.
The question remains: Are there real-world problems where partial function application is really more convenient than other techniques? Well, I do happen to use it in some of my projects, so such problems apparently do exist. Just recently I stumbled upon one, and even though the decision to go with partial objects was due more to the deficiencies of the multiprocessing module than to anything else, it was a good opportunity to think about basic functional programming in Python once more.
Real-World Example: Crunching Scientific Data
Consider a case I encountered while peering over the shoulder of a friend of mine (a physicist): He had a file with a lot of measurements from some kind of experiment (creating black holes, no doubt). These were to be read, processed with some kind of expensive calculation, and the results written to disk again. The problem was: It was painfully slow. On a modern computer it took several minutes to do the processing. We reworked the code and brought the time down by an order of magnitude with several tricks. One of them was to use multiple processes to split the calculation across the many cores of the machine.
We don't need to bother thinking about the reading and writing, but let's assume the calculation function looked something like this:
def calculate(value):
    # do some expensive calculations
    # involving some physical parameters
    # that are global variables in this script
    return result
This is a very simplified view, of course. The value was a single measurement (out of thousands) from the file, while the mentioned physical parameters were global variables in the script. The parameters were not hard-coded, because they were set from the meta-information in the header of the mentioned file.
We all know global variables are bad. To defend the physicist in question, the script was actually pretty sane and the use of global variables was somewhat justified (I won't go into detail here).
As mentioned, we decided that multi-processing was the way to go to speed this up (multi-threading was a no-go because of Python's Global Interpreter Lock (GIL)). There are two modules in Python's standard library that offer a process pool: concurrent.futures and multiprocessing. concurrent.futures is a great module built around the concept of futures, but we decided to go with multiprocessing because it's simpler and does all that we needed.
multiprocessing has a nice Pool class, whose objects have a map method that behaves just like the built-in map function, but spreads the execution across all available processors. Now what does map do? Let's have a small excursion into the basics of functional programming in Python and examine
The map Function And Other Functional Goodies
One way to look at our problem at hand is: We have a list of values (list_1), and from that we want to construct another list that contains new values (list_2). The new values are not made up from thin air, but we derive them from the old values. This derivation can be expressed as a function (func) that takes an old value and outputs a new one. So we have this:
list_1 = [x1, x2, x3, ... , xn]
And want to get this:
list_2 = [func(x1), func(x2), func(x3), ... , func(xn)]
A Python-ish implementation to get list_2 from list_1 could be:
# not real executable code!
# func takes exactly one argument and returns one
list_1 = [x1, x2, x3, ..., xn]
list_2 = []
for value in list_1:
    list_2.append(func(value))
There's an even nicer syntax in Python to do exactly that: We could use a list comprehension:
# not real executable code!
list_1 = [x1, x2, x3, ..., xn]
list_2 = [func(value) for value in list_1]
If the function body is just one expression, we could have simply used that expression in the list comprehension instead of the function call.
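For instance, if func merely squared its input (a made-up example for illustration), the two forms would look like this:

list_1 = [1, 2, 3, 4]

def func(value):
    return value ** 2

# both lines produce [1, 4, 9, 16]
list_2 = [func(value) for value in list_1]
list_2 = [value ** 2 for value in list_1]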
There is a third way to construct list_2, and that is with the already mentioned map function:
# not real executable code!
list_1 = [x1, x2, x3, ..., xn]
list_2 = map(func, list_1)
All of these examples do the same thing (almost: In Python 3, the result of map is not actually a list, but an iterator). And if you ask me: I like the last version best. But if you are relatively new to programming or only used to procedural-style code, the last line might seem odd to you: Are we really passing the function itself? Yes, we are. Python allows us to do that.
Unlike in the other two examples, in the map example we never see the function func being called – the calling happens somewhere inside the map function. So we don't call the function func with a value, we pass it to the map function together with a list of all values.
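To make that concrete, here is a runnable version with a made-up function double:

def double(x):
    return 2 * x

# note: we pass 'double' itself, we don't write double(...)
result = map(double, [1, 2, 3])
print(list(result))   # [2, 4, 6]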
The map function doesn't only take lists: It takes e.g. range objects:
result = map(func, range(10))
it takes generator expressions:
squares = (x**2 for x in range(10))
result = map(func, squares)
in short: It takes everything that can be iterated over.
Both list comprehensions (together with their siblings, dict and set comprehensions, as well as their cousin, generator expressions) and map fall into the general category of functional programming. In this programming paradigm, we don't see a program so much as a series of commands being executed one after another, but rather as a composition of functions and expressions. Python, while not a full-fledged functional language, offers quite a few practical constructs that help with a functional line of thinking. Aside from map, there's filter, the ability to create closures, decorators, and all the immensely useful tools in the itertools module. A characteristic trait of functional programming is the lack of explicit loops.
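As a small taste of this style (the predicate and numbers are invented for this example), note the complete absence of an explicit loop in the following:

from itertools import islice

# lazily select the even numbers, then take the first five of them
evens = filter(lambda x: x % 2 == 0, range(100))
print(list(islice(evens, 5)))   # [0, 2, 4, 6, 8]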
Now that we have acquired a tiny overview over the functional view on programming, here is
The Whole Point: How partial Fits In
As you may have noticed, the function passed to map is fed only one value from the list at a time. In other words, you may only use functions that take exactly one argument. If you have a function that takes more – well, bad luck. If only there were a way to fix all function parameters besides one … OK, no more drama: You have observed the awesome power of partial to fix parameters, and so it's a perfect fit for map, filter, functools.reduce and all the other juicy functional bits Python has to offer. While we could use wrappers or other ways to reduce the number of function parameters to one, partial is so nice because it goes well with a functional style of coding.
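Here is a tiny illustration of the combination (the function scale and its arguments are invented for this example):

from functools import partial

def scale(factor, value):
    return factor * value

# scale takes two arguments and can't be passed to map directly;
# fixing 'factor' with partial leaves exactly one free parameter
doubled = map(partial(scale, 2), [1, 2, 3])
print(list(doubled))   # [2, 4, 6]

With that trick up our sleeves, let's go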
Back To Our Physicist's Program
At first glance, the calculate function above is a good match for the map method of multiprocessing.Pool. It takes one value, returns one, all is well. Or is it? Notice how I subtly hinted at the existence of physical parameters, implemented as global variables, that are used within the calculate function. As it turned out, we had to refactor calculate to not depend on global state so that we could use it with Pool.map. That meant making the physical parameters arguments to our function, which then looked like this:
def calculate(value, parameter1, parameter2, parameter3):
    ...
Now it didn't depend on global state any more (which was good), but it still couldn't be used with map because it had multiple parameters (which was bad). My first reaction was to write a closure, but the multiprocessing module has quite a few peculiarities, and, as it turned out, not accepting closures as functions for Pool.map is one of them. So I remembered the partial function: we fixed the parameters to some values, used the Pool.map method and gained a 10x speed-up. True story.
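Put together, the final structure looked roughly like this. This is only a sketch: the parameter values, the measurements and the body of calculate are invented stand-ins, not the actual script:

from functools import partial
from multiprocessing import Pool

def calculate(value, parameter1, parameter2, parameter3):
    # stand-in for the expensive physics calculation
    return value * parameter1 + parameter2 - parameter3

if __name__ == "__main__":   # the guard is required for multiprocessing
    measurements = [0.5, 1.2, 3.8, 2.7]   # in reality: read from the file
    # fix everything except 'value'; fixing by keyword keeps the
    # leftmost parameter free, as discussed at the beginning
    worker = partial(calculate, parameter1=1.5, parameter2=0.3, parameter3=0.1)
    with Pool() as pool:
        results = pool.map(worker, measurements)
    print(results)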
Wrapping Up
I have used this real-world example because it is exemplary of almost all uses of partial in my code: Sometimes a functional approach matches the problem at hand best. That is often the case for very mathematical or otherwise algorithmic code that can't convincingly be represented by class-based OO.
While this tutorial is a bit thin on content, I hope to have shed some light on the contexts where partial function application is useful in day-to-day code. On the theoretical side, there's the related subject of currying (wikipedia/currying) that you might find interesting as a follow-on. On the practical side, I encourage you to try some of the functional tools Python offers, if you haven't done so already.
I hope you enjoyed the article as much as I enjoyed writing it.
