Python Descriptors Are Magical Creatures

It is clear to me that maintaining a popular helper is not worth the time nor the hassle. Without entering into details, this has become an unpaid job I dislike more and more - and I’ve been talking about it for far too long too. As such, this project is now unmaintained.

Arch Linux forum’s user Spyhawk. December 14th, 2017.

The Arch User Repository, AUR for short, is a community-driven repo that hosts every piece of software ever written by mankind. AUR packages are not officially supported by Arch and people have to rely on so-called AUR helpers to manage them.

For a long time the land of AUR helpers had a single undisputed king: pacaur, maintained by my boy Spyhawk. What was once an harmonious monarchy turned into a God-forsaken hell when, in a tragic December night, Spyhawk announced the end of an effort he had driven for over half a decade.

Having been lacking a serious contender for so long, people took pacaur for granted and forgot that managing AUR packages was a serious deal. There was no man brave enough to adopt the 2000 lines of bash left behind by Spyhawk and cheap copycats started appearing all over the place, but the bar was just set too high.

Being stunned by pacaurs death and overwhelmed by an endless array of AUR helpers I did the only reasonable thing: I wrote another one. That’s how aurepa, a mixture of AUR and arepa, a typical Colombian dish, started to take form. Writing aurepa has taken me to explore some of python’s darkest corners. Hidden in the shadows I found creatures so smoothly embedded in the python language that you don’t even realize they exist. Creatures like python descriptors.

Getters And Setters Should Be Illegal

Arch Linux packages have two version identifiers: the package version, something like 1.3.5, and a release number, a monotonically increasing integral that identifies different builds of the same version. The full version of a package is constructed by concatenating the package version with the package release: 1.3.5-2.

In my first design of aurepa this represented a small inconvenience, I had written a Package class that had two attributes: version and release.

>>> p.__class__
<class 'aurepa.Package'>
>>> p.version
'1.3.5'
>>> p.release
2

But I also wanted users of Package to have access to a full_version attribute. Maybe by adding this line to Package’s constructor?

self.full_version = self.version + '-' + self.release

This might not be the best idea given that it results in duplicated data: version and release exist twice. If one of them changes in between full_version will become outdated.

You may be having flashbacks of your professor saying something about getter methods. What about this?

def get_full_version(self):
    return self.version + '-' + str(self.release)

No. Hell no. Don’t even think about it. This is not Java.

This is inconsistent. This exposes implementation details. When exposing attributes via getters users are forced to make a distinction when accessing them: some are accessed directly with the very nice object.attribute notation, others divert to a method invocation. Even when you write a getter to protect access to some attribute, users are not forced to call it, as the protected underlying data is still accessible.

The bottom line is that your object holds a function when it should be holding a data member instead.

Wouldn’t it be nice if we could use the object.attribute syntax for full_version as well?

The One Word You Need To Remember: Property

Solving our problem is actually horribly simple. All you need is the @property decorator:

@property
def full_version(self):
    return self.version + '-' + str(self.release)

That’s it! Python’s built-in property decorator creates a read-only attribute from a function that computes it, i.e. a getter. Now we can do p.full_version just like with any other any other attribute, even if it is calculated dynamically.

>>> p.full_version
'1.3.5-2'

There is no duplicated data: version and release exist in a single place. There is no inconsistency. Users have transparent access to your attributes.

I could call it a night; my problem is solved. Maybe you should, too… Unless you are feeling adventurous, unless you don’t like to take things for granted, unless you posses the same lunatic love for python decorators as I do.

How does @property actually work?

A Journey Into Python’s Internals

By using @property, python is somehow able to transform a getter function into a data attribute. But how?

First of all let’s recall that writing @property on top of func is equivalent to func = property(func). But what is property? a class. Property is a class and we are invoking its constructor:

class property(object):
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

So when we write:

@property
def full_version(self):

We are creating an object of class property by passing the function full_version as the first parameter to its constructor. Remember that functions in python can be passed around like any other object, because they are, well, objects.

full_version = property(full_version)

So we replaced full_version in the Package class by an instance of property. This means that when users access package.full_version they should get back an object of type property, right?

>>> p.full_version
'1.3.5-2'
>>> type(p.full_version)
<class 'str'>

When full_version is accessed the getter method we decorated is somehow being executed behind the scenes. full_version doesn’t seem to be of type property after all, even thought we just replaced it by one! The magic behind this lies in the definition of the methods __get__, __set__, and __del__ in the property class. These methods implement an interface, better said a protocol, python’s data descriptor protocol.

Python’s Descriptor Protocol

Studying the implementation of property created more questions than answers. We have found out that decorating a getter method with @property replaces it by an instance of the property class. This new object stores the decorated getter function, self.fget = fget, and executes it inside __get__(): return relf.fget(obj).

However, when we access the property object we don’t get the object itself, but rather the result of its __get__ method. But why is __get__ being executed? The answer is simple: magic. That’s just how it is, if an object declares a method called __get__, python won’t return the object itself when accessed, but the result of its __get__.

Objects that define either of these methods:

__get__(self, obj, type=None)
__set__(self, obj, value)
__del__(self, obj)

Are called descriptors. If an objects implements all three, like property, it’s called a data descritor. If only __get__ is present, the object is called a non-data descriptor.

Descriptors can define custom behavior when they are being accessed, modified or deleted. In other words, they override python’s default attribute access. When the interpreter encounters o.x it looks up the __dict__ of o for an entry named x. If x defines a __get__, then o.x is translated to type(o).__dict__['x'].__get__(o, type(o)). If not, the default o.__dict__['x'] is returned.

A similar magic happens when the attribute is being modified, e.g. o.x = 5, or deleted del x. __set__ and __del__ will be called, respectively, instead of performing default behavior.

Note that data descriptors have higher precedence than instance variables in __dict__. If you define a class that contains a data descriptor, e.g. a property, they will opaque all other instance variables with the same name, since a.property = 'some_new_value' will execute a.property.__set__ instead of replacing it. Hence, data descriptors can’t be overridden in objects. Non-data descriptors, on the other hand, don’t define a __set__ and can therefore be reassigned in the object’s __dict__.

I discovered that a lot of python’s functionality relies in the descriptor protocol: @staticmethod, @classmethod are implemented as non-data descriptors. Even normal method invocation is!

Property Objects Are Data Descriptors

The implementation of property now becomes clear: it replaces the decorated function by a data descriptor. When the attribute is accessed __get__ is called which in turn calls fget, the decorated function. Properties are just wrappers around our function that implement the data descriptor protocol.

If you pay close attention you will notice that property also implements __delete__ and __set__. This is important because the definition of these methods makes property objects data descriptors. Key in this implementation is that both __delete__ and __set__ raise an AttributeError per default when executed. This is great! I don’t want users to alter the version of my package or delete it, it should be a read-only attribute! Turns out you can actually achieve proper data encapsulation in python after all.

Writable Data Attributes

It is sometimes useful to have modifiable properties that define custom behavior when users call del obj.property or obj.property = some_new-value. You can achieve this by passing the corresponding setter and deleter functions as arguments when constructing the property object. The new object will store them and execute them when the attribute is written or deleted, i.e. from __set__ and __delete__, respectively.

class Computer():
	def __init__(self):
		self._operating_system = None

	def get_operating_system(self):
		return self._operating_system

	def set_operating_system(self, os):
		if 'Linux' not in os:
			raise AttributeError('You should use Linux you fool!')
		self._operating_system = os
		print('{} has been installed!'.format(os))

	operating_system = property(get_operating_system, set_operating_system)
>>> laptop = Computer()
>>> laptop.operating_system = 'Windows 10'
...
AttributeError: You should use Linux you fool!
>>> laptop.operating_system = 'Arch Linux'
Arch Linux has been installed!

This works but it is a rather rustic solution that forces use to use a class attribute. Wouldn’t it be nice if we could use decorators? Unfortunately, you can’t create a modifiable attribute by using @property alone, as per definition only the first argument will be passed when instantiating property.

The class Property, however, defines convenience functions for setters and deleters as well, so that you can use them as decorators:

class Property():

	def getter(self, fget):
		return type(self)(fget, self.fset, self.fdel, self.__doc__)

	def setter(self, fset):
		return type(self)(self.fget, fset, self.fdel, self.__doc__)

	def deleter(self, fdel):
		return type(self)(self.fget, self.fset, fdel, self.__doc__)

This let’s you do the following:

class Computer():
	def __init__(self):
		self._operating_system = None

	@property
	def operating_system(self):
	    return self._operating_system

	@operating_system.setter
	def operating_system(self, os):
		if 'Linux' not in os:
			raise AttributeError('You should use Linux you fool!')
		self._operating_system = os
		print('{} has been installed!'.format(os))

This took me a while to understand but once you see it’s fairly simple. The functions setter, getter and deleter are methods of the property class but also decorators. But what do they decorate? They decorate self, i.e. the property object itself they belong to!

The setter decorator, for example, receives the decorated fset as argument and constructs a new property object by passing it to its constructor. Note, however, that existing functions in self (fget or del) are also propagated to the new object. This is exactly what decorators are all about, they receive some object, add some stuff to it (in this case fset) and return it.

aurepa might not be the successor of pacaur and I might not become the man that Spyhawk once was, but at least this journey has given me, and you, a glimpse of an obscure python feature I didn’t know existed but use every day. Maybe that cold December night wasn’t that tragic after all, like some ying and yang kind of thing.


© 2023. All rights reserved.

Powered by Hydejack v9.1.6