Static Checking via Metaclasses

Suppose you are maintaining a domain-specific machine learning library. Users of the library's API expect that every machine learning algorithm offered by the API will have the same interface (i.e., the same methods with the same signatures) regardless of its underlying implementation. You would like to allow a community of contributors to define new algorithms that can be added to the library, but you would like to reduce your own effort and that of contributors when it comes to validating that a new algorithm conforms to the API.

Python metaclasses are the underlying, higher-order constructs that instantiate class definitions. Understanding what metaclasses are and how they can be used gives you a significant amount of control over what happens when a new class is introduced by users. This in turn allows you to constrain users when necessary and to provide assistance to users that can save them time and effort.

How Classes are Made

In Python, functions, classes, objects, and values are all on an equal footing. One consequence of this is that it is possible to pass any of these entities as arguments to functions and to return any of these entities as the result of a function (this fact was discussed in another article that covered Python decorators). But this also means that much of the syntax you normally use is actually just syntactic sugar for function calls.

What happens when the Python interpreter executes a class definition such as the one below?

In [1]:
class Document():
    def __init__(self):
        self.is_document = True

The class (not an instance or object of that class, but the class itself) is created and assigned to a variable that is in the scope. In the example above, that variable is Document.

In [2]:
Document
Out[2]:
__main__.Document

Python's built-in type function actually serves a number of purposes beyond determining the type of a value. Given a few additional parameters, the type function can be used to define a new class. Executing the statement in the example below is equivalent to executing the class definition for Document above.

In [3]:
def __init__(self):
    self.is_document = True

Document = type('Document', (), {'__init__': __init__})

Now that Document is a class, it is possible to create objects of this class.

In [4]:
d = Document()
d.is_document
Out[4]:
True

How Metaclasses are Made

In a manner similar to that of many programmaing languages that support the object-oriented programming paradigm, Python allows programmers to define derived classes that inherit the attributes and methods of a base class. The example below illustrates this by defining a class Passport that is derived from the Document class. Notice that the base class constructor Document is specified in the class definition.

In [5]:
class Passport(Document):
    pass

The Passport class inherits the attributes of the Document class. The example below illustrates that it inherits the __init__ method of the Document class.

In [6]:
p = Passport()
p.is_document
Out[6]:
True

The example in which Document was defined using the built-in type function suggests that type can be viewed (at least as a loose analogy) as a means for creating classes. In a way, it behaves like a constructor for the "class of all possible classes". Thus, if type is a kind of constructor for a class, it should be possible to use it in the same context as any other class constructor. But what should this mean? What is MetaClass in the example below?

In [7]:
class MetaClass(type):
    pass

Following the analogy to its logical conclusion, this must mean that MetaClass has inherited the capabilities of type. And, indeed, it has. In the example below, MetaClass is used to define a new class in the same way that type was used before.

In [8]:
Document = MetaClass('Document', (), {'__init__': __init__})
d = Document()
d.is_document
Out[8]:
True

The ability to use a metaclass in place of type as in the above example is also supported by the more common class syntactic construct.

In [9]:
class Document(metaclass=MetaClass):
    def __init__(self):
        self.is_document = True

Using Metaclasses to Enforce an API

Returning to the motivating example from the first paragraph, suppose you introduce a metaclass called MetaAlgorithm for machine learning algorithms that is derived from type. This metaclass definition can override the method __new__ that is normally invoked when a new class is defined using type (or using the equivalent class syntactic construct). This alternate definition of __new__ performs some additional checks before the class is actually created. In this use case, that additional work involves validating that the class being defined (corresponding to a new machine learning algorithm) conforms to your API.

In [10]:
from types import FunctionType

class MetaAlgorithm(type):
    def __new__(cls, clsname, bases, attrs):
        # The base class does not need to conform to the API.
        # See the paragraph below for an explanation of this check.
        if clsname != 'Algorithm':

            # Check that the programmer-defined
            # class has a contributor string.
            if 'contributor' not in attrs or\
               not isinstance(attrs['contributor'], str):
                raise RuntimeError('missing contributor')

            # Check that the programmer-defined class has the
            # methods required for your API.
            if 'train' not in attrs or\
               not isinstance(attrs['train'], FunctionType):
                raise RuntimeError('missing training method')

            if 'classify' not in attrs or\
               not isinstance(attrs['classify'], FunctionType):
                raise RuntimeError('missing classification method')

        return\
            super(MetaAlgorithm, cls)\
            .__new__(cls, clsname, bases, attrs)

Now that there is a way to define new classes, there are two ways to proceed. One approach is to require that all algorithm classes that contributors implement must include the metaclass=MetaAlgorithm parameter in the class definition. However, this is easy for a contributor to forget and also may require that contributors have a solid understanding of metaclasses. An alternative is to create a base class from which all contributed algorithm classes must be derived.

In [11]:
class Algorithm(metaclass=MetaAlgorithm):
    pass

Using this approach, it is sufficient to export the Algorithm base class and to inform all contributors that their classes must be derived from this base class. The example below illustrates how a contributor might do so for a very basic algorithm.

In [12]:
class Guess(Algorithm):
    contributor = "Author"

    def train(items, labels):
        pass
    
    def classify(item):
        import random
        return random.choice([True, False])

As the example below illustrates, an attempt by a user to define a class that does not conform to the API results in an error.

In [13]:
try:
    class Guess(Algorithm):
        def classify(item):
            return False
except RuntimeError as error:
    print("RuntimeError:", str(error))
RuntimeError: missing contributor

To emphasize: the error above occurs when the Python interpreter tries to execute the definition of the class, and not when an object of the class is created. It would be impossible to reach the point at which the interpreter attempts to create an object of this class because the class itself can never be defined.

Despite the fact that Python does not technically support static checking beyond ensuring that the syntax of a module is correct, it is arguably justifiable to say that what MetaAlgorithm does is a form of static checking. In many routine scenarios, the checks would be performed at the time that module is imported and before any other code has had a chance to run.

Further Reading

This article reviewed how user-defined classes are defined in Python and how the mechanism for creating classes can itself be customized. The built-in types library provides a number of additional methods that can assist in the dynamic creation of new types and classes. The motivating example in this article illustrated how these capabilities can be used to perform a form of static analysis of user-defined classes.

It is worth noting that the approach presented in this article is compatible with the methods for checking type annotations and unit testing functions presented in the article on type annotations. For example, it would be straightforward to require that the training and classification method definitions include annotations specifying the exact types of the data that they can handle. It would even be possible to test the methods by generating test cases having the appropriate types. Another observation is that the motivating use case in this article can also be solved by using techniques presented in other articles, such as by applying decorators to class definitions or by performing a static analysis of the class definition itself.