Suppose you are maintaining a domain-specific machine learning library. Users of the library's API expect that every machine learning algorithm offered by the API will have the same interface (i.e., the same methods with the same signatures) regardless of its underlying implementation. You would like to allow a community of contributors to define new algorithms that can be added to the library, but you would like to reduce your own effort and that of contributors when it comes to validating that a new algorithm conforms to the API.
Python metaclasses are the underlying, higher-order constructs that instantiate class definitions. Understanding what metaclasses are and how they can be used gives you a significant amount of control over what happens when a new class is introduced by users. This in turn allows you to constrain users when necessary and to provide assistance to users that can save them time and effort.
In Python, functions, classes, objects, and values are all on an equal footing. One consequence of this is that it is possible to pass any of these entities as arguments to functions and to return any of these entities as the result of a function (this fact was discussed in another article that covered Python decorators). But this also means that much of the syntax you normally use is actually just syntactic sugar for function calls.
What happens when the Python interpreter executes a class definition such as the one below?
class Document():
    def __init__(self):
        self.is_document = True
The class (not an instance or object of that class, but the class itself) is created and assigned to a variable in the current scope. In the example above, that variable is Document.
Document
Python's built-in type function actually serves a number of purposes beyond determining the type of a value. Given a few additional parameters, the type function can be used to define a new class. Executing the statements in the example below is equivalent to executing the class definition for Document above.
def __init__(self):
    self.is_document = True

Document = type('Document', (), {'__init__': __init__})
Now that Document is a class, it is possible to create objects of this class.
d = Document()
d.is_document
As in many programming languages that support the object-oriented programming paradigm, Python allows programmers to define derived classes that inherit the attributes and methods of a base class. The example below illustrates this by defining a class Passport that is derived from the Document class. Notice that the base class constructor Document is specified in the class definition.
class Passport(Document):
    pass
The Passport class inherits the attributes of the Document class. The example below illustrates that it inherits the __init__ method of the Document class.
p = Passport()
p.is_document
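The equivalence between the class statement and a call to type extends to derived classes: the second argument to type is the tuple of base classes. The sketch below is self-contained, so it defines Document again before deriving Passport from it.

```python
# Define the base class by calling type directly.
def __init__(self):
    self.is_document = True

Document = type('Document', (), {'__init__': __init__})

# Equivalent to:
#     class Passport(Document):
#         pass
Passport = type('Passport', (Document,), {})

p = Passport()
print(p.is_document)                   # True
print(issubclass(Passport, Document))  # True
print(Passport.__bases__)              # a one-element tuple holding Document
```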
The example in which Document was defined using the built-in type function suggests that type can be viewed (at least as a loose analogy) as a means for creating classes. In a way, it behaves like a constructor for the "class of all possible classes". Thus, if type is a kind of constructor for a class, it should be possible to use it in the same context as any other class constructor. But what should this mean? What is MetaClass in the example below?
class MetaClass(type):
    pass
Following the analogy to its logical conclusion, this must mean that MetaClass has inherited the capabilities of type. And, indeed, it has. In the example below, MetaClass is used to define a new class in the same way that type was used before.
Document = MetaClass('Document', (), {'__init__': __init__})
d = Document()
d.is_document
The ability to use a metaclass in place of type as in the above example is also supported by the more common class syntactic construct.
class Document(metaclass=MetaClass):
    def __init__(self):
        self.is_document = True
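One way to confirm which metaclass created a class is to apply type to the class itself. The sketch below is self-contained; the Plain class is a hypothetical addition included only for comparison with a class that uses the default metaclass.

```python
class MetaClass(type):
    pass

class Document(metaclass=MetaClass):
    def __init__(self):
        self.is_document = True

# Hypothetical class with no explicit metaclass, for comparison.
class Plain:
    pass

print(type(Document) is MetaClass)    # True: MetaClass created Document
print(type(Plain) is type)            # True: type is the default metaclass
print(isinstance(Document, type))     # True: MetaClass is derived from type
print(type(Document()) is Document)   # True: instances are unaffected
```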
Returning to the motivating example from the first paragraph, suppose you introduce a metaclass called MetaAlgorithm for machine learning algorithms that is derived from type. This metaclass definition can override the method __new__ that is normally invoked when a new class is defined using type (or using the equivalent class syntactic construct). This alternate definition of __new__ performs some additional checks before the class is actually created. In this use case, that additional work involves validating that the class being defined (corresponding to a new machine learning algorithm) conforms to your API.
from types import FunctionType

class MetaAlgorithm(type):
    def __new__(cls, clsname, bases, attrs):
        # The base class does not need to conform to the API.
        # See the paragraph below for an explanation of this check.
        if clsname != 'Algorithm':
            # Check that the programmer-defined
            # class has a contributor string.
            if 'contributor' not in attrs or\
               not isinstance(attrs['contributor'], str):
                raise RuntimeError('missing contributor')

            # Check that the programmer-defined class has the
            # methods required for your API.
            if 'train' not in attrs or\
               not isinstance(attrs['train'], FunctionType):
                raise RuntimeError('missing training method')

            if 'classify' not in attrs or\
               not isinstance(attrs['classify'], FunctionType):
                raise RuntimeError('missing classification method')

        # Delegate the actual creation of the class to type.
        return super().__new__(cls, clsname, bases, attrs)
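It can help to see exactly what the attrs argument contains when __new__ is invoked. The sketch below uses a hypothetical metaclass called MetaTrace (along with hypothetical Parent and Child classes) to record the names found in attrs. It suggests that attrs holds only the names defined in the body of the class being created, and not names inherited from base classes. One consequence is that a check of the form 'train' not in attrs would reject a class that merely inherits its train method.

```python
class MetaTrace(type):
    # Hypothetical metaclass that records, for each class it creates,
    # the non-dunder names that appear in the attrs argument.
    calls = {}

    def __new__(cls, clsname, bases, attrs):
        MetaTrace.calls[clsname] = sorted(
            name for name in attrs if not name.startswith('__')
        )
        return super().__new__(cls, clsname, bases, attrs)

class Parent(metaclass=MetaTrace):
    contributor = "Author"

    def train(self, items, labels):
        pass

class Child(Parent):
    def classify(self, item):
        return False

print(MetaTrace.calls['Parent'])  # ['contributor', 'train']
print(MetaTrace.calls['Child'])   # ['classify']
print(hasattr(Child, 'train'))    # True: inherited, but absent from attrs
```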
Now that there is a way to define new classes, there are two ways to proceed. One approach is to require that all algorithm classes that contributors implement must include the metaclass=MetaAlgorithm parameter in the class definition. However, this is easy for a contributor to forget and also may require that contributors have a solid understanding of metaclasses. An alternative is to create a base class from which all contributed algorithm classes must be derived.
class Algorithm(metaclass=MetaAlgorithm):
    pass
Using this approach, it is sufficient to export the Algorithm base class and to inform all contributors that their classes must be derived from this base class. The example below illustrates how a contributor might do so for a very basic algorithm.
import random

class Guess(Algorithm):
    contributor = "Author"

    # Instance methods take self as their first parameter.
    def train(self, items, labels):
        pass

    def classify(self, item):
        return random.choice([True, False])
As the example below illustrates, an attempt by a user to define a class that does not conform to the API results in an error.
try:
    # This definition lacks a contributor string and a train method.
    class Guess(Algorithm):
        def classify(self, item):
            return False
except RuntimeError as error:
    print("RuntimeError:", str(error))
To emphasize: the error above occurs when the Python interpreter tries to execute the definition of the class, and not when an object of the class is created. It would be impossible to reach the point at which the interpreter attempts to create an object of this class because the class itself can never be defined.
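The timing can be observed directly. The sketch below is self-contained and uses a hypothetical, simplified stand-in for MetaAlgorithm called MetaChecked. As an assumption made for brevity, it skips the check for any class that has no base classes, rather than comparing the class name as MetaAlgorithm does. The body of the rejected class is executed, but the class itself is never created and its name is never bound.

```python
class MetaChecked(type):
    # Hypothetical, simplified stand-in for MetaAlgorithm. Classes with
    # no base classes are exempt from the check (an assumption of this
    # sketch, not the name-based check used by MetaAlgorithm).
    def __new__(cls, clsname, bases, attrs):
        if bases and 'classify' not in attrs:
            raise RuntimeError('missing classification method')
        return super().__new__(cls, clsname, bases, attrs)

class Base(metaclass=MetaChecked):
    pass

events = []
try:
    class Broken(Base):
        # The class body runs first; only then is __new__ invoked.
        events.append('class body executed')
except RuntimeError as error:
    events.append('definition rejected: ' + str(error))

# Check whether the name Broken was ever bound.
try:
    Broken
    defined = True
except NameError:
    defined = False

print(events)
print(defined)  # False
```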
Although Python does not technically support static checking beyond ensuring that the syntax of a module is correct, it is arguably justifiable to say that what MetaAlgorithm does is a form of static checking. In many routine scenarios, the checks would be performed at the time the module containing the class definition is imported and before any other code has had a chance to run.
This article reviewed how user-defined classes are defined in Python and how the mechanism for creating classes can itself be customized. The types module in the standard library provides a number of additional functions that can assist in the dynamic creation of new types and classes. The motivating example in this article illustrated how these capabilities can be used to perform a form of static analysis of user-defined classes.
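As one illustration of what the types module offers, its new_class function creates a class dynamically while still honoring a metaclass keyword. The sketch below is self-contained and reuses the Document and MetaClass names from earlier examples; the helper function body is a hypothetical name introduced only for this sketch.

```python
import types

class MetaClass(type):
    pass

def body(namespace):
    # Hypothetical helper that populates the namespace of the new class.
    def __init__(self):
        self.is_document = True
    namespace['__init__'] = __init__

# Roughly equivalent to:
#     class Document(metaclass=MetaClass):
#         def __init__(self):
#             self.is_document = True
Document = types.new_class('Document', (), {'metaclass': MetaClass}, body)

d = Document()
print(d.is_document)                # True
print(type(Document) is MetaClass)  # True
```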
It is worth noting that the approach presented in this article is compatible with the methods for checking type annotations and unit testing functions presented in the article on type annotations. For example, it would be straightforward to require that the training and classification method definitions include annotations specifying the exact types of the data that they can handle. It would even be possible to test the methods by generating test cases having the appropriate types. Another observation is that the motivating use case in this article can also be solved by using techniques presented in other articles, such as by applying decorators to class definitions or by performing a static analysis of the class definition itself.
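As a rough sketch of the last two observations, the example below swaps the metaclass for a class decorator and also requires annotations on the parameters of the required methods. The decorator name algorithm, the class names, and the rule that every parameter other than self must be annotated are all assumptions made for illustration; they are not part of the API described above.

```python
import inspect
from types import FunctionType

def algorithm(cls):
    """Hypothetical class decorator that validates a contributed class."""
    if not isinstance(vars(cls).get('contributor'), str):
        raise RuntimeError('missing contributor')

    for name in ('train', 'classify'):
        method = vars(cls).get(name)
        if not isinstance(method, FunctionType):
            raise RuntimeError('missing method: ' + name)

        # Require an annotation on every parameter except self.
        params = list(inspect.signature(method).parameters.values())[1:]
        for param in params:
            if param.annotation is inspect.Parameter.empty:
                raise RuntimeError('unannotated parameter: ' + param.name)

    return cls

@algorithm
class Guess:
    contributor = "Author"

    def train(self, items: list, labels: list) -> None:
        pass

    def classify(self, item: object) -> bool:
        return False

try:
    @algorithm
    class Unannotated:
        contributor = "Author"

        def train(self, items, labels):
            pass

        def classify(self, item):
            return False
except RuntimeError as error:
    message = str(error)
    print("RuntimeError:", message)
```

One design difference is worth keeping in mind: a decorator must be applied explicitly to each class, whereas a metaclass attached to a base class is applied automatically to every derived class.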