I’ve been using django-taggit to provide a tagging model for content items in my app. However, I wanted to arrange the tags into a hierarchy/taxonomy. It’s simple enough to define a custom tag model with a parent pointer, plus a custom through model that links it to your content, which lets you arrange your tags into a tree:
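from django.db import models
from taggit.managers import TaggableManager
from taggit.models import TagBase, ItemBase

...

# the custom tag model
class HierarchicalTag(TagBase):
    # parent pointer; null for root tags
    # (on_delete must be given explicitly from Django 2.0 onwards)
    parent = models.ForeignKey('self', null=True, blank=True, on_delete=models.CASCADE)

# the through model
class TaggedContentItem(ItemBase):
    content_object = models.ForeignKey('ContentItem', on_delete=models.CASCADE)
    tag = models.ForeignKey('HierarchicalTag', related_name='tags', on_delete=models.CASCADE)

# the content item (an ordinary model, not an ItemBase)
class ContentItem(models.Model):
    tags = TaggableManager(through=TaggedContentItem, blank=True)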
However, suppose you have a tree of tags like this:
Vehicle
    Car
        BMW
            Z4
        Ford
            Fiesta
        Chevrolet
            Volt
and you have content items tagged with leaves (Z4, Fiesta, Volt), but you want to search for all items tagged with anything from the ‘Car’ branch of the tree. Chances are you’ll end up writing a recursive function to gather up all the descendants of ‘Car’, which doesn’t scale because it involves many SQL queries, or using esoteric SQL syntax available only in the big database engines (and certainly not sqlite3).
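To make the cost of the recursive approach concrete, here is a minimal sketch that uses a plain dictionary as a hypothetical stand-in for the tag table. The names `PARENTS`, `children_of` and `descendants_of` are invented for illustration, and each call to `children_of` stands in for one SQL query against the parent pointer.

```python
# Hypothetical in-memory stand-in for the tag table: tag name -> parent name.
PARENTS = {
    "Vehicle": None,
    "Car": "Vehicle",
    "BMW": "Car",
    "Z4": "BMW",
    "Ford": "Car",
    "Fiesta": "Ford",
    "Chevrolet": "Car",
    "Volt": "Chevrolet",
}

# Counts how many "queries" the recursive walk issues.
query_count = 0


def children_of(name):
    """Stand-in for one SQL query: select the tags whose parent is `name`."""
    global query_count
    query_count += 1
    return [child for child, parent in PARENTS.items() if parent == name]


def descendants_of(name):
    """Recursively collect every descendant, issuing one lookup per visited node."""
    found = []
    for child in children_of(name):
        found.append(child)
        found.extend(descendants_of(child))
    return found


car_descendants = descendants_of("Car")
print(car_descendants, "found using", query_count, "lookups")
```

Even for this small tree, the walk issues one lookup for 'Car' plus one for each of its six descendants, so the number of queries grows with the size of the branch.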
At work, where we use a non-relational database engine, we long ago solved the same problem (efficient manipulation and querying of hierarchical models), so I already had an idea of what I needed to do. But, as is the way with Python and Django, I figured there would probably already be packages that implement efficient hierarchical data, and there are.
The two main contenders seem to be django-mptt and django-treebeard. I tried mptt first, mainly because the consensus seemed to be that it was smaller and easier to use, but also because it purported to allow you to add hierarchical structure to existing models by configuration, which in my case would mean I didn’t have to define a custom tag model and could attach hierarchy directly to taggit’s Tag model.
However, my experience of mptt was poor – the documentation appeared to be out of date with respect to both the version of mptt I got from pip and the latest git trunk. Also, when I tried to use mptt’s admin classes for Django, I got exceptions (I admit I didn’t try very hard to overcome them).
So I gave treebeard a go, and had a much smoother time. Treebeard implements a number of hierarchy techniques with different performance characteristics (e.g. cheap querying but expensive insertion), allowing you to choose which one suits your application’s use of the trees. In my case I went for ‘Materialised Path Trees’ because it’s the relational equivalent of the technique I’m already familiar with. Implementing hierarchical tags was a straightforward case of having my custom tag model extend treebeard’s MP_Node model which, as the name suggests, implements a node in a Materialised Path Tree:
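from django.db import models
from taggit.managers import TaggableManager
from taggit.models import TagBase, ItemBase
from treebeard.mp_tree import MP_Node

...

# the custom tag model, now also a node in a Materialised Path Tree
class HierarchicalTag(TagBase, MP_Node):
    node_order_by = ['name']

# the through model
# (on_delete must be given explicitly from Django 2.0 onwards)
class TaggedContentItem(ItemBase):
    content_object = models.ForeignKey('ContentItem', on_delete=models.CASCADE)
    tag = models.ForeignKey('HierarchicalTag', related_name='tags', on_delete=models.CASCADE)

# the content item (an ordinary model, not an ItemBase)
class ContentItem(models.Model):
    tags = TaggableManager(through=TaggedContentItem, blank=True)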
(The node_order_by is what treebeard uses to order siblings when a new node is added to the tree.) That was literally all that was needed. Going back to the ‘Car’ example, here is the code to find all ContentItems tagged with any of the descendants of ‘Car’:
# look up the Car term
car = HierarchicalTag.objects.get(name='Car')

# get a queryset of all its descendants: with treebeard this is 1 SQL statement
# use HierarchicalTag.get_tree(car) if you want to include 'Car'
treeqs = car.get_descendants()

# now find the ContentItems using an inner queryset
qs = ContentItem.objects.filter(tags__in=treeqs)
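As a rough illustration of why a materialised path makes the descendant lookup cheap, the sketch below stores, for each tag, a path string built from fixed-width steps, so finding a subtree becomes a single prefix match. This is a simplified, hypothetical encoding in plain Python (the names `TAGS` and `descendants_of` are invented here); it is not treebeard's actual implementation, only the general idea behind it.

```python
# Hypothetical materialised paths: every node stores the full path from the root,
# built from fixed-width 4-character steps. Siblings are numbered in name order.
TAGS = {
    "Vehicle": "0001",
    "Car": "00010001",
    "BMW": "000100010001",
    "Z4": "0001000100010001",
    "Chevrolet": "000100010002",
    "Volt": "0001000100020001",
    "Ford": "000100010003",
    "Fiesta": "0001000100030001",
}


def descendants_of(name):
    """Return all descendants of `name` with one pass of prefix matching.

    In SQL terms this corresponds to a single query of the form
    "path LIKE '<prefix>%'" that excludes the node itself.
    """
    prefix = TAGS[name]
    matches = [
        tag for tag, path in TAGS.items()
        if path.startswith(prefix) and path != prefix
    ]
    # Sorting by path yields depth-first tree order.
    return sorted(matches, key=TAGS.get)


car_descendants = descendants_of("Car")
print(car_descendants)
```

The fixed-width steps matter: because every step has the same length, a prefix match can never confuse one sibling's path with another's. The trade-off is that moving or inserting nodes may require rewriting the paths of a whole subtree, which is the "cheap querying but expensive insertion" characteristic mentioned earlier.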