CMS Content Organization Structures: Trees vs Facets vs Tags

Originally published at: http://www.sitepoint.com/cms-content-organization-structures-trees-vs-facets-vs-tags/

This article discusses the state of trees as a content organization structure in modern CMS as opposed to other approaches.


For several years I have been interested in content repositories as a key aspect of modern CMS. By “modern”, I mean CMS that are not just “page management systems” but that actually manage content, thereby enabling authors to reuse their content on different devices and even in different applications. In this spirit, I was very intrigued by services like prismic.io and contentful.com that essentially provide content repositories as a service. I was especially impressed by Prismic’s UI. But when evaluating these systems, I noticed a surprising trend: they do not leverage trees, either as a native storage concept or as a visualization concept. Instead, they mostly rely on flat structures with tagging. My gut feeling told me that this was a mistake, especially when managing larger content repositories. At the same time I wondered: “Am I just a dinosaur that is missing the ark?”

I discussed the topic with Ekke at a recent conference, and after a short Twitter exchange we decided to write down our thoughts. I found additional inspiration in an article by David Weinberger, which helped put my feelings in a historical context and explained the advantages of different approaches to content organization, namely trees, facets and tags. I also want to mention the concept of references, since Contentful supports them.

Introduction

Trees are the oldest of the methods mentioned above. The likely reason is that they work well in the physical world, i.e. good old paper books, because they require no duplication of content: every piece of information is placed in exactly one place. Having been around so long also gives trees one distinct advantage: everyone knows how they work. Facets and tags, however, leverage the new possibilities of the digital age, in which content can easily live in several places at once. But predating the digital age does not make trees a dinosaur waiting for extinction. Let us first look at some of the advantages and disadvantages of facets and tags.


Interesting recurring topic indeed, thanks for sharing.
I also think trees have a lot to offer, plus they have the advantage of not preventing anyone from using tagging, categorization or taxonomies.
I think facets are a different beast, though, intended purely for search and find. I would not really consider them a solution for organizing our content upfront. Would you?

It’s also interesting to see how file systems (FS) and other software (besides CMS) are evolving. FS UIs, which are by definition trees, are used less and less by end-user apps. Apps such as iCloud, Google Docs, iA Writer, the Mac Finder and many others increasingly try to move away from the concept of a “physical tree” toward something based on meta-information, which can sometimes also be a virtual tree, but with different, parallel organizations. From a UX standpoint, trees have a lot of flaws, as you mention.
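To make the idea of a “virtual tree” built from meta-information concrete, here is a minimal sketch, assuming each item is a flat record with metadata fields (the field names `year`, `project` and `kind` are hypothetical). The same flat items are grouped into two different hierarchies just by choosing a different order of metadata keys; nothing is physically moved.

```python
from collections import defaultdict


def virtual_tree(items, keys):
    """Group flat items into nested dicts, one level per metadata key.

    Leaves are lists of item titles; items missing a key are grouped under None.
    """
    if not keys:
        return [item["title"] for item in items]
    groups = defaultdict(list)
    for item in items:
        groups[item.get(keys[0])].append(item)
    return {value: virtual_tree(members, keys[1:]) for value, members in groups.items()}


# Hypothetical flat items: no folder hierarchy, only metadata.
items = [
    {"title": "Budget.xlsx", "year": 2014, "project": "Alpha", "kind": "sheet"},
    {"title": "Kickoff.doc", "year": 2014, "project": "Beta", "kind": "doc"},
    {"title": "Review.doc", "year": 2015, "project": "Alpha", "kind": "doc"},
]

# Two parallel organizations of the same content.
by_year = virtual_tree(items, ["year", "project"])
by_project = virtual_tree(items, ["project", "kind"])
```

Because each view is computed from metadata, adding another organization costs nothing beyond choosing a new key order.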

To be continued…

Fascinating stuff. I would love to read a follow-up that covers hybrid systems in depth, as that’s an issue that many site owners confront every day.

As a manager of (lots and lots of) content, I struggle with this daily. In fact, there’s something deliciously meta about this article being listed under the PHP section when it has very little to do with PHP! I suspect it has to do with editorial assignment (and @swader can chime in), as you describe in the Tree section.

It becomes messy when you factor in multiple platforms: content that “lives” in one CMS (let’s say WordPress) will be distributed via many channels (RSS feeds, either via a full feed, a category feed, or a tag feed), found via search engines, and, in our case, listed on a forum with similar categories.

Indeed, in practice it likely makes the most sense for organizations to look towards hybrid systems to get the best of all the worlds relevant to them. For example, on Twitter someone suggested using a curated tree structure of tags to overcome some of the issues I mentioned with tags. This can of course be a huge help, but it also means you can no longer easily create new tags ad hoc, which is one of the big advantages of tags.

While it would be easy to say I was driven by the fact that the PHP world is so dense with CMS solutions (or attempts at them) and therefore found this topic fitting to the general gist of the channel, it would probably be more honest to admit it was simple human error and bias: having recognized the article as excellent, I neglected to even think about other channels and enthusiastically gave it only one main category, mine. : )

Regarding categories, I was always fond of a nested tags approach, which worked for me in the past on large repositories of content. Tags with children (which essentially translate to categories and subcategories) solve all the problems I can envision in large-scale CMS efforts, as long as the creation of tags is centralized and well defined. Within a site, there needs to be an approval flow for new tags, and the tag structure needs to support synonyms. Homonyms need different IDs (and, by extension, different URL slugs); with that in place, the problem is solved for good.
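The nested-tag rules described in this reply (centralized approval, parent tags, synonyms, distinct IDs for homonyms) could be modelled roughly as follows. This is a hypothetical Python sketch, not the system the poster used; the class names and the `propose`/`approve`/`resolve` methods are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tag:
    id: str                       # unique: homonyms get distinct ids (and slugs)
    label: str                    # display name, may be shared by homonyms
    parent: Optional[str] = None  # id of the parent tag, None for a top-level tag
    synonyms: set = field(default_factory=set)


class TagRegistry:
    """Centralized tag creation: tags are proposed first, then approved."""

    def __init__(self):
        self.tags = {}     # approved tags by id
        self.pending = []  # proposed tags awaiting approval

    def propose(self, tag):
        self.pending.append(tag)

    def approve(self, tag_id):
        tag = next((t for t in self.pending if t.id == tag_id), None)
        if tag is None:
            raise KeyError(f"no pending tag {tag_id!r}")
        if tag.id in self.tags:
            raise ValueError(f"tag id {tag.id!r} already exists")
        if tag.parent is not None and tag.parent not in self.tags:
            raise ValueError(f"unknown parent {tag.parent!r}")
        self.pending.remove(tag)
        self.tags[tag.id] = tag
        return tag

    def resolve(self, term):
        """Return every approved tag whose label or one of its synonyms matches term."""
        term = term.lower()
        return [
            t for t in self.tags.values()
            if t.label.lower() == term or term in {s.lower() for s in t.synonyms}
        ]


registry = TagRegistry()
for tag in [
    Tag("programming", "Programming"),
    Tag("java-language", "Java", parent="programming", synonyms={"J2EE"}),
    Tag("places", "Places"),
    Tag("java-island", "Java", parent="places"),  # homonym, different id
]:
    registry.propose(tag)
    registry.approve(tag.id)

print([t.id for t in registry.resolve("java")])  # ['java-language', 'java-island']
```

The shared label “Java” resolves to both tags, which is exactly the point where an editor (or the UI) must choose by ID rather than by name.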

Furthermore, such tags can be given root (or “meta”) tags that define their purpose, which themselves may be nested. Thereby, a CMS is given the flexibility to define which tags are visible to the end user in the site’s search engine, which are to be interpreted as forum categories, which are to be considered statistics-tracking-related, and so on.
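One way to read the “meta” root tags idea is that a tag’s purpose is decided by the root of its branch. The sketch below assumes that reading; the tag names and the `PARENT` mapping are hypothetical, and a real system might instead stop at the nearest meta tag.

```python
# Hypothetical tag tree: tag id -> parent id (None marks a root "meta" tag).
PARENT = {
    "search": None,
    "forum-categories": None,
    "statistics": None,
    "php": "forum-categories",
    "frameworks": "php",
    "symfony": "frameworks",
    "campaign-2015": "statistics",
}


def root_of(tag_id):
    """Follow parent links up to the root (meta) tag, guarding against cycles."""
    seen = set()
    while PARENT[tag_id] is not None:
        if tag_id in seen:
            raise ValueError(f"cycle at {tag_id!r}")
        seen.add(tag_id)
        tag_id = PARENT[tag_id]
    return tag_id


def tags_for_purpose(tags, purpose):
    """Keep only the tags whose root meta tag equals the given purpose."""
    return [t for t in tags if root_of(t) == purpose]


article_tags = ["symfony", "campaign-2015"]
forum_tags = tags_for_purpose(article_tags, "forum-categories")  # ['symfony']
stats_tags = tags_for_purpose(article_tags, "statistics")        # ['campaign-2015']
```

With this arrangement one tag engine can serve the public search, forum categories and internal statistics, each view filtering by its own root.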

In short, a tagTree-on-tagTree model has worked very well for me in the past, covering thousands of books each with dozens of chapters, hundreds of thousands of users, dozens of journals with hundreds of entries and dozens of quarterlies each, all in a single system, and every entity tagged to some extent. The same tag engine powered the user-facing search, the employee-facing search and CRM, and the system-facing invoice tracker and more.


I just read up on PHPCR… interesting… but how would you express many-to-many relationships? It seems the PHP version does not implement the shareable nodes option, which I understand is what would allow nodes to have more than one parent?

Drupal (as you likely know) has a very powerful data model, a cross between an entity-attribute-value model and a graph database; its taxonomy module allows unlimited, arbitrary categorization of data.

Interesting article :)

Very thought-provoking stuff. Having a software development background, I’m definitely a fan of trees (and especially the filesystem) as an organizational tool. The single-location limitation was solved in filesystems by using links (hard or symbolic), and I see no reason you couldn’t model taxonomies in a content repository the same way (e.g. every item existing at one or more locations in the tree).
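A minimal sketch of the hard-link idea applied to a content tree, assuming items are stored once by id (like inodes) and any number of tree paths can point at the same id. `LinkedTree` and its methods are hypothetical names for illustration, not an existing repository API.

```python
import itertools


class LinkedTree:
    """Items stored once (by id, like inodes) and linked into one or more
    tree paths, similar to hard links in a filesystem."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.items = {}  # item id -> content
        self.paths = {}  # tree path, e.g. "/sports/tennis/x" -> item id

    def create(self, path, content):
        """Store new content at its first location and return its id."""
        if path in self.paths:
            raise ValueError(f"path already exists: {path}")
        item_id = next(self._next_id)
        self.items[item_id] = content
        self.paths[path] = item_id
        return item_id

    def link(self, existing_path, new_path):
        """Give an existing item an additional location (like `ln`)."""
        if new_path in self.paths:
            raise ValueError(f"path already exists: {new_path}")
        self.paths[new_path] = self.paths[existing_path]

    def unlink(self, path):
        """Remove one location; delete the item once no location is left."""
        item_id = self.paths.pop(path)
        if item_id not in self.paths.values():
            del self.items[item_id]

    def locations(self, item_id):
        return sorted(p for p, i in self.paths.items() if i == item_id)


tree = LinkedTree()
article = tree.create("/sports/tennis/federer-wins", "Federer wins in Basel")
tree.link("/sports/tennis/federer-wins", "/people/roger-federer/federer-wins")
```

As with hard links, no location is privileged: the item survives until its last path is removed, which is the main difference from the single-owner-plus-references model discussed further down the thread.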

Disclaimer: I work for Contentful, but the views above are my own and don’t reflect any sort of official position.

Indeed PHPCR currently does not support multiple parents. At least in all projects I have done so far, I didn’t really miss this.

Yeah, in PHPCR the solution we have for this is references. I think references are more expressive than multiple parents, since “ownership” is still clearly assigned to a specific place in the tree.
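The difference between ownership and references can be illustrated with a small model in which every node has exactly one parent, while references are separate pointers that never re-parent anything. This is a hypothetical Python sketch, not the PHPCR API; `Node`, `add_child` and `remove_child` are assumed names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)  # the single owner
    children: dict = field(default_factory=dict, repr=False)
    references: list = field(default_factory=list, repr=False)  # pointers elsewhere

    def add_child(self, name):
        """Create a child; it has exactly one parent (its owner)."""
        child = Node(name, parent=self)
        self.children[name] = child
        return child

    def remove_child(self, name):
        """Removing a node removes what it owns, never what it references."""
        del self.children[name]

    @property
    def path(self):
        if self.parent is None:
            return "/"
        return self.parent.path.rstrip("/") + "/" + self.name


root = Node("")
federer = root.add_child("people").add_child("roger-federer")
news = root.add_child("news")
article = news.add_child("federer-wins")
article.references.append(federer)  # points elsewhere, does not re-parent
```

Each node has one canonical path, and deleting `/news` takes the article with it while the referenced `/people/roger-federer` node stays put.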

I think that facets, tags and category trees are just different expressions of basically the same paradigm. Facets are tags organized in groups, with some enforcement related to specific information types (one-of, many-of, required facet). Strict category trees are tag hierarchies that only allow one tag per object.

The distinction between the three is made with UI, performance and data structure in mind, but I don’t see a reason to treat them as completely separate entities.
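The claim that facets are just grouped tags with constraints, and that a strict category tree is a tag hierarchy limited to one tag per object, can be sketched in a few lines. The `Facet` type, the `validate` function and the recipe facet values are hypothetical illustrations under that reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:
    """A named group of tags plus a constraint on how they may be combined."""
    name: str
    values: frozenset       # tags belonging to this group
    mode: str = "many-of"   # "one-of" or "many-of"
    required: bool = False  # must an object carry at least one tag from the group?


def validate(tags, facets):
    """Check a plain set of tags against facet constraints; return error messages."""
    errors = []
    for facet in facets:
        chosen = tags & facet.values
        if facet.required and not chosen:
            errors.append(f"{facet.name}: required")
        if facet.mode == "one-of" and len(chosen) > 1:
            errors.append(f"{facet.name}: only one of {sorted(chosen)} allowed")
    return errors


# Hypothetical recipe facets.
RECIPE_FACETS = [
    Facet("cuisine", frozenset({"italian", "thai", "mexican"}), mode="one-of", required=True),
    Facet("diet", frozenset({"vegan", "gluten-free"})),
]

# A strict category tree: a tag hierarchy where each object gets exactly one node.
CATEGORY_PARENT = {"food": None, "desserts": "food", "cakes": "desserts"}
CATEGORY = Facet("category", frozenset(CATEGORY_PARENT), mode="one-of", required=True)
```

The objects themselves only ever carry a plain set of tags; whether that set is read as free tags, facets or a category assignment depends only on which constraints the system applies.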

I generally agree with this statement: it’s all about how you use them, and you can turn any of them into the others.


Note that this post got a reply from Contentful and you can read it here.

Great post!

It took us some time, but since you’ve been calling us out directly, here are our official Contentful thoughts on trees: https://www.contentful.com/blog/2015/02/17/content-trees-tags-and-facets-in-contentful/ Long story short: we actually do like (and support) trees as one way of creating structure. True, not in the drag-and-drop visual UI of a typical web CMS, but conceptually for sure. However, we also think of trees as just one specific organizing principle with certain use cases (e.g. your collection-of-evergreen-pages type of website). Other scenarios may call for very different structures (think recipe app).


Indeed, references can be used to build tree structures, or at least make it possible to navigate as if there were a tree. I mentioned this in the article above. This can indeed be a great solution; however, to gain all the advantages of trees mentioned above, one needs an actual tree structure (which then also brings its disadvantages). For example, while a proper tree structure can also be used to “inherit” properties, the same cannot really be done with a graph or tree built via references (where a graph is of course a superset of a tree structure).
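Why inheritance needs a real tree can be shown with a small sketch: walking a single parent chain yields exactly one inherited value, while a node reachable through several references has several candidate “parents” and therefore no single answer. The paths, property names and the `REFERRED_FROM` mapping below are hypothetical.

```python
# Hypothetical tree: path -> (parent path, properties set directly on the node).
TREE = {
    "/": (None, {"language": "en", "layout": "default"}),
    "/sports": ("/", {"layout": "sports"}),
    "/sports/tennis": ("/sports", {}),
    "/sports/tennis/federer-wins": ("/sports/tennis", {"title": "Federer wins"}),
    "/de": ("/", {"language": "de"}),
}


def inherited(path, key):
    """Walk up the single parent chain; the nearest node defining key wins."""
    while path is not None:
        parent, props = TREE[path]
        if key in props:
            return props[key]
        path = parent
    raise KeyError(key)


# Hypothetical reference graph: a bio node that two tree nodes point to.
# It has no single parent, so "inherit from the parent" is ambiguous.
REFERRED_FROM = {"federer-bio": ["/sports/tennis", "/de"]}


def candidate_values(node_id, key):
    """Every value the node could 'inherit' through one of its referrers."""
    return {inherited(referrer, key) for referrer in REFERRED_FROM[node_id]}
```

Here the article inherits `layout="sports"` and `language="en"` unambiguously, while `candidate_values("federer-bio", "language")` yields both `"en"` and `"de"`, so some extra rule would be needed to pick one.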

That being said, some of the advantages of references you mention in your response would of course be compatible with a tree structure as well. Take the Roger Federer example. If one creates a node “Roger Federer”, one can of course use references to point articles to that node and then make all such references queryable. Using tags in this scenario is imho not ideal, since as a content author I would likely want to point to a specific place that actually describes who this Roger Federer guy is and what he has done. As such, pointing to an actual node is imho far superior in this case.
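Making references to a “Roger Federer” node queryable essentially means maintaining a reverse index from each node to the articles that point at it. The sketch below assumes articles store the ids of the nodes they reference; the data and the helper names `build_backlinks` and `describe` are hypothetical.

```python
from collections import defaultdict

# Hypothetical node that actually describes the person (unlike a bare tag) ...
nodes = {
    "roger-federer": {"name": "Roger Federer", "bio": "Swiss tennis player."},
}

# ... and articles that reference nodes by id.
articles = {
    "federer-wins": {"title": "Federer wins in Basel", "refs": ["roger-federer"]},
    "nadal-withdraws": {"title": "Nadal withdraws", "refs": ["rafael-nadal"]},
    "rivalry": {"title": "The great rivalry", "refs": ["roger-federer", "rafael-nadal"]},
}


def build_backlinks(articles):
    """Invert article -> referenced nodes into node -> referencing articles."""
    backlinks = defaultdict(list)
    for article_id, article in articles.items():
        for node_id in article["refs"]:
            backlinks[node_id].append(article_id)
    return backlinks


def describe(node_id, backlinks):
    """The node's own description plus the titles of articles pointing at it."""
    titles = [articles[a]["title"] for a in backlinks.get(node_id, [])]
    return {**nodes[node_id], "referenced_by": titles}


backlinks = build_backlinks(articles)
```

Unlike a tag, the target carries its own content (`name`, `bio`), so a single query returns both the description of the person and everything that mentions him.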

Additionally, to me tags can become a problem when rewording an article. Let’s say an article originally mentioned Roger Federer, but that sentence is removed. The tags might be overlooked. If instead the reference were set by essentially embedding it in the actual text, then removing the sentence would remove the reference as well.
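Deriving references from the text itself, rather than from a separately maintained tag list, can be sketched with a simple inline markup. The `[[node-id|visible text]]` syntax is a hypothetical convention chosen for illustration, not a feature of any CMS mentioned here.

```python
import re

# Hypothetical inline reference syntax: [[node-id|visible text]]
REF_PATTERN = re.compile(r"\[\[([a-z0-9-]+)\|([^\]]+)\]\]")


def extract_refs(body):
    """Derive an article's references from its text instead of a separate tag list."""
    return sorted({m.group(1) for m in REF_PATTERN.finditer(body)})


def render(body):
    """Replace each inline reference with its visible text for display."""
    return REF_PATTERN.sub(lambda m: m.group(2), body)


draft = "The final was close. [[roger-federer|Federer]] won the third set in Basel."
edited = "The final was close."  # the sentence mentioning Federer was removed
```

Since the reference list is recomputed from the body, deleting the sentence in `edited` also drops `roger-federer` from the article’s references, so nothing stale is left behind.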

But I guess I am just a tag-skeptic :)

At any rate, which approach makes the most sense depends on the specific use case. Contentful offers quite a range of options, but the lack of trees means that for me it’s not useful in many cases, especially when dealing with large data sets.
