A new XML-based content management system that is not based on DITA. Bet you didn’t see that coming. But I think it tells us something interesting about the two sides of structured writing.
Tom Johnson’s recent sponsored post explains the origins of Paligo, a relatively new CCMS out of Sweden. Paligo was developed by a company that had formerly been a DITA consulting shop in an attempt to come up with something that was easier to use (and less expensive) than the DITA solutions they were implementing.
What is interesting to me about Paligo is that they chose DocBook rather than DITA as the underlying XML vocabulary. Why? Quoting Tom:
And it turns out that building on a foundation of Docbook XML is considerably easier than building with DITA. DITA tends to impose more restrictions about what you can and can’t do, Svensson says. Even so, Paligo is only “based on Docbook.” Paligo extends from this foundation, adding what they need and not letting the content model restrict the system, while maintaining full capability to export to the open standard.
This is interesting because DocBook is a large and complex specification. (I want to say larger and more complex than DITA, but I’m not sure if that is true anymore.) Why use it as the basis for what is supposed to be a system that is easier to use than DITA?
The answer seems to be lack of restriction. DocBook may be as large and as complex as DITA, if not more so, but it is much less restrictive. DITA has lots of restrictions on what you can and cannot do in each topic type. DocBook has very few. You can combine DocBook elements in just about any way that might occur to you. And apparently Paligo loosens DocBook even further.
Why is this significant? In a CCMS (Component Content Management System) the whole point of the system is to let you assemble documents out of pieces. The main benefit of this is content reuse. The main problem standing in the way is composability: the ability to put pieces together and have them work.
Composability is an interesting problem. Lego and Meccano sets both have composability within their own systems, but there is little composability between Lego and Meccano. You cannot freely exchange the bits. Composability is similarly a problem between the many different file types used across a typical enterprise. If you want to practice component content management on an enterprise scale, you need a composable format across the enterprise.
There are two ways to get this. One is to allow each group in the enterprise to have their own format, but insist that it must be transparently convertible into a common composable format. The other is to get everyone to use the common composable format directly. The latter sounds easier, so that is what most people choose. (It is not always easier, but people only find that out later.)
To get everyone in the enterprise using a single composable format, you need it to be easy to use, as well as being flexible enough to serve everyone’s needs. What to choose?
DITA has been the default choice for a while now, but the problem is that DITA is not really easy to use. Several companies have tried to make DITA easier to use. (Tom mentions EasyDITA in his article.) But DITA comes with restrictions and restrictions are hard to learn and annoying to comply with unless you really understand the point of them.
Lightweight markup languages such as Markdown and reStructuredText have become popular as well, with various CMS and publishing systems being built around them. But while they are simple and easy to use, their simplicity can be limiting. There are things that occur in complex publications that they cannot easily represent.
DocBook offers a far richer set of markup structures that can represent all of these things, but without the restrictiveness of DITA. It makes sense, therefore, for a company like Paligo to choose it for their underlying document structure.
There is a rub here though, and it has to do with the two sides of structured writing that I mentioned at the beginning. Those two sides are composability and constraint. I am writing a book about structured writing (currently being serialized on TechWhirl). That book focuses on the constraint side of structured writing.
The constraint side of structured writing is about expressing and enforcing constraints on content. It is about limiting and shaping what is written to meet a particular need. For example, it may constrain a recipe to follow a particular format and include particular pieces of information.
Here is a constrained version of a recipe:
    recipe: Hard Boiled Egg
        introduction:
            A hard boiled egg is simple and nutritious.
        ingredients:: ingredient, quantity
            eggs, 12
            water, 2qt
        preparation:
            1. Place eggs in pan and cover with water.
            2. Bring water to a boil.
            3. Remove from heat and cover for 12 minutes.
            4. Place eggs in cold water to stop cooking.
            5. Peel and serve.
        prep-time: 15 minutes
        serves: 6
        wine-match: champagne and orange juice
        beverage-match: orange juice
        nutrition:
            serving: 1 large (50 g)
            calories: 78
            total-fat: 5 g
            saturated-fat: 0.7 g
            polyunsaturated-fat: 0.7 g
            monounsaturated-fat: 2 g
            cholesterol: 186.5 mg
            sodium: 62 mg
            potassium: 63 mg
            total-carbohydrate: 0.6 g
            dietary-fiber: 0 g
            sugar: 0.6 g
            protein: 6 g
Why would you want to impose these constraints as opposed to just letting writers write and format as they go? Here are some of the more important reasons:
- It keeps things consistent.
- It guides the author and ensures they don’t forget things.
- It takes away the formatting task so the author can focus on content.
- It makes the content accessible to algorithms. The content not only follows constraints, it records the constraints that it follows. If you have a collection of recipes marked up like this and you want to make a low calorie cookbook, you can easily query the collection to pull out all recipes with a calorie count under 100.
- It allows you to implement sophisticated validation and auditing systems to verify the correctness and completeness of your content.
- It allows you to factor out other constraints. Even when writers are working in a freeform environment, they are expected to follow all kinds of constraints, often laid out in style guides or requirements documents. Structured writing allows you to factor out many of those constraints and to encode others in the structures to help guide authors.
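To make the querying and auditing points a little more concrete, here is a minimal sketch in Python. It assumes the recipes have already been parsed into plain dictionaries whose keys mirror the field names in the recipe markup above. The second and third recipes, the `REQUIRED_FIELDS` list, and the function names `low_calorie` and `audit` are all made up for illustration; they are not part of any particular tool.

```python
# Hypothetical, already-parsed recipe collection. Only the Hard Boiled Egg
# values come from the example above; the other two entries are invented
# sample data for illustration.
recipes = [
    {"title": "Hard Boiled Egg", "serves": 6, "prep-time": "15 minutes",
     "nutrition": {"calories": 78}},
    {"title": "Cheese Omelette", "serves": 1, "prep-time": "10 minutes",
     "nutrition": {"calories": 320}},
    {"title": "Mystery Stew", "serves": 4, "nutrition": {}},
]

# Assumed constraint: every recipe must record these fields.
REQUIRED_FIELDS = ("title", "serves", "prep-time")


def low_calorie(collection, limit=100):
    """Return titles of recipes whose recorded calorie count is under `limit`."""
    return [
        r["title"]
        for r in collection
        # A recipe with no recorded calorie count is never treated as low calorie.
        if r.get("nutrition", {}).get("calories", float("inf")) < limit
    ]


def audit(recipe):
    """Return a list of problems: missing required fields or calorie count."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if field not in recipe]
    if "calories" not in recipe.get("nutrition", {}):
        problems.append("missing nutrition/calories")
    return problems


print(low_calorie(recipes))
for r in recipes:
    print(r["title"], audit(r))
```

Neither function has to guess where the calorie count lives or whether a paragraph happens to mention a prep time. The content records the constraints it follows, so the algorithm can simply look them up.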
The composability side of structured writing is, as we noted, simply about making sure that all the bits you create can be put together and published. You can use minimally constrained structured writing systems like Markdown or DocBook to achieve that composability without introducing constraints into the writing process. It makes sense, therefore, that as DITA has popularized the idea of component content management, less constrained rivals have come along to challenge its position.
But here is that rub I have been promising: Constraints are a powerful aid to composability. Here’s why:
The first requirement of composability is simply to get the bits of text to format and print correctly after you put them together. As long as they all use the same markup language, and as long as the bits go together in a way that is valid in that markup language, then this requirement is satisfied. The looser the constraints of the markup language, the easier this requirement is to meet since bits can go together in more ways.
But the second requirement of composability is to get the bits of text to work together as a coherent piece of writing after you combine them. This is a much more difficult requirement to meet. In the early days of a system, it may seem easy enough to meet it with human effort, tweaking as required each time the pieces are composed. But the larger the collection grows, and the more bits are put together in new ways, the harder it becomes. What fits in one place does not sound right in another, or contains information that duplicates what has already been said, or leaves a gap that needs to be filled ad hoc, or uses different terminology from the surrounding text.
Maintaining a collection of content chunks that can be smoothly combined in different ways to create different publications actually requires fairly strict constraints on what each type of content chunk contains and how it is expressed. Without such constraints, there is no guarantee that the document put together from those chunks is going to read well, that it will be complete and free from repetition or even conflicting information.
Composability on any scale, in other words, requires content constraints as much as it requires a universal format.
And that is why DITA starts with the idea of topics and its basic topic types: task, concept, and reference. They are an attempt to define the basic content constraints that composability requires. By dividing content into these three types, it is hoped, you make sure each topic does its own job and does not conflict with other topics when combined with them.
The problem with this is that generic constraints like these don’t work well for many people’s content. Writers end up chafing at the constraints without benefiting from them. This is the nature of content constraints. Like a pair of shoes, they have to fit well or they are agony. And when they don’t fit well, they don’t achieve their end. Many people complain that the content coming out of their DITA systems is choppy and does not read well. The content constraints are not doing their job.
Of course, DITA’s basic topic types were not necessarily intended to be used out of the box. They were intended as a basis for specialization, DITA’s process for defining more precise topic types as “specialized” versions of the base types. In theory, at least, specialization should enable you to create content constraints that fit better and therefore do a better job.
I have made no bones about the fact that I am not a fan of the base topic types. They are based on a theory of information design that I find to be naive. I am not a fan of the specialization mechanism either. But they exist to play a vital role. DITA’s main application is as an enabler of component content management. Component content management requires composability. And manageable composability requires content constraints. It requires a constraint mechanism, and topics and specialization are the constraint mechanism that DITA provides.
Many DITA practitioners avoid specialization like the plague. I don’t know if this is because they find the constraints of the base topic types sufficient, if they don’t understand the role of content constraints in composability, or if they just don’t believe in customizing systems in the way that specialization requires. In any case, I suspect that many unspecialized DITA applications are not really taking great advantage of these basic content constraints, and that their users may therefore be open to the blandishments of less constrained systems.
For my part, I seek a third way. First and foremost, I look to structured writing as a means of improving content quality and making authoring easier and more effective through appropriate constraints that fit the author’s task like an old and well-loved pair of hiking boots.
In the realm of content management, I see that many of the tasks we now do by hand, juggling generic bits of content, could be automated if we had appropriately constrained content that algorithms could understand well enough to manage. In particular, I think we can factor out many of the structures that we currently have to create and manage in content or in a CCMS, making it possible to do sophisticated content management without authors needing to interact with a content management system in any substantial way.
And as I have discussed before, I believe that Every Page is Page One information design and hypertext information delivery greatly reduce the need for large-scale content management.
I do applaud DITA for understanding that content constraints are essential to composability and therefore to component content management. But DITA exists at what may be the point of maximum complexity on the curve. It is vulnerable on the one side to systems like Paligo that are willing to minimize content constraints for greater ease of use. And it does not seem well poised to take people in the other direction, towards the greater use of content constraints to factor out the management tasks and restore simplicity (albeit a different kind of simplicity) to the authoring process.
If DITA cannot find a way to make content constraints a more attractive proposition, and easier to understand and to implement, it seems vulnerable to having its share of the market chipped away by systems like Paligo that offer composability and component management without significant content constraints.