DocBook resurgent: what it tells us about structured writing and component content management

A new XML-based content management system that is not based on DITA. Bet you didn’t see that coming. But I think it tells us something interesting about the two sides of structured writing.

Tom Johnson’s recent sponsored post explains the origins of Paligo, a relatively new CCMS out of Sweden. Paligo was developed by a company that had formerly been a DITA consulting shop in an attempt to come up with something that was easier to use (and less expensive) than the DITA solutions they were implementing.

What is interesting to me about Paligo is that they chose DocBook rather than DITA as the underlying XML vocabulary. Why? Quoting Tom:

And it turns out that building on a foundation of Docbook XML is considerably easier than building with DITA. DITA tends to impose more restrictions about what you can and can’t do, Svensson says. Even so, Paligo is only “based on Docbook.” Paligo extends from this foundation, adding what they need and not letting the content model restrict the system, while maintaining full capability to export to the open standard.

This is interesting because DocBook is a large and complex specification. (I want to say larger and more complex than DITA, but I’m not sure if that is true anymore.) Why use it as the basis for what is supposed to be a system that is easier to use than DITA?

The answer seems to be lack of restriction. DocBook may be as large and as complex as DITA, if not more so, but it is much less restrictive. DITA has lots of restrictions on what you can and cannot do in each topic type. DocBook has very few. You can combine DocBook elements in just about any way that might occur to you. And apparently Paligo loosens DocBook even further.

Why is this significant? In a CCMS (Component Content Management System) the whole point of the system is to let you assemble documents out of pieces. The main benefit of this is content reuse. The main problem standing in the way is composability: the ability to put pieces together and have them work.

Composability is an interesting problem. Lego and Meccano each have composability within their own systems, but there is little composability between Lego and Meccano. You cannot freely exchange the bits. Composability is similarly a problem between the many different file types used across a typical enterprise. If you want to practice component content management on an enterprise scale, you need a composable format across the enterprise.

There are two ways to get this. One is to allow each group in the enterprise to have their own format, but insist that it must be transparently convertible into a common composable format. The other is to get everyone to use the common composable format directly. The latter sounds easier, so that is what most people choose. (It is not always easier, but people only find that out later.)

To get everyone in the enterprise using a single composable format, you need it to be easy to use, as well as being flexible enough to serve everyone’s needs. What to choose?

DITA has been the default choice for a while now, but the problem is that DITA is not really easy to use. Several companies have tried to make DITA easier to use. (Tom mentions EasyDITA in his article.) But DITA comes with restrictions, and restrictions are hard to learn and annoying to comply with unless you really understand the point of them.

Lightweight markup languages such as Markdown and reStructuredText have become popular as well, with various CMS and publishing systems being built around them. But while they are simple and easy to use, their simplicity can be limiting. There are things that occur in complex publications that they cannot easily represent.

DocBook offers a far richer set of markup structures that can represent all of these things, but without the restrictiveness of DITA. It makes sense, therefore, for a company like Paligo to choose it for their underlying document structure.

There is a rub here, though, and it has to do with the two sides of structured writing that I mentioned at the beginning. Those two sides are composability and constraint. I am writing a book about structured writing (currently being serialized on TechWhirl). That book focuses on the constraint side of structured writing.

The constraint side of structured writing is about expressing and enforcing constraints on content. It is about limiting and shaping what is written to meet a particular need. For example, it may constrain a recipe to follow a particular format and include particular pieces of information.

Here is a constrained version of a recipe:

recipe: Hard Boiled Egg
    introduction:
        A hard boiled egg is simple and nutritious.
    ingredients:: ingredient, quantity
        eggs, 12
        water, 2qt
    preparation:
        1. Place eggs in pan and cover with water.
        2. Bring water to a boil.
        3. Remove from heat and cover for 12 minutes.
        4. Place eggs in cold water to stop cooking.
        5. Peel and serve.
    prep-time: 15 minutes
    serves: 6
    wine-match: champagne and orange juice
    beverage-match: orange juice
    nutrition:
        serving: 1 large (50 g)
        calories: 78
        total-fat: 5 g
        saturated-fat: 0.7 g
        polyunsaturated-fat: 0.7 g
        monounsaturated-fat: 2 g
        cholesterol: 186.5 mg
        sodium: 62 mg
        potassium: 63 mg
        total-carbohydrate: 0.6 g
        dietary-fiber: 0 g
        sugar: 0.6 g
        protein: 6 g
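
To make "enforcing constraints" concrete, here is a minimal sketch in Python of how a recipe like the one above might be checked mechanically. The field names come from the recipe sample, but everything else is an assumption made for illustration: the choice of which fields are required, the expected form of prep-time and serves, and the function name check_recipe are all hypothetical, and the recipe is represented as a plain dictionary rather than parsed from the markup.

```python
import re

# Hypothetical constraints. The field names come from the recipe sample,
# but which fields are required, and in what form, is assumed here.
REQUIRED_FIELDS = ["introduction", "ingredients", "preparation", "prep-time", "serves"]


def check_recipe(recipe: dict) -> list[str]:
    """Return a list of constraint violations (empty if the recipe conforms)."""
    errors = []

    # Constraint 1: every required field must be present.
    for field in REQUIRED_FIELDS:
        if field not in recipe:
            errors.append(f"missing required field: {field}")

    # Constraint 2: serves, if present, must be a positive whole number.
    serves = recipe.get("serves")
    if serves is not None:
        is_whole = isinstance(serves, int) and not isinstance(serves, bool)
        if not (is_whole and serves > 0):
            errors.append("serves must be a positive whole number")

    # Constraint 3: prep-time, if present, must look like "15 minutes".
    prep_time = recipe.get("prep-time")
    if prep_time is not None and not re.fullmatch(r"\d+ minutes", str(prep_time)):
        errors.append("prep-time must look like '15 minutes'")

    # Constraint 4: every ingredient needs both a name and a quantity.
    for position, item in enumerate(recipe.get("ingredients", []), start=1):
        if not item.get("ingredient") or not item.get("quantity"):
            errors.append(f"ingredient {position} needs both a name and a quantity")

    return errors


egg = {
    "introduction": "A hard boiled egg is simple and nutritious.",
    "ingredients": [
        {"ingredient": "eggs", "quantity": "12"},
        {"ingredient": "water", "quantity": "2qt"},
    ],
    "preparation": [
        "Place eggs in pan and cover with water.",
        "Bring water to a boil.",
        "Remove from heat and cover for 12 minutes.",
        "Place eggs in cold water to stop cooking.",
        "Peel and serve.",
    ],
    "prep-time": "15 minutes",
    "serves": 6,
}

# The same recipe with prep-time left out, to show a violation being caught.
incomplete = {key: value for key, value in egg.items() if key != "prep-time"}

print(check_recipe(egg))         # []
print(check_recipe(incomplete))  # ['missing required field: prep-time']
```

The point of the sketch is only that a constrained structure gives software something definite to check. A free-form document format offers no equivalent place to hang a rule such as "every recipe states how many it serves."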

The Tyranny of the Terrible Troika: Rethinking Concept, Task, and Reference

Tom Johnson’s blog post “Unconscious Meaning Suggested from the Structure and Shape of Help” includes a graphic showing three shapes of content:

Tom Johnson’s “Shapes of Help” graphic.

These three shapes are meant to represent the DITA topic triad of concept, task, and reference. I didn’t get it. As I said in a comment on Tom’s blog, I was trying to match the shapes to something more specific. It was odd that I didn’t recognize them as concept, task, and reference, I said, because I have been “battling the tyranny of the terrible troika” for the last few years. Tom asked what I meant by “the tyranny of the terrible troika”; this is my answer.

The Design Implications of Tool Choices

Every documentation tool has a built-in information design bias. When you choose a tool, be it FrameMaker, DITA, AuthorIt, a wiki, or SPFE, you are implicitly choosing an approach to information design. If you don’t understand and accept the design implications of your tool choice, as many people do not, you are setting yourself up for expense, frustration, and disappointment.

The Segmentation of Tech Comm

There is a growing segmentation of the tech comm profession.

I was flattered that my post Technical Communication is not a Commodity was used as a catalyst for Scott Abel’s discussion with Val Swisher, Jack Molisani and Sarah O’Keefe on The Changing Face of Technical Communications, What’s Next? I had a fair amount to say in the comment stream that followed to defend my assertion that Tech Comm is indeed not a commodity, but since then a few other interactions have convinced me that there is another important trend in tech comm that should be recognized: the growing segmentation of the field.

Introducing the SPFE Architecture

Today, I am announcing the launch of a new website, SPFE.info. SPFE.info is a site about the SPFE architecture for building structured authoring systems. Why would the world need such a thing when it already has DITA? The site will attempt to answer that. Why have I spent the last 15 years or so working on what I now call SPFE? That I will try to explain here.