DocBook resurgent: what it tells us about structured writing and component content management

A new XML-based content management system that is not based on DITA. Bet you didn’t see that coming. But I think it tells us something interesting about the two sides of structured writing.

Tom Johnson’s recent sponsored post explains the origins of Paligo, a relatively new CCMS out of Sweden. Paligo was developed by a company that had formerly been a DITA consulting shop, in an attempt to create something easier to use (and less expensive) than the DITA solutions they had been implementing.

What is interesting to me about Paligo is that they chose DocBook rather than DITA as the underlying XML vocabulary. Why? Quoting Tom:

And it turns out that building on a foundation of Docbook XML is considerably easier than building with DITA. DITA tends to impose more restrictions about what you can and can’t do, Svensson says. Even so, Paligo is only “based on Docbook.” Paligo extends from this foundation, adding what they need and not letting the content model restrict the system, while maintaining full capability to export to the open standard.

This is interesting because DocBook is a large and complex specification. (I want to say larger and more complex than DITA, but I’m not sure if that is true anymore.) Why use it as the basis for what is supposed to be a system that is easier to use than DITA?

The answer seems to be lack of restriction. DocBook may be as large and as complex as DITA, if not more so, but it is much less restrictive. DITA has lots of restrictions on what you can and cannot do in each topic type. DocBook has very few. You can combine DocBook elements in just about any way that might occur to you. And apparently Paligo loosens DocBook even further.

Why is this significant? In a CCMS (Component Content Management System), the whole point of the system is to let you assemble documents out of pieces. The main benefit of this is content reuse. The main obstacle is composability: the ability to put pieces together and have them work.

Composability is an interesting problem. Lego and Meccano sets both have composability within their own systems, but there is little composability between Lego and Meccano: you cannot freely exchange the bits. Composability is similarly a problem between the many different file types used across a typical enterprise. If you want to practice component content management on an enterprise scale, you need a composable format across the enterprise.

There are two ways to get this. One is to allow each group in the enterprise to have their own format, but insist that it must be transparently convertible into a common composable format. The other is to get everyone to use the common composable format directly. The latter sounds easier, so that is what most people choose. (It is not always easier, but people only find that out later.)
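The first approach, in which each group keeps its own format but every format must convert transparently into a common composable one, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the "common format" here is just a list of (block type, text) pairs, and the two group formats and the names `from_markdownish`, `from_wiki`, `CONVERTERS`, and `assemble` are all hypothetical, not the API of any real CCMS.

```python
from typing import Callable

# Hypothetical common composable format: a list of (block_type, text) pairs.
Block = tuple[str, str]

def from_markdownish(source: str) -> list[Block]:
    """Convert a hypothetical Markdown-like group format: '# ' lines are titles."""
    blocks: list[Block] = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            blocks.append(("title", line[2:]))
        else:
            blocks.append(("para", line))
    return blocks

def from_wiki(source: str) -> list[Block]:
    """Convert a hypothetical wiki-like group format: '== x ==' lines are titles."""
    blocks: list[Block] = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("==") and line.endswith("=="):
            blocks.append(("title", line.strip("= ")))
        else:
            blocks.append(("para", line))
    return blocks

# Each group keeps its own format but registers a converter to the common one.
CONVERTERS: dict[str, Callable[[str], list[Block]]] = {
    "markdownish": from_markdownish,
    "wiki": from_wiki,
}

def assemble(components: list[tuple[str, str]]) -> list[Block]:
    """Compose one document from (format_name, source) pieces via the common format."""
    document: list[Block] = []
    for fmt, source in components:
        if fmt not in CONVERTERS:
            # A piece with no path into the common format cannot be composed.
            raise ValueError(f"no converter to the common format for {fmt!r}")
        document.extend(CONVERTERS[fmt](source))
    return document
```

Calling `assemble([("markdownish", "# Install\nRun the installer."), ("wiki", "== Setup ==\nEdit the config.")])` yields a single list of title and paragraph blocks, while a piece in an unregistered format raises an error. The design point is that composability lives entirely in the converters: each new group format adds a converter, and the assembler never has to know about it.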

To get everyone in the enterprise using a single composable format, you need it to be easy to use and flexible enough to serve everyone’s needs. What to choose?

DITA has been the default choice for a while now, but the problem is that DITA is not really easy to use. Several companies have tried to make DITA easier to use. (Tom mentions EasyDITA in his article.) But DITA comes with restrictions, and restrictions are hard to learn and annoying to comply with unless you really understand the point of them.

Lightweight markup languages such as Markdown and reStructuredText have become popular as well, with various CMS and publishing systems built around them. But while they are simple and easy to use, their simplicity can be limiting. There are things that occur in complex publications that they cannot easily represent.

DocBook offers a far richer set of markup structures that can represent all of these things, but without the restrictiveness of DITA. It makes sense, therefore, for a company like Paligo to choose it for their underlying document structure.

There is a rub here, though, and it has to do with the two sides of structured writing that I mentioned at the beginning. Those two sides are composability and constraint. I am writing a book about structured writing (currently being serialized on TechWhirl). That book focuses on the constraint side of structured writing.

The constraint side of structured writing is about expressing and enforcing constraints on content. It is about limiting and shaping what is written to meet a particular need. For example, it may constrain a recipe to follow a particular format and include particular pieces of information.

Here is a constrained version of a recipe:

recipe: Hard Boiled Egg
    introduction:
        A hard boiled egg is simple and nutritious.
    ingredients:: ingredient, quantity
        eggs, 12
        water, 2qt
    preparation:
        1. Place eggs in pan and cover with water.
        2. Bring water to a boil.
        3. Remove from heat and cover for 12 minutes.
        4. Place eggs in cold water to stop cooking.
        5. Peel and serve.
    prep-time: 15 minutes
    serves: 6
    wine-match: champagne and orange juice
    beverage-match: orange juice
    nutrition:
        serving: 1 large (50 g)
        calories: 78
        total-fat: 5 g
        saturated-fat: 0.7 g
        polyunsaturated-fat: 0.7 g
        monounsaturated-fat: 2 g
        cholesterol: 186.5 mg
        sodium: 62 mg
        potassium: 63 mg
        total-carbohydrate: 0.6 g
        dietary-fiber: 0 g
        sugar: 0.6 g
        protein: 6 g
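To make "expressing and enforcing constraints" concrete, here is a minimal sketch that checks a recipe, held as a Python dict mirroring some of the fields in the hard boiled egg example, against a small rule set. The field names, `REQUIRED_FIELDS`, and `check_recipe` are hypothetical; a real structured writing system might express the same rules in a schema rather than in hand-written code.

```python
# Hypothetical constraint set for a recipe; not taken from any real schema.
REQUIRED_FIELDS = ("title", "introduction", "ingredients", "preparation",
                   "prep_time", "serves")

def check_recipe(recipe: dict) -> list[str]:
    """Return a list of constraint violations (empty if the recipe conforms)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in recipe]
    for row in recipe.get("ingredients", []):
        # Each ingredient row must name an ingredient and give a quantity.
        if not row.get("ingredient") or not row.get("quantity"):
            problems.append(f"incomplete ingredient row: {row}")
    if "preparation" in recipe and not recipe["preparation"]:
        problems.append("preparation must contain at least one step")
    serves = recipe.get("serves")
    if serves is not None and (not isinstance(serves, int) or serves < 1):
        problems.append("serves must be a positive whole number")
    return problems

egg = {
    "title": "Hard Boiled Egg",
    "introduction": "A hard boiled egg is simple and nutritious.",
    "ingredients": [{"ingredient": "eggs", "quantity": "12"},
                    {"ingredient": "water", "quantity": "2qt"}],
    "preparation": ["Place eggs in pan and cover with water.",
                    "Bring water to a boil.",
                    "Remove from heat and cover for 12 minutes.",
                    "Place eggs in cold water to stop cooking.",
                    "Peel and serve."],
    "prep_time": "15 minutes",
    "serves": 6,
}

print(check_recipe(egg))  # []
```

A recipe missing its introduction, or listing an ingredient without a quantity, would come back with a list of specific complaints. That is the constraint side in miniature: the rules shape what the writer must supply, independent of how the pieces are later composed.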

Writing Excellence Through Domain Awareness

A little while back, Tom Johnson posted an article entitled Seeing things from the perspective of a learner, in which he says, “The balance between knowing and not knowing is the tension that undergirds the whole profession of technical writing.”

I think that is absolutely correct. The point, after all, is to assist the reader on their journey from ignorance to knowledge. I say assist, because this is not a journey that can be accomplished simply by reading. The reader has work to do to integrate their knowledge. They need to get their hands dirty. But sympathy with the troubles and perils of that path is, at the very least, highly useful to the writer.

Subject First; Context Afterward

In communication, they say, context is everything. Actually, “everything” consists of context and subject. Useful information is subject in context. The question is, which comes first: context or subject?

In the book era, the content search pattern was: context first, subject afterward. Suppose, for example, that you deliver three different products and have released three versions of each. Assuming a single manual per product and version, that meant you had nine manuals, each with a page on feature X.
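The arithmetic above (three products times three versions gives nine manuals) can be modeled to contrast the book-era lookup with the subject-first lookup the title points toward. This is a sketch only: the product names, version numbers, page records, and the functions `context_first` and `subject_first` are hypothetical, and the code models nothing more than the order in which the two searches apply subject and context.

```python
from itertools import product

# Hypothetical library: three products, three versions each, one manual per
# product/version, and each manual has a page on feature X.
PRODUCTS = ("A", "B", "C")
VERSIONS = ("1.0", "2.0", "3.0")
pages = [{"product": p, "version": v, "subject": "feature X"}
         for p, v in product(PRODUCTS, VERSIONS)]

def context_first(product_name: str, version: str, subject: str) -> list[dict]:
    """Book-era pattern: pick the manual (context) first, then look up the subject."""
    manual = [pg for pg in pages
              if pg["product"] == product_name and pg["version"] == version]
    return [pg for pg in manual if pg["subject"] == subject]

def subject_first(subject: str, **context: str) -> list[dict]:
    """Search pattern: find every page on the subject, then narrow by context."""
    hits = [pg for pg in pages if pg["subject"] == subject]
    for key, value in context.items():
        hits = [pg for pg in hits if pg[key] == value]
    return hits

print(len(pages))                                    # 9
print(len(subject_first("feature X")))               # 9
print(len(subject_first("feature X", product="B")))  # 3
```

Both functions can end at the same page; the difference is that the subject-first reader starts from all nine candidate pages on feature X and then needs the page itself to establish which product and version it applies to.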

Safari Flow and the EPPO-fication of Books

Summary: Safari Flow represents a move to Every Page is Page One navigation for books, but its success is limited when the content is not written in Every Page is Page One style.

At Tom Johnson’s suggestion, I have recently subscribed to Safari Flow. Safari Flow is a new take on the Safari Books Online concept, which allows you to rent online access to a large library of technical books. What makes Safari Flow different? Essentially, it takes an Every Page is Page One approach to the navigation of the content it provides.

Structured Writing is Essential for Developer Docs

Tom Johnson wrote a post recently in which he questioned the value of structured writing for developer documentation. Needless to say, I disagree. But Tom and I are not really at odds here. Rather, he means something different by “structured writing” than I do.

Structured writing is about content quality, not publishing

What I mean by “structured writing,” and what structured writing has traditionally meant, is essential for developer docs. But it is something different from what “structured writing” has come to mean in tech comm of late, as we can see from Tom’s argument:

Passive vs. imperative linking

Summary: Writers worry about whether links will distract users. To discuss this concern, we need to begin by distinguishing between imperative links that command the reader to click and passive links that merely make finding ancillary material easier.

Tom Johnson wrote a post recently in which he raised an important question about linking, and referred to an earlier article of mine on the subject. When you refer to another document in a post or article, should you link to it immediately? Tom wrote:

Topics, Pages, Articles, and the Nature of Hypertext

What is the right word to describe a node of a hypertext?

What should we call the basic unit of information that we present to readers? Is it a page, a topic, or an article? (I’m going to take it as read that the answer is no longer “a book”. If you disagree, that’s what the comments are for.)

I raise this now because of Tom Johnson’s latest blog post, DITA’s output does not require separation of tasks from concepts, in which he makes the distinction between topics as building blocks and articles as finished output:

One reason so many people mistake the architecture of the source files with the architecture of the output files is because the term “topic” tends to get used for both situations. I prefer to call the output files “articles” rather than topics. An article might consist of several topics. Each of those topics might be of several different types: concept, task, or reference.

The Paradox of Help Quality

Why does help still kind of suck even after so many years?

Tom Johnson asks this poignant question in his post Do We Need a New Approach to Help? Why Are Users So Apathetic Towards Help after 50 Years of Innovation?

Tom provides a great survey of the trends and ideas in help design, starting with John Carroll’s seminal work on minimalism, and suggests multiple possible ways forward.

I think there is enormous promise in many of the paths Tom invites us to explore, but at the same time, I am struck by the need to recognize that there is a limit to how much help help can be, and a real danger in trying to do too much.

Structured Writing FOR the Web

Tom Johnson started the discussion with Structured authoring versus the web. Sarah O’Keefe and Alan Pringle took it up in Structured authoring AND the Web. My turn: Structured authoring FOR the Web.

One of my long-standing grievances is that structured authoring has been adopted piecemeal. Rather than approaching it holistically as a method that can provide a wide range of quality and efficiency benefits to the authoring process, people have tended to adopt it for a single purpose, and to use it only to the extent that it achieved that singular purpose.

A New Approach to Organizing Help

In his blog post, Two Competing Help Models: One-Stop Shopping or Specialized Stores?, Tom Johnson explores the dilemma posed by trying to put a company’s help information on the Web using the old tri-pane help paradigm. Do you combine all the help for your product lines into one site with a master TOC, or do you put up multiple small sites with separate TOCs?

Tom shows that there are significant problems with both approaches, but he looks at it only from the top-down perspective (which assumes that the reader first finds “the help” and then uses the search and browse mechanisms of the tri-pane help system to find specific pieces of content). Further problems become evident when we consider what is likely the more common case: that the reader Googles for a specific answer and lands not on “the help”, but on a specific page in the help.