Neil Perlin poses the question (and answers in the negative) in a response to my post “Time to Move to Multisourcing”. Perlin raises a number of points that deserve discussion. But first, a little clarification is needed.
The term “single sourcing” is used to mean a number of different things in tech comm and content strategy. Among them:
- Producing PDF and Web pages (or help) from the same source file or files.
- Content reuse (in other words, storing a single source for a piece of information and publishing it in various places).
- Using a single repository/file format for all content.
These three things are independent of each other, though they may well be used together. For instance, you can produce PDF and help from the same source files stored locally on different workstations running different software, and you can use a single repository and file format without reusing content or outputting to more than one format (as many people using a Web CMS or Wiki do).
My original post was about the third of these, using a single source repository and file format for all content. This has been a common approach in technical communication and content strategy over the past decade particularly, and I argued that it was time to move away from it. By itself, my argument had nothing to do with either the first or second meaning of “single sourcing”.
Nonetheless, the ideas are related, because the first and second meanings of single sourcing have been common reasons for arguing for the use of a single repository and/or file format for content. If you wanted to perform these operations, the argument went, you needed sophisticated tools and content management systems, and that was best achieved with a single source format and a single repository.
That single source format/single repository model has several significant disadvantages, however. I outlined those in my original post on the subject. But since the single format/repository model was used in part to enable multi-format delivery and content reuse, does that mean that those things are dead if we move away from the single format/repository model?
In a word, no, since they can manifestly be done independent of it. But we have to think seriously about how we do them if we don’t have a single source format and a single repository. Going back to everyone using their own independent desktop tools and storing the files on their own hard drives has all sorts of well documented disadvantages, and is a non-starter if and when you move to an integrated web strategy, semantic content, or Information 4.0. So if the single source/single format approach isn’t the right one either, we have a legitimate question about how we do multi-format publishing and content reuse going forward.
Perlin divides his post into two parts, first reviewing the major pain points of single sourcing and suggesting solutions, and then questioning whether these pain points can be addressed in a multi-source, shared-pipes model. Since those points clearly need to be addressed in any alternate model, I’ll look at each of them in turn. The headings below mirror those in Perlin’s post.
Perlin argues that most companies won’t spring for high-end single-sourcing tools like MadCap Flare for all authors, forcing contributors to use Microsoft Word, which, he says, is “lacking the flexibility and output options of full-power single-sourcing”.
The solution Perlin proposes is simple: Buy the appropriate tools for everyone who needs them.
But there are a couple of problems with this, beyond the unwillingness of companies to pony up the cash. First, these tools are unfamiliar to most of the people who would be asked to use them, and they are conceptually more complex than Word. That introduces a training overhead and adds complexity to the everyday writing task. And if the contributors don’t use those tools full time, they will forget most of the skills they were trained in.
Giving everyone more complex tools is not really a sustainable strategy, nor is it one that is easy to sell.
Perlin argues that many current problems with single sourcing arise because writers are not properly trained to use the tools they have. The solution: more training.
I’m not a fan of this argument in any context. There are certainly tools that require training, supervision, qualification, and even periodic recertification. But these are expensive things. Whenever someone blames poor training for a problem, my first instinct is to suspect that poor system design is the real culprit.
Here’s the thing about multiple output formats and content reuse: unless each format is prepared entirely by hand (in which case, why do you need any special tools?) you are asking writers to prepare structured content that will be processed by algorithms to create different documents and/or different formats on the output side. Whether or not your tools identify as structured writing tools, they require specific structures in order to work.
If such a system is going to fail, it will be because the writers did not create the structures that the algorithms were expecting. If they are going to require additional training, that is what they are going to need additional training on.
But whenever you are asked to prepare data to a specific format, there are two factors that are crucial to success: guidance and validation. Writers need clear guidance on the structures to be created and clear and immediate feedback when they make a mistake. If you don’t give people clear guidance and clear and immediate feedback, no amount of training is going to make them perform the task consistently, and if you do, relatively little training will be required. More training isn’t the answer. Better system design, in the form of more robust guidance and validation, is.
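As a concrete illustration of what validation with immediate feedback looks like, here is a minimal sketch in Python. The “task topic” structure and its required fields are invented for illustration, not drawn from any particular tool:

```python
# Minimal sketch of authoring-time validation: check a topic against the
# structure the downstream algorithms expect, and report problems
# immediately, while the writer can still fix them cheaply.
# The "task topic" structure here is hypothetical.

def validate_topic(topic: dict) -> list:
    """Return human-readable problems; an empty list means the topic is valid."""
    problems = []
    for field in ("title", "context", "steps"):
        if not topic.get(field):
            problems.append(f"missing or empty field: {field}")
    return problems

draft = {"title": "Replace the filter", "steps": ["Remove the cover."]}
for problem in validate_topic(draft):
    print(problem)  # prints: missing or empty field: context
```

The point is not this particular check, but that the feedback arrives at writing time, not in the appearance of the final output.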
“People often have no standards to follow when it comes to using their tools – no templates for different types of material, or style usage standards, for example,” Perlin argues.
Here, of course, we are in complete agreement (see my note on the previous point). The question is, how do you define standards, where do you implement them, and how do you validate compliance with them?
Perlin goes on to recommend that we should “embed the standards into the authoring tools as much as possible to make their use automatic.” Here again we are in violent agreement. But the catch here is the extent to which your tools make it possible to do this. In standard commercial tools today, that capability is limited, as Perlin’s examples show:
For example, create topic-type templates with embedded prompts – “type the list of tools here” – to guide authors as they write. Or create a stylesheet with clear style names and make it the project’s master stylesheet so that it will be applied automatically to every topic.
That’s about as much of that as a standard unstructured tool like Flare is capable of supporting. But there are severe limitations here:
- Templates and embedded prompts get overwritten with content, so the structure is not retained and is not available to guide subsequent writers and editors.
- There is no ability to validate compliance with the template.
- There is very limited ability to factor out invariant pieces of content, which is a key feature of true structured writing approaches.
For these reasons, variations from the standard can easily creep in, especially over time. You are still left relying more on training than on guidance and feedback.
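To make the last of those limitations concrete, here is a small Python sketch of what factoring out an invariant piece of content looks like in a structured approach. The hazard flag and the warning wording are invented for illustration:

```python
# Illustrative only: the writer supplies just the variable facts; the
# invariant wording lives in one place and is inserted by the build,
# so it cannot drift from topic to topic.

HIGH_VOLTAGE_WARNING = (
    "WARNING: Disconnect power before servicing. "
    "High voltage may be present."
)

def render_procedure(source: dict) -> str:
    """Assemble a procedure from structured source data."""
    lines = [source["title"]]
    if source.get("hazard") == "high-voltage":
        # The writer never types the warning; they only flag the hazard.
        lines.append(HIGH_VOLTAGE_WARNING)
    lines.extend(f"{n}. {step}" for n, step in enumerate(source["steps"], 1))
    return "\n".join(lines)

print(render_procedure({
    "title": "Replacing the power supply",
    "hazard": "high-voltage",
    "steps": ["Unplug the unit.", "Remove the cover."],
}))
```

In a template-based tool, the writer would copy and possibly mangle the warning text each time; here the structure carries only the fact that a hazard exists, and the boilerplate is applied consistently by the algorithm.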
Here I think it is useful to quote Perlin in full:
Single-sourcing requires many tasks beyond just writing the content. Authors have to decide which output is primary in order to decide which features to use because some won’t work well or at all on different outputs. That means understanding those features. Authors have to create and assign conditions to control which content to use for which output. Define re-usable chunks of content. Create style sheets that behave differently depending on the output. Perhaps define microcontent. And more. And this all must be documented somewhere for reference by the current authors and later ones.
The result? The increasing power of our tools and increasing customer demands are leading to increasingly complex projects that can easily go out of control.
The solution? Again, simple. Document your project.
I agree absolutely with Perlin that complexity is the central problem, for all of the reasons he lists, and more. Managing complexity is the central theme of my new book, Structured Writing: Rhetoric and Process (on sale today). In it I talk about how you can use structured writing to redirect much of the complexity of content management and publishing away from writers, allowing them to focus more of their precious attention on research and writing.
And this is where I disagree with Perlin. Documenting all of your complexity is not a good (or simple) solution. Documenting it does not remove it from the writer’s attention. It is better than not documenting it, but not by much. The writer still has to deal with it, still has to spend time and mental energy on it, and can still make mistakes in following the complex procedures you have documented. Much of this complexity can be factored out using the right structured writing techniques. (I would elaborate, but that would spoil the book for you.)
Lack of motivation on authors’ parts
This is a fundamental part of the problem. Single sourcing, for the most part, is a solution to someone else’s problem, not the writer’s. As a result, as Perlin writes: “Authors type their content and make sure it prints well and that’s that.”
Perlin’s solution is essentially imperative: make single sourcing a job requirement, explain why it is important to the company, and provide management oversight to ensure compliance.
This is indeed the best you can do with current tools, but I have two fundamental problems with this approach:
- With the best will in the world, people can’t reliably comply with a regime that benefits someone else rather than themselves unless they get clear, direct, and immediate feedback, which current tools don’t give them, because the only real feedback is the appearance of the final outputs.
- Management oversight can’t ensure compliance in the production phase of a process if it can only perceive compliance in the finished product. Assessing the finished product every time is too time consuming and error prone (you can miss things). And the effectiveness of management oversight decreases exponentially the longer the delay between when the writer writes and when the manager finds the defect.
What is the alternative? Use structured writing techniques to factor out the parts of the process that are not of interest to the writer, and institute a clear mechanical compliance regime to validate content as it is produced. In other words, remove the need for writers to be motivated to do single sourcing at all, and give them an environment that rewards them for the things they are motivated to do.
Alternatives to Traditional Single-Sourcing?
Here Perlin turns to the second part of his argument, whether the multi-source / shared pipes / structured writing approach is an effective alternative to conventional single sourcing tools.
It is one thing, after all, to talk about the advantages of the structured writing approach and to show how it provides a better alternative to the challenges of single sourcing. But it is another thing entirely to implement it.
On the implementation front, Perlin objects that the model “add[s] a complex black box in the center of the process, where the conversion and coding is done.” I think the choice of words is wrong here. This is not a black box, which is a term for a piece of the system you can’t inspect. This is quite the opposite. With structured writing, there are algorithms you have to create and they are entirely visible to you because you write them. It is the traditional single sourcing tools that are more of a black box.
But this is a quibble. The heart of Perlin’s objection is this: “In my view, the more that this conversion and coding can be pushed back upstream to the individual authors by giving them templates, style sheets, and other tools and leaving the black box central processor to the tool vendor, the easier life will be.”
This is, indeed, the heart of the argument. It is the fundamental question of where the complexity of your content system will be handled. Since the new book spends 500 pages on this issue, I won’t try to discuss it fully here. But here’s the crux of the question: whose life are we making easier?
In Perlin’s model, all of the complexity of making single sourcing work is pushed onto the writers: “the more that this conversion and coding can be pushed back upstream to the individual authors … the easier life will be”. Well, if all that work is pushed to the writers, it is not their lives that are being made easier, since all the work and the responsibility is being pushed onto them. If anyone’s life is being made easier, it is the tool builder’s.
But tools are supposed to make users’ lives easier, not the other way round. That may sound a bit snarky, but there is actually a very important reason why most current authoring tools are not designed to make the writer’s life easier. Understanding that reason is vital to understanding why the current documentation tool chains are the way they are, and thus why single sourcing, among other things, suffers from all the problems that Perlin outlines.
Software vendors have a problem. Computer memory, disk space, and computing cycles are as near free as makes no difference. It is extremely difficult to make money from selling the right to execute your software. With modern networks and servers, if you were just selling the right to execute, one copy of your product would be enough for the needs of most major corporations. That means you would be charging major corporations and individual users the same fee, and that is not a viable economic model for most tool vendors. It is one of the reasons that so much of the software that runs the Internet is open source, and why so many corporations contribute so actively to the open source movement: they need the infrastructure that open source provides, but which there is no viable economic model for selling.
If a software company wants to make money proportional to the amount of use that a customer makes of their product, they can’t do it selling the simple right to execute. They have to do it by selling seat time. If using your software means that the user has to sit in front of your user interface, you can make money by selling seat licenses. In this scenario, the more the user uses, the more they pay, which is the economic model you need to serve large and small customers alike. And this is exactly how every commercial authoring product out there is sold today.
In a structured writing system designed according to the principles I described in my previous post (and in far more detail in the book), most writers could work in a plain text editor most of the time. You would not need to buy extra seat licenses for additional contributors, or if you did they would be for relatively simple and inexpensive tools. Virtually all the processing would be done in batch. This does not present the vendor with a viable economic model.
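The batch model is easy to picture. The sketch below (in Python, with a deliberately trivial and hypothetical file convention of title-on-the-first-line) turns a directory of plain-text sources into HTML with no interactive tool in the loop:

```python
# A sketch of the batch model: plain-text sources go in, rendered output
# comes out, with no one sitting in front of a licensed GUI seat.
# The file layout and markup convention here are hypothetical.

from pathlib import Path

def build(src_dir: str, out_dir: str) -> int:
    """Render every .txt source to HTML; return the number of pages built."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for source in sorted(Path(src_dir).glob("*.txt")):
        text = source.read_text(encoding="utf-8")
        title, _, body = text.partition("\n")  # first line is the title
        html = f"<h1>{title}</h1>\n<p>{body.strip()}</p>\n"
        (out / f"{source.stem}.html").write_text(html, encoding="utf-8")
        count += 1
    return count
```

A real pipeline would validate the sources and do far more sophisticated rendering, but the economics are visible even in the toy: the writers need only a text editor, and the processing runs unattended.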
So, “the more that this conversion and coding can be pushed back upstream to the individual authors … the easier life will be” … for the tool vendor. Thus the tool vendors have no economic motive to fix the problems with single sourcing — problems that have remained more or less static for the past couple of decades.
And that also means that if you want to go the structured writing route, you are going to have to take on more of an integration role. You are going to need people in the roles of content engineer and information architect.
It does not mean that you are going to have to build an entire system by yourself from scratch. Many of the horror stories you hear about people who got into trouble with in-house systems that they could not maintain come down to three issues:
- Most of them were built before most of the common components that are readily available today were created, meaning that they built almost everything from scratch.
- They were not conceived of as living systems that would need to be maintained, but as one-off static tools. The roles of content engineer and information architect were not created or staffed.
- They were, for the most part, generic document domain systems that did not really do much that commercial and open source projects didn’t come to do, and eventually do better. Switching to an available commercial or open source implementation of the same fundamental architecture was the logical thing to do.
Today there is a rich collection of tools and standards available (largely created to run the Web, which is to say, to build and deliver content systems). With the right roles defined and the right system design, you can construct an appropriate custom system using these components. People do it every day, and at every scale. For examples of people doing it for technical documentation, you have only to look at the Docs as Code movement. For instance, check out this post from Tom Johnson: How I handled data for about 10 device specifications on the same page — the advantages of a flexible, customizable web-based framework like Jekyll.
Are there some holes in this tool set? Yes there are. It is probably going to require an open source effort to fill those holes, since there isn’t much economic incentive for vendors to fill them. One missing piece is a lightweight semantic markup language. That is why I am working on the SAM project.
Finally, Perlin addresses the “subject domain” model that I mentioned in my post (and that I develop in great detail in the book). Perlin says:
In my view, this model can be handled by creating information-type templates for authors to use. We generally think of templates as specific to types of information/topics, but there’s no reason why templates can’t be applied to specific domains of information as well.
Perlin is entirely correct that you can do templates in the subject domain, and that can provide useful guidance on occasion. But it only scratches the surface of what you can do with subject domain structured writing. By itself, subject domain templating suffers from all of the disadvantages of any other form of templating: The structure is not captured explicitly in the content and is invisible to subsequent editors. It cannot be validated mechanically. It cannot be used by algorithms for downstream processing (to drive things like reuse and single sourcing). It cannot be used to factor out any of the complexity that is pushed back to the writer by the desktop publishing model.
The subject domain is about far, far more than this. Indeed, it would require a book to describe all it can do. (On sale today!)
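As a small taste, here is a Python sketch of the basic move: record facts about the subject once, in the subject domain, and let algorithms derive different document-domain presentations from the same record. The product and its field names are invented:

```python
# Illustrative sketch: a subject-domain record captures facts about the
# subject (an invented pump model), not how they will be presented.
# Different algorithms derive different document-domain outputs from
# the same record, which is what drives reuse and single sourcing.

SPEC = {  # hypothetical product data, stored once
    "model": "P-100",
    "max_pressure_bar": 6,
    "inlet_mm": 25,
}

def spec_table(spec: dict) -> str:
    """Derive a reference table for the datasheet."""
    return "\n".join([
        f"| Max pressure | {spec['max_pressure_bar']} bar |",
        f"| Inlet diameter | {spec['inlet_mm']} mm |",
    ])

def safety_note(spec: dict) -> str:
    """Derive an inline sentence for the installation guide."""
    return (f"Do not operate the {spec['model']} above "
            f"{spec['max_pressure_bar']} bar.")
```

Because the structure is explicit, it can be validated, edited safely by later writers, and reused anywhere, none of which a filled-in template can offer.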
But while putting together an effective subject domain structured writing system is now much easier than it used to be, it is still more of an adventure than buying a commercial system. Structured writing is an easier place to live, but a harder place to get to. There is still work to do to change that. Until that work is done, Perlin’s point about life being easier with conventional tools still has some bite to it.
Where the argument for conventional tools really starts to break down is when we look beyond the standard manual and help system to bottom-up information architectures, semantically rich content, and the notion of Information 4.0. But that is a matter for another post. And for a rather large book.