Time to move to multi-sourcing

Single sourcing has been the watchword of technical communication for the last several decades. We have never fully made it work. A pair of seminal posts by prominent members of the community give me cause to hope that we may be ready to move past it.

Single sourcing is about the relationship between sources and outputs of content. Traditional publishing methods had a one-to-one relationship between sources and outputs, where each output was generated from a single source.

The inefficiencies of this model are evident, particularly where different outputs are just different formats of the same document, such as PDF and HTML. Single sourcing attempts to remove this inefficiency by combining content into a single source:

As Sarah O’Keefe comments in “Single-sourcing is dead. Long live shared pipes!”, the first of the two seminal posts I want to discuss:

[W]e have been trying for a world in which all authors work in a single environment (such as DITA XML, the Darwin Information Typing Architecture), and we pushed their content through the same workflow to generate output. We have oceans of posts on “how to get more people into XML” or “how to work with part-time contributors.” The premise is that we have to somehow shift everyone into the One True Workflow.

But there are two problems with this model. The first is that the source in this model is significantly more complicated than the individual sources in the original model. The added complexity comes from the need to support all of the potential outputs from a single source format and all of the management operations you want to perform on that content. But that complexity is imposed on everyone who writes any of the sources involved. Where people used to be able to write in Word or a simple visual HTML editor, they now have to write in Flare or DocBook or DITA or a complex CMS interface.

You can’t get everyone to agree that taking on this additional complexity is worth their while, and in practice it often slows down getting particular pieces of work done (even if it improves efficiency overall). So what happens is that some people drop out of the system, or refuse to sign on for it in the first place, or don’t use it for everything they do, leaving you with something that looks like this:

Life remains just as complex for the folks who continue to use the system, but the organization realizes fewer benefits from it because not everyone is using it.

Modern content demands have only made this situation worse. We now look to do more than simply issue content in multiple formats. We want to select and combine content to create different outputs for different audiences. We want to enhance content with metadata for consumption by intelligent downstream systems. We want to richly link and organize content both statically and dynamically. We want to control terminology, manage quality, and streamline localization. All of this leads to greater complexity in your source format in order to support all of these things:

This added complexity is only going to result in more defections, and dealing with the complexity, not to mention paying for the complex system, is only going to compromise your hoped-for ROI.

Is there another way?

O’Keefe suggests that the answer may lie in what she calls “shared pipes”. That is, a system in which content flows from many sources through a shared publication system and out to many different outputs.

This is a diagram I have been drawing for many years, and I love Sarah’s shared pipes analogy to describe it. It comes down to this:

There are many sources of content and many outputs, but the source material goes through a shared process in the middle that handles multiple outputs, selecting and composing for multiple audiences, adding rich metadata, rich static and dynamic linking, and all the rest.
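To make the shape of that diagram concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Document` class, the two source adapters, the single shared stage, and the two renderers are stand-ins I have invented for illustration, not a description of any particular product. The point is only that each source needs its own small adapter, while the stages and output renderers in the middle are written once and shared.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical common form that every source is converted into."""
    title: str
    body: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Source adapters: one per silo, each turning a native format into a Document.
def from_markdown(text: str) -> Document:
    lines = text.strip().splitlines()
    title = lines[0].lstrip("# ").strip()  # drop the leading "# " of a heading
    body = "\n".join(lines[1:]).strip()
    return Document(title, body, {"source": "markdown"})


def from_record(record: Dict[str, str]) -> Document:
    return Document(record["name"], record["text"], {"source": "database"})


# A shared stage: written once, applied to content from every source.
def add_word_count(doc: Document) -> Document:
    doc.metadata["words"] = str(len(doc.body.split()))
    return doc


# Output renderers: also written once and shared.
def to_html(doc: Document) -> str:
    return f"<h1>{escape(doc.title)}</h1>\n<p>{escape(doc.body)}</p>"


def to_text(doc: Document) -> str:
    return f"{doc.title.upper()}\n\n{doc.body}"


def publish(docs: List[Document],
            stages: List[Callable[[Document], Document]],
            renderers: Dict[str, Callable[[Document], str]]) -> Dict[str, List[str]]:
    """Run every document through the shared stages, then through every renderer."""
    results: Dict[str, List[str]] = {name: [] for name in renderers}
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        for name, render in renderers.items():
            results[name].append(render(doc))
    return results


docs = [
    from_markdown("# Install\nRun the installer."),
    from_record({"name": "FAQ", "text": "Ask us."}),
]
out = publish(docs, [add_word_count], {"html": to_html, "text": to_text})
```

Adding a new silo in this sketch means writing one more adapter. Adding a new output means writing one more renderer. Neither change touches the other sources.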

But wait, don’t all those separate sources constitute that great boogeyman of the content industry, silos? Yes, they absolutely do, and that brings me to the second seminal post, “Don’t Dismantle Data Silos, Build Bridges” by Alan Porter. Porter begins by noting the reluctance of people to throw out their current way of doing things in favor of the one great single sourcing system:

Let’s face it: no one is going to throw out their incumbent systems just because we say they should. Especially not if those systems are still doing the job they were purchased to do. We have all worked with systems that are “good enough” to fulfill a specific set of tasks.

Removing and replacing existing systems isn’t quick or cheap, but the biggest hurdle isn’t budget, or technology (although that’s what’s often cited) — it’s human.

The human element is indeed crucial. Technically, the single sourcing model might work very well if it didn’t have to be used by humans. But, as Porter notes, humans build systems to suit their own work and they are not willing to give them up to make someone else’s work easier.

Nor should they, since the quality of a person’s work very much depends on the suitability of their tools to the task at hand. You don’t do your best brain surgery with tools designed for trimming hedges. The current systems that people are using may not be the best possible system they could be using, but at least those systems are specific to the work they are doing. They are comfortable. Perhaps we could design them a system that is even more comfortable, but if so it will be a system more specific to their task, not a single gigantic complex system designed to be a single source of everything.

So, a multi-source system lets us keep each source system focussed on the needs of the individuals who contribute to it. We then collect information from all those systems and run it through a shared publication process to produce whatever outputs we need. As Porter says:

Each customer interfacing system can still stand alone and address the needs of a particular line of business, or be an enterprise single source of truth. Yet by passing data between them, or existing enterprise business systems, they can be the foundation of a fully connected continuous customer experience.

Of course, it is not quite so easy as that. Setting aside the technical issues of actually connecting the various systems together, we are still left with the issue of whether the content coming from each of these systems has enough structure and metadata attached to it for the shared pipes to actually perform all the operations we need them to perform.

As O’Keefe points out, you don’t actually need every source to contain the structures to perform every system function:

not all content needs the same level of quality review. Instead, we’ll have make a distinction between “quick and dirty but good enough” and “must be the best quality we can manage.”

Thus it might be perfectly fine for some of your content to be written in a simple format like Markdown and pass through your shared formatting pipe without necessarily passing through every other function and process of your central system.
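One way to picture this is a pipeline in which each stage declares which quality tiers must pass through it. The sketch below is purely illustrative: the tier names `quick` and `best`, the two stage functions, and the `PIPELINE` table are all assumptions of mine, not anything described in O’Keefe’s post.

```python
from typing import Callable, List, Set, Tuple

Stage = Callable[[str], str]


def check_terminology(text: str) -> str:
    # Stand-in for a real terminology check: normalize one preferred spelling.
    return text.replace("e-mail", "email")


def format_output(text: str) -> str:
    # Stand-in for the shared formatting pipe: trim and end with one newline.
    return text.strip() + "\n"


# Each entry pairs a stage with the tiers that are required to pass through it.
PIPELINE: List[Tuple[Stage, Set[str]]] = [
    (check_terminology, {"best"}),
    (format_output, {"quick", "best"}),
]


def run(text: str, tier: str) -> Tuple[str, List[str]]:
    """Apply only the stages that the given tier requires, in pipeline order."""
    applied: List[str] = []
    for stage, tiers in PIPELINE:
        if tier in tiers:
            text = stage(text)
            applied.append(stage.__name__)
    return text, applied


quick_result = run("  Send e-mail  ", "quick")
best_result = run("  Send e-mail  ", "best")
```

Here the quick-and-dirty content gets formatted and nothing more, while the best-quality content also goes through the terminology stage.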

But what about the content that does need to pass through those functions? Can we support this without making every individual silo format as complex as our single source format?

Clearly if the silo formats have to be as complex as the single source format, we have accomplished very little. Even if each of these super-complex formats is specific to its silo, it is not likely to be supported by the existing system, nor is it likely to be comfortable for the individual contributor.

How then do we keep the formats of the individual silos small while still supporting all of the functions of the central publishing system for all the types of content that need it?

The answer lies in structured writing, particularly in the style of structured writing that I call “subject domain.” In subject-domain structured writing, the markup you add to the text is specific to the subject matter. Here, for example, is a recipe in a subject-domain structured writing format:

    recipe: Hard-Boiled Egg
        introduction:
            A hard-boiled {egg}(food) is simple and nutritious.
        ingredients:: ingredient, quantity, unit
            eggs, 12, each
            water, 2, qt
        preparation:
            1. Place eggs in {pan}(utensil) and cover with water.
            2. {Bring water to a boil}(task).
            3. Remove from heat and cover for 12 minutes.
            4. Place eggs in cold water to stop cooking.
            5. Peel and serve.
        prep-time: 15 minutes
        serves: 6
        wine-match: champagne and orange juice
        beverage-match: orange juice
        nutrition:
            serving: 1 large (50 g)
            calories: 78
            total-fat: 5 g
            saturated-fat: 0.7 g
            polyunsaturated-fat: 0.7 g
            monounsaturated-fat: 2 g
            cholesterol: 186.5 mg
            sodium: 62 mg
            potassium: 63 mg
            total-carbohydrate: 0.6 g
            dietary-fiber: 0 g
            sugar: 0.6 g
            protein: 6 g
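To show why this kind of markup is useful downstream, here is a small Python sketch that pulls the subject annotations out of two sentences from the recipe. It assumes that every annotation takes the form {phrase}(type), as in {egg}(food). The function names and the regular expression are mine, written for illustration, and are not part of any published parser for this format.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches {phrase}(type), for example {egg}(food).
ANNOTATION = re.compile(r"\{([^}]+)\}\(([^)]+)\)")


def extract_annotations(text: str) -> Dict[str, List[str]]:
    """Group annotated phrases by their subject type."""
    found: Dict[str, List[str]] = defaultdict(list)
    for phrase, kind in ANNOTATION.findall(text):
        found[kind].append(phrase)
    return dict(found)


def strip_annotations(text: str) -> str:
    """Return the plain reading text with the annotation markup removed."""
    return ANNOTATION.sub(r"\1", text)


sample = (
    "A hard-boiled {egg}(food) is simple and nutritious.\n"
    "{Bring water to a boil}(task).\n"
)

annotations = extract_annotations(sample)
plain = strip_annotations(sample)
```

The author has only said that "egg" is a food and that "Bring water to a boil" is a task. What to do with that information, such as linking, indexing, or ignoring it, is left to the shared process.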