Time to move to multi-sourcing

By | 2018/04/06

Single sourcing has been the watchword of technical communication for the last several decades. We have never fully made it work. A pair of seminal posts by prominent members of the community give me cause to hope that we may be ready to move past it.

Single sourcing is about the relationship between sources and outputs of content. Traditional publishing methods had a one-to-one relationship between sources and outputs, where each output was generated from a single source.

The inefficiencies of this model are evident, particularly where different outputs are just different formats of the same document, such as PDF and HTML. Single sourcing attempts to remove this inefficiency by combining content into a single source:



As Sarah O’Keefe comments in “Single-sourcing is dead. Long live shared pipes!”, the first of the two seminal posts I want to discuss:

[W]e have been trying for a world in which all authors work in a single environment (such as DITA XML, the Darwin Information Typing Architecture), and we pushed their content through the same workflow to generate output. We have oceans of posts on “how to get more people into XML” or “how to work with part-time contributors.” The premise is that we have to somehow shift everyone into the One True Workflow.

But there are two problems with this model. The first is that the source in this model is significantly more complicated than the individual sources in the original model. The added complexity comes from the need to support all of the potential outputs from a single source format and all of the management operations you want to perform on that content. But that complexity is imposed on everyone who writes any of the sources involved. Where people used to be able to write in Word or a simple visual HTML editor, they now have to write in Flare or DocBook or DITA or a complex CMS interface.

The second is that you can’t get everyone to agree that taking on this additional complexity is worth their while, and in practice it often slows down getting particular pieces of work done (even if it improves efficiency overall). So what happens is that some people drop out of the system, or refuse to sign on for it in the first place, or don’t use it for everything they do, leaving you with something that looks like this:

Life remains just as complex for the folks who continue to use the system, but the organization realizes fewer benefits from it because not everyone is using it.

Modern content demands have only made this situation worse. We now look to do more than simply issue content in multiple formats. We want to select and combine content to create different outputs for different audiences. We want to enhance content with metadata for consumption by intelligent downstream systems. We want to richly link and organize content both statically and dynamically. We want to control terminology, manage quality, and streamline localization. All of this leads to greater complexity in your source format in order to support all of these things:

This added complexity is only going to result in more defections, and dealing with the complexity, not to mention paying for the complex system, is only going to compromise your hoped-for ROI.

Is there another way?

O’Keefe suggests that the answer may lie in what she calls “shared pipes”. That is, a system in which content flows from many sources through a shared publication system and out to many different outputs.

This is a diagram I have been drawing for many years, and I love Sarah’s shared pipes analogy to describe it. It comes down to this:

There are many sources of content and many outputs, but the source material goes through a shared process in the middle that handles multiple outputs, selecting and composing for multiple audiences, adding rich metadata, rich static and dynamic linking, and all the rest.
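
To make the shape of this concrete, here is a minimal sketch, in Python, of what such a routing layer might look like. The source names, stage names, and output functions are all hypothetical, invented purely for illustration; a real system would plug in actual parsers, link resolvers, and renderers.

# A minimal, hypothetical sketch of a "shared pipes" router.
# Source names, stages, and output formats are invented for illustration.

def from_markdown(text):
    # silo-specific intake: wrap plain Markdown in a generic structure
    return {"type": "generic", "body": text}

def from_recipe_silo(text):
    # silo-specific intake: recipe content arrives already typed
    return {"type": "recipe", "body": text}

def add_metadata(doc):
    # shared stage: attach default metadata if the source supplied none
    doc.setdefault("audience", "general")
    return doc

def resolve_links(doc):
    # shared stage: placeholder for link resolution against a shared index
    return doc

def to_html(doc):
    return f"<article data-type='{doc['type']}'>{doc['body']}</article>"

def to_pdf(doc):
    return f"[PDF rendering of {doc['type']} content]"

SOURCES = {"markdown": from_markdown, "recipes": from_recipe_silo}
SHARED_STAGES = [add_metadata, resolve_links]
OUTPUTS = {"html": to_html, "pdf": to_pdf}

def publish(source_name, raw_text, targets):
    doc = SOURCES[source_name](raw_text)            # one of many sources
    for stage in SHARED_STAGES:                     # the shared pipe
        doc = stage(doc)
    return {t: OUTPUTS[t](doc) for t in targets}    # one of many outputs

if __name__ == "__main__":
    print(publish("markdown", "A hard-boiled egg is simple and nutritious.", ["html", "pdf"]))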

But wait, don’t all those separate sources constitute that great boogeyman of the content industry, silos? Yes, they absolutely do, and that brings me to the second seminal post, Don’t Dismantle Data Silos, Build Bridges by Alan Porter. Porter begins by noting the reluctance of people to throw out their current way of doing things in favor of the one great single sourcing system:

Let’s face it: no one is going to throw out their incumbent systems just because we say they should. Especially not if those systems are still doing the job they were purchased to do. We have all worked with systems that are “good enough” to fulfill a specific set of tasks.

Removing and replacing existing systems isn’t quick or cheap, but the biggest hurdle isn’t budget, or technology (although that’s what’s often cited) — it’s human.

The human element is indeed crucial. Technically, the single sourcing model might work very well if it didn’t have to be used by humans. But, as Porter notes, humans build systems to suit their own work and they are not willing to give them up to make someone else’s work easier.

Nor should they, since the quality of a person’s work very much depends on the suitability of their tools to the task at hand. You don’t do your best brain surgery with tools designed for trimming hedges. The current systems that people are using may not be the best possible systems they could be using, but at least those systems are specific to the work they are doing. They are comfortable. Perhaps we could design them a system that is even more comfortable, but if so it will be a system more specific to their task, not a single gigantic complex system designed to be a single source of everything.

So, a multi-source system lets us keep each source system focussed on the needs of the individuals who contribute to it. We then collect information from all those systems and run it through a shared publication process to produce whatever outputs we need. As Porter says:

Each customer interfacing system can still stand alone and address the needs of a particular line of business, or be an enterprise single source of truth. Yet by passing data between them, or existing enterprise business systems, they can be the foundation of a fully connected continuous customer experience.

Of course, it is not quite so easy as that. Setting aside the technical issues of actually connecting the various systems together, we are still left with the issue of whether the content coming from each of these systems has enough structure and metadata attached to it for the shared pipes to actually perform all the operations we need them to perform.

As O’Keefe points out, you don’t actually need every source to contain the structures to perform every system function:

not all content needs the same level of quality review. Instead, we’ll have [to] make a distinction between “quick and dirty but good enough” and “must be the best quality we can manage.”

Thus it might be perfectly fine for some of your content to be written in a simple format like Markdown and pass through your shared formatting pipe without necessarily passing through every other function and process of your central system.

But what about the content that does need to pass through those functions? Can we support this without making every individual silo format as complex as our single source format?

Clearly if the silo formats have to be as complex as the single source format, we have accomplished very little. Even if each of these super-complex formats is specific to its silo, it is not likely to be supported by the existing system, nor is it likely to be comfortable for the individual contributor.

How then do we keep the formats of the individual silos small while still supporting all of the functions of the central publishing system for all the types of content that need it?

The answer lies in structured writing, particularly in the style of structured writing that I call “subject domain.” In subject-domain structured writing, the markup you add to the text is specific to the subject matter. Here, for example, is a recipe in a subject-domain structured writing format:

 recipe: Hard-Boiled Egg
     introduction:
         A hard-boiled {egg}(food) is simple and nutritious.
     ingredients:: ingredient, quantity, unit
         eggs, 12, each
         water, 2, qt
     preparation:
         1. Place eggs in {pan}(utensil) and cover with water.
         2. {Bring water to a boil}(task).
         3. Remove from heat and cover for 12 minutes.
         4. Place eggs in cold water to stop cooking.
         5. Peel and serve.
     prep-time: 15 minutes
     serves: 6
     wine-match: champagne and orange juice
     beverage-match: orange juice
     nutrition:
         serving: 1 large (50 g)
         calories: 78
         total-fat: 5 g
         saturated-fat: 0.7 g
         polyunsaturated-fat: 0.7 g 
         monounsaturated-fat: 2 g 
         cholesterol: 186.5 mg 
         sodium: 62 mg 
         potassium: 63 mg 
         total-carbohydrate: 0.6 g 
         dietary-fiber: 0 g 
         sugar: 0.6 g 
         protein: 6 g 

Subject-domain structured writing has three major advantages for use in multi-sourcing shared pipe environments:

    1. Subject-domain formats are comfortable for users in the silos that create content on that subject because they are familiar with the terms and structures used. Contributors already think and work in the domain of the subject. Subject-domain markup simply formalizes the categories they already think in.
    2. Subject-domain formats contain the metadata required for most of the functions you need the central system to perform, since most of the decisions that the central system has to make are in fact based on knowing the type and subject matter of each piece of information. (I will expand on this in great detail in my book, Structured Writing: Rhetoric and Process, coming soon from XML Press.)
    3. Subject-domain structures allow you to express precise constraints on the structure of the content being created. This is important because if you are going to have an efficient central system drawing content from many sources, that system needs to know, with a high degree of reliability, exactly what it is getting from each source.
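
As a rough illustration of the second point in particular, here is a minimal Python sketch of how a central process might act on subject-domain structures like the recipe above. It assumes the recipe has already been parsed into a plain dictionary (the field names mirror the markup, but the parsing step is omitted); the point is that the same subject-domain fields can drive both a document-domain rendering and a metadata extraction.

# A minimal sketch of a shared pipe acting on subject-domain structures.
# It assumes the recipe above has already been parsed into a plain dictionary;
# the field names mirror the markup, but the parsing step itself is omitted.

recipe = {
    "title": "Hard-Boiled Egg",
    "introduction": "A hard-boiled egg is simple and nutritious.",
    "ingredients": [("eggs", "12", "each"), ("water", "2", "qt")],
    "prep-time": "15 minutes",
    "serves": "6",
}

def recipe_to_html(r):
    # document-domain rendering derived from subject-domain fields
    rows = "".join(
        f"<tr><td>{name}</td><td>{qty} {unit}</td></tr>"
        for name, qty, unit in r["ingredients"]
    )
    return (
        f"<h1>{r['title']}</h1>"
        f"<p>{r['introduction']}</p>"
        f"<table>{rows}</table>"
        f"<p>Prep time: {r['prep-time']}. Serves {r['serves']}.</p>"
    )

def recipe_index_entry(r):
    # metadata extraction for search, linking, or dynamic assembly
    return {
        "type": "recipe",
        "title": r["title"],
        "ingredients": [name for name, _, _ in r["ingredients"]],
    }

print(recipe_to_html(recipe))
print(recipe_index_entry(recipe))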

You can’t use the subject domain as the format for a conventional single source system because such a system has to support content on all kinds of different subjects. Thus conventional single sourcing systems use structures from what I call the document domain (whether that is in the form of a document-domain markup language like DocBook, or in the document-oriented structures of a proprietary format). And because the document domain does not contain enough information to run all the processes of a complex single sourcing system, these systems also add what I call management-domain structures for the additional information needed. It is this combination of document domain and management domain structures that makes such systems both complex and uncomfortable for most contributors to use.

Thus moving to the subject domain for your multi-source silos means you can have much simpler and more comfortable structures in your silos while still being able to run all the functions of a complex publishing system.

This relationship between the subject domain and the silo is reciprocal. In fact, when you move content to the subject domain, you are creating a silo just by adopting that format. The recipe markup above is great for marking up recipes. It is of no use at all for other kinds of content. Thus moving your recipes to subject-domain recipe markup means creating a recipe silo.

And that is just fine, because silos are a comfortable place for people to work and, if well managed, allow you to control content quality so that it both adds value and proceeds smoothly through your publishing process. A well designed silo contributes to the integration of systems, to the efficient use of shared pipes, by ensuring the quality of the data it produces.

This is fine when you are setting up content silos for yourself, which you definitely should be doing if you want to improve the quality of your content and the reliability of your content process, but what about all those existing systems that Porter warns us it will be difficult to get people to abandon? Actually, this is not necessarily as big a problem as it might seem.

Remember that reciprocal relationship between subject domain content and silos? Silos are built by creating structures specific to the subject matter you are writing about — subject domain structures. So when you examine your existing silos, you are often going to find that they already contain subject domain structures that, with perhaps a little format translation, you can direct straight to your common publishing process, just by teaching that process how to interpret that data.

For example, API reference content is often created in a silo using tools like JavaDoc or Doxygen. Here is an example of JavaDoc content:

/**
 * Validates a chess move.
 *
 * Use {@link #doMove(int theFromFile, 
 *                    int theFromRank, 
 *                    int theToFile, 
 *                    int theToRank)} to move a piece.
 *
 * @param theFromFile file from which a piece is being moved
 * @param theFromRank rank from which a piece is being moved
 * @param theToFile   file to which a piece is being moved
 * @param theToRank   rank to which a piece is being moved
 * @return            true if the move is valid, otherwise false
 */
boolean isValidMove(int theFromFile, 
                    int theFromRank, 
                    int theToFile, 
                    int theToRank) {
    // ...body
}

This markup was designed specifically for the JavaDoc tool. But guess what: those @param and @return tags are subject-domain markup. They are specific to the subject of an API routine. It is a trivial matter to take XML output from tools like this and feed it through your shared pipes to create API reference manuals that look the same as the rest of your documentation set and are also fully cross-linked with all your other docs.
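
As a rough sketch of what that translation might look like: the XML shape below is simplified and hypothetical (real Doxygen XML or Javadoc doclet output is more elaborate), but the idea of mapping generated reference output into a structure the shared pipes understand is the same.

# A sketch of pulling subject-domain structure out of generated reference XML.
# The XML shape below is simplified and hypothetical; real Doxygen or Javadoc
# doclet output differs, but the mapping idea is the same.
import xml.etree.ElementTree as ET

SAMPLE = """
<function name="isValidMove" return="boolean">
  <description>Validates a chess move.</description>
  <param name="theFromFile">file from which a piece is being moved</param>
  <param name="theFromRank">rank from which a piece is being moved</param>
  <param name="theToFile">file to which a piece is being moved</param>
  <param name="theToRank">rank to which a piece is being moved</param>
  <returns>true if the move is valid, otherwise false</returns>
</function>
"""

def to_reference_topic(xml_text):
    # map generated reference XML into the shared pipe's input structure
    fn = ET.fromstring(xml_text)
    return {
        "type": "api-reference",
        "name": fn.get("name"),
        "description": fn.findtext("description", "").strip(),
        "parameters": [(p.get("name"), (p.text or "").strip())
                       for p in fn.findall("param")],
        "returns": fn.findtext("returns", "").strip(),
    }

print(to_reference_topic(SAMPLE))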

This won’t always be the case, of course. Sometimes the existing silos will not contain the kinds of structures and metadata you need to drive all of the processes you need to apply to their content or data. In this case, you may need to negotiate with the keepers of those silos. But at least you will not be trying to get them to knock down their entire system and switch to using your complex single sourcing system. Instead, you can go to them and say, “We need these additional structures from you. What is the smallest change you can make to your current system to provide these structures, and what can we do to help you?”

As an added sweetener, you can point out that if they do this, they may no longer need the back-end publication part of their own system since they can publish through your shared pipes. That can significantly reduce their maintenance burden and make new functionality available to them without having to destroy the system they have worked so hard to fit to their current needs.

Finally, this is not just confined to existing systems that produce content. There are plenty of cases of systems that store subject-domain data for transactional purposes but don’t currently produce content at all. Often there is an entirely separate process in which writers look up information in these systems and create, by hand, content that reports that information. But once you have a shared pipes system that is capable of consuming subject-domain data and turning it into content, you can eliminate some or all of the manual content creation process around these systems.

Well-run silos connected by shared pipes through which flows subject-domain structured content. It is an idea whose time may finally have come.

18 thoughts on “Time to move to multi-sourcing”

  1. Alessandro Stazi

    Hi Mark. Very interesting article, as usual. The model that you have presented is not so unusual in other technological fields. For example, in the management of digital identities in big organizations, you have information stored on different target systems, for different transactional functionalities.
    Information can be queried/collected from all these systems by a “shared process in the middle”, as you call it. In the example of digital identities, this element is an “Identity Governance System”, where data arriving from different sources (the silos on the left, in your model) and used for different functionalities are filtered/translated/combined/transformed and “presented” to the user.
    The complexity of the system is commonly concentrated in the “shared process in the middle”, while the “silos” (target systems) can work freely, with their own rules/paradigms/approaches and different technologies. But in this schema, the links between the “silos” (on the left) and the “shared process in the middle” are also very important. Through these links, which we can call “connectors”, you can negotiate how the different information arriving from the silos has to be “re-arranged” to be manipulated according to the publication needs (tagged/profiled for multi-channel, multi-user, etc.). I have some years of experience in these types of architectures and it is a very engaging model, but not so unusual in the dimension of “enterprise applications”, where you typically find pre-existing systems/workflows/procedures that need to be harmonized but that you cannot erase to start from scratch with another unified process. I will be waiting for the evolution of this discussion.

    1. Mark Baker Post author

      Thanks for the comment, Alessandro. I agree, this model occurs in many places. Unfortunately, the content world frequently seems to be a step or two behind the rest of the world in its techniques and technology. Some of it, I suspect, comes from too much thinking by analogy, of trying to apply principles of physical organization, where centralization is of undoubted value, to digital organization, where identity is everything and location is irrelevant.

      1. Chris Despopoulos

        I agree with your assessment here Mark… Old-school thinking about publication prevails. But I also think publication technology lags because of revenue models. Outside of print or social media that drive ad sales, publishing is seen as a cost liability. And social media revenue seems to focus on video more than print/text- or data-driven bots.

        Where I work, the last feature to get into the product is online or inline explanation of the product. People are clamoring for product features… Database, IT management, medical records, etc. Nobody wakes up in the morning and says, “I really need to READ DOCS today… I would pay for a better product for that.” Well, almost nobody. But I think the numbers are growing.

        It’s like that continent-sized pile of plastic in the Pacific. Everybody knows it’s there, everybody knows it’s a problem… Who’s going to spend money to clean it up? If there was a clear path to revenue gain we’d have a list of candidates. But it’s not a gain, it’s prevention of long-term loss. Our economic model rewards putting off the problem until the cost is put upon everybody. Well, one day we WILL clean it up.

        1. Mark Baker Post author

          Chris, agreed, revenue models are a big part of it. They are a big part of it in another way as well, since they drive the design of most content tools. A distributed “shared pipes” system with simple front ends in multiple silos would allow writers to use simple editors and submit to simple local repositories. The shared pipes would then take over and collect and compile in the background.

          The problem with that model is that it is very tough for a vendor to make money from it. All the hard stuff takes place in the background with no one logged in. Computing cycles are as near free as makes no difference, so what exactly do you charge people for in that model?

          Software vendors essentially sell seats. To make any money, you need to sell more seats as your customers make more use of your software. If your customers can add users without buying more seats, it is difficult to make any money. So the only systems you are likely to find for sale are the ones that require a complex front end (usually WYSIWYG) and a constant connection to a back end repository (which is why tool vendors love DITA). That way, no one can work without tying up a seat and more use means you need more seats.

          But such a system design is incompatible with a shared pipes model. The parts and pieces you need to implement shared pipes, therefore, are unlikely to come from vendors. They are likely going to have to come from the open source community.

  2. Chris Despopoulos

    This is very much in sync with microservices and IoT. The idea is to distribute processes as components that can be joined together to form larger “applications”. Then you have a client that either talks to a single entry-point to the group of components, or else it talks to several components individually — this presents the “application” to the user.

    There’s no reason documentation should be treated differently. Multiple sources should be treated as a single body of work. HTML sites have been doing this with mashups for a while now. But this should grow. As process distribution takes over, then documentation source should stay with the distributed components. That is the only way I see to scale with the increase in complexity.

    One thing I’ve worked on is stitching together distributed content in the client. I can’t post a diagram, so you have to imagine… Instead of many circles – to one diamond – to many squares, you can have many circles to many squares, where your client connects with one group of sources, and my client connects with a different group. So now you just have webs of clients and sources.

    It gets more interesting. For example, one source I connect to is our social site. So now I have a client that connects to specific distributed doc sources, and also to a social site that manages its own silo of sources. The client understands a specific protocol, and some doc sources serve that directly (in my case, raw DITA or HTML). But the client also uses microservice components to map that client protocol to different source protocols. To bridge to a new silo, you just need a mini component that maps to the client. The component only knows how to request source content, and how to translate it to the client protocol. The client stitches it together, because only the client knows what constitutes a viable “document” for the given request.

    That expands your diagram further to have many sources (the circles) connecting to many mediators (the diamonds), and you have many clients (the squares) connecting to many circles and diamonds.

    1. Mark Baker Post author

      Thanks for the comment, Chris. I agree completely. I was deliberately vague about what the diamond represents, but what Sarah’s image of shared pipes suggests is that it is a kind of distribution system in which content passes through whichever pipe is appropriate for its source and destination types. Each process shares what it has in common and separates out what is unique to it.

      This does not necessarily mean that there is any one pipe, any one process, that every single piece of content passes through. In fact, as long as we are delivering to both static linear media and dynamic/hypertext media, content should probably not be traveling through the same pipes, as there is no document domain format that is ideal for both types of media.

      This is another reason to work in the subject domain, so that you can algorithmically optimize content for both linear and hypertext, and for both dynamic and static delivery from a multitude of sources. Working in the document domain always means you are working in a source that is optimized for one medium or another. True independence requires a step back from the document domain, and that automatically involves the use of multiple processing channels for multiple input sources to multiple output destinations. Managing that almost certainly requires some form of central direction and distribution, but not any one central format, process, or repository.

    2. Alessandro Stazi

      Hi Chris. When you speak about:
      “But the client also uses microservice components to map that client protocol to different source protocols. To bridge to a new silo, you just need a mini component that maps to the client.”
      … this is exactly my concept of a “connector”: not only a dedicated channel for connecting the customer source with the “shared process in the middle” (more briefly, the “core process”), but also an active element for mapping the differences between the data standard developed target-side and the core-process publication rules. Some months ago a colleague was presenting to me the need to build a “harmonized” system where the “core process” was a proprietary CCMS. This CCMS had to be fed by different doc sources, and it was not possible to oblige these sources (target systems) to produce docs in DITA format or S1000D or another standard. This was the very high-level design, and I don’t know if the project has been abandoned or not, but one of the most critical points was the development of the “connectors”, in many cases the trickiest point of these architectures. As you have expressed in your comments, the final consumer of the published result doesn’t care about the architecture used to produce it. So who wants to pay for developing these ideas? Very likely, only the few companies that have to manage large volumes of docs and have to speed up the process of publishing. As you have mentioned, if we look at IoT ecosystems as “content producers” we fall into the scenario of large volumes of content to be managed, and this could be a new frontier to explore.

      1. Mark Baker Post author

        Thanks for the comment, Alessandro. Re “it was not possible to oblige these sources (target systems) to produce docs in DITA format or S1000D or another standard”. Yes, the shoe has to be on the other foot. It has to be the responsibility of the central system to consume content in the format that the silos create, not the responsibility of the silos to create content in the format the central system consumes.

        What the central system must say to the silos is, I need the content you produce to contain structures with the following semantics because I need those semantics in order to perform the operations you are asking me to perform.

        The problem today is that most central systems express those semantic requests in the form of specific document domain and management domain semantics that are equivalent to asking for a specific format such as DITA or S1000D. If they would instead express those semantic requests in subject domain terms, their requests would be much less onerous to comply with.

        Of course, this means that we are throwing the responsibility back to the silos to create subject domain content, and that is certainly a major change for them. But it is a change to something that can greatly simplify their life and process and that requires them to know a great deal less about how the content system works. It is also the same type of change that they have had to make for every other kind of integration they have been involved in: providing rich data semantics specific to the domain. In a well regulated silo system, this is what every silo must do: expose its semantics for consumption by other silos.

  3. Diego Schiavon

    Hello Mark, thank you for this post.

    I am evaluating new software for our department. One of my requirements is that it must easily scale to other departments as well, without getting too much in their way.

    The barrier to change must be as low as possible. I do not want to go around preaching the benefits of the new system: it must fit into other people’s way of working as seamlessly as possible. Ideally, I would like to move away from a system specialized for technical writers to something less scary for casual users.

    So what you write about complex single-source systems vs. ad-hoc systems sounds about right, though I had not thought of multi-sourcing before.

    So what solutions would you suggest that support a multi-sourcing pattern?

    (SPFE, I know. I have not really tried it yet, but it is not really ready for production, or is it? So what other solutions, besides SPFE?)

    In some ways multi-sourcing was DITA’s core concept: a very general XML dialect that different companies or departments within a company can customize, then easily exchange customizations.

    But DITA is still XML, meant more for an audience of professional technical communicators. It is not really meant for the “part-time contributors” you mention above. I believe the only tool that really applies to “part-time contributors” is Microsoft Word.

    1. Chris Despopoulos

      Another approach that is not quite ready for production is Lightweight DITA. The idea is to A) Reduce the DITA element set and simplify the model and B) Implement the model in XML, Markdown, and HTML5. The goal is precisely what you mention… Lower the bar for part-time contributors.

      I’ll point out that current DITA does support Markdown to DITA in the Open Toolkit, and that oXygen does offer that support. But you’re still left with managing your source and implementing publishing streams. And implementing true multi-sourcing would be a DIY project.

      IMO, MS Word is going in the wrong direction… It’s a silo and it’s a proprietary format.

      Also beware… IMO the DITA content management systems I have bumped into can easily become their own silos. We use GIT to manage our DITA source.

      You might also look at Markdown and things like Jekyll to produce static sites. Maybe not a move toward multi-sourcing, but there’s interesting stuff in there.

      1. Diego Schiavon

        Hi Chris,

        Lightweight DITA seems interesting, I will look into it, particularly the HTML5 version. I had come across it before, and dismissed it, but I do not remember why.

        Markdown is a non-starter: people writing documents would have to learn it first, which is exactly what I am trying to avoid. And Oxygen is not exactly “lowering the bar”. People should just continue using what they already use: Word or a browser.

        Word is proprietary, but I was thinking of Simply XML Content Mapper, which uses Word as a DITA XML editor.

        I am not very enthusiastic about DITA, though. Most of the time I would have to work around the information types. Even Simply XML suggests using the general topic template, bypassing even the most basic specialization. I really wonder why they did not use DocBook, since they are bending DITA to behave like DocBook 5 anyway.

        Our content management system would be… SharePoint. I know, lame. But it fits in the strategy of using what is already there, and what people feel comfortable using.

        1. Mark Baker Post author

          Diego, this is what everyone wants, and what every vendor is trying to sell, but, for the reasons I outlined in my other reply, it is simply not possible. The way contributors work has to change for this model to be successful. The question is, how does it have to change, and how can we make such a change congenial to the contributors?

          Subject domain structured writing is, I think, the last thing that could possibly work that has not already been tried on a broad scale. It has been tried, successfully, in individual cases, but most of the industry is still trying to make document domain solutions work despite repeated failures.

          We know subject-domain data gathering works because it is how we do data gathering in every other field — using forms. The question is if we can make it work for content. I’m convinced the answer is yes, if we can get past the idea that nothing must change for users. That is simply a non-starter. The whole problem is one of reliable data gathering and that is a problem that exists at the interface between the contributor and the system. That interface has to change if anything is going to work on the back end.

    2. Mark Baker Post author

      Thanks for the comment, Diego.

      Yes, multi-sourcing was part of DITA’s core concept, in the sense that specialization was supposed to allow you to create different document types for different applications. This has run into three problems:

      • Specialization isn’t easy to do, either conceptually or technically.
      • Specialization mainly affects the document domain structures of the DITA model, not the management domain structures. Yet it is the management domain structures that make DITA difficult to learn and use on a casual basis. It is also the management domain features that constitute the argument for the value of basing everything on DITA rather than creating independent systems in each silo. In short, the DITA model fails to factor out the very operations that we need to factor out if we are going to make the shared pipes model attractive to those we are asking to share it.
      • The DITA world seems more interested in specializations created by committee for whole industries to use than in local specialization for individual silos to use in individual organizations.

      SPFE is not going to be ready for mainstream production use until it first gets some serious beta use. And it is not going to get serious beta use until I finish the docs and until I make the case in a compelling way. The EPPO book is my attempt to make the case for the bottom-up information architecture that SPFE is designed to support. The new book is my attempt to make the case for the subject-domain structured writing approach that SPFE is designed to support. And when that book is out I need to get back to the SPFE docs.

      As far as fitting into people’s work as seamlessly as possible is concerned, the fundamental constraints are these:

      1. The format people write in must be conformable to the shared pipes that will process it. You can take one structured format (Markdown for instance) and conform it to a shared pipe that expects a different format (DocBook for instance) as long as you can map the structures of one to the structures of the other. (A minimal sketch of this kind of mapping follows this list.) Not every system currently in use in the enterprise is necessarily going to be conformable to whatever formats you choose for your shared pipes, so at minimum you are going to need to change the habits and/or tools of every department that is not currently producing conformable content. This may be most of them, since things like ad-hoc formatting in Word make Word documents non-conformable to just about any process not based on Word.
      2. The format created by the silo must contain the metadata semantics necessary to drive each of the processes applied to it by the shared pipes. How big an issue this is depends on how many functions you want the shared pipes to perform for you (as opposed to those that will be performed by each silo before feeding content to the shared pipes). The more you want the shared pipes to do, the more metadata you will need to include (in a way that is conformable to the metadata required by the pipes).
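
      Here is that minimal sketch of structure mapping. It handles only a tiny, hand-rolled subset of Markdown (headings and paragraphs) and maps it to generic, DocBook-like element names; it is purely illustrative, not a real converter, and a production pipe would use a proper Markdown parser.

# A purely illustrative mapping from a tiny subset of Markdown (headings and
# paragraphs only) to generic, DocBook-like structures. Not a real converter;
# a production pipe would use a proper Markdown parser.

def markdown_to_structures(text):
    # split on blank lines and classify each block as a title or a paragraph
    structures = []
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("#"):
            level = len(block) - len(block.lstrip("#"))
            structures.append(("title", level, block.lstrip("#").strip()))
        else:
            structures.append(("para", None, " ".join(block.split())))
    return structures

def structures_to_xml(structures):
    # render the generic structures in the shared pipe's expected markup
    parts = []
    for kind, level, content in structures:
        if kind == "title":
            parts.append(f"<title level='{level}'>{content}</title>")
        else:
            parts.append(f"<para>{content}</para>")
    return "\n".join(parts)

sample = "# Hard-Boiled Egg\n\nA hard-boiled egg is simple\nand nutritious."
print(structures_to_xml(markdown_to_structures(sample)))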

      If you need a lot of metadata to drive a lot of processes in your shared pipes, then you are probably going to have to change the format that the authors in the silos use to create content.

      If you set up a shared pipes system that has few processes, and thus requires little metadata, it is not going to be clear to anyone what benefit they will get from the shared process. Indeed, the shared process may end up being less capable than the individual systems they are using now.

      If you want your shared system to be attractive, it is going to need to have a lot of features for people to take advantage of. If it is going to have a lot of features, it is going to have to collect a lot of metadata to drive those features. If it is going to collect a lot of metadata, it is going to require changing how people work today.

      This is why structured writing and common content infrastructure has always been such a hard sell. Either you give up features or you give up the familiar work environment.

      Subject-domain structured writing offers a way out here, not by hiding the need for metadata collection, but by using a form of metadata that is easy to collect. With a good subject-domain structured writing system you can hopefully sell your various silos on changing the way they work by giving them something that is easier to use than what they have now, and that requires them to think less about publication details and to focus more on their subject matter.

      But to make this truly easy, there are still parts of the puzzle that we need to develop and put in place.

  4. Paul Hanson

    A lot to process…

    Working on a project that is meant to break down the existing silos. Right now, you have to know who provides a service – oh, the information is available in a OneNote file, but you don’t have access – rather than being able to search for the service from a central point. The point about forcing teams to use a new system is spot-on. One team has a wiki; another team has a labyrinth of folders; another team has 1000 PDFs on a website that doesn’t have a way to search for the terms you want. There’s no universal style guide – the PDFs are Word documents that use a table to control the layout for procedures.

    The new system that is to be used? No snippet functionality. With a sample set of 8 documents, 58% of the content existed in more than 1 document; 55% of the graphics existed in more than 1 document. I can’t in good faith imagine how that can scale when we have 100s of knowledge articles to maintain. I can’t find anyone else who uses that system who cares about content reuse.

    Thanks for the great article!

  5. Haakon Dahl

    Great post. I don’t know beans about XML (I know what I know, and it’s hard and I don’t want to do it), but I have used the concept you describe in another way: federating databases, and for exactly the same reason. People have their chosen tools, and more often than not, use the tool itself as the data store and as the report. This is like drawing with crayons. Examples are keeping data in a running-update set of slides, or in a spreadsheet which cannot function as a list or table, because it is already partly aggregated.
    You’re never going to get these people to jump in the boat with you until they see some benefit. Another part of the strategy in your diagram (shared process in the middle) is to show them that their own update cycle, where the stuff in the squares hits the fan and then requires work in the circles, will get more manageable for having been partly normalized in the shared process. Why? Because the shared process can now take their neighbors’ data and give it to them in their chosen (broken) format. You can’t fix these people, but you can incentivize them to fix themselves. Once they begin to see a light at the end of their own tunnels, they may get onboard with fixing their own circles, to align with a better neighbor.
    The shared process in the middle is not only an effective tool for making the processes compatible (on some level), but there is a meta-process enabled: IMPROVEMENT.

    1. Mark Baker Post author

      Thanks for the comment, Haakon.

      In many ways, structured writing, which does not have to mean XML, is a database, though a hierarchical rather than a relational one, and so this process is really just another case of federating databases. And I agree about the value argument for the shared process. But the argument always has to be, “this will make your life easier” (and that argument actually has to be true). “This will make my life easier,” and “this will make everyone else’s life easier” arguments don’t cut it, and “do it because I said so” doesn’t produce good data even if you have the authority to say so.

  6. Pingback: Is Single-Sourcing Dead? – Every Page is Page One
