17 thoughts on “Time to move to multi-sourcing”

  1. Hi Mark. Very interesting article, as usual. The model you have presented is not so unusual in other technological fields. For example, in the management of digital identities in large organizations, you have information stored on different target systems for different transactional functions.
    Information from all these systems can be queried and collected by a “shared process in the middle”, as you call it. In the digital identity example, this element is an “Identity Governance System”, where data arriving from different sources (the silos on the left, in your model) and used for different functions are filtered, translated, combined, transformed, and “presented” to the user.
    The complexity of the system is commonly concentrated in the “shared process in the middle”, while the “silos” (target systems) can work freely, with their own rules, paradigms, approaches, and technologies. But in this schema the links between the “silos” (on the left) and the “shared process in the middle” are also very important. Through these links, which we can call “connectors”, you can negotiate how the different information arriving from the silos has to be rearranged to be manipulated according to the publication needs (tagged/profiled for multi-channel, multi-user, etc.). I have some years of experience with these types of architectures. It is a very engaging model, but not so unusual in the world of enterprise applications, where you typically find pre-existing systems, workflows, and procedures that need to be harmonized but that you cannot erase in order to start from scratch with another unified process. I will be waiting for the evolution of this discussion.

    1. Thanks for the comment, Alessandro. I agree, this model occurs in many places. Unfortunately, the content world frequently seems to be a step or two behind the rest of the world in its techniques and technology. Some of it, I suspect, comes from too much thinking by analogy, of trying to apply principles of physical organization, where centralization is of undoubted value, to digital organization, where identity is everything and location is irrelevant.

      1. I agree with your assessment here, Mark… Old-school thinking about publication prevails. But I also think publication technology lags because of revenue models. Outside of print or social media that drive ad sales, publishing is seen as a cost liability. And social media revenue seems to focus on video more than print/text- or data-driven bots.

        Where I work, the last feature to get into the product is online or inline explanation of the product. People are clamoring for product features… Database, IT management, medical records, etc. Nobody wakes up in the morning and says, “I really need to READ DOCS today… I would pay for a better product for that.” Well, almost nobody. But I think the numbers are growing.

        It’s like that continent-sized pile of plastic in the Pacific. Everybody knows it’s there, everybody knows it’s a problem… Who’s going to spend money to clean it up? If there was a clear path to revenue gain we’d have a list of candidates. But it’s not a gain, it’s prevention of long-term loss. Our economic model rewards putting off the problem until the cost is put upon everybody. Well, one day we WILL clean it up.

        1. Chris, agreed, revenue models are a big part of it. They are a big part of it in another way as well, since they drive the design of most content tools. A distributed “shared pipes” system with simple front ends in multiple silos would allow writers to use simple editors and submit to simple local repositories. The shared pipes would then take over and collect and compile in the background.

          The problem with that model is that it is very tough for a vendor to make money from it. All the hard stuff takes place in the background with no one logged in. Computing cycles are as near free as makes no difference, so what exactly do you charge people for in that model?

          Software vendors essentially sell seats. To make any money, you need to sell more seats as your customers make more use of your software. If your customers can add users without buying more seats, it is difficult to make any money. So the only systems you are likely to find for sale are the ones that require a complex front end (usually WYSIWYG) and a constant connection to a back end repository (which is why tool vendors love DITA). That way, no one can work without tying up a seat and more use means you need more seats.

          But such a system design is incompatible with a shared pipes model. The parts and pieces you need to implement shared pipes, therefore, are unlikely to come from vendors. They are likely going to have to come from the open source community.

  2. This is very much in sync with microservices and IoT. The idea is to distribute processes as components that can be joined together to form larger “applications”. Then you have a client that either talks to a single entry-point to the group of components, or else it talks to several components individually — this presents the “application” to the user.

    There’s no reason documentation should be treated differently. Multiple sources should be treated as a single body of work. HTML sites have been doing this with mashups for a while now. But this should grow. As process distribution takes over, then documentation source should stay with the distributed components. That is the only way I see to scale with the increase in complexity.

    One thing I’ve worked on is stitching together distributed content in the client. I can’t post a diagram, so you have to imagine… Instead of many circles – to one diamond – to many squares, you can have many circles to many squares, where your client connects with one group of sources, and my client connects with a different group. So now you just have webs of clients and sources.

    It gets more interesting. For example, one source I connect to is our social site. So now I have a client that connects to specific distributed doc sources, and also to a social site that manages its own silo of sources. The client understands a specific protocol, and some doc sources serve that directly (in my case, raw DITA or HTML). But the client also uses microservice components to map that client protocol to different source protocols. To bridge to a new silo, you just need a mini component that maps to the client. The component only knows how to request source content, and how to translate it to the client protocol. The client stitches it together, because only the client knows what constitutes a viable “document” for the given request.
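
    Chris’s “mini component” idea can be sketched in a few lines. The following is purely illustrative Python under assumed names (SourceConnector, HtmlConnector, WikiConnector, and stitch are invented for this sketch, not a real API): each component knows only how to request content from one kind of source and how to translate it to the client protocol, and the client stitches the results together.

```python
# Illustrative sketch only: each "connector" maps one source protocol to the
# shared client protocol (here, a plain dict with "title" and "body" keys).
# All class and function names are hypothetical.
from abc import ABC, abstractmethod


class SourceConnector(ABC):
    """Knows how to request content from one source and translate it."""

    @abstractmethod
    def fetch(self, topic_id: str) -> dict:
        """Return content in the client protocol: {"title": ..., "body": ...}."""


class HtmlConnector(SourceConnector):
    def __init__(self, pages: dict):
        self.pages = pages  # stand-in for an HTTP fetch of raw HTML

    def fetch(self, topic_id: str) -> dict:
        # Toy translation: this source stores "title|body" strings.
        title, _, body = self.pages[topic_id].partition("|")
        return {"title": title, "body": body}


class WikiConnector(SourceConnector):
    def __init__(self, wiki: dict):
        self.wiki = wiki  # stand-in for a wiki's API

    def fetch(self, topic_id: str) -> dict:
        entry = self.wiki[topic_id]
        return {"title": entry["heading"], "body": entry["text"]}


def stitch(connectors, topic_id):
    # Only the client knows what constitutes a viable "document" for the
    # request; here it simply collects what each connector returns.
    return [c.fetch(topic_id) for c in connectors]
```

    The client ends up with one list of uniformly shaped pieces, however differently each source stores its content; bridging to a new silo means writing one more small connector class.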

    That expands your diagram further to have many sources (the circles) connecting to many mediators (the diamonds), and you have many clients (the squares) connecting to many circles and diamonds.

    1. Thanks for the comment, Chris. I agree completely. I was deliberately vague about what the diamond represents, but what Sarah’s image of shared pipes suggests is that it is a kind of distribution system in which content passes through whichever pipe is appropriate for its source and destination types. Each process shares what it has in common and separates out what is unique to it.

      This does not necessarily mean that there is any one pipe, any one process, that every single piece of content passes through. In fact, as long as we are delivering to both static linear media and dynamic/hypertext media, content should probably not be traveling through the same pipes, as there is no document domain format that is ideal for both types of media.

      This is another reason to work in the subject domain, so that you can algorithmically optimize content for both linear and hypertext media, and for both dynamic and static delivery from a multitude of sources. Working in the document domain always means you are working in a source that is optimized for one medium or another. True independence requires a step back from the document domain, and that automatically involves the use of multiple processing channels from multiple input sources to multiple output destinations. Managing that almost certainly requires some form of central direction and distribution, but not any one central format, process, or repository.

    2. Hi Chris. When you speak about:
      “But the client also uses microservice components to map that client protocol to different source protocols. To bridge to a new silo, you just need a mini component that maps to the client.”
      … this is very close to my concept of a “connector”, which is not only a dedicated channel connecting the customer source with the “shared process in the middle” (more briefly, the “core process”), but also an active element that maps the differences between the data standard developed on the target side and the core process’s publication rules. Some months ago a colleague presented to me the need to build a “harmonized” system where the “core process” was a proprietary CCMS. This CCMS had to be fed by different doc sources, and it was not possible to oblige these sources (target systems) to produce docs in DITA format, S1000D, or another standard. That was the very high-level design, and I don’t know whether the project has since been abandoned, but one of the most critical points was the development of the “connectors”, in many cases the trickiest part of these architectures. As you have expressed in your comments, the final consumer of the published result doesn’t care about the architecture used to produce it. So who wants to pay to develop these ideas? Very likely, only the few companies that have to manage large volumes of documentation and have to speed up the publishing process. As you have mentioned, if we look at IoT ecosystems as “content producers”, we fall into the schema of large volumes of content to be managed, and this could be a new frontier to explore.

      1. Thanks for the comment, Alessandro. Re “it was not possible to oblige these sources (target systems) to produce docs in DITA format, S1000D, or another standard”. Yes, the shoe has to be on the other foot. It has to be the responsibility of the central system to consume content in the format the silos create, not the responsibility of the silos to create content in the format the central system consumes.

        What the central system must say to the silos is: “I need the content you produce to contain structures with the following semantics, because I need those semantics in order to perform the operations you are asking me to perform.”

        The problem today is that most central systems express those semantic requests in the form of specific document domain and management domain semantics that are equivalent to asking for a specific format such as DITA or S1000D. If they would instead express those semantic requests in subject domain terms, their requests would be much less onerous to comply with.

        Of course, this means that we are throwing the responsibility back to the silos to create subject domain content, and that is certainly a major change for them. But it is a change to something that can greatly simplify their life and process, and that requires them to know a great deal less about how the content system works. It is also the same type of change that they have had to make for every other kind of integration they have been involved in: providing rich data semantics specific to the domain. In a well-regulated silo system, this is what every silo must do: expose its semantics for consumption by other silos.
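
        As a tiny illustration of the difference: the first fragment below is standard DITA task markup, a document-domain request that obliges the silo to author in the central system’s format. The second uses invented, purely hypothetical element names to show what a subject-domain request for the same information might look like.

```xml
<!-- Document-domain request: the silo must author in the central format -->
<task id="configure-port">
  <title>Configure the port</title>
  <taskbody>
    <steps>
      <step><cmd>Set the port setting to 8080.</cmd></step>
    </steps>
  </taskbody>
</task>

<!-- Subject-domain request: the silo records what it knows about its
     subject; these element names are hypothetical, not a real standard -->
<setting>
  <name>port</name>
  <default>8080</default>
  <purpose>Selects the TCP port the service listens on.</purpose>
</setting>
```

        The second form asks the silo only for facts it already owns; the task-style presentation can then be generated from those facts by the central system.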

  3. Hello Mark, thank you for this post.

    I am evaluating new software for our department. One of my requirements is that it must easily scale to other departments as well, without getting too much in their way.

    The barrier to change must be as low as possible. I do not want to go around preaching the benefits of the new systems: it must fit into other people’s way of working as seamlessly as possible. Ideally, I would like to walk away from a system specialized for technical writers, to something less scary for casual users.

    So what you write about complex single-source systems vs. ad-hoc systems sounds about right, though I had not thought of multi-sourcing before.

    So what solutions would you suggest that support a multi-sourcing pattern?

    (SPFE, I know. I have not really tried it yet, but it is not really ready for production, or is it? So what other solutions, besides SPFE?)

    In some ways multi-sourcing was DITA’s core concept: a very general XML dialect that different companies or departments within a company can customize, then easily exchange customizations.

    But DITA is still XML, meant more for an audience of professional technical communicators. It is not really meant for the “part-time contributors” you mention above. I believe the only tool that really applies to “part-time contributors” is Microsoft Word.

    1. Another approach that is not quite ready for production is Lightweight DITA. The idea is to A) Reduce the DITA element set and simplify the model and B) Implement the model in XML, Markdown, and HTML5. The goal is precisely what you mention… Lower the bar for part-time contributors.

      I’ll point out that current DITA does support Markdown to DITA in the Open Toolkit, and that oXygen does offer that support. But you’re still left with managing your source and implementing publishing streams. And implementing true multi-sourcing would be a DIY project.

      IMO, MS Word is going in the wrong direction… It’s a silo and it’s a proprietary format.

      Also beware… IMO the DITA content management systems I have bumped into can easily become their own silos. We use Git to manage our DITA source.

      You might also look at Markdown and things like Jekyll to produce static sites. Maybe not a move toward multi-sourcing, but there’s interesting stuff in there.

      1. Hi Chris,

        Lightweight DITA seems interesting, I will look into it, particularly the HTML5 version. I had come across it before, and dismissed it, but I do not remember why.

        Markdown is a non-starter: people writing documents would have to learn it first, which is exactly what I am trying to avoid. And Oxygen is not exactly “lowering the bar”. People should just continue using what they already use: Word or a browser.

        Word is proprietary, but I was thinking of Simply XML Content Mapper, which uses Word as a DITA XML editor.

        I am not very enthusiastic about DITA, though. Most of the time I would have to work around the information types. Even Simply XML suggests using the general topic template, bypassing even the most basic specialization. I really wonder why they did not use DocBook, since they are bending DITA to behave like DocBook 5 anyway.

        Our content management system would be… SharePoint. I know, lame. But it fits in the strategy of using what is already there, and what people feel comfortable using.

        1. Diego, this is what everyone wants, and what every vendor is trying to sell, but, for the reasons I outlined in my other reply, it is simply not possible. The way contributors work has to change for this model to be successful. The question is, how does it have to change, and how can we make such a change congenial to the contributors?

          Subject domain structured writing is, I think, the last thing that could possibly work that has not already been tried on a broad scale. It has been tried, successfully, in individual cases, but most of the industry is still trying to make document domain solutions work despite repeated failures.

          We know subject-domain data gathering works because it is how we do data gathering in every other field — using forms. The question is whether we can make it work for content. I’m convinced the answer is yes, if we can get past the idea that nothing must change for users. That is simply a non-starter. The whole problem is one of reliable data gathering, and that is a problem that exists at the interface between the contributor and the system. That interface has to change if anything is going to work on the back end.
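
          As a sketch of what form-based, subject-domain data gathering might look like in code (the field names and validation rules here are invented for illustration): the system defines the fields it needs, and a contribution is checked as data rather than proofread as prose.

```python
# Hypothetical "form" for one record type, a release note. The field names
# and rules are invented for this sketch, not taken from any real system.
RELEASE_NOTE_FORM = {
    "component": {"required": True},
    "change_type": {"required": True, "choices": {"fixed", "added", "changed"}},
    "summary": {"required": True},
}


def validate(entry: dict, form: dict) -> list:
    """Return a list of problems; an empty list means usable data."""
    problems = []
    for field, rules in form.items():
        value = entry.get(field)
        if rules.get("required") and not value:
            problems.append("missing: " + field)
        elif "choices" in rules and value not in rules["choices"]:
            problems.append("bad value for " + field + ": " + str(value))
    return problems
```

          This is exactly how data gathering works in other fields: the interface asks for named values, and reliability comes from validating them at the point of entry.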

    2. Thanks for the comment, Diego.

      Yes, multi-sourcing was part of DITA’s core concept, in the sense that specialization was supposed to allow you to create different document types for different applications. This has run into three problems:

      • Specialization isn’t easy to do, either conceptually or technically.
      • Specialization mainly affects the document domain structures of the DITA model, not the management domain structures. Yet it is the management domain structures that make DITA difficult to learn and use on a casual basis. It is also the management domain features that constitute the argument for the value of basing everything on DITA rather than creating independent systems in each silo. In short, the DITA model fails to factor out the very operations that we need to factor out if we are going to make the shared pipes model attractive to those we are asking to share it.
      • The DITA world seems more interested in specializations created by committee for whole industries to use than in local specialization for individual silos to use in individual organizations.

      SPFE is not going to be ready for mainstream production use until it first gets some serious beta use. And it is not going to get serious beta use until I finish the docs and until I make the case in a compelling way. The EPPO book is my attempt to make the case for the bottom-up information architecture that SPFE is designed to support. The new book is my attempt to make the case for the subject-domain structured writing approach that SPFE is designed to support. And when that book is out I need to get back to the SPFE docs.

      As far as fitting into people’s work as seamlessly as possible is concerned, the fundamental constraints are these:

      1. The format people write in must be conformable to the shared pipes that will process it. You can take one structured format (Markdown, for instance) and conform it to a shared pipe that expects a different format (DocBook, for instance) as long as you can map the structures of one to the structures of the other. Not every system currently in use in the enterprise is necessarily going to be conformable to whatever formats you choose for your shared pipes, so at minimum you are going to need to change the habits and/or tools of every department that is not currently producing conformable content. This may be most of them, since things like ad-hoc formatting in Word make Word documents non-conformable to just about any process not based on Word.
      2. The format created by the silo must contain the metadata semantics necessary to drive each of the processes applied to it by the shared pipes. How big an issue this is depends on how many functions you want the shared pipes to perform for you (as opposed to those that will be performed by each silo before feeding content to the shared pipes). The more you want the shared pipes to do, the more metadata you will need to include (in a way that is conformable to the metadata required by the pipes).
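
      As a minimal sketch of constraint 1, assuming a toy subset of both formats: a few lines of code can map Markdown headings and paragraphs to DocBook-style title and para elements. This kind of structure-to-structure mapping is what makes one format conformable to another.

```python
# Toy mapper from a Markdown subset to DocBook-style XML. Only "# heading"
# lines and plain paragraphs are handled; a real mapping would have to cover
# the full structure of both formats.
import re
from xml.sax.saxutils import escape


def markdown_to_docbook(md: str) -> str:
    out = ["<section>"]
    for line in md.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate blocks
        heading = re.match(r"#\s+(.*)", line)
        if heading:
            out.append("  <title>" + escape(heading.group(1)) + "</title>")
        else:
            out.append("  <para>" + escape(line) + "</para>")
    out.append("</section>")
    return "\n".join(out)
```

      The mapping works only because both sides have explicit structure; ad-hoc formatting in Word fails constraint 1 precisely because there is no reliable structure on the source side to map from.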

      If you need a lot of metadata to drive a lot of processes in your shared pipes, then you are probably going to have to change the format that the authors in the silos use to create content.

      If you set up a shared pipes system that has few processes, and thus requires little metadata, it is not going to be clear to anyone what benefit they will get from the shared process. Indeed, the shared process may end up being less capable than the individual systems they are using now.

      If you want your shared system to be attractive, it is going to need to have a lot of features for people to take advantage of. If it is going to have a lot of features, it is going to have to collect a lot of metadata to drive those features. If it is going to collect a lot of metadata, it is going to require changing how people work today.

      This is why structured writing and common content infrastructure has always been such a hard sell. Either you give up features or you give up the familiar work environment.

      Subject-domain structured writing offers a way out here, not by hiding the need for metadata collection, but by using a form of metadata that is easy to collect. With a good subject-domain structured writing system, you can hopefully sell your various silos on changing the way they work by giving them something that is easier to use than what they have now, and that requires them to think less about publication details and to focus more on their subject matter.

      But to make this truly easy, there are still parts of the puzzle that we need to develop and put in place.

  4. A lot to process…

    Working on a project that is meant to break down the existing silos. Right now, you have to know who provides a service – oh, the information is available in a OneNote file, but you don’t have access – rather than being able to search for the service from a central point. The point about forcing teams to use a new system is spot-on. One team has a wiki; another team has a labyrinth of folders; another team has 1000 PDFs on a website that doesn’t have a way to search for the terms you want. There’s no universal style guide – the PDFs are Word documents that use a table to control the layout for procedures.

    The new system that is to be used? No snippet functionality. With a sample set of 8 documents, 58% of the content existed in more than 1 document; 55% of the graphics existed in more than 1 document. I can’t in good faith imagine how that can scale when we have 100s of knowledge articles to maintain. I can’t find anyone else who uses that system who cares about content reuse.

    Thanks for the great article!

  5. Great post. I don’t know beans about XML (I know what I know, and it’s hard and I don’t want to do it), but I have used the concept you describe in another way: federating databases, and for exactly the same reason. People have their chosen tools, and more often than not, use the tool itself as the data store and as the report. This is like drawing with crayons. Examples are keeping data in a running-update set of slides, or in a spreadsheet which cannot function as a list or table, because it is already partly aggregated.
    You’re never going to get these people to jump in the boat with you until they see some benefit. Another part of the strategy in your diagram (shared process in the middle) is to show them that their own update cycle, where the stuff in the squares hits the fan and then requires work in the circles, will get more manageable for having been partly normalized in the shared process. Why? Because the shared process can now take their neighbors’ data and give it to them in their chosen (broken) format. You can’t fix these people, but you can incentivize them to fix themselves. Once they begin to see a light at the end of their own tunnels, they may get on board with fixing their own circles, to align with a better neighbor.
    The shared process in the middle is not only an effective tool for making the processes compatible (on some level), but there is a meta-process enabled: IMPROVEMENT.

    1. Thanks for the comment, Haakon.

      In many ways, structured writing, which does not have to mean XML, is a database, though a hierarchical rather than a relational one, and so this process is really just another case of federating databases. And I agree about the value argument for the shared process. But the argument always has to be, “this will make your life easier” (and that argument actually has to be true). “This will make my life easier” and “this will make everyone else’s life easier” arguments don’t cut it, and “do it because I said so” doesn’t produce good data even if you have the authority to say so.
