Parts and Provenance

By | 2012/09/27

One of the most neglected aspects of the discussion of topic-based writing is that of provenance.

Every technical document has provenance of some kind. It may be a highly structured and elaborate provenance, such as certification according to a standard performed by an outside agency, or it may be the implicit provenance of being published by a brand name company.

Because of the implied provenance (not to mention implied liability) that goes with publishing content under their name, most companies have some more or less formal and rigorous process for proving documents before they are released. In technical communication, this usually takes the form of review and/or sign-off on a document by engineering and product management (and sometimes legal).

When we move from writing and proving books to writing topics, the questions of how topics get proved, and how provenance attaches to topics and to collections of topics, become important.

For example, if you take a document that has provenance and break it down into reusable blocks, do the blocks inherit the provenance of the book from which they were derived? If you assemble new books from those blocks, do those books inherit the provenance of the original book? Generally speaking, the answer to both questions has to be no.

Just because the parts have provenance does not mean that the thing you assemble from the parts has provenance. (Image courtesy of Surachai /

Consider the case of a car. Suppose the car has had a safety check and is then disassembled into its constituent parts and reassembled. Should the safety check still be considered valid? Probably not, since there is no guarantee that the parts were reassembled correctly. Suppose the person who put the car back together did not bleed the brake lines. All the parts are back in their proper places, but now the brakes don’t work. Before the car goes back on the road, the car as a whole needs to be checked for safety.

Many topic-based writing teams get around this problem in a very simple way. They construct books out of reusable blocks, then get the resulting book reviewed in full to establish its provenance. Provenance is never applied to the parts or to the method of assembly, only to the finished book.

This is fine if you are still working in a traditional book-oriented fashion, in which:

  • A limited number of books are being produced
  • There is a significant period of time between one book release and the next
  • Books are being written for a mass audience, with no customization for individuals or groups
  • There is sufficient time in the schedule for review of books before product release

But what if you want to deliver information incrementally in topics, or deliver custom information to individuals or groups? What if you want to deliver information with high frequency and immediacy? What if you have a large number of products based on the same technology, and you want to avoid having people review each and every book to establish its provenance? In any of these cases, proving the result of every assembly of reusable blocks is going to be both time-consuming and expensive.

Just because the car had provenance does not mean that the parts you take off the car have provenance. (Image courtesy of zirconicusso/

The only way to escape the requirement to prove the result is to prove the pieces. But there are two problems with proving the pieces:

First, proving the pieces requires that the pieces have a logical completeness that allows them to be proved on their own. I’ve written before about the need for a topic to merit its metadata. Provenance is essentially a piece of metadata, but the strength of the provenance relies on all the other pieces of metadata. If the provenance says that the thing is approved, the rest of the metadata says what it is that is being approved, and what it is being approved as. You can’t provide provenance for something that is too small or too incomplete to merit everything its metadata says about it, because the provenance only applies within the scope defined by the metadata.
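To make the point about scope concrete, here is a minimal Python sketch, assuming a hypothetical model in which an approval records a SHA-256 hash of a topic's body together with its metadata. `Topic`, `Approval`, `fingerprint`, `approve`, and `is_proven` are illustrative names, not part of any particular tool. Because the hash covers the metadata, relabeling what a topic is, or what it is approved as, voids the approval just as editing its text would.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Topic:
    # Hypothetical topic model: the metadata says what the topic is
    # and what it is being approved as.
    topic_id: str
    topic_type: str   # e.g. "task", "concept"
    subject: str      # e.g. "replace-filter"
    body: str


@dataclass(frozen=True)
class Approval:
    # Hypothetical provenance record: who approved which exact version.
    topic_id: str
    approver: str
    fingerprint: str


def fingerprint(topic: Topic) -> str:
    """Hash the content and the metadata together, so approval covers both."""
    canonical = json.dumps(
        {
            "id": topic.topic_id,
            "type": topic.topic_type,
            "subject": topic.subject,
            "body": topic.body,
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approve(topic: Topic, approver: str) -> Approval:
    """Record that this topic, with exactly this metadata, was approved."""
    return Approval(topic.topic_id, approver, fingerprint(topic))


def is_proven(topic: Topic, approval: Approval) -> bool:
    """Provenance holds only within the scope the metadata defined at approval."""
    return (
        approval.topic_id == topic.topic_id
        and approval.fingerprint == fingerprint(topic)
    )


topic = Topic("t1", "task", "replace-filter", "1. Turn off the pump. ...")
approval = approve(topic, "product manager")

# Same words, different claim about what the topic is: the approval no longer applies.
relabeled = replace(topic, topic_type="concept")

print(is_proven(topic, approval))      # True
print(is_proven(relabeled, approval))  # False
```

The design choice here is that the approver signs off on a specific combination of content and metadata, which is exactly what lets someone "know exactly what they were signing off."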

Second, just because the parts have provenance does not automatically mean that the whole has provenance. Individual car parts all have to conform to various standards, and generally carry their provenance on the box they ship in. But building a car entirely out of proven parts does not mean you have a proven car. You could have assembled the parts in the wrong way, or omitted essential safety systems altogether. The provenance of parts, no matter how comprehensive, does not grant provenance to the whole.

This does not necessarily mean that you have to prove every finished assembly. You don’t have to test every car that comes off the assembly line if you can prove the reliability of the assembly line process itself. Car manufacturers have increasingly moved to this model to reduce costs and improve quality.

But to apply this model to content, you would need to define an assembly process for content that could be proved in the same way. The first prerequisite would be to prove the individual content blocks to a sufficient level that they could be guaranteed to perform uniformly when assembled by the standard process. This returns us to the problem of making the parts logically complete enough to merit sufficient metadata that their provenance can be established. In other words, each part has to be something a developer or product manager could confidently sign off on, knowing exactly what they were signing off on.

The biggest difficulty with this approach, though, is that while an automobile assembly line does make several different configurations of the same car, it is mostly producing exactly the same model over and over again. Proving a process for assembling the same object over and over is one thing. Proving a process that produces a different object each time is something quite different.

Where does this leave us? If proving every assembly of blocks is too expensive, then an alternative approach is to create collections of topics with individual provenance. An Every Page is Page One topic that meets the standard of a narrative minim should be provable in its own right. That is, it should be provable as it stands, without regard or reference to any other topics.

Creating multiple collections of independently proven topics, therefore, presents much less of a challenge than creating multiple proven books from collections of blocks, whether proven or not.

Unlike an assembly, a collection of proven items can inherit its provenance from the provenance of the items in the collection. (Image courtesy of duron123 /

There are, to be sure, some things you might want to prove about a collection, such as completeness. But completeness can actually be easier to demonstrate when you are dealing with a collection of proven topics, because the topics give you a set of discrete units whose completeness or incompleteness should be calculable from their metadata, provided (and this is the hard part) that you have a standard of completeness against which to judge it.
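Here is a sketch of how completeness might be calculated from topic metadata, assuming (hypothetically) that the standard of completeness can be expressed as a set of required subject and topic-type pairs. Defining that standard is the hard part; the code only shows that once it exists, the check itself reduces to a set difference. `TopicMeta`, `STANDARD`, and `missing_coverage` are illustrative names.

```python
from typing import NamedTuple


class TopicMeta(NamedTuple):
    # Hypothetical metadata carried by each proven topic.
    topic_id: str
    subject: str
    topic_type: str


# Hypothetical standard of completeness: each pair names a subject
# and a type of topic that the collection must contain for it.
STANDARD = {
    ("filter", "task"),
    ("filter", "reference"),
    ("pump", "task"),
    ("pump", "troubleshooting"),
}


def missing_coverage(collection, standard):
    """Return the (subject, topic_type) pairs the collection does not cover."""
    covered = {(t.subject, t.topic_type) for t in collection}
    return standard - covered


collection = [
    TopicMeta("t1", "filter", "task"),
    TopicMeta("t2", "filter", "reference"),
    TopicMeta("t3", "pump", "task"),
]

print(sorted(missing_coverage(collection, STANDARD)))
# → [('pump', 'troubleshooting')]
```

With a book there are no comparable units to count, which is one reason the same check is so much harder to run against a book.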

The same problem exists for books as well. Among the things that refactoring books into genuine topics always reveals are vast gaps in the coverage of the original material. Measuring the completeness of books is very tough. Measuring the completeness of a collection of topics is actually easier, though still tough.

But as far as provenance goes, a collection makes far fewer claims about itself than a book does. In fact, the main claim a collection makes about itself is that each item in the collection is proven — a claim which can easily be established based on the provenance of the pieces individually. You can thus assemble various collections of separately proven topics without the need to prove the collection. The provenance of the collection is derived from the proven metadata of the pieces, which can be established by software. This provides a much more robust approach to creating more dynamic and responsive information sets.
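Because the collection's main claim is that every member is proven, software can check that claim without anyone re-reading the collection. Here is a minimal sketch, assuming a hypothetical metadata scheme in which each topic records its current version and the version that was last approved. `TopicRecord`, `unproven_members`, and `collection_is_proven` are illustrative names. A topic edited since its approval, or never approved, breaks the claim, while any subset of proven topics can be published as a proven collection. This covers only the per-member claim; completeness still needs its own standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopicRecord:
    # Hypothetical per-topic metadata: the version last approved, if any.
    topic_id: str
    current_version: int
    approved_version: Optional[int] = None

    @property
    def is_proven(self) -> bool:
        # A topic is proven only if its current version is the approved one.
        return self.approved_version == self.current_version


def unproven_members(collection):
    """List the members whose provenance does not hold."""
    return [t.topic_id for t in collection if not t.is_proven]


def collection_is_proven(collection) -> bool:
    """The collection's only claim: every member is proven."""
    return not unproven_members(collection)


topics = [
    TopicRecord("install", current_version=3, approved_version=3),
    TopicRecord("configure", current_version=2, approved_version=2),
    TopicRecord("upgrade", current_version=5, approved_version=4),  # edited since approval
    TopicRecord("migrate", current_version=1),                      # never approved
]

# Assemble a new collection from proven topics only; no re-review of the whole.
release_set = [t for t in topics if t.is_proven]

print(collection_is_proven(release_set))  # True
print(unproven_members(topics))           # ['upgrade', 'migrate']
```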

In the end, of course, a lot will depend on what kind of provenance you wish to provide, or are required to provide. But the issue of provenance, and the management and propagation of that provenance, is one that should be carefully considered as you plan to walk down this road.


4 thoughts on “Parts and Provenance”

  1. Rob Echlin

    Your collection is not just a collection of topics, it’s also a collection of topic sets. You can consider those sets as a higher-level provable item that your information architect, together with the writing team and the development team, will consider at the start of your project:
    What kinds of information or sets of information need to be in the set of sets?
    The result of this analysis is a document that provides a type of provenance: we thought about the whole of the collection and here is what we decided. I am sure ISO XX00X has a process for that.

    You also have the set of relationships that you define in your topics. When you create soft links (some people call them “mentions”) to identify topics that you want to refer to, you are asking for a link to that topic. You can audit the unsatisfied “soft links” or “mentions” to find out what you wanted to link to.
    Your audit meeting will, among other tasks, categorize the results:
    – don’t care
    – care, but will put off writing the target topic for whatever reason
    – add metadata some place so the mention links to an existing topic (entry in the topic, id in the soft link)
    – reword so that the mention makes a link
    – write a new topic or set of topics that satisfy the need

    Again, the results of that audit are a record that forms part of the provenance of your document set.

    1. Mark Baker Post author

      Hi Robert. Thanks for the comment.

      It certainly could be as you say, with collections made up of other collections. The word “set” is a useful one in this context, because “set” implies a defined criterion for membership, and thus the provenance of the set is that all the members of the set meet the criterion for being members of the set and that, for finite sets, all the members required are present.

      Sets don’t have to be finite, of course, and they don’t have to be planned up front. The waterfall approach of defining the entire project up front, and then judging the outcome against the original criteria, is certainly still common. But in a more agile environment, or in an environment where you are continuing to respond to developing information needs over the lifespan of the project, an upfront list of topic-set members is less useful. In those cases you may be less interested in completing the set, and more interested in having new content proved separately and then automatically slotted into the right set or sets based on its provenance.

      Part of the virtue of Every Page is Page One topics in an open ended topic collection is that you can have closed provenance for the individual topics, while having open provenance for the collection of topics. This can help inoculate your content against the kind of decay of order that generally occurs when books are maintained and updated over multiple releases.

      I chose the word “collection” to be as generic as possible, but the more specific “set” is definitely apropos here. For one thing, it helps reinforce the idea that you are creating a system in which the provenance of the set can be determined by looking at the provenance of the members of the set, which cannot be said of an assembly.

  2. Alex Knappe

    Proving documents in a somewhat automated process is a hard task. There are lots of variables in the process to take care of:
    – integrated parts of the product described
    – the environment of the product
    – use cases
    – audience of the document
    – security obligations
    – etc.

    The more documents are involved, the sooner defining a process to ensure the provenance of a series of documents becomes impossible.

    Simply approving topics or collections of topics doesn’t do the trick. A valid document always has to take context into account. A human reviewer will (should) be able to put single steps, use cases, whole topics and all sorts of metadata into context and will therefore be able to approve a document as a whole. Defined and automated processes can only check the liability of smaller parts.

    Let’s take a bridge as an example:
    You can do everything right, from the planning, to calculating statics, to construction, to connecting it with roads and so on. Problem is, it’s located on a plain, no river, road or valley to cross.
    The whole process might have been streamlined and everything about it is proven, but the result is a total waste of effort, as it wasn’t put into the context of its surroundings.

    I don’t want to say that a lot of the review process can’t be automated. What I want to say is that at some point in the process, you will need a human reviewer to check if everything makes sense and is put into the correct context.

    This might change one day, when products can “speak” for themselves, but as long as that doesn’t happen, you’ll need to check (almost) every document in a review process.

    1. Mark Baker Post author

      Hi Alex. Thanks for the comment.

      This is precisely the point I am making. You can’t derive the provenance of a document from the provenance of the information blocks from which it is constructed because context matters. The whole has to be proven as a whole.

      On the other hand, you can create a set of context-independent topics without having to prove the whole as a whole. By way of analogy, you can prove a tool box, and then fill it with proven tools, without having to re-prove the whole as a whole, because each tool, and the box itself, continues to function as proved independently of each other.

      This distinction has big implications for the cost of the review process — which matters because review is always on the critical path, and thus always directly impacts product release dates. Thus, the cost of review has to be counted not only in man hours, but in opportunity cost. (This is why product management and engineering so often refuse to hold up release schedules to ensure proper review.)

      That is why a system that allows context-independent topics to be reused in a collection, without having to be re-proven in context in a document, is such a big potential win.
