8 thoughts on “We Must Create Mutable and Addressable Content”

  1. Roy MacLean

    The addressability – semantic, rather than merely structural – is essentially the embedding of an object model in the documentation content. So with your cookbook, we have Recipe, containing multiple Ingredients (usage, with quantities), of Products (type, units, labels such as ‘organic’), provided by Suppliers, etc, etc. Products and maybe Suppliers might be represented by separate topics. This then allows me to produce a range of specific outputs – a Vegan Barbeque Cookbook, or whatever. It also, of course, allows me to query my content – my model, really.
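A minimal sketch of the object model described above, assuming hypothetical class names (`Product`, `Ingredient`, `Recipe`) and made-up sample data. It only illustrates how a "Vegan Barbeque Cookbook" could become a query over the model rather than a hand-assembled document; Suppliers are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Product:
    """A foodstuff with its unit and labels such as 'organic' (hypothetical model)."""
    name: str
    unit: str
    labels: frozenset = frozenset()


@dataclass(frozen=True)
class Ingredient:
    """The usage of a product in a recipe, with a quantity."""
    product: Product
    quantity: float


@dataclass
class Recipe:
    title: str
    ingredients: list = field(default_factory=list)

    def has_label(self, label: str) -> bool:
        """True when every ingredient's product carries the given label."""
        return all(label in i.product.labels for i in self.ingredients)


# Made-up sample content.
tofu = Product("tofu", "g", frozenset({"vegan", "organic"}))
pepper = Product("bell pepper", "each", frozenset({"vegan"}))
lamb = Product("lamb shoulder", "g")

recipes = [
    Recipe("Grilled tofu skewers", [Ingredient(tofu, 400), Ingredient(pepper, 2)]),
    Recipe("Lamb kebabs", [Ingredient(lamb, 500), Ingredient(pepper, 1)]),
]

# Querying the model: which recipes belong in the vegan cookbook?
vegan_titles = [r.title for r in recipes if r.has_label("vegan")]
print(vegan_titles)
```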

    The key to this is, as you say, the domain-specific markup – what would be a DITA domain – here for recipes and foodstuffs. I’m trying to move in this direction with our System Definition, and other, documentation.

    1. Mark Baker Post author

      Roy, yes, it is very reasonable to see this as embedding the object model inside the content, though I would make the point that the first consideration for the content model is that it should be designed to get good data entry from authors. Asking authors to write into a model that is optimized for the computation of outputs is not likely to get you the most efficient or accurate authoring, so I would look at the content structures as being designed to capture from authors the data needed to create an object model for processing, not necessarily being that object model.

      This, of course, is where the mutability of content comes into play. So many systems are built as if markup were not mutable, and therefore require authors to work in a format that is designed as the input to a publishing engine, which, in turn, requires the author to know far more than they should about how the publishing engine works. DITA is particularly guilty of this, as is DocBook.

      It should be the first principle of markup design that authors should not have to know anything about how the publishing system works. The authoring format should be designed for authoring and for authoring alone, and should be transformed behind the scenes into a format suitable for publishing.
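As a rough illustration of that principle, here is a sketch in Python using the standard `xml.etree.ElementTree` module. The authoring markup (`recipe`, `title`, `ingredient`) and the `to_publishing` function are invented for this example; the point is only that the author writes recipe markup and never sees the `section`/`ul`/`li` structure the publishing side wants.

```python
import xml.etree.ElementTree as ET

# What the author writes: markup about recipes, nothing about presentation.
AUTHORED = """
<recipe>
  <title>Lamb kebabs</title>
  <ingredient quantity="500" unit="g">lamb shoulder</ingredient>
  <ingredient quantity="1" unit="each">bell pepper</ingredient>
</recipe>
"""


def to_publishing(source: str) -> ET.Element:
    """Transform the authoring markup into presentation markup behind the scenes."""
    recipe = ET.fromstring(source)
    section = ET.Element("section", {"class": "recipe"})
    ET.SubElement(section, "h2").text = recipe.findtext("title")
    ul = ET.SubElement(section, "ul")
    for ing in recipe.findall("ingredient"):
        item = ET.SubElement(ul, "li")
        item.text = f'{ing.get("quantity")} {ing.get("unit")} {ing.text}'
    return section


published = to_publishing(AUTHORED)
html = ET.tostring(published, encoding="unicode")
print(html)
```

Because the transformation is a separate step, the publishing structure can change without authors having to learn anything new.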

      Failure to do this often results either in the authoring fraternity being artificially reduced by the need to understand the publishing system (a problem also in FrameMaker, of course) or in the markup being reduced to such simplicity and generality that it can be created in Word, or something like Word, without any mutable or addressable structure being created.

  2. Alan Houser

    I’m tickled to have motivated this blog post (I’m @arh on Twitter).

    Building a bit on Roy, the previous commenter —

    The DITA cognoscenti would likely propose that you overlooked the possibility of defining a DITA specialization to represent the “recipe” information model. Such a specialization would provide the semantic metadata necessary to support the addressability and mutability you describe, and provide the added benefit of leveraging DITA-aware authoring and publishing tools and content-management systems.

    Do you have any reaction to the idea that the DITA specialization mechanism was designed specifically to support this scenario? Or is that another blog post? 🙂

    1. Mark Baker Post author

      Alan, thanks for the comment.

      I frequently make the point that there are two DITAs. The first is the generic concept/task/reference DITA that you get more or less out of the box, and which carries with it the (unintentional?) implication that concept, task, and reference topics should inherently be kept separate (with which I strongly disagree — /2013/01/08/confusing-analytic-and-synthetic-truths-in-defining-topic-types/). It does not create addressable mutable content (as illustrated above).

      The second DITA is the DITA of specialization, which is essentially another XML schema language alongside XSD, DTDs, Schematron, and RelaxNG. Like any of these, it can be used to create schemas for creating mutable and addressable content. And like any of these, its use does not automatically ensure you get mutable addressable content. That is all in the specifics of your schema design.

      If you want to create mutable and addressable content, you can choose any of these schema languages to define your markup language. Of the five, DITA is by far the most complex and cumbersome, without having any more expressive power than any of the others. From a standalone point of view, it would make no sense to choose DITA for this purpose.

      The one property that DITA has that these others do not is the ability to support what Eliot Kimber calls “blind exchange”. That is, a schema defined as a specialization of a base DITA topic type can be published by a tool chain designed for the base topic type. Not published well, necessarily. Not even published in any kind of comprehensible or readable form, necessarily, but the transformation will at least run and produce an output.

      Is this a good thing or a bad thing? Personally, I find it hard to imagine a real world use case for blind exchange. It seems to me that if I could have no assurance of how a foreign DITA topic was going to be formatted by my tools, and if I had no knowledge of its specific semantics, I would rather just have a well formatted PDF produced by the originating system. But I leave it to others to decide if they have a use case for it.

      The big difference I think blind exchange makes in practical terms is that it makes DITA a weakly typed schema language, where the other four are all strongly typed languages. (Weak typing means that one type can be automatically transformed to another type without raising an error, and strong typing means that all type transformations must be specific. Weak typing makes it quicker to write small programs, but can allow errors to go undetected. Strong typing requires more code but is more robust.)
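The distinction can be sketched by analogy in Python. This is not DITA's actual fallback mechanism or any real validator; `ALLOWED`, `process_weak`, and `process_strong` are hypothetical names. The weak processor quietly treats an unrecognized element as generic content, while the strong one refuses it.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: the children a <recipe> may contain.
ALLOWED = {"recipe": {"title", "ingredient"}}


def process_weak(source: str) -> list:
    """Weak typing: unknown children fall back to a generic type, no error raised."""
    root = ET.fromstring(source)
    return [
        (child.tag if child.tag in ALLOWED["recipe"] else "generic", child.text)
        for child in root
    ]


def process_strong(source: str) -> list:
    """Strong typing: any child outside the content model is an error."""
    root = ET.fromstring(source)
    for child in root:
        if child.tag not in ALLOWED["recipe"]:
            raise ValueError(f"unexpected element <{child.tag}> in <{root.tag}>")
    return [(child.tag, child.text) for child in root]


# A document with a misspelled element name.
doc = "<recipe><title>Lamb kebabs</title><ingrediant>lamb</ingrediant></recipe>"

weak_result = process_weak(doc)  # runs, but the ingredient is silently lost as data
try:
    process_strong(doc)
    strong_error = None
except ValueError as exc:
    strong_error = str(exc)

print(weak_result)
print(strong_error)
```

In the weak case the output still gets produced, which is exactly why the misspelling goes unnoticed.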

      In the programming world there are good reasons for having both weakly typed and strongly typed languages, so again, people will have to decide for themselves which fits their use case.

      But for me, I always want my markup languages to be strongly typed. The point of creating mutable and addressable content is to allow content to be processed by algorithms, without the need for human supervision or review. To achieve that, you want to minimize the possibility of uncaught errors, and for that you want strong data typing.

      So yes, DITA specialization is an option for creating mutable addressable content, but it is not the option I would choose in most cases.

    2. Roy MacLean

      (Sorry for the delayed response).

      The kind of specialization I’m talking about in DITA is a ‘domain’ not ‘structural’ specialization. You specialize the keyword element to create a ‘system domain’ vocabulary. I work with a financial message-processing system, so there is a domain element for Interface, Message Type, Processing Stage, Alert, etc, etc. This is easy enough for authors to use – it’s a kind of ‘semantic highlighting’ – and is fairly easily extensible. In, say, XHTML output, the markup becomes a span with @class='msgtype', or whatever.
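A small sketch of that last step, assuming hypothetical element names (`interface`, `msgtype`, `stage`, `alert`) and a made-up sentence. It uses plain Python rather than a DITA toolchain, so it shows only the shape of the mapping from a domain element to a span with a class.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain vocabulary for a message-processing system.
DOMAIN_ELEMENTS = {"interface", "msgtype", "stage", "alert"}


def to_xhtml_spans(source: str) -> ET.Element:
    """Rewrite each domain element as <span class="...">, keeping its text in place."""
    root = ET.fromstring(source)
    for el in root.iter():
        if el.tag in DOMAIN_ELEMENTS:
            el.set("class", el.tag)  # remember the semantic name as the class
            el.tag = "span"
    return root


source = (
    "<p>The <msgtype>PaymentOrder</msgtype> message enters the "
    "<stage>validation</stage> stage.</p>"
)

xhtml = ET.tostring(to_xhtml_spans(source), encoding="unicode")
print(xhtml)

# The same markup is also addressable before publishing: list every message type mentioned.
mentions = [el.text for el in ET.fromstring(source).iter("msgtype")]
print(mentions)
```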

      1. Mark Baker Post author

        Hi Roy. Thanks for the comment. Comments are welcome at any remove.

        I do a similar thing in SPFE with “mention markup”, though I use the native features of XML Schema to implement it. (I’m not quite sure why DITA has to reinvent so much functionality that standard XML tools already provide. Perhaps it is a matter of age, as it predates namespaces and XML Schema, which provide ways to do everything that specialization and the subject schema do, except blind exchange.) “Semantic highlighting” is a great way to describe it, and it is definitely a component of mutable and addressable content. It is not enough, by itself, to enable all the addressability and mutability you might want, but it is certainly a big step.

  3. John Tait

    DITA allows the specialisation of profiling attributes. Profiling attributes can be added to elements in topics, as well as to topicrefs in maps.

    It’s simple enough for writers like me to understand and I think it could be useful for some basic publishing requirements. (Publish only the topics/sections with the attribute recipe="lamb".)
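A rough sketch of that kind of attribute-based selection, written in plain Python rather than with DITA's own filtering tools. The `cookbook`/`topic` markup and the `select` function are invented for the example, and it follows the rule exactly as stated above: keep only the topics that carry the attribute value.

```python
import xml.etree.ElementTree as ET

# Made-up content with a profiling-style attribute on some topics.
BOOK = """
<cookbook>
  <topic recipe="lamb"><title>Lamb kebabs</title></topic>
  <topic recipe="tofu"><title>Grilled tofu skewers</title></topic>
  <topic><title>Lighting the grill</title></topic>
</cookbook>
"""


def select(source: str, attribute: str, value: str) -> ET.Element:
    """Keep only the top-level topics whose attribute equals the given value."""
    root = ET.fromstring(source)
    for topic in list(root):  # copy the child list so removal is safe while looping
        if topic.get(attribute) != value:
            root.remove(topic)
    return root


lamb_only = select(BOOK, "recipe", "lamb")
titles = [t.findtext("title") for t in lamb_only.findall("topic")]
print(titles)
```

Note that nothing in this sketch checks that the attribute value is spelled correctly or even present, which is one reason attribute-based addressing gives the author less guidance than element-based markup.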

    I’d be interested in your thoughts on this and how it relates to addressability.

    On blind exchange, I’d never seen a custom specialised topic. It would be interesting to see a real example of a publication that combines the OASIS specialisations with some custom specialised topics.

  4. Mark Baker Post author

    Thanks for the comment John.

    Yes, you can use attributes to provide addressability. However, I generally prefer to base addressability primarily on elements. One of the things you need to make addressability useful is consistency and reliability, so I look for ways to make it easiest to guide the writer and to provide feedback and validation of the content that needs to be reliably addressed. Elements offer more guidance and validation opportunities than attributes.

    It very much comes down to the distinction between strong and weak topic typing, as I described it above in my reply to Alan. Attributes allow you to add addressability to a weakly typed or highly general schema (as in HTML microformats), but with less reliability. Custom authoring schemas specific to a particular kind of content offer strong typing and greater validation, as well as hiding content management and publishing issues from authors (something I think is an important next step of structured content: /2013/01/28/we-must-remove-publishing-and-content-management-concerns-from-authoring-systems/)

