14 thoughts on “We Must Remove Publishing and Content Management Concerns from Authoring Systems”

  1. It would be nice to think that the key providers of those CMS/authoring tools would read this post and take action. I would be happy to forward it if I had the right contact info for the people it should go to…

    About SPFE, Mark, I haven’t seen any updates on it for a while. Maybe you could do a post on where you’re at with it?

    1. Hi Laurie, thanks for the comment.

      Actually I’m not sure that it is the CMS/Authoring tool vendors who need to get the message so much as information architects and content strategists.

      To create an authoring schema that does not force the writer to deal with content management and publishing issues, you have to create a structure that captures enough of the semantics of the subject matter itself that the content management and publishing issues can be handled on the back end without the author’s intervention. That generally means that the markup needs to be quite specific to the material.

      One example I use a lot is related to linking. If you use a markup that has specific link markup in it, you force the author to deal with the specifics of linking, to select the resource to link to, and to manage the link. On the other hand, if you create markup like this:

      <p><actor>John Wayne</actor> plays an ex-Union colonel in <movie>Rio Lobo</movie>.</p>

      Then the author only has to mark up the facts that John Wayne is the name of an actor and Rio Lobo is the name of a movie. On the back end, those words can be linked to appropriate resources without the author needing to know anything about how linking works or what to link to.
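      To make the idea concrete, here is a minimal sketch of the kind of back-end processing this enables. The lookup table, URLs, and function name are all hypothetical; a real pipeline would pull link targets from a CMS or a linking database.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table; in practice the targets would come from a
# linking database maintained on the back end, not from the author.
LINK_TARGETS = {
    "actor": {"John Wayne": "https://example.com/actors/john-wayne"},
    "movie": {"Rio Lobo": "https://example.com/movies/rio-lobo"},
}

def resolve_links(fragment: str) -> str:
    """Turn subject-matter markup into HTML links where a target is known."""
    root = ET.fromstring(fragment)
    for elem in root.iter():
        url = LINK_TARGETS.get(elem.tag, {}).get((elem.text or "").strip())
        if url is not None:
            elem.tag = "a"  # rewrite <actor>/<movie> as a link element
            elem.set("href", url)
    return ET.tostring(root, encoding="unicode")

print(resolve_links(
    "<p><actor>John Wayne</actor> plays an ex-Union colonel "
    "in <movie>Rio Lobo</movie>.</p>"
))
```

      The author never sees a link; the markup records only the facts, and the pipeline decides what to link and where.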

      Most authoring tools will let you create that markup. Any good CMS should let you insert the processing into the workflow. But it is the information architect or the content strategist that has to put the right authoring markup in place.

      I’m continuing to work on the SPFE Open Toolkit, but the combination of client work and the book I am writing on Every Page is Page One means progress is slow at the moment.

      1. Mark,
        Wow, needless to say, I LOVE this post. Thanks so much for expanding on my earlier comments.

        In my experience, another group who really needs to read this is the VP- and C-level execs who approve funding. Content strategists and info architects and product managers need to know what to ask for, but the VP- and C-level folks have to know why to approve funding for ongoing operational support, and for back-end systems that are as user-friendly as their customer-facing systems. Amen!!

  2. I will note that 2012 is the year that some of the major players in the XML editor field have started to understand this point. I would point out, especially, oXygen and Adobe’s Technical Communication Suite 4 as examples of two different approaches to rendering DITA more user-friendly and accessible to non-specialists.

    It’s particularly worth noting that Adobe did this at the same time that they finally got serious about XML and DITA in FrameMaker.

    As both Mark and Laura emphasize, this doesn’t mean that it’s a “done deal.” There is a long way to go – and nowhere more obviously than in the CMS domain.

    I’d love it if CMS companies would talk to some specialists in usability and user experience, and start to think more seriously about how to present content taxonomies and management functionalities in a way that doesn’t require even the geekiest of tech comms to spend at least a week just getting up to speed with the basics.

    1. Ray, thanks for the comment.

      There has definitely been progress made on making DITA easier to use, but as far as I know, this consists of making the publishing and content management tasks (such as linking, map making, reuse) easier to do mechanically, not in removing them from the author’s task altogether. Even if these tasks are made mechanically easier to perform, the author still has to understand what is going on behind the curtain in order to do them correctly.

      What I am talking about is not making these tasks easier to perform, but removing them from the author’s task altogether and asking the author for nothing but an explicit structuring of the subject matter itself.

  3. I agree completely. I have been saying for years that the complexity of the authoring tools is standing in the way of effective structured writing. As co-manager of the DITA for Enterprise Documents committee at OASIS, we struggled with it all the time.

    It needs to be as easy to create structured content as it is to create Word content. Quark XML Author does do this. DITA Exchange has done it to a certain extent as well. As have Adobe and oXygen in the last few releases. I don’t think a form will cut it, though; it is not flexible enough.

    I think the reason this has not really happened is that the majority of structured authoring to date has been in the product content area, and unfortunately those authors are more tolerant of “techie.” As the web world has finally started to get it, I think they will be the ones to change the way we author structured content. They will not stand for ugly, technical, obtuse user interfaces. My only concern is that it will happen in the web world but not in the broader corporate publishing arena.

    1. Thanks for the comment Ann.

      Tech pubs people have always been publishing people, so they don’t protest when publishing and CMS concerns intrude into their work. But the rest of the enterprise is not used to being asked to deal with these issues, regardless of how mechanically easy it is. Has the DITA for Enterprise Documents committee been looking at ways to get all of the DITA “stuff” out of the author’s view and allowing them just to deal with their subject matter?

      I don’t believe that you can ever make authoring structured content as easy as writing unstructured content in Word, not because of any technical complexity, but because structured content requires structured thought, which is more intellectually demanding than the unstructured thought that goes into so much of the content created today. That’s why I think it is so important to get extraneous matters out of the author’s set of concerns. Getting them to create good structured content is challenging enough.

      Have you looked at what oXygen is now supporting for forms-based views of content? It’s very powerful and flexible, and it solves the problem that a standard forms package has with mixed content fields. I’m having great success with it in my current project.

  4. Structured authoring could indeed be as easy as in Word (in fact easier because you know where things start and end!). It all depends on the simplicity of the DTD.
    Back in 1998 we had plans to introduce XML in schools. We then developed the FlexDTD – a very simple DTD. The school project failed, but the DTD survived. From the year 2000 up to this day we have successfully helped big Swedish and Finnish international companies leave the WYSIWYG world in favor of XML. Today we are even helping companies move away from complicated XML solutions (sometimes DITA!) to the simple FlexDTD world.
    How is this possible? Here are the main reasons:
    * A writer only has to learn at most 26 elements. There are some more elements but they are invisible (like a body element in a table) or created automatically (like a list item).
    * All elements have a class attribute specializing the element if needed (e.g. a list can be an ordered list, an unordered list or any other list). This means that the DTD vocabulary can be expanded without changing the DTD and it can be done gradually.
    * Your John Wayne example will look like this in FlexDTD:
    <paragraph><phrase class="actor">John Wayne</phrase> plays an ex-Union colonel in <phrase class="movie">Rio Lobo</phrase>.</paragraph>
    The style sheet will be responsible for creating the wanted links.
    * A FlexDTD file is always backward and forward compatible – files can always be read, but an “unknown” specialized element is not presented and handled as nicely as it could be. Cutting and pasting from old or new files will always work.
    * The DTD validates the basic structure – the style sheet validates the rest! This two-step validation process is smart – it is the final usage of the information that imposes limitations. If your style sheet does not support more than 4 section levels, it should give you a warning. The DTD, on the other hand, accepts an unlimited number of nested sections.
    * The DTD is very forgiving. You can mix all block elements freely. Sections inside a table inside a list are valid. It might be stupid because your current style sheets cannot handle it. On the other hand, the lack of restrictions makes it easy to support new media that are more flexible than paper. This also allows the writer to work “top down”, “bottom up”, or in any other “unstructured” way, supporting the creative writing process.
    * Block elements and in-line elements cannot be mixed – there is no mixed content (as opposed to DITA) in FlexDTD. This makes the structure clearer and transformation and styling much easier.
    * A reusable unit can be as big or small as you prefer. This means that you can collect everything describing a specific function in one file (sometimes printed as a chapter) even if the file contains many section levels. You are not forced to create “fragments”.
    * Conditions can be on any element and are mathematical Boolean expressions (but this is a CMS feature, handled in our case by Skribenta).
    I presented the FlexDTD at the Extreme Markup Languages conference back in 2000. Now I have the privilege of giving a presentation at the DITA North America conference in April, summing up our FlexDTD experience over a decade.
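    The two-step idea above (the DTD validates the basic structure; the style sheet validates the rest) can be sketched as a second-stage check. This is a rough illustration in Python; the element name `section` and the depth limit are assumptions for the example, not FlexDTD specifics.

```python
import xml.etree.ElementTree as ET

MAX_SECTION_DEPTH = 4  # assumed limit of a particular style sheet

def check_section_depth(doc: str) -> list[str]:
    """Second-stage check: the DTD allows unlimited nesting, so warn
    (rather than reject) where nesting exceeds what the style sheet supports."""
    warnings = []

    def walk(elem, depth):
        for child in elem:
            d = depth + 1 if child.tag == "section" else depth
            if child.tag == "section" and d > MAX_SECTION_DEPTH:
                warnings.append(
                    f"section nested {d} levels deep; "
                    f"style sheet supports only {MAX_SECTION_DEPTH}"
                )
            walk(child, d)

    walk(ET.fromstring(doc), 0)
    return warnings

deep = "<doc>" + "<section>" * 5 + "</section>" * 5 + "</doc>"
print(check_section_depth(deep))  # one warning, but the document is still valid
```

    The point of warning instead of rejecting is that the document stays valid against the DTD; only this particular output path complains.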

    1. Hi Jan,

      Thanks for the comment. I think we have to make a distinction here between two different kinds of schemas. From what you describe, I believe that the Flex schema is what I will (for present purposes) call an annotation schema.

      An annotation schema is one that fits over a text without requiring any change in the text itself. It then provides additional information about the text by way of attributes. If you want to provide a marked-up copy of Shakespeare’s plays, for instance, you need an annotation schema, since you can’t change the text itself. A necessary feature of annotation schemas is that they must be very forgiving, since they have to fit an existing text as it is written.

      The alternative is what I will (for now) call a forming schema. A forming schema is designed to impose order on content. You can’t usually apply a forming schema to existing content, because the existing content won’t fit the form — it is either in the wrong order, missing required elements, or contains content for which no element exists in the form.

      The upside of using a forming schema is that it imposes a high degree of regularity and consistency on the content, which makes it possible to do a lot of validation of the content, to report on the content in various interesting ways, and to reliably process it with algorithms for things like aggregation, linking, organization, and presentation.

      A key requirement of a forming schema is that it be very inflexible. It forces the author to supply the required information in the required order. It is designed to shape the text, not conform to it.

      A forming schema is not inherently hard to use. Every form you fill out is effectively a forming schema for a particular kind of information, and the world practically runs on ordinary people filling out forms so that their data can be processed efficiently by back end systems.
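      As a rough sketch of what such form-like validation looks like in code, here is a hypothetical forming schema for a movie-credit record, reusing the actor example. The element names and rules are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical forming schema: a credit record must supply exactly
# these elements, in this order, each with content.
REQUIRED_ORDER = ["actor", "role", "movie"]

def validate_credit(fragment: str) -> list[str]:
    """Report everything that keeps the record from fitting the form."""
    root = ET.fromstring(fragment)
    errors = []
    found = [child.tag for child in root]
    if found != REQUIRED_ORDER:
        errors.append(f"expected elements {REQUIRED_ORDER}, got {found}")
    for child in root:
        if not (child.text or "").strip():
            errors.append(f"<{child.tag}> must not be empty")
    return errors

ok = ("<credit><actor>John Wayne</actor><role>ex-Union colonel</role>"
      "<movie>Rio Lobo</movie></credit>")
print(validate_credit(ok))                                          # []
print(validate_credit("<credit><movie>Rio Lobo</movie></credit>"))  # errors
```

      Because the form demands the required pieces in the required order, the back end can process every conforming record without human inspection.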

      But a forming schema is only easy to use if it asks the writer for information they know, and in a way that they understand. The problem with many of the forming schemas used in tech comm is that they ask the writer for publishing system and content management system information that the writer does not have or understand, or that requires a lot of work to find.

      You can certainly have people write in annotation schemas, and do useful things with the annotations you collect. You can’t do as much as you can with forming schemas, because you can’t impose as much structure, but for many applications an annotation schema may be all you need.

      The pity with systems like DITA is that they tend to start out as forming schemas, but get a lot of resistance from writers because they ask for publishing and content management data that the writers don’t have. To get around this, they migrate to being more like annotation schemas, as a way of improving usability. The problem is that in doing so, they lose a lot of the precision and automation that a forming schema allows, without losing all of the complexity of their original design.

      I’d suggest that in most cases people would be better off to either move to a properly designed forming schema that hides publishing and content management detail, or to adopt a simpler annotation schema such as Flex.

      My question about Flex, though, is now that HTML5 has become an annotation schema, thanks to the addition of microformats, is there still a role for a separate annotation schema like Flex?

  5. Jan, I very much like the idea you present, “The DTD validates the basic structure – the style sheet validates the rest!” – although, as you later point out, doing some things might “be stupid” if you know that you have no stylesheet that will accept them.

    Still, the idea of such a simple DTD is intriguing both because it makes the writer’s task easier, and because it works equally well for SMEs, marketers, and other folks who don’t give a fig for XML and what it represents; they just want to get their content stored in some reusable form that they can call on when they need it, and can share (and of course, receive shares from others).

    Does FlexDTD use a lot of information typing? If so, what are the main information types?

    1. No, not a lot. I would estimate it to around 30 types being introduced over the years. And our customers are using more or less the same.

      Although all elements are generic and can have a class attribute, the wrapper block element and the phrase in-line element are the ones that are most often specialized.

      Here are some typical wrapper types: block, caution, comment, example, figure, note, quote, tip and warning.

      And here are some typical phrase types: counter, literal, notranslate, subscript, superscript, unit, value and variable.

      Many of these are kind of basic and are nowadays supported “out of the box” in Skribenta; e.g., the value inside the unit element will be localized (meters – feet, Celsius – Fahrenheit, etc.), the comment wrapper is by default not included when published, and so on.
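      A back-end localization step like the one described above might look roughly like this. This is a sketch only; the attribute name, the unit names, and the conversion table are assumptions for the example, not Skribenta’s actual behavior.

```python
import xml.etree.ElementTree as ET

# Assumed conversion table; a real system would cover far more units.
TO_IMPERIAL = {
    "celsius": (lambda v: v * 9 / 5 + 32, "fahrenheit"),
    "meter": (lambda v: v * 3.28084, "feet"),
}

def localize(fragment: str) -> str:
    """Convert unit phrases for an imperial-unit audience. The unit name is
    assumed to live in a 'type' attribute and the number in the element text."""
    root = ET.fromstring(fragment)
    for elem in root.iter("phrase"):
        if elem.get("class") != "unit":
            continue
        conversion = TO_IMPERIAL.get(elem.get("type", ""))
        if conversion:
            convert, new_unit = conversion
            elem.text = f"{convert(float(elem.text)):.1f}"
            elem.set("type", new_unit)
    return ET.tostring(root, encoding="unicode")

print(localize('<paragraph>Set the thermostat to '
               '<phrase class="unit" type="celsius">20</phrase>.</paragraph>'))
```

      The writer only marks the value as a unit; which unit system the reader sees is decided at publishing time.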

    2. Ray, the idea of a simple schema (DTD) is, I would suggest, essential to any meaningful extension of the authoring community.

      But that simplicity can be achieved two ways: either through a general annotation schema such as Flex (or HTML5), or through a task-specific forming schema that is written just to create one kind of content from one kind of author.

      Which you should choose is based on the extent of the validation and processing you want to do with the content on the back end. If you want to do a lot of this, you need a forming schema to impose reliable order on the content. If you just want to publish it as written with some basic flexibility in presentation, then a general annotation schema will do the job.

      As I pointed out above, though, forming schemas are a regular part of everyday life for people in industrialized societies. Indeed, industrialized societies would not operate as they do today without the use of forming schemas. We just have to learn to apply them to content.

  6. I am a newcomer to authoring systems, but I still want to state my view of your discussion. I think each system and its user can be treated as the two sides of a communication channel: one transmitter and one receiver. The user wants to present their requirements, and the system is always trying to understand the user’s meaning.

    Based on the Shannon–Hartley theorem, we have to define a set of symbols for the communication. More symbols mean a wider bandwidth, which means less transmission time; fewer symbols mean a narrower channel and longer transmission time.

    If the paragraph length is fixed, then a more complex set of symbols will probably provide a greater information capacity, which means we can describe a topic more accurately. But if we are using a set that contains fewer symbols, this may lead to an incomplete description.

    So I think we should try to find a way to use the right set of symbols to describe the right topics. And to reach higher symbol efficiency, I think we should put more effort into refining the document definition and making our systems smarter.

    A simple tool is easy to learn; a complex system can solve complex problems. So the question is whether you want Final Cut Pro or just iMovie. Right?

    1. Thanks for the comment, Daidai.

      Applying the Shannon–Hartley theorem to this is ingenious. One can definitely look at the addition of markup to content as a means of creating additional symbols, and thus increasing the bandwidth of the content — allowing it to carry a more precise meaning.

      On this point, you can also get more information bandwidth by using a more specific set of symbols. A symbol that is specific to a particular domain can carry more information unambiguously than a more general symbol can. This argues again for more specific schemas tailored for the specific domains in which writers are writing.
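      The underlying arithmetic is simple: a symbol drawn uniformly from a set of n symbols carries log2(n) bits, so a richer markup vocabulary raises the information carried per symbol. A quick sketch:

```python
import math

def bits_per_symbol(n: int) -> float:
    """Information carried by one symbol drawn uniformly from a set of n."""
    return math.log2(n)

# A binary alphabet, the Latin alphabet, and a byte-sized symbol set:
for n in (2, 26, 256):
    print(f"{n:3d} symbols -> {bits_per_symbol(n):.2f} bits per symbol")
```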
