15 Responses to XML is Not the Answer

  1. Scot Marvin 2014/01/27 at 14:04 #


    Seriously though, do we have any examples of a doc repository in JSON or a relational database? I’m sure there are. I just haven’t seen any.

    • Mark Baker 2014/01/27 at 14:29 #

      Thanks for the comment, Scot.

      This blog is an example of a doc repository stored in a relational database: WordPress stores all its data in a MySQL database and uses a system it calls shortcodes for labeling content below the document body level. Labels applying to the whole post are stored in relational fields. In general, most Web CMSs are based on relational databases and feature varying degrees of structure, based on breaking down elements of content into relational fields. Such systems certainly outnumber the XML-based repositories used in technical communications.

      I worked on another project that stored content as part of a semantic network in a MongoDB database which uses a modified form of JSON for storage. We used some XML on the front end, though that was not essential, but MongoDB seemed to be the right choice to store the semantic network relationships that were the real core of the project.

      But my point about information in relational databases and JSON feeds is not so much about basing your whole documentation system on them as taking advantage of the labeled information already available in your organization.

      For example, in a project for one client, I was able to generate a component database from information stored in a format intended to be read by a C program and integrate it with content derived from an in-house source code commenting system. We did use XML as an integration format for that system, and later integrated authored content from an XML source, but the bulk of the information that went into the reference came from information sources that were structured and labeled in different formats. It was the labeling, not the format, that enabled that content engineering solution.

      Again, my point is not to discourage people from using XML, but to encourage them to think about the substance of what they are doing when they apply labels to content, rather than about the trivial fact that they are using XML to apply the labels.

    • Ben 2014/01/27 at 16:32 #

      How about REST API documentation in JSON? Here’s a demo for the Swagger framework http://petstore.swagger.wordnik.com/

      • Mark Baker 2014/01/27 at 17:05 #

        Indeed. I expect we will see increasing use of JSON to deliver certain types of content. It is not a format that is optimized for content by any means, but it will often be the format of opportunity for Web APIs and similar things. As long as it lets you label content in the ways that support the automation you want to do, there is no reason not to use JSON — economic considerations aside.

  2. Don Day 2014/01/27 at 16:08 #

    You knew you’d hear from me.

    So I agree with you completely on the naming role of XML vs shortcodes, at least within the example you give.

    Since shortcodes are simply text, they can be inserted as valid markers by popular in-browser editing tools, but those tools normally lose touch with that markup after they insert it–they provide no validation and no subsequent markup-aware editing on inserted codes. WordPress programmers have persevered to put a rudimentary shortcode tracking capability in place, but users can easily hand code wrong values since shortcodes are plain text. And there is no universal definition of shortcodes… they are conventions, not standards.

    By contrast, what XML offers is a system for managing content in an object-oriented manner, with at least a prayer of assurance that your markup is correct and organized in a way that allows querying the inner structure, should you need to. Not everyone needs those assurances, nor do I defend the draconian nature of XML validation or the cost of fully structure-aware XML tools. I’m just pointing out that XML provides a set of system-level services that are already fairly ubiquitous, contrasting that situation with the limited scope of support for shortcodes outside the WordPress universe.
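
    The contrast can be sketched in a few lines of Python. This is only an illustration, assuming a made-up `<topic>` snippet and a deliberately simplified shortcode-like pattern (it is not WordPress's actual shortcode parser). The XML parser rejects mismatched markup and lets you query the inner structure, while a plain-text marker with a misspelled name is accepted without complaint.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content; element and attribute names are illustrative only.
XML_DOC = (
    '<topic><safety level="critical">Unplug the device.</safety>'
    '<para>Open the case.</para></topic>'
)
BROKEN_XML = '<topic><safety level="critical">Unplug the device.</para></topic>'

# Simplified shortcode-like pattern: [name attrs]...[/name]
SHORTCODE = re.compile(r'\[(\w+)([^\]]*)\](.*?)\[/\1\]', re.S)


def is_well_formed(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def critical_warnings(text):
    """Query the inner structure for safety elements labeled critical."""
    root = ET.fromstring(text)
    return [el.text for el in root.findall(".//safety[@level='critical']")]


def shortcode_names(text):
    """Collect the names of shortcode-like markers found in plain text."""
    return [m.group(1) for m in SHORTCODE.finditer(text)]
```

    Note that `is_well_formed` only checks well-formedness; validating against a schema is a further step that the standard library does not cover. Calling `shortcode_names` on text containing a misspelled marker such as `[safty ...]...[/safty]` simply returns the misspelled name, since nothing defines which names are allowed.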

    And even for non-XML structured content tools, there’s need for some level of schema-informed methodology to guide authors and to build tools with common behaviors and that are more widely supported and deployed. Are you willing to ditch XML and build all that infrastructure yourself? Or would you not build that system on top of XML, hiding the complexity but exposing those benefits?

    My main concern about all the adaptive content authoring tools out there is that they also follow the presumed mantra that XML is not the answer, and therefore each one ends up being its own expensive, siloed solution with no prayer of widespread open source or vendor support or of having a self-sustaining user community outside of that singular solution. SPFE is a partial solution, but it cannot meet the content management and publishing needs of the Web as a whole. So while “XML is not the answer” predictably is a zinger title for this post, the simple premise overlooks an entire value ecosystem that is available to utilize. I want a solution that is widely accepted, makes use of existing services, brings delight to all potential authors, publishers, and content consumers, and cures the common cold. What say?

    • Mark Baker 2014/01/27 at 17:38 #

      Thanks for the comment, Don.

      I certainly hoped I’d hear from you! You always raise important and interesting points, as you do in this comment.

      I agree that the main economic motive for choosing XML as the format for expressing structure in content is that it comes with a full suite of validation and processing tools, not to mention editors and a variety of repository options that are optimized for storing and retrieving it.

      The facts that XML is designed to be validated, that there are editors that can validate content on the fly, and that those editors can suggest valid options to authors at any point in the document are major advantages in ensuring that content is labeled reliably. Those advantages are largely wasted if people use generic formatting-oriented schemas that don’t require any interesting labels to be applied.

      As you know, my critique of many DITA implementations is that in sticking to the generic task/concept/reference topic types, they forgo the opportunity to enforce a rhetorical structure appropriate to their business and to label things in their content in ways that make sense for the business, and that would enable greater degrees of automation for their content. DITA can, of course, provide ways of doing this, but many implementers fail to use them, and many advocates discourage their use.

      But that, you see, is the point I am making. XML is a useful implementation tool, but it is not the answer. The answer is the reliable and correct labeling of content. If you implement XML without thinking through how your content should be labeled to meet your business needs, you are missing the point of the exercise.

      In pointing out that there are other useful ways to label content I am not telling people to run away from XML; I am telling people to think about how they need to structure and label their content before rushing into a conversion project. I am also telling them to recognize that any source of content or data that is reliably labeled is a potential input source for content engineering, even if it is not labeled using XML.

      Once someone has a clear idea of how they want to structure and label their content to support their business processes, XML may be the best tool for implementing that structure. But simply converting to XML, any XML, is not the answer and may not yield the benefits expected, or any benefits at all.

      XML, therefore, is not the answer, or, at least, not the answer to the most important question, though it may be the answer to a significant secondary question about implementation.

      As for SPFE, as an architecture it is just as general as DITA in its potential applications, though obviously not anywhere like as mature or well supported as DITA, and optimized for a different set of operations. But I agree that neither one is able to meet the content management and publishing needs of the Web as a whole. There is no one ring to rule them all and in the darkness bind them.

  3. George Bina 2014/01/28 at 02:12 #

    Hi Mark,

    Thanks for this post, as always you get to the heart of the problem :).
    I hear a lot from people who want to obtain structured content but do not want their authors to know anything about the labels they want to set (XML element names in most cases). They want the authors to work as they used to work in Word, and that shows exactly that they are missing the point of moving to XML/structured content.
    What I have seen also is that in many cases people add markup/labels to content but they do not use those labels in any way – that makes it difficult for people that create that content to understand why they should take the effort to add those specific labels/markup to the content.
    Anyway, the title is a little misleading. I think something like “XML is just one answer” or “XML is part of the answer” would better express the ideas in your post.

    Best Regards,

    • Mark Baker 2014/01/28 at 10:13 #

      Thanks for the comment, George.

      It is exactly that kind of consumer confusion that I am talking about. If people wanted an editor that worked exactly like Word but produced XML, then the right product for them is Word, since Word’s file format now is XML. The only reason not to choose Word for this purpose is that the semantics of Word’s XML are purely those of Word’s internal document model.

      Word (like just about every other tool on the planet) gives you XML. But it does not give you configurable structure. You need a tool like oXygen when you want to implement configurable structure, and that will always mean that the writers need to be aware of the structure they are being asked to create. oXygen gives you multiple elegant ways to build an interface to capture that structure, but the writer has to actually apply the structure.

      The title has certainly misled some people. It was chosen to be provocative. I hoped the subtitle would clarify the argument, but for some readers it has clearly failed to do so. A more correct title would have been, “XML is the Answer to the Second Question, but You Need to Answer the First Question First,” but that is a bit long to fit in a Tweet.

      Perhaps the reason for the confusion is exactly the thing I am trying to address: that people have conflated XML with structure, and now say XML when they mean structure (which is somewhat like saying wood when you mean house, or hammer when you mean blueprint).

      The problem this creates in the marketplace is that it gives consumers the idea that XML is a kind of magic pixie dust that you have only to sprinkle on your content to give it magical powers. Having your customer believe in magic can sometimes help you make a sale in the short term, but it can lead to disappointment and frustration in the long term.

      This potential for long-term disappointment is clearly shown by your second point about people creating labels they never use. This industry has started to use the word “standard” as a marketing tool, creating the impression in some people that they have to create these labels they never use because it is required by some standard. But this is just spending money for nothing, and sooner or later, someone is going to question why that money is being spent.

      “Because it’s a standard” is never a reason to do anything. Standards are operational, not normative. You should use them where they give you an economic advantage, not simply because they exist.

  4. Sebastian Göttel 2014/01/28 at 07:37 #

    Hi Mark,

    thank you for highlighting an important aspect in our industry.

    In my experience people often confuse XML markup with content processing. Just by stating in the content <para language="EN"> or <safety level="critical">, nothing is achieved yet. You are definitely right that algorithmic content processing is not confined to XML markup or to a specific form of XML markup. People moving from one XML implementation to another one often have a hard time understanding that it’s more efficient to adapt the way markup is applied than the way the XML engine is processing the markup. It’s the old syntax vs. semantics confusion.

    Best regards,

    • Mark Baker 2014/01/28 at 10:17 #

      Thanks for the comment, Sebastian.

      Indeed, the syntax vs semantics confusion just will not go away. What particularly irks me about it is the people who should know better who talk about XML being essential for information exchange.

      XML is syntax. Given semantics in an unambiguous form, you can easily convert from one syntax to another for exchange. On the other hand, incompatible semantics make exchange impossible regardless of syntax.
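
      A minimal Python sketch of that conversion, assuming a very simple shared semantics of "one labeled element with attributes and text" (the function names and the `<safety>` example are illustrative, not taken from any particular system). Once both sides agree on what the label means, moving between XML and JSON is mechanical.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_record(xml_text):
    """Read one labeled element into a syntax-neutral dict."""
    el = ET.fromstring(xml_text)
    return {"label": el.tag, "attributes": dict(el.attrib), "text": el.text or ""}


def record_to_json(record):
    """Serialize the record as JSON for exchange."""
    return json.dumps(record, sort_keys=True)


def json_to_xml(json_text):
    """Rebuild an XML element from the JSON form of the record."""
    record = json.loads(json_text)
    el = ET.Element(record["label"], record["attributes"])
    el.text = record["text"]
    return ET.tostring(el, encoding="unicode")


source = '<safety level="critical">Unplug the device.</safety>'
as_json = record_to_json(xml_to_record(source))
round_tripped = json_to_xml(as_json)
```

      The sketch deliberately handles only a flat element with no children. The point is that the label (`safety`, `level="critical"`) survives the change of syntax; what no converter can supply is agreement on what "critical" means to each party.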

  5. Karen Field Carroll 2014/01/29 at 12:19 #

    Well said, Mark! Your post explains well the XML/content management confusion, but it also speaks to a broader problem we have in tech comm these days–that of mistaking tools for talent or skill or even effort in general. Great post!

    • Mark Baker 2014/01/31 at 23:12 #

      Thanks for the comment, Karen.

      “Mistaking tools for talent” is an interesting way of putting it. What I am seeing today is a great deal of mistaking standards for business processes, which may be very much the same sort of thing.


  1. XML is Not the Answer | Technical Communication... - 2014/01/28

    […] XML is not the answer. Structured writing may be the answer. XML is one way to implement structured writing. More and more these days, I am hearing technical publication managers (and not a few con…  […]

  2. XML is Not the Answer | Success of the Technica... - 2014/01/29

    […] XML is not the answer. Structured writing may be the answer. XML is one way to implement structured writing. More and more these days, I am hearing technical publication managers (and not a few con…  […]

  3. XML es la herramienta - 2014/01/29

    […] this past January 27 I read a post on the blog everypageispageone.com, titled “xml is not the answer“, which will serve as the basis for this post. In turn, this post complements the one I wrote […]