Structured Writing is not Desktop Publishing plus Angle Brackets

By Mark Baker | 2011/12/12

What constitutes a “real” XML editor? The question is perennial, but is made topical by Tom Aldous’ surprisingly shrill defense of FrameMaker as an XML editor. It is unusual for a market-leading company to indulge in myth-busting aimed at tiny competitors. It is an approach more common to the small and desperate. But if we look past the oddness of Adobe employing this tactic, we see that the question of whether FrameMaker is a real XML editor, as with almost all debates about what makes a “real” anything, is not a debate about the product’s features, but a debate about what “real” means in the context.

Since an XML editor is just a tool for editing XML, and XML is just a tool for structuring content, the question at the heart of the matter is: what is structured writing?

Some see structured writing as desktop publishing plus angle brackets.

The Adobe position is what I would describe as “desktop publishing plus angle brackets”. If you take this position, the thing a real XML editor must do is to present the writer with a desktop publishing interface, and produce XML under that skin. It is not at all surprising that Adobe takes the “desktop publishing plus angle brackets” view of structured writing. Desktop publishing is their baby, after all, one of the key technologies the company was built on.

But those of us on the “FrameMaker is not a real XML editor” side of the question tend more to the view that desktop publishing is the disease and structured writing is the cure. One of the most common phrases used to describe both XML and structured writing is “separating structure from formatting”. But what has become of that separation if the interface in which the author works is a WYSIWYG representation of the published form of the document? Structure and formatting may have been separated under the skin of the application, but they have been promptly merged again in the authoring interface, never more to part, in most cases.

The truth is that in many structured writing systems, the separation of structure and formatting is not very great. Many schemas/DTDs are little more than an abstraction of the presentation artifacts of a printed page, such as substituting <emphasis> for <italic>. The design of such systems is done from the published artifacts backwards. In many cases, the abstraction goes no further back than the point where the same abstraction can be used to drive both print and on-line publication.

Additional semantic tagging may certainly be added. This is sometimes what you do when you create a DITA specialization (sometimes you just create another publishing abstraction). For instance: you add some tags to your topic type that are specific to the subject matter of that topic type. But when you do this, you are adding a semantic gloss to an element which is actually an abstraction of a publishing artifact. The whole point of the specialization mechanism is that even with the semantic gloss in place, the element can still be published as the underlying artifact.
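
To make the fallback concrete, here is a minimal Python sketch of how a generic processor can ignore the semantic gloss and see only the underlying publishing element. It assumes DITA-style class attributes written inline (in real DITA they are normally supplied as defaults by the DTD or schema), and the function name base_element is my own invention for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified fragment. The class values follow the pattern of DITA's task
# specialization; they are written inline here so the sketch is self-contained.
FRAGMENT = (
    '<step class="- topic/li task/step ">'
    '<cmd class="- topic/ph task/cmd ">Open the file.</cmd>'
    '</step>'
)

def base_element(elem):
    """Return the most general element type named in a DITA-style class attribute.

    Falls back to the element's own tag when no usable class attribute is present.
    """
    tokens = elem.get("class", "").split()
    # tokens[0] is the "-" or "+" marker; tokens[1] is the base module/element pair.
    if len(tokens) < 2 or "/" not in tokens[1]:
        return elem.tag
    return tokens[1].split("/")[1]

step = ET.fromstring(FRAGMENT)
cmd = step.find("cmd")

print(step.tag, "publishes as", base_element(step))
print(cmd.tag, "publishes as", base_element(cmd))
```

The specialized names (step, cmd) carry the subject-matter meaning, but a processor that knows nothing about them can still format the content as a list item containing a phrase.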

Adding a semantic gloss to a schema that is fundamentally an abstraction of a publishing artifact can, of course, yield a number of benefits, but the method is definitely limited, and can sometimes present problems — for example, how exactly is the semantic part of the semantic gloss supposed to be represented in the publishing-oriented view that the author is working in?

The alternate approach to structured writing (the view in which FrameMaker is not a real XML editor) does not work from the published artifact backwards but from the content store outwards in two directions, one toward the author and the other toward the published document. The aim of this approach, plain and simple, is to get data that is as reliable as possible. This is how database systems are designed. You begin with a model of the data that you want to capture, taking into account all of the operations you want to perform on that data, and you structure your data model in such a way that it allows you to write queries that will perform all of those operations reliably.

What I mean by “reliably”, here, is that the database structure is designed in such a way that, provided the data is correctly entered, you can run the query and be confident in the result without the need for a human being to check it over. Having human beings check over data is expensive and time-consuming, so we naturally want to minimize it as much as possible. Thus when you log into Amazon and check your list of recommendations (a page that is entirely generated by machine based on all the books you have read, bought, rated, or placed on your wish list), there is no one on the Amazon staff who has to look over that page and approve it before it is sent to your browser. Such an inspection would be too costly for Amazon, and too time-consuming for you. If such an inspection were needed, the feature simply would not exist. But the inspection isn’t needed, because the data in Amazon’s databases is reliable enough that they can generate this page for you automatically.

The more functions a database system can perform reliably, in this sense, the more efficient and productive it is. A good database system, therefore, is designed to be as reliable as possible for as many functions as possible. But no amount of reliable structure does you any good if you don’t have reliable data entry. This is the issue that the term “garbage in, garbage out” was coined to describe.

To ensure that you get good data in, you must design your data gathering system to be as clear and unambiguous as possible, to guide authors as fully as possible, to prevent errors as far as possible (for example, by only allowing people to choose values from a list of valid values), and to detect and report any errors that do occur as soon as possible. You must also audit your content regularly to make sure errors are not creeping in, and change how data is collected to avoid such errors in the future.
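
As a rough illustration of that kind of data gathering, the following Python sketch checks one entry against a list of valid values and reports every problem at once, in plain language. The record type, field names, and platform list are all hypothetical; a real system would take them from its own content model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical controlled vocabulary, invented for this example.
VALID_PLATFORMS = {"linux", "macos", "windows"}

@dataclass
class CommandEntry:
    name: str
    platform: str
    summary: str

def check_entry(entry: CommandEntry) -> List[str]:
    """Return a list of readable problems; an empty list means the entry passed."""
    problems = []
    if not entry.name.strip():
        problems.append("name is required")
    if entry.platform not in VALID_PLATFORMS:
        allowed = ", ".join(sorted(VALID_PLATFORMS))
        problems.append(f"platform '{entry.platform}' is not one of: {allowed}")
    if not entry.summary.strip():
        problems.append("summary is required")
    return problems

# Report errors at the point of entry, not at publishing time.
for problem in check_entry(CommandEntry("ls", "solaris", "")):
    print("error:", problem)
```

The point of the design is that the author hears about the bad platform value and the missing summary while still in the act of writing, rather than weeks later when a build fails or a reader complains.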

An XML document is a database. A collection of XML documents is a database. To make your structured writing system as efficient and productive as possible, those databases should be designed to be as reliable as possible for as many operations as possible. Creating a schema as an abstraction of a printed page is not generally the best design to achieve this. Similarly, in order to get reliable data from authors, you need to provide them with an authoring interface that ensures that the content they create is as complete and error-free as possible. Having people author in the visual representation of a published page is generally not the best strategy for achieving this.
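
To show what treating XML as a database can look like in practice, here is a small Python sketch that queries a document by subject matter instead of by page position. The element and attribute names are invented for the example, and it uses only the limited XPath subset in Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: the markup describes the subject (commands and the
# platforms they run on), not the layout of a printed page.
CONTENT = """
<commands>
  <command name="ls" platform="linux"><summary>List directory contents.</summary></command>
  <command name="dir" platform="windows"><summary>List directory contents.</summary></command>
  <command name="cp" platform="linux"><summary>Copy files.</summary></command>
</commands>
"""

def commands_for(root, platform):
    """Return the names of all commands recorded for one platform, in document order."""
    matches = root.findall(f"./command[@platform='{platform}']")
    return [c.get("name") for c in matches]

root = ET.fromstring(CONTENT)
print(commands_for(root, "linux"))
```

A query like this can be trusted without anyone checking the result only because the platform is recorded as data. If the same fact were buried in a formatted heading or a bold phrase, no reliable query could be written against it.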

In this view, structured writing is not desktop publishing with angle brackets; it is database design with content. In a database system, the data structures are designed for reliability, the authoring interfaces are designed for accurate data capture, and the publishing is then based on a programmatic transformation of the data into publishable format by a reporting system. Publishing, in any medium, is not a paramount design criterion for the data store or the authoring interface. These activities proceed in the well-founded confidence that a well-structured data store containing reliable data can be published successfully to any medium. Structure now is well and truly separated from formatting: separated so much that formatting is not the driver in the design and implementation of the structure or the authoring system.
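
A minimal sketch of that reporting step, again in Python and again with invented element names: the same record is turned into HTML and into plain text by two small functions, and nothing in the record itself knows about either output.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record; the element and attribute names are assumptions for this example.
RECORD = (
    '<command name="cp" platform="linux">'
    '<summary>Copy files &amp; directories.</summary>'
    '</command>'
)

def to_html(cmd):
    """Render one command record as an HTML fragment."""
    name = escape(cmd.get("name"))
    summary = escape(cmd.findtext("summary"))
    return f"<h2>{name}</h2><p>{summary}</p>"

def to_text(cmd):
    """Render the same record as plain text with an underlined heading."""
    name = cmd.get("name")
    return f"{name.upper()}\n{'=' * len(name)}\n{cmd.findtext('summary')}"

cmd = ET.fromstring(RECORD)
print(to_html(cmd))
print(to_text(cmd))
```

All the formatting decisions (heading level, underlining, capitalization) live in the report functions. Changing how the content is published means changing those functions, not touching the content or the authoring interface.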

Database input and output

A database system uses forms for input and publishes documents through a report writer.

A structured writing system that is based on the idea that structured writing is about making content reliable will be designed much more like a database system. Not every type of topic can be captured by a form, of course, but we can take an approach to designing our authoring schemas and our authoring interfaces to maximize the reliability of the content data we collect, rather than to mimic the formatting of the document presentation we will eventually create from that content data.

A real XML editor, for those of us who look at structured writing as database design with content, is not one that presents a DTP interface over an XML schema that is an abstraction of the printed document, but one that allows us to create a highly reliable data entry system for the capture of highly structured, highly reliable content data, which we can then publish any way we like.

14 thoughts on “Structured Writing is not Desktop Publishing plus Angle Brackets”

  1. Sarah O'Keefe

    Mark, thank you for your eloquent dissection. I thought the tone of the article bordered on inappropriate (especially the part about competitors “maybe” lying).

    The critical question is not “can FrameMaker do XML?” but rather “is FrameMaker the best way to do XML?” For some organizations, perhaps yes. For others, definitely no.

    1. Mark Baker Post author

      Hi Sarah, thanks for the comment.

      Agreed about the question. And it is not a question that necessarily has a single enterprise-wide answer. The key to getting good data is to present an interface to the author that is appropriate to them, and to the kind of information you want from them. Thus the bank uses one interface for their ATMs, another for their on-line banking, another for their tellers, and yet another for their loan officers. All contribute information to the accounts of their customers, but in different circumstances, for different purposes, and from people with different levels of knowledge and training.

      As we writers never tire of explaining to developers of the products we document, you have to get the interface right for the job you want people to perform.

  2. Larry Kunz

    Excellent article, Mark. Structured authoring, when done right, follows a database model and doesn’t take into account what the published document will look like. Well said!

    One minor point of disagreement: there is a use case for XML editors that present a DTP facade over an XML schema — because some content contributors, notably SMEs, simply won’t use anything else. However, these editors should be selected with care so that they’re watertight (i.e., the schema can’t be broken). And they probably shouldn’t ever be the primary authoring tool.

    1. Mark Baker Post author

      Thanks for the comment, Larry.

      Actually, I don’t disagree that the DTP facade can sometimes be appropriate, if it is the interface that works best for a particular contributor, and you are not asking them for any semantics that cannot be represented that way.

      That said, I have had more success getting reluctant contributors to fill in forms than to use structured editors. With software developers I have been able to get good results asking them to produce raw XML. The key is, you can’t ask them to comply with some complex publishing-oriented schema; it just has too much noise for them. You have to give them a schema that works essentially like a form, with nothing extraneous, everything clearly labeled in terms they understand, and little or nothing that is optional.

      This can go a little beyond presenting a different interface over the same schema. In some cases, you have to write a schema specific to the people you want to collect a certain type of data from, then transform that data appropriately as you bring it into your publishing system. I have even created non-XML entry formats on occasion — anything to get good data in.

  3. Eddie VanArsdall

    Mark, I echo the thanks. I’ve been trying to get this point across for the last few days, and now all I have to do is refer people to your excellent post.

    1. Mark Baker Post author

      Hi Eddie,

      Thanks for the comment. I’d love an update on how things go with the people you are trying to convince. I’m always looking for better ways to make the case, and to anticipate any confusion or objections people may have.

  4. Antoinne

    Hi Mark, really enjoyed reading your article, especially your thoughts on a good XML editor. I was at Online Information 2011 @ Olympia last week and came across a product called Liquid XML Studio (http://www.liquid-technologies.com/xml-editor.aspx), which I thought was a fairly good XML editor. Do you have any particular preferences yourself?

    1. Mark Baker Post author

      Hi Antoinne,

      I prefer different editors for different tasks. For writing and for schema design, I use oXygen. For XSLT work I use jEdit, which is not properly an XML editor, but a text editor with really good XML editing capabilities. Among other things, it has much better syntax highlighting than oXygen. oXygen uses the same syntax highlighting for all XML documents, whereas jEdit uses XSLT-specific highlighting, which makes all the difference in the world when programming. It is all a matter of horses for courses.

      Liquid XML lost me right out of the gate because when I downloaded their trial version, I found it was crippled and would not display modular schemas. This is silly, because I don’t evaluate tools against dinky projects, I test them against my normal work. From their website, I see they have changed that policy now, but I don’t have any pressing reason to evaluate them again.

      1. George Bina

        Hi Mark,

        Please check the next oXygen release for XSLT work – we added XPath syntax highlighting inside XSLT attributes and indeed that helps a lot. This is planned for early next year but if you want early access just let me know.

        Regards,
        George

        1. admin

          Thanks George, I may take you up on that offer, and I certainly look forward to seeing the feature in the next release. You guys should really take a close look at the XML and XSLT syntax coloring in jEdit. It’s the best I have seen.

  5. Pingback: FrameMaker is a real DITA editor…a very poor one | Ditanauts

  6. Richard Rabil, Jr.

    Thanks for this article, Mark. Very useful and insightful.
