Why does XML suck?

By Mark Baker | 2016/01/28

XML sucks. Don’t get me wrong. All kinds of really valuable and important systems use XML to perform vital functions. But performing a vital function does not keep something from sucking. Lots of people think Windows sucks, but it performs a vital function, and lots of people use it because of that. In fact, performing a vital function is what keeps sucky things around. If they perform a vital function, they are hard to get rid of, despite the fact that they suck.

Not that that prevents people from trying to work around them.

Scott Abel recently tweeted:

If we have markup, why do we need markdown?

Markdown has been making major strides recently, particularly in the Web and developer communities. What does it have that XML doesn’t have? Simple: It’s easy to write.

XML sucks for writing. Let’s look at why, and how it came to be that way.

XML is verbose

First and most obviously, XML is verbose. If you write in raw XML you are constantly having to type opening and closing tags, and even if your editor helps you, you still have to think about tags all the time, even when just typing ordinary text structures like paragraphs and lists.

And when you read, all of those tags get in the way of easily scanning or reading the text. Of course, the people you are writing for are not expected to read the raw XML, but as a writer, you have to read what you wrote.

Of course, you could use a specialized XML editor that hides the tags, but that is another tool to learn and use and add to your workflow, and that sucks. Plus, they don’t completely solve the problem, as we shall see.

Markdown is much less verbose. You can type a Markdown document with not much more effort than typing plain text, and you can read it easily in source form.
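To put a rough number on that, here is a small Python sketch that writes the same tiny document both ways and counts the characters that are markup rather than text. The element names (section, title, p, ul, li) are made up for illustration and do not come from any particular schema, and the counting rules are deliberately crude.

```python
import re

# The same tiny document in two notations. The element names are
# illustrative only, not taken from any real tagging language.
XML_SOURCE = """<section>
<title>Shopping</title>
<p>Things to buy:</p>
<ul>
<li><p>Milk</p></li>
<li><p>Eggs</p></li>
</ul>
</section>"""

MARKDOWN_SOURCE = """# Shopping

Things to buy:

* Milk
* Eggs"""


def xml_markup_chars(source: str) -> int:
    """Count characters that belong to tags (everything from < to >)."""
    return sum(len(tag) for tag in re.findall(r"<[^>]*>", source))


def markdown_markup_chars(source: str) -> int:
    """Count leading '#' and '*' markers plus the space after each."""
    markers = re.findall(r"^(?:#+|\*) ", source, flags=re.MULTILINE)
    return sum(len(marker) for marker in markers)


xml_overhead = xml_markup_chars(XML_SOURCE)
md_overhead = markdown_markup_chars(MARKDOWN_SOURCE)
print(f"XML markup characters:      {xml_overhead}")
print(f"Markdown markup characters: {md_overhead}")
```

The exact ratio depends entirely on the sample and the tagging language, so treat the numbers as a feel for the difference, not a measurement.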

But there are other less obvious, but ultimately more vexing problems with XML.

XML makes whitespace meaningless

How do you indicate the structure of a document in print or on screen? Partly you do it with font choices and the like, but first and foremost, you do it with whitespace. Whitespace naturally and visually delineates the parts of a document. One paragraph is separated from another by whitespace.

Markdown uses whitespace to delineate structure just the way a normal document does.

XML? Not so much.

In fact, in XML, most whitespace is simply discarded. You have to make special provisions to keep significant whitespace in things like code samples. To create a paragraph, you actually have to create an element (element <p> in some languages, <para> in others). We may be so used to this now we hardly think about it, but really, why the overhead when we could just hit return?
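A quick way to see this is to hand a parser a blank line and check what structure comes back. The sketch below uses Python's standard xml.etree.ElementTree module; the doc and p element names are invented for the example. Strictly speaking the parser keeps the blank line as character data, but it creates no structure from it, which is the point here.

```python
import xml.etree.ElementTree as ET

# A blank line between two runs of text. To the XML parser this is just
# character data inside <doc>, not two paragraphs.
blank_line_doc = ET.fromstring("<doc>First paragraph.\n\nSecond paragraph.</doc>")

# Explicit paragraph elements (the <p> name is illustrative).
tagged_doc = ET.fromstring(
    "<doc><p>First paragraph.</p><p>Second paragraph.</p></doc>"
)

# len() on an element gives its number of child elements.
print(len(blank_line_doc))
print(len(tagged_doc))

# A plain-text convention, by contrast, can treat the blank line itself
# as the paragraph boundary.
plain_text = "First paragraph.\n\nSecond paragraph."
paragraphs = [p.strip() for p in plain_text.split("\n\n") if p.strip()]
print(paragraphs)
```

The last three lines are not a Markdown parser, just the simplest possible version of the blank-line rule that lightweight languages build on.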

In ordinary printed text, we use indentation to indicate a code sample or a quotation. In Markdown, you do the same. In XML? Nope. More elements.

This failure to use whitespace to mean what whitespace means in ordinary documents is a major contributor to the verbosity of XML markup. It is why we need so many elements for ordinary text structures and why we need end tags for everything.

Why does XML ignore whitespace? Well, XML is a descendant of SGML. SGML stands for Standard Generalized Markup Language. The Generalized part means that it was intended for describing all kinds of markup for all kinds of purposes. That includes markup for things that might not have paragraphs or code samples or quotations. That means that whitespace could not mean anything specific in SGML’s “reference concrete syntax” — its default way of writing tags and attributes.

However, SGML has several facilities that you could use to make whitespace and other characters mean something specific in a particular tagging language. Using these features, you could make a markup language that looked like Markdown and still parsed as an SGML document. In other words, in SGML, you could create a markup language that was easy to write. (Admittedly, these features were so fiendishly difficult to use that they had no chance of catching on.)

XML did away with all these features. XML discarded all the things in SGML that made it possible to create markup you could actually read or write.

Why? Because that was not what XML was designed for. XML was designed as a data transport layer for the Web. It was supposed to replace HTML and perform the function now performed by JSON. It was for machines to talk to machines, not for writers to write in. Writers were to write either in SGML or in WYSIWYG editors.

So, XML ended up with no way to make whitespace meaningful. Which sucks, because in actual writing, whitespace is the basic building block of structure. Hitting return to create a new paragraph is an ingrained behavior in all writers, and XML breaks it. That sucks.

XML has no semantics

The reason that whitespace is meaningless in XML is actually far broader than just whitespace. XML markup has no semantics. That is, nothing in XML syntax itself has any meaning relative to the content or data it is marking up. It only has meaning in terms of XML’s own abstract data model of elements and attributes. It is entirely up to each specific XML tagging language to define what its elements and attributes mean. XML does not predefine anything for them.

This is deliberate. XML was designed to be semantic-free, and for good reason. XML is what we call a meta-language. It is a language for describing other languages. Thus XHTML, DITA, DocBook, RDFa, SVG, XSLT, WSDL, XProc, S1000D, and BeerXML, along with all languages in this list on Wikipedia, and thousands more, are all markup languages with specific individual semantics, all of which are defined in XML. If any part of the XML syntax itself had inherent semantics, all those languages would have to share them, even if it were not appropriate for them. So XML is semantic-free.

But that means that you have to specify absolutely all of your semantics yourself by defining elements and attributes for everything. This is laborious, and tends to produce quite verbose data formats. The design goals section of the XML specification actually states: “Terseness in XML markup is of minimal importance.”

For any application for which terseness is a useful property, this sucks. Human readable and writable markup is one of those applications.

The lack of semantics in XML itself also creates a processing overhead. You have to parse the XML syntax first to pull out the elements of the specific tagging language and then run separate application code to interpret the semantics of that language. Along with its lack of terseness, this is what led to XML being rejected in favor of JSON by many web developers. JSON is “JavaScript Object Notation”. It has the inherent semantics of JavaScript objects right in its syntax, which means it is trivial to parse into memory and access directly with application code.
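The two-step nature of XML processing is easy to sketch. In the Python example below, the record and its field names are invented for illustration. The JSON version arrives as native values in one call, while the XML version needs a second layer of application code to decide that price is a number and tags is a list.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, written once in each notation.
json_text = '{"name": "Widget", "price": 9.5, "tags": ["a", "b"]}'
xml_text = (
    "<item><name>Widget</name><price>9.5</price>"
    "<tags><tag>a</tag><tag>b</tag></tags></item>"
)

# JSON: one call yields native values with types already assigned.
item_from_json = json.loads(json_text)

# XML: parsing yields generic elements holding strings. Application code
# must still supply the meaning of each element.
root = ET.fromstring(xml_text)
item_from_xml = {
    "name": root.findtext("name"),
    "price": float(root.findtext("price")),
    "tags": [tag.text for tag in root.find("tags")],
}

print(item_from_json == item_from_xml)
```

The mapping dictionary in the XML half is the "separate application code" in miniature: every tagging language needs its own version of it.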

Being semantic-free is an important property of XML in its quest to be highly general. But it makes it suck for things like writing documents or swapping data between web servers and JavaScript programs.

XML hides structure

Of course, if you have a graphical XML editor, you get back the ability to hit enter for a new paragraph. At least, sort of.

It’s really not that simple. Since XML has no semantics, there is no way for the editor to know that Enter means paragraph, or even which tag means paragraph. It can know this for specific well-known languages like DocBook, of course. But even then, there are problems.

When the writer hits Enter, do they mean “end this paragraph” or do they mean “start a new paragraph”? Typically, they mean both. But suppose there are other elements allowed inside a paragraph, like a list? Maybe the writer wants to create a list that is logically inside the current paragraph. It will still be whitespace separated in the output, of course, but if it is logically inside the paragraph, then we can’t assume Enter means “end this paragraph”. It may mean “start a list inside this paragraph”.

And because there are a number of elements that could come after a paragraph, besides a new paragraph, we can’t assume that Enter means “start a new paragraph” either.

Which means that when you hit Enter, the editor has to ask what you want each time. In some editors, therefore, hitting Enter brings up a list of choices. Usually there is a default choice, which is usually a new paragraph, which you can choose by hitting Enter again. That’s not so bad. But then again you can’t always hit Enter twice without looking because sometimes the thing you want next isn’t the default. You have to stop typing and look. And that sucks.

It gets worse when you try to edit something. Let’s suppose you want to insert something between a list item at the end of one section and the title of the next section of a document.

On your screen you will see white space between the list item and the section title. In that white space there are the starts and ends of several elements.

  • The end of the paragraphs inside the list item
  • The end of the list item
  • The end of the list
  • The end of the paragraph that contains the list (if it is inside a paragraph)
  • The end of the section
  • The start of the new section
  • The start of the title of the new section

But these starts and ends of structures are all invisible to you because they are not part of the WYSIWYG display. Depending on what element you want to insert, you have to get your cursor into the right bit of on-screen whitespace that represents the element you want to be in.
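To make those invisible boundaries concrete, here is a minimal Python sketch that builds a small document with that shape and lists every tag sitting between the text of the last list item and the title of the next section. The element names and the helper functions walk and hidden_boundaries are assumptions made up for this example; a real schema would use its own names.

```python
import xml.etree.ElementTree as ET

# Illustrative element names; any real tagging language would differ.
DOC = (
    "<doc>"
    "<section><title>One</title>"
    "<p>Intro:<ul><li><p>Last item</p></li></ul></p>"
    "</section>"
    "<section><title>Two</title><p>Text</p></section>"
    "</doc>"
)


def walk(elem):
    """Yield ('start', elem) and ('end', elem) pairs in document order."""
    yield ("start", elem)
    for child in elem:
        yield from walk(child)
    yield ("end", elem)


def hidden_boundaries(root, last_text, next_title):
    """List the tags between one piece of text and the next section title."""
    events = list(walk(root))
    # End of the element whose text is the last thing visible on screen.
    first = next(
        i for i, (kind, e) in enumerate(events)
        if kind == "end" and e.text == last_text
    )
    # Start of the title that is the next thing visible on screen.
    last = next(
        i for i, (kind, e) in enumerate(events)
        if kind == "start" and e.tag == "title" and e.text == next_title
    )
    return [
        f"</{e.tag}>" if kind == "end" else f"<{e.tag}>"
        for kind, e in events[first:last + 1]
    ]


boundaries = hidden_boundaries(ET.fromstring(DOC), "Last item", "Two")
print(boundaries)
```

Every entry in that list is a distinct place the cursor could be, and all of them are drawn as the same strip of blank screen in a graphical view.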

To add an item to the list, for instance, you need to get your cursor inside the list. To add a new paragraph to the section, you need to get your cursor after the end of the paragraph and before the end of the section. But you can’t see any of these locations. All you can see is a block of whitespace. The editor will have aids to help you find the right place, of course, but it is all terribly cumbersome and time wasting for what should be an operation you perform virtually unconsciously.

This is far from being the most difficult instance of this, and the result is that most writers writing in XML regularly switch back and forth between the graphical view and the raw tags view of the document they are working on, in order to find the place they need to be to make the edits they want. And that sucks.

Cutting and pasting content in the graphical view of an XML editor can be even more of a nightmare. You can’t see exactly which set of tags you have picked up, so you can’t tell what is legal to paste in what location, or what damage you might be doing to the current structure by removing the chunk you cut. Again, the editor will help all it can to repair the damage, but, again, what should be a simple and intuitive operation becomes an exercise in negotiation and troubleshooting.

And this is just the problem of dealing with simple text structures. What if you have significant semantic markup in your document? These elements and attributes are not visible in the graphical view because they are not intended to be visible in the published document, but to inform downstream processes. Particularly maddening are XML attributes, which are used for information that is intended to be “out-of-band”, that is, not presented to the reader. Editing and entering attributes in a graphical view invariably involves multiple keystrokes invoking dialog boxes, choices from lists, and typing into dialog fields. It sucks.

In short, you can’t just sit at the keyboard and type your document, and that sucks, no matter how good your XML editor may be. There are some very good XML editors, and you are likely to need one if you have to write a lot of XML, but none of them can take all the suck out of XML itself.

In Markdown, by contrast, you can just sit down and type your document.

To create structure, you need to see structure

The key thing here is that to create structure, you have to be able to see the structure. We have all used those free-form HTML editors in blogging platforms and Web CMSs and we know that the HTML they produce is a horrendous unstructured mess.

Using Markdown is a huge improvement because the author can see the structure and therefore won’t mess it up. And because Markdown uses whitespace to mean what it natively means, there are few end tags to worry about, and therefore to misplace, so it is physically impossible to make as much mess as you can with HTML tags.

XML editors, while much stricter, still don’t let you see the structure, but instead of letting you make a mess, they continually scold you with obscure error messages about structure errors, which are hard to fix because you can’t see the structure. Or they simply prevent you from typing what you want to type until you get your cursor in the right bit of whitespace. (Actually, how much mess you can make vs. how much scolding you get is more a function of the specific language than the editor. But it is always one thing or the other.)

The alternative is to look at the raw XML tags, but while the structure is technically visible in raw XML view, it is often impenetrable because of the verbosity of the tags, and because your most basic structural clue — whitespace — is either not used at all or is used to show the hierarchy of the tags, not the significant structures of the document.

In short, XML hides document structure no matter how you view it, and that really sucks.

The price for generality is suckiness

This is always the problem with generality. The things you have to do to a system or format to make it more general inevitably make it more complex and more verbose. It can’t incorporate all the shortcuts that would seem natural in a system built for a single purpose because they would be inappropriate, awkward, or limiting for other purposes. If it attempts to provide a mechanism for building your own shortcuts (as SGML did) then it becomes even more complicated.

Inevitably, people will create special purpose alternatives for individual applications (like JSON or Markdown) or attempt to simplify the general system (like XML did to SGML).

The upside of the more general system, at least if it has any traction, is that there are tools and people who support it. Sometimes the presence of these resources makes up for the costs of its verbosity and complexity.

In other cases, however, the level of support does not make up for the suckiness. Then new formats are proposed and developed and support systems develop around them, as they have with JSON and Markdown.

Eliot Kimber has an excellent presentation called Why is DITA so Hard. DITA is hard, he acknowledges, but he maintains it is not DITA’s fault. DITA is hard because the problem is hard. The presentation lays out the full extent of the configuration management and publishing problems associated with technical communication (and large scale corporate publishing more generally) and shows how DITA seeks to be a general solution to those difficult problems, which makes DITA itself difficult.

The thing is, of course, that no one organization actually has the complete technical communication problem, which is the union of all the problems that individual organizations face. DITA is hard because the entire problem is hard. But no one has the entire problem. Many organizations don’t have anything like the whole problem. So why do they need a tool that is as hard as the whole problem?

Part of the answer is that if they have a significant part of the whole problem, it is less expensive to adopt a tool that solves the whole problem — even if it is more difficult to use — because of the support base, the tools, and the accumulated knowledge. (This depends, of course, on whether there are other tools available that are closer to the subset of the problem that they have.)

Partitioning the problem space

There is another approach to solving hard problems, other than building one tool to solve them all, and that is to partition the problem space into smaller problems that are easier to solve, and which have solutions that are easier to use. That is what many organizations have done, and it is often the right economic choice to make.

In fact, all organizations partition the problem of creating content to one extent or another. There is no organization that is going to create all of its content in DITA or in any form of XML. It would be cost prohibitive to do so, and people across the organization would rebel against the difficulty and complexity and the general suckiness of trying to do everyday communication tasks with these tools.

The price of partitioning is fragmentation

This is not to say that all is wine and roses with Markdown. Markdown is a million miles from being a satisfactory substitute for XML in all the vital functions that XML is used for. XML is a metalanguage suitable for a wide range of uses. Markdown is a tagging language suitable for writing simple web pages, and not much else.

XML is semantics-free. Markdown is all semantics. Every piece of Markdown syntax means one specific thing for all Markdown documents. If you want a structure that Markdown does not support, there is no way to add it, no way to express it at all, except by embedding a chunk of HTML. This sucks either way.

And if you also want to create a nicely formatted paper document from your Markdown content, you will find it really does not have the ability to express the structures you need for that. Which also sucks.

As a result, lots of people have built their own versions of Markdown by modifying the Markdown parser (or building their own).

Other people have built alternative lightweight markup languages for different purposes. In fact, there are lots of them, some much older than Markdown. For instance, if you are interested in a language you can use to produce a printed book, you could look at AsciiDoc, which was based on DocBook semantics but with non-XML markup.

Most of these simply describe document structures, some supporting a wider range than others. Some of them, like GitHub-flavored Markdown or WikiMarkup, have been adapted to support specific features of the systems they belong to, such as wikiwords or GitHub issue numbers. Most express structure through a combination of whitespace and punctuation characters, unlike the named elements that express structure in an XML document.

Still others are designed for highly specialized purposes with very subject-specific semantics. Examples of these include JavaDoc markup, which is specialized for describing Java APIs and designed to be written inside of comments in Java source code. These work brilliantly for what they do, but they tend to result in API documentation being produced by an entirely different system than the rest of the docs, and without any connection between them. Which sucks.

Seeking a middle way

So, there are problems with both the generalized and the lightweight approaches. They both have aspects of them that suck, and in both cases, the suckiness is not the result of a specific design flaw that could be fixed, it is a consequence of a key feature of the approach itself.

Is there a middle way? I think there could be. We should not expect it to be a panacea that cures all ills. Whatever it is, it will have its own pockets of suckiness. The key will be to get enough things right that it meets the needs of a wide enough group of people at a price they are willing to pay. The only way to get there is going to be to try things and see what sticks.

I believe that a key component of such a middle way will be a structured extensible markup language that doesn’t suck for writing.

That language needs to have the simplicity and clarity of a lightweight markup language combined with the ability to add and enforce custom semantics through a schema or a similar mechanism.

It needs to be a markup language because we need structure for a wider range of applicability and automation and we have seen that WYSIWYG editors suck, in one way or another, for creating structure.

It needs to be extensible, because we need a language that can be used for more than one purpose, though we certainly don’t need it to be as general as SGML or XML. Something suitable for writing the kinds of documents that have paragraphs and code samples and quotations and lists will cover a wide range of needs.

Such a language does not have to be semantics-free. It can extend from a base set of semantics, rather than starting from scratch each time.

It needs to make structure clear and visible without it interfering with easily reading and writing the document.

Getting the balance right is going to be the trick. Particularly, how structure is expressed, and what types of out-of-band data are supported and how they are expressed, is going to be key to making it useful while not sucking to work in.

Some lightweight markup languages, like reStructuredText and AsciiDoc, already have extensibility features, though they rely more on programming than on a schema language for creating extensions.

Personally, I find both of them to rely too much on exotic use of punctuation for my taste. And that is very much the point. We are going to need something that appeals to a broad taste. Markdown seems to be more appealing than either AsciiDoc or reStructuredText, despite being less powerful than either.

I’m experimenting with my own approach to this problem with a project I call SAM (Semantic Authoring Markdown). This post was written in SAM. I’ll be talking more about it in my series on structured writing on TechWhirl, and in my forthcoming book on Structured Writing from XML Press. Both the series and the book are being written in SAM.

SAM encapsulates my own particular views on how markup languages should be created and structured. It makes less use of exotic punctuation — and more use of whitespace — than other lightweight languages. I think it is more readable and writable than any other lightweight markup language, including Markdown, but that’s me. It remains to be seen what others will think. The language is a work in progress, but you can take a look at it in its current state on GitHub.

Unlike most other lightweight languages, SAM is not intended to be an entirely separate tool chain independent of XML. XML may suck as a format to write in, but it has a rich tool set and support network and SAM is designed to take full advantage of it by outputting XML rather than creating output formats directly. I think that is important. The middle way should not throw the baby out with the bathwater.

SPFE will support SAM.

Will SAM hit a sweet spot? The only way to find out is to put it out there and see what happens.

I know not everyone will agree, but we need to find something that sucks less than XML. The growing popularity of Markdown and other lightweight markup languages tells us that much.

Category: Structured writing

About Mark Baker

I am an aspiring novelist and former technical writer and content strategist. On the technical side, I am the author of Every Page is Page One: Topic-based Writing for Technical Communication and the Web and Structured Writing: Rhetoric and Process. I blog at everypageispageone.com and tweet as @mbakeranalecta.

15 thoughts on “Why does XML suck?”

  1. Shane Taylor

    I agree that DITA is hard (and that’s why some folks are working on a lightweight DITA as a “middle ground”), but much of your argument doesn’t hold up.

    You seem to be very concerned about the use of the RETURN/ENTER key:

    “Hitting return to create a new paragraph is an ingrained behavior in all writers, and XML breaks it.”

    Except that, when I started writing, hitting return on my typewriter created a new line, not necessarily a paragraph (for that I had to indent), and that was ingrained in all writers. Tools change, and you change with them or complain about it (or both).

    And this:

    “XML is semantics-free. Markdown is all semantics.”

    XML in general might be without semantics, but nobody uses generic XML to write in — we use well-defined schemas like Docbook or DITA. And DITA in particular is all about semantics. That’s why it’s so powerful (and complex) — writing is not about the presentation, but about the content. Markdown is all presentation, not semantics. It’s simple, and clean, and rather like using a typewriter, but that’s as far as it goes.

    XML might suck for you, but that doesn’t mean it sucks, period. Many of us enjoy the benefits of using actual structure and reuse — absent from Markdown and typewriters.

    1. Mark Baker Post author

      Thanks for the comment, Shane

      Re Enter key: It is true that there are two ways to delineate a paragraph with white space: a blank line or an indent. And there are three possible keystroke combos to achieve that, depending on your editor: Enter, Enter+Enter, Enter+Tab. I think most of us have learned to automatically use Enter in a word processor and Enter+Enter in email and the like. I think we do it more or less automatically because we are watching the screen and we keep hitting Enter until we see the paragraph visually separated. Not sure if anyone still uses Enter+Tab.

      The broader point here is the use of whitespace to indicate paragraph boundaries, whatever keystrokes it takes, is easy to do, easy to read, and, just as importantly, hard to screw up.

      Yes, DocBook and DITA have particular semantics. My point is that XML does not allow them to express those semantics in any other way than by creating elements and attributes. It also forces them to explicitly close all those elements with end tags, which makes finding the right insertion point in an XML document difficult. This is not about the semantics of the document, but about the semantics of the syntax. Because the syntax has no semantics, any XML document type is cumbersome to write in.

      But no, Markdown is not all about presentation. Markdown is a structured writing format that does the classic structured writing function of separating content from formatting. It produces plain unstyled HTML. Any formatting you want to add has to come from CSS. Markdown is a simple non-extensible language that describes just a few document structures, but it is still structured writing. In contrast to XML, however, its semantics are in the syntax itself. And since the syntax is compact, and relies on natural document structures as much as possible — specifically the way we write emails — it is very easy to read and write.

      But this is not a case of me espousing Markdown over XML. I want the benefits of structure too. Specifically, I want the benefits of extensible structure, which Markdown lacks. Note that I said that Markdown sucks too.

      But I am increasingly finding the cumbersome nature of writing in XML too high a price to pay, both for me and for other people.

      That is why I am seeking a middle way. Lightweight DITA is an interesting project, but it is not attacking the problem from the same direction as I am. That’s a good thing. So many factors influence the writing and publishing process that it is hard to anticipate exactly which approach will produce the most workable compromise. We need to put different approaches out there and see what works for different people.

  2. John Tait

    I look forward to your new book and I will certainly buy my own copy. I’m not certain though that the world needs yet another documentation format no matter how good it is. It seems to me that they only make sense if a stable community of some kind can form around them. We all seem to be very thinly spread over flavours of XML, lightweight formats, wikis, HATs, TeX and co, Word, Frame, InDesign, groff (well, I use it), WordPress and other blogs, sites like Medium, simple HTML by hand… It’s already bewildering.

    1. Mark Baker Post author

      Thanks for the comment, John

      I agree about the need for a stable community forming around a format. Of course, that can only happen if someone creates the format and puts it out there. People are actually putting these kinds of formats out there all the time. (I’ve been looking.) Naturally only a few develop a community and prosper. Only time will tell for any new proposal.

      But it is worth noting that SGML was designed to reduce the proliferation of formats that was current at the time, or at least, to reduce the proliferation of syntaxes, which is a big deal not only for learning but also for tooling. We actually need a variety of formats for a variety of needs, but a shared syntax greatly reduces the overhead.

      The new proliferation of formats today is a sign that XML has reached the limit of its ability to unify syntaxes for the many different formats that we need. Its verbosity and its heavyweight parsing requirements seem to be the main limiting factors.

      With SAM I am hoping to create a format that solves the verbosity problem while being able to create a wide variety of specific formats. (Not as wide as XML, but a range for which XML is proving unsatisfactory.) Other people are trying to do similar things with other formats. This is all good. If any of us succeed, we will achieve a welcome reduction in syntaxes, with the attendant benefits for learning and tooling.

  3. Mark Baker Post author

    I think it also bears saying that if your previous authoring tool was a cumbersome desktop publishing tool, you may find the overall authoring experience net less cumbersome with XML since most XML languages take away a lot of the overhead of the DTP environment (which is what makes DTP tools inherently cumbersome).

    But to someone coming from more casual authoring tools, XML authoring is cumbersome, even in the best editors. We should note that the attempts to spread DTP tools outside the niche of publication departments went nowhere, so there is a clear precedent here. Markdown actually tells us that there is an opening to spread non-WYSIWYG structured authoring to a wider community. But if we want more structure than Markdown provides, then we need a less cumbersome format than XML to do it.

  4. Barry Grenon

    What about using HTML forms?

    Could html forms + Markdown be a way forward? Forms to apply the structured layer behind the scenes, Markdown for text entry within the forms?

    I like Markdown for personal writing. But use wikis for work – because of the ability to create on-the-fly forms, but also the fact that you can craft queries against the semantics stored behind the form. You can do some complex things that way – semantics and queries in the background, while keeping the forms pretty simple.

    It’d be great if a stable format came along. Hopefully attempts don’t result in this! 🙂

    Really, I do hope techcomm settles on something. You need a community to rally around an approach, so folks can start sharing solutions, templates, extensions, and so on.

    Any hope for WordPress?

    1. Mark Baker Post author

      Hi Barry. Thanks for the comment

      Forms based approaches have been around for a while and I certainly think they work well for some applications. They have the key advantage that they make structure clear. On the other hand I don’t think they work well for every kind of document or every kind of structure — such as those with repeating or optional elements. They also tend to tie you to a particular authoring system. Hard to use a forms based approach on a plane or on the beach.

      The problem with universal standards that attempt to cover everyone’s use case is that if there is any diversity in the use cases, the one standard that covers them all will be more complex and harder to use than a system tailor made to fit just one.

      In cases where we have a single use case with diverse implementations, the introduction of a standard does tend to eliminate most of the competing standards. USB is a good example of this. So is DocBook, which slowly replaced many variations on the theme of complex generic document markup.

      But XML is the classic case of a format trying to do too much and thus sucking at a lot of the things it tries to do. On the other hand, Markdown is so application-specific that its range is limited, and variations on the theme like AsciiDoc and reStructuredText proliferate for slightly different use cases.

      I’m hoping we can develop something to serve a useful subset of XML’s territory (just as JSON has done) in a way that is extensible enough to eliminate the need for lots of mini languages while also simple enough to use that people don’t feel the need for more ad hoc solutions.

  5. Wayne B.

    So maybe we should write another entry on why markdown sucks. Because it does. It’s not markdown’s fault, it’s John Gruber’s fault. There was an effort to try to standardize markdown, but John basically has said, I’m king and if you want to mess with my design, bugger off and design your own. Yet, because there is no governing body, markdown has evolved and devolved into different camps, and what you are doing may not be 100% markdown depending on where you are trying to render or use it (now it’s starting to sound like the XML description you provided). Enter CommonMark. Now a group of people are trying to standardize markdown, but of course since John doesn’t want any part of standardization, they can’t call it markdown, so CommonMark it is! And of course that means that what is standardized in CommonMark doesn’t always transfer to markdown. Ah, the joys and ease of use of markdown.

    So, we’re basically no better off with markdown. In fact, we’re really back to Eliot Kimber’s mantra of “all tools suck… some just suck less.” 🙂

    1. Mark Baker Post author

      Thanks for the comment, Wayne.

      Actually, this post talks about why Markdown sucks (though you do bring up another issue that I did not mention).

      Markdown helps to illustrate why XML sucks as a format to write in.

      But Markdown sucks because of the lack of extensible semantics or enforceable constraints.

      I’m a huge advocate of structured writing, so I would never suggest adopting Markdown generally. But I also recognize that the defects of XML get in the way of people adopting structured writing — not to mention how much they annoy me personally. I want a format that combines Markdown’s ease of reading and writing with XML’s capacity for structure, extensibility, and constraints.

  6. Alex Knappe

    Hi Mark,
    I understand your disgust at writing in XML. Yes, it sucks. So does writing in HTML source, in InDesign, FrameMaker and Word (using paragraph styling).
    I don’t think the markup itself is the main problem here, or the Enter key, or the readability.
    The main problem we’re facing is the question of what to do first:
    Create the content at large and add semantics/structure later (Notepad approach), create the semantics/structure first and add content later (DITA/XML approach), create content and add structure/semantics simultaneously (DTP approach) or create structure/semantics and add content simultaneously (extended Notepad approach)?
    Note there’s a slight difference between the latter two.
    Each of these approaches is more or less cumbersome.
    The Notepad approach makes typing easy and fast, but makes structuring of large content pieces very difficult.
    The DITA/XML approach makes structuring and creating semantics quite easy, but has major drawbacks when it comes to simply writing stuff up (quoting my boss ‘I just want to write!’).
    The DTP approach is somewhat in the middle ground here but tends to make both sides somewhat cumbersome.
    The extended Notepad approach works with lightweight markup. This is actually my favorite approach, as simple tags, which have no requirements other than being unique and parseable, allow you:
    – to write down stuff without caring much about needed element boundaries
    – to pretty-print your draft document for readability
    – to place structure and semantics while you type

    Just a small example:
    ($h1)headline goes here
    Some text.
    Some more…
    Some text with a >>>quote<<< here.
    ($)a list point
    )another one
    inline ///graphics.gif/// here.
    printf (‘Hello world’);
    printf (xyz);

    I think you get what I mean. I’ve used such minimalist markup on several occasions. Works with parsers, works with humans (they get a small list of what means what), works for the author.
    All it takes is a small set of rules for how to interpret those tags and how to handle new lines. This makes it possible to transform such markup into pretty much anything.
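
    As a rough illustration of the kind of rule set described here, the following is a minimal Python sketch. It is not Alex's actual tooling. The tag meanings are guesses from the example above: `($h1)` is assumed to open a heading, `($)` to start a list, a leading `)` to add another list item, `>>>…<<<` to mark an inline quote, and `///…///` to name an inline graphic. The newline rule (each non-empty line becomes its own paragraph) and the HTML output are also assumptions made for the sketch.

```python
import html
import re

# Hypothetical rules, guessed from the example markup; not a specification.
HEADING = re.compile(r"^\(\$h([1-6])\)(.*)$")   # ($h1)headline
LIST_START = re.compile(r"^\(\$\)(.*)$")        # ($)first list item
LIST_NEXT = re.compile(r"^\)(.*)$")             # )next list item
INLINE = re.compile(r">>>(?P<quote>.+?)<<<|///(?P<graphic>.+?)///")


def render_inline(text):
    """Escape plain text and expand the inline quote and graphic tags."""
    out = []
    pos = 0
    for m in INLINE.finditer(text):
        out.append(html.escape(text[pos:m.start()], quote=False))
        if m.group("quote") is not None:
            out.append("<q>" + html.escape(m.group("quote"), quote=False) + "</q>")
        else:
            out.append('<img src="' + html.escape(m.group("graphic")) + '">')
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


def render(source):
    """Turn the lightweight markup into HTML, one rule per line type."""
    out = []
    in_list = False
    for raw in source.splitlines():
        line = raw.strip()
        # A leading ")" only counts as a list item inside an open list.
        item = LIST_START.match(line) or (in_list and LIST_NEXT.match(line))
        if item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + render_inline(item.group(1)) + "</li>")
            continue
        if in_list:
            out.append("</ul>")
            in_list = False
        heading = HEADING.match(line)
        if heading:
            level = heading.group(1)
            body = render_inline(heading.group(2))
            out.append("<h{0}>{1}</h{0}>".format(level, body))
        elif line:
            # Assumed newline rule: every non-empty line is a paragraph.
            out.append("<p>" + render_inline(line) + "</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

    Calling `render()` on the first lines of the example would give a heading, a few paragraphs, and a two-item list. Swapping the output strings for XML elements instead of HTML is a small change, which is the portability being claimed for this style of markup.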

    1. Mark Baker Post author

      Hi Alex,

      I agree that XML syntax is not the only cumbersome part of structured writing. All the more reason it seems to me to try to find something less cumbersome, since the cumbersomeness of actually creating structure is not something we can entirely eliminate.

      That said, a lot of the cumbersomeness of the structures we are asked to create is more a result of the abstractions involved than anything else. Abstraction is hard and some of the formats you list, DITA in particular, tend to be heavy with abstractions.

      This is why I prefer to move to more subject-domain structures which are far more concrete, and therefore less cumbersome because less abstract.

  7. Andrea Shanahan

    Hi Mark, I read this on LinkedIn last night and this morning wanted to “Like” it but now cannot find it. Can you assist, please? I also tried connecting but LI won’t let me as an “I don’t know Mark”. I think we might have met at some point in the past — some TW forum or other — but as familiar as you seem, I can’t place it. Are you a speaker at conferences or STC by any chance?

  8. Andrea Shanahan

    (OK — I “followed” you on LI and this might have worked. Don’t know how I came across the article last night! 🙂

  9. Diego Schiavon

    My company wanted to get engineers to write documentation. It had to be structured, but had to happen in a user-friendly tool, so it could not be an XML editor like Oxygen.

    They looked at WYSIWYG editors and rejected them, largely because of what you write: the fact that a tool generates valid DITA does not mean it can create a meaningful structure.

    Eventually they settled for Author-IT, largely because it can publish to XML and can take advantage of XML, but is not XML. It is, as you write, a “middle way”.

    I do not know if it was the best choice they could have made, but it certainly resonates with what you wrote.

    Although, “XML sucks” is a bit of a dangerous way of putting it. Everything in the way we write documentation uses XML. We publish to XML and transform to XSL-FO. Images are SVGs. We import error messages as XML. Giving up XML would mean a leap in the dark, and nothing out there has remotely the same capabilities.

    1. Mark Baker Post author

      Thanks for the comment, Diego.

      Yes, “XML sucks for some things even though it is very useful for others” would have been a more accurate title. Not as catchy, though. 😉

      I hadn’t thought of Author-IT as a middle way. Of course, the nature of middle ways is that there will be several of them and they will be diverse.

