Fine chunking and translation apparently don’t mix either

By Mark Baker | 2011/04/29

The one concession I have been willing to make to the fine chunking characteristic of many DITA implementations is that it is a boon to translation. Apparently not so, according to a recent blog post on Content Rules.

The problem is that fine chunking tends to obscure context, making the content impossible to translate reliably. And the real kicker is that even if the translator is given the means to see the content in its current context or contexts, the source may be reused in new contexts later without the translator being involved again, or ever seeing the content in its new context. (This is where the savings are realized, after all.)

There is significant debate about the context issue in the comments on the post. Some argue that if you are doing it properly, the issue of context should not arise, because the chunks should be context free. The problem I see with this is that a system like DITA (or any system that allows authors to chunk freely) relies on the author of every chunk, and on the authors who subsequently reuse that chunk, to ensure that it is actually context free for purposes of translation.

That is a really tough thing for an author to do. It requires a great deal of forethought, a knowledge of translation issues that the author is unlikely to have, and a degree of prescience to work out how this chunk might reasonably be reused in the future. The person doing the reusing, similarly, has to consider whether they are using the content in a way, and in a context, in which it was intended to be used.

All of that is overhead. It is writer-time overhead, and it will generally involve content management overhead as well. And it is overhead that contributes nothing to the writer meeting their individual goals and deadlines. Which means, of course, that they won’t do it.

Reuse is sold as a means to reduce overhead. But the more people do reuse, the more we find that it creates an overhead of its own that can eventually outweigh the savings it was supposed to create. Unwilling to pay the overhead within their current schedules, people put it off. It becomes a form of deficit spending, one that accumulates a growing content management debt. As the debt mounts, debt-service charges mount with it, eating into current expenditures, until the only option seems to be to borrow more heavily. A lot of pubs organizations, I fear, are turning themselves into the next Greece or Portugal.

Building systems that give people enough rope to hang themselves will inevitably lead to gallows. We need to start taking a more long-term, systems-oriented approach to how we manage content.

2 thoughts on “Fine chunking and translation apparently don’t mix either”

  1. kevinmcl

    For what it’s worth, I write WebHelp for cryptographic hardware/software products – HSMs, if that means anything to you – and the second-most-common complaint I get is that my info doesn’t have enough (or the correct) context.

    It started out in the late nineties as the (lately) much-despised 150- and 400-page FrameMaker-created documents (Install Guides, Configuration Guides, Admin & Maintenance Guides, Programmer or SDK manuals…).
    About five or six years later, I pulled it all into RoboHelp and broke it up into topics and groups of topics that became WebHelp. And we stopped publishing printed or even PDF manuals.

    I did that because we had been providing a Command-Line Interface to all our products, but had been moving toward a GUI, and I wanted to be ready (finally, hallelujah!) to provide context-sensitive help. It turns out that I made that move about six years too soon (so far) – we’ve still got CLI for everything, and no real GUI. And the products just keep selling. Most customers seem just as happy to view “the docs” with a browser as with a PDF reader. Nobody expects printed, bound manuals any more, and only the rare few even ask if there’s a PDF version that they could print out.

    But… as much as they want the info to be all small and chunk-ish and easy to find with the search tool, they also want it to be explained in the context of their industry or governmental organization.

    On the one side, I’ve got “experts” telling me that I should make all my topics/chunks as context-free as possible, and on the other side I’ve got customers wanting the procedures and explanations put into the context of what they are doing, not what some unrelated industry or government department does. And definitely not so context-free that they don’t quite know why they’d get into this-or-that procedure in the first place.

    I get: You should provide steps and use language, terminology, context that makes sense in the larger overall picture of what each major customer is trying to accomplish, even if that means explaining the same thing several different ways, or repeating a lot of material.
    I also get: You should write a thing once, and then never deviate from that wording. Synonyms and alternate ways of viewing/explaining a topic or a procedure will only confuse and annoy the users.

    Which one is true?

    Some wise-ass will say “They are both true. You should write as many different versions of your customer docs as there are types of customer or as there are different industries and niches. They’ll be happy to pay the additional premium and/or put up with the delay-to-market that that entails.”

    Another wise-ass will say “Are you kidding? These things barely make each competitive market window at our current accelerated (read compressed and compromised) rate of development and testing schedules, and piling on more resources will price us out of the market!”

    So, mostly I just read these sage discussions, with all their certainty and smugness, and say “Meh. Today one of you is right, tomorrow the other one will be right, and you’ll swap again the day after. Anybody trying to pick which one will be wrong at least half the time, but the delays built into product specification, development, testing, and release will ensure that the choice, and the reasons for making it (as opposed to the opposite choice), will be a historical blip that’s at least six months to two years out-of-date anyway.”

    Besides, GoogleWikiPedia will soon make all such considerations obsolete anyway. 🙂

    With that said, I _do_ agree on the “mechanical” observations like “every page is indeed page one to somebody” and “chunking too finely does indeed destroy the utility of metadata”.

    It’s good to see observations that either bring up something nobody has thought of, or state something that everybody sorta knew in the backs of their heads but needed somebody to articulate. I contrast that sort of writing with the evangelical, flavor-of-the-month stuff.

    But then, I shamelessly do both and mix ’em liberally. Hah!

    1. Mark Baker (post author)

      Hi kevinmcl,

      Thanks for the comment. I think you are right on the money on the issue of context. The recommendation for writing context-free topics comes from the desire to enable reuse of content. Naturally, you can’t reuse content in more than one context if the topic establishes a particular context.
      On the other hand, a topic that does not establish its context leaves the reader feeling lost. I assume that the people who build their entire authoring strategy around reuse are banking on the larger context into which the topic is inserted to provide the context for it. I don’t buy this. People don’t navigate into the context of a topic. They follow a search directly to the topic without passing through anything that establishes a context. If the topic itself does not establish context, the reader will not know what the context is.
      This is why the blog is called “Every Page is Page One”. Every topic needs to be written as if it were the first page for the reader — because it is. This means that a topic designed to be an Every Page is Page One topic needs to establish its context.
