I was reading JoAnn Hackos’s article on easy DITA authoring solutions, and it got me thinking about what the word “easy” means in regard to DITA or any similarly complex technology.
Can an editing interface make DITA easy? Some DITA consultants that I know complain bitterly about tools that make that claim. DITA may be many things, they say, but easy is not one of them. To present it as being easy is only going to set people up for disappointment and failure.
If they are right, can any DITA editor legitimately claim to be “easy”?
Two kinds of easy
I think they can, but we need to make a distinction between two different kinds of easy: conceptually easy, and operationally easy. Conceptually easy means the task you are asking someone to do is easy to understand. They know exactly what you are asking them to do, and they can tell at once when they have successfully completed the task. Climbing Everest is conceptually easy. It is operationally hard.
Operationally hard means that the steps you have to take to complete a task are difficult or cumbersome to perform, no matter how well you understand them. Writing XML by hand is operationally hard. Even if you understand exactly what you are supposed to be doing, your chances of getting it right the first time without the assistance of a schema-aware editor are slight.
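For example, even a tiny fragment gives the hand author several chances to go wrong. Here is a hedged sketch using DITA-like task markup (the element names are real DITA; the content and the mistakes are invented):

```xml
<!-- What the author intends to write -->
<task id="replace-filter">
  <title>Replacing the filter</title>
  <taskbody>
    <steps>
      <step><cmd>Remove the old filter.</cmd></step>
    </steps>
  </taskbody>
</task>

<!-- What hand authoring often produces: an unquoted attribute,
     a mistyped closing tag, and a cmd outside its required step
     parent. None of it is valid; all of it is easy to type. -->
<task id=replace-filter>
  <title>Replacing the filter</titel>
  <taskbody>
    <cmd>Remove the old filter.</cmd>
  </taskbody>
</task>
```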
Some tasks are both conceptually hard and operationally hard. Computer programming is one of these. It is operationally difficult to write code that compiles. It is conceptually difficult to create a robust and efficient computing architecture. Many programmers are proficient at the former and deficient at the latter.
If a task is conceptually hard, you will have a very hard time getting people to complete it successfully on a part-time basis. They won’t understand what is being asked of them, and they won’t know when they have completed the task successfully. Making the task operationally easy won’t shift the needle very much in terms of effective participation.
Structured writing: two kinds of hard
Classic technical communication — before writers got tangled in publishing issues — was conceptually difficult but operationally easy: you just typed up your manuscript and handed it to a typesetter. Traditional publishing, on the other hand, is conceptually simple, but operationally complex.
Desktop publishing combined classic tech comm and traditional publishing to create a task that was both conceptually hard and operationally hard. This led to many tech comm candidates being hired more on their operational skills than their conceptual knowledge. (It seems to be the pattern that when something is both conceptually hard and operationally hard, operational competence will be preferred in hiring. Perverse, perhaps, but operational competence and its outcomes are easier to measure and we tend to manage for the properties that are easiest to measure.)
Structured writing has, by and large, continued this pattern of being both conceptually hard and operationally hard. In fact, for tech writers it has created the situation where they are expected to do a conceptually hard job (technical communication) using a conceptually hard system (DITA) that is also operationally hard to use.
It is no wonder that many are turning to wiki platforms instead, which are both conceptually and operationally easy. (But don’t confuse operationally easy with operationally efficient.)
No editor, by itself, is going to make DITA (or technical communication) conceptually easy. But given the difficulties authors face, making it operationally easier is very much worthwhile.
Making structured writing easier
Is there anything we could do to make structured writing conceptually (and operationally) easier for technical writers? Yes. The thing that makes things conceptually difficult is abstraction. The concrete is easy; the abstract is hard. If we can make a task more concrete, we can make it conceptually easier. (And often operationally easier as well.)
Desktop publishing is highly concrete. The appearance of text on a page, the way lines break over pages, the way table cells align, are all highly concrete things. That is what makes desktop publishing conceptually easy, even if it is often operationally hard.
Most contemporary approaches to structured content are quite abstract. DITA, in particular, relies on a number of abstractions with which the writer has to grapple, and contains a significant amount of imperative markup (effectively, instructions to the processor) which means the writer is effectively coding as they write. This can enable some powerful effects, but it is conceptually very difficult.
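Content references and conditional processing attributes are typical examples of such imperative markup: both are instructions to the processor rather than statements about the subject matter. A hedged sketch (the file name, IDs, and attribute values are invented):

```xml
<!-- Imperative: instructs the processor to pull in content from
     another file at build time (target file and IDs invented) -->
<note conref="shared/warnings.dita#warnings/voltage-warning"/>

<!-- Imperative: instructs the processor to include or exclude
     each phrase depending on the build's filtering conditions -->
<p>Log in as <ph audience="administrator">the admin user</ph>
   <ph audience="user">a standard user</ph>.</p>
```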
If we returned to concrete formatting as our standard, we would restore conceptual ease, but we would lose structured writing and all of its potential benefits in terms of quality and efficiency.
But there is another option: we can create markup that is semantically concrete. Semantically concrete markup — which I usually refer to as declarative markup — simply declares what a piece of content is about: what things in the real world it is describing or referring to. This, it turns out, is enough to power most of the publishing, reuse, and information exchange goals we have for structured content, and to achieve some that have previously been out of reach.
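As a sketch of what semantically concrete markup can look like (using an invented vocabulary, not any standard):

```xml
<!-- Declarative: every element names what the content is about.
     Nothing here tells a processor what to do or how to format. -->
<ingredients>
  <ingredient>
    <quantity>2</quantity>
    <unit>cups</unit>
    <food>flour</food>
  </ingredient>
</ingredients>
```

Because the markup records what the content is, rather than what a processor should do with it, publishing, reuse, and exchange behaviors can all be derived from it downstream.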
Creating concrete semantics makes the structured writing task concrete, and therefore conceptually easier. Combine this with the capacity of structured editors to make structured writing operationally easier, and you have an approach to structured writing that can be extended much more widely across the organization.
EDITED (Webinar date changed): I’ll be examining how to do this in my webinar Structure and Collaboration with Scott Abel on December 12, 2014. The webinar is sponsored by Oxygen XML (disclosure: Oxygen is both a partner and a client of Analecta Communications). I’ll be talking about how to make structured writing both conceptually and operationally easier for collaborators, and showing how Oxygen’s forms-based authoring feature can make structured writing operationally easier. Sign up now.
Very interesting, Mark. I have always been asked the question “Can we make this easier for [engineers, reviewers, marketing]?” but I never thought to ask “Easier how?” Define your criteria of easier before you decide to buy a solution which claims to make DITA easy. 🙂
However, I still have trouble imagining something. How would we establish semantics for highly versatile content? Should we specialize <dl> into all kinds of <featureList>, <partList>, <portfolioItems>, etc.? And if we do, how do we make sure users take the time to review our markup dictionary and select the right tag?
Thanks for the comment, Pawel.
That is certainly part of it, but it needs to be more holistic than that if you want to create a concrete semantic authoring experience for the author.
I don’t particularly recommend DITA for this kind of thing, because it is far more complex than it needs to be in almost every dimension. However, since so many people are committed to DITA systems and need to find a way to make them work, specialization might be one route to concrete semantic markup.
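For instance, your featureList might be specialized from DITA’s general dl element, so that the author sees only concrete names. A hypothetical instance (all element names invented):

```xml
<!-- Hypothetical specialization of DITA's definition list:
     featureList from dl, feature from dlentry, featureName
     from dt, featureDesc from dd -->
<featureList>
  <feature>
    <featureName>Auto-save</featureName>
    <featureDesc>Saves the document every five minutes.</featureDesc>
  </feature>
</featureList>
```

Because each element is specialized from a dl ancestor, fallback processing can still render it as an ordinary definition list.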
An alternate route is to create the concrete semantic markup independently and translate it to DITA for reuse or publishing. However you approach it, though, the point is to maximize the amount of valid, correct semantics you capture, not to use DITA for as many things as possible.
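As a minimal sketch of that alternate route, assuming the hypothetical featureList vocabulary above, a small XSLT transform could translate it into plain DITA for reuse or publishing:

```xml
<?xml version="1.0"?>
<!-- Sketch: translate the invented featureList vocabulary into
     plain DITA definition-list markup for reuse or publishing -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="featureList">
    <dl><xsl:apply-templates/></dl>
  </xsl:template>

  <xsl:template match="feature">
    <dlentry>
      <dt><xsl:value-of select="featureName"/></dt>
      <dd><xsl:value-of select="featureDesc"/></dd>
    </dlentry>
  </xsl:template>

</xsl:stylesheet>
```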
What makes content versatile is not the current format it is in, but how much you know about what it says. Syntactic interoperability is worth very little compared to semantic interoperability. To achieve syntactic interoperability, you need to get the whole world to agree on syntax — which is not going to happen. To achieve semantic interoperability, you only need to ensure that you know what the semantics of your content are. If you know that, you can translate to any syntax, and to any taxonomy, that anyone needs at a later date.
I suggest that you ease up on emotional superlatives (“[DITA is] far more complex than it needs to be in almost every dimension”). DITA requires only a title at a minimum, and from there on the author need only see what they need to see. Authoring schemas are supposed to do just that: provide only a subset of any larger architecture behind any authoring system.
Pawel has provided a set of names of semantic data structures. They are effectively concrete data semantics; if they happen to map to an underlying data architecture, that should not matter. Ignore specialization; his data model snippet could drop into your one-off schemas or into DITA without a care for the context. Our problem is to represent the semantic data model to the user in a way that makes his or her interaction with it as natural as possible. In the Web world, it’s all about forms. In the world of assisted structured authoring tools, it’s all about how well the tool provisions the hoped-for interface for easing the author’s experience.
So in your careful separation of the abstract from the conceptual, I suggest that you keep your well-known disdain for DITA appropriately abstract as well. If we can define a common semantic data model for a part list, then each of us as implementers can apply our domain expertise to mapping that requirement to an “evaluated as appropriate” editing solution, and then compare how even the authoring experience is between the two; that’s all that really matters. Let’s get to that level and try to stay classy.
And by the way, at this moment we should all be watching the news and glorying in the European Space Agency’s attempt to land a probe on a comet. I’m off watching that for the rest of the morning.
Thanks for the comment, Don.
You are quite correct that you can use DITA to create concrete semantic markup, and that from an author’s point of view that is all that matters.
However, you and I both know that that is not how DITA is commonly used or implemented. To most people, DITA means the topic types that are blessed by the technical committee, which are complex and abstract — of necessity, given the breadth of uses they are designed to support.
From an information architect’s point of view, of course, it matters a great deal how easy it is to define concrete semantic markup, and I submit that there are easier ways to do it than DITA. In the larger scheme of things, however, that is only one factor in an overall technology adoption decision.
I wish I could find a more convincing way to express the idea that a semantic data model does not need to be common. The fact that it is semantic means that it is comprehensible, and therefore translatable, to any domain with compatible semantics. The reason to store data semantically is precisely to make it independent of any particular format. Because it is semantic, you don’t need a common model. To make it semantic, you need a correct model.
The notion that you need a common model to exchange semantics is one of the most damaging ideas affecting content technologies today, because it forces us into abstractions and generalizations that add complexity and make it difficult for people to learn and use. The result of that difficulty is that the quality of the data is often poor. We end up with a common model, but not common semantics, due to errors and variations in how the model was used.
As for my “disdain” for DITA, I actually find a great deal to admire in DITA, and I have said so many times. DITA showed that if we could package a set of structured authoring ideas that have been around individually for years, we could make it much easier for people to grasp, accept, and implement those ideas. That was a great service to structured authoring, and thus to the content community generally. It is a technology that many of us are going to be using, myself included, for many years. But I will not apologize for calling DITA complex, a sentiment I am not alone in expressing. In fact, it is a mark of DITA’s success that it is now being used by a lot of people who don’t particularly like it.
I would be delighted to keep the discussion classy, which, I think, has to begin with not accusing one another of a lack of class because of a disagreement about the merits of a particular technology.
Point accepted on disagreement not being a matter of class. With great class, I suspect you will eventually find your statement about commonality of data models may be hard to defend in environments where interchange without N²-1 protocol converters is desired.
An ideal data model for a particular domain may differ depending on its use. A model that maps directly to the way a SME thinks about the knowledge may be different from the model an engineer wants for analyzing performance, or a programmer wants for automating a business process, or a publisher needs for flattening or reorganizing all of the previous models for presentation.
Should these all be driven by the same common, one-off data model? I can certainly see that as necessary for some types of data. But data can be so darn messy! Designers often deal with a spectrum of specificity that creates tension between markup designs that are more specific (description of human biometrics, for example) or more general (HTML5 just for contrast).
Your post is about how to make structured writing easier for authors, but there are other stakeholders in that design. Once we interview all the potential actors upon that data, each may require its own protocol converter for the use of that single source, as you suggest. On the other hand, the economics of the overall situation may make some generalization necessary. We can’t draw too hard a line here… the successful integrator will need to tread between principle and politics in proposing a design that pleases both authors and users of that content.
And so I am very anxious for your webinar and subsequent posts where I understand we’ll get to explore some principles of modeling concrete semantics and integrating them with end-to-end tools.
(And speaking of concrete semantics, I was amused to discover that there IS an XML schema that includes the semantics of concrete: “agcXML is a set of eXtensible Markup Language (XML) schemas designed to automate and streamline the exchange of information during the building design and construction process.” It may not be the Rosetta Stone of semantic interchange, but I suppose we could cast one using it.)
Hi Don,
Yes, these are all significant difficulties. The real base problem with semantic interchange is that it is inherently very hard to do. As you note, the SME, the engineer, the programmer, and the publisher all have different concerns, even when dealing with the same subject matter, and therefore care about different semantics.
Creating domain-specific semantics does not have to mean N²-1 protocol converters, however. You can still have a common semantic data model that is used for many semantic exchange purposes.
One of the difficulties with such models is that you have to choose between the very general and simple, which is easier to use, but less semantically specific, and the complex and specific, which is very hard to learn, and which tends to be used inconsistently, compromising the value of the semantics it models.
If you allow each domain to build its own model, it can be both simple and semantically specific, which will make it more precise and easier to use, resulting in better data collection. We can then map this high-quality data from the local domain into the global domain in a highly consistent way. That requires only N protocol converters. The quality of the data in the global domain will be much higher — making reuse and repurposing much easier and more valuable. And the authors in the individual domains will be happier because they don’t have to learn a semantics or a vocabulary from outside their domain.
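As a hedged illustration (both vocabularies invented), the same fact might be captured simply in a local model and mapped into a more expressive global one:

```xml
<!-- Local domain model: small, concrete, easy for its authors -->
<part number="A-1234">
  <name>Inlet filter</name>
</part>

<!-- The same fact in the global model: more general and more
     verbose, but no author ever has to write it by hand -->
<component xml:id="part.A-1234" role="replaceable-part">
  <identifier scheme="part-number">A-1234</identifier>
  <label xml:lang="en">Inlet filter</label>
</component>
```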
And this approach makes the decision about the global model much easier. If you are not asking any individual person to write in this model, you can make it much more expressive, precise, and complex.
And when you need a level of semantic exchange between domains that the central model does not accommodate, you still have the option to build a more precise protocol converter between the domains that need to share at that level.
There is nothing exotic about this model. You use it every time you go to the ATM or order a book from Amazon. It is how most things work in the technology world. But the content world has long been out of step — expecting authors to work directly in the global model.
And, of course, DITA’s specialization mechanism was designed for this kind of model too. The problem, in my eyes, was that it was also designed to be usable without specialization, and that involved the creation of general mechanisms for things like linking and reuse that involve both abstraction and imperative markup that it is not immediately obvious how to specialize out.
In other words, you can specialize a declarative element to a more precise declarative element, and presumably specialize an imperative element to a more precise imperative element, but how do you specialize from an imperative element to a declarative element? (Especially if you want to retain meaningful fallback processing?)
Publishing is essentially a process of converting the declarative to the imperative. Most processes involve multiple steps in which more of the declarative is turned into the imperative. Standard tools, from DITA to FrameMaker to DocBook, ask the author to do some of that translation from declarative to imperative as they write, which makes the writing/publishing interface more flexible, but at the cost of asking authors to grapple with a set of abstractions that make the task conceptually difficult.
What I am proposing is a more declarative markup, with little or no imperatives embedded in it. This necessarily means you will need an additional layer of software that converts that declarative markup into more imperative markup on its way downstream. (Which is something that the SPFE OT now does, by the way, because it now includes a DITA presentation layer — something I expect will be a part of a common use case.)
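A minimal sketch of such a layer, again assuming the invented featureList vocabulary from earlier, converting declarative names into imperative formatting on the way downstream:

```xml
<?xml version="1.0"?>
<!-- Sketch of a downstream layer: declarative elements become
     imperative (presentational) HTML on their way to the reader -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "This is a feature name" becomes "make this bold" -->
  <xsl:template match="featureName">
    <b><xsl:apply-templates/></b>
  </xsl:template>

  <!-- "This is a feature description" becomes "make a paragraph" -->
  <xsl:template match="featureDesc">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```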
I’m completely with you on this one, Mark.
In the past years I’ve seen many structured implementations, most of them falling apart in terms of data quality.
My personal top 10 reasons why things went wrong:
1. It was possible to insert that element there.
2. My layout needed this!
3. There’s a special element for this?
4. We always do it this way.
5. This just did not make sense to me.
6. Why would I insert the metadata here?
7. I didn’t know what that element meant to be.
8. That means what?
9. I thought it was supposed to be there.
10. I didn’t care.
This pretty much sums up what is wrong with most structured systems.
They’re inflexible (in the DTP sense), semantically diffuse, incomprehensible, and lax.
Originally I was one of the guys who loved the idea of removing all layout from content, but I had to learn the hard way that it isn’t that simple.
Layout is, to quite some extent, an important carrier of context and information, so it needs to be respected when creating structure.
The same goes for strong semantics. They’re able to provide tons of metadata just by being created intelligently.
Being inflexible breeds creativity on the author’s side: they will find a way to do what they want (not what was intended).
Being lax likewise invites creativity (in the sense of working in unintended ways), laziness, and indifference.
To avoid all this you need acceptance on the authors’ side. That means a structure that is flexible (in terms of layout), comprehensible (in terms of making sense), semantically strong, and restrictive.
I personally dislike DITA a lot, but even in DITA this could be achieved by specializing the hell out of it.
Making things easy means making things understandable and obvious. But isn’t that exactly what our job is?
Shouldn’t it then be in our responsibility to set up structures for our colleagues/ authors that are understandable and obvious?
I think so. And I also think this can only be achieved, if we work with domain specific structures, as our target audience changes from domain to domain.
Generic structures simply won’t work for this. Engineers and artists don’t speak the same language when they talk about their own domains.
So why should structures try to do this?
Hi Alex,
Yes, that is pretty much what goes wrong, and why it goes wrong. And also pretty much what I think is the only hope of getting it right.
In any documentation, I always look for ‘easy’ reading: plain English, analogies to the real world, the concrete rather than the abstract, one method rather than many to solve a specific user need, and instructions written with command verbs and good layout.
Only then do I look at the way you put the information together and design it for the ease of the user.
Nick Wright – Designer of StyleWriter – the plain English editor
Editor Software
Trial and demos at http://www.editorsoftware.com
Due to technical issues, we had to reschedule the webinar. It is now scheduled for December 12, 2014: https://www.brighttalk.com/webcast/9273/135129.