One of the great hopes of content management is that taxonomy will save us. Developing a consistent and rigorous taxonomy, it is hoped, will remove inconsistencies from how we describe and label things, enabling us to find and reuse content much more easily. It is a lovely vision, and it is doomed to failure.
The assumption underlying this confidence in taxonomy is that differences in terminology are accidental, and that if we simply assign clear and well-defined meanings to the terms we use, we can all use the same vocabulary and communicate more clearly and with less ambiguity.
The problem is, words don’t actually work like that. Take taxonomy, for instance. Is a taxonomy simply a list of words, or is it a formal naming scheme with rules about how to add new names? Is it concerned only with assigning names to objects (such as plants or animals), or does it cover all parts of speech? How is it different from a dictionary, a glossary, or a lexicon? (And how are those things different from each other?) Is it flat or hierarchical?
All these possible qualifications of the word cluster around the same underlying idea (words mean stuff), but the differences in implication, the details of what is required to qualify or function as a taxonomy, vary depending on what you want to use it for. And each of us hears the word slightly differently based on our prior experience. We will sometimes object strenuously if the word is used of something that lacks some particular characteristic that it always has in our world.
No taxonomy is going to iron out all of these variations of meaning. There are too many subtle variations, and even if we could sort them all out and assign distinct terms to all of them, no one is going to learn and use those distinct terms when “taxonomy” is entirely adequate to their communication needs most of the time.
The average person knows a lot of words: 50,000 by some estimates, though we probably don’t use all the ones we know. So isn’t there room for a few more so that we can be clear about all the various ways there are of associating words and meanings? Maybe, but what about all the other words with similarly diverse shades of meaning? Are we going to sort out all those as well, create new words for them all, and remember and correctly use all those new words?
Words are rather like phyllo pastry. They are made up of hundreds of layers of meaning, but most of the time, we see them as one thing and eat them whole. Things get awkward and sticky when we start trying to pull the layers apart because they don’t separate easily, and by the time you are done you are left with an unrecognizable mess.
Take the word “DITA” for instance. I, and others in tech comm, occasionally say things critical of DITA. We are very often scolded by DITA stalwarts who say, “that’s not DITA”. Whatever the criticism is, they will frequently respond that what we are criticizing is not in the spec, or is only a default, or is not required by the architecture, or is just the way people have chosen to use it.
Is this response fair? Yes and no. Assuming that their response is factually correct, it demonstrates that the criticism is not correct for one of the thousand layers that make up the phyllo pastry of the word “DITA”. That does not mean it is not true of several hundred of those layers, often ones added by the wider community of DITA users who do not sit on the standards committee or contribute to the DITA Open Toolkit. To each of them, DITA means what they have been taught and what they have experienced of the DITA system they have used, as well as all of the ideas around topic typing and reuse that have been preached in the name of DITA.
Pull all those layers of pastry apart and you will find all kinds of disagreements and inconsistencies, and certainly many things that are not required by the spec or the architecture. But at the same time, they are part of what DITA means to the wider world.
One of the harsh lessons of both technology and language is that just because you invent something does not mean you own how it is used and understood. As a whole pastry, all the layers together, the word “DITA” means something quite different from what it means to the individual members of the DITA committee.
And I would be willing to bet that each of those individual committee members has a few different layers of pastry in their own definition.
Most of the time, these subtleties in how we understand words go unnoticed. As long as we all feel that we are broadly in agreement about a subject, we don’t notice subtle differences in what we mean by the words. It is only when we disagree, or when we find that the proposition we thought we all agreed on has led someone to act in a way we disagree with, that we start to notice that we really don’t all mean exactly the same thing.
Of course, when I criticize DITA, I am criticizing my own particular phyllo pastry definition of the word, and in that sense my interlocutor is right to say that my criticism is invalid with respect to their phyllo pastry definition of the same word. The broader question is whether my criticism is a fair criticism of the shared set of pastry layers in the definitions of the broader audience I am talking to.
And this is why we argue, and why arguing productively is difficult and frustrating and leads so often to heated words and raised voices: we are so often talking slightly past each other and having a very hard time bridging the gap.
Taxonomy cannot fix any of this. Taxonomy cannot reduce the phyllo pastry of associations around each word and resolve them into a single cohesive layer. It can’t because every one of those phyllo layers represents some small or large variation in experience and understanding between individuals, a variation that is founded not only in the individual word but in all its associations to other words and experiences that exist uniquely in each individual’s brain. You can’t iron words flat unless you are also willing to iron brains flat.
But if we cannot agree on the precise meaning of individual words, how do we reach shared meaning at all? Here it is necessary to point out the difference between computer languages and human languages. We are adopting many ideas from computer science in the service of content, and we are absolutely right to do so. But a computer program is written for a specific computer architecture, one that is replicated millions of times over in different computers. There are no phyllo pastry definitions of either instructions or data across all those instances of the architecture. They are all exactly the same. (Well, ish, but that’s another story.)
If we could bring that same uniformity of understanding to content, we would be able to do some fantastic things in structured writing and content management. But we don’t write for the flawless silicon of a computer chip. We write for the flakey pastry of individual human brains.
This does not mean that we can’t achieve greater precision than the varied flakey pastry associations each person has with a word like “DITA” or “taxonomy”. But we don’t do it the way computers do it, by defining terms. We do it the way humans do it, by telling stories.
Consider the flakey pastry of associations attached to the word “store”. Is it even a noun or a verb? And if it is either one, there is still a vast array of meanings, a vast set of possible images that might jump into your head.
But then suppose I tell you a story:
Dave went to the store.
That narrows down the associations a bit, but there are still many possibilities. So we expand on the story:
Dave went to the store to buy milk because the baby was hungry.
Now you probably have a picture in your head. Dave has a face. You see the baby, you see the store, the shelves, the fridge, the milk, the cashier, the other shoppers. You might be picturing a corner store, a supermarket, or a gas station. It might be noon or midnight. Dave might be in a suit or in an overcoat pulled over pajamas. You have filled all of that in from your experience to make the picture more concrete.
Your picture is still going to be different from mine, of course. It would need a much longer story to eliminate all the differences. But that is not how we use language. We are more economical than that. We tell enough of a story to get the other person to the point where they will have the reaction we want — the point where further details are not going to change how they react in any way that matters.
Of course, this is not easy. Communication is hard. Storytelling is hard. We can’t always tell if we have told too much story or not enough. And if we tell too much we can introduce details that actually distract the reader from the point we wished them to focus on. Communication is a fundamentally uncertain business that often requires conversation, correction, imagination, and apology.
Taxonomy can’t fix that. Nor can taxonomy classify stories so that the right story is always retrieved and delivered to the exact right person at the exact right time. Why not? Because the way we express what stories we want to hear is by telling stories. Stories are not endpoints, they are the fabric of communication. Telling stories is the only way to increase precision in general communication between human beings. We cannot precisely express the story we need except with another story.
In other words, Wikipedia is not Netflix. Treating stories as endpoints works when you are peddling movies and TV shows that are designed for passive consumption and that operate with an internal logic that tries as hard as possible to avoid requiring any external information. But it does not work when you are creating an environment for active research where there is another story behind every assertion that every other story makes.
One of the things that makes social media such a dominant force in the modern marketplace is that it consists of an exchange of stories.
And this is also the essence of what hypertext is: a set of relationships between stories that follow the lines of the story itself and the ways in which it intersects with other stories. Stories connected to stories not as a librarian would catalog them, but as a storyteller would relate them. The connections, in other words, are intrinsic to the stories themselves.
This does not mean that we can’t usefully manipulate content with machines, or that taxonomies have no value in the process. We can, and we do, and we should. But we do need to recognize that these things are not a panacea, and that they do not scale to the complex expression of meaning between people and across domains.