The Age of the Content Manager

By | 2015/09/28

When I started my career in tech writing, it was the age of the writer. Tech writers tended to work independently on a single book for months at a time. Better, for many, they not only got to write the book, they got to design it and shepherd it through the publication process. At the end of the process a book arrived from the printer and you got to keep a copy — I still have several. It was, from beginning to end, your work, your product, your book.

Fewer of us get to work that way today. We now live in the age of the content manager. Writers contribute chunks of text to content management systems that spit them out in various combinations. There is no end-to-end ownership. Not everyone works like this, but it is an increasingly dominant model, and the model which just about every pundit in the industry is urging companies to adopt.

It is not just in the workplace that we see content management predominate. Back in the day, tech comm conferences were dominated by discussions of writerly subjects. They were about writing. They were about rhetoric. They were about learning. Today, they are dominated by content management. They are about findability. They are about taxonomy. They are about reuse. They are about databases and CMSs.

Writerly virtues, alas, often take a back seat to content management virtues. Improved percentage of reuse becomes the goal to strive for. These are the things we are judged on in the age of the content manager.

I bear a portion of the credit/blame for this situation. My 1995 paper on Component Based Information Development (Proceedings of SGML 95) called for a move away from managing documents to managing components of documents. Maybe I just like to blow against the wind, but as hard as I campaigned for component content management in the age of the writer, I now find myself increasingly campaigning for writerly virtues in the age of the content manager.

Make no mistake. We need to manage content. We don’t necessarily need to make percentage of reuse our primary metric, but we do need to collaborate on the production and management of content. I would never advocate for a return to writing books in cubicles. But I do feel that writers have lost out in the way content management is practiced today, and that content has suffered in the process.

What I had always hoped for is that structured writing and content management would become part of the writer’s toolkit and the writer’s mindset — that we would achieve a content management driven by writer’s values and writer’s appreciation of the subtlety and even the beauty of communication. Instead what we have is a content management driven by a database mindset, a mindset of rows and columns, a mindset of taxonomies where words are lined up like soldiers on parade. It is an orderly world, and internally it is a very efficient world. Sometimes it improves the actual efficiency of the content process, by whatever measurement of efficiency matters to the business.

Sometimes it does not. Content management projects often fail to deliver the hoped-for productivity. Sometimes the messiness of the content refuses to submit to the orderliness of the content management system. The CMS mindset blames this on the writer for failing to fit their work to the prescribed structures. The writerly mindset blames the CMS structures for failing to represent the real messiness and complexity of the world they are trying to write about. Both may have a point. The larger point is that, 20 years after my Component Content Management paper, writers and content managers are still strange bedfellows.

Part of the reason that content management did not become part of the typical writer’s own toolkit was that the technology was simply too hard. SGML and RDBMS (the tools of choice in 1995) have given way to XML and CCMS, but they are still highly complex. There are nicer interfaces now, but they don’t hide the need to understand the underlying structures and the algorithms that process them in order to build a structured writing environment. Writers were able to learn to set up Frame’s complex stylesheets and to use its cross-reference mechanisms and TOC and index generation features because they were conceptually simple enough and fit within a domain of knowledge that writers understood. Structured writing and content management are far more abstract and require algorithmic thinking.

Thus the tools that writers use are designed and built largely by people with database and programming backgrounds. And while there are a few of us with a foot in both worlds, it does not seem like writers and content managers are much closer to understanding each other’s mindset than they were in 1995.

A big part of the problem lies in the diversity of structure that we find across the content spectrum. Nowhere is this more evident than in technical communication. I think we can usefully divide technical communication content into four categories (other forms of content may fit them as well):

  • Labeled data: This is a collection of data points with labels attached to them. This category includes tables with labeled rows and columns, lists (such as parts lists, or spec sheets) and hierarchical lists. Labeled data needs stories to explain what data means. Every label in labeled data is a reference to a story that explains the data.
  • Narrated data: The same information — data points — as with labeled data, except that rather than being presented as labeled fields, it is written as sentences. Sports stories and annual reports are full of narrated data. Many Wikipedia articles contain a data box beside the main text which contains the same data points in labeled data format as the text contains in narrated data format.
  • Structured stories: Stories are different from data points. Stories build worlds by appealing to stories that the reader already knows. They are always subject to a degree of misunderstanding because different readers understand their allusions to other stories in different ways. Structured stories are stories that follow a fixed or shared pattern for a particular type of subject. Recipes are structured stories.
  • Unstructured stories: Unstructured stories are stories that do not follow a fixed or shared pattern. While many unstructured stories could be structured, the structure of structured stories, like the labels of labeled data, is a reference back to a story, and if you follow that chain of references back, you will necessarily reach unstructured stories. Structure, in other words, is itself a story: an unstructured story that establishes a structure for telling other — structured — stories.
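To make the difference between the first two categories concrete, here is a minimal Python sketch. The spec sheet, its field names, and both functions are invented for illustration. The only point is that the same data points can be presented either as labeled fields or as sentences.

```python
# Hypothetical spec sheet: labeled data is just data points with labels.
spec_sheet = {
    "model": "X100",
    "weight_kg": 1.2,
    "battery_hours": 10,
}


def as_labeled_data(data):
    """Present each data point as a 'label: value' line."""
    return [f"{label}: {value}" for label, value in data.items()]


def as_narrated_data(data):
    """Present the same data points as a sentence."""
    return (
        f"The {data['model']} weighs {data['weight_kg']} kg "
        f"and runs for {data['battery_hours']} hours on a charge."
    )
```

Note that a label such as "battery_hours" only means something to a reader who already knows the story behind it, which is the sense in which every label is a reference to a story.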

I have noted before that the DIKW pyramid, which makes data the base, is an inversion of the truth. Stories are the base, and data rests on a pyramid of stories, without which it would simply be noise. Yet the DIKW pyramid is an idea that often comes up in content management circles. In many ways, it expresses the content manager’s view of the world: capture the fundamental data in an orderly system in which every word means one thing and one thing only, and build everything else from that.

Systems built on that premise can solve real world business problems and save companies lots of money. But these successes are not consistently reproducible. The whole content universe cannot be reproduced from data, because this system is an inversion of the reality that stories, not data, are the base on which everything else is built.

The logic of the content management world view is that the model should be extensible across the enterprise. If stories can be created from data, then stories of every kind, to meet every need, can be created from data across the whole spectrum of content.

(I’m not suggesting that every content manager holds this view. I’m painting the contrast between the content management and the writerly view in deliberately stark terms to demonstrate the tension between them. There is a tug-of-war between these views and many people in the industry, like myself, have pulled on both sides of the rope at different times. This sort of tension is inevitable when you try to reconcile two different approaches because you want the benefits of both.)

The logic of the writerly world view suggests something very different. If stories are the foundation, and if data is precariously balanced on a pyramid of stories, doomed to become noise, or at least to be misinterpreted, by anyone who does not know the exact set of stories that explains it, then the idea of data as the solid foundation on which all content rests seems much less supportable.

Does this mean that we should go back to writing books in cubicles? Not at all. The economy of language relies on our presenting information in a way that can be used efficiently by people who know the stories on which it depends. A content world consisting entirely of unstructured stories would communicate with dreadful inefficiency. We need to produce structured stories, narrated data, and labeled data. We probably need far more of these than we currently produce. Structured writing and content management techniques are often useful tools for creating and managing them.

However, the part of this spectrum that content management has the hardest time with is actually structured stories. Content management can deal with unstructured stories by storing them as a blob and attaching a metadata record to them. This is inadequate in many ways, from the writerly point of view, and from the point of view of the foraging reader, but it meets the needs of content management to create manageable objects. The unstructured story sits inside a container with a labeled-data metadata record attached to it, and the content management system interacts with the metadata record. It has no need to involve itself at all in the authoring of the unstructured story.

With structured stories, however, things are different. The structure of a structured story is actually labeled data, but there is a story relationship between the data fields. The structure of structured stories is not generic. It is highly particular to the intersection of the subject matter and the task. Thus a recipe is highly specific to the intersection of food and the task of cooking. Current content management practice tries to reduce the specificity of structured stories down to a few specific types. Task, concept, and reference are the most common types. But structured stories come in hundreds of different types.

The content manager can choose to ignore the structure of structured stories and treat them like unstructured stories. But that leaves the author without the tools to create and manage structured stories efficiently and reliably. It also makes content management more complex and less reliable by reproducing, in an external label, metadata that already exists in the structure of the structured story. (It is an iron law of the universe that if you have metadata in two different places, they will be inconsistent with each other.)
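A rough sketch of the two options, in Python, may help. Everything here is hypothetical: the Recipe and ManagedBlob classes, their fields, and the derive_metadata function are invented for illustration and do not describe any particular content management system. The sketch assumes the blob approach copies facts from the story into a separate record, while the structured approach reads them out of the story's own fields.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """A structured story: labeled fields with a story relationship between them."""
    title: str
    serves: int
    ingredients: list
    steps: list


@dataclass
class ManagedBlob:
    """A story stored as an opaque blob with a separately maintained metadata record."""
    body: str
    metadata: dict = field(default_factory=dict)


def derive_metadata(recipe):
    """Read the metadata out of the structure instead of keeping a second copy."""
    return {"title": recipe.title, "serves": recipe.serves}


recipe = Recipe("Pancakes", 4, ["flour", "milk", "eggs"], ["Mix.", "Fry."])

# The blob approach copies facts from the story into an external record.
blob = ManagedBlob(
    body="Pancakes (serves 4) ...",
    metadata={"title": "Pancakes", "serves": 4},
)

# Someone edits the story but not the external record: the two now disagree.
blob.body = "Pancakes (serves 6) ..."
recipe.serves = 6

blob_is_stale = "serves 4" not in blob.body and blob.metadata["serves"] == 4
derived = derive_metadata(recipe)
```

In this toy case the external record still says the recipe serves 4 after the text has changed, while the derived metadata follows the story because there is only one copy to maintain.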

But there is another aspect of stories that needs to be managed: the relationship between stories. As we noted, stories tell stories by referring to other stories — stories that the reader is presumed to know, but often does not, or does not know well enough, or does not know by the same name. Placing stories into categories and locating them in hierarchies does nothing to express this vital story-to-story relationship.

But in the process, we need to remember that these more structured forms are dependent on a complex base of stories and of the connections and relationships between stories. The full complexity and subtlety of these relationships is beyond what any formal data structure can express (the base is always unstructured stories). But there is one form of organization that can come closer than any other to modeling those relationships and making them navigable, and that is hypertext.

Hypertext is definitely part of the world of stories. But it is not the old world of books. It is not even the old world of storytelling. It is a new world of story sharing. Because it is story-based, hypertext is less precise than a database table or a metadata record, but its mechanisms are also capable of handling labeled data. More importantly, hypertext is capable of seamlessly expressing the relationships between labeled data, narrated data, structured stories, and unstructured stories, which in turn makes the more structured elements at the top of their precarious pyramid more accessible and more understandable to the reader.
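As a loose illustration only, hypertext can be pictured as a set of ad hoc links between items of any of the four kinds. The item ids, the link lists, and the reachable_from function below are all invented for this sketch. It makes no claim about how any real hypertext system stores or resolves links.

```python
# Hypothetical content items of all four kinds, keyed by an invented id.
items = {
    "spec-x100": {"kind": "labeled data", "links": ["what-is-battery-life"]},
    "review-x100": {"kind": "narrated data", "links": ["spec-x100"]},
    "replace-battery": {
        "kind": "structured story",
        "links": ["spec-x100", "what-is-battery-life"],
    },
    "what-is-battery-life": {"kind": "unstructured story", "links": []},
}


def reachable_from(start, items):
    """Follow links from one item and return every item a reader could reach."""
    seen = []
    to_visit = [start]
    while to_visit:
        current = to_visit.pop()
        if current in seen:
            continue
        seen.append(current)
        to_visit.extend(items[current]["links"])
    return seen
```

No category or hierarchy is involved here. A reader who lands on the labeled data can follow a link straight to the unstructured story that explains one of its labels.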

I have argued before that hypertext and content management do not currently see eye to eye. Hypertext models the ad hoc and imprecise relationships between stories that content management does not know how to deal with. But I also believe that we can learn to see hypertext as a different approach to content management — a form much more in tune with a writerly world view, a view in which communication is fundamentally about stories and the relationship between stories.

6 thoughts on “The Age of the Content Manager”

  1. cud

    As always, interesting. Am I right in thinking you’re leaning toward a Cabalistic view of the Star of David, where two pyramids — inversions of each other — indicate the transcendence of matter into spirit, and the distillation of spirit into matter… A kind of bi-directional flow of mysterious forces? Not sure if that’s what you’re going for. But I’m very much open to considering knowledge and information in that light.

    I always like to scan for an inflection point in articles like this — one I found is this:
    “…data is precariously balanced on a pyramid of stories, doomed to become noise, or at least to be misinterpreted, by anyone who does not know the exact set of stories that explains it…”

    I’m afraid I can’t follow there because there is no exact set of stories. The point of creativity is unique combination of parts. The point of text (be it mathematical or linguistic) is unique combination of sub-texts. In your parlance I believe this would be the creation of unique stories, something that’s often possible without needing to add new texts (re-use). This property of text is responsible for our technological progress.

    1. Mark Baker Post author

      Thanks for the comment, Cud.

      The star of David idea is interesting, but for the moment at least I’m not going there. I’m going for straight inversion of the pyramid. Data is a distillation from stories, and you have to be very wise to extract data from stories reliably, and also very wise to interpret data reliably. Stories come from experience, not from data.

      What I think the DIKW pyramid might be getting at is that wisdom disciplines experience. If we regard experience as producing data, wisdom recognizes that sometimes the experience is misleading and produces incorrect data. So raw data does not mean much until you run it through the process of turning it into information, knowledge, and wisdom.

      The problem with this view, though, is that if data is unreliable, the wisdom that detects that it is unreliable cannot be derived from the data. Garbage in, garbage out. Our suspicion that our data is unreliable does not come from anomalies in the data, because no data is anomalous in itself. It is anomalous with respect to a story. Our suspicion that data may be anomalous arises from inconsistencies between the stories we tell about experience. Data is something we pull out of stories to examine them more rigorously to try to resolve the inconsistencies.

      This, of course, gets us into deep epistemological waters. Whence our ability to detect inconsistencies between stories? Is reason prior to experience?

      But those issues are beyond the scope of this blog. My concern here is the primary role of stories in communication, and the limits of compatibility between the story world of the writer and the data world of the content manager.

      You have a point about there being no exact set of stories, but I’m not sure I see how that is consistent with what you say about creativity and reuse. If there is a finite set of texts and it is possible to be creative by combining existing texts in new ways (reuse) then there is an exact set of resulting texts that can be mathematically circumscribed. But does text = story?

      The story I have been telling about stories over the past several posts is that stories are told by tacit reference to other stories, that language itself is simply a collection of words and phrases that are tacitly understood to refer to stories. And the difficulty of this is that people in different domains tacitly associate different stories with the same words, making misunderstanding easy, but difficult to detect. This means that the same story has to be told differently in different domains, but also that the writer can never detect the domain of understanding of the reader inerrantly, and so the reader has to take part of the responsibility for detecting the domain mismatch between themselves and the author, and then either finding a different author or learning the stories of the author’s domain.

      All of which is to say that the same text does not tell the same story in all domains, and that therefore reuse of a text across domains is not the right way to tell the same story to different audiences. Which is something writers have always understood, but content managers seem reluctant to acknowledge.

      1. cud

        Well, assuming we know exactly what a story is, then I suppose for any set of text you could say there is an exact possible set of stories. Personally I don’t think we know that much about stories. And anyway, the set of possible stories that can be pieced together from a rich body of text is bound to be huge. Further, since story depends on cultural context, and since culture constantly drifts and changes, then the set of discoverable stories would tend to drift as well.

        Drifting into drivel myself, let me pull this back… I think there is value in managing content such that you can discover units of the text that can be combined into unique stories. Yes, this is what authors have always done. Adding content management, if done correctly, just makes it easier for authors to carry on.

        The dangerous tendency is to assume that by adding in content management, some magic will “occur” that suddenly improves quality and shortens the cycle. I’ve seen lots of departments make this mistake — I think we agree on this point.

        And sadly, I really can’t agree that story comes first. I think it’s possible that we have lived so long with stories that it no longer matters which came first. But if you enter a new domain the first thing you have to do is collect data, and then see if you can piece together a story. If it’s the scientific method, you piece together a hypothesis. You then design a method to collect more data (an experiment), and test that data against your story. If the story stands, you have a theory. But without any data (say, the positions of the planets), you can’t make the hypothesis. We lived for centuries with the story of a universe that revolves around our world. The data was always there; it was reliable. What was faulty was the story… And people died trying to make that point.

        And what is reliable data anyway? If data == recorded observation, then reliability is a function of our instrumentation (or senses), and our honesty in recording the observations. Talk to engineers and you’ll find it’s not easy to prove reliability of data — I’ll grant you that. But engineers are willing to settle on a reasonable tolerance and then get work done. I’ll go out on a limb and assert that it’s easier to prove reliability of data than to prove reliability of a story. If for no other reason, you have to prove reliability of data AND story in order to prove a story. I’m guessing that is one foundation of the scientific method.

        1. Mark Baker Post author

          Well, so much lies in “if done correctly” doesn’t it? My point is not that content management is bad — I trust I made that clear — my point is that there is a conflict between the writerly view of “done correctly” and the database view of “done correctly” that is very difficult to resolve.

          I think we must mean something different by “data”. Sense impressions are not data. Nor does the brain resolve sense impressions into data, but into stories. Thus when we see a mirage, the brain resolves the sense impression into a story it understands: water on the road ahead. It is only after repeatedly not finding water where we thought we saw it that we realize the need to reconcile conflicting stories and go looking for data on the refraction of light.

          Nor can I agree that hypotheses start with a bunch of data points. They start with experiences, which we interpret to ourselves as stories. When ancient astronomers observed the heavens, they discovered a story: certain stars wander through the heavens rather than staying in one place. That observation is a story, not a data point. A hypothesis is a story that attempts to explain the stories that we tell ourselves about our experiences. Data is something we purify out of stories to test the hypothesis. Having observed that the planets are wandering stars, astronomers began to measure their wanderings to attempt to explain them.

          As for proving the reliability of data, you are quite correct that no proof can ever be absolute. We have to settle for reasonable tolerance and get work done. But what is the proof of the reliability of the data? It is a story. It is a story of how the data was obtained, under what circumstances, and by what method. Data with zero reliability is not data, it is noise. Data without a story is not data, therefore, but noise. Stories, in other words, create data.

          The analysis of data can indeed lead to new hypotheses, and therefore to the formation of new stories, iteratively building up a system of knowledge. But it is a mistake to interpret this iterative process as originating in data. As the problem of demonstrating the reliability of data shows, there is no data without story. The iterative process of knowledge building, therefore, begins with stories, not data.

          1. cud

            I think we mean something different by “data”. Our stories don’t agree. 🙂
