The War Between Content Management and Hypertext

By Mark Baker | 2015/06/15

Summary: As content consumers, we love hypertext. As content creators, we still believe in content management, even after years of disappointment. Content management disappoints because it does not scale for culture. It is time to embrace hypertext instead.

I should know better. Every time I put the word “hypertext” in the title of a post, my readership numbers plummet. Hopefully “content management” will help pull them up this time, because as content professionals we need to come to terms with hypertext.

Here’s the thing: we are locked in a war between hypertext and content management. We’re losing, because we are on both sides.

As consumers of information, we are firmly on the side of hypertext. We use the Web to find stuff. We use search. We follow links. We skip from one resource to another unconscious of institutional boundaries. We share, we Tweet, we like.

As creators of information, however, we are firmly on the side of content management. Actually, we are firmly on the side of the idea of content management. We hate all the actual implementations. Everyone hates their CMS. Most people are planning to fix or replace their CMS with a new system that they will rapidly grow to hate in its turn.

Despite all the bad experiences we have with content management, we still seem to believe that the only way forward is more content management, and content management on an ever larger scale. We may consume content bottom-up, but we are determined to keep creating and managing it top-down.

The problem of scale

I had an interesting conversation with Michael Priestley (one of the inventors of DITA) following his presentation to the Toronto STC. We were talking about maps, and whether it is possible to do DITA without them. It is, in fact. As Michael pointed out, maps were not part of the original DITA spec. And then Michael said, “But I still prefer maps, because bottom-up does not scale.”

For some time now, I have been saying just the opposite: that the problem with top-down approaches is that they do not scale, and that beyond a certain point, bottom-up is the only approach that does scale. I very commonly cite Wikipedia as an example of an information set that is huge, easy to navigate, and has virtually no top-down structure. So Michael’s statement was a marked point of contrast for me. He embraces maps for the exact reason that I reject them: scale.

The difference between us, I suspect, is more about what processes we are trying to scale. I take it that Michael was talking about content management (I’m looking forward to follow-up conversations to validate this), and if so, I agree. Bottom-up content management does not scale. I’m talking about hypertext. Bottom-up hypertext does scale.

Why you hate your CMS

The problem is, as I see it, that top-down content management does not scale either. Which is why you hate your CMS. More specifically, it is why you loved it in the pilot project when things were small and simple, and hate it now that it is in production and things are big and messy.

A serendipitous piece of hypertext brought a perfect example to my inbox even as I was writing the paragraph above, in the form of Larry Kunz’s latest blog post, which is on the Library of Congress. (Yes, email subscriptions to blog feeds are hypertext, and bottom up.) The Library of Congress, you would think, should be good at content management. Not so much, it seems, with vast parts of the collection sitting uncatalogued in warehouses. The obvious slant on this is to call for more use of technology, and to a certain extent, that might help. But the story that Larry cites also mentions a string of technology failures, and “string of technology failures” is the sort of phrase that a good CCMS would definitely flag for reuse in stories about content management issues. Maybe the technology problems lie not in the machines but in the approach.

Larry says that the Library of Congress “houses more knowledge than any other institution in the world.” That depends, I suppose, on how you define “knowledge” and “institution”. In fact, as Leslie Johnston points out in a post on the Library of Congress’s own blog, “Library of Congress” has become something of a unit of measure for very large information sets — in the sense of how many times larger than it they are. (Data is not the same thing as knowledge, of course.)

I found Johnston’s article on the Web, using the search string “size of the library of congress vs size of the web” in Google. It took me less than a minute to formulate that query and get an answer. Any guesses how long it would have taken to find the same information in the Library of Congress?

By Karora (Own work) [Public domain], via Wikimedia Commons

I’m not sure if we should count the Web as an institution, but its collection absolutely dwarfs the Library of Congress. And yet, you can find stuff on the Web. You can often find stuff on the Web with absolutely ridiculous speed and ease.

And yet, somehow, we still believe that the answer is content management. Why?

Perhaps the answer lies in that old bugaboo, information overload. The problem with hypertext, in many ways, is that it works too well. It delivers far too much information far too quickly. It is like turning on a drinking fountain and getting hit with a firehose.

Drinking from the firehose

In the paper days, the drinking fountain produced a nice trickle of water. The problem was, it could take days, weeks, or months to find the right drinking fountain.

Today the drinking fountain is always immediately available, but it delivers the pressure and volume of a fire hose.

What we hope for is a ubiquitous drinking fountain that delivers just a drinkable stream of the finest, purest water. We hope that content management can deliver it.

Stuck at the beginning

Michael was talking about just this in his presentation to STC Toronto: enabling content flow, and specifically content reuse, across the enterprise, with particular reference to IBM, where he works. Someone asked him where IBM was in implementing this. His answer, essentially: at the beginning.

For the last 20 years, at least, we have known how to create large content sets around large products and systems using structured writing techniques and content management (though widespread adoption has only come in the last 5 years or so). For all that time, we have seen horizontal extension of these techniques as the next logical and necessary step.

And where are we, after 20 years? Where is IBM, one of the largest companies in the world that has been a leader in the practical implementation of these ideas, and which developed and donated to the world the most commonly used tool (DITA)? At the beginning — forming a committee, trying to get departments to commit. And from what I have seen and heard, that is where most other companies are too.

Scaling for diversity

Why? Because content management does not scale. While it can scale, to a fair extent, for mere size of the data set, provided it is highly consistent in structure and vocabulary, it does not scale for diversity of content, subject matter, structure, or vocabulary.

Indeed, content management is the declared enemy of diversity, preferring to preach standardization and uniformity — necessarily so, for without these things it can’t function.

We have been trying to standardize terminology within and between industries for 20 years or more. Where are we? With a few notable and specialized vertical exceptions, at the beginning. Why? Because the things that people talk about across the many departments of an enterprise, and the many walks of life, are highly particular to the local concerns of those domains. There are no universal categories of thought to which we can assign universal vocabulary. It is stories, not words, that have meaning, and everyone tells their own stories in their own ways.

The many meanings of “content management”

Take, for example, the words “content management” as I have used them in this post. You have doubtless been asking since almost the beginning how content management is in conflict with hypertext. (And thank you for indulging me thus far without a satisfactory answer to this obvious question.) And, of course, any time you create a piece of content, or a set of content, and do anything at all to manage it, you are doing content management, and this would apply to the creation of hypertext as much as anything else.

But “content management” today means something much more specific than this. It means a specific approach to managing content that is based, fundamentally, on top-down control and hierarchical organization. It is the imposition of traditional management styles and traditional paper-derived forms of content organization on the production of content, even when that content is intended solely for the Web.

The broader meaning of “hypertext”

My contention here is that this approach does not scale, and also that hypertext is not merely a word describing text with links, but a management style and organizing principle that is very different from this conventional content management model. This does not, of course, mean that there are no systems that support this model. In many ways, wikis are just that. But wikis are seldom if ever called content management systems, despite being used to manage content.

And this subtlety of usage is one of the reasons that content management does not scale well for diversity.

Diversity and the role of stories in communication

The term “content management”, as used today, encapsulates a story about top-down management and top-down organization of content. We have spent the last 20 years, and more, seeking top-down ways to allow people to tell, organize, discover, and understand each other’s stories, and we have not advanced beyond the beginning of the quest. Meanwhile, the world has quietly gone and found a different solution — one that is simple, inexpensive, and available to everyone: hypertext.

Yes, the Web tends to act like a fire hose when we would prefer a drinking fountain. But content management makes it so hard to find the drinking fountain, and so hard to operate it when we think we might have found it, that most of us have learned that it is easier to fill a cup from the fire hose and drink from that. As David Weinberger says, and I frequently quote: “Include it all, filter it afterward.”

Hypertext works because it works with, rather than against, the immense diversity of stories, of interests, of vocabularies that are the real stuff of how we communicate. Content management tries to squeeze it all down and make it fit a limited set of categories and structures. Hypertext takes it all in and provides multiple filters, both social and algorithmic, to help us sort it and make sense of it.

And this is the vital point here. Making sense of all this diversity is hard, and always will be hard. The world is vast and complex and we all have our specialties, which shape not only our vocabulary but our categories of thought, the way we see the world and think about it. Understanding other points of view in other fields is genuinely hard. It takes time, discipline, experience, and patience. Communication is genuinely hard. A taxonomy or a cataloging scheme is not going to fix this. You may need to drink from the firehose to get a real understanding of someone else’s world.

The problem isn’t culture, but it is

For as long as we have been stuck at the beginning of implementing cross-functional content management, we have been saying that the problem is not technical, but cultural. And while we have invented a bunch more technology over those 20 or more years, we are still stuck on the culture part. It is time to turn a more skeptical eye on that part of the content management proposition.

In saying that the problem is cultural, what we tend to mean is that the machine works fine; the problem is getting people to use it. I think that if you have been trying for 20 years to get people to use it, there may be something wrong other than their reluctance to change. (People have made some really big changes in how they use machines over the last 20 years — many of them related in one way or another to hypertext.)

But yes, the problem in content management is cultural. But it is a far deeper cultural problem than any amount of “change management” can address. It is the diversity of culture, and the diversity of how people in different fields and with different backgrounds think, write, and share information — ways that are optimized for the roles they play in life, and are therefore not open to tinkering for the sake of making content management technology work. Thus any actual system that they are asked to use ends up at odds with how they work, think, and communicate. And they hate it.

Breaking silos doesn’t bridge cultures

We talk about breaking down silos, as if everyone would be able to transparently talk to and understand each other if only we pushed all their desks together in one big room. But that doesn’t work. It doesn’t work because the differences between us are deep, based on years of focus on different domains of experience. And they are necessary, because those different domains of experience are based on real jobs that need doing, and on the language we develop that enables us to do them. Diversity of language may be a barrier to communication, but it is an absolute necessity for us all getting our respective work done. (Diversity of experience, not language, is the real barrier.)

We should be focusing, therefore, not on breaking down walls but on building bridges.

Bridges built with content, not content management

But if you want to build bridges between diverse fields and departments, you do it with content, not content management. And the best way to build those bridges is often to let the people who wish to cross build them for themselves. And this is what hypertext does: it lets people build bridges organically, and it builds up and strengthens those bridges that more people cross.

Not back to chaos; forward to hypertext

Am I saying, go back to the status quo before you bought your CMS? Not at all. A bunch of Word docs on a server is not a hypertext. Unstructured hypertexts like the Web provide tremendous value, but even there, the content that works is hypertext content, not a pile of documents. But I am not advocating an unstructured approach to hypertext. I am advocating a highly structured and disciplined approach. The alternative to content management is not chaos, but disciplined hypertext.

If we want to enable people to build the bridges that they themselves want to cross, we need to give them the tools and training to do so: not content management tools and training, but hypertext tools and training.

We have assumed that structure has to mean top-down, that it has to mean content management. We’ve tried it. We hate it. We are stuck at the beginning, still blaming “culture” for the inability to move forward. It is time to take a structured approach based on hypertext. Because, you know, it actually works.

26 thoughts on “The War Between Content Management and Hypertext”

  1. Barry Schaeffer

    Good stuff! I would suggest that much of the problem we have always faced with “Content Management” and its attendant systems is the fact that virtually all of the systems started with a particular tool set and technology approach in mind then worked backward to the content to be managed. Having worked with a number of firms that did this, all household names more or less, I know it happens although no one ever talks about it. What we end up with is a software environment looking for a problem… and shoehorning any problem it finds into its mold (sales after all must make next quarter’s numbers.)

    Hypertext, because it is based on tool sets and not whole systems (although some vendors would have you believe otherwise), lets one begin with the content problem and work toward a solution.

    Actually, I agree with Mark that CM and Hypertext shouldn’t be adversarial but should be two aspects of the same set of challenges. Another aspect that we should force the vendor community to acknowledge is the fact that content challenges are first and foremost functional, with architecture and technology acting to support the desired functions. I wrote a piece in 2001 detailing my view of this situation: amazingly it’s still live and may add a few crumbs to Mark’s powerful message:

    http://www.ecmconnection.com/doc/navigating-the-content-management-jungle-a-su-0002

  2. Mark Baker Post author

    Thanks for the comment, Barry.

    I have long said that most of the tools and the systems out there were built from the publishing problem back, rather than the authoring problem forward, with the result that they make authors jump through hoops to serve the logic of the publishing system. I still think that is valid, but your formulation may be closer to the mark. In some cases, the point of focus is not publishing but some other system function, such as reuse.

    But whatever the case, the architecture tends to be built around enabling system functions rather than enabling authoring and natural content relationships. This is why we end up with markup and/or interfaces full of systems stuff that has nothing to do with the content and its relationships, and everything to do with the system and its operations.

    This means that even when the systems support things like linking, they are architected around the idea of standardizing links between system resources rather than modeling the way stories relate to other stories in the real world.

  3. Larry Kunz

    Hi, Mark. I can’t gainsay anything you said about CMSs — including (especially) the part about how people hate them. But is it really just hypertext that allows the web to work from the bottom up? Perhaps this is just a matter of semantics: I’m not sure I grasp what you’re getting at when you describe a “broader meaning” of hypertext.

    You used a Google search to find Leslie Johnston’s article. That suggests to me that another means of cataloging and managing content – specifically, metadata – came into play. To me, that’s neither CMS nor hypertext; it’s a third thing. You say “a taxonomy or a cataloging scheme is not going to fix this.” Not by itself, no. But taxonomies – especially those that are thoughtfully designed so that they can scale – help bring order from chaos, enabling hypertext to do its magic.

    I look forward to hearing your recommendations for a structured and disciplined form of hypertext.

    1. Mark Baker Post author

      Thanks for the comment, Larry.

      Yes, I am defining hypertext broadly, to mean non-linear relationships between text, however created. This implies that connections created by others are as much a part of a page’s participation in hypertext as the links it creates itself. But this was always the case, really, since a page’s participation in hypertext has always included the pages that linked to it — pages that that page, and its author, might have no knowledge of.

      In other words, hypertext has never been an artifact created by a single mind. It is, in its very nature, a social construct, meaning that you can participate in hypertext merely by making yourself available to be found and linked to. Which means, in turn, that search is an integral part of hypertext. Hypertext requires discovery of things to link to, and search is an engine of discovery.

      However, search does not have to rely on metadata, and in fact usually does not, except in the sense that the search engine’s own index constitutes a derived catalog and derived metadata.

      Re: “taxonomies – especially those that are thoughtfully designed so that they can scale – help bring order from chaos, enabling hypertext to do its magic”.

      Here’s the problem with this: the limit on the scale of taxonomies is not the thoughtfulness with which they are constructed, but the scope within which words actually have a common meaning across communities. The issue is not order from chaos, it is uniformity from diversity. Diversity may look like chaos to the zealous taxonomist who does not understand how different domains communicate internally, but destroying the diversity between domains actually means crippling the ability of domains to communicate effectively inside the domain. And this is a big part of why people hate CMS, and why it is hard to get them to sign on to cross-domain content management initiatives.

      The magic of hypertext is that it does not rely on consistent taxonomies to do its work. It does not require them because of the inherently distributed nature of hypertext creation which I mentioned above. Hypertext can reach across the boundaries of domains and across the variability of domain taxonomies — not to make understanding simple, because that is not possible between diverse domains, but to make it easier.

      There is still a role for taxonomy-driven hypertext, of course. That is why I am such a big promoter of soft linking. It is why I make a distinction between structured and unstructured hypertexts. But I also make the point that unstructured hypertext is actually the primary, and very necessary, form of hypertext, and that the reason for creating structured hypertext (with taxonomy-driven links) is to participate more effectively in larger unstructured hypertext systems.

      But even with soft linking, I am careful to create markup that expresses the namespace of the term that is being annotated, because a taxonomy cannot grow bigger than the domain in which its meaning is commonly understood. Useful structure is local. Hypertext bridges the locales in ways structure cannot.
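
      To illustrate (the element and attribute names here are invented for the example, not drawn from any actual system), a soft-linked mention might be marked up something like this:

        <!-- The mention is annotated with its type and namespace, not a URL. -->
        <p>You can automate this step with the
        <tool namespace="http://example.com/ns/acme/tools">FooMark</tool>
        utility.</p>

        <!-- A link resolver consults the taxonomy for that namespace and
             decides at build time whether there is a page to link to. -->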

  4. Michael Priestley

    I’m afraid you misunderstood my talk, Mark.

    You write:
    “And where are we, after 20 years? Where is IBM, one of the largest companies in the world that has been a leader in the practical implementation of these ideas, and which developed and donated to the world the most commonly used tool (DITA)? At the beginning — forming a committee, trying to get departments to commit.”

    After 15 years (counting from the formation of the workgroup within IBM that led to the development of DITA), we are much further ahead than that. We have DITA broadly adopted – for product documentation – across hardware, software, and solutions for the vast majority of IBM products. I’m talking millions of managed topics in a CMS, tens of millions of published pages with reused content.

    We are at the beginning of enterprise content management across not just products but also disciplines or content domains – such as marketing, training, support, services, and expert content such as developerWorks or Redbooks. Most of those teams aren’t in DITA yet – but they’re all looking at it, and at various stages of investigation including pilot projects with executive support.

    I think you’ve also misunderstood my opinions about maps – I explicitly said that Knowledge Center is actually a meet-in-the-middle scenario, with many hand-authored maps collected by common metadata into a higher-level navigation scheme. So it’s a combination of hand-authored organization – which remains the most reliable, in my opinion, when sequence matters – with automatic aggregation and navigation at the product level, based on a common product taxonomy that is used across all of IBM (not just in Knowledge Center).

    I hope this helps clarify.

    1. Mark Baker Post author

      Thanks for commenting, Michael.

      It is always a high wire act to blog about a brief conversation, especially when you have other conversations immediately afterwards. Both memory and understanding have to be suspect, so I hoped you would comment to clarify if I got any of it egregiously wrong.

      But let me clarify in turn. I was not meaning to imply that IBM is at the beginning when it comes to domain-vertical structured writing and content management. I regret it if I did not make that clear. I was saying that it was at the beginning when it comes to cross-domain and cross-functional content management, “across not just products but also disciplines or content domains” as you say.

      My point is that domain-vertical structured writing and content management has been a solved problem for more than 20 years. There were already many successful large-scale projects working well before DITA was conceived. I can think of John McFadden’s concept of “microdocument architecture”, for instance, which OmniMark Technologies implemented for several clients: structured text in a database with component-based reuse.

      And even back then we were talking about how to extend these techniques horizontally. That is what I mean by saying that after 20 years we are still at the beginning. I was not meaning to imply that DITA, within the space in which it has been deployed successfully, was at the beginning. The comment was not about DITA at all, but about cross-domain content management.

      What DITA has done, and has done well, is to provide a mechanism for implementing the component content management model. For that, it deserves much credit. But my argument is with the extensibility of that model across domains, not with the technology chosen to implement it.

      And as far as our conversation goes (and I hope we get to continue it) my point is that my objections to DITA are not about wanting a different tool to do CCMS with, but wanting a different model altogether, and wanting tools optimized for that model. Without establishing that, we risk wasting time discussing things at the wrong level.

      Thanks also for clarifying your point about maps — though it does not answer my objection to them. Clearly you can use algorithms to compile lower-level maps into higher-level ones. But while I do regard the hand-assembly of maps as a problem, for a number of reasons, it is not the root of my objection. The root of my objection is to an external hierarchical approach to navigation, as opposed to an internal hypertext approach. Generated maps are still maps.

      You say hand-assembly of maps is important “when sequence matters”, to which I would reply that sequence should never matter between topics. (This is the first principle of Every Page is Page One.) If sequence matters (in the literary sense) then that material should be one topic. (Sequence of operation is another matter, which might be addressed by workflow topics, for instance.) The rationale here is simple: if people arrive at a page via search or a link, and that page is in the middle of a sequence, either they have to figure that out and get to the beginning of the sequence (which is a pain) or they don’t realize it, and potentially act incorrectly because the information they have is incomplete.

      I believe that I remember you saying that you did not expect that people would actually use the Knowledge Center TOC for navigation, and that I asked you what the point of having it was then, and that we were interrupted at that point and I did not get to hear or explore your answer. I hope we can pick up at that point sometime soon.

  5. Michael Priestley

    Thanks for the clarification. That helps. I do think we are at the beginning with IBM of full-scale enterprise content management – on the other hand, I think we wouldn’t be where we are without continual growth and outreach. We began coordinating with support, for example, on a coordinated content lifecycle with tech docs years ago. We are always at the beginning of something new, and the something new is always larger than the last thing.

    So I may have given the impression that it’s been 15 years of one thing followed by a new initiative for the first time – but in fact this is the blossoming of an effort that’s been years in the making. And what’s made it possible is in fact shifts in culture. There are many cultures involved here, not just one. And I agree that this implies a need for many tools and environments, not just one. As I said in my talk, you can’t solve the silo problem by building a bigger silo. This is a CMS mistake, and I agree it’s a mistake.

    But the alternative is not, in my opinion, simply hypertext. It’s distributed, and integrated, content management: across silos, across channels, across organizations.

    Moving on to the question of maps: my years as an information architect are long behind me, but even then we knew that search was the primary (but not only) access mechanism, and that every topic had to stand alone. But we also knew that our job was to bridge the user’s understanding of the problem with the product’s understanding. Typically we would perform HTA (hierarchical task analysis), a common UX practice to map user tasks to system tasks. These user tasks (typically at a higher level) then linked down to the requisite system tasks.

    When the user searched on a user task, they’d get a hit – on a page that summarized the system tasks they would need to perform to achieve their goal, in the sequence they would need to achieve it. Different user tasks might share the same system tasks across sequences, and there might also be different views for different types of user.

    Our task analysis didn’t stop with product ship, but continued with analysis of search results (when available – not everything was web published back then) so we could identify gaps, and work them into the architecture. The TOC was not just a navigation artifact – it was a representation of the user’s task model as it mapped to the product’s domain model. If someone arrived at a lower-level product task via search, presumably it was what they were looking for – and they had everything they needed on one page. But if someone searched for a higher-level user task, they would get a result for that too – and yes it would link to other pages, but supported by a synchronizing TOC and next/previous links that would guide them through the sequence.

    I’m not saying we always achieved greatness with this technique, but we sometimes approached it 🙂 One of the products I worked with moved to full task orientation at the same time they moved to DITA (the CTR content types were designed to support a task oriented approach). They saw dramatic improvements in customer satisfaction with the docs in just one release (from fairly negative to ecstatic).

    With respect to the Knowledge Center navigation: I said I expected very few people to start with the KC homepage and drill down the product taxonomy to get to a page. It was much more likely that they would search for the product, or for a subject within a product collection. Where the TOC does become useful is once someone is within a product collection, and can then browse to see what content is available, and expose and navigate the task model to see how it maps to the product capabilities, or browse the list of APIs, etc.

    If you know what you’re looking for, search is always the best answer. But if you don’t then some form of TOC or sitemap (or mini sitemap) is absolutely still relevant and necessary.

    I think having a TOC for all of Wikipedia would be crazy. However, having mini-TOCs for specific parts of Wikipedia makes lots of sense, which is maybe why people are doing it. (I can’t remember what they’re called now, but it’s a type of page in Wikipedia that does nothing but list and organize links to other pages – basically a TOC.)

    Knowledge Center is not one collection, but a collection of collections. Navigation at the highest level – above the product – is not where people spend their time. It’s inside the product that they want, and need, a TOC.

    We have had products that experimented with other models – like a Wikipedia model, driven by search and metadata. They had to take it down and rewrite the wiki to add TOCs. If you look at common search terms for IBM products, they often include “infocenter” or “knowledge center” – meaning that people are searching for the collection, intending to browse or use it as a whole – instead of just googling a specific search term.

    Search is great – but it’s only half the story.

    1. Mark Baker Post author

      Thanks for continuing the conversation, Michael.

      Several interesting points to chew on here (and I may miss a few and come back to them later):

      “We are always at the beginning of something new, and the something new is always larger than the last thing.”

      Indeed, and that is why I raise the issue of scale. Because it is very natural and appropriate to take a model that works at one scale and try to see if it can be made to work at the next scale up. But for many models, there is a point at which you can’t scale it up successfully anymore.

      In fact, there are two points, I think. One is the point where you can scale it, but a different model would be more efficient. I believe we have scaled the top-down content management model past the point where hypertext would be more efficient.

      The second point is the one where it cannot be scaled up any further due to the inherent limits of the method. I believe that significant cross-domain content management is that breaking point, and that we have been stumbling at that breaking point for some time, while hypertext has roared on to ever greater success. But you are going to attempt to prove me wrong, and I’m sure I can’t dissuade you from the effort.

      It bears saying, though, that institutions tend to suffer from model myopia. They become the preeminent experts in one model, have the best and most complete toolsets for implementing that model, and they become blind to alternative models, which are often in a far more crude state of development. Hypertext is eating content management’s lunch, but content management has a much shinier lunchbox.

      “The TOC was not just a navigation artifact – it was a representation of the user’s task model as it mapped to the product’s domain model.”

      This is one of my most fundamental quarrels with the content management approach. Something as important as the user’s task model and its mapping to the product domain model is far too important to be relegated to the TOC. It is, in many ways, the most important piece of content there is, and takes far more explanation and illustration than a TOC can provide.

      It also has a highly complex set of relationships to related subjects and therefore to related content. It is exactly this complex set of relationships that a TOC cannot model well, and hypertext models brilliantly.

      The point about hypertext is that its texts are not leaves on a tree; they are nodes in a network. They don’t express individual ideas as endpoints; they relate ideas to each other in complex ways that reflect the complexity of the real world.

      And this, it seems to me, is precisely where top-down information design fails most conspicuously. It does not have a way to express pages as nodes, and so it is forced to represent complex relationships through tables of contents that have neither the structure nor the substance to do the job properly.

      Nobody reads a table of contents to discover the mapping between their task domain and the product domain. We need to stop sending a TOC to do a page’s job.

      “However, having mini-TOCs for specific parts of Wikipedia makes lots of sense, which is maybe why people are doing it.”

      I agree absolutely. I have written often about the importance of lists in hypertext, and I frequently point to Wikipedia as an example of their use and importance. But I would point out that such lists are not tables of content in the strict sense. That is, they are not bound to a container and they do not exhaustively list the content of any container. They are lists of topics on a particular subject. They are subject bound, not container bound, and they can, and sometimes do, list topics in other collections on the current subject.

      “Search is great – but it’s only half the story.”

      Agreed. The question is, what is the other half? To which I respond that the other half is linking. From any given page, you are likely to want to navigate along lines of subject affinity — to pages on related subjects. Many such pages will not be nearby in any one TOC. Every one of them can be nearby through a link.

      We see this model very clearly at work in Wikipedia, but also in Amazon, YouTube, and even Facebook.

      This comes back to the idea that in a hypertext, every page is navigational. You don’t need a separate navigational superstructure, because every page provides navigation. That is the fundamental difference between the content management approach and the hypertext approach. Content management sees black boxes to be labeled and managed, and separates navigation from consumption. Hypertext sees objects that naturally connect to other objects and form (and reform) networks as those objects themselves change. Navigation is an inherent part of consumption.

      And this, fundamentally, is why the hypertext model scales in a way that the content management model does not. Hypertext is self organizing. That self organizing property can be driven by structure and algorithms, by social sharing, or by algorithms that attempt to comprehend unstructured text, but because all of these self organizing operations can integrate with each other without top down control, we get a system that scales in a way top-down management and control cannot.

      1. Michael Priestley

        I think maybe I need to start by explaining where I sit in this argument: I do not think DITA and content management are synonyms.

        DITA is a markup model and reuse architecture, and is as much a hypertext system as HTML is (perhaps more so, given its richer and more extensible linking semantics).

        When you say content management doesn’t scale, maybe that’s true – while we do have several million topics in FileNet, we probably have just as many in other repositories across IBM. And the net result – of many separate content management efforts, following a common standard, and reusing across both team boundaries and system boundaries – is 20+ million pages of coordinated published content.

        So – it clearly scales. But that doesn’t mean I necessarily disagree with the point that content management doesn’t scale. One of the key slides of my presentation made the point that you can’t solve the silo problem by building a bigger silo.

        You need coordination across silos. You need content that can flow across silo boundaries – hence the title of my talk, “Let Your Content Flow”.

        You make a number of arguments against using the TOC to provide the task model.

        One of the key ones seems to be that linking is a better way to express and navigate a task model than a TOC. I actually don’t disagree with this, but I also don’t think it’s either-or. In fact with DITA maps we typically generate the TOC and the links from the same model.
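
        As a minimal sketch (file names invented for the example), a single map can carry both the hierarchy that becomes the TOC and a relationship table that generates the links:

          <map>
            <!-- The topicref hierarchy becomes the published TOC. -->
            <topicref href="installing.dita">
              <topicref href="configuring.dita"/>
            </topicref>
            <topicref href="troubleshooting.dita"/>
            <!-- Topics in different cells of the same reltable row get
                 related links to each other, with no links hand-coded
                 in the topics themselves. -->
            <reltable>
              <relrow>
                <relcell><topicref href="configuring.dita"/></relcell>
                <relcell><topicref href="troubleshooting.dita"/></relcell>
              </relrow>
            </reltable>
          </map>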

        There’s an ancient paper I wrote back in 2003 describing the process I followed in creating the first DITA users guide (parts of it still in use within IBM, for better or worse). http://xml.coverpages.org/PriestleyACMSIGDOC-2003-DITA.pdf

        We even had a tool called Task Modeler, originally created specifically for hierarchical task analysis, that became a reasonably robust and usable DITA map editor (later rebranded as the Information Architecture Workbench, still available on developerworks I believe).

        You write:

        “But I would point out that such lists are not tables of content in the strict sense. That is, they are not bound to a container and they do not exhaustively list the content of any container. They are lists of topics on a particular subject. They are subject bound, not container bound, and they can, and sometimes do, list topics in other collections on the current subject.”

        OK, so evidently DITA maps aren’t TOCs in the strict sense. And TOCs generated from DITA maps aren’t either. All the statements you make about Wikipedia lists apply to DITA maps as well.

        You write:
        “And this, fundamentally, is why the hypertext model scales in a way that the content management model does not. Hypertext is self organizing. That self organizing property can be driven by structure and algorithms, by social sharing, or by algorithms that attempt to comprehend unstructured text, but because all of these self organizing operations can integrate with each other without top down control, we get a system that scales in a way top-down management and control cannot.”

        Given that DITA is a hypertext system too, I think really you’re using hypertext as a shorthand for loosely bound linking. And I’ve already explained that we are getting to scales like Knowledge Center’s 20 million topics not with top-down management and control, but with product-level control that is then automatically linked in based on (map-controlled) metadata.
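
        As a rough sketch (metadata names invented; the actual Knowledge Center mechanics are more involved than this), a product collection’s map can carry metadata that an aggregator uses to slot the collection into a common taxonomy:

          <map>
            <topicmeta>
              <!-- Map-level metadata; an aggregator can use it to place this
                   collection in the product taxonomy automatically. -->
              <othermeta name="product" content="ExampleProduct"/>
              <othermeta name="version" content="2.1"/>
            </topicmeta>
            <topicref href="overview.dita"/>
          </map>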

        The interesting thing is that there’s nothing in DITA’s architecture that requires linking to be tightly bound. In fact one of the major use cases going forward with DITA at IBM with marketing is about loosely bound linking. But there’s room, and reason, for both.

        For example, if I’m creating a case study, I may want to quickly reuse certain aspects – like feature descriptions, company descriptions, etc. But I don’t necessarily want those links to be loosely bound – I may want to be informed, as the owner of a case study, when a new feature description becomes available, but I don’t want the link to automatically refresh and fetch the content for me without intervention. Because the new description might not serve the purposes of the case study. In fact, it might describe the feature in ways that, for historical reasons, don’t apply to the context of the case study.

        On the other hand, if I’m constructing a reading list for someone interested in a product, then I want to be able to get the latest and best content for them – but I want that decision to be based on a lot more than just subject and metadata. I want it to be based on measurements of the content component’s performance in similar situations (was it read to the end? was it shared? did it help lead to a purchase?) and measurements of the reader’s past behaviors and interests.

        My point is that DITA can play in either of these situations. I’m not arguing content management vs hypertext. I’m arguing in favour of both.

        Just because we are sharing content across content management system boundaries doesn’t mean we no longer need content management. And just because we can automatically suggest links based on subject and metadata matches doesn’t mean we no longer need human oversight and authorial control, in at least some contexts (and warranted product documentation, with all its legal liabilities, may be one of those contexts).

        1. Mark Baker Post author

          Michael, I have thought several times about how to respond to this, and the more I try the more it seems that we are fundamentally at cross purposes and largely talking past each other.

          Putting millions of topics in a repository has nothing to do with making content management scale in the sense I am talking about. It’s not about how many bits you can store and retrieve, but about how you communicate across boundaries of interest, experience, and culture.

          DITA is not remotely a hypertext system in the sense I am talking about; it is a publishing system that can generate hypertexts. But hypertext is more than even the sum of individual published outputs. It is the capacity of hypertexts to merge into each other which is at the heart of the scalability of hypertext. The Web works because it is one hypertext, not a collection of many. It works because central coordination is not required to merge hypertexts.

          And yes, DITA’s architecture is indeed extremely flexible and general, which is, in my view, one of its defects, because it makes it harder to understand, and harder to use for each of the specific purposes that it is used for. The DITA that most people use is really a product built on DITA by various vendors.

          But DITA, as you say, is not a content management system. I did not mean to imply that it was. But where I see people using DITA is as a format to make the component content management model work. Outside a few people with a markup engineering background, such as you and myself, it is pretty clear that this is how DITA is seen by the world.

          And my critique is of the component content management model, and of the content management model more generally, and of the hierarchical and top-down approach to both content organization and content development that seems inherent in those models. DITA certainly enters the picture in this argument, because it is a key enabling technology of the CCMS model.

          But you don’t have to convince me that DITA can be used to build other models. I am aware of the generality of the model. I have looked at it as a candidate for the kind of system that I want to build, and have found it over-engineered and over-generalized for my purposes. But that is an argument of an entirely different kind for an entirely different place. I wouldn’t mind having that argument — indeed, I would find it useful and stimulating — but this post is not the place for it.

          One further thought that is perhaps on topic for this post: you say that we need to make content flow. I find that the wrong metaphor. We need to make information flow, to make understanding flow, and to make coordination of action flow between cultures and functions. In that role, content is the bridge that readers can cross between cultures. Bridges don’t flow; they carry the flow.

          1. Michael Priestley

            As you say, I think we’re still talking past each other.

            I am also struggling with where to start my response. I obviously am tempted to respond to your general criticisms of DITA, but as you say this isn’t the place.

            I think I’ll start here:

            You wrote:
            “DITA is not remotely a hypertext system in the sense I am talking about; it is a publishing system that can generate hypertexts. But hypertext is more than even the sum of individual published outputs. It is the capacity of hypertexts to merge into each other which is at the heart of the scalability of hypertext. The Web works because it is one hypertext, not a collection of many. It works because central coordination is not required to merge hypertexts.”

            In fact the web is a collection of many hypertexts. They link to each other, but that’s a far cry from merging. The ibm.com website is quite measurably distinct from, say, wikipedia.org, and the IBM Knowledge Center is a distinct part of ibm.com. And within Knowledge Center, there are many distinct product collections, all sharing the same interface and capabilities but with distinct organization and linking structures.

            DITA allows the management of hypertext collections at varying sizes. They can link to each other, or nest to include each other. They can be built out to different page models, different deliverable models, and different linking models.

            It is obviously not a hypertext system in the way you are talking about it, but is an architecture for managing hypertext relationships in a scalable, author-friendly way.

            DITA’s map model doesn’t derive from CMS component models, nor from document models exploded from books. It derives in fact from practical experience managing the websites and hypertext relationships for a number of IBM products. When we rolled it out to the writers across IBM, many of the authors who were already doing information architecture had an immediate recognition of structures such as relationship tables – they had been documenting the relationships among their HTML pages meticulously using spreadsheets. The idea that they could start by planning what they wanted, and then generate the links to match the model, was welcomed. It was a major timesaver, and allowed the linking architecture to be continually maintained and evolved to match the content, instead of being a throwaway design artifact that was only refreshed once every few releases.

            But leaving aside the question of how or whether DITA does hypertext, I want to address your understanding of my phrase (the title of my talk) – let your content flow.

            You write:
            “you say that we need to make content flow. I find that the wrong metaphor. We need to make information flow, to make understanding flow, and to make coordination of action flow between cultures and functions. In that role, content is the bridge that readers can cross between cultures. Bridges don’t flow; they carry the flow.”

            When I say “let your content flow”, I mean you need to be able to move authoring control of content from one place to another. That’s not about a flow of understanding – it’s about a flow of content as an authorable, manageable unit from one system to another. It could be from a CMS to a file system, or from an LCMS to a wiki. The point is that control of the content needs to change hands as it evolves within or across organizations, and in order for it to change hands it must change systems. This is where DITA plays a vital role in IBM’s content strategy, as an open standard for modular content that separates reusable chunks from collection contexts. It allows for content coming from one part of the organization to be consumed by another.

            And this is where the question of cross-cultural reuse comes in – if DITA wasn’t a standard, it wouldn’t be neutral ground. And if it wasn’t flexible enough to support the different requirements of training, support, product documentation, sales, and marketing teams, then it wouldn’t pass the test of actual adoption.

            That’s the grand exercise we’re attempting within IBM – all of these teams are already producing web content, and some of it even links to other content. But it’s not coordinated, and content can’t flow to where it’s needed, so we have perverse incentives to rewrite or create duplicate content simply because it’s easier than reusing. These aren’t problems hypertext can solve on its own. But the separation of concerns that DITA brings to the problem – content from context, chunk from collection, specialization from specialization, and most importantly content from system – is a vital part of our evolving approach to the problem.

          2. Mark Baker Post author

            Ah, now this is helpful in understanding why we are talking past each other.

            You say:

            “In fact the web is a collection of many hypertexts. They link to each other, but that’s a far cry from merging. The ibm.com website is quite measurably distinct from, say, wikipedia.org, and the IBM Knowledge Center is a distinct part of ibm.com. And within Knowledge Center, there are many distinct product collections, all sharing the same interface and capabilities but with distinct organization and linking structures.”

            And that is the publisher’s eye view of it. The top-down view. The corporate view. It is the books on glass model, where you are still producing, and you imagine that readers are still consuming, individual information products. In this view, the Web is just a delivery medium, like a library or a bookstore.

            What this entire blog is about is the bottom-up view, the reader’s view, the Web view. In this view, the reader’s normal entry point to content is in the form of a listing in a Google search, which produces a dynamic semantic cluster of individual pages, usually from different sites — thus “Every Page is Page One”, because any page can be the entry point to your content.

            Equally important, hypertext is an environment, a field. It is not just an internal feature of your “book”. The reader gets to your content through a Google search, which you did not create and do not own. That is hypertext. Or they arrive via a link in a tweet or a forum post, neither of which you created. That is hypertext.

            This is the essence of the hypertext revolution. The organization of content belongs to readers, not to publishers. As the Cluetrain Manifesto succinctly puts it, “Hypertext subverts hierarchy.” This is also one of the key reasons why hypertext can scale to manage far larger content sets than hierarchy or content management can.

            This, I feel certain, is the ground on which our views diverge, and given this point of divergence, we are going to talk at cross purposes whenever we debate things higher up the chain.

            And that is why we disagree on content flow. You say:

            “When I say “let your content flow”, I mean you need to be able to move authoring control of content from one place to another. That’s not about a flow of understanding – it’s about a flow of content as an authorable, manageable unit from one system to another. It could be from a CMS to a file system, or from an LCMS to a wiki.”

            You are taking the top down view, of a collection of independent books in which some of those books may want to include the same pages. So you want content to flow from one author to another as they create independent “books”.

            In a hypertext environment, this makes no sense. Publishing the same page in multiple collections is pointless. Every Page is Page One, so there is no point in putting the same page in more than one place. Link, don’t duplicate.

            Of course, there is still some room for reuse technologies, for two reasons:

            • There is a corporate interest in creating content in separate channels, in an attempt to control what the customer receives. The existence of the hypertext field, which you don’t control, makes this much more difficult to pull off, but there is still a lot of demand for it, and you can still succeed to some extent.
            • There are other uses for the set of technologies that we call “reuse” that could also be called “variance” — things that create variation of content to produce genuinely different pages.

            But the desirability of these things does nothing to make the content management model scale across cultures.

            You say:

            “And this is where the question of cross-cultural reuse comes in – if DITA wasn’t a standard, it wouldn’t be neutral ground.”

            And that is exactly why the content management model does not scale across cultures. Because there is no neutral ground. The problem with standards often is just this: the failure to distinguish between neutral ground and common ground. Common ground is meaningful to all; neutral ground to none.

            You can always create an artificial neutral ground, but it will not carry the burden of communication. You can only find common ground where it exists, but when you find it, it will carry communication. The only way to bridge between cultures is to expand the common ground. The common ground between cultures is common stories, so you make common ground by telling stories.

            Which stories you have to tell, and how you have to tell them, is a matter for exploration and experiment. This is where the hypertext field shines. This is why hypertext, rather than content management is the key to bridging these gaps.

            But hypertext is democratic. Hypertext subverts hierarchy and robs writers and publishers of their power to control the message. Which is why, since we are both readers and writers, there is a war between content management and hypertext, and we are on both sides.

          3. Michael Priestley

            I wrote:
            “In fact the web is a collection of many hypertexts. They link to each other, but that’s a far cry from merging. The ibm.com website is quite measurably distinct from, say, wikipedia.org, and the IBM Knowledge Center is a distinct part of ibm.com”

            You respond:
            “And that is the publisher’s eye view of it. The top-down view. The corporate view. It is the books on glass model, where you are still producing, and you imagine that readers are still consuming, individual information products. In this view, the Web is just a delivery medium, like a library or a bookstore.”

            And then you go on to say a lot more about books, but I want to start by questioning your assumption. Is the existence of multiple websites on the web really just an artifact of the book publishing model? Does the publisher of wikipedia.org really suffer from a legacy book-publishing mindset? Is a website really a book, if it has any distinct identity?

            I don’t think you mean that; I hope you don’t mean that.

            So if you start by going through your whole post and replacing “book” with “website”, does it change the meaning for you? Because it sure does for me.

            Website design is absolutely a hybrid of top-down and bottom-up. And if you think that the role of information architect, a la http://www.amazon.ca/Information-Architecture-World-Wide-Web/dp/0596527349 is really nothing more than legacy book-authoring thinking, then we really do fundamentally disagree.

            On reuse, you say:
            “In a hypertext environment, this makes no sense. Publishing the same page in multiple collections is pointless. Every Page is Page One, so there is no point in putting the same page in more than one place. Link, don’t duplicate.”

            If it’s exactly the same page, that’s true. But if you’re bringing together a new combination of components for a customer in one country or industry that has key differences compared to the same combination for some other country or industry, then it’s not the same page.

            You go on to say: “There is a corporate interest in creating content in separate channels, in an attempt to control what the customer receives. The existence of the hypertext field, which you don’t control, makes this much more difficult to pull off, but there is still a lot of demand for it, and you can still succeed to some extent.”

            IBM’s attempt to publish content that helps their customers be productive is not some crazy bid for totalitarian control. Our success metric is customer success. They say they want better, more tightly integrated, high-quality content. Every strategy we have is driven by customer requirements, and a vision that we have validated with customers.

            The fact that customers link to our web pages does not undermine us, and is in fact something we value as a metric.

            You write:
            “And that is exactly why the content management model does not scale across cultures. Because there is no neutral ground. The problem with standards is often just this: the failure to distinguish between neutral ground and common ground. Common ground is meaningful to all; neutral ground to none.”

            The common ground – as I said in my presentation – is a near-identical set of requirements for managing structured modular content across multiple collections with taxonomy-controlled classification. The neutral ground is what privileges DITA as a solution compared to the other ones in use.

            For example, we have training, support, documentation, marketing, and sales teams, all producing content. Only product documentation is currently using DITA. But marketing, sales, and training all independently discovered it and are interested in adoption. The requirements are common ground. But we are not going to jam all the content from every part of IBM into a single silo like an LCMS or a WCMS.

            So instead we push for those silos to support import/export to/from DITA, so that content can flow across the organizational boundaries. And we also push for runtime reuse engines that control the reusable components of the web experience to act as a standards-based broker for content, using a hybrid of DITA and JSON.
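
            As a rough sketch of what that brokering could look like (the topic markup and JSON envelope here are hypothetical illustrations, not our actual interfaces), a component might cross a system boundary like this:

                # Illustrative only: wrap a DITA topic in a JSON envelope so a runtime
                # reuse engine can treat it as a manageable component.
                import json
                import xml.etree.ElementTree as ET

                dita_topic = """<topic id="reset-password">
                  <title>Resetting your password</title>
                  <body><p>Use the account console to reset your password.</p></body>
                </topic>"""

                root = ET.fromstring(dita_topic)
                envelope = {
                    "id": root.get("id"),
                    "title": root.findtext("title"),
                    "body": ET.tostring(root.find("body"), encoding="unicode"),
                    "format": "dita",
                }
                print(json.dumps(envelope, indent=2))  # ready for a runtime reuse service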

            The fundamental question though comes down to this:

            Do you admit the existence on the web of content collections, and accompanying experiences, above the level of a page, that are not books or attempts to replicate a book?

          4. Mark Baker Post author

            Oh dear, we’re back to DITA again. I’m not going there.

            I put “book” in quotes specifically to suggest that the traditional understanding of book does not apply. I was referring to thinking of a “website”, if you like, as something existing and consumed in isolation. But “website” really does not work either, because “website” is a really fuzzy concept. What is the unity that defines the limits of a “website”?

            Fundamentally, the web is not made up of sites, but of pages. There are collections of pages that are more tightly coupled to each other than to other pages, but to uniformly call such collections “websites” is confusing to say the least.

            Do I admit the existence of collections above the page level — that is, collections of pages — on the Web? Of course I do. That is what this entire blog is about. But my point is that in a hypertext world, what binds a collection of pages to each other is not membership in the same collective, nor listing in the same TOC or taxonomy, but the ways in which they connect to each other.

            The structure of a hypertext is created by the connections between its pages. Within that structure there certainly are collections (and bottom-up information architecture is all about creating such collections). But those collections exist within a hypertext field that also associates those pages with other collections, sometimes statically, sometimes dynamically, and often in ways that have nothing to do with how the author conceived of the collection they were creating. Thus a page in a bottom-up information architecture should associate itself with the other pages in its collection, but should also be able to function when pulled into other collections by other people.

            The collaborative process of building hypertexts is, of course, a kind of content management, but a democratic and decentralized form of content management significantly different from content management as it is commonly practiced today. (Then again, content management practice is highly varied today; one can find examples of several different models and hybrids between them, which is entirely as it should be at this stage of development.)

            Hypertext can also seem disconcertingly messy. But it works. And as soon as you place your content in a hypertext field, it gets incorporated into it in this way, whether you planned for it to be experienced as a collection or not.

            Do people experience the collection as a collection? Sometimes, perhaps, but far more often they experience the collection they create by the paths they choose through a hypertext field, a fact which is true whether they stay within the bounds of your collection or not (something they are often unaware of).

            We communicate in stories, and each story we hear leads to the next story that matters to us. The collection of stories that meets our needs is not formed in advance by an author; it is formed as we go, by ourselves as readers, assisted, often, by the signs left by other readers along the road, or by the active and immediate contribution of others through social media.

            In terms of information foraging theory, we can create rich information patches that are easier to hunt. But that in no way confines the reader to our patches, nor does it assure they will prefer every morsel in our patch to the stuff in the patch next to it.

            None of this, of course, has anything to do with the internal publishing dynamics of a publishing-oriented content management system. You can move chunks of markup around as much as you like on as large a scale as you like. How it impinges on such systems is that at a certain point, writers trying to find and use content in such a system have to go beyond their own knowledge and their own culture, and then they have to behave like readers. And then the question becomes not what scales for the publishing algorithm, but what scales for the exploring reader.

          5. Michael Priestley

            You write:
            “Oh dear, we’re back to DITA again. I’m not going there.”

            Mark, you KEEP going there. It’s why we’re talking. You have a set of opinions about DITA, and you express them, frequently. I’m trying to correct some of the assumptions you seem to have made about its purpose and history.

            One of the key points seems to be that you think DITA was made to manage books. I know that it was made to manage collections of content with multiple outputs, including websites, or collections of pages within a site, or even books (but not primarily books, not books first or only).

            But based on your last post, it seems like our disagreement is even more fundamental than that, and we may have finally reached the point where I can just point at another (non-DITA) resource and say, take your arguments there.

            Because if you think websites are irrelevant, and not worth designing, and that only pages matter, then I am very happy to punt to Rosenfeld and Morville, from their preface to “Information Architecture for the WWW”:

            “Thinking in terms of web pages or home pages too easily limits your field of vision to the trees and not the forest…. So from here on, think in terms of [web] sites first and foremost.”

            This is from the 1998 first edition, so pretty oldschool. I note that the most recent edition still has chapters on organization systems and navigation systems.

            Now, DITA can absolutely be used in ways that don’t mesh with information architecture. But historically, they’ve been very closely linked at IBM, including special-purpose tools like the Information Architect Workbench, as I’ve mentioned previously in this thread.

            So I think at this point, you’re not really arguing with DITA, or with the book model. You’re arguing with Rosenfeld and Morville, and I’m just a middleman.

            So if you want to continue the argument about what makes for a usable web experience, I’ll suggest you start with a review of their book. I’m happy to cite them as one of the influences on DITA’s design, and even more closely on the influence of DITA’s usage within IBM.

          6. Mark Baker Post author

            Ah, finally, I have managed, however clumsily, to convey my point. Yes, I disagree with Morville. And I could quote you all kinds of sources on the decline of the significance of the home page over the years. 1998 was a long time ago, and the hypertext field — the Google algorithm, social media, mobile, people’s trend toward search orientation — has left us with a very different Web today.

            You overstate my views on sites, despite the clarification I attempted to give in my last answer. (The issue here is whether “site” implies “structured collection” or “common domain name”.) But yes, I regard the page as primary. Thus the title of this blog.

            But the page is primary not simply because people are only interested in pages, but because pages are the connecting tissue of a hypertext. Links don’t exist apart from pages, but within them. Numerous efforts have been made to reverse this (relationship tables being one of them) but the model persists. Pages are not endpoints in a hypertext, pages are the network. The organization of a hypertext, therefore, is the organization created by its pages.

            Organization is very necessary, but it is created by pages, not above them. I would refer you on this to my earlier post on structured vs unstructured hypertexts. /2015/06/09/structured-vs-unstructured-hypertexts/

            “I’m happy to cite them as one of the influences on DITA’s design, and even more closely on the influence of DITA’s usage within IBM.”

            Yes, I get that. That is certainly one of my points about DITA, that its design and its usage favor this model. But this post is not and never was about DITA per se. DITA comes up only because it is a common instrument for implementing this model.

            There is a fascinating post by Gerry McGovern today, about moving from a focus on creating to a focus on connection. http://www.gerrymcgovern.com/new-thinking/moving-world-producing-world-connecting This seems very apropos to my theme of the war between content management and hypertext. Building systems to move content around and produce more of it at less cost seems to me out of touch with the spirit of the age. We should be looking at ways to reduce the amount of content we create and better connect the content we have.

          7. Michael Priestley

            You write:
            “I could quote you all kinds of sources on the decline of the significance of the home page over the years. 1998 was a long time ago, ”

            The quote from 1998 was actually about the decline of significance of the homepage, so I guess some things stay the same 🙂

            I did refer you to the latest edition, but I think we still come back to the question of whether you can design a content experience – at the site level, or at the collection level within a site – using a variety of navigation schemes, or whether you should stick to search to find the first page, and links thereafter.

            Since you’re citing McGovern, I’ll return the favour:

            http://www.gerrymcgovern.com/new-thinking/navigation-and-search-are-twins

            So, I continue to believe that structure and navigation matter, above the page level.

            And what I find interesting is that the same belief is common across the different disciplines and cultures that I’m communicating with. So your belief that an overarching structure is the enemy of collaboration and communication is actually the reverse in my experience: the common belief in the need for shared structures is actually the motive for collaboration.

            Every page is of course page one. But it’s also page two, and page three, etc. The content journey doesn’t stop with landfall; that’s just where it gets interesting.

          8. Mark Baker Post author

            “And what I find interesting is that the same belief is common across the different disciplines and cultures that I’m communicating with. So your belief that an overarching structure is the enemy of collaboration and communication is actually the reverse in my experience: the common belief in the need for shared structures is actually the motive for collaboration.”

            It’s been a while now, but if you go back to the beginning of this post, you will notice that I began by saying that this belief is common across the content-producing and content-managing communities. So its existence is hardly a counter-argument.

            My contention is not that top-down structure is the enemy of collaboration and communication (I sidestep “overarching” because I’m not sure whether it should be understood as synonymous with top-down), but that top-down structure simply does not scale well. Hypertext facilitates a very different approach to collaboration and communication, one that happens to scale better (though not without issues of its own). The opposition is between these two approaches, not between either approach and their common goals. It is important to avoid the trap of supposing that if someone opposes a method they also oppose its goals. More commonly, they prefer a different method for achieving the same goal.

            As readers, we use hypertext to work around the failure of top-down organization to scale. As writers, we keep trying to make top-down work, keep pouring more hopes and more resources into it. Thus we are at war with ourselves. Which is pretty much why you and I are having this argument.

            And the point of “Every Page is Page One” is not that people only read one page, but that their progress through a collection of pages is not from a page that the author designated page one to a page the author designated page two. The second page that the reader consults is not, therefore, page two relative to the first page they consulted.

            Put it this way: if I visit the houses of three friends, it does not follow that the first friend lives in a house numbered 1, the second in a house numbered 2, and the third in a house numbered 3. The sequence of my visits is unrelated to any sequential relationship between the houses. Each house is built to be a house in its own right, with its own front door.

          9. Barry Schaeffer

            I wonder if the proper term might be “consensus.” Surely if there is no agreement on how data should be structured, there will be little ability to collaborate or communicate freely (like everyone speaking a different language with no guaranteed ability to translate among them).

            However, consensus is achieved, in my experience, through a mix of leadership and participation. I do not accept the early Web 2.0 philosophy that everyone is his own leader and the results just emerge from the chaos. But achieving consensus, while requiring some leadership, is an informal negotiation among the players, looking for what the delivery side needs balanced with what the provider and manager sides can reasonably be expected to provide. The results of that negotiation might be defined as a working consensus on which the environment may be built.

          10. Michael Priestley

            “top-down structure simply does not scale well.”

            And as I’ve said elsewhere, it doesn’t have to. It should, however, scale as far as it is useful to. The fact that you cannot organize every page on the web into a single hierarchy doesn’t mean you shouldn’t organize pages into hierarchies (or groups, or other patterned relationships) when it benefits the user.

            “The sequence of my visits is unrelated to any sequential relationship between the houses. Each house is built to be a house in its own right, with its own front door.”

            But it sure does make it easier to find those houses if there are street names and numbers. The houses ARE in a sequence, and that sequence is useful to the visitor, even if it does not determine the browsing/visiting order.

            That said, I’m not advocating universal sequences. I am advocating browse sequences, where for example (as I said a million characters ago) you can map a series of low-level system tasks to a higher-level user task, giving context to the system task and a search landing page for the user task.

            I think we’re talking in circles.

            The main lesson I’m taking away from this is that your objection to the use of DITA lies not only in its capabilities for directed reuse but also in its capabilities for organization and imposed relationship structure. That’s honestly not something I anticipated going into this, so I have learned something.

            I don’t know if there’s anything you’re taking away from this, but if there’s one thing, I hope it will be a recognition that DITA is not a book-bound paradigm – in fact it was developed for web content first, although with single-sourcing a close second. Its success in book-based publishing is a testament to the underlying principles of topic orientation and information typing, which do of course pre-date the web.

          11. Michael Priestley

            Barry, I absolutely agree.

            At IBM we’ve recently formed an Enterprise Content Strategy Council, with membership representing marketing, sales, services, product docs, training, support, and expert content organizations.

            I’ve got existing relationships with many of the groups, based on past conversations, proofs of concepts, and pilot engagements. We even had a previous council, focused more narrowly on the post-sales customer content experience. But we’ve never had this wide a group driving this urgently towards a common goal before.

            Like you said, it’s a combination of leadership and participation. And I’d add that there are different leaders for different subjects – we’ve got a diverse enough group that we all have something to learn from each other.

  6. Leigh White

    Hi Mark,

    After an admittedly brief read, the thought I come away with is that you are absolutely right that content management should not be about controlling the ways in which people access information. There is a lot of focus on that–how do we build the “perfect” help system or the “perfect” knowledge base. The fact is, each person will look for answers in the way that makes the most sense to him or her, and sometimes that way is simply calling Support first to talk to a real person! A system that allows people to build their own connections between information is really the right approach, and hypertext is one representation of such a system.

    But I think you miss the larger point of content management, which is ensuring that the information that’s available to people is accurate and complete, not outdated and contradictory. The Web is a perfect example of unmanaged content. Yes, it’s very searchable and the amount of information is almost infinite, but… how many dead links are out there? How many pages put together in 1996 by a 15-year-old cataloging his CD collection? How many fried rats, doctored photographs, rumors, crazed political rants from folks living in bunkers with 10 years’ worth of canned food? This is the firehose you speak of, and it is the inevitable result of uncurated content, which is fine for a public forum like the Web, but not the kind of thing we want our product documentation or marketing material to degenerate into.

    That’s why we manage our content–trying to organize it and present it in the most useful or logical way is really a sidebar (IMO) to our real purpose, which is keeping it accurate, clean, complete, and free of fried rats.

    Leigh

  7. Mark Baker Post author

    Thanks for the comment, Leigh.

    It’s not so much that I miss the larger point about content management as that I gloss over it in my attempt to contrast the overall approaches.

    In this piece, I make “content management” stand for the top-down approach, which it largely does today. I make “hypertext” stand for the bottom-up approach. But, of course, a structured and disciplined hypertext approach which set out to “ensur[e] that the information that’s available to people is accurate and complete, not outdated and contradictory”, would also be content management in the more general sense of that phrase.

    The problems I see today, and which I am complaining of, are:

    • the idea that hypertext approaches are inherently unmanaged and unstructured, and that hypertexts themselves are inherently unstructured
    • the idea that unstructured hypertext does not work

    In my last post, I made the distinction between structured hypertexts and unstructured hypertext, and while I defended unstructured hypertext (seriously, when was the last time a Google search led you to a page put together in 1996 by a 15-year-old cataloging his CD collection?), I also made the case that we need to create structured hypertexts that integrate seamlessly into the hypertext world the reader is living in, but provide a superior experience in our local subject area.

    But my point about top-down content management is that it does not actually do a very good job of ensuring that the information that’s available to people is accurate and complete, not outdated and contradictory, and that its attempt to do so creates cumbersome barriers for content contributors to surmount — more reasons people hate their CMS.

    My contention is that a managed, structured hypertext approach would do a better job of providing accurate, complete, up to date, consistent content with much less difficulty for authors.

    Hypertext, both structured and unstructured, works. It does, however, involve some different ideas about organization and management that do not necessarily come easily to traditional corporations, though some companies are definitely developing cultures in which hypertext would be a more natural approach.

  8. Barry Saiff

    A very interesting discussion. Thanks to Mark and everyone else who commented. I’d like to try to make the issues here a bit more concrete. I see two issues (and I may be missing others).
    1) Content development: While the intention of a CCMS and its users may be to create a richly hyperlinked set of content that can be navigated by users without understanding the hierarchy, for the authors that is not the case. To find information you must know the hierarchy, and as it changes you must keep up. It’s an odd mismatch between the intended user experience and the authoring experience.
    2) Linking: Perhaps I’m missing the boat, but it seems like the crux of the “content management vs. hypertext” dichotomy is the richness of linking. If authors only put in the links that make sense based on the hierarchy of topics, users get a more limited experience. If authors create a rich set of links in each topic that follow more than the most predictable routes, then users get a better experience.
    I’m thinking that 2) must not be that simple – I’m guessing there is a lot more that you were trying to point to, Mark. Perhaps you can clarify that.

    1. Mark Baker Post author

      Thanks for the comment, Barry.

      On your first point, I agree. One of the interesting issues here is that writers are often looking for “content”, while readers are looking for information. That is, writers are looking for artifacts, whose existence is assumed and whose location should be predictable. Readers are looking for information which may be contained in content, but the reader generally has no idea about what content units are supposed to be available or where. This difference between locating artifacts and locating information is one of the traps of the content management approach. Writers thinking in terms of artifacts can easily come up with systems that feel very satisfying to them while not being particularly helpful to readers.

      On linking, yes, richness of linking is key to hypertext, but I would make two additional points.

      The content management approach tends toward an organization and navigation schema that is external to the content. Hypertext has no external navigation structure. Organization and navigation are created by topics linking to other topics, creating a network of connections based on subject affinities. It is important to understand the linking approach in this light.

      It is not just a matter of creating links. It profoundly affects how you write the topics. A topic should be written to be encountered in any order in the reader’s progress through the content set (Every Page is Page One). A topic is not an end point of an external navigation schema, but a node in a network. It should be written as such.
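
      As a minimal sketch of what linking on subject affinities, rather than through an external schema, can look like (the topic names and metadata fields here are hypothetical):

          # Bottom-up linking: links emerge from shared subject metadata rather
          # than from an external TOC. Topic names and fields are hypothetical.
          topics = {
              "configuring-tls": {"subjects": {"tls", "certificates", "security"}},
              "rotating-certs": {"subjects": {"certificates", "automation"}},
              "hardening-guide": {"subjects": {"security", "deployment"}},
          }

          def related(topic_id):
              """Link a topic to every other topic sharing at least one subject."""
              mine = topics[topic_id]["subjects"]
              return [other for other, meta in topics.items()
                      if other != topic_id and mine & meta["subjects"]]

          for tid in topics:
              print(tid, "->", related(tid))

      Each topic carries its own subject metadata, so the network of links grows as topics are added, without anyone maintaining a central map.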

  9. Diana Chapman

    I haven’t read through all the discussion points yet but I had to stop to comment that I am so psyched I found this. (Also I have to stop to do some work.) The timing is perfect for me because I have a bit of a blank slate as far as content development goes with decisions to be made about CCMS, CMS, eliminating (or bridging) silos, taxonomy – the whole enchilada. I’ve never been in this position before where I get to decide. And this discussion is a huge help.

