What is the right word to describe a node of a hypertext?
What should we call the basic unit of information that we present to readers? Is it a page, a topic, or an article? (I’m going to take it as read that the answer is no longer “a book”. If you disagree, that’s what the comments are for.)
I raise this now because of Tom Johnson’s latest blog post, “DITA’s output does not require separation of tasks from concepts”, in which he makes the distinction between topics as building blocks and articles as finished output:
One reason so many people mistake the architecture of the source files with the architecture of the output files is because the term “topic” tends to get used for both situations. I prefer to call the output files “articles” rather than topics. An article might consist of several topics. Each of those topics might be of several different types: concept, task, or reference.
Both in my book and in this blog I have made the case for the word “topic” as the unit of output, and have criticized DITA for muddying the waters in just the way that Tom describes. During the development of the book, Tom was good enough to serve as a technical reviewer, and one of the questions he asked was, why don’t I just use the word article rather than topic? It’s a fair question, and I want to try to address it.
Certainly, Tom is not the only one to choose the word article for use in this context. In the book I quote Scott Nesbitt on the subject of Google’s article-based approach to documenting Chrome:
One of the first things that I noticed was the way in which the documentation was described. Help articles. Yes, articles and not documentation or user manual or online help. That’s a very subtle (or maybe not) distinction. But it’s a distinction that can be psychologically powerful.
I also make extensive references to Wikipedia in the book, and I use the word article frequently to describe Wikipedia entries, which I often use as examples of good Every Page is Page One topics.
So why not just adopt the word “article” and be done with it?
Or, for that matter, since my book title and my catch phrase are both “Every Page is Page One”, why not “page”?
Why do I keep insisting on using “topic” as the right word to describe a unit of output for the reader?
If you are thinking “sheer cussedness” or “because you are a crank”, okay. I can live with that. I am a crank. Sometimes, at least. But whether or not the point is worth the struggle, there is at least a point here, and it has to do with the nature of hypertext.
First, why not “page”? The phrase “Every Page is Page One” was coined to describe a break with the past. Pages, at least in the paper sense, were inherently linear things, related to each other in linear ways. And in common website designs, also, pages were designed with hierarchical relationships: there are home pages and “inside” pages, as if the site were a magazine.
What “Every Page is Page One” proclaims is that this idea of a linear and/or hierarchical relationship of pages, in which there is a particular page one at the head of the linear or hierarchical organization, is meaningless in the age of the Web, where people navigate by search or links and can land on any page regardless of its place in the sequence or hierarchy.
“Page” has too much of that sense of linearity or hierarchy about it for my ear. I feel the need for a word without those associations: thus “topic” and my frequent use of the phrase “Every Page is Page One topic”.
“Article” certainly has the sense of independence that I am looking for. Articles are independent items that (except by an accident of print technology, when printed in journals) have no linear relationship to other articles.
The problem, to my ear at least, is that “article” has too great a sense of independence. It implies a work that is entirely independent, without relationships to other works at all. When we use the words “collection of articles”, the implication is not that the articles were written to work together, but that they were written entirely independently and later collected by another hand.
That is not what we are talking about when we discuss something like a documentation set. There may indeed be articles written about a technology by many independent authors, but when we talk about “the documentation” we are talking about something more planned and organized than that.
The point of “Every Page is Page One” is not that the items of content have no relationship to each other, but that their relationships are neither linear nor hierarchical. They are organized bottom up, not top down. Every item is equally a place to start, and every item is a hub which connects to other items along lines of subject affinity. In short, they are a hypertext.
“Page” implies the wrong sort of relationships. “Article” implies too little relationship. Topic seems like the closest word for the thing I mean. I just can’t get comfortable with another.
And I think it is important to highlight this aspect of hypertext, the organization of topics in a hypertext, and the writing of hypertext topics. I am often accused of being too optimistic about search. The truth is quite different. I am not at all optimistic about the ability of search alone to deliver readers the content they need.
Unlike many others, however, I don’t believe that the solution to the inadequacies of search lies in creating other forms of organization or information retrieval. For reasons I explore in detail in the book, I believe search is here to stay as the dominant method by which people seek information.
Read the book for the full argument, but the essence is captured in a phrase coined by David Weinberger: people now prefer to “include it all and filter it afterwards”. The old model of information seeking was this: first find an authority, then ask your question. The new model is: first ask your question, then verify the authority of the answer. People search because they want to consult multiple sources with a single query. No navigation scheme that any one of those sources can come up with can do much to change that.
Readers are search dominant, and becoming more so. That is a fact from which we cannot escape. The right question to be asking is not “what can we offer them instead of search” but “what can we offer them after they have found us by search”.
This is why it matters that we write in an Every Page is Page One style, so that the page they land on when they search works for them. But because search is not always particularly accurate, it matters what happens when they are not on the right page, and what happens when they are on the right page, but they don’t have the right prerequisites.
Pages and articles will be found by search just as well as EPPO topics. But what if the thing the reader lands on is simply a page, or simply an article? If they are on the wrong article or don’t have the prerequisites to understand the current page, they have little recourse but to search again. If they have landed on a hypertext topic, however, they have other options. If the topic is a good hypertext, if it is a hub of its locale in subject space, full of rich links to topics on associated subjects, then the reader can proceed to find what they need by following accurate links, rather than submitting themselves to the vagaries of search once again.
Links and search are the bones and sinews of hypertext. Links are a hypertext tool for authors. Search is a hypertext tool for readers. Search alone is not enough. We need to create effective hypertexts that can deliver the reader the last mile to the content they really need.
So, the word for the unit we need to create (to the ear of this crank at least) is not page (too linear) nor article (too independent) but topic. But language is a democracy, and mine is just one vote. The comment box is below. Have at it.