Findability continues to be the bête noire of technical communication. This may be a parallax error, but it seems that findability is more of a problem in technical communication than in other fields. The reason, I suspect, is that many technical documentation suites are too big to browse but too small to search.
I have commented before on the somewhat counter-intuitive phenomenon that on the Web it is easier to find a needle in a haystack (The Best Place to Find a Needle is a Haystack). This may be counterintuitive, but it is easy enough to explain: search (if it is any more sophisticated than simple string matching) is essentially a statistical analysis function. A search engine works by discovering a statistical correlation between a search string and a set of resources in its index.
All statistical calculations are dependent on the amount of data available. We are reminded of this every time we see a poll reported on the news. “The results are considered to be reliable plus or minus 5%, 19 times out of 20.” The more people who are polled, the smaller the poll’s margin of error is considered to be. (Most of the informal polls that people post on the web or in forums have a margin of error the size of Texas.)
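The relationship between sample size and margin of error is easy to see with the standard formula for a proportion. The sketch below uses the worst-case p = 0.5 and the usual 1.96 multiplier for 95% confidence; the sample sizes are my own illustration, not from any particular poll:

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion estimated from n samples,
    at the confidence level implied by z (1.96 for "19 times out of 20")."""
    return z * math.sqrt(p * (1 - p) / n)

# About 400 respondents gets you "plus or minus 5%, 19 times out of 20"
print(round(margin_of_error(384), 3))  # ~0.05

# A 20-vote forum poll has a margin of error the size of Texas
print(round(margin_of_error(20), 3))   # ~0.219
```

Halving the margin of error requires quadrupling the sample, which is why the amount of data available dominates everything else.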
What Google is saying, when it presents you a page of search results, is: based on billions of documents, and billions of search strings, and what billions of people read after entering those search strings, these are the top ten pages that correlate most strongly to the search string you have entered. This works astonishingly well when you have billions of data points. It does not work at all well when you have a thousand pages and virtually no history of search strings or page selections. Intelligent search requires billions of data points to perform at its best. Individual help systems on desktop machines are too small to search.
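A toy simulation makes the point concrete. Suppose we knew the true probability that a reader, after entering some query, clicks each of four pages, and we rank pages purely by observed click counts, a crude stand-in for the correlations a real engine computes. The page names and probabilities below are invented for illustration:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical "true" probability that a reader clicks each page
# after entering some query -- page B is genuinely the best answer.
true_clicks = {"A": 0.30, "B": 0.45, "C": 0.15, "D": 0.10}

def observed_top_page(n_searches):
    """Rank pages by raw click counts from n simulated searches."""
    pages, weights = zip(*true_clicks.items())
    clicks = Counter(random.choices(pages, weights=weights, k=n_searches))
    return clicks.most_common(1)[0][0]

def accuracy(n_searches, trials=300):
    """How often the click data alone identifies the best page."""
    hits = sum(observed_top_page(n_searches) == "B" for _ in range(trials))
    return hits / trials

print(accuracy(10))      # a desktop help system's worth of data: unreliable
print(accuracy(10_000))  # a web-scale sample: nearly always right
```

With a handful of searches, the noise frequently puts the wrong page on top; with thousands, the genuinely best page wins almost every time. That gap is the whole difference between searching the web and searching a help system.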
People complain constantly that the search engine of their docs delivery platform of choice does not work well enough. Improved search performance is a perennial feature request made to vendors. But things don’t get better, and I suspect that there is very little more that can be done to improve the search function of most tools, given the two fundamental problems that limit what it can do:
- The average doc set simply isn’t big enough to provide enough data for the statistical correlations to be meaningful (even if you used this type of search engine, as opposed to a simple string matching engine).
- The typical documentation deployment is so small (often a single-user desktop) that the search engine does not see nearly enough query strings to profile them successfully, and so can’t correlate search strings to content with any accuracy.
There is only one way out of this problem: make the doc set part of a larger data set so that it can be profiled more accurately, and put the search portal somewhere where it will receive enough search strings so that they can be meaningfully profiled.
If a doc set is too small to search, what remains is browsing. But browsing becomes very cumbersome once the content reaches a certain size. It may be feasible to browse an ordinary book, but then the user has to browse multiple books, and that is an inconvenience that the Google generation does not have the patience to tolerate. They want all the information in one place.
So, you put all the information together in one place. There are then thousands of pages of content in a single container, and the only way to make it feasible to browse that much content is to arrange it somehow, and the usual method chosen is to make it into a hierarchy.
I think we have lost sight of the fact that a book is not hierarchical by nature. The structured text movement has taught us to use hierarchy to encode books for processing, and from that we seem to have made the leap to thinking that books are read hierarchically. They are not. There is little more frustrating than constantly having to go up, down, back, or sideways while trying to read. And if there is anything to the theory that links are a threshold event that triggers the reader to forget what they have just read (as discussed in Are We Causing Readers to Forget?), then reading hierarchically is also going to seriously impair retention.
So, hierarchy is not something that is natural to an information set, but something we impose on it in an attempt to make a larger information set browsable. The problem with hierarchies is that they force an artificial subordination of concepts. When you choose the groupings at each level of a hierarchy, you are making an assumption about how readers will narrow down their query in their own minds. The more items at each level, and the deeper the levels go, the worse the problem becomes. You inevitably end up hiding information from readers who have a different idea of what the primary concepts are, and sundering closely related information whenever the relationship ranks low among the groupings you have chosen.
The artificial imposition of hierarchy on the reading experience, combined with the maze-like browsing experience created by a large and complex hierarchy, is what turns so many doc sets into Frankenbooks.
People continue to agonize over how to improve the findability of these Frankenbook help systems, but I think the fundamental and unavoidable truth is that Frankenbooks are too big to browse; too small to search.
The way out of this dilemma:
- Create Every Page is Page One topics that address a single user need in a single linear topic.
- Richly link those topics so that people can browse and surf the local neighborhood around each topic (a neighborhood is not too big to browse).
- Put those topics on the web where they will become part of a content set that is large enough to search effectively.
- Where long narrative exposition is genuinely needed, write a conventional book.