The Long Tail and Why Docs are Frustrating

By | 2012/10/29

It is often a matter of some perplexity to technical writers that more and more people seem to prefer searching the Web rather than looking for information in the documentation. It is perplexing because information found through a Web search is of variable quality, sometimes hard to navigate, lacking in authority, and has to be picked out of a big pile of fluff.

Why would people prefer to search the sprawling mess that is the Web when they could look in the neat, authoritative, well organized documentation set? Shouldn’t they, at least, look in the docs first before turning to the Web?

Long Tail Distribution

Picture by Hay Kranen / PD

The reason that people increasingly prefer to Google may be found in the notion of The Long Tail. The long tail is a statistical distribution in which an unusually high number of items appear far from the center of the distribution. In other words, rather than the traditional bell curve, you get a distribution that looks more like an L shape, with as many items away from the center of the distribution as there are in the center.

Part of why docs often fail to provide answers is doubtless that the people who write the docs often don’t write about the right things, either because they are obsessed with novices at the expense of mainstream users, or they simply don’t have the background or the opportunity to write effectively about the right things.

These problems are fixable. They are not easy to fix, by any means, but we can certainly put together a plan for fixing them, if we have the knowledge, the will, and the means. But if user information needs follow a long tail distribution, meaning that everybody wants something from a vast collection of information items each of which is seldom referenced, then there is no way we are ever going to be able to meet that need in docs. No one item in the long tail is going to be important enough to justify the resources it would take to create it, even if half or more of our users’ information needs fall into the long tail.

The idea of the long tail is that there are all kinds of products that a few people want, or problems that a few people have, which, taken together, add up to just as large a market as the few things that many people want. The many things wanted by a few, in other words, weigh the same as the few things wanted by the many.

Look at it this way: if Dave wants to buy bread and sardines, June wants to buy bread and Gorgonzola, and Pete wants to buy bread and apricot jam, bread may be the product in high demand, but none of them will want to shop at a store that stocks only bread.

Even if we identify and provide all the high demand content, it is likely that each individual user will want low demand content half the time. And if they go to a source that provides only high demand content, they are going to be disappointed half the time.
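The shape of this argument can be checked with a rough sketch. Everything numeric here is an assumption made for illustration: demand is assumed to follow a Zipf-like curve (demand for the item at rank r is proportional to 1/r), the catalog is assumed to hold 10,000 information items, and the docs are assumed to cover the 100 most popular ones. The function name is hypothetical. None of these figures come from a study.

```python
# Illustrative sketch only: demand is ASSUMED to follow a Zipf-like curve,
# where the item at popularity rank r gets demand proportional to 1/r.

def head_share(n_items: int, head_size: int) -> float:
    """Fraction of total demand captured by the head_size most popular items."""
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    return sum(weights[:head_size]) / sum(weights)

# Hypothetical catalog: 10,000 information items, docs cover the top 100.
share = head_share(10_000, 100)
print(f"Top 100 of 10,000 items cover {share:.0%} of demand")

# The shoppers from the example above: every basket contains bread,
# yet a bread-only store fully serves nobody.
baskets = {
    "Dave": {"bread", "sardines"},
    "June": {"bread", "Gorgonzola"},
    "Pete": {"bread", "apricot jam"},
}
stock = {"bread"}
fully_served = [name for name, basket in baskets.items() if basket <= stock]
print(f"Shoppers fully served by a bread-only store: {len(fully_served)}")
```

Under these assumptions, the 100 high-demand items account for only about half of total demand, and the other 9,900 items account for the rest. The bread-only store, meanwhile, stocks the single most popular product and still fully serves none of its three customers.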

Do information needs follow the long tail distribution? Actually, it was in the field of information delivery that the commercial importance of the long tail came to light. Studies showed that Amazon was realizing an increasing volume of sales from obscure books, and that as access to the long tail gets easier, demand in the long tail picks up. So, it seems more than likely that demand for technical information follows a long tail distribution.

Do docs actually disappoint half the time? Personally, it feels like more than half the time. After all, you would have to document all the high demand material perfectly just to cover half the demand. And I suspect that the percentage feels even worse than it is. Not finding something takes way longer than finding it — you have to exhaust many possibilities before you give up and conclude that the information just isn’t there. Even if you succeed half the time, it will feel like you fail most of the time because most of your search time will be devoted to searches that fail.
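A back-of-the-envelope calculation shows how strong this effect can be. The timings below (2 minutes for a search that succeeds, 10 minutes before giving up on one that fails) are made-up numbers chosen purely for illustration, and the function name is hypothetical.

```python
# Illustrative sketch only: the timings passed in below are ASSUMED values,
# not measurements.

def failed_time_fraction(success_rate: float,
                         minutes_per_success: float,
                         minutes_per_failure: float) -> float:
    """Share of total search time spent on searches that end in failure."""
    time_succeeding = success_rate * minutes_per_success
    time_failing = (1.0 - success_rate) * minutes_per_failure
    return time_failing / (time_succeeding + time_failing)

# Reader succeeds half the time; failures are assumed to take 5x as long.
fraction = failed_time_fraction(0.5, 2.0, 10.0)
print(f"{fraction:.0%} of search time goes to failed searches")
```

With these assumed timings, a reader who finds the answer half the time still spends about five sixths of their search time on searches that fail, which is why a 50 percent success rate can feel much worse than it is.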

It is not really surprising, then, that people often feel disappointed by docs. They have to be optimally researched and written just to be useful half the time. And since the resources are seldom available to research and write them optimally, chances are that even the best docs you could create with the time and resources available will disappoint readers more than half the time, and that it is going to feel to them like they disappoint even more often than that.

Little wonder then that people often prefer to Google rather than reading the docs. We abandon the dry well even if the reliable well is deeper. The Web may not cover the high-demand content well, but it tends to cover the long tail fairly thoroughly. Most people buy their bread from the grocery store rather than the bakery because they can pick up their sardines, Gorgonzola, and apricot jam at the same time.

The solution? Get your content onto the Web. You can provide high-quality, high-demand content on the Web, and, if it is good content, Google will find it and bring it to the top when your users search for it. But Google will also provide content from the long tail, the vast array of low-demand content you could never possibly hope to provide, but which, taken together, supplies half your users’ information needs. To work well on the Web, and as the target of a Google search, your content should work well as page one for every reader.

Every Page is Page One topics on the Web: that is how you stop disappointing your users.

22 thoughts on “The Long Tail and Why Docs are Frustrating”

  1. Alex Knappe

    I would like to agree with you on this one Mark, but I simply cannot.
    If the users (or the non-users) of documentation worked this way, all you said would be perfectly correct. But the base assumption, that people use Google to search on certain topics, holds for only a few users. “Google-Fu” is still an art unknown to most people.
    I guess you can divide the user base into four categories:

    1. The interested old fashioned ones (primarily reading printed docs)
    2. The “Google-Fu” artists (primarily searching Google)
    3. The elevator music lovers (primarily waiting in some hotline queue)
    4. The Web 2.0 roamers (primarily asking the same question in about every known forum they know of)

    Actually, documentation only reaches the first two types directly, ever.
    The latter two types get their information either by mavens or by support staff.
    So, if we take a closer look at the first two types, you’ll see that we serve two different documentation needs.
    The first one is the printed documentation with all its standard or non-standard approaches.
    The second one is the web based documentation, which you describe in your article.
    But don’t even try to write for the other two types. They simply will never look into any documentation presented to them.

    1. Mark Baker Post author

      I don’t know, Alex. In 2011, people made 4.7 billion searches per day, up from 3.6, 2.6, 1.7, and 1.2 billion in the preceding years. That’s an awful lot if people are not very good at it, and are therefore disappointed by it. I think it is easy to underestimate just how good people’s search-fu is becoming.

      Also, as the generations turn over, we have to ask how people’s online search-fu stacks up against their paper search-fu. Searching Google inefficiently may still be a lot less frustrating than searching books inefficiently.

      In any case, the mere fact that the Web reveals how long the long tail of information needs is shows how disappointing looking stuff up in docs can be, no matter whether people are turning to search or to social means of support as an alternative.

      I definitely agree, though, that many people will not read at all, either on paper or on the web. Your four categories could be further distilled down to two: those who solve by reading, and those who solve by asking.

      As you point out, the web provides a new venue for both solve-by-reading and solve-by-asking, so even if people don’t search, the Web has other means to deflect people from docs.

  2. Gordon

    Interesting. I partly agree but already there is an issue with the sheer number of terms returned by Google (I agree with Alex on this, most people aren’t that good at “Googling”).

    So putting your content on the web is a tiny part of the answer; the content needs to be part of a sensible structure that reveals itself accordingly. If someone using your content lands on a topic and can’t start to build a mental model after a few clicks, then yes, the docs have failed.

    Personally, I think this is a lot more about ‘know your user’ rather than anything else.

    1. Mark Baker Post author

      Gordon, agreed — know your user is fundamental, and, today at least, your audience is more likely to Google first if you are documenting a new programming language than if you are documenting crochet tools.

      But I think the “sheer number” argument against Google has been shown to be hollow. As David Weinberger notes, the evidence from use is that people prefer to start with abundance and filter afterwards. In the end, 1 in 1000 is better odds than 0 in 10.

  3. Kok Hong

    It’s interesting that you’ve made the connection between technical communication and the long tail. (And I’m glad I’m not alone in my thoughts!) And yes, your conclusion is the right one–the problem with much enterprise content strategy is that they keep it all locked up behind their firewall. Opening up content to the web is a good idea because it not only makes it more accessible, it also exposes weaknesses in content (this is where analytics come in) and allows authors to refine their work.

    One point you didn’t say explicitly in the article: it’s as much a problem of information architecture as one of content quality. I’m not a technical writer by any definition, but I do know that I hate all Intranets that I’ve ever had to use. And half the problem is finding the content.

    So this is my crazy idea (and no I’m not a Google employee): optimise our technical content for Google, and install Google Enterprise Search!

    1. Mark Baker Post author

      Thanks for the comment. Yes, putting content on the web has the additional benefit of generating better analytics, though in this regard I do worry about a naive interpretation of the analytics. What the long tail teaches us is that just because something is used seldom does not mean that it does not play an important role in overall customer success and satisfaction. If the long tail contains half the weight of information demand, then it is just as important as the high demand content, and docking the long tail in your own content could be a big mistake.

      I definitely agree on information architecture. So much of current practice seems to be an attempt to apply paper-world devices to the navigation of digital content, and most of those devices do not fit the media and do not scale to the volume of online content. Trying to make a website work more like a book is a big mistake.

      And while I do have a bit of conflict of interest, since I have done work for the Enterprise Search division, I agree with installing an enterprise-class search engine as part of an enterprise information architecture. It is not the whole answer, but it is a foundational component.

  4. Lief Erickson

    I’ve been fascinated by the idea of implementing faceted search into my documentation. I’m just not sure how I would go about doing that beyond the top level. I think faceted search would go a long way towards reducing the length of the tail, and it would get users to the content they want more quickly or it would be more readily apparent to them that the content they seek is, in fact, not in my help set. They may not be happy with that, but at least they would have reached that conclusion quicker, thereby reducing at least part of their frustration.

    1. Mark Baker Post author

      Lief, thanks for the comment. The problem with faceted search in general is that it only works for clearly faceted subject matter. Used car sites make great use of faceted search because there are very definite and well known facets to a car: body style, transmission type, fuel type, number of seats, color, etc. Not only are these facets clear and well defined, they exist in the user’s mind prior to their coming to the site to search. The faceted search therefore helps the users find things according to the search strategy they already had in mind.

      To make an effective faceted search for content, you would need the same conditions: the facets should be clear and well defined, and they should correspond to the facets that are already present in the user’s mind when they look for help on something.

      Alas, neither of these conditions is met in every content set, and the result is that faceted search for content is often implemented using facets that are dreamed up by the writer or information architect not because they are natural or obvious, but because the author or IA has been told to come up with facets. These facets are often not a good fit for the content itself, and virtually never represent how the reader thinks about their problem. The result is that they do more to hide content than to reveal it.

      I agree that any system that helps the reader conclude quickly that the information they seek does not exist would reduce frustration, but I don’t think, except in exceptional circumstances, that a faceted search is the way to do it.

      Also, the larger problem is that people’s information seeking habits are formed by the totality of their information seeking experience, not by the performance of any one help system, which tends, sooner or later, to lead them to Google as their default.

      The best way to reduce their frustration in finding, therefore, is probably to make the content accessible to Google. The best way to reduce their frustration with the content once found is to write good Every Page is Page One topics, so that the content works for them no matter how they find it.

  5. Pingback: Why do readers prefer Google over Technical Docs? « Dateline Houston

  6. Mysti Berry

    So, I wonder if it has nothing to do with the docs at all, but simply that because people use this easy portal Google for all kinds of things, now they use it for doc needs. There are few search applications embedded in docs that are as powerful as Google. Back in 2005 I railed that of course our customers used Google, our navigation and browse tools weren’t great. Now I wonder about that, and about any cause other than that people would rather search than navigate or browse, regardless of the content…

    1. Mark Baker Post author

      Thanks for the comment, Mysti. I think that is increasingly the case. It has nothing to do with the quality of any particular doc, nor of technical documentation in general, or the quality of their search function or index. The Web contains the long tail of everything. It is reshaping people’s information seeking behavior in ways no one company or industry can influence.

      I do still think there will be specialized databases that people who need their specialized information frequently will learn to use in preference to Google. (In one sense, Amazon is such a database.) But even then, people will find such databases via the web and access them via the web. Offline help systems can’t compete with the Web. (It is interesting to note that MS Office help is now simply a portal onto the web and pulls in third-party content as well as MS’s own content.)

  7. Ian E. Gorman

    One reason for using a web search in preference to reading a document is that many large documents either do not have an index, or have an index generated by automation instead of being generated from items marked by someone who has the viewpoint of the end user. In such a case, a user can zero in on what they want more quickly with a short sequence of Google queries.

    A second reason may be that the documentation was written by an excellent writer who is nevertheless unable to place themselves in the shoes of the end user.

    1. Mark Baker Post author

      Hi Ian. Thanks for the comment.

      I agree, both those factors clearly lead to a preference for Web searching. Both of those factors could technically be addressed by improving the quality of docs and indexing, but that would be very expensive, and would not change how the users’ other information-finding frustrations lead them to prefer Web searching.

      The way that the long tail is different from these problems is that it is even more clear that one company could not cover the entire long tail of information about their products, no matter how much money they were willing to throw at it.

  8. Paul Sholar

    Mark, if the “tail” for the documentation subject matter of an IT-related product is *that* long, the product likely has too many features with poorly defined path(s) among those features. This is a symptom of poor product design and (cheap) product management (“throw it over the wall” and “ship it!”), just as poorly defined and written user assistance is.

    1. Mark Baker Post author

      Thanks for the comment, Paul. I’m not sure I can agree with you on this. If we accept the commonly expressed belief that the iPad is a well designed product, and then look at the long tail of information that has grown up around it, the correlation of poor product design to long tail of information demand does not seem to hold up.

      And then there are IT-related products like programming languages and APIs that have very long tails, but it is hard to see how you would design out of such an inherently open and flexible product the need for a long tail. And would reducing the flexibility or the numbers of use paths be desirable for the user?

      Another example that I have come across recently is Ikea furniture. It would not seem like there is a lot more to be said about Ikea furniture than how to put it together, but it turns out that people like to hack Ikea parts to create their own furniture designs. Is this something Ikea should have anticipated and documented? Is it something they should have designed out of their products?

      I don’t think the world is so neat that we can realistically expect every product to be so self-contained and lucid that there is no need for a long tail of information about it.

  9. Myron Porter

    In my experience, users want the equivalent of an expert to answer their questions. An expert system could be built, but not within the time and cost constraints. Perhaps (in Google searches) we are witnessing the first glimmers of that becoming an eventuality.

    1. Mark Baker Post author

      Hi Myron. Your experience agrees with mine. People want to hear from experts. More precisely, I think, they want to hear from people with experience. Tech writers often assume (or at least claim) that clarity is everything, but in reality trust plays a huge role in communication and we won’t act on an instruction, no matter how clear, unless we trust it. And we trust experience.

      David Weinberger points out that one of the effects of the Web is that it gives us access to the experienced as well as the credentialed. Tech docs coming from a company have credentials (even if the person who wrote them has no experience), but it seems that people would rather hear, or at least want to also hear, from people with experience. The Web gives them that.

      1. Alex Knappe

        I think the mere existence of the long tail and its appearance is strongly connected to three major factors:
        – the amount of customers using the product
        – the product category
        – the number of product enthusiasts

        While you find a sheer mass of long tail information (1.9 billion Google hits) for products like an iPad (large amount of customers, computer/net related product category, large amount of enthusiasts), the mass of long tail information decreases, if one or more of the above factors aren’t positive.
        If you take a look at the other end of the extreme you will find products like a special rock crushing machine (few customers, product category heavy machinery, few to no enthusiasts). There simply is no long or short tail whatsoever (0 Google hits).
        On the other hand, if one of the factors is outstandingly positive, you will once again find a massive long tail.
        One example here would be a Boeing 747 (relatively few customers, product category airplanes, lots of enthusiasts). Here we have only one factor being positive, but it makes up for the other two. You’ll find loads of information (4.7 million Google hits) because of the many enthusiasts.
        So, considering these three factors, you can more or less precisely anticipate how large of a long tail a certain product produces.

        1. Mark Baker Post author

          Alex, that seems like a really sound analysis. I wonder if security would be a fourth factor. Information about certain products or procedures is tightly controlled for safety or security reasons. I suppose some of these might have a long tail if people are trying to hack or subvert the security in some way. But prima facie it would seem like safety and security could be factors limiting the long tail.

          What this clearly points to is that any content strategy has to pay attention to whether or not there is a long tail for information on the subject area, and plan accordingly.

          1. Alex Knappe

            Security doesn’t factor in regarding quantity of the long tail. It only factors in regarding the quality of the long tail.
            Let’s take Windows as an example. High amount of users, computer/net related, high amount of enthusiasts.
            This produces an immense quantity of long tail information as expected.
            But as security on the source code is pretty high, you won’t find much quality information on the processes working inside.
            For Linux on the other hand, you can find tons of insider information, despite its lower user base and lower overall long tail quantity.

  10. Genevieve Renoir

    I think that interacting with the customers directly would be the best way to address the problems created by a product with a so-called long tail distribution.

    If a customer is searching for product information they are probably trying to answer a specific question. If Google search results are insufficient, it seems more likely that a customer will get the information by submitting a question to a help desk or forum than by resorting to searching for it in an offline document.

    1. Mark Baker Post author

      Thanks for the comment Genevieve.

      Interacting with customers directly would certainly be a good way to address their problems in many cases (though there are clearly cases where other users have seen and solved problems that the vendor has not seen or solved yet). But interacting directly with customers is also expensive. One of the virtues of the long tail of customer-written information is that it answers questions that the company then does not have to pay money to answer. This is actually a reason that companies should embrace the long tail, and should welcome it when customers turn first to Google for help on a product question. Any time you don’t have to pay someone to answer the phone, it’s a win.

