PDF in a Bottom-up Information Architecture

By | 2015/05/15

This is another in a series responding to questions from my TC Dojo series on Bottom-up Information Architecture.

Q: We are still frequently requested to deliver PDFs. What is the impact of this new way of writing when the deliverable also needs to be PDF?

A: One of the things we have discovered about documentation preferences is that what people ask for and what they use are often different things. So the fact that you are still being asked for PDFs does not necessarily mean people are still using them. On the other hand, they may be, if they have special needs or you are not providing a better alternative.

JoAnn Hackos reported that one of her customers did a study in which they moved their content to a linked topic-based architecture. They found that users still asked for PDFs, but when they looked at the metrics, they found most people were actually using the online topics, not the PDF.

I think there are several possible explanations for this:

  • People may not be clear on what you are asking. If they hear “documentation” as “manual”, they may think their choices are “PDF” and “paper”. In this case, most people will opt for PDF. But people today are often less interested in whether a product has a manual than in whether it is “Googleable”, that is, whether a Google search turns up good information about it. PDFs don’t tend to make for Googleable content, but you asked about a manual, right?
  • If you ask “PDF or HTML” they may think you mean handing them a CD with static HTML pages on it — which means HTML with no search engine and probably little or no linking. If that is what they think you mean, they probably want PDF so they can search it. (Every time I have asked a customer why they wanted PDF, the response has been “so I can search it”. That tells me they assume all the other options are harder to search, which may be true if your content is not where Google can see it.)
  • If your content is organized like a book, it may be easier to use when presented as a book. This may be true even if people are trying to use it bottom-up. Content whose architecture is all top-down is very hard to use when presented as a collection of fragments.
  • If you are creating highly fragmented content (as is often the case when writers take DITA’s task/concept/reference structure too much to heart), it frustrates readers who want to scroll for related content. A PDF is scrollable. But if you create topics of a useful length, they will usually support any search-by-scrolling that the user wants to do.

In short, you may be asked for PDF either because that is what users are used to asking for, or because what you are delivering as an alternative is hard to search and scroll. Presenting content with a well-thought-out bottom-up architecture will usually remove the desire for PDF, though the habit of asking for it may continue for some time.

That said, if your customers really do want PDFs, what does that mean?

  • Is it a way of saying that they really want a book — a linear narrative to read from beginning to end? That’s unlikely. John Carroll found that even people who think they read like this really don’t, but sometimes you have to give the customer what they think they want, not what they will really use. On the other hand, many useful books are actually collections of independent articles. You can certainly collect EPPO topics together into a collection like this and issue it as a PDF. This gives them what they think they want and what they will actually use. (But if you have verified that they really really want a single linear narrative, write that.)
  • Is it a way of saying that they want content that they can use offline because they do not have connectivity in their work location? In this case, chances are they don’t want to travel with a truckload of manuals. They want the docs they need for the work they have to do today. A well-planned EPPO topic set can let them print just the topics they need for today. You can either provide a PDF of each topic, or a PDF of a related collection of topics from which they can print an appropriate selection.
  • Is it in order to meet the requirements of some regulation or process that they do not have the ability to change? If so, create a PDF by creating a collection of EPPO topics in a reasonable order. Yes, this may introduce some redundancies in the context-setting material for each topic, but in this case redundancy is good, as people are not likely to actually read the entire PDF. Also remember that the PDF format does support linking, so you can still richly link your topics to each other.

So, if your customers are asking for PDF, you can certainly meet that request within a Bottom-up Information Architecture and an Every Page is Page One information design. However, it is important to try to establish why they are asking for PDF. Is it just an old habit, a misunderstanding of the question or the options, or is it a legitimate need that should be met in a specific way?

The world of content is in a time of transition. Neither readers nor writers nor regulators nor managers have fully adjusted to the changes that the Web has brought to how information is located and consumed. As such, both the book model and book-model formats such as PDF may be with us for a while yet.

For a long time now, however, we have been producing content on the book model, using book-model tools, and then doing a best-effort transformation to Web output. It is time (and past time) to reverse this priority. Even if we still have to produce book-like outputs as well as Web-like outputs, it is time to put the Web first in our design decisions and our tool selections.

Category: Content Strategy Technical Communication

About Mark Baker

I am an aspiring novelist and former technical writer and content strategist. On the technical side, I am the author of Every Page is Page One: Topic-based Writing for Technical Communication and the Web and Structured Writing: Rhetoric and Process. I blog at everypageispageone.com and tweet as @mbakeranalecta.

5 thoughts on “PDF in a Bottom-up Information Architecture”

  1. John Russell

    What exactly is the methodology for determining that “most” people used the online info? A person who downloads the PDF disappears from the radar of the web site logs after a single download event, but they might read the whole thing cover to cover, refer to it many times in future, and become an expert without ever coming back to the web site. Whereas if the content is split up into many too-small topics, people might view many different pages as they browse around looking for info and having an unsatisfactory experience, or getting answers one at a time (one per page). That doesn’t mean that the readers are expressing a strong preference for the online format, or that the value of the online content is correlated with the number of page views. (Combine the HTML files into fewer, larger pages and you can expect page views to decline. Which might be preferable from the reader’s perspective.)

    1. Mark Baker Post author

      Thanks for the comment, John.

      I don’t know the methodology used in the study I mentioned, but you raise a fair point. There are methodological problems in studying unlike behaviors, especially when one leaves more of a footprint than another.

      On the other hand, web analytics can give us a lot more than just page views. We can track visits and the number of pages that each person viewed during a visit. So making topics bigger should not in itself throw off the number. (But I am with you that readers would generally be better off with larger topics than they are often given today.)
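      To illustrate the difference between page views and visits, here is a minimal sketch in Python. The log records, page names, and the `merge_topic` rule are all hypothetical, invented for this example; they do not come from the study discussed above or from any particular analytics tool.

```python
from collections import defaultdict

# Hypothetical page-view log: (visit_id, page) pairs. Invented data.
page_views = [
    ("visit-1", "install-overview"),
    ("visit-1", "install-linux"),
    ("visit-1", "install-troubleshooting"),
    ("visit-2", "install-linux"),
    ("visit-3", "configure-proxy"),
    ("visit-3", "configure-tls"),
]


def summarize(views):
    """Return (distinct pages viewed across visits, visits, mean pages per visit)."""
    pages_by_visit = defaultdict(set)
    for visit_id, page in views:
        pages_by_visit[visit_id].add(page)
    visits = len(pages_by_visit)
    total_pages = sum(len(pages) for pages in pages_by_visit.values())
    pages_per_visit = total_pages / visits if visits else 0.0
    return total_pages, visits, pages_per_visit


def merge_topic(page):
    """Hypothetical merge rule: fold small topics into one page per prefix."""
    return page.split("-")[0]


# The same reading behavior, replayed against fewer, larger pages.
merged_views = [(visit_id, merge_topic(page)) for visit_id, page in page_views]

print(summarize(page_views))    # many small topics
print(summarize(merged_views))  # fewer, larger topics
```

      In this made-up data, merging the topics halves the number of pages viewed, but the number of visits does not change. That is why visits, rather than raw page views, are the safer basis for comparing formats.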

      Directly testing the proposition that people are downloading your PDF, reading it thoroughly, and referring to it often may be difficult. (Though basing your content strategy on an assumption about behavior that you cannot test is hardly ideal.) But we can ask if the scenario you propose is consistent with other things we know about user behavior.

      What we do know, from all kinds of studies, is that information foraging and information snacking are increasingly dominant user behaviors. People do not store information themselves because they can always get it online. They are less inclined to study up on a subject before attempting a task because they can always find help the moment they get stuck.

      I ask people about their information seeking habits all the time, and what I hear, almost invariably, is “Oh, when I have a problem, I just Google it.” I have literally never heard anyone say that they create a cache of PDFs to read later. (Caching, as a behavior, only makes sense in anticipation of scarcity, and information is never scarce today.)

      This is not to say that no one ever studies a book anymore. There are strategic tasks, in particular, where long thought supported by long reading may be necessary. But overall, we now live in an info snacking world, and while you can certainly make snack-sized PDFs, they are not the most natural format for information snacking.

  2. Michael Thomas

    PDF distribution can be managed better than web access. This is useful not only when the customer might not have network access, but where the documentation is confidential.

    Creating a private HTML package would be a better long-term option, but that has access and standardization issues.

    PDF is still around because it is so common : )

    1. Mark Baker Post author

      Thanks for the comment, Michael.

      Those are indeed use cases for PDF, though lack of internet access is becoming less common all the time.

      One of the things that happens as technology progresses is that technology that was mainstream becomes relegated to niche applications over time, which is what I think is happening to PDF.

      But some of those niches may still be important to us for a long time to come.

