A Reference is Not a Topic

By Mark Baker | 2012/08/14

Continuing my reconsideration of concept, task, and reference as cardinal topic types, this post is about reference. I planned to call it “A Reference is Not a Table”, as I promised in The Tyranny of the Terrible Troika, but thinking more about it I realized that the issue is really much broader than that. The real issue is that a reference is not a topic at all; it’s a database.

To be sure, there are some types of references that look very much like a collection of topics. An API reference, for instance, consists largely of a set of pages, each of which describes one routine in an API. Each routine page follows the same pattern: a number of standard fields like return type, arguments, etc., and then a general description of the routine, usually with examples of usage. It is quite likely that a reader will read through the page, at least the first time they use the routine, so the individual reference entry pages do indeed read like topics.

But those topics are really just one way of presenting the reference data. There are other ways that people might want to look stuff up. With the Web becoming the default medium for more and more information, we are not confined, and should not confine our readers, to paper-based ways of looking up information. Compare the experience of looking up products in a paper catalog with looking them up in an online store such as Amazon.


The Baedeker was a classic reference work, but on paper, reference is confined to a single static presentation. References are really databases, and today we should treat them as such. Photo: Wikimedia Commons.

A paper catalog always organizes its wares in one particular way. Perhaps it is by genre, or by author, or by title, or by format, or by ISBN, or by sales rank, but it is always one of these. If you want to access the content along another axis (all the large-print books in a catalog organized by genre, for instance), the best hope you have is that the catalog provides an index along the axis of interest to you. Even if it does, you have no way to collect all the entries together, so you end up with a catalog bristling with bookmarks or with your fingers awkwardly stuck in several pages.

And no paper catalog can do what Amazon does routinely, which is correlate its catalog with your previous purchases, and the purchases of people whose buying habits overlap yours, and give you a customized list of titles that may be of interest to you.

We know that how content is organized and presented plays a big role in how easily the reader can access, consume, understand, and act on the content. But users have different needs at different times, and different users have different needs altogether, and no one arrangement of content can be optimal for all people at all times. A database has the capacity to organize and present content in a way that suits the present moment of the present reader. Paper can only present one generalized compromise organization for everyone. Now that we are creating content electronically and delivering it online, there is no reason (and no excuse) to continue to bind ourselves to the limits of paper.

The term “database” is often associated specifically with relational databases and relational database management systems, but these are only one type of database. A database is any collection of data that is structured so that it can be queried reliably. While you could certainly use a relational database to structure and store your reference data, a suitably structured XML file, or set of files, can also be a database that you can query with XPath or XQuery.

From a structured writing point of view, then, you should not be modeling your reference content as a series of topics according to a single predefined static organization scheme. You should be modeling it as a database from which any number of useful presentations can be generated, either dynamically in response to user input, or to publish reference material in multiple ways.

Creating reference content as a database will greatly enhance your options for reuse. For instance, while a printed API reference is typically organized by library and by routine, it can be very useful for readers to be able to get a listing of all the routines that take or return a particular data structure. That list may include routines from more than one library, and only some of the routines in any one library, so if the reader wants to know what routines are available to process such structures, a listing like this is very useful. It is so useful, in fact, that it is not uncommon for programming guides to include such lists in the text.

But in a doc set written in books, and equally in one written in topics, assembling such lists generally has to be done by hand, and the lists have to be maintained by hand when things change. If the API reference data were maintained as a database, those lists could be assembled on the fly by the doc build scripts. An XPath query for this might look like:

/api/library/routine[argument/type = $type or return-type = $type]

This is pretty simple, and the individual writer would not even need to write this. Rather, the topic type definition for a programming task topic could include an empty element that caused this query to be run when the topic is built, something like:

<api-list-for-type type="user-identity-record"/>

Or, the processing script could be coded to run the query and add the list for all programming task topics that mention this type in a structured way (such as in topic index markup or markup in the text). In this case, you would get significant, useful reuse of content with zero effort on the part of the author of the task topic. And, of course, the links from the listed routines to their entries in the API reference would be generated automatically. None of this is in the least difficult to implement. In fact, this is trivial functionality that could be implemented quickly, provided the source data is structured to support it.
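To make the build step concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes a hypothetical XML layout that fits the XPath query above; the `task-topic`, `routine-list`, and `routine-link` elements and the `href` scheme are invented for illustration. Python's standard `xml.etree.ElementTree` supports only a subset of XPath, without variables or `or`, so the filter is written out in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical API reference data, shaped to fit the XPath query in the post.
API_XML = """
<api>
  <library name="accounts">
    <routine name="get_user">
      <argument><type>int</type></argument>
      <return-type>user-identity-record</return-type>
    </routine>
    <routine name="count_users">
      <return-type>int</return-type>
    </routine>
  </library>
  <library name="audit">
    <routine name="log_access">
      <argument><type>user-identity-record</type></argument>
      <return-type>bool</return-type>
    </routine>
  </library>
</api>
"""

# Hypothetical task topic containing the empty placeholder element.
TOPIC_XML = """
<task-topic>
  <title>Processing user records</title>
  <api-list-for-type type="user-identity-record"/>
</task-topic>
"""


def routines_for_type(api, type_name):
    """Return routines that take or return type_name, like the XPath query."""
    found = []
    for routine in api.findall("library/routine"):
        types = [t.text for t in routine.findall("argument/type")]
        types.append(routine.findtext("return-type"))
        if type_name in types:
            found.append(routine)
    return found


def expand_api_lists(topic, api):
    """Replace each api-list-for-type placeholder with a generated link list."""
    # Snapshot the elements first so the tree can be edited while looping.
    for parent in list(topic.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "api-list-for-type":
                continue
            listing = ET.Element("routine-list")
            for routine in routines_for_type(api, child.get("type")):
                link = ET.SubElement(listing, "routine-link")
                # The href scheme is invented for this sketch.
                link.set("href", "reference.html#" + routine.get("name"))
                link.text = routine.get("name")
            listing.tail = child.tail
            parent[index] = listing
    return topic


topic = expand_api_lists(ET.fromstring(TOPIC_XML), ET.fromstring(API_XML))
print(ET.tostring(topic, encoding="unicode"))
```

The author of the task topic writes only the placeholder element; the list and its links come from the reference data each time the topic is built.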

The limits of automation are never in what can be coded; they are always in how the data is structured. If data is structured to support a form of automation, the implementation is generally easy; if it is not structured to support it, the implementation will be difficult or impossible. This is why the data structures underlying information typing matter.

Another advantage of treating reference content as a database is that you can merge authored content with information extracted from source code. One of the greatest challenges in maintaining reference material is keeping it in sync with the product development. In some cases API reference material can be generated from comments in source code, but there are many other reference types, from configuration settings to error codes, where the source code contains only partial information that must be supplemented with authored content.

If you can extract partial information from the source code (and source code is a form of structured text and can be processed to extract data) you can extract half the reference from the code and combine it with additional material written in XML, thus halving the work and giving yourself a way of monitoring every change in the source that requires a corresponding change in content. Again, structuring reference material as a database rather than as topics opens up possibilities for improved efficiency, accuracy, and reuse.
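As a rough sketch of that merge, assume the source defines error codes with C-style `#define` lines and the writers keep their supplementary text in a small XML file. Both inputs, the `error-docs` layout, and the function name are hypothetical. The extracted half and the authored half are joined on the error name, and any mismatch between them is reported.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical source fragment; in practice this would be read from the code base.
SOURCE_CODE = """
#define ERR_TIMEOUT 101
#define ERR_NO_ACCESS 102
#define ERR_DISK_FULL 103
"""

# Hypothetical authored supplement, keyed by the same error names.
AUTHORED_XML = """
<error-docs>
  <error name="ERR_TIMEOUT"><remedy>Retry the request.</remedy></error>
  <error name="ERR_NO_ACCESS"><remedy>Check the user's permissions.</remedy></error>
</error-docs>
"""


def build_error_reference(source, authored_xml):
    """Merge error codes extracted from source with authored remedies."""
    # Extracted half: error name -> numeric code, in source order.
    extracted = dict(re.findall(r"#define\s+(ERR_\w+)\s+(\d+)", source))
    # Authored half: error name -> remedy text.
    authored = {
        error.get("name"): error.findtext("remedy")
        for error in ET.fromstring(authored_xml).findall("error")
    }
    merged = [
        {"name": name, "code": int(code), "remedy": authored.get(name)}
        for name, code in extracted.items()
    ]
    # Codes in the source with no authored text, and authored text whose
    # code no longer exists: both signal that the docs need attention.
    undocumented = [name for name in extracted if name not in authored]
    orphaned = [name for name in authored if name not in extracted]
    return merged, undocumented, orphaned


merged, undocumented, orphaned = build_error_reference(SOURCE_CODE, AUTHORED_XML)
print(undocumented, orphaned)
```

Run as part of the build, the `undocumented` and `orphaned` lists serve as the change monitor: a new code in the source shows up as undocumented until someone writes its entry.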

I am straining the bounds of a blog post now, but there are other benefits that deserve mentioning. Creating references as databases rather than topics facilitates soft linking, so that every mention in a topic of something for which there is a reference entry can automatically become a link to that reference entry.
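A simplified sketch of the idea follows. It assumes a hypothetical index mapping entry names to URLs, and it matches plain text, which is cruder than working from mentions that are marked up in a structured way. The HTML output format is likewise an assumption.

```python
import re

# Hypothetical index of reference entries: name -> URL of the entry.
REFERENCE_ENTRIES = {
    "get_user": "api/accounts/get_user.html",
    "log_access": "api/audit/log_access.html",
}


def soft_link(text, entries):
    """Wrap each whole-word mention of a known entry name in a link."""
    if not entries:
        return text
    # Longest names first, so a longer name wins over a shorter prefix of it.
    names = sorted(entries, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def to_link(match):
        name = match.group(1)
        return '<a href="{}">{}</a>'.format(entries[name], name)

    return pattern.sub(to_link, text)


linked = soft_link("Call get_user before log_access.", REFERENCE_ENTRIES)
print(linked)
```

Because the links are computed from the index at build time, adding a reference entry makes every existing mention of it a link without anyone editing the topics.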

When a reference is structured as a database, you can choose which fields to show in a generated reference topic, meaning you can store information relevant to multiple products in one database and generate different references for different products just by varying the query terms used to pull information into the published reference — another big reuse win.
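For instance, a configuration reference shared by two products might be filtered as in the sketch below. The `settings` layout, the `products` attribute, and the field names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared settings reference covering two products.
SETTINGS_XML = """
<settings>
  <setting name="cache_size" products="basic pro">
    <default>64</default>
    <description>Cache size in megabytes.</description>
    <internal-note>Tune after load testing.</internal-note>
  </setting>
  <setting name="cluster_mode" products="pro">
    <default>off</default>
    <description>Enables clustering.</description>
    <internal-note>Pro only.</internal-note>
  </setting>
</settings>
"""


def reference_for(root, product, fields):
    """Select the settings for one product, keeping only the chosen fields."""
    rows = []
    for setting in root.findall("setting"):
        if product not in setting.get("products", "").split():
            continue
        row = {"name": setting.get("name")}
        for field in fields:
            row[field] = setting.findtext(field)
        rows.append(row)
    return rows


root = ET.fromstring(SETTINGS_XML)
basic_reference = reference_for(root, "basic", ["default", "description"])
pro_reference = reference_for(root, "pro", ["default", "description"])
print(basic_reference)
```

The internal notes stay in the source but never reach either published reference, because the caller simply does not ask for that field.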

When a reference is structured as a database you can run queries that let you look at the data in different ways and thus audit it for completeness, consistency, and accuracy.
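An audit of this kind can be a few lines of script. The sketch below assumes a hypothetical `api/library/routine` layout with an optional `description` child, and checks two invented rules: every routine has a description, and no routine name is used twice.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical API reference with two deliberate problems.
API_XML = """
<api>
  <library name="accounts">
    <routine name="get_user">
      <description>Fetches one user record.</description>
    </routine>
    <routine name="count_users">
      <description></description>
    </routine>
  </library>
  <library name="audit">
    <routine name="get_user"/>
  </library>
</api>
"""


def audit(root):
    """Report routines lacking a description and routine names used twice."""
    missing = []
    names = Counter()
    for library in root.findall("library"):
        for routine in library.findall("routine"):
            names[routine.get("name")] += 1
            # Treat an absent, empty, or whitespace-only description as missing.
            if not (routine.findtext("description") or "").strip():
                missing.append(library.get("name") + "/" + routine.get("name"))
    duplicates = sorted(name for name, count in names.items() if count > 1)
    return {"missing_description": missing, "duplicate_names": duplicates}


report = audit(ET.fromstring(API_XML))
print(report)
```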

Given the importance of the web for technical communications today, there is no reason to keep creating reference content in a form designed only for publishing. It should be created in a form that supports interactive access. Even if you are not planning to provide such access today, you will surely be asked to provide it in the future, so you might as well start getting ready now. And, as a side benefit, you will find that structuring reference content as a database provides immediate benefits for your publishing process, particularly in the areas of accuracy, consistency, timeliness, and reuse.

To be absolutely clear, my point is not that a database is a nifty way to create a reference. The purpose of a reference is to allow people to look stuff up, and a database gives you more options than a paper layout for looking stuff up. A reference is a database by nature. A paper layout of reference information is an adaptation of the database to the limits of paper, both as a method of composition and a method of presentation. But we are no longer bound by the limits of paper, either for composition or presentation. Reference is not a topic; it’s a database. It’s time to start treating it that way.






2 thoughts on “A Reference is Not a Topic”

  1. Techquestioner

    This is an eloquent explanation of the use, structure, and interactive evolution of reference information. Thank you. To apply it, I have a lot to learn about database tools and operation.

    1. Mark Baker Post author

      Hi Margaret,

      Thanks for the comment. When looking at database tools, don’t forget that an XML document or a set of XML documents can work as a database. Relational databases are not always the best or only tools for creating content as a database.

