8 thoughts on “The Meaning of Metadata”

  1. Tom Johnson

    Mark, thanks for writing this post. You take metadata to an entirely new level here and expanded my understanding. There’s so much to learn regarding this topic. I’m wondering if there’s a standard metadata approach for tech comm (is it simply DITA?). Also, do you have any recommended reading on metadata?

  2. Mark Baker (post author)

    Hi Tom,

    There is a fundamental problem with the notion of standard metadata. Metadata is used for many different purposes, and you need different types of metadata, and different granularities of metadata, for different purposes. Metadata is expensive to collect, so people are not apt to collect more metadata than they need for their own immediate purposes.

    Consider that there is not even an agreed metadata standard for collecting such a simple piece of data as an address. Some forms ask you to break your address down into multiple fields (though often not the same fields). Others just give you a single box that says “address”. Why the difference? Some collectors of address information want to pre-sort bulk mail so they can get a better rate for their mailing. Some want to gather data on the geographic distribution of their customers. Others just want to print a few mailing labels.

    Collecting and validating line-by-line address information is more expensive than just getting a single “address” field. So people collect addresses not in a standard way, but in the way that is optimal for the use they want to make of them.
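    A minimal Python sketch of the two granularities, assuming hypothetical record types (`LabelAddress`, `SortableAddress`) and a hypothetical bulk-mail sort order. The extra fields are what make presorting possible; a single free-text field cannot be sorted by postal code without first being parsed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LabelAddress:
    """Hypothetical record for a collector who only prints mailing labels."""
    address: str  # one free-text field is all this use needs


@dataclass
class SortableAddress:
    """Hypothetical record for a collector who pre-sorts bulk mail."""
    street: str
    city: str
    region: str
    postal_code: str
    country: str


def presort(addresses: list[SortableAddress]) -> list[SortableAddress]:
    """Order addresses by country, then postal code (an assumed sort order)."""
    return sorted(addresses, key=lambda a: (a.country, a.postal_code))


def to_label(a: SortableAddress) -> LabelAddress:
    """Flatten structured fields into one label; the reverse would need parsing."""
    return LabelAddress(f"{a.street}\n{a.city}, {a.region} {a.postal_code}\n{a.country}")


addresses = [
    SortableAddress("1 Main St", "Springfield", "IL", "62704", "US"),
    SortableAddress("9 Elm Rd", "Albany", "NY", "12207", "US"),
]
for a in presort(addresses):
    print(to_label(a).address)
```

    Going from the structured record to the label is trivial, which is why the richer form costs more to collect but serves more uses.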

    The same is true of something as simple as names. Some organizations need to break names down into multiple fields: first, last, middle, salutation. For international applications, several other pieces of name metadata may need to be collected also. But some organizations just want a name to print on a badge.

    Standards work well when everybody has the same needs. But if the standard is inadequate for one purpose and burdensome for another, it won’t get used, or it won’t get used properly. That certainly seemed to be the case with DocBook, for instance: no one used the whole standard, which was far too large for writers to learn, and many people added things to it to meet needs it did not cover. The result was that there was no guarantee that I could take your DocBook file, process it through my DocBook tool chain, and get a usable result.

    Suppose you were using metadata to define the facets of a faceted search. Naturally, you want metadata that expresses the facets of your content that people might want to search on. The metadata that enables faceted search of a used car site probably isn’t going to work too well for a shoe store or an electronics retailer.
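    A minimal sketch of how item metadata drives faceted search, assuming a hypothetical used-car catalog whose facet names (`make`, `body`, `transmission`) are invented for illustration. The counting and filtering code is generic; the metadata on each item is not.

```python
from collections import Counter

# Hypothetical used-car catalog; the facet names are assumptions.
CARS = [
    {"id": 1, "make": "Toyota", "body": "sedan", "transmission": "automatic"},
    {"id": 2, "make": "Ford", "body": "truck", "transmission": "manual"},
    {"id": 3, "make": "Toyota", "body": "truck", "transmission": "automatic"},
]


def facet_counts(items, facets):
    """For each facet, count how many items carry each value."""
    return {f: Counter(item[f] for item in items if f in item) for f in facets}


def refine(items, **selected):
    """Keep only items whose metadata matches every selected facet value."""
    return [it for it in items if all(it.get(k) == v for k, v in selected.items())]


print(facet_counts(CARS, ["make", "body"]))
toyotas = refine(CARS, make="Toyota")
print(facet_counts(toyotas, ["body"]))  # counts shown next to each remaining value
```

    A shoe store would call the same functions with facets such as size and width, and would need that metadata captured on every item, which is the part no generic standard supplies.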

    It might seem reasonable for all the used car dealers, shoe stores, and electronics retailers to get together in their respective industry associations and come up with standard used car, shoe, or electronics metadata.

    But if content strategy is king, as we are now being told, using standard metadata has two big disadvantages. First, industry associations take so long to set standards that by the time they agree on used car metadata we may all be traveling by helium-powered bicycles. Second, your competitors have exactly the same metadata you do.

    If my content strategy is my competitive advantage, I can’t afford to wait for an industry association to provide the metadata definition that is going to drive my content strategy, and the last thing I want to do is to share that strategy with my competitors. I want to be there first, and I want to be always one step ahead of them.

    Getting your metadata strategy right is the key to productivity and quality in creating, managing, and delivering content. Adopting the same standards as your competitors is probably not the best way to achieve and maintain your competitive edge.

    As for DITA, I don’t know that I would call it a metadata standard exactly. I think of it more as a neutering of XML. XML itself provides you with limitless capacity to define metadata schemas and to capture and encode metadata. It also demands that you then write code to process that metadata for all the purposes you want it for.

    DITA, through its specialization mechanism, says, if you are willing to give up much of the flexibility of XML and just use specialization of our base topic types, we will reduce the amount of code you have to write (at least out of the box). It is, if you like, XML with training wheels. It limits your speed and maneuverability, but it keeps you from skinning your knees while you are a beginner.
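    A much-simplified sketch of the fallback that specialization makes possible. DITA elements carry a `class` attribute listing their ancestry (for example, `- topic/li task/step ` for a task step), so a processor that does not recognize a specialized element can handle it as its base type. This is not how the DITA Open Toolkit is implemented; the handler table, the HTML output, and the `mytopic/partno` specialization are hypothetical.

```python
import xml.etree.ElementTree as ET

# Handlers the (hypothetical) processor knows, keyed by "module/element" tokens.
HANDLERS = {
    "topic/ol": lambda inner: f"<ol>{inner}</ol>",
    "topic/li": lambda inner: f"<li>{inner}</li>",
    "topic/ph": lambda inner: f"<span>{inner}</span>",
    "task/cmd": lambda inner: f"<b>{inner}</b>",  # a more specific override
}


def handler_for(el):
    """Pick a handler using the element's DITA class attribute.

    After a leading "-" or "+" token, the class value lists ancestry from
    general to specific, so try the most specific token first and fall back
    toward the base topic type.
    """
    tokens = el.get("class", "").split()[1:]
    for token in reversed(tokens):
        if token in HANDLERS:
            return HANDLERS[token]
    return lambda inner: inner  # no match at all: pass the content through


def render(el):
    """Render an element and its children recursively."""
    inner = (el.text or "") + "".join(
        render(child) + (child.tail or "") for child in el
    )
    return handler_for(el)(inner)


doc = ET.fromstring(
    '<steps class="- topic/ol task/steps ">'
    '<step class="- topic/li task/step ">'
    '<cmd class="- topic/ph task/cmd ">Tighten '
    '<partno class="- topic/ph mytopic/partno ">B-12</partno></cmd>'
    "</step></steps>"
)
print(render(doc))  # <ol><li><b>Tighten <span>B-12</span></b></li></ol>
```

    The processor has never heard of `partno`, yet renders it sensibly as a phrase. That guaranteed fallback is the code DITA saves you from writing, in exchange for staying within its base types.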

    The notion of content as data is new to most technical writers, and so they tend to think of metadata as something entirely separate from content. I have a blob of content and I attach a blob of metadata to it. But the real genius of XML is that it allows you to integrate data and metadata in a single object, and to apply metadata not just to the whole object, but to every level of it.
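    A minimal sketch of metadata applied at every level of a single XML object rather than attached as a separate blob: a product attribute sits on the root, an audience attribute on a section, and a platform attribute on a single phrase, and one filtering function honors all of them. The attribute names echo DITA's conditional-processing attributes, but the document and the `profile` function are hypothetical, and this is not DITA's DITAVAL mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with metadata at three levels: root, section, phrase.
DOC = """<manual product="widget">
  <section audience="admin"><title>Install</title>
    <p>Run the installer<ph platform="linux"> with sudo</ph>.</p>
  </section>
  <section><title>Use</title><p>Click Start.</p></section>
</manual>"""


def profile(el, **exclude):
    """Drop every descendant whose attributes match an excluded value."""
    prev = None  # the most recent child that was kept
    for child in list(el):
        if any(child.get(attr) == value for attr, value in exclude.items()):
            if child.tail:  # keep any text that followed the removed element
                if prev is not None:
                    prev.tail = (prev.tail or "") + child.tail
                else:
                    el.text = (el.text or "") + child.tail
            el.remove(child)
        else:
            profile(child, **exclude)  # metadata may appear at any depth
            prev = child
    return el


# Readers not on Linux: the phrase-level metadata applies.
no_linux = profile(ET.fromstring(DOC), platform="linux")
print("".join(no_linux.find("section/p").itertext()))  # Run the installer.

# End users: the section-level metadata removes the admin section.
end_user = profile(ET.fromstring(DOC), audience="admin")
print([t.text for t in end_user.iter("title")])  # ['Use']
```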

  3. Marcia Johnston

    Great reminder, Mark, of the larger sense of the word “metadata” (headers and footers, for example). Thanks for the clear analysis.

  4. Pingback: metadata | Early Novels Database

  5. Paul K. Sholar

    XML enables serialized expression of inherently hierarchical datasets. There are lots of datasets (that is, those that are not inherently hierarchical) that aren’t good candidates for being expressed using XML. (An aside: back in the days when SGML was the state of the art in markup languages, I remember reading about the controversies among SGML users in the humanities over the inadequacy of SGML for marking up the content of a physical manuscript that has handwritten marginalia.)
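    A minimal Python sketch of the overlap problem Paul describes, using invented sample text: when a note spans the end of one line and the start of the next, the two hierarchies cannot be nested, and the markup is not well-formed XML. The last lines show one commonly discussed workaround, standoff markup, where the note is recorded as character offsets into the text instead of as an element.

```python
import xml.etree.ElementTree as ET

# Two structures that nest cleanly: lines inside a page.
nested = "<page><line>the quick brown fox</line><line>jumps over the dog</line></page>"

# A marginal note that starts partway through one line and ends in the next.
# The note and line hierarchies overlap, so the tags cannot nest properly.
overlapping = (
    "<page><line>the quick <note>brown fox</line>"
    "<line>jumps</note> over the dog</line></page>"
)


def is_well_formed(text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False

# Standoff markup: keep one hierarchy (the lines) in the text and record the
# overlapping note separately as character offsets.
text = "the quick brown fox\njumps over the dog"
note = {"start": text.index("brown"), "end": text.index(" over")}
print(repr(text[note["start"]:note["end"]]))  # 'brown fox\njumps'
```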

    Regarding the meaning of “metadata,” I fear you aren’t quite making the right point, especially when you immediately veer into discussing XML-style markup.

    Yes, metadata is “data about data,” but when is it useful to examine *any* data without also having additional information that describes what that data is intended to describe or represent? For example, in any conventional two-dimensional table, the column headings (and/or row headings) are metadata because they tell a human being how to interpret a given list of data items.
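    A minimal sketch of column headings as metadata, using Python's standard `csv` module and hypothetical measurement data. Read without its header, a row is just `['A', '3.2', '41']`; read with the header, each value is labeled with what it means (and here, its unit).

```python
import csv
import io

# Hypothetical measurements; the first row is the metadata.
RAW = "sample,length_cm,mass_g\nA,3.2,41\nB,2.9,38\n"

# Without the headings, a row is just values whose meaning must be guessed.
bare_values = list(csv.reader(io.StringIO(RAW)))[1:]
print(bare_values[0])  # ['A', '3.2', '41']

# With the headings, every value is labeled with what it represents.
records = list(csv.DictReader(io.StringIO(RAW)))
print(records[0]["length_cm"])  # 3.2

# The labels let code (and people) pick the right column without guessing.
mean_mass = sum(float(r["mass_g"]) for r in records) / len(records)
print(mean_mass)  # 39.5
```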

    My POV is that metadata is what provides the *context* for a set of data. That context can be simple, like a tag, word, or phrase placed next to other text, or it can be complex, such as a statement (formal or not) of all the relevant conditions under which a set of data (such as physical measurements) was produced.

  6. Pingback: What is XML Really About? : The Dynamic Publisher

  7. Pingback: Qu’est-ce réellement que le XML ? (“What really is XML?”) : The Dynamic Publisher

  8. Pingback: Worum geht es bei XML wirklich? (“What is XML really about?”) : The Dynamic Publisher
