In a recent blog post, Tom Johnson writes on The Importance of Chunking for Sorting. He also acknowledges that finely chunking content can cause problems when you try to retrieve that content with a query:
[I]f you pull together all topics that have specific metadata, such as all topics related to scheduling events, you may get an unordered collage of topics. The order of the topics may not reflect any kind of sequenced or arranged reading. The list of topics no longer forms a larger, well-written chapter that contextualizes each topic, but rather may seem like little scattered objects here and there.
The problem he is describing is really caused by the fact that all those finely chunked pieces no longer merit the metadata that is attached to them, an issue I recently discussed here. To understand the problem better, consider what would happen if you took apart a piece of machinery such as an old-fashioned alarm clock.
We can attach several pieces of useful metadata to the clock itself. It is a device for telling time. It is a device for waking you up. It is (possibly) an item of home decor. It is (potentially) a movie prop for a period piece. There are a number of useful properties that you could assign to it that would help you find it when you needed something to fulfill these various functions.
Now start taking it apart. First you will disconnect various assemblies: the case, the clock mechanism, the ringer. Some of these, at least, could still have interesting metadata attached to them. But you continue, taking each of these assemblies apart until what you are left with is a collection of screws, gears, and several pieces of bent metal about which you can say nothing meaningful other than that they used to be part of an alarm clock.
The screws and gears could have metadata attached to them. As individual pieces they have definite characteristics, such as thread count, circumference, or number and spacing of teeth. None of this was metadata for the alarm clock, and none of the metadata that applied to the alarm clock properly belongs to the individual gears and screws. They are potentially reusable for all sorts of other projects, none of which have anything to do with the alarm clock. Putting alarm clock metadata on the gears and screws would only obscure their proper screw and gear metadata, potentially hindering their reuse for other purposes.
Suppose you do attach alarm clock metadata to these parts. Then, if you query that metadata, what you will get back is not an alarm clock but a pile of gears, screws, and bits of bent metal. That pile is not an alarm clock. It does not serve any of the purposes of the alarm clock, and it does not merit alarm clock metadata. Such metadata is not accurate or reliable. It is not useful to anyone but the person who applied it. Specifically, it is not useful to anyone who needs an alarm clock.
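To make the thought experiment concrete, here is a minimal Python sketch of that query. The Item class, the query function, and the tag strings are all hypothetical, invented for this illustration; they are not drawn from any particular content management system or metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A thing in the collection plus the metadata tags attached to it (hypothetical model)."""
    name: str
    tags: set = field(default_factory=set)


def query(items, tag):
    """Return the names of all items carrying the given tag, in stored order."""
    return [item.name for item in items if tag in item.tags]


# The intact clock: the tag describes something the item actually does.
whole = [Item("alarm clock", {"tells time", "wakes you up"})]

# The disassembled clock: the same tag copied onto every part.
parts = [
    Item("screw", {"wakes you up"}),
    Item("gear", {"wakes you up"}),
    Item("bent metal strip", {"wakes you up"}),
]

print(query(whole, "wakes you up"))  # a single useful unit
print(query(parts, "wakes you up"))  # a pile of parts, in no meaningful order
```

Both queries match on the same tag, but only the first returns something that can actually wake you up. The second returns every part that was labeled, and nothing in the result tells you how to put them back together.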
The point is this: useful metadata can only be applied to useful things. Useful content metadata can only be applied to useful units of content. Break content down into chunks that are smaller than is useful to a reader, and you cannot attach metadata to it that will be useful to the reader.