It is often appealing to think of technical communication as a process of answering users’ questions. The difficulty with this view is that one answer can have many questions. If you answered each of those questions individually, you would be providing substantially the same answer over and over again.
This is very easy to see on StackOverflow, a question and answer site for programmers. Privileged users of StackOverflow can mark a question as a duplicate of another question. Here’s an example:
The question here is “How to check if a variable is a dictionary in python”. This is a question that programmers are going to ask themselves many times. It is a specific instance of a more general question, which is, “how do you check if a variable is of a specific type in Python?”
The programmer may know that their question is an instance of the more general question, or they may not. This depends on how much they know about types in programming languages generally. Even if they do know about type theory, however, they may not think to generalize their question (or to do a search for the more generalized question). Generalizing a question requires mental effort and a person struggling with this problem may not have a lot of additional mental energy to spare. Stress makes us dumb.
Whether they realize it or not, the answer to this question is largely the same whether you want to find out if your variable is a dictionary, a tuple, an exception, or a number: you use either the type() function or the isinstance() function.
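For example (a minimal sketch; the variable name my_var is just an illustration), either function can answer the dictionary question directly:

```python
my_var = {"key": "value"}  # my_var is an illustrative variable name

# type() reports the exact type of the object
if type(my_var) is dict:
    print("my_var is a dictionary")

# isinstance() checks against a type, including its subclasses
if isinstance(my_var, dict):
    print("my_var is a dictionary")
```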
Since these functions work somewhat differently, you will need to decide which to use. And this means that the answer to this question is substantially similar to the answer to another question: “[What are the d]ifferences between isinstance() and type() in Python[?]”
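To illustrate the difference (a minimal sketch; collections.OrderedDict is used here purely as a convenient subclass of dict):

```python
from collections import OrderedDict

d = OrderedDict(a=1)  # OrderedDict is a subclass of dict

print(type(d) is dict)      # False: type() matches only the exact type
print(isinstance(d, dict))  # True: isinstance() also accepts subclasses
```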
And because of this, someone has marked the question about finding out if a variable is a dictionary as a duplicate of the question about the difference between isinstance() and type(). That question is shown in the screenshot below:
Now let’s be very clear about this. “How to check if a variable is a dictionary in python” and “[What are the d]ifferences between isinstance() and type() in Python” are not remotely the same question.
- The only words they have in common are “in Python”.
- One is asking how to do something; the other is asking about the difference between two features.
- You would have to know the answer to the first question to know that the second question was in any way relevant to what you are trying to do.
- A person asking the first question is not likely to think that the second question might contain the answer they are looking for. A person asking the second question is not likely to think that the first question might contain the answer they are looking for.
- No search engine is likely to identify one as the answer to the other either, though the answer to “How to check if a variable is a dictionary in python” will at least contain references to type() and isinstance(), which might give it a clue.
They are, in short, not duplicate questions at all.
They don’t even really have the same answers. Their answers contain substantially the same information, but that does not make them the same answer. A good answer relates information to the question that was asked, and to the user’s experience and vocabulary. The two questions are asked by people with different levels of knowledge, experience, and vocabulary. Their respective answers must take account of that. They are therefore not the same answer, though they may contain some of the same information.
And yet, someone has marked these questions as duplicates.
Fortunately, StackOverflow does not delete a question or its answers just because they are marked as duplicates. The question and the existing answers remain on the site where they can be found by people who are actually asking the first question and have no idea that the second one might contain the information they are looking for.
Unfortunately, StackOverflow does not let you add a new answer to a question once it has been marked as a duplicate, which is potentially a problem, since there might still be an opportunity to add a better answer to the specific question being asked.
Why do questions like these get marked as duplicates when neither the question nor the answers are actually duplicates (just the information cited in the answers)? A big part of it is surely our old friend the Curse of Knowledge. Once you know a little bit about how types work in Python, your mind goes straight to the choice between type() and isinstance() for any question related to discovering types. You know the actual answer so well you skip over it entirely to talk about the interesting differences between the alternative approaches.
The interesting thing about the curse of knowledge is just how fast it strikes. Once we learn something, we are under the curse immediately. Thus the person who asked the question about the difference between type() and isinstance() edited their question to link to another question, saying “This seems to be discussed already”. The question they linked to is “What is the best (idiomatic) way to check the type of a Python variable?”
Now, this is actually a different question again. It is asking which method is most idiomatic, which is not an issue that either of the other questions raised. A good answer to this question should address the question of idiom, which the others might not address. So it is not the same answer either, though again the answer will contain substantially the same information.
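For what it is worth, answers to that third question often recommend checking against an abstract base class rather than a concrete type. A hedged sketch of that advice (the function describe() is hypothetical, invented here for illustration):

```python
from collections.abc import Mapping

def describe(obj):  # describe() is a hypothetical helper
    # Checking against the abstract base class accepts any
    # mapping type, not just dict itself
    if isinstance(obj, Mapping):
        return "some kind of mapping"
    return "something else"

print(describe({"a": 1}))  # some kind of mapping
```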
And yet, once the person who asked the second question had received their answer, the curse of knowledge immediately took hold and they then regarded their own question as a duplicate of the third question.
The kicker here is: The third question had also been marked as a duplicate of the second one. Which, again, it is not, despite its answer containing substantially the same information. What we see, therefore, is not a hierarchy with the canonical question at the top and various deprecated variants below, but a pattern of random and often conflicting accusations of duplication, probably based on which question the particular person saw first.
Despite containing the same information in their answers, all three questions have attracted many thousands of views, and doubtless will continue to do so as other people continue to ask these three very different questions.
Despite the fact that they are actually distinct, however, marking these questions as duplicates does have a useful result. It serves to link the questions together, which is useful exactly because their answers do contain substantially the same information (despite not being the same answers). This makes a wider pool of information, more varied forms of expression, and more diverse code samples available to readers, all of which increases the chances that they will get the best solution to their problem.
It would be better, therefore, if StackOverflow made a distinction between duplicate questions and similar answers. Genuinely duplicate questions should certainly be marked as such. But providing a way to mark distinct questions with similar answers as such would go a long way towards avoiding falsely labeling questions as duplicate when they are actually very different questions with similar answers.
The curse of knowledge might interfere with people’s ability to make the distinction, but having the two categories available would help people make the distinction correctly, and might go a long way to address some of the disputes you find over whether or not questions are duplicates.
This isn’t just a problem for StackOverflow and sites like it, though. The distinction between duplicate questions and distinct questions with similar answers is one that matters to all of tech comm and directly affects our readers’ ability to find the answers they are looking for.
Whether they browse a TOC in a paper book or type a query into Google, people do not search for answers; they search for questions. Literally, they type their question into the search box and hit Enter. They search for questions because they don’t know the answer. They search for what they don’t know in terms of what they do know, because there is no other way to do it.
When we organize and categorize it is very easy (and very convenient) to forget this fundamental fact. We organize and categorize content based on our full and blinding knowledge of the subject matter and of the content. If we imagine our users asking questions, we get the questions wrong because we already know the answers. If we look at actual questions, we tend to group and to paraphrase the questions into consistent forms that we recognize, and in so doing lose all hint of the original confusion and ignorance that went into forming those questions.
The reader’s path cannot be made straight. We must not imagine that we can lead every reader directly to every answer. Rather, we must provide many paths through the wilderness, pausing over and over to reorient the reader as they work their way through the quagmire of sense making.
Spot on! My empirical data and conclusions from my research confirm your claim “one answer can have many questions”.
I’ve found that the same type of information source (for example, an image showing an exploded view of a piece of machine equipment) can be used to satisfy different information needs (such as the need to know whether the equipment has a certain spare part and the need to know where a gasket should be mounted).
This conclusion leads to an information design dilemma: even if we know what users need to know, how do we know what information is suitable to satisfy that information need?
Furthermore, many technical communicators work in parallel to product development. Thus, they must predict the information needs of users (=the questions) since the product has not yet reached any user.
So, how do such technical communicators know what information users need? I’ve discussed that there are two ways of defining the concept “information need”: the expert perspective and the individual perspective. The discussion is found here: http://excosoft.com/how-do-you-define-a-users-information-need/
For those interested in designing technical information from the perspective of satisfying individuals’ information needs, I’ve written some thoughts in an article in ISTC Communicator (available from http://excosoft.com/wp-content/uploads/2015/06/Comm1506Web_JL.pdf)
Mark, if you happen to be in Sweden on the 29th of January, I invite you to attend the public defense of my licentiate thesis. Based on my empirical findings, I put forward a discussion of information design dilemmas when considering users’ information-seeking behavior from the viewpoint of an individual’s information need.
Thanks for the comment, Jonatan. I knew I would hear from you on this one! I am glad we are on the same page.
I’m more and more thinking that we should be moving away from findability and focusing on navigability. In many ways they are the same thing, but findability tends to make us think of the problem as one of retrieval: there is a perfect piece of content that exactly meets the reader’s need, and our job is to make sure they can retrieve it directly from the repository.
This model is very appealing, but it does not work. First, there is no perfect piece of content. Second, the user does not have enough information to fully describe, and therefore accurately retrieve, that perfect piece of content even if it did exist.
The real quest for information starts in ignorance (where else could it start?) and proceeds through several pieces of content, experiments, and human interactions to slowly dawning understanding of what you actually need to do to solve your problem.
Focusing on the navigability of content (both internally, through consistent structure, and externally, through a consistent approach to linking) is a much better way to aid the reader as they struggle from ignorance to realization.
Thanks for the invite. Sorry I won’t be able to make it.
“Findability” and “Navigability” are just semiotic terms. What matters is how we define them. Your definition of navigability looks interesting. I kind of agree.
I believe that good content must at least support two fundamental aspects: 1) help the user judge the relevance of what is found and 2) help the user navigate to (discover) related content in case what was found was not relevant.
Regarding helping users to judge relevance, I did put down some thoughts here: http://excosoft.com/help-users-judge-relevance-content/.
A richly interlinked content set of EPPO pages helps users navigate to related content. This supports users’ discovery approach.
I think your term “navigability” implies that users are learning by discovering, which is a fundamental view of users in minimalism.
This leads me to suggest that users are not only learning a product by discovering it, in a sense-making way. They also discover the content environment following the same discovery-learning approach.
User: “I think I shall click this link; I wonder what happens? Oops, I got it wrong; how do I get back to try the other link?”
So instead of “navigability”, maybe “discoverability”?
I like discoverability. In fact I think we need to make a firm distinction between retrieval and discovery when we talk about findability (I’m planning to post on this soon.)
But the problem we face here is getting people to focus on the fact that readers discover a path from ignorance to understanding by moving through content, not to content. Navigability is not just about linking, it is about writing the content to support navigability through the subject matter and the content simultaneously and in concert.
Unfortunately, the word “discoverability”, like “findability”, today implies retrieval, as if the reader were doing the equivalent of looking up a number in the phone book. It completely ignores the state of ignorance and confusion in which the reader begins their quest and the difficulties that quest entails. (Here I think I am saying much the same thing you did in your article on intelligent content: http://excosoft.com/intelligent-content-good-idea-technical-communication/.)
So, in an effort to focus people’s attention on the importance of assisting the reader to move through content in their complex quest for understanding, I’m going to stick with “navigability”.
Discoverability and findability tend to imply that these are problems that can be addressed independently of and outside of the content. Navigability, I hope, implies that the problem must be addressed in the content itself.
I think your claim makes a lot of sense: “But the problem we face here is getting people to focus on the fact that readers discover a path from ignorance to understanding by moving through content, not to content”.
Content is the “thick jungle” you must cross to come to your destination (a “location” where you are enlightened and understand). The act of making your way through is the act of discovering and learning. You discover something every time you cut down a tree to be able to move forward.
In fact, you do not really know where you are heading as the jungle is very dense. You know when you see it. As someone said: “I don’t know what I want, but I’ll know it when I see it”.
This is different from viewing the thick jungle as only the retrieval process of moving “to content”, struggling with keywords and TOCs. Once you get through, you have found some piece of content (relevant or not).
Hi Mark—
A great point, as always! One of the ways I’ve seen this problem addressed (unsuccessfully) was in the old ServiceWare KB application*, which documented symptoms/questions as separate elements in the KB article content. This was, in theory, a great method to enable customers to find solutions, since it presented different symptoms related back to a single root cause. Unfortunately, the underlying data model didn’t allow these relationships to be managed very easily, and the user experience was very confusing because it presented answers and solutions as independently searchable variables (along with a number of other elements). This complexity confused and frustrated both customers and support engineers, who were much more comfortable with a binary content offering (Question + Answer) than with a multiplicity of choices from which they had to select their solution among a field of inter-related symptoms. Google has trained us well 😉
Allan.
* ServiceWare merged with Knova to form Kanisa in 2005, which was in turn bought by a holding company that became Consona Corporation, which renamed itself as Aptean in 2012 (after merging with another company).
Thanks for the comment, Allan.
Indeed, that sounds like a classic case of confusing database retrieval with human problem solving. Alas it seems to be a mistake that many in the industry are intent on repeating.