Baby It’s Scold Outside

The latest target of the scolding classes is Baby It’s Cold Outside, a pop song from the 40s that is suddenly being “banned” from radio stations on the grounds that it condones rape and, specifically, that the line “What’s in this drink?” is a reference to a date-rape drug.

The accusation is absurd. As this article explains, the song is actually about the woman trying to talk herself into staying the night in the face of a list of social taboos against her doing so, and “What’s in this drink?” is a common trope of the pop culture of its time, used to excuse saying something that violates some social norm. You are blaming your words on the booze, in other words, and the joke is that there is usually nothing in the drink.

Turning a page

I am turning a page. An elderly metaphor, but still apt for a writer. My kids are grown. My mortgage is paid. My savings are adequate. I need only do the work I most want to do. I have said the things that I truly thought it important to say about technical communication and content strategy (in Every Page is Page One, and Structured Writing).

This happy state is not without its emotional perils, however. For one thing, you start to ask yourself questions like, if I were to be given a fatal diagnosis tomorrow, what would I regret not having done in my life? When I ask myself that question, I find there is only one item left on the list: publishing a novel. And since anyone can publish a novel on Amazon today, let me clarify that I mean publishing traditionally, using someone else’s money, because that means someone besides me has some skin in the game.

So I am going to take a run at the novel thing. This is something I more or less abandoned when I acquired the kids and the mortgage and needed to pay for them, and found tech comm to be the most reliable way for a writer to make a steady paycheck. But I’ve dabbled along the way, taken classes, gone to conferences, and had a few stories published — and received enough encouragement to conclude that this is not entirely a pipe dream. A pipe dream, probably, but not certainly. But that is what bucket lists are for: chasing the improbable when the probable is done.

Being the pathologically analytical cuss that I am, this is not going to mean I just write fiction from now on. My approach to tech comm has been hopelessly meta, and I am certain my approach to fiction will be just as hopelessly meta. In other words, I have some thoughts about the way stories work and how language works (which, incidentally, I think apply to both fiction and nonfiction), and I expect that I will be writing about those ideas. What I am not going to do anymore is work on new tech comm and content strategy projects and ideas for their own sake. If the new stuff I write about has crossover appeal, so much the better.

No turning of the page is ever quite clean. While completing the latest book does allow me to complete some thoughts that have interested me about technical communication and content strategy for over 15 years, it does not bring every project to a final end. Everything seems to spawn something else, and so no work is ever really finished. The page must turn with something still unwritten.

This means downing tools on some projects I have been working on for a long time. In particular I have a couple of software projects that are very much related to my ideas on technical communication and content strategy. By their very nature, software projects are never really finished. So I can’t say that I have drawn a line under either the SPFE project or the SAM markup language project. Both projects were created, principally, to work out and to illustrate the ideas expressed in Every Page is Page One and Structured Writing.


Most current content development tools are designed for content that is linear or hierarchical in structure. Every Page is Page One describes an approach to information architecture that is neither linear nor hierarchical but is based on search and linking (Wikipedia being the most obvious example). Wikis support this model, but they don’t support structured writing or automation. The SPFE project was designed to show how a structured approach to a bottom-up information architecture and Every Page is Page One could work.

In particular, SPFE supports a different approach to link creation and management. Links are one of the most expensive things to create and maintain in current tools, and I believe this has been a key factor in inhibiting the move to Every Page is Page One information design.

But while the SPFE project is functional and shows what is possible, it is by no means ready for general popular use. If it’s going to go forward, it needs to be adopted either by an open source community or by a tool vendor.

One of the aims of Structured Writing was to explain in detail the kind of structured writing approach that would be appropriate to use with something like SPFE and to explain the comprehensive benefits that such an approach would bring. I hope that that book will increase interest in this type of approach whether or not it increases interest in SPFE itself. (Structured Writing is a survey of the entire range of structured writing practice. It includes, but is not limited to, the practices that SPFE is designed to support.)


I have long complained that XML sucks for writing, and that this causes real problems for the type of subject-domain structured writing that I advocate in Structured Writing. In the course of writing the book, I developed an alternate markup syntax which I call SAM. Initially the purpose was simply to show the structure of examples without cluttering things up with angle brackets. But I quickly realized that SAM was robust enough and complete enough that I could write the book in it. (The process involved is described in the book itself.) It is also suitable for a wide range of structured writing applications.

I put the language specification and the parser I wrote for SAM up on GitHub. It seems to be fairly reliable for command-line usage at least, and it has an interesting direct-to-HTML mode that could make it a more semantically rich and constrainable alternative to Markdown.
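To give a flavor of what a direct-to-HTML pass over a lightweight markup might involve, here is a toy sketch. The syntax below is invented purely for illustration and is not the actual SAM syntax; the real language and parser are in the GitHub repository.

```python
# Toy sketch of a direct-to-HTML pass over a lightweight markup.
# NOTE: this invented "kind: text" syntax is NOT actual SAM syntax;
# it only illustrates the general shape of such a converter.

toy_source = """\
title: A Short Topic
para: This is the first paragraph.
para: This is the second paragraph."""

def toy_to_html(src):
    """Map each line's block type to an HTML element."""
    tags = {"title": "h1", "para": "p"}
    out = []
    for line in src.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kind, _, text = line.partition(": ")
        tag = tags.get(kind, "div")  # unknown block types fall back to div
        out.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(out)

print(toy_to_html(toy_source))
```

The point of a direct-to-HTML mode is that the semantic block types constrain what authors can write, while still producing ordinary web output at the end.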

Both projects were designed to explore and illustrate ideas rather than to be finished working software. While both projects do work, SPFE, in particular, lacks the polish, optimization, and documentation that would be required for general adoption.  Both projects are currently available on GitHub, here and here.  I will continue to respond to issues and pull requests, but if any open source community or commercial vendor is interested in further exploring the ideas or actually developing the specific projects they may be assured of both my consent and my cooperation.

Even if no one decides to take these projects further, however, I think both retain significant value as illustrations of different approaches to the structure, management, and creation of content. If anyone is interested in them in that context, they should also feel free to contact me with any questions they have.

This blog

Another important artifact of my years in the tech comm and content strategy space is this blog. I haven’t posted much of late, but, despite this, the blog gets a steady stream of visitors. It seems to have become a useful resource for many in the content strategy and tech comm communities. Because of this, I certainly intend to leave it intact for the foreseeable future. It will also remain the logical place to contact me with questions or comments about the two books, both of which I will continue to support.

The question I’m struggling with is this: Since I will clearly need someplace to write about my thoughts on language in story and fiction and drama in the years to come, do I continue to use this blog for that purpose, or start a new one?

This has never been a personal blog. It has always been a business blog about technical communication and content strategy and about the communication topics that matter to those industries. Communication being as general a human activity as it is, this has meant that I have often dealt with ideas that were far broader than technical communication and content strategy. In particular, my thoughts on the nature of language as a structure made up of stories is something that applies across the board and will continue to be something I think about and write about going forward.

So there is a continuity of subject matter between what I’ve written about in the past and what I expect to write about in the future. However, there will also be a discontinuity in emphasis and a transition to the more imaginative rather than commercial aspects of communication.

There seem to be three options available to me.

  • Stop posting to this blog and create a new one and grow its audience from scratch.
  • Start a new blog, but cross post or cross link from this blog to the new one whenever I think there is crossover appeal, thus hopefully pulling some of the audience for this blog over to the new one.
  • Just keep posting in this blog and let the audience sort itself out over time.

    Experts read more than novices

    If you let one of your houseplants completely dry out and then try to bring it back to life by dumping a large amount of water in the pot, you will end up with water all over the floor. Dry soil cannot absorb moisture quickly, so the water you pour in will run through the soil, out the hole in the bottom of the pot, into the saucer the pot is resting on, over the sides, and onto the floor. Damp soil can absorb moisture far more quickly than dry soil. Readers are like that too. Experts can absorb much more information much faster than novices. Thus experts read far more than novices do.

    Unfortunately, this is not generally how we create content for them. In a recent post, What is a quickstart to you?, Sarah Maddox looks at the different definitions that people have of the document types “Quick Start Guide” and “Getting Started Guide”.  Maddox suggests this way of differentiating between the two:

    • A quickstart guide is for domain experts.
    • A getting-started guide is for domain newbies.

    The domain expert, Maddox argues, already understands the task and the tools and just needs some basic information about setup and configuration, whereas the newbie requires:

    • Detailed descriptions of the concepts relevant to the domain and the product.
    • A detailed walkthrough of a complex use case.
    • Explanations of why they’re performing each step.

    This is a reasonable distinction on the face of it. The newbie has a greater information deficit than the expert, so it follows that the newbie should be provided with more information than the expert.

    The problem is that the newbie will not actually read all this extensive information. This is well known, and John Carroll demonstrated it amply in the experiments that he records in The Nurnberg Funnel. Rather than sitting down to read to remedy their extensive information deficit, new users will dive into the problem, trying to complete a specific goal, and will only look at instructions when they get stuck. Even then, they will not use the instructions systematically, but will dip in briefly and execute the first thing they see that looks like something they recognize as a coherent instruction, even if it is nothing of the kind.

    This behavior Carroll ascribed to what he called “the paradox of sense-making”. In essence, people do not have the experience and the mental models to grasp what a large volume of content is saying to them. Like dry soil, they have no capacity to absorb anything. They can only take up small snippets of information at a time, and it generally takes some experience that cracks open their preconceptions about how things work to open them up to receiving the information, let alone to instill in them the patience to read it in the first place.

    Novices, in other words, can only take up information slowly. Experts can take up information rapidly because they have the experience to understand it and the mental models to accommodate it. Based on this observation about the relative capacities of novices and experts, we would be forced to come to the opposite conclusion, that the novices need brief information, since that is all they can absorb, whereas experts should be provided with extensive documentation because they alone can absorb and take advantage of it.

    Can we resolve this paradox?

    Part of the answer is to be found in the word common to both the information types that Maddox describes, the word “guide”. The word “guide” implies that the person writing the “guide” is taking charge of the learning experience, that they are dictating what will be learned, in what order, at what pace, and by what means.

    If you have ever been on vacation with a tour guide, you know what that experience is like. Thanks to the guide’s experience and contacts, you move through the major tourist sites of your destination very efficiently. You never get lost. You never worry about parking or admissions or research. It is decided and provided for you by the tour guide. And if your aim is to tick as many sites off your bucket list as possible in as short a time as possible, it is a great way to get that job done.

    On the other hand, if you decide to wander around on your own and explore at your own pace, you will probably have a much more relaxed time and form much more vivid memories. You may not see all the official tourist sites, but you might have a conversation with the owner of a small out-of-the-way cafe in a fascinating building in a less travelled part of town that you remember far longer and with far greater pleasure than your ten-second glance at the Mona Lisa or the Sistine Chapel as your guide hustles you on from place to place. Guided tours, for the most part, are simply not the best way to form lasting memories.

    So perhaps the way out of the paradox of long guides for novices who can’t absorb them or short guides for experts who alone are capable of reading long ones, is to jettison the idea of the guide altogether.

    This, certainly, is what Carroll recommended as a response to the paradox of sense-making: create a collection of discrete information modules that are designed to be read in any order. Some of those modules will doubtless be read earlier in the average user’s engagement with the product, and some doubtless will be read later. But no two readers will, over the course of their engagement with the product, read all of them in the same order. Novices will probably read fewer of them with less frequency, because that is all they have the capacity to absorb. Users growing in expertise will read more of them faster because they have gained the experience and mental models necessary to absorb them. True masters will read fewer because they will already have absorbed all they need.

    Dry plants need to be watered slowly. (Dehydrated and starved prisoners, similarly, have to be fed and hydrated slowly.) Novices, no matter how great their information deficit, have to take in information slowly. Starting them off with a big thick guide stuffed with all the things we think they need to know will only give them indigestion. (Though, knowing this, most will not attempt to read it.)

    The problem, then, is not to distinguish between the needs of novices and the needs of experts, but to deal with the inherent problems with the concept of guide. Debating which kind of guide to give which audience is beside the point when it is the guide format itself which is the problem. Novices and experts need to drink from the same well of knowledge, but each has to be allowed to drink at the pace their systems can handle.

    The way you do that is with an Every Page is Page One approach to information design.

    Is Single-Sourcing Dead?

    Neil Perlin poses the question (and answers in the negative) in a response to my post “Time to Move to Multisourcing”. Perlin raises a number of points that deserve discussion. But first, a little clarification is needed.

    The term “single sourcing” is used to mean a number of different things in tech comm and content strategy. Among them:

    • Producing PDF and Web pages (or help) from the same source file or files.
    • Content reuse (in other words, storing a single source for a piece of information and publishing it in various places).
    • Using a single repository/file format for all content.

    These three things are independent of each other, though they may well be used together. For instance, you can produce PDF and help from the same source files stored locally on different workstations running different software, and you can use a single repository and file format without reusing content or outputting to more than one format (as many people using a Web CMS or Wiki do).
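The first of these meanings, producing several output formats from one source, can be sketched minimally. The block format, the output flags, and the function names below are invented for illustration and do not reflect any particular tool.

```python
# Minimal sketch: one source, several outputs. Blocks can be flagged
# for specific outputs, which is one common way single-sourcing tools
# handle content that only makes sense in one medium.
# All names and the block format here are invented for illustration.

source = [
    {"kind": "title", "text": "Installing the Widget"},
    {"kind": "para",  "text": "Unpack the box and check the contents."},
    {"kind": "para",  "text": "Click Help for more options.", "only": "help"},
    {"kind": "para",  "text": "See chapter 3 for more options.", "only": "pdf"},
]

def render(doc, target):
    """Render the blocks for one target, skipping blocks flagged for others."""
    lines = []
    for block in doc:
        if "only" in block and block["only"] != target:
            continue  # this block belongs to a different output
        if target == "help":  # HTML-style output
            tag = "h1" if block["kind"] == "title" else "p"
            lines.append(f"<{tag}>{block['text']}</{tag}>")
        else:  # plain-text stand-in for a PDF pipeline
            text = block["text"]
            lines.append(text.upper() if block["kind"] == "title" else text)
    return "\n".join(lines)

help_output = render(source, "help")
pdf_output = render(source, "pdf")
```

The point is that neither output is the “master” copy; both are generated from the same source, which is what distinguishes this meaning of single sourcing from simple copy-and-paste reuse.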

    My original post was about the third of these, using a single source repository and file format for all content. This has been a common approach in technical communication and content strategy over the past decade particularly, and I argued that it was time to move away from it. By itself, my argument had nothing to do with either the first or second meaning of “single sourcing”.

    Nonetheless, the ideas are related, because the first and second meanings of single sourcing have been common reasons for arguing for the use of a single repository and/or file format for content. If you wanted to perform these operations, the argument went, you needed sophisticated tools and content management systems, and that was best achieved with a single source format and a single repository.

    That single source format/single repository model has several significant disadvantages, however. I outlined those in my original post on the subject. But since the single format/repository model was used in part to enable multi-format delivery and content reuse, does that mean that those things are dead if we move away from the single format/repository model?

    In a word, no, since they can manifestly be done independent of it. But we have to think seriously about how we do them if we don’t have a single source format and a single repository. Going back to everyone using their own independent desktop tools and storing the files on their own hard drives has all sorts of well documented disadvantages, and is a non-starter if and when you move to an integrated web strategy, semantic content, or Information 4.0. So if the single source/single format approach isn’t the right one either, we have a legitimate question about how we do multi-format publishing and content reuse going forward.

    Perlin divides his post into two parts, first reviewing the major pain points of single sourcing and suggesting solutions, and then questioning whether these pain points can be addressed in a multi-source, shared-pipes model. Since those points clearly need to be addressed in any alternate model, I’ll look at each of them in turn. The headings below mirror those in Perlin’s post.

    Inappropriate tools

    Perlin argues that most companies won’t spring for high-end single-sourcing tools like MadCap Flare for all authors and will instead force them to use Microsoft Word, which, he says, is “lacking the flexibility and output options of full-power single-sourcing”.

    The solution Perlin proposes is simple: Buy the appropriate tools for everyone who needs them.

    But there are a couple of problems with this, beyond the unwillingness of companies to pony up the cash. First, these tools are unfamiliar to most of the people who would be asked to use them, and they are conceptually more complex than Word. That introduces a training overhead and adds complexity to the writing task every day. And if the contributors don’t use those tools full time, they will forget most of the skills they are trained in.

    Giving everyone more complex tools is not really a sustainable strategy, nor is it one that is easy to sell.

    Inappropriate training

    Perlin argues that many current problems with single sourcing arise because writers are not properly trained to use the tools they have. The solution: more training.

    I’m not a fan of this argument in any context. There are certainly tools that require training, supervision, qualification, and even periodic recertification. But these are expensive things. Whenever someone blames poor training for a problem, my first instinct is to suspect that poor system design is the real culprit.

    Here’s the thing about multiple output formats and content reuse: unless each format is prepared entirely by hand (in which case, why do you need any special tools?), you are asking writers to prepare structured content that will be processed by algorithms to create different documents and/or different formats on the output side. Whether or not your tools identify as structured writing tools, they require specific structures in order to work.

    If such a system is going to fail, it will be because the writers did not create the structures that the algorithms were expecting. If they are going to require additional training, that is what they are going to need additional training on.

    But whenever you are asked to prepare data to a specific format, there are two factors that are crucial to success: guidance and validation. Writers need clear guidance on the structures to be created and clear and immediate feedback when they make a mistake. If you don’t give people clear guidance and clear and immediate feedback, no amount of training is going to make them perform the task consistently, and if you do, relatively little training will be required. More training isn’t the answer. Better system design, in the form of more robust guidance and validation, is.
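As a concrete illustration of validation with immediate feedback, here is a minimal sketch. The topic structure and the rules are invented for illustration; no real tool’s schema is implied.

```python
# Minimal sketch: validate a structured topic as it is written and
# report every missing or malformed piece immediately, in terms the
# writer can act on, rather than waiting for a failure on the output
# side. The required fields and rules are invented for illustration.

REQUIRED_FIELDS = ["title", "audience", "steps"]

def validate_topic(topic):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in topic:
            problems.append(f"missing required field: {field}")
    steps = topic.get("steps", [])
    if not isinstance(steps, list) or len(steps) == 0:
        problems.append("'steps' must be a non-empty list")
    else:
        for i, step in enumerate(steps, start=1):
            if not isinstance(step, str) or not step.strip():
                problems.append(f"step {i} is empty")
    return problems

# A writer's draft gets checked the moment it is saved:
draft = {"title": "Reset the widget", "steps": ["Hold the button", ""]}
for problem in validate_topic(draft):
    print(problem)
# prints:
#   missing required field: audience
#   step 2 is empty
```

Because the feedback arrives while the writer is still working, and names the exact structure that is wrong, it acts as guidance as well as validation, which is the combination the paragraph above argues training cannot replace.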

    Inappropriate standards

    “People often have no standards to follow when it comes to using their tools – no templates for different types of material, or style usage standards, for example,” Perlin argues.

    Here, of course, we are in complete agreement (see my note on the previous point). The question is, how do you define standards, where do you implement them, and how do you validate compliance with them?

    Perlin goes on to recommend that we should “embed the standards into the authoring tools as much as possible to make their use automatic.” Here again we are in violent agreement. But the catch here is the extent to which your tools make it possible to do this. In standard commercial tools today, that capability is limited, as Perlin’s examples show:

    For example, create topic-type templates with embedded prompts – “type the list of tools here” – to guide authors as they write. Or create a stylesheet with clear style names and make it the project’s master stylesheet so that it will be applied automatically to every topic.

    That’s about as much of that as a standard unstructured tool like Flare is capable of supporting. But there are severe limitations here:

    • Templates and embedded prompts get overwritten with content, so the structure is not retained and is not available to guide subsequent writers and editors.
    • There is no ability to validate compliance with the template.
    • There is very limited ability to factor out invariant pieces of content, which is a key feature of true structured writing approaches.

    For these reasons, variations from the standard can easily creep in, especially over time. You are still left relying more on training than on guidance and feedback.

    Increasing complexity

    Here I think it is useful to quote Perlin in full:

    Single-sourcing requires many tasks beyond just writing the content. Authors have to decide which output is primary in order to decide which features to use because some won’t work well or at all on different outputs. That means understanding those features. Authors have to create and assign conditions to control which content to use for which output. Define re-usable chunks of content. Create style sheets that behave differently depending on the output. Perhaps define microcontent. And more. And this all must be documented somewhere for reference by the current authors and later ones.

    The result? The increasing power of our tools and increasing customer demands are leading to increasingly complex projects that can easily go out of control.

    The solution? Again, simple. Document your project.

    I agree absolutely with Perlin that complexity is the central problem, for all of the reasons he lists, and more. The managing of complexity is the central theme of my new book, Structured Writing: Rhetoric and Process (on sale today). In it I talk about how you can use structured writing to redirect much of the complexity of content management and publishing away from writers, allowing them to focus more of their precious attention on research and writing.

    And this is where I disagree with Perlin. Documenting all of your complexity is not a good (or simple) solution. Documenting it does not remove it from the writer’s attention. It is better than not documenting it, but not by much. The writer still has to deal with it, still has to spend time and mental energy on it, and can still make mistakes in following the complex procedures you have documented. Much of this complexity can be factored out using the right structured writing techniques. (I would elaborate, but that would spoil the book for you.)

    Lack of motivation on authors’ parts

    This is a fundamental part of the problem. Single sourcing, for the most part, is a solution to someone else’s problem, not the writer’s. As a result, as Perlin writes: “Authors type their content and make sure it prints well and that’s that.”

    Perlin’s solution is essentially imperative: make single sourcing a job requirement, explain why it is important to the company, and provide management oversight to ensure compliance.

    This is indeed the best you can do with current tools, but I have two fundamental problems with this approach:

    • With the best will in the world, people can’t comply with a regime that benefits someone else rather than themselves unless they get clear, direct, and immediate feedback, which current tools don’t give them, because the only real feedback is the appearance of the final outputs.
    • Management oversight can’t ensure compliance in the production phase of a process if it can only perceive compliance in the finished product. Assessing the finished product every time is too time consuming and error prone (you can miss things). And the effectiveness of management oversight decreases exponentially the longer the delay between when the writer writes and when the manager finds the defect.

    What is the alternative? Use structured writing techniques to factor out the parts of the process that are not of interest to the writer and institute a clear mechanical compliance regime to validate content as it is produced. In other words, remove the need for them to be motivated to do single sourcing at all, and give them an environment that rewards them for the things they are motivated to do.

    Alternatives to Traditional Single-Sourcing?

    Is personalized content unethical?

    Personalized content has been the goal of many in the technical communication and content strategy communities for a long time now. And we encounter personalized content every day. Google “purple left handed widgets” and you will see ads for purple left handed widgets all over the web for months afterward. Visit Amazon and every page you see will push products based on your previous purchases. Visit Facebook …

    Well, and there’s the rub, as Mark Zuckerberg is summoned before Congress for a good and thorough roasting. Because what Cambridge Analytica did was personalized content, pure and simple, and no one is happy about it.

    As Jacob Metcalf points out in Facebook may stop the data leaks, but it’s too late: Cambridge Analytica’s models live on, in the MIT Technology Review, the issue is not simply that Cambridge Analytica had access to data it should not have had, and that access has now been removed, but that they used that data to develop models of persuasion that can be used to customize content in the future, and which can be further refined using any other datasets they can get their hands on.

    Concerns have been growing for a long time about the degree of influence that the ability to target ads at individuals can have on their shopping behavior. When the same techniques are used to target their voting behavior, the alarm bells really start to go off. The concerns about the Cambridge Analytica case are not simply that they had access to data they shouldn’t have, but that they engaged in a process of message manipulation that many consider unethical on its face. The reason for wanting to restrict the access people have to data about us is precisely that we fear they will use that data unethically. The key ethical question is what the owner of the data will do with it.

    But there is really only one thing a content company like Facebook can do with it: show customized content, either by building and refining models from it or by using information on individuals to tailor content to those individuals.

    Service and manufacturing companies could perhaps use such data to change the products and services they offer. Law enforcement agencies could use it to track potential or actual criminal behavior. Governments could use it to track dissidents. But organizations in the content business, which includes manufacturing and service companies who market their products, governments who communicate with their citizens, politicians who run for office, and even law enforcement agencies who issue bulletins to the public and cooperate with prosecutors to try the accused, use such data simply to personalize content.

    The ethics of personal data collection, therefore, are the ethics of personalized content. And since the models derived from personal data can live on and be refined even after access to the data has been cut off, the ethics of personalized content go well beyond access to particular data sets.

    And so the question is, is personalizing content ethical?

    Certainly it is possible to personalize content for benign purposes, to personalize it in such a way that all the benefit goes to the consumer and none of it to the vendor. (I’m not sure what would motivate that, or how you would prove it, but as a thought experiment we can at least imagine it.) But professional ethics require more than the possibility of doing good. Professional ethics are about avoiding both the appearance and the occasion of doing evil.

    This is the difference between morality and professional ethics. Morality is about personally avoiding sin in individual cases. Professional ethics are about conducting your professional affairs in such a way as to avoid the imputation of sin, and as much as possible to remove the temptation to sin from the practitioner. Moral teaching bids us do good deeds by stealth; professional ethics requires us to act with full transparency, to expose all our deeds to public scrutiny. They also require us to refrain entirely from activities where the possibility of sin is so great, or the commission of sin is so hard to detect, that neither we nor the public can be confident that we have acted ethically.

    Personalized content may well be one of those areas so fraught with the possibility of sin that professional ethics should require us to forswear it altogether.

    The moral hazard of personalized content

    What Cambridge Analytica did was use structured data to drive personalized content to influence customer behavior. They are not shy about it. Their homepage proclaims:

    Cambridge Analytica uses data to change audience behavior. Visit our Commercial or Political divisions to see how we can help you.

    There is a huge moral hazard in any such endeavor. However much you may think you are helping your customers get the information they need, it is ultimately your vision of what they need that is driving the model. It is the advancement of your own goals that you are seeking to achieve.

    All communication is like this, of course.  All content seeks to influence people. As I have written before, the purpose of all communication is to change behavior and if you want to communicate well you need to have a firm idea of whose behavior you want to change and what you can say that will produce the change you are looking for.

    The issue is, are there methods that it is unethical to use to achieve the behavioral change you are looking for?

    We could take an all’s-fair-in-love-and-war attitude to this. The public, we could reason, knows that the things they read are intended to influence them. Schools take pains to teach their students this under the banner of critical thinking. The ethical presumption, therefore, is that their knowledge that we are trying to influence them is sufficient prophylactic against covert hypnosis: the reader recognizes the attempt and is able to make a mature judgement about it.

    Of course, we have always known that that is not strictly true, that the power to persuade is real power. But it is also power that has limits. Though it has always been able, under certain circumstances, to move civilized people to commit the vilest atrocities (as in Nazi Germany, for example), it has always been limited by people’s innate moral sense and by the power of persuasion wielded in opposition.

    In other words, in the history of communication to date, there has always been a gap that the propagandist could not cross. They could not individually address the particular hopes, fears, and prejudices of each individual because they did not have access to the data or the means to customize the message. They had to issue a general message based on an appeal to general sentiment, and that always leaves open some room for the critical faculties of the recipient to operate, and for opposing arguments to find a way in. The propagandist might get to our doorstep, but they could not get into our heads, and therein lay a saving measure of freedom.

    But the combination of neurological science and big data opens up the possibility of the means of persuasion becoming a whole lot more powerful. If neurological science tells the propagandist exactly where the buttons are and big data lets them identify exactly how to push them in each person individually, the propagandist, like the vampire, can cross the threshold and enter the individual mind, and the gap that provides our last measure of freedom is gone.  Even if the effect is not permanent (Cambridge Analytica did get found out, after all) it allows the propagandist to wield enormous influence, particularly if they time it right before a critical event such as an election.

    In other words, if our engines of persuasion become so sophisticated, so targeted, so attuned to the particulars of our neurological makeup, that the degree of critical thinking that we can reasonably hope to develop in the citizenry is no prophylactic against it, then we, as professional communicators, have lost our moral cover. Buyer beware cannot be our excuse if we have removed any possibility of wariness from the buyer.

    A method that cannot be detected or countered in the time and with the tools available to the person on whom it is used, therefore, cannot be considered an ethical method, even when used for a moral purpose. If nothing else, it fails the basic ethical requirement of transparency. The temptation to sin is too great and the detection of sin is too difficult for such a method to ever be considered ethical.

    Are we actually there yet? A big lie does not necessarily need big data. By no reasonable measure was the US election of 2016 a calamity on the scale of the German election of 1932. It may well be that the chaotic democracy of social media is actually an antidote to manipulation more powerful than the forms of manipulation that social media can presently achieve.

    But let’s suppose that the technology driving personalized content is not mature enough yet to strip the recipient of their freedom, and therefore strip the author of their ethical cover. The point, surely, is to mature it to the point where it is sophisticated enough to do just that. And if we are going down that road, is it a valid ethical argument to say that everything is fine because we have not got there yet? Surely the pursuit of unethical means is itself unethical.

    Personalized content driven by sophisticated predictive behavioral models and extensive data on individuals and groups is potentially a tool of persuasion against which no reasonable defence is possible, and as much as we may proclaim the innocence of our intentions, our intentions cannot be purer than our hearts, and we are all apt to grossly overestimate the purity of our hearts.

    This is the reason we have ethics in a profession. It is not to let us go right up to the line, but rather to hold us back from even approaching the line, knowing that if we get too near to the line we are inevitably going to step over it. Not only is a person with fiduciary responsibility required not to have a conflict of interest, they are to avoid even the appearance of conflict of interest. The only way to be sure we don’t cross the line is to stop ourselves well short of it.

    And because ethics is at least in part about public perception of your methods, how the public feels about things is very much an ethical consideration, and it is pretty clear that the public has grave concerns about personalized content, concerns which the Cambridge Analytica case has only made more grave. If there is widespread public consensus that the practice is unethical, chances are it actually is unethical, if for no other reason than that demonstrating that you are acting ethically is itself an ethical obligation.

    But the really scary thought is this: if we get really good at this, the public’s objections will vanish, not because the public has decided for itself that it likes this degree of personalization, but because we will have used personalized content to convince them that they do. In such a world, there is clearly no transparency at all, and if there is no transparency, there is no ethics.  The ultimate ethical objection is that if we go too far down this road, all ethical objections will be snuffed out. Not answered; obliterated.

    And so I ask, where should professional communicators draw the ethical line on personalized content? Wherever we draw it, it has to be consistent with transparency. One way to draw that line is to say that it is unethical to do data-driven personalized content at all. If we don’t draw the line there, where do we draw it?

    Time to move to multi-sourcing

    Single sourcing has been the watchword of technical communication for the last several decades. We have never fully made it work. A pair of seminal posts by prominent members of the community give me cause to hope that we may be ready to move past it.

    Single sourcing is about the relationship between sources and outputs of content. Traditional publishing methods had a one-to-one relationship between sources and outputs, where each output was generated from a single source.

    The inefficiencies of this model are evident, particularly where different outputs are just different formats of the same document, such as PDF and HTML. Single sourcing attempts to remove this inefficiency by combining content into a single source.

    As Sarah O’Keefe comments in Single-sourcing is dead. Long live shared pipes!, the first of the two seminal posts I want to discuss:

    [W]e have been trying for a world in which all authors work in a single environment (such as DITA XML, the Darwin Information Typing Architecture), and we pushed their content through the same workflow to generate output. We have oceans of posts on “how to get more people into XML” or “how to work with part-time contributors.” The premise is that we have to somehow shift everyone into the One True Workflow.

    But there are two problems with this model. The first is that the source in this model is significantly more complicated than the individual sources in the original model. The added complexity comes from the need to support all of the potential outputs from a single source format and all of the management operations you want to perform on that content. But that complexity is imposed on everyone who writes any of the sources involved. Where people used to be able to write in Word or a simple visual HTML editor, they now have to write in Flare or DocBook or DITA or a complex CMS interface.

    You can’t get everyone to agree that taking on this additional complexity is worth their while, and in practice it often slows down getting particular pieces of work done (even if it improves efficiency overall). So what happens is that some people drop out of the system, or refuse to sign on for it in the first place, or don’t use it for everything they do, leaving you with a system that only some contributors actually use.

    Life remains just as complex for the folks who continue to use the system, but the organization realizes fewer benefits from it because not everyone is using it.

    Modern content demands have only made this situation worse. We now look to do more than simply issue content in multiple formats. We want to select and combine content to create different outputs for different audiences. We want to enhance content with metadata for consumption by intelligent downstream systems. We want to richly link and organize content both statically and dynamically. We want to control terminology, manage quality, and streamline localization. All of this leads to greater complexity in your source format in order to support all of these things.

    This added complexity is only going to result in more defections, and dealing with the complexity, not to mention paying for the complex system, is only going to compromise your hoped-for ROI.

    Is there another way?

    O’Keefe suggests that the answer may lie in what she calls “shared pipes”. That is, a system in which content flows from many sources through a shared publication system and out to many different outputs.

    This is a diagram I have been drawing for many years, and I love Sarah’s shared pipes analogy to describe it. It comes down to this:

    There are many sources of content and many outputs, but the source material goes through a shared process in the middle that handles multiple outputs, selecting and composing for multiple audiences, adding rich metadata, rich static and dynamic linking, and all the rest.
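To make the shared-pipes shape concrete, here is a minimal sketch in Python. Every function name and the intermediate document shape are my own illustration, not any real product's API: each source format is normalized into a common intermediate form, the shared middle enriches it, and separate renderers produce the outputs.

```python
# Sketch of a "shared pipes" architecture: many sources, one shared
# process in the middle, many outputs. Names are illustrative only.

def from_markdown(text):
    # Normalize a Markdown source into a common intermediate form.
    return {"body": text, "meta": {"source-format": "markdown"}}

def from_dita(text):
    # Normalize a DITA source into the same intermediate form.
    return {"body": text, "meta": {"source-format": "dita"}}

def shared_pipe(doc, audience):
    # The shared middle: select, enrich with metadata, link, etc.
    # Here it just records the target audience as metadata.
    doc["meta"]["audience"] = audience
    return doc

def to_html(doc):
    # One of many possible renderers on the output side.
    return "<article>" + doc["body"] + "</article>"

sources = [from_markdown("Install the widget."),
           from_dita("Configure the widget.")]
outputs = [to_html(shared_pipe(doc, audience="admin")) for doc in sources]
# outputs == ['<article>Install the widget.</article>',
#             '<article>Configure the widget.</article>']
```

In a real system the shared middle would also handle selection, linking, terminology, and localization, but the shape is the same: many readers in, one shared process, many writers out.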

    But wait, don’t all those separate sources constitute that great boogeyman of the content industry, silos? Yes, they absolutely do, and that brings me to the second seminal post, Don’t Dismantle Data Silos, Build Bridges by Alan Porter. Porter begins by noting the reluctance of people to throw out their current way of doing things in favor of the one great single sourcing system:

    Let’s face it: no one is going to throw out their incumbent systems just because we say they should. Especially not if those systems are still doing the job they were purchased to do. We have all worked with systems that are “good enough” to fulfill a specific set of tasks.

    Removing and replacing existing systems isn’t quick or cheap, but the biggest hurdle isn’t budget, or technology (although that’s what’s often cited) — it’s human.

    The human element is indeed crucial. Technically, the single sourcing model might work very well if it didn’t have to be used by humans. But, as Porter notes, humans build systems to suit their own work, and they are not willing to give them up to make someone else’s work easier.

    Nor should they, since the quality of a person’s work very much depends on the suitability of their tools to the task at hand. You don’t do your best brain surgery with tools designed for trimming hedges. The current systems that people are using may not be the best possible systems they could be using, but at least those systems are specific to the work they are doing. They are comfortable. Perhaps we could design them a system that is even more comfortable, but if so it will be a system more specific to their task, not a single gigantic complex system designed to be a single source of everything.

    So, a multi-source system lets us keep each source system focussed on the needs of the individuals who contribute to it. We then collect information from all those systems and run it through a shared publication process to produce whatever outputs we need. As Porter says:

    Each customer interfacing system can still stand alone and address the needs of a particular line of business, or be an enterprise single source of truth. Yet by passing data between them, or existing enterprise business systems, they can be the foundation of a fully connected continuous customer experience.

    Of course, it is not quite so easy as that. Setting aside the technical issues of actually connecting the various systems together, we are still left with the issue of whether the content coming from each of these systems has enough structure and metadata attached to it for the shared pipes to actually perform all the operations we need them to perform.

    As O’Keefe points out, you don’t actually need every source to contain the structures to perform every system function:

    not all content needs the same level of quality review. Instead, we’ll have [to] make a distinction between “quick and dirty but good enough” and “must be the best quality we can manage.”

    Thus it might be perfectly fine for some of your content to be written in a simple format like Markdown and pass through your shared formatting pipe without necessarily passing through every other function and process of your central system.

    But what about the content that does need to pass through those functions? Can we support this without making every individual silo format as complex as our single source format?

    Clearly if the silo formats have to be as complex as the single source format, we have accomplished very little. Even if each of these super-complex formats is specific to its silo, it is not likely to be supported by the existing system, nor is it likely to be comfortable for the individual contributor.

    How then do we keep the formats of the individual silos small while still supporting all of the functions of the central publishing system for all the types of content that need it?

    The answer lies in structured writing, particularly in the style of structured writing that I call “subject domain.” In subject-domain structured writing, the markup you add to the text is specific to the subject matter. Here, for example, is a recipe in a subject-domain structured writing format:

    recipe: Hard-Boiled Egg
    A hard-boiled {egg}(food) is simple and nutritious.
    ingredients:: ingredient, quantity, unit
    eggs, 12, each
    water, 2, qt
    1. Place eggs in {pan}(utensil) and cover with water.
    2. {Bring water to a boil}(task).
    3. Remove from heat and cover for 12 minutes.
    4. Place eggs in cold water to stop cooking.
    5. Peel and serve.
    prep-time: 15 minutes
    serves: 6
    wine-match: champagne and orange juice
    beverage-match: orange juice
    serving: 1 large (50 g)
    calories: 78
    total-fat: 5 g
    saturated-fat: 0.7 g
    polyunsaturated-fat: 0.7 g
    monounsaturated-fat: 2 g
    cholesterol: 186.5 mg
    sodium: 62 mg
    potassium: 63 mg
    total-carbohydrate: 0.6 g
    dietary-fiber: 0 g
    sugar: 0.6 g
    protein: 6 g
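As a rough illustration of why subject-domain markup pays off downstream, here is a small Python sketch. The parser and function names are purely hypothetical, my own invention, and the {phrase}(type) pattern is simply the annotation convention used in the recipe above. The point is that a central publishing system could harvest these subject annotations, for example to build an index of foods and tasks, without the author ever touching a complex publishing format:

```python
import re

# Match subject-domain annotations of the form {phrase}(type),
# e.g. {egg}(food) or {Bring water to a boil}(task).
ANNOTATION = re.compile(r"\{([^}]*)\}\(([^)]*)\)")

def extract_annotations(text):
    """Return a list of (phrase, subject-type) pairs found in the text."""
    return ANNOTATION.findall(text)

recipe_line = "A hard-boiled {egg}(food) is simple and nutritious."
step = "2. {Bring water to a boil}(task)."

annotations = extract_annotations(recipe_line) + extract_annotations(step)
# annotations == [('egg', 'food'), ('Bring water to a boil', 'task')]
```

Because the annotations describe the subject matter rather than the publishing machinery, the same source can feed an index builder, a link resolver, or a metadata enricher in the shared pipe, and the writer's format stays small.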

    Chatbots are not the future of Technical Communication

    And suddenly every tech comm and content strategy conference seems to be about getting your content ready for chatbots. Makes sense if you are a conference organizer. Chatbots are sexy and sex sells, even if the definition of sexy is a grey box with a speaker sitting on the counter.

    But chatbots are not the future of technical communication. Here’s why:

    Chatbots are stupid

    No, I don’t mean that they are a stupid idea. I mean they are actually stupid. As in they are not very bright. As Will Knight writes in Tougher Turing Test Exposes Chatbots’ Stupidity in the MIT Technology Review, current AI does barely better than chance in deciphering the ambiguity in a sentence like: “The city councilmen refused the demonstrators a permit because they feared violence.” (Who feared the violence?) Humans do this so easily we rarely even notice that the ambiguity exists. AIs can’t.

    As Brian Bergstein points out in The Great AI Paradox (also MIT Technology Review), an AI that is playing Go has no idea that it is playing Go. It is just analysing a statistical dataset. As Bergstein writes:

    Patrick Winston, a professor of  AI and computer science at MIT, says it would be more helpful to describe the developments of the past few years as having occurred in “computational statistics” rather than in AI. One of the leading researchers in the field, Yann LeCun, Facebook’s director of AI, said at a Future of Work conference at MIT in November that machines are far from having “the essence of intelligence.” That includes the ability to understand the physical world well enough to make predictions about basic aspects of it—to observe one thing and then use background knowledge to figure out what other things must also be true. Another way of saying this is that machines don’t have common sense.

    The incomplete bridge

    In the Top Gear Patagonia Special, the presenters come upon an incomplete bridge and have to construct a ramp to get their cars across. This is a great metaphor for technical communication, and, indeed, communication of all kinds: the incomplete bridge.

    Technical communication is often described as a bridge between the expert and the user. But that bridge is always incomplete. The user always has to build the final span that connects the bridge to the bit of ground they are standing on.

    This is true for several reasons, the most basic of which is that you have to contextualize any information you receive to your own project in order to act on it confidently and successfully. If the document tells you to push the red button, it is still your job to determine if you are looking at the right device and the right red button, and if your purpose will truly be served by pressing the red button at this time. The document can never entirely ensure that you do not press the wrong button on the wrong device or at the wrong time for the wrong purpose. Only the individual reader can determine those things, and thus only the reader can build the final span of the bridge.

    But the reader may need to build more than that final span of context. While the writer can do a lot to make sure they understand the class of people to which their target set of readers belong, and thus to use vocabulary that the users use the way they typically use it, they cannot guarantee that the piece is written in the vocabulary of each individual user. Inevitably some users will have to build some vocabulary spans for themselves some of the time.

    The same goes for broader issues of task and craft knowledge. The piece may be written for experienced practitioners of a particular craft, but even experienced practitioners may not be familiar with every task in that craft (you might cook for years without ever separating an egg or blanching a carrot). References to these tasks and other aspects of the craft that the reader is not familiar with are other spans that the reader needs to construct for themselves.

    In short, while the bridge the writer builds is always somewhat incomplete, it can be a good deal more incomplete for some readers than for others. Thus there will always be a significant section of the user population for whom the documentation sucks. The bridge will always be incomplete, and the ability and willingness of the reader to build the missing spans for themselves will vary widely.

    This has some very important consequences for technical communication (and communication generally).

    The first is that you cannot make it perfect. You have to design your documentation set with reasonable and achievable goals in mind and you cannot set those goals in terms of perfect and effortless performance of every task by every user. This does, of course, make the setting of goals and the measurement of performance much more difficult. But setting unachievable goals is not a recipe for success. This is why the Every Page is Page One principles of defining a specific and limited purpose and assuming that the reader is qualified are vital to keeping a project from going off the rails.

    The second is that if you keep adding things to your bridge you are more likely to make it impassable than to make it more accessible. The old London Bridge was so overbuilt with shops and houses that overhung both the river and the roadway that it was severely congested and took over an hour to cross. Not to mention that the additional weight of all those buildings caused frequent collapses.

    Overbuilding your documentation set will not reduce the effort that readers have to put in to fill in the spans they need to contextualize the content to their task or to bridge the gaps in their own experience and knowledge where it differs from what the document assumes. Clearly, of course, a bad document can end up a bridge to nowhere that no one can access and no reader is willing to invest in building spans to complete. A good document does all that the writer reasonably can do given the diversity of their audience and their inevitable ignorance of each reader’s individual circumstances. But building more stuff on top of that reasonable document will make it worse, not better. It will make it less passable, not more accessible.

    The third consequence follows. Since a reasonable document is designed with the knowledge that it is an incomplete bridge, it should be designed and presented in a way that facilitates as much as possible the bridge building activity that the reader has to do for themselves. This means that it has to leave room for the intellectual work that the reader will need to do in order to make use of the document.

    It is fashionable today in tech comm circles to talk about the next wave of user assistance taking the form of chat bots or digital assistants such as Alexa.  Of course, this is not the first time we have been told that text is dead. People have been saying that ever since the VCR went on sale, but while YouTube videos are now a major part of the total tech comm landscape, they have occupied a niche to which they are well suited; they have not put an end to textual documentation. Alexa and the Chat Bots (cool band name!) won’t put an end to it either.

    While conversation with a machine is cool in its way, it is the audible equivalent of a command line interface. There is no territory to explore. There is no discoverability. The basic lesson of the graphical user interface is that the user can get on faster if they are given an environment that is easier to explore. Exploration is, of course, the user building their end of the bridge.

    Conversation is a form of mutual bridge building as well, each person adjusting what they say and how they say it in order to get across to the other person exactly what they are trying to express. But even where conversation is available to us, we often prefer to go away and study and work on a problem for ourselves. Getting our minds to the point where the last span is complete and the last spike is driven is not merely a matter of an exchange of information, but of a maturing of mental models in the individual mind. That takes time, and a unique exploration of the problem space by the individual doing the learning. There is no Nurnberg Funnel. Each reader has to build their part of the bridge for themselves.

    The virtue of textual documentation for this process is first that it is asynchronous. We can absorb the lessons of text at the pace our own minds are working at any given moment. The second virtue is that it is discoverable. Text, particularly in the form of a mature hypertext, creates an environment that the reader can freely explore for the unique parts they need to build their own bridge spans. Google and the Web have, of course, greatly increased the discoverability of text. The best thing we can do to help the reader do their own part of the bridge building between their business needs and our products is to create content that is best suited to the hypertext/search paradigm created by these technologies. Every Page is Page One.