Can Content be Engineered; Can Writers be Certified?

tl;dr: We can apply engineering methods to content development, but we do not have the body of proven algorithms or known-good data to justify formal certification of communication professionals the way we do for doctors and engineers.

We talk about content engineering. I call myself a content engineer sometimes. But can content really be engineered? Is content engineering engineering in the same way that engineering a bridge is engineering, or only engineering by analogy?

This post is prompted by a fascinating conversation with Rob Hanna and others at the monthly STC Toronto Networking Lunch. The conversation morphed into something I think I can fairly characterize as: is there a uniform methodology to technical communication, one that can form the basis of a curriculum, a certification, or a toolset, or is there a legitimate diversity of approaches, roles, methods, and tools?

Let’s clear one thing up first: When people talk about content engineering what they often mean is engineering publishing and content management. There is no doubt in my mind that publishing and content management can be engineered. The question is, can they be engineered without doing violence to content quality and burdening authors with tasks and requirements they should not have to bear? Ultimately I think the answer to that question is yes, though I don’t think many systems achieve that today. But actually engineering content — that is, making content itself better through an engineered process — is something different.

Engineering and imagination

We could make a very broad brush division of human activity into works of engineering and works of imagination. Works of engineering would constitute all activities based on proven algorithms and known good data, and works of imagination (whatever imagination may be) would constitute all the rest.

But we can immediately see that this is a false dichotomy. The ideal engineering solution certainly involves proven algorithms and known-good data, but these are hard to come by. And the way we get to them is not by a pure act of imagination, like Athena springing from the head of Zeus. Instead, we use imperfectly proven algorithms running on hoped-good data, and we study the results with the aim of systematically improving the quality of our algorithms and our data.

And that, indeed, would be a far better description of what engineering is: the process of systematically improving our algorithms and data. And that process itself is carried out in an engineering fashion, using the best available algorithms and the best available data to validate our algorithms and our data.

Which is not to say that imagination is absent from engineering, because the process of improvement requires some input and impetus which must come from some place other than the algorithms and data that we already have: imagination. In addition, each engineering problem has unique aspects to it for which we have no certain data or algorithm and which must be addressed by imagination, informed, of course, by the best data and algorithms that we do have. (But discerning how these apply to a unique situation is a matter for the imagination.)

Certifying proven practices

As certainty in our algorithms and our data grows, we can begin to codify knowledge and practices, and then to educate, certify, and license practitioners based on their grasp of this knowledge and these practices. This is the kind of education, certification, and licensing that we require of civil engineers and doctors. But we don’t require it of software developers, by and large, or of technical writers.

Is this mere omission, crying out to be rectified, or are there valid reasons to require certification and licensing in some fields and not in others?

There are commercial considerations, of course. Certification and licensing are attractive to some people employed in a field because they hope that erecting barriers to entry will raise salaries and improve job prospects for people with the certification. Others in these fields oppose certification because they do not want to be burdened with certification requirements or bound to standards of practice that they may not agree with. This is very much the case with the debate over certification in technical communication today.

When is certification in the public interest?

But certification that exists merely to raise the salaries of practitioners is not in the public interest, as it raises prices without raising quality or improving safety. The social justification for standardization of practice and certification and licensing of practitioners is that it improves outcomes sufficiently to compensate for the raised costs. High costs for engineers and doctors are a price society is willing to pay for buildings that don’t fall down and surgeries that don’t kill more people than they cure.

To be socially justified, standardization of practice and certification and licensing of practitioners in technical communication, or the content professions generally, would have to show a verifiable and consistent superiority of a given method compared to all others. And the simple fact of the matter is that there is no current method that can do that.

Why not?

Algorithms and data

It comes down to algorithms and data. Engineering is a matter of continually refining algorithms and data through a process that is itself based on algorithms and data. At a certain stage in this process, it is better to let a thousand flowers bloom. We don’t yet have a sufficiently well defined algorithm or sufficiently good data to select one algorithm or one data set either as the objectively most correct or the pragmatically most likely to produce the best results over time if we abandon all the rest and focus everybody on refining this one. Until we get to that point, prematurely cutting off the other flowers is more likely to doom us to an inferior path than it is to foster greater progress.

As things progress, certain paths may prove to be dead ends. At a certain point a few paths, or one path, will be so strongly supported by evidence that there is a very clear social benefit in standardizing on that path and in training and certifying practitioners who follow that path. We have very clearly reached that point in civil engineering and medicine.

This is not to say that there are no outliers in these fields. There are doctors and engineers who have been trained and certified and have then chosen to examine other paths. There are people who are not trained or certified but have struck out on their own. Most of these efforts will come to nothing, of course, but sometimes they can add something new. Very very occasionally, they can spark a revolution and put an entire profession on a new track.

Two models of a profession

This suggests two fundamental models for professions, based on where they are in this process of refinement. For those whose algorithms and data are sufficiently refined and proven, a broad certified core of practitioners and a small cadre of outliers returns the greatest social value and the greatest promise of continued growth. For those whose algorithms and data are less certain, though, greater social value and the greatest promise of continued growth will come from letting a thousand flowers bloom. Premature selection and concentration can only do harm. Premature certification is a social evil that limits growth.

Technical communication, and the content arts generally, seem to me to be very firmly in the second camp. They are not alone. Software development methodologies and business process improvement are also in this camp, with new theories and approaches being developed and tried on a regular basis. (Check out the debate among tech entrepreneurs about the value of MBAs.)

It may well be that certain professions will graduate from the second camp to the first (as medicine and civil engineering did in the past) while others never will. I expect that technical communication and the content arts generally will never graduate, and I will explain why in a minute. But whether we expect to graduate or not, until we do, we still need to let a thousand flowers bloom and not try to prematurely close off alternative approaches until we can prove our algorithms and our data with much greater certainty than we do now.

Engineering and a thousand flowers

Letting a thousand flowers bloom does not mean abandoning engineering or engineering methods. It does not mean that we should stop developing algorithms, gathering data, and rigorously testing the results. These are the flowers. If we stop doing these things, we don’t have a thousand flowers, we have none. Not every method we try will work. Not every method that works for one situation will work for the next. Not every method that one team uses successfully will be used successfully by the next. But on balance, applying engineering techniques will make our individual projects better, even if they are not provably universal in application or globally superior in outcomes.

And of course, individual organizations cannot afford to let a thousand flowers bloom internally. There is an overhead to each flower that requires that the number of approaches within an organization be limited. This does not mean they can or should be limited to one, however. The fact is, few engineered approaches to content are universal in scope, and the more universal they attempt to become, the more overhead they tend to have. Letting different content groups with different aims use their own tools and processes is often the lesser of two evils compared to trying to centralize all content production in one complex training-heavy system.

But the issue here is not whether a thousand flowers should bloom in one organization. It is whether a thousand flowers should bloom across the content producing arts. And the fact here is that not only should they, but they certainly will, for there is no method or tool that has achieved anything resembling universal acceptance. Nor has any one of them shown itself to be universal in scope. Different tools and techniques are better at different things.

Certification in tech comm is often compared to certification in project management, where the PMP certification has received widespread industry acceptance. I don’t know enough about project management to judge if it has reached the level of engineering knowledge of civil engineering or medicine, but it is clear that project management is a much narrower field than content creation and that, crucially, it is well positioned to gather useful performance data against which to validate its methods.

Validating a body of knowledge

A profession has a body of knowledge. A body of knowledge consists, essentially, of its proven algorithms and known-good data. A collection of books and articles, however large or well regarded, is not a body of knowledge without proven algorithms and known-good data.

Trying to define a professional body of knowledge, or to certify practitioners based on that body of knowledge, has no societal value until your ability to validate it reaches the tipping point where the social value of standardization outweighs the social value of letting a thousand flowers bloom. None of the content arts is at this point. I doubt they ever will be.

The triumph of tacit knowledge

Here’s why: In the communication arts, some of the greatest practitioners in all fields have been people with no formal training or method. Some of the best novels are written by people who never took a creative writing course. Some of the best manuals and technical books were written by people who had never heard of technical writing, let alone taken a course in it, nor read a book on cognitive psychology, or even cracked a style guide. All of these things may make some people better, but it is still possible to practice that art at a high level without any formal knowledge, based on tacit knowledge acquired over years of teaching, reading, writing, and practicing the subject matter you write about.

The limits of tacit knowledge

This same kind of native practice is just not possible in civil engineering or in medicine. Most of us can build very basic structures and treat simple cuts and colds without formal training, but we can’t build great buildings or cure serious ailments without significant formal knowledge of the properties of materials and structures or the operations of the body and the effect of drugs on the body. Such knowledge cannot be acquired tacitly. It requires careful measurement, exact recording, and formal methods of extrapolation and application. In these professions there is a demonstrable and impenetrable ceiling which no one can pass without formal study, and beyond which no one should be allowed to venture without professional certification.

This simply is not true for content. Whatever benefits content engineering may bring (and it brings many) it is not a prerequisite for peak performance. It may enable many people to perform far better than they otherwise would, but some people can and do produce great works without formal study or knowledge or method.

It may be that all the great works of communication of the past merely represent the ceiling of what tacit knowledge can achieve and that in the future codified communication methods will produce far greater works. But getting there requires proven algorithms and known-good data, and that is tough.

The difficulty of proving communication algorithms

The biggest problem for formalizing methods in communication is that the algorithms are fiendishly hard to define and validate, and the data is fiendishly hard to gather, validate, and interpret. Unless we can make progress on these fronts, we can’t move closer to the point at which we can choose which of a thousand flowers will become our single tree.

Why is it so hard? First, the algorithm for communicating is very hard to define. What facts does one present in what order and with which words to convey an idea to an individual reader? It is impossible to pin down because every reader has a different background, and a different goal in mind as they read. And even the same reader coming back to the content a week later will be different because of what they have learned and forgotten in that week, and how their task has changed since the last time they read.

The hardness of concrete of a given formulation does not change from one building to another or one day to the next. The effect of a drug on a body system may vary within a range between individuals, but only within a range, and probably not from one day to the next.

And those properties are easier to measure. Does a particular block of concrete bear a particular load? You can measure that. Does a particular drug effect a particular cure? You can measure that. Does a particular article enable all users to complete a particular task? You will have a very hard time measuring that. And if you could, what would you be measuring? What quality of the article produced the result you measured?

Not only is it that much harder to measure, but there are many more users and many more tasks than there are types of building material or drugs and organs. Indeed, the variety of users and tasks is so great that it is hard to get enough data points to validate an algorithm. A/B testing can tell you that one formulation works better than another for a particular application (if you have a way to measure successful task completion, which you usually don’t), but that does not prove either choice best, nor does it generalize the algorithm. Worse, in some cases what worked in the past may cease working in the future as the audience changes. (This is particularly true of content marketing, where people quickly get wise to the latest gimmick.) You can certainly generalize and recognize patterns in what works and what does not, but such generalizations are inherently less certain than a specific load bearing number for a specific building material.
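To make the measurement problem concrete, here is a minimal sketch in Python of the kind of A/B comparison described above. Everything in it is hypothetical: it assumes you have somehow recorded task-completion counts for two versions of an article, which, as noted, you usually can’t.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z-statistic for the difference between two task-completion rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both versions perform equally
    p = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: 120 of 200 readers completed the task with version A,
# 150 of 200 with version B. A z-value above roughly 1.96 suggests the
# difference is unlikely to be chance -- but it says nothing about *why*
# version B worked, or whether it will keep working as the audience changes.
z = two_proportion_z(120, 200, 150, 200)
```

Even a clean result here only compares two specific formulations for one audience at one time; it does not generalize into an algorithm.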

Second, it is very hard to measure the actual effect of content. For technical content the measure is: did the user act correctly? But in the real world, not only are we seldom in a position to measure whether the act was correct or not, we are seldom in a position to even know what the intended act was. We can set up artificial lab conditions in which we define the act, but while this may tell us something useful, there are all kinds of uncertainties involved in this kind of measurement.

Third, it is hard to measure the effect of a content production rule or procedure on the writer who produces the content. No matter how much structure we wrap around the process, in the end a writer is a storyteller. How do we measure whether the structure and rules we enforced led them to tell the optimal story? And when they fail to tell the right story, how do we detect that they have failed or why they have failed?

If we can’t measure these things consistently and accurately, we can’t refine the algorithm. And in many cases, we simply can’t. We can observe that all sorts of things worked better under certain local conditions, but understanding exactly why and formulating it as an algorithm that will ensure equal success in all other circumstances simply eludes us.

Good actionable knowledge

This is not to say that there isn’t good actionable knowledge about technical communication, content strategy, and the content arts. There is lots of it. I hope I have been able to make a small contribution to it. The thousand flowers are growing and being well tended. It’s simply that we don’t have the data to prove any one of those flowers to be the one true tree (if, indeed, there actually is one true tree).

Certification in individual methods

This does not mean that there is no role for standardization and certification, or even licensing. Any one of the flowers is an engineering method, and you can standardize a method and certify people as practitioners in the current state of that method. For instance, Agile is one of several competing software development methodologies. Scrum is one of several competing methods for organizing work in an Agile environment. You can be certified as a Scrum Master, demonstrating that you know how to do Scrum in the present state of the art.

But what you can’t reasonably do is bundle up a dozen of the more popular flowers of the moment, wrap a certification around them, and call it a certification for the profession as a whole. You can reasonably say that you need to be a certified Scrum Master to be a Scrum Master, but not that you need to be a certified Scrum Master to be a software developer. This has nothing to do with whether such a certification validly measures the current state of the art for a professional technique. It is simply a matter of not being able to demonstrate that the techniques in your bundle are objectively better than those outside it.

Why bother trying to engineer content then?

If imagination still plays so large a role in content tasks, and if engineering methods are so hard to prove, and therefore to advance, why bother with content engineering at all?

For one thing, even if engineering methods could tell us nothing at all about effectiveness, they can still be used to improve the consistency of what imagination produces. If imagination determines that a certain pattern is the correct way to write a certain topic, then engineering can help ensure that that pattern is followed consistently.
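As a sketch of what that kind of engineered consistency might look like (the pattern and section names here are purely hypothetical), a mechanical check can verify that every topic follows the pattern imagination has chosen:

```python
# Hypothetical pattern: say imagination has decided that a troubleshooting
# topic must contain these sections, in this order.
REQUIRED_SECTIONS = ["Symptom", "Cause", "Remedy"]

def check_pattern(topic_sections):
    """Return a list of problems: sections that are missing or out of order."""
    problems = []
    positions = {}
    for i, name in enumerate(topic_sections):
        positions.setdefault(name, i)  # remember first occurrence of each section
    last = -1
    for name in REQUIRED_SECTIONS:
        if name not in positions:
            problems.append("missing section: " + name)
        elif positions[name] < last:
            problems.append("section out of order: " + name)
        else:
            last = positions[name]
    return problems

# A conforming topic passes; a topic missing its "Cause" section is flagged.
print(check_pattern(["Symptom", "Cause", "Remedy"]))  # -> []
print(check_pattern(["Symptom", "Remedy"]))           # -> ['missing section: Cause']
```

Note that the check says nothing about whether the pattern itself is the right one; it only enforces that whatever pattern imagination has chosen is followed consistently.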

Secondly, until we can achieve a measure of consistency in how we execute the work of imagination, we will not be able to usefully measure anything, and therefore to assess the value of any algorithm that our imaginations may devise.

The value of patterns

At this stage of development (which may be our permanent state) I believe patterns are tremendously important. A pattern is far from having the formality and rigor of an algorithm, but it can do a lot to improve quality and completeness and can form the basis of a repeatable measurement.

Hollywood does not have a particularly high success rate in developing new works of imagination. When it does get a hit, it tries to reproduce the pattern as many ways as it can until all appetite for the pattern has been wrung out of the marketplace. Not all attempts to follow the pattern are equally successful, but it is very clear that attempting to define and follow the patterns of hit shows has a better success rate than trying to launch entirely new patterns.

This does not mean that we have found the story algorithm, the algorithm that would ensure that every movie, book, or TV show will be a hit, but what we have seen is that in a field in which a thousand flowers bloom, it is possible to identify and reproduce the better flowers, at least for a while. This is an absolutely valid application of content engineering, even if it does not produce, and may never produce, the grand unifying story algorithm.

Yes, we can apply engineering methods to the improvement of content. But we are not at the point, and may never reach the point, of having a rigorously proven set of algorithms and data that can form the basis of a socially justifiable certification of technical writers or other communication professionals.


11 Responses to Can Content be Engineered; Can Writers be Certified?

  1. John Crossland 2016/05/16 at 18:12 #

    Interesting post, Mark. Could content engineering in technical communication be achieved through communication component reuse with copyleft licensing? Working with hardware engineers, I learnt how few hardware designs were original work, how many common components were reused, and how much badge engineering happens. Similarly, working with software engineers on open source code, I saw how much gets reused and integrated. But most organizations have copyright licensing for product information and documentation.

    • Mark Baker 2016/05/17 at 09:57 #

      Thanks for the comment, John.

      When it comes to reuse, we need to make a distinction. The act of reuse can clearly be engineered. But can the creation of content through reuse be engineered? Most reuse today is ad hoc, not engineered. Engineered reuse would imply that the writer was not going out and finding content and including it by hand, but that it was being done algorithmically.

      But at best, this would only represent a very small part of engineered content.

      I don’t think the hardware and software models of reuse really fit. Hardware and software are both constructed out of pieces that perform common tasks. Reuse in these fields simply pulls in existing components to perform these tasks. The unique product at the top of the pile is built out of common functions.

      It is the same in content. The unique story being told is built out of references to common stories. But those references are executed in the reader’s head, not on the writer’s page. We do sometimes refer to other sources for those stories, but we can do that better by linking than by reuse, because most readers only need to follow some of those links.

      This limits the applicability of reuse to certain industrial applications. The replication of patterns, rather than the reuse of content, is the more broadly applicable content engineering pattern.

  2. John Crossland 2016/05/16 at 19:23 #

    Here’s a link to the body of knowledge from TechCommNZ members: http://tcbok.info/

    Getting new graduates in NZ is a challenge with one of the best regional courses on hold. The Christchurch Polytechnic Graduate Diploma in Technical Communication and Information Design courses provided local industry with interns and graduates. The course may restart, but NZ firms were not hiring enough of the course graduates to keep it running last year: https://www.tcanz.org.nz/Story?Action=View&Story_id=73

    I was lucky getting my training and internship in the UK with many US firms operating from the UK as a gateway to the EU, so there was a lot of demand for UK-produced content for EU quotas allowing US firms to sell their products and services into the EU.

    Good regional professional bodies outside of the STC, such as ISTC in the UK & TechCommNZ in NZ have been useful resources for advancing practice & training.

    • Mark Baker 2016/05/17 at 10:03 #

      Thanks for this comment as well, John.

      I think we have to be very clear about what the body of knowledge of a profession is. It is not a record of all that is thought and said by practitioners, or accounts of typical roles and job responsibilities. These may be valuable resources, but they are not a body of knowledge. The medical profession’s body of knowledge, for instance, is a very specific and highly proven collection of information on anatomy, body chemistry, treatments, drugs, etc., all proven with a high degree of rigor, and all essential for professional practice. It is not the collecting that makes a body of knowledge; it is the proving.

      This is not to invalidate any of the resources you mention. It is just to point out that they are a different class of thing from a professional body of knowledge.

  3. Barry Schaeffer 2016/05/16 at 21:12 #

    Good analysis, Mark.

    I would suggest, as a model for the development of content professionals, the experience of the library world with cataloging. Back in the day (1960s and 70s for me) books and periodicals had to be catalogued before they could be included in card catalogs and printed book catalogues.

    I worked with the folks who did this content analysis and cataloging during a stint with Xerox Education Division’s Library Services Group, and found that while many catalogers had MS degrees in Library Science, many did not, and some of the non-degreed catalogers were quite good.

    While the setting, media and technology have changed, the goals have not and we can, I believe, learn much from the cataloging world, some practitioners of which are still around.

    I also want to emphatically agree with Mark that just anyone cannot and should not be trusted with content cataloging. It is a skill that must be taught and letting everyone who sits down at a computer decide how to catalog content is a recipe for chaos.

    • Mark Baker 2016/05/17 at 10:12 #

      Thanks for the comment, Barry.

      That may be a good model to follow, but it is a model for content managers, not content creators. There is a temptation in the tool and consultant community to equate “content professionals” with “content managers”, to look at librarians rather than writers. We do need to remember that these are very different functions that serve very different purposes.

      As to not letting just anyone catalogue content, I can’t agree. If you read David Weinberger’s Everything is Miscellaneous, he cites a number of cases in which collective citizen cataloguing through tagging outperforms professional organization.

      It is easy to miss that content retrieval is a big data problem with a multifaceted solution. Older cataloguing methods were developed for a paper world where the power to collect, store, and manipulate usage and subject matter data from divergent sources was lacking.

      • Barry Schaeffer 2016/05/17 at 10:49 #

        Point well taken on the difference between content creators and managers. I would suggest, however, that cataloging as I understand it is part of authoring, not of library management, although in practice it forms the bridge between the two.

        Catalogers must understand how an author organizes his or her content in order to decide how and to what level of detail it should be cited for location. Once that is done, the librarian is responsible for organizing the vehicle by which users actually search and locate the content they want.

        As for the differences between paper and big data retrieval; while they are real, the techniques appropriate for each are not that different in concept, only in quantity. I once managed online retrieval cataloging for NASA STIF’s RECON system (using 360/65s if you can believe that) and although all location and use of content was electronic, the basic process of deciding how to find it was similar to library cataloging.

        • Mark Baker 2016/07/29 at 13:39 #

          Barry, I have to disagree. Big data retrieval is very different in concept from anything based on cataloguing. Cataloguing involves abstracting out a tiny percentage of the content and casting it into universalized terms. Readers then search that minimal and abstract content set, hoping that their understanding of the universalized terms matches that of the cataloguer, and that the cataloguer thought the same things worth cataloguing as they do. The problem with that is not merely that it is difficult to do, but that it is inherently error prone and uncertain. Big data retrieval refuses to reduce or to abstract the content. It does not prejudge what is relevant or how relevant things should be named. It takes in the whole, and it compares it with every other whole, and every searcher with every other searcher. It is holistic and probabilistic as opposed to selective and deterministic. The two models could not be more different.

  4. Alex Knappe 2016/05/17 at 10:31 #

    Hi Mark,
    essentially tekom Europe is already doing such a certification, with quite some success:
    http://competences.technical-communication.org/profiling-tool.html

    As you can see (if you visit the URL), there are a lot of fields of expertise for a professional. But no expertise in writing "good" documentation is requested. And there shouldn’t be.

    As an engineer you aren’t taught how to construct "good" structures. As a software developer you aren’t taught how to code "good" software.
    All you’re taught is the basic background (physics, mechanics, mathematics, writing, etc.) and some best practices.

    If you want good results on any construction, code, documentation or what else, you will need talent, experience and brainpower.

    I personally draw the line between knowledge of the basics of the profession and the skill to actually use those basics to reach good results. The first part can be measured, put into numbers and certified. The second part is the one that only time and feedback tells.

  5. Sandhya 2016/07/29 at 12:46 #

    Hi Mark

    Here’s a query that I had. For content engineered in a SharePoint environment, is there any way that content quality checking can be automated through plug-ins which can make minimal edits in the article? Are there any recommendations for this? This mostly has to do with Knowledge Base articles.

    • Mark Baker 2016/07/29 at 13:28 #

      Thanks for the comment, Sandhya.

      I’m not a SharePoint expert, so I don’t know the answer to that. Someone else might, but this is probably not the best place to ask that question. I’d suggest looking for a SharePoint forum.