Do Structured Writing and Crowdsourcing Mix?

By Mark Baker | 2011/11/20

Are structured writing and crowd-sourced content on divergent paths, or can you have both? It’s a pretty hot topic right now. Sarah O’Keefe recently tweeted:

Must push XML and structure out to masses and need better tools for that @boses #lavacon

Linda Urban recently tweeted this:

Yep! RT @finiteattention: The problems of crowdsourced user assistance:

And most recently, Tom Johnson has blogged about Wiki Culture, Reader/Writer Distinctions, and Divergence from Structured Authoring. No surprise that this is a concern, since structured writing and wikis are two of the hottest trends in technical communication at the moment.

So here is the key question: can you get structured data from the crowd? The answer is an unequivocal yes. In fact, it is not only possible, it is common. So common, in fact, that you have been involved in it many times, and probably in the last week.

Before we go there, though, it is important to establish what we mean by structured writing. The term structured writing is used in several different ways. Each is legitimate in its own way, but it is important to distinguish them if we want a useful answer to the question of whether structured writing and crowd-sourcing can mix. Structured writing can simply mean writing to a consistent template. As I have argued recently, real topics (as opposed to shredded books) tend to naturally conform to a type. In this sense, every cookbook is an example of structured writing.

Structured writing also means creating content in a format that can be read and processed by computers. The most common means of structuring content in this sense is XML. In this sense, all XML is structured content. But that would mean that XHTML is structured content, since it is XML, and most people in the structured content business will tell you that HTML, whether prefaced by an X or not, is not structured.

So what does structured mean in the sense that excludes XHTML from the set of structured markup languages? It means a markup that is more specific to the content, markup that imposes limits and restrictions on the author, markup that tells you something about what the content means, not simply about how it should be displayed.
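To make the distinction concrete, here is a minimal sketch of the same recipe marked up both ways. The `<recipe>`, `<ingredient>`, and `<step>` elements and the `quantity` and `unit` attributes are names invented for illustration, not part of any real schema. The point is only to show what a program can and cannot learn from each version.

```python
import xml.etree.ElementTree as ET

# Generic, display-oriented markup: it says how the text is laid out.
XHTML_RECIPE = """
<div>
  <h1>Pancakes</h1>
  <ul>
    <li>2 cups flour</li>
    <li>1 cup milk</li>
  </ul>
  <ol>
    <li>Mix.</li>
    <li>Fry.</li>
  </ol>
</div>
"""

# Hypothetical content-specific markup: it says what each piece means.
SEMANTIC_RECIPE = """
<recipe>
  <title>Pancakes</title>
  <ingredient quantity="2" unit="cup">flour</ingredient>
  <ingredient quantity="1" unit="cup">milk</ingredient>
  <step>Mix.</step>
  <step>Fry.</step>
</recipe>
"""


def list_items(xml_text):
    """Return the text of every <li>, in document order.

    This is about all a program can say of the generic version:
    ingredients and steps are indistinguishable list items.
    """
    root = ET.fromstring(xml_text)
    return [li.text for li in root.iter("li")]


def shopping_list(xml_text):
    """Return (name, quantity, unit) for every <ingredient> element."""
    root = ET.fromstring(xml_text)
    return [
        (item.text, float(item.get("quantity")), item.get("unit"))
        for item in root.iter("ingredient")
    ]
```

Calling `list_items(XHTML_RECIPE)` mixes ingredients and steps together, because the markup never said which was which. Calling `shopping_list(SEMANTIC_RECIPE)` returns the ingredients as data, because the markup did.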

It is probably easiest to make the point by way of analogy. Consider Microsoft Excel: it allows you to create spreadsheets. If a ledger book is structured in the first sense (the content is organized in a consistent way), Excel allows you to make it structured in the second sense, by marking up the rows of numbers in a form that can be read and processed by the computer.

But Excel is generic. It imposes almost no limits or restrictions on the data you can enter. You can do pretty much anything you want with Excel: do your taxes, catalog your record collection, keep score for your softball league. But you do have to do it yourself. If you wanted to use Excel to do your income taxes, for instance, you could set up a spreadsheet that looked like the tax form and did the calculations, but it would be a lot of work. For doing your taxes, you would be much better off buying a tax preparation program like TurboTax.

TurboTax is structured in the third sense: the structure tells you what the content means and imposes limits and restrictions on the data the user can enter. TurboTax is about taxes from the ground up. Every field in the program is pre-coded to do exactly one thing. It comes with a huge amount of validation capability as well. It can tell you if you have claimed a deduction you are not entitled to. It can also optimize deductions between spouses. It can do all of this because it is built to do one thing and one thing only: your taxes.

TurboTax is much more powerful than Excel for doing taxes. But it is also much more limited than Excel. You can’t use it to catalog your record collection or keep the scores for your softball league. This is the essential point about structure: structure means limits. The more structured something is, the more limits it has. Limits are good. The more limits you put on data, the more reliable that data becomes, and the more reliable data becomes, the more processes you can apply to it reliably.

Structure equals limits; limits equal reliability; reliability equals processability – that is all ye know on Earth, and all ye need to know.
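The chain from limits to reliability to processability can be sketched in a few lines. This is an illustration only: the `TaxEntry` type, its fields, and the `MAX_DONATION_SHARE` rule are all made up for the example and do not reflect any real tax rule or any real tax package.

```python
from dataclasses import dataclass

# Generic container, like a spreadsheet row: it accepts anything at all,
# so nothing downstream can rely on what it finds there.
spreadsheet_row = {"income": "about 50k", "donations": -3, "notes": "softball scores"}

# Hypothetical limit, invented for this example.
MAX_DONATION_SHARE = 0.75


@dataclass(frozen=True)
class TaxEntry:
    """A purpose-built record: every field means exactly one thing."""

    income: float
    donations: float

    def __post_init__(self):
        # Limits: each field must be a non-negative number.
        for name in ("income", "donations"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number, got {value!r}")
            if value < 0:
                raise ValueError(f"{name} cannot be negative")
        # Limits: a rule that relates one field to another.
        if self.donations > self.income * MAX_DONATION_SHARE:
            raise ValueError("donations exceed the allowed share of income")


def total_donations(entries):
    """Processability: safe to sum, because every entry passed the limits."""
    return sum(entry.donations for entry in entries)
```

Trying to build a `TaxEntry` from the values in `spreadsheet_row` raises a `ValueError`, which is the whole point: the limits are what make it safe for `total_donations` to process whatever entries it is handed.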

In Canada, the Canada Revenue Agency (our version of the U.S. IRS and the UK Inland Revenue) certifies certain tax preparation packages for submitting tax forms using the E-File service. (I’m sure most other countries have something similar.) Even if you created an Excel spreadsheet to do your taxes, Revenue Canada would not let you submit it through E-File. The strict limits that TurboTax and other packages place on the data that is entered into them make them more reliable, to the point where Revenue Canada is willing to allow that data to be fed directly into their tax processing system.

Which is the point we have been working towards: how crowd-sourcing and structured data can mix. Because that is exactly what Revenue Canada, and probably every other first-world tax authority, is doing: they are crowd-sourcing tax data. Millions of ordinary taxpayers around the world are entering tax data directly into government computer systems, speeding up tax processing and saving millions by avoiding the need to capture data from printed forms.

Governments are not alone in this. Banks crowd-source financial data from ATMs, point of sale terminals, and online banking systems. Amazon crowd-sources order entry. The airlines crowd-source flight booking and check-in. Filled out a form online lately? Welcome to the wonderful world of crowd-sourced structured data.

So, can crowd-sourcing and structured data mix? Absolutely. In fact, structured data is an absolute requirement for crowd-sourcing. Revenue Canada, your bank, Amazon, your airline, and all the other businesses that now have you do their data entry for them, are not going to accept an Excel spreadsheet of your taxes, your transactions, your order, or your check-in. Nor is Revenue Canada going to accept your taxes through, or your bank allow you to withdraw money through, an airport check-in terminal. Each system is specific to its purpose. Each institution is only going to accept highly structured data: reliable data that is specifically structured and verified according to their exact specifications.

Excel may be ubiquitous, and generic enough to use for almost any purpose, with an appropriate amount of ingenuity and effort, but being generic and ubiquitous are not the keys to successful crowd-sourcing. Quite the opposite: successful crowd-sourcing of data requires highly precise and specific structure that ensures the reliability of the data.

How does this translate to the crowd-sourcing of tech pubs content? That is a subject for another post.


4 thoughts on “Do Structured Writing and Crowdsourcing Mix?”

  1. Tom Johnson

    I like your point about the integration between crowd sourcing and structured content. You’re right that forms help structure content. The neat thing is that forms also simplify the authoring experience. I think I’ll look for more form-based methods as I work with community authors. One extension I want to implement on the wiki (MediaWiki) is something called Semantic Forms.

    1. Mark Baker Post author

      Hi Tom. Thanks for the comment.

      I agree, the great thing about forms is that they both provide structure and make authoring easier. I think that one of the great mistakes I see in the design of structured authoring systems is that they are designed to make publishing easier rather than to make authoring easier. The result is that most structured authoring systems are hard for authors to learn, which means they don’t get used, or don’t get used well.

      The proper way to design a structured authoring system is to figure out what information structures you need, and then figure out what will be the easiest and most intuitive way for authors to provide you with that information. Only then should you worry about how to transform that authored content into publishable form.

      That may make writing your publishing scripts harder, but that’s OK. You write the publishing scripts once. People write content every day. If people would approach structured writing as an exercise in data gathering rather than an exercise in automated publishing, things would work a lot more smoothly.

  2. Randy Burgess

    A problem I see is that people who are not technical writers, but producing content such as journal articles or books or edited collections within their profession (e.g. clinical psychology), see this sort of advice and think it might save them time and money if they could just change their process for producing a jointly produced article or book to include the kind of crowdsourcing made possible by a wiki.

    But as observed in “How content producers get collaboration wrong,” on this same blog, sometimes a small team is better than crowdsourcing. E.g. Wikipedia can produce as many content guidelines as it wishes, but content there will always be uneven because each article is by necessity ad hoc in structure, vocabulary level, etc.

    This only occurs to me as an issue because I am investigating the boundary between collaborative writing/editing practices in technical writing versus the relative lack of same in other fields, e.g. the professions, media companies, etc.

    1. Mark Baker Post author

      Hi Randy,

      Thanks for the comment. There is definitely a tension between crowdsourcing and effective collaboration. Part of the point I was trying to make is that you can use structured writing techniques to constrain the crowdsourced material to improve collaboration. Could people in other professions use these techniques themselves (thus cutting out writers) or would they need writers (with a suitable background) to set this up for them? I honestly don’t know at this point. Certainly, I don’t know too many writers who are currently qualified to do this kind of thing.

      But are there challenges to the professional writer’s role in this and other fields, thanks to the way in which the Web changes how content is created and curated? Certainly there are: very serious challenges. More and more, people are turning to the web instead of to docs when they need product information or help. I know this, because this is what I do myself. Yes, the web is doing a job we used to do, or at least a part of it. Where does that leave us? I don’t know. One door closes, and another opens, though not always in perfect synchronization. One thing I am sure of, though: trying to defend our old turf is not going to work, and may just make us miss the opening of the next door.

