Someone recently asked me to discuss services like Amazon’s Mechanical Turk. If you’re just starting your company, they can be a great way to get some of the more tedious tasks done quickly.
Mechanical Turk is an on-demand labor platform: you submit jobs, and workers complete them for an extremely low cost, in many cases on the order of pennies. The labor-on-demand concept sounds vague, but basically, you can submit any job that can be done on the web. Examples include categorizing retail items as “red” or “green,” or asking people to do some quick web research and write down the results. Most gigs are mini, taking seconds or minutes to complete, and because workers come from all over the world, your gig can get done while you’re asleep.
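If you want to post gigs programmatically rather than through the website, here is a rough sketch using today’s boto3 MTurk client. This is purely illustrative: the dress-length question, the parameter values, and the choice of categories are all assumptions, not our actual gig.

```python
import boto3

# Connect to the Mechanical Turk requester API
# (assumes your AWS credentials are already configured).
mturk = boto3.client("mturk", region_name="us-east-1")

# A minimal QuestionForm asking a worker to pick a dress length.
# In practice you'd tie each HIT to a specific dress photo.
question_xml = """<?xml version="1.0" encoding="UTF-8"?>
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>length</QuestionIdentifier>
    <QuestionContent><Text>What is the length of this dress?</Text></QuestionContent>
    <AnswerSpecification>
      <SelectionAnswer>
        <Selections>
          <Selection><SelectionIdentifier>short</SelectionIdentifier><Text>Short</Text></Selection>
          <Selection><SelectionIdentifier>knee</SelectionIdentifier><Text>Knee-length</Text></Selection>
          <Selection><SelectionIdentifier>long</SelectionIdentifier><Text>Long</Text></Selection>
        </Selections>
      </SelectionAnswer>
    </AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = mturk.create_hit(
    Title="Categorize a dress photo",
    Description="Pick the length that best matches the dress in the photo.",
    Reward="0.01",                    # one cent per assignment
    MaxAssignments=3,                 # three workers per dress (more on this below)
    AssignmentDurationInSeconds=300,  # worker has 5 minutes per assignment
    LifetimeInSeconds=86400,          # keep the HIT available for a day
    Question=question_xml,
)
print("HIT created:", hit["HIT"]["HITId"])
```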
On our site Shiny Orb, we have an attribute-based search, where brides and bridesmaids can find wedding apparel based on dress criteria such as length or neckline. This requires us to tag every dress with a length, a neckline, and a sleeve type. Short of building computer-vision algorithms to scan each photo and infer the correct attributes, we decided to try Mechanical Turk to categorize our dresses.
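The search side of this is simple once the tags exist. Here is a minimal sketch of attribute-based filtering in Python, assuming each dress is just a dict of tags; this is a deliberately simplified stand-in, not our actual implementation.

```python
# Each dress is a dict of attribute tags (made-up sample data).
dresses = [
    {"name": "Dress A", "length": "long", "neckline": "v-neck", "sleeves": "sleeveless"},
    {"name": "Dress B", "length": "short", "neckline": "strapless", "sleeves": "sleeveless"},
]

def search(items, **criteria):
    """Return items whose tags match every given criterion."""
    return [d for d in items if all(d.get(k) == v for k, v in criteria.items())]

print(search(dresses, length="long"))  # -> [{'name': 'Dress A', ...}]
```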
We quickly found that the biggest problem with this crowdsourcing concept is monitoring quality. How could we make sure that each worker categorized the length, neckline, and sleeves correctly? In fact, glancing at the results, we saw that a few workers had simply selected categories at random! So, we did a few experiments to try to improve quality.
Payment — how much does this matter?
In short, we’ve found that price makes no difference to quality when you’re talking about super cheap, super fast gigs.
For Shiny Orb, we ran two price tests. We first paid $0.03 to get the length, neckline, and sleeves classifications for each dress. For the second test, we decided to offer $0.01 for all three. We found no difference in quality, and found that offering less payment actually makes it easier to deal with workers if you have to reject their work. (Mechanical Turk allows you to reject incorrect work, and when you do so, workers complain more if the compensation is higher.) The downside to offering less compensation is that fewer workers do your gigs, making it slower to receive results. Still, we had no problem getting all dresses categorized within half a day regardless of the price.
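As a side note, if you use the API rather than the website, rejecting an assignment looks roughly like this with today’s boto3 client; the assignment ID and feedback string below are made up.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# Reject a single submitted assignment; the worker is not paid
# and sees the feedback you provide. (AssignmentId is hypothetical.)
mturk.reject_assignment(
    AssignmentId="3XJ2EXAMPLEASSIGNMENTID",
    RequesterFeedback="The selected neckline does not match the photo.",
)
```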
Clarity — how much does this matter?
From our tests, clarity affects quality more than anything else. By that I mean, we found significant improvement in results by clarifying the definitions for our categories and placing those definitions front and center.
In particular, in our first Turk test, one of the choices we had for neckline and sleeves was “Other,” which workers tended to select a lot. Our rate of correct categorizations for that test was:
92% for length, 64% for neckline, and 64% for sleeves
In our second test, we made it very clear that “Other” basically shouldn’t be chosen, which increased our success rate in the neckline and sleeves categories to
90% for length, 86% for neckline, and 87% for sleeves
Lastly, we found that in order to get these fairly high-quality numbers, we had to run the same gig with three workers, i.e., have three workers categorize each dress. We took the majority “vote” among their answers for each category and found this improved our quality significantly compared with having just one worker do each gig. At $0.01 per gig and three workers per dress, that’s $3 for a good categorization of 100 dresses, which is great! Takeaways: run each gig three times, pay as little as possible, and be super clear in creating your gig.
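The majority vote itself is trivial to compute once you’ve collected the results. Here is a sketch in Python; the worker answers are made-up sample data.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among workers, or None if no majority."""
    top, top_count = Counter(answers).most_common(1)[0]
    # With three workers, a category wins only if at least two agree.
    return top if top_count >= 2 else None

# Three workers' neckline answers for one dress (made-up data).
worker_answers = ["v-neck", "v-neck", "strapless"]
print(majority_vote(worker_answers))  # -> "v-neck"
```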
For more tips and resources on starting a web business without coding, visit LaunchBit.