Crowdsourcing with Amazon’s Mechanical Turk

By Elizabeth

Someone recently asked me to discuss services like Amazon’s Mechanical Turk. If you’re just starting your company, it can be a great way to get some of the more tedious tasks done quickly.

Quick background:

Mechanical Turk is an on-demand platform for work: you submit jobs, and workers on the platform complete them for an extremely low cost. In many cases, we’re talking on the order of pennies. This labor-on-demand concept is a bit vague, but basically, you can submit any job that can be done on the web. Examples include categorizing retail items as “red” or “green,” or asking people to do some quick web research and write down the results. Most gigs are mini, taking seconds or minutes to do. Workers who do these gigs come from all over the world, so your gig can be done while you’re asleep.

Mechanical Turk has competitors such as CrowdFlower and CloudCrowd, though I’ve never tried either of them.

Sorting Dresses:

On our site Shiny Orb, we have an attribute-based search, where brides and bridesmaids can find wedding apparel based on dress criteria such as length or neckline. This requires us to tag every dress with a length, a neckline, and a sleeve type. Short of building visual algorithms to scan each photo and return the correct attributes, we decided to try Mechanical Turk to categorize our dresses.

Quality

We quickly found that the biggest problem with this crowdsourcing concept is monitoring quality.  How could we make sure that each worker categorized the length, neckline, and sleeves correctly?  In fact, in glancing at the results, a few workers just randomly selected categories!  So, we did a few experiments to try to improve quality.

Payment — how much does this matter?

In short, we’ve found that price makes no difference to quality when you’re talking about super cheap, super fast gigs.

For Shiny Orb, we ran two price tests.  We first paid $0.03 to get the length, neckline, and sleeves classifications for each dress.  For the second test, we decided to offer $0.01 for all three.  We found no difference in quality, and found that offering less payment actually makes it easier to deal with workers if you have to reject their work.  (Mechanical Turk allows you to reject incorrect work, and when you do so, workers complain more if the compensation is higher.)  The downside to offering less compensation is that fewer workers do your gigs, making it slower to receive results.  Still, we had no problem getting all dresses categorized within half a day regardless of the price.
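
To make the price comparison concrete, here is a minimal sketch of the arithmetic in Python. The batch size of 100 dresses and the function names are only illustrative, and the totals cover worker payments alone, ignoring any platform fees.

```python
def batch_cost_cents(num_dresses, reward_cents, workers_per_dress=1):
    """Total worker payout in cents (platform fees are NOT included)."""
    return num_dresses * reward_cents * workers_per_dress


def format_dollars(cents):
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


# Illustrative batch of 100 dresses, one worker per dress.
print(format_dollars(batch_cost_cents(100, 3)))  # $3.00 at $0.03 per dress
print(format_dollars(batch_cost_cents(100, 1)))  # $1.00 at $0.01 per dress

# Same batch at $0.01, but with three workers per dress.
print(format_dollars(batch_cost_cents(100, 1, workers_per_dress=3)))  # $3.00
```

Working in whole cents avoids floating-point rounding surprises when the per-gig reward is this small.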

Clarity — how much does this matter?

From our tests, clarity affects quality more than anything else. By that I mean we saw a significant improvement in results after clarifying the definitions for our categories and placing those definitions front and center.

In particular, in our first Turk test, one of the choices we had for neckline and sleeves was “Other,” which workers tended to select a lot.  Our success rate of correct categorizations for that test was:

92% for length, 64% for neckline, and 64% for sleeves

In our second test, we made it very clear that “Other” basically shouldn’t be chosen, which increased our success rate in the neckline and sleeves categories to:

90% for length, 86% for neckline, and 87% for sleeves
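
One way to compute success rates like these is to hand-check a sample of dresses and compare the workers’ answers against them. The sketch below assumes that approach; the dress IDs, labels, and function name are made up for illustration and are not real data from the tests.

```python
def success_rate(worker_labels, correct_labels):
    """Fraction of hand-checked dresses where the worker label matches."""
    if not correct_labels:
        raise ValueError("need at least one hand-checked dress")
    matches = sum(
        1
        for dress_id, truth in correct_labels.items()
        if worker_labels.get(dress_id) == truth
    )
    return matches / len(correct_labels)


# Hypothetical neckline labels for four dresses.
correct = {"d1": "strapless", "d2": "v-neck", "d3": "halter", "d4": "strapless"}
workers = {"d1": "strapless", "d2": "other", "d3": "halter", "d4": "strapless"}

print(f"{success_rate(workers, correct):.0%}")  # 75%
```

Running this once per attribute (length, neckline, sleeves) gives one percentage for each.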

Conclusion

Lastly, we found that in order to get these fairly high-quality numbers, we had to run the same gig with three workers, i.e., have three workers categorize each dress. We took the majority “vote” of the categories and found this to improve our quality significantly (as opposed to having just one worker do each gig). $3 for a good categorization of 100 dresses is great! Takeaways: run a gig 3 times, pay as little as possible, and be super clear in creating your gig.
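
The majority vote can be sketched in a few lines of Python. This is an illustration, not the exact script behind the numbers above: the function names and sample answers are hypothetical, and returning None when no label wins a majority (so the dress can be checked by hand) is an assumption.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by more than half of the workers, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def combine_answers(answers):
    """Merge per-worker answers (attribute -> label) into one label per attribute."""
    attributes = answers[0].keys()
    return {attr: majority_label([a[attr] for a in answers]) for attr in attributes}


# Hypothetical answers from three workers for a single dress.
answers = [
    {"length": "long", "neckline": "v-neck", "sleeves": "sleeveless"},
    {"length": "long", "neckline": "strapless", "sleeves": "cap"},
    {"length": "long", "neckline": "v-neck", "sleeves": "short"},
]

print(combine_answers(answers))
# {'length': 'long', 'neckline': 'v-neck', 'sleeves': None}
```

Here the three workers all disagree on sleeves, so that attribute comes back as None and would need a manual look.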

For more tips and resources on starting a web business without coding, visit LaunchBit. 

  1. Wendy says:

    This might be a different issue altogether, but I just want to throw it out there… We’re running into an issue with our site. If a user selects an item from seller #1 and another item from seller #2 and adds both items to the shopping cart, the user needs to make two credit card transactions. It’s not possible for multiple sellers to receive a portion of a single credit card charge. The only way around this is to have the credit cards processed by our secured credit card system (which costs us a 2.8% processing fee, and we have to charge our sellers 3% just to cover the cost; obviously Amazon and eBay get much better rates from the credit card companies due to their huge transaction volumes) and then distribute the funds to each seller using PayPal’s mass distribution function. It is such a painful way to do it because, on a daily basis, we have to execute the PayPal mass distribution and randomly check on it to make sure it is working properly.

    Originally we wanted to use the Etsy.com model: each seller deals with their own store and merchandise and handles their own payment processing, in order to bypass our secured system and save some processing fees… It’s a win-win situation for both sides, since PayPal does not charge any processing fee from account to account, or, if sellers have their own secured credit card processing account, they can absorb their own processing fees.

    Our programmer is having a hard time figuring out how to make it work without all this trouble. (Etsy is able to do it, but according to our programmer, they spent tons of time developing their code, and if we want to be a lean startup, we need to go with what we have for now, push our site live, and work on the back end later to make improvements for the 2.0 version.) Any suggestions? Feedback? My concern is to save the hassle, but most importantly to save sellers money on the 3% processing fee….

  2. Jennifer Chin, Elizabeth Yin says:

    @Wendy, wow, this sounds like a topic for a whole new post! Without knowing the details of what you are building, it sounds like this site isn’t live yet, and you haven’t yet validated the market? (correct me if I’m wrong) To keep the user experience simple, I completely agree — seems best to just have one payment system and pass the cost on to the seller. I don’t know what your business model is, but were you planning on charging the seller for selling on your site? If so, then it doesn’t seem like it would matter if you pass the cost on to the seller? If you were not planning on charging sellers, then maybe you can take a loss on the 3% fee in the beginning while you are proving out the concept? I don’t know much about your product, but from the little I know, it sounds like the #1 thing you’re trying to prove right now is whether people will come to your site and buy something — anything. And to get there, logistical details not related to the customer experience don’t matter right now. I would be happy to talk with you in more detail over email — feel free to drop me a note at hello@launchbit.com. cheers! :)

  3. Ed Kohler says:

    Nice write-up. Regarding Crowdflower: Crowdflower is a service that resides a level up from Mturk. They make it easier to run HITs, can automate the flow of HITs from RSS feeds, and other fun stuff. And, they have built out quality controls that may help you get the results you’re looking for with less duplication of work. In some cases, that may be valuable. Yes, you’ll pay a bit more for this, but it may be a better choice than coding more complex mturk interactions using the Mturk API.

  4. Jennifer Chin, Elizabeth Yin says:

    @Ed, thanks for your thoughts on Crowdflower. Since we’ve never used it ourselves, this is super helpful. Do you know how the costs compare to say the cost of running the same job on Mechanical Turk multiple times?
