reCAPTCHA
Friday, September 10th, 2010

Anti-spamming software is an integral part of many websites, often taking the form of "type in the distorted text you see". The assumption is that this is – hopefully – too difficult for a machine to do quickly and trivially, but easy for humans with our better pattern-recognition skills.
reCAPTCHA takes this a step further. The second word is used to confirm that I am indeed human and not a spammer, while the first word is text that a computer failed to read during an attempt to digitize historic documents. In other words, reCAPTCHA provides a (free!) way to get humans to help digitize old text from before the computer era. This is a great idea, but in practice it does throw up some interesting examples.
It's obviously rather harsh to expect someone to enter (i) a mathematical formula with superscripts and subscripts and (ii) Greek text, particularly as the text box only accepts plain English with no formatting. More worryingly, it got me thinking about whether even more inappropriate (i.e., offensive) text might find its way to the user. I followed this up and received the following response:
“The facility does have filters in place to prevent offensive words coming up… some of the words in these texts are difficult for computers to process, we are using the results of your efforts to help decipher them.”
Which makes me wonder: if the texts are too difficult for computers to process, how can we be truly confident that any automated filter will catch offensive words in them? And how should a tester go about raising a concern like this: is it a real defect? Or just a worry…?



