How do CAPTCHAs test our human identity by making our language unrecognizable?

Wherever we go on the internet, we encounter CAPTCHAs, those twisted words that block or allow entry to websites. Need to post an ad on Craigslist? There’s a CAPTCHA. Want to comment on an article or blog post? There’s a CAPTCHA. So why do we have them? They were invented to block spamming machines from posting wherever they want. In order to keep out spammers, a CAPTCHA has to effectively test whether you are human or machine. Computer scientists figured out that one of the easiest ways to do that is to use images of language. To defeat spammers, a CAPTCHA takes randomly generated text and distorts its image so that a human can still read it, if only barely, but a computer analyzing the picture cannot.

Even though we read words on the internet, the internet and computers are not made of words. In fact, computers often have a hard time understanding human languages because those languages do not conform to the hard and fast rules that computer programs demand. That’s why programming languages had to be invented: human languages are too irregular. This is also one of the reasons it is so hard to create an intelligent robot. So, CAPTCHAs take advantage of the uniquely human ability to look at letters that have been stretched or manipulated and still decipher which letters they are.

The term CAPTCHA was first used by computer scientists at Carnegie Mellon University in 2000. CAPTCHA is actually an acronym that stands for Completely Automated Public Turing test to tell Computers and Humans Apart. That’s a pretty straightforward title, except for the Turing test part. What exactly is a Turing test? Alan Turing was a computer theorist who invented the Turing test, which humans use to see if a machine can converse like a human being. A CAPTCHA is actually an inverted Turing test whereby a machine tests to see if you are human or not, but the core principle remains.

You may wonder why CAPTCHAs don’t use images of things other than letters, like a beach or a dog. The trouble is that such images rarely have one exact answer. A picture of a beach could generate a wide variety of responses (sea, sand, sunny, ocean, and so on), but a CAPTCHA that uses letters is paired with one particular answer. Letters, unlike pictures, can be deciphered by the human eye and specified precisely by whoever creates the CAPTCHA.
Do you think computers will ever be able to decipher CAPTCHAs?


  27. Demosthenes -  May 20, 2012 - 12:16 pm

CAPTCHAs are good. It doesn't matter if some people can't get in, as long as they keep a bigger percentage of computers out. Computers will eventually learn to decipher the CAPTCHAs, as they're not that hard. They're designed so that any human can understand them. And as robots and computers are getting closer and closer to human intelligence every day, they'll soon learn how to bypass them. Then they'll become harder. But this is one of humanity's only advantages over machines.

    Cris -  October 13, 2011 - 7:16 pm

The reason they are so hard for us is that some computer programs can convert images to text they can understand. There are business card readers that scan business cards and convert the image into a text format the program can decipher and use. The CAPTCHAs have to be difficult for that reason. Although yeah, it is very annoying when they are impossible to read.

    HU_Soldier -  October 13, 2011 - 2:51 pm

Well, CAPTCHAs are what prevent your favorite forum/place from being spammed. Although, I think stretching them out more would be better; mashing the words SO CLOSE that they're hard for a human to read seems unnecessary.

  60. GoldFish -  October 13, 2011 - 2:09 pm

    I have an alternative theory here, maybe it’ll lead to bloodshed. You know Google Books, right? It is my belief that those bits and pieces that Google’s own OCR fails on are, after a little added sugar ’n’ spice (i.e. added lines, transforms, whatnot), thrown at us in the form of captchas. The first few entries are automatically accepted (combined with previous, mostly-known entries, so that there are still entries that are required to match); this way, the unreadable words of certain books are read by _you_ instead of the OCR software. It’s probably a cost-effective solution, as long as there are books to digitise. Or something similar; I’m not sure if it’s Google (though I wouldn’t find it completely unbelievable), but this way the resources used for captchas actually have some added benefit. Just my tuppence.

    DragonDefender -  October 13, 2011 - 1:50 am

I see many people on here saying whether or not they think CAPTCHAs are easy or hard to read. I see others berating the people saying they are hard by responding that we are 'blind' or 'undereducated.' Written language is already difficult for people with dyslexia to decipher, and CAPTCHAs make it all the harder no matter how well educated you may be.
    Have a nice day,
    Dyslectics untie

    Some Kid -  October 12, 2011 - 6:26 pm

I don't understand why people are complaining about how hard it is to decipher CAPTCHAs. I've come across tons of them and never had a complaint. Sure, some might have been tricky, but there was usually a "reCAPTCHA" option, which brought me an easier one with no hassle. And you could always listen to the audio version if you were that illiterate.

    Not pictures but rebi (rebus-pl.) should be more challenging than warped-letters, for a computer, because you interpret the picture for its things and its fun sense, yet the human can’t go wrong– because the final spelling is one definite rebus…

    Another trick they play (or can play) is timing: authentication can be made so slow that even if it were a machine, it’d be a very, very slow hack…

    OLH064 -  October 12, 2011 - 5:24 pm

Soon, we'll have audio CAPTCHAs, since programmers are struggling with audio recognition. Maybe it would play a complex song and ask what instruments were used… Maybe even use the data like RECAPTCHA and help the audio recognition software development.

    Why -  October 12, 2011 - 5:13 pm

I don't see why the black lines have to be over the letters. Even if they were below them or above wouldn't that mess up any program trying to find where the image goes from black to white and trying to "read" the letters that way? And can anyone give examples of some algorithms that are used to try and get around captchas?

    For the most part, anything easy for a human is hard for a computer, and everything easy for a computer is hard for a human. For example, walking, or instantaneously solving complex arithmetic.

  117. Sparta -  October 12, 2011 - 1:24 pm

    Matt -  October 12, 2011 - 11:08 am

I consider myself to be well educated and even I find some CAPTCHAs to be difficult. Sometimes CAPTCHAs even have the oddest symbols (like Greek letters) that I can't figure out how to type. (How do you type theta on a keyboard?!) At times words and letters are so obscured and warped that anyone could find them difficult to read, but this does not mean that Malik, or anyone else who finds them a little unrecognizable, is uneducated.

    The images of objects idea is actually much better than the images of letters and numbers implementation. To handle the problem of ambiguity, simply add a short subtext with the starting letter.


    [image of a beach]
    subtext: First letter is “B”

    [image of a dolphin]
    subtext: First letter is “D”

    [image of a mouse]
    subtext: First letter is “M”

    [image of a chair]
    subtext: First letter is “C”

    This is much better than automated CAPTCHA, since the process of creating examples that are easy for humans to recognize is itself a human endeavor that is easy for humans to do but virtually impossible for machines no matter how advanced they may be.
    This is because while a machine may be able to assign words to images it cannot know if the result would be unambiguous, since there are examples that would be ambiguous even with a starting letter such as “fire” and “flame”. A human would be able to recognize potential ambiguity errors and avoid using such examples.
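The first-letter idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ImageChallenge` class, the image filenames, and the `CHALLENGES` list are hypothetical names invented here, and the list is assumed to be curated by a person who leaves out ambiguous pairs such as "fire" and "flame".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageChallenge:
    """One picture CAPTCHA with a single expected word (hypothetical type)."""
    image_file: str  # hypothetical filename of the picture shown to the user
    answer: str      # the one word a human curator chose for this picture

    @property
    def hint(self) -> str:
        # The subtext shown under the picture to narrow down the answer.
        return f'First letter is "{self.answer[0].upper()}"'

    def check(self, response: str) -> bool:
        # Ignore surrounding whitespace and letter case when comparing.
        return response.strip().lower() == self.answer.lower()


# Assumed to be hand-picked by a human so that each picture has only one
# plausible answer beginning with the hinted letter.
CHALLENGES = [
    ImageChallenge("beach.jpg", "beach"),
    ImageChallenge("dolphin.jpg", "dolphin"),
    ImageChallenge("mouse.jpg", "mouse"),
    ImageChallenge("chair.jpg", "chair"),
]
```

A site would show `image_file` together with `hint` and pass the typed word to `check`. The code does nothing to detect ambiguity; that step stays with the human curator, which is the point of the proposal.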

    Hyudra -  October 12, 2011 - 9:56 am

I think the reason captchas can be so bizarre sometimes is the way people collect/create them. At least one of the major Captcha developers is scanning texts into a digital database, and pieces of those texts (say, two random words) are made into a captcha, which is why you see untypable foreign words or a lot of dates (e.g. 1800 Москва́). The benefit of this system is that it lets the people doing the scanning double-check their work, and it provides an endless database of Captchas.

    Believe it or not, the people who create spam are in a multi-billion-dollar field. It’s in their interest to find programs to get around captchas, and the quality of their work increases every day. To get around this, sometimes the Captcha developers have to find a new angle (another way of obscuring letters that confounds the latest program) or they simply have to make it harder to read. When this goes too far, people can’t figure out what it says.

    Some captchas are used to decipher a massive library of scanned text. Someone has scanned thousands of books, and when the computer runs into text it can’t decipher, the words end up being displayed as captchas.
    When you enter the text, it submits your entry as a possible solution. I don’t know a lot more about it, but thanks for helping with the project- Didn’t know you were on the payroll did ya? We’ll get your check right out to you!

  146. Rachel -  October 12, 2011 - 8:43 am

    “Completely Automated Public Turing test to tell Computers and Humans Apart” – shouldn’t that be CAPTTTTCAHA, or at least CAPTTTCHA?

  147. Chap -  October 12, 2011 - 8:30 am

    Isn’t it something to learn that CAPTCHAs exist to test our ability to recognize letters and so prove what we are not: machines.
    Fancy, machines testing humans! Then again, why not? They are humans’ creations. Should we like it, or be scared that our lives are dictated (in this way, at least) by them?
    Then yet again, this raises the question of who the machine is: man or machine!

    Some computers can already beat CAPTCHA. A few slipped through the one I used to have. There are other, arguably less frustrating ways to trick bots. Math problems, simple tasks or questions. One of my favorites: Are you human? () Yes. () No. OR Uncheck this box if you are human.
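Two of the alternatives mentioned above, a simple math problem and the "uncheck this box" trick, might look roughly like this in Python. It is a minimal sketch, not how any particular site does it: the function names and the `not_human` form field are made up for the example, and the submitted form is assumed to arrive as a plain dictionary.

```python
import random


def make_math_challenge(rng: random.Random):
    """Return a small addition question and its expected numeric answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} plus {b}?", a + b


def passes_math(expected: int, response: str) -> bool:
    """Accept the response only if it parses to the expected number."""
    try:
        return int(response.strip()) == expected
    except ValueError:
        return False


def passes_checkbox(form: dict) -> bool:
    """'Uncheck this box if you are human.'

    Assumes a pre-checked checkbox named "not_human" (a hypothetical field
    name). Browsers only submit a checkbox field when it is checked, so a
    human who unchecked it sends no such field at all.
    """
    return "not_human" not in form
```

A bot that blindly submits every field it finds leaves the box checked and fails `passes_checkbox`, while a bot written for this specific form would not, so this deters only generic spam scripts.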

    Because of the Spammers. If the Spammers went away, so would the Captchas. But you know they won’t. You might want to check out “Recaptcha,” which uses Captchas to help digitize old newspapers, etc.

  159. Kahlon -  October 12, 2011 - 7:25 am

    A fascinating extension of CAPTCHA technology is reCAPTCHA, designed to “stop spam and read books.”

    Many ongoing archival projects are scanning old books, newspapers and transcripts. But the OCR software digitizing the text may not recognize some words because they appear smudged, blurred or faded.

    So reCAPTCHA gets web users to decipher these unrecognizable words. Thus not only do we prove we’re human (to post ads or add comments), but we also add to the world’s knowledge base.

    See more here: http://www.google.com/recaptcha/learnmore

    Of course computers will someday be able to decipher CAPTCHAs. I’m a programmer, and while I may not have experience in the area of text recognition, I keenly appreciate the advancements we have yet to make. For example, the military is developing robots capable of pathfinding across rugged terrain in order to engage the enemy on their own. The mind is a machine, and there is no reason that with enough research we cannot create artificial intelligences with similarly advanced decision-making processes. Watson is a prime example of future possibilities.

    Most of the time when I fill out a CAPTCHA test I remark to myself how silly it is that computers can’t already solve these themselves. I take it as a sign of how limited our technological progress has been. We have much left to learn.

  191. Andy -  October 12, 2011 - 1:24 am

    Just about every image-based CAPTCHA technology out there has been defeated by spammers these days. Even Google’s own high tech tool that they offer freely is overcome very quickly.

    For instance, the popular forum software phpBB ships with a number of spam-deterring technologies to stop spammers signing up and spamming the forums. However, phpBB’s own advice is that all except one have been defeated and that the image-based CAPTCHAs should never be used.

    The solution is a (very machine-readable) challenge and response mechanism, whereby a plaintext question is presented to the user for which a human being can work out the answer very easily, but a spam-bot would find almost impossible. By referring to other things on the web page, or playing to the strengths of the audience, the number of possibilities for a spam bot to try to work out grows exponentially and thus the spambot is deterred reliably. For example, a gaming forum might present the question: “A famous games console is pictured in the top right corner of this page. What colour is it?” or “Which console natively supports playback of Blu-ray media?” – and you can even allow a couple of case-insensitive answers… as long as the human has a good chance of getting it right.
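A question-and-answer check like the one described could be sketched as follows. This is an illustrative sketch, not phpBB's implementation: the `QUESTIONS` table and function names are hypothetical, and the answer "white" for the pictured console is a placeholder that depends entirely on what the page actually shows.

```python
import random

# Hypothetical question bank for a gaming forum. Each question maps to the
# set of answers the site owner is willing to accept, stored in lower case.
QUESTIONS = {
    "Which console natively supports playback of Blu-ray media?":
        {"ps3", "playstation 3"},
    # Placeholder answer: it depends on the image shown on the real page.
    "A famous games console is pictured in the top right corner of this "
    "page. What colour is it?":
        {"white"},
}


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def pick_question(rng: random.Random) -> str:
    """Choose one question at random to present to the user."""
    return rng.choice(sorted(QUESTIONS))


def is_correct(question: str, response: str) -> bool:
    """Case-insensitive match against any of the accepted answers."""
    return normalize(response) in QUESTIONS.get(question, set())
```

Accepting a small set of normalized answers is what gives the human "a good chance of getting it right" while keeping the check itself trivial.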

    Seriously though, Captcha is a solution to the problem. A terrible solution that everyone hates. Why not just have your first 25 posts or so require mod verification before they are posted on the board? Since most spammers have a post count below 25, that’d fix it.

    The spammers would be banned, the good posts go through and once you hit 25 or so posts you don’t need the GM verification anymore and all is happy without the sad captcha tripe.
    Because the all-volunteer staff here have better things to do than wade through 100s to 1000s of moderated posts a day. Moderated posts also take away from the conversation, as moderation delays the post from showing up.

    Overall that’s a far more disruptive and unmanageable solution than the 3 seconds it takes someone to type in a captcha. Especially since you need far fewer than 25 posts to avoid a captcha.

    [Generated message v2.7.905 in {http://dictionary.reference.com/} - dated 2011.10.11 at 10:17 PM] I can read CAPTCHAs. Programming works through Microsoft Office 2010 font matching and Photoshop background contrast differentiation.

  204. Jack -  October 11, 2011 - 10:02 pm

    I think computers will some day be able to decipher them. People are making machines for that right now.
    @Malik haha :) Same here.

    This is really interesting for people who are and aren’t into computer sciences and programming. I do agree Malik, sometimes the letters are hard to decipher. At least there’s a refresh button to press several times before getting an easy one.

  228. Sakura -  October 11, 2011 - 7:19 pm

    It’s lots of fun when you can’t tell if it’s an “r” and an “n,” or just an “m.”

    Just another way that Humans are special! (And the acronym clears things up nicely!)

    Do you think computers will ever be able to decipher CAPTCHAs?

    Yes, by process of OCR…

  251. James -  October 11, 2011 - 4:59 pm

    CAPTCHAs were also used in pairs to help decipher and digitize old manuscripts in the effort to get all books ever printed online. One of them the computer had the answer for; the other was a word that had been scanned from an old text and which the computer could not decipher. It was assumed that if you got the known word correct you got the other one right as well, and your answer was used to replace the scanned word.
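The paired-word scheme described here can be sketched in Python. This is a simplified illustration rather than reCAPTCHA's actual code: the function names and word IDs are invented, and the rule that a transcription is accepted only after `min_votes` matching answers is an assumption added for the example (the comment above describes a single answer being used directly).

```python
from collections import Counter, defaultdict
from typing import Optional

# Transcriptions proposed so far for each unknown scanned word,
# keyed by a hypothetical word ID.
votes = defaultdict(Counter)


def submit(control_answer: str, control_typed: str,
           unknown_id: str, unknown_typed: str) -> bool:
    """Check the known word; if it matches, record the guess for the other.

    Returns True when the user passes the CAPTCHA.
    """
    if control_typed.strip().lower() != control_answer.lower():
        return False  # failed the known word, so the other guess is ignored
    votes[unknown_id][unknown_typed.strip().lower()] += 1
    return True


def consensus(unknown_id: str, min_votes: int = 3) -> Optional[str]:
    """Return the most common transcription once enough users agree."""
    counts = votes.get(unknown_id)
    if not counts:
        return None
    word, n = counts.most_common(1)[0]
    return word if n >= min_votes else None
```

Only users who get the known word right contribute a vote, which is the assumption the scheme rests on: someone who read one distorted word correctly probably read the other correctly too.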

  264. Yosh -  October 11, 2011 - 4:27 pm

    Most of the time, if you get a legible word that is kind of crossed out, you can just type that one word.

    Anything with Greek letters or numbers or anything that doesn’t make up a set of letters doesn’t have to be typed in.

    Same goes for legible words whose letters are printed in white on black splotches.

  271. Danny -  October 11, 2011 - 3:49 pm

    I wonder how mathematical formulas, etc. end up in the captchas?
    Is there no way to filter that out in the programming? Not all of us are set up for typing in discrete math or physics problems!

    Also thought some folks might find the whole culture of Inglip and the CAPTCHAs interesting. As a pun lover I enjoy it!
    The inglipnomicon

  278. Irene -  October 11, 2011 - 3:28 pm

    Hey, why do captchas have to be so hard? Half the time I can’t figure out what it says!


