Time to Kill CAPTCHA?

felgall · August 5, 2013, 9:51pm

What do you think of the current campaign to kill off CAPTCHA? Here are my thoughts:

The not so obvious answer to that question is a very definite NO. Without CAPTCHA there would be no more web because the spambots would destroy it.

The reason why anyone would be considering the killing off of CAPTCHA is that they are confusing the idea of attempting to distinguish real people from spambots with one or two particular types of CAPTCHA. What they probably mean when they are referring to CAPTCHA is those obnoxious hard-to-read images that many sites have been using for several years now as their way of attempting to use a Completely Automated Public Turing test to tell Computers and Humans Apart. Now that particular type of CAPTCHA is close to, if not already past, its use-by date. CAPTCHAs like that are now so common that at least some spambots are able to do a better job of interpreting the content of the image than real people are.

The time for using such obtrusive CAPTCHAs is past, because making those images unreadable to spambots also makes them unreadable to too large a group of real people. If that were the only type of CAPTCHA that exists then yes, it would be time to kill CAPTCHA and let the web die with it.

Fortunately that is not the only type of CAPTCHA and a few dozen of the millions of alternative CAPTCHAs are gradually replacing those annoying in your face obtrusive ones that block many real people from access. By using a different sort of CAPTCHA that is easier for real people to use and which currently fools the spambots sites are able to stay online and useful to those real people instead of being swamped by spambots and having to shut down.

One issue that all CAPTCHAs have is accessibility: no matter which one you use, you are going to discriminate against a small number of real people in order to block most of the spambots. Until a perfect CAPTCHA exists that can distinguish 100% accurately between responses from real people and responses from computers there will always be some that get put in the wrong group. In fact as computers get more powerful this task will become harder rather than easier and a perfect CAPTCHA will be less and less likely.

Any CAPTCHA that you choose is going to be a tradeoff between the amount of spam that floods the site and the number of people that the CAPTCHA incorrectly blocks. Unfortunately the choice is between blocking a small group of people from accessing the site and taking the site down completely. Removing all CAPTCHAs so as to allow everyone easy access usually means that the site will be so flooded with spam as to be unusable by anyone. Until such time as your site is discovered by the spambots there is no reason to implement a CAPTCHA.

Now there are several basic ways in which we can attempt to distinguish between real people and computers. Those horrible image CAPTCHAs are of the visual type where the distinction is made based on the difference in the ability of people and computers to distinguish the content of something visual. Since these CAPTCHAs rely on the real person being able to see and the computer not having powerful enough OCR, they discriminate against anyone who can’t see or who for whatever reason cannot see images in their browser. To get around this some sites also incorporate a corresponding audio CAPTCHA that tells those who can’t see the image what the image contains - hopefully in a way that the computers can’t use to solve the CAPTCHA. One alternate visual CAPTCHA that some sites have switched to using is one where a number of images are displayed and the person is asked to click on the image that contains a particular object. Now this is an easier CAPTCHA for people who can see to solve while being far harder currently for computers to solve. Unfortunately this type of CAPTCHA has no audio equivalent and so it potentially discriminates even more against people who can’t see the images. Unless sites implementing that type of CAPTCHA also implement some other type of CAPTCHA for those real people who cannot use the visual CAPTCHA, that group of real people is blocked from access even more effectively than the computers are.

A completely different type of CAPTCHA that some sites now implement is where the person has to fill in the answer to a question. A really common one uses simple arithmetic where a calculation is displayed and the person is asked to supply the answer. For example “What is 1 + two?”. This type of CAPTCHA will work even for those who can’t see because the CAPTCHA uses plain text that their web reader can read. This type of CAPTCHA relies on people being able to interpret the meaning of the question and to then be able to provide the answer. Computers are currently not as good at that interpretation and so CAPTCHAs like this currently work. As this type of CAPTCHA becomes more common the spambots will be updated to make it easier for them to work out the answer to the question as well and then this type of CAPTCHA will die out. Another variant of this which would be slightly harder for the computers to solve would be to ask more general questions such as “What colour is the sky?”. Unfortunately there is another group of real people who are just as unable to solve these types of CAPTCHA and so you are still blocking some real people as well as the computers.
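As a rough illustration of the arithmetic question idea, here is a minimal Python sketch. The function names, the 1 to 5 operand range and the choice to spell some operands as words are all assumptions made for the example, not a description of any particular site's code.

```python
import random

# Number words used both to build questions and to accept spelled-out answers.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def make_question(rng=random):
    """Return (question_text, expected_sum), mixing digits and words."""
    a = rng.randint(1, 5)
    b = rng.randint(1, 5)
    # Spell each operand as a word about half of the time.
    left = NUMBER_WORDS[a] if rng.random() < 0.5 else str(a)
    right = NUMBER_WORDS[b] if rng.random() < 0.5 else str(b)
    return f"What is {left} + {right}?", a + b


def check_answer(expected, submitted):
    """Accept the answer written either as digits or as a number word."""
    text = submitted.strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS.index(text) == expected
    return text.isdecimal() and int(text) == expected
```

The expected answer would be kept on the server (for example in the session) and never sent to the browser alongside the question.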

The best types of CAPTCHA are those where the real people will almost always be completely unaware that the CAPTCHA is even there. When the forum on my JavaScript tutorial site started receiving hundreds of spambot signups every day I implemented a JavaScript CAPTCHA on the signup page. This type of CAPTCHA basically discriminates against anyone who does not have JavaScript enabled by blocking them from being able to sign up. Since spambots rarely try to run the JavaScript on the web page they form the largest part of the group that are blocked due to not having run the JavaScript. In this particular case since the site is actually about JavaScript it is reasonable to expect that anyone using the site should have JavaScript enabled. I would not use this type of CAPTCHA on a site on some other topic as there I would not want to block those who have disabled JavaScript.

An even better unobtrusive CAPTCHA that I am considering implementing distinguishes based on the amount of time that it takes to fill out the form. Computers are much faster than real people and so the amount of time between their first loading the page containing the form and the form being submitted back to the server will generally be much less than that for a real person who will need to type content into at least some form fields even if their browser fills out others for them automatically. At the very least they will need to take the time to read the form and check that their browser has autofilled all the fields with correct information before they submit. Unless the spambot has a deliberate delay built in so that it waits a while before submitting the form, the computer submitted form should arrive back at the server much sooner than one filled out by a real person. This time difference can be used as an unobtrusive CAPTCHA. Any forms that take more than a selected amount of time can be automatically accepted as having been filled out by a real person. Those returned more quickly can be rejected.

This particular unobtrusive CAPTCHA has the additional benefit that where it gets it wrong and rejects a form submitted by a real person it doesn’t need to be the end of their ability to submit the form. The rejected form can be redisplayed perhaps now incorporating some other type of CAPTCHA that is more obtrusive. The end result would be that only those people who fill out the form so quickly that they are suspected of being a bot will need to deal with any of the other types of CAPTCHA mentioned above. Those with the types of disability that mean they get grouped with the computers for those types of CAPTCHA are the groups least likely to fill out the form too quickly and so this combination of different types of CAPTCHAs keeps those real people misidentified as computers to a minimum.
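The timing check with a fallback can be sketched in Python roughly as follows. This is only one possible approach under stated assumptions: the form carries a hidden token holding the time the page was served, signed with a server-side key so a bot cannot simply backdate it. `SECRET_KEY`, `MIN_SECONDS` and both function names are made up for the example.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: kept private on the server
MIN_SECONDS = 5  # assumption: shortest time a person plausibly needs for the form


def issue_token(now=None):
    """Build the hidden-field value sent out with the form: 'timestamp:signature'."""
    issued = int(time.time() if now is None else now)
    sig = hmac.new(SECRET_KEY, str(issued).encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{sig}"


def classify_submission(token, now=None):
    """Return 'accept', or 'challenge' when a more obtrusive CAPTCHA should be shown."""
    now = time.time() if now is None else now
    try:
        issued_text, sig = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return "challenge"  # missing or malformed token
    expected = hmac.new(SECRET_KEY, issued_text.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return "challenge"  # timestamp was tampered with
    return "accept" if now - issued >= MIN_SECONDS else "challenge"
```

A "challenge" result is not a final rejection; it corresponds to redisplaying the form with some other type of CAPTCHA added.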

By combining a time CAPTCHA with a JavaScript one you could set things up so that only those who both have JavaScript disabled and who can fill out and submit the form as fast as a computer can will be misidentified as computers. The two groups of people who would be misidentified by either of these CAPTCHAs would be very small and no one else would need to know that the form incorporated not one but two unobtrusive CAPTCHAs.

Using CAPTCHA will become more and more important in the future as more sites are attacked by spambots and those bots become smarter at solving the more commonly used CAPTCHAs. The types of CAPTCHA that people use will change and hopefully become less obtrusive as further ideas on how to distinguish between people and computers are developed. It is time for those horrible image CAPTCHAs to die but only by replacing them with some other form of CAPTCHA that is at least as effective at blocking computers while hopefully minimising the number of real people who get blocked as well.

Stevie_D · August 5, 2013, 10:48pm

I completely agree that some kind of automated test is needed to prevent automatic registrations, and you’re right that we need to remember that Captcha != transcribing mutilated characters on a fuzzy background.

Another one that you haven’t mentioned is the honeypot field. Like the timestamp, this is an unobtrusive method whereby you include a hidden field that should not be filled in. Any submission that has this field filled in can reliably be assumed to be a bot, because actual people wouldn’t have encountered the field.
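On the server side the honeypot check can be very small. A minimal Python sketch, assuming the submitted form arrives as a dictionary and that the decoy field is called `website` (a hypothetical name chosen for the example):

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the hidden decoy field


def looks_like_bot(form):
    """True when the decoy field that people never see came back filled in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A missing or empty decoy field tells you nothing either way; only a filled one is a signal.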

Possibly the biggest threat to Captcha security at the moment is the Captcha sweatshop. Actual people working on behalf of spammers, usually in less developed countries, spend their time solving Captchas for a few cents apiece. This costs the spammers practically nothing, and enables them to bypass pretty much whatever system you have put in place.

force · August 6, 2013, 2:58am

If there was a good alternative to CAPTCHA, I’d be all for moving to something else.

Unfortunately, proposed alternatives like simple math or logic puzzles would still fall prey to farming just as CAPTCHA does, and possibly still act as a barrier to entry for legitimate users.

felgall · August 6, 2013, 9:59pm

What type of solution to blocking spambots do you think would work that doesn’t attempt to distinguish between real people and computers?

Those alternatives that you say have problems are called CAPTCHAs and if you read what I wrote in my original post you will see why all CAPTCHAs will have problems making the distinction with it getting harder and harder over time to distinguish.

The best of the current CAPTCHAs use multiple tests where most are completely invisible to the person using the form. Only where those approaches have failed to confirm that it is a real person would they be actually presented with something that asks them to do something that presumably they can do and a computer can’t.

For example:
CAPTCHA 1: the time between displaying the form and submitting the form is long enough for a real person to have read what each field is and to have entered their responses. A form submitted quicker than this is less likely to be a person and more likely to be a bot.
CAPTCHA 2: A JavaScript to insert a value into a hidden field runs. The value inserted is dependent on the available values for screen size and number of colours. Where the value sent in this hidden field is one you’d expect when a web browser is being used then it is more likely to be a real person and less likely to be a bot.
CAPTCHA 3: A field is included in the form and hidden using CSS so that it should not normally be seen in order to enter anything in it. If a value is entered in this field then it is more likely to be a bot and less likely to be a real person.

None of these tests will positively distinguish between a real person and a computer but if you initially accept the form as being from a real person if any of these tests indicate it is likely to be a real person then you’d only need to use a more obtrusive CAPTCHA if you want to give those where all three tests failed one last chance to demonstrate that they are not a computer.
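The "accept if any test passes, otherwise fall back" rule can be sketched on the server like this. Everything named here is an assumption for illustration: the `Signals` fields, the `WIDTHxHEIGHTxDEPTH` format of the JavaScript value, and the thresholds.

```python
from dataclasses import dataclass

MIN_SECONDS = 5  # assumption: shortest plausible time for a person to fill the form


@dataclass
class Signals:
    seconds_to_submit: float  # CAPTCHA 1: time between display and submit
    js_token: str             # CAPTCHA 2: hidden field filled by JavaScript, "" if absent
    honeypot_value: str       # CAPTCHA 3: CSS-hidden field that should stay empty


def plausible_js_token(token):
    """Check a hypothetical 'WIDTHxHEIGHTxDEPTH' value for browser-like numbers."""
    parts = token.split("x")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        return False
    width, height, depth = map(int, parts)
    return (200 <= width <= 10000 and 200 <= height <= 10000
            and depth in (8, 16, 24, 30, 32, 48))


def decide(signals):
    """Accept if any test points to a person; otherwise show a visible CAPTCHA."""
    passed = [
        signals.seconds_to_submit >= MIN_SECONDS,
        plausible_js_token(signals.js_token),
        signals.honeypot_value.strip() == "",
    ]
    return "accept" if any(passed) else "challenge"
```

Swapping `any` for `all` would give a stricter variant that challenges more submissions, at the cost of putting more real people through the obtrusive step.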

Jeff_Mott · August 6, 2013, 10:56pm

Exactly. I think this will always be the achilles heel of any CAPTCHA, because if a computer genuinely can’t solve it, then they farm it out to humans.

felgall:

CAPTCHA 1: the time between displaying the form and submitting the form is long enough for a real person to have read what each field is and to have entered their responses. A form submitted quicker than this is less likely to be a person and more likely to be a bot.
CAPTCHA 2: A JavaScript to insert a value into a hidden field runs. The value inserted is dependent on the available values for screen size and number of colours. Where the value sent in this hidden field is one you’d expect when a web browser is being used then it is more likely to be a real person and less likely to be a bot.
CAPTCHA 3: A field is included in the form and hidden using CSS so that it should not normally be seen in order to enter anything in it. If a value is entered in this field then it is more likely to be a bot and less likely to be a real person.

The downside is that these are security by obscurity solutions. If a website became sufficiently popular that a spammer would check how it works, or if this CAPTCHA technique were to become widely used and known, then it could be easily defeated.

felgall · August 7, 2013, 2:06am

Also it isn’t security by obscurity since the purpose of a CAPTCHA is not security but attempting to determine whether the form is being filled out by a person or by a computer. Having the computer manage to get past the CAPTCHA is not going to compromise security since any value that would be rejected if entered by a person will also be rejected if entered by a computer.

That’s true of any popular type of CAPTCHA though. Keeping computers out is an ongoing process of implementing replacement CAPTCHAs each time the one currently used starts letting too many bots through.

Stomme_poes · August 7, 2013, 1:04pm

CAPTCHAs are stupid. They don’t even do what we want.

We want LEGITIMATE users, and do not want SPAM. None of those things has much to do with whether the LEGITIMATE user is a human, a computer, or a dog.

Unless you are absolutely certain that you only want your site to be requested/used/posted to by an actual human being, testing for whether they sweat or scream or cry or walk on two legs is useless. BTW, spammers can do all those things, as well as of course their mechanical Turks.

Meanwhile, WGET/CURL, Selenium, phantom.js, screen readers, mail clients and someone with the mental capability of a child may all be considered legitimate users. Why waste our time deciding if they are human when we don’t care?

If you have a spam problem, and take your frustration out on your users, close down your web site and go do something else. Or, learn to deal with your spam problem in a way that does not make it your users’ problem. It’s not their fault, so don’t charge them a mental fee.

ralphm · August 7, 2013, 1:10pm

Human spammers don’t post nearly as much spam as robots, so I’m happy with a solution that keeps the bots out while not hampering legitimate posters. I can live with a few idiots telling me each day that my site could be ranking better in search engines. It’s the hundreds of bot posts I was getting each day that really pained me.

PicnicTutorials · August 7, 2013, 3:22pm

CAPTCHAs aren’t going anywhere soon. A simple addition math question that rotates numbers 1 through 5 is the best in my opinion. The crazy images are usually overkill.

Jeff_Mott · August 7, 2013, 3:26pm

That’s actually a fair point. Sometimes bot programs can provide legitimate and useful content. I see this most often on Reddit. Some bots will automatically transcribe meme images into plain text. Or some bots will take a screenshot of a website for when the site inevitably becomes unavailable due to Reddit users trying to visit all at once.

truegether · August 7, 2013, 10:06pm

Is there a reason you guys don’t like the honeypot technique that Mouse Catcher pointed out? Unless you run a really popular site with hundreds of thousands of members, I don’t think the spambotters would take the effort to work around this.

Jeff_Mott · August 7, 2013, 10:20pm

It seems to me that to say a solution works only because a site isn’t popular enough to warrant a spammer’s attention doesn’t sound like much of a solution at all. Ideally we’d like to have a truly secure solution – one that for any size site, and even assuming the spammer knows your system, that they would still find it prohibitively difficult.

truegether · August 7, 2013, 10:31pm

Hi Jeff,

There is always a tradeoff between usability and security. You can build the most secure site on the planet but if no one uses it I think it is pointless. The honeypot technique is an acceptable middle ground for most websites.

Thanks

force · August 8, 2013, 5:04am

Security vs convenience is typically the argument I hear (which is often applied to security in general, and not just human verification test).

2-factor or multi-factor authentication seems to be the next security authentication method trend on the way, since hackers seem to be easily cracking standard username & password authentication methods.

Smarties83 · August 8, 2013, 7:14am

I don’t think it’s the right time to kill CAPTCHA, because it is still useful. It may not filter all the bots, but at least it will discourage lazy human bots from logging in.

ralphm · August 8, 2013, 8:30am

It also discourages your valued customers from logging in, too, and that’s the problem. It’s like trying to prevent shoplifting by making it really difficult for customers to enter the store. Just not viable.

Guava · August 8, 2013, 5:01pm

Two methods that I have used in the past that worked well:

1. An input named “email” but with an actual text label that said “Please leave blank”.
Bots would fill in an e-mail address looking at the <input name=email id=email> but humans would skip it. This field is not hidden. It is intentionally visible.
You could easily set up your system to rotate “email” with “firstname”, “address” etc. to make it more random.

2. Mark all of your fields with random names like “ui34nf9n3” and then do proper verification on the server side for the right types of values: email addresses, names, addresses, zip codes, etc. That should filter out the auto filling in.
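The random field name idea could look something like this in Python. It is a sketch only: the set of fields, the very loose validation patterns and the function names are assumptions, and the mapping would need to be stored per visitor (for example in the session) between rendering the form and receiving it.

```python
import re
import secrets

REAL_FIELDS = ["email", "name", "zip"]  # assumption: the fields this form collects

# Deliberately loose patterns, only meant to catch values of the wrong type.
VALIDATORS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "name": re.compile(r".{1,100}"),
    "zip": re.compile(r"\d{5}(-\d{4})?"),
}


def make_field_map():
    """Map freshly generated random names to the real field names."""
    return {secrets.token_hex(5): real for real in REAL_FIELDS}


def decode_and_validate(field_map, posted):
    """Translate random names back; return None if any value has the wrong shape."""
    cleaned = {}
    for random_name, real_name in field_map.items():
        value = posted.get(random_name, "").strip()
        if not VALIDATORS[real_name].fullmatch(value):
            return None
        cleaned[real_name] = value
    return cleaned
```

A bot that guesses what to put in each field from its name has nothing to go on, so an e-mail address landing in the zip code field is rejected by the type check.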

felgall · August 8, 2013, 10:06pm

How does a CAPTCHA that a real person doesn’t even know is there discourage valued customers from logging in?

Actually a login form is one of the more effective CAPTCHAs since you can apply blocks only when an incorrect password is entered. On those sites where I use a password CAPTCHA the user account is locked for 15 seconds each time an incorrect password is entered or a login is attempted while the account is locked. Since it is going to take at least that long for a person to realise that their login attempt failed and to try again, they will be unaffected by the lock (just to make sure, the page actually suggests waiting a while before trying again). Bots, meanwhile, will generally keep trying different values in rapid succession, all of which will be rejected because of the lock applied by the first incorrect guess and extended by the subsequent too-fast retries.
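A minimal sketch of that 15 second lock in Python, assuming an in-memory dictionary stands in for whatever storage the site really uses and that the password check itself happens elsewhere; `LoginThrottle` is a made-up name.

```python
import time

LOCK_SECONDS = 15  # the lock length mentioned above


class LoginThrottle:
    """Lock an account briefly after a wrong password; retries extend the lock."""

    def __init__(self):
        self._locked_until = {}  # username -> time at which the lock ends

    def attempt(self, username, password_ok, now=None):
        now = time.time() if now is None else now
        if now < self._locked_until.get(username, 0.0):
            # Any attempt during the lock is rejected unchecked and restarts the lock.
            self._locked_until[username] = now + LOCK_SECONDS
            return "locked"
        if password_ok:
            return "ok"
        self._locked_until[username] = now + LOCK_SECONDS
        return "wrong_password"
```

For example, a wrong guess at second 100 followed by rapid retries at 105 and 118 keeps the account locked until 133, even if one of those retries used the right password.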

Anywhere where having your visitors log in is reasonable, then using a password field as the CAPTCHA is the least obtrusive solution to keeping the bots out. That just leaves you with deciding on a CAPTCHA to prevent the bots getting an account of their own in the first place - using a double opt-in and deleting accounts where a response to the email is never received is probably the most effective current CAPTCHA to take care of that.
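The double opt-in with clean-up might be sketched like this in Python. The 48 hour window, the class name and the in-memory storage are assumptions for the example; sending the actual e-mail is left out.

```python
import secrets
import time

CONFIRM_WINDOW = 48 * 3600  # assumption: seconds allowed to answer the email


class PendingAccounts:
    """Hold signups until the emailed link is used; drop the ones never confirmed."""

    def __init__(self):
        self._pending = {}      # token -> (email, created_at)
        self.confirmed = set()  # emails whose owners answered in time

    def register(self, email, now=None):
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._pending[token] = (email, now)
        return token  # would be sent to the address inside a confirmation link

    def confirm(self, token, now=None):
        now = time.time() if now is None else now
        entry = self._pending.pop(token, None)
        if entry is None or now - entry[1] > CONFIRM_WINDOW:
            return False
        self.confirmed.add(entry[0])
        return True

    def purge_expired(self, now=None):
        """Delete signups that were never confirmed; return how many were removed."""
        now = time.time() if now is None else now
        expired = [tok for tok, (_, created) in self._pending.items()
                   if now - created > CONFIRM_WINDOW]
        for tok in expired:
            del self._pending[tok]
        return len(expired)
```

`purge_expired` would typically be run on a schedule so that abandoned bot signups do not pile up.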

Stomme_poes · August 19, 2013, 2:48pm

Is there a reason you guys don’t like the honeypot technique that Mouse Catcher pointed out?

I like honeypots, and timers go nicely with them so long as they’re not stupid on both ends: that is, if I must hit refresh and submit within a second of each other, don’t also assume I’m a bot. That one hits me when someone’s site has some other timeout that forces me to refresh to submit. Dumb dumb.

I don’t care for 2-factor in something like submitting comments. I do like it with something like banking. Unfortunately mobile phones are slowly becoming the main method, which requires possession of a smart phone. Anyone who demands this of me had better go buy me one.

I saw a demo of Google’s 2-factor auth at the Perl conference, and I didn’t really like it much: it used UTC as a salt, which is never guaranteed to be the same between any two devices (unless you can sync them first… funny thing, time: it’ll differ in your altitude and geopolitically there’s all these little spots where crap doesn’t match up like it should…), and the devices doing your auth need to get the secret from the server (shared secret). Transmitting that secret had some problems for users, and had several places for intrusion. Meh.
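That description resembles the time-based one-time password scheme (TOTP, RFC 6238), so assuming that is roughly what the demo used, here is a minimal Python sketch of it. The `drift_steps` tolerance in `verify` is an illustrative choice showing how a server can allow for clocks that are slightly out of step; it is not taken from the demo.

```python
import hashlib
import hmac
import struct


def totp(secret, unix_time, step=30, digits=6):
    """One-time code from a shared secret and the current 30-second time step."""
    counter = int(unix_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret, submitted, unix_time, drift_steps=1, step=30):
    """Accept codes from neighbouring time steps to tolerate small clock drift."""
    for delta in range(-drift_steps, drift_steps + 1):
        candidate = totp(secret, unix_time + delta * step, step)
        if hmac.compare_digest(candidate.encode(), submitted.encode()):
            return True
    return False
```

Because the code depends only on which 30 second step the clock falls in, two devices need to agree to within a step or so rather than exactly, though the shared secret still has to reach the device somehow.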

My bank uses 2-factor auth (or 3? card, card reader and PIN) but instead of trying to get a shared secret onto a device of your choice, you instead get the device straight from the bank. They seem to have batteries inside that you can’t change yourself, so you can ask for multiples of these things and they’re interchangeable. They consist of a card reader and a random number generator. Dunno how random is random, but the display is limited in the number of characters it can show… 12 or 14 I think.

Ralph was referring to the post above him, who did not expressly mention something non-intrusive like honeypots.

BTW, login forms for anything a user considers throwaway are indeed a discouragement (though discouragement is perhaps exactly what you want, if for example you’re trying to discourage trolls and throwaways from posting comments somewhere). Fewer people post comments/do a free trial/whatever where they must first log in. Luke W has shown in data that users are quicker to abandon “free software” tryouts if registration, login or other mentally-taxing chores are required first. When users believe it matters (like banking, or posting on a personal account like a social media platform), users consider logging in to be worthwhile, and are more likely to go through with it.

I only agree with the first part of that comment. Passwords don’t tell humans from machines. They authorise users, who at times may be machines. For example, our search engine logs into (our) things using a password (and a username). It is a legitimate user. It is not human. It is authorised, however.

This is why I can say without reservation that CAPTCHA MUST DIE. Complaints about bots are not complaints about non-humans, they’re complaints about non-legitimate users. That may sound like some whiney hair-splitting distinction, but it’s not. Not so long as anti-spam measures are so focussed on trying to think what “humans” can do that “bots” can’t. This seems to be the basis of every CAPTCHA out there. Even honeypots either use the presence of Javascript or the ability to read simple directions as authorisation, and most people tend to see humans as having JS enabled and being able to read simple directions and bots as not.

Every time those aren’t true, it fails.

So I’m practically cheering advances in robots reading and describing images, if only to stop the worst of CAPTCHAs, the image CAPTCHAs. As someone math-inhibited though, I do worry those would take over at that point.

It_Is_Easy · August 19, 2013, 3:12pm

Interesting read. I’ve been frustrated by those image CAPTCHAs for years - it has never been my strong suit. I do however enjoy the “mini games” CAPTCHAs like following a moving ball with your mouse pointer. Keep them around and eradicate the images!