Registration Form and Spammers!

Last night I was doing a final walk-through of my “User Registration” form when it occurred to me that I didn’t have a CAPTCHA thingy at the bottom like SitePoint and a lot of other websites have?! :eek:

So I spent a couple hours reading up on things, and now I’m really stressed and confused?! :frowning:

(I’m really trying to wrap up my website, and it seems like every other day I stumble across another issue like this which just adds to my frustrations of being behind schedule! On one hand I want a top-notch website. On the other hand, I cannot please everyone!!! And I gotta get this thing done!!)

It appears that A LOT of people really loathe CAPTCHA from a usability and accessibility standpoint…

And there seems to be a lot of differing views on how necessary something like CAPTCHA is, and how effective anything can be against the world of mutated spammers that we live in?!

So, I have a few questions for all you SitePoint gurus…

1.) If I didn’t have some guard against spammers on my “User Registration” form, would I be a dead duck??

2.) How evil is CAPTCHA as far as usability and accessibility?

3.) Is there something I could program myself that would be more accessible, and at least as effective as CAPTCHA?

4.) How effective are any of these alternative techniques…

  • Using a Hidden Form Field that is left blank
  • Using a Form Field (hidden via CSS) that must maintain some special value
  • Checking Time Needed to Submit Form
  • Using Logic Questions (e.g. “What color is grass?”)
  • Confirmation Pages

Obviously I want to prevent having a million bogus spam accounts created on my website. However, I still want it to be user-friendly and not add another month of coding to my already full plate!!!

Sincerely,

Debbie

  1. Could be, if they also start leaving comments.
  2. CAPTCHA has become more mindful of usability and accessibility, but that doesn’t make it foolproof.
  3. I wouldn’t. There are reasons why CAPTCHA is trusted; it has proven itself over the years.
  4. Hidden form fields that must be left blank work pretty well. Even form fields shown on the screen that tell the user “Leave this blank!” seem to work well.
    Here at SitePoint we use a few techniques.

Hope that helps.

Some follow up questions…

a.) When a person completes my “User Registration” form, they get a confirmation e-mail, and must click on an “Activation Link” in order for their registration to be completed.

Isn’t that enough to weed out Spammers?

b.) What is wrong with using logic questions like “What is 21 divided by 3?” or “What color is grass?”

c.) Can you tell me in a PM what is a reasonable “Registration Time” in case I use that approach?

d.) If I decide to just use plain ol’ CAPTCHA, how would I go about that?

And is it hard to implement?

Sincerely,

Debbie

a) No, they have obviously figured out how to bypass that, or how to click the automated link in the email…

b) Nothing

c) Simply, fill out your own form, time yourself, do it 5 more times, take the average. That’s what I did for here.

d) No, I’ve implemented ReCAPTCHA on several sites, they usually include a PHP class you can implement to authenticate the answers and everything. It is quite painless.

Scary!!

Do I have to be afraid that something similar would happen if I programmed my own “logic questions”?

For example, if I had just one question hard-coded, like “What color is grass?” would that be okay, or are spammers and their bots smart enough to figure out it would be the same question and that the answer is “Green”?

I guess I’m not sure which battle is being fought with CAPTCHA and “logic questions”…

Is it simply making it so the words cannot be read, so the spam bot fails?

Is it that spam bots can’t do logic problems?

Or is it necessary to have 1 million questions that rotate constantly, because spam bots are semi-intelligent?

I would think if I had a Table or an Array with a dozen questions like, “What is the capital of New Hampshire?” or “What color is snow?” and I rotated them, that would be fairly effective. Or is it not that easy? :-/

As far as CAPTCHA, is there some place I can download and install it?

Is it open-source?

Are there “trusted” and “untrusted” CAPTCHA versions?

Oh, and is there a Procedural Version, since I don’t know OOP?!

Sincerely,

Debbie

No, as to my knowledge spam bots are not that keen on reading and parsing the text to figure out what the answer “should” be.

I’d at least make a couple, but make them non-culture specific, such as “what color is the sun?”, “what color is the grass?”, etc.

The battle is keeping spam and bots out of your system. There are a few services that pay actual people to complete registration forms, but someone would have to hire them specifically to register on your site, and it will likely take a while before you are popular enough for that type of attack.

No, the theory behind the hidden fields is that a spam bot fills out ALL fields, so when a field hidden from display is filled in, it indicates a bot just came through and answered every question, hoping to pass all of the registration validation.
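A minimal sketch of that server-side check in Python, assuming the form includes one extra field (named `website_url` here, a hypothetical name) that humans never see. Later in the thread, hiding it with `display:none;` rather than positioning it off screen is recommended, so screen-reader users are less likely to fill it in.

```python
# Hypothetical honeypot field name; hide it from humans with CSS (display:none;).
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form: dict) -> bool:
    """Return True if the hidden honeypot field was filled in.

    A missing or empty honeypot is treated as a normal human submission.
    """
    return form.get(HONEYPOT_FIELD, "").strip() != ""


# Usage: reject (or silently drop) the registration when this returns True.
print(is_probable_bot({"username": "debbie", "website_url": ""}))
```

This only stops bots that blindly fill every field; it does nothing against a human or a bot written for your specific form.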

I don’t think they can, as that would require interpreting the text in a contextual way and I just don’t think they are doing that yet.

Nah, I see too many forms that definitely are cycling through a million questions, I’d make a small list of 5-6 (maybe a dozen) and leave it at that.

That would be effective, but too costly for the benefit in programming (if you go with a table). Just make an array and utilize that. Keep in mind, you’ll need to give it some sort of ID that tells you what answer to expect.
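One way the array-with-an-ID idea could look, sketched in Python. The question bank, the field names and the idea of keeping the chosen ID in a server-side session are assumptions for illustration. If the ID were sent back in a plain form field, a bot could always claim question 1 and answer “green”, so it is safer to remember it on the server.

```python
import random

# Hypothetical question bank; each entry has an ID so the server knows which answer to expect.
QUESTIONS = [
    {"id": 1, "text": "What color is grass?", "answers": {"green"}},
    {"id": 2, "text": "What color is snow?", "answers": {"white"}},
    {"id": 3, "text": "What is 21 divided by 3?", "answers": {"7", "seven"}},
]
QUESTIONS_BY_ID = {q["id"]: q for q in QUESTIONS}


def pick_question() -> tuple:
    """Pick a random question; store the returned ID in the user's session."""
    q = random.choice(QUESTIONS)
    return q["id"], q["text"]


def check_answer(question_id: int, answer: str) -> bool:
    """Compare the submitted answer (case- and whitespace-insensitive) to the expected ones."""
    q = QUESTIONS_BY_ID.get(question_id)
    if q is None:
        return False
    return answer.strip().lower() in q["answers"]


qid, text = pick_question()
print(text)
```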

http://www.google.com/recaptcha/whyrecaptcha and https://developers.google.com/recaptcha/docs/php?csw=1

Yes and No. The client is freely available for you to use (the second link above). The server that handles serving up the ReCAPTCHA is not (and for good reason, you don’t want to tell everyone how it works!)

ReCAPTCHA is very trusted. I use it (because I trust it).

As you can see from the second link above, it is VERY simple to use. You sign up, get a public/private key, output the recaptcha using your public key, validate it using your private key and ta da! You’ve helped prevent bots.
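For comparison, a rough Python sketch of the server-side “validate it” step. Note that this targets the current reCAPTCHA verification endpoint (`https://www.google.com/recaptcha/api/siteverify`, which takes `secret`, `response` and an optional `remoteip`, and returns JSON with a `success` flag), not the older public/private-key PHP library linked above. The function names are hypothetical, and the token is whatever the widget posted (`g-recaptcha-response` for the checkbox widget).

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(secret_key: str, response_token: str,
                      remote_ip: Optional[str] = None) -> bytes:
    """Form-encode the POST body for the verification request."""
    params = {"secret": secret_key, "response": response_token}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional: the user's IP address
    return urllib.parse.urlencode(params).encode("ascii")


def is_verified(result_json: str) -> bool:
    """True only if Google's JSON reply says success == true."""
    return json.loads(result_json).get("success") is True


def verify_recaptcha(secret_key: str, response_token: str,
                     remote_ip: Optional[str] = None, timeout: float = 10.0) -> bool:
    """POST the token to Google and report whether it was accepted (needs network access)."""
    body = build_verify_body(secret_key, response_token, remote_ip)
    with urllib.request.urlopen(VERIFY_URL, data=body, timeout=timeout) as resp:
        return is_verified(resp.read().decode("utf-8"))
```

Keep the secret key on the server only; the site key is the one that goes into the page.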

It sounds like if I put CAPTCHA on my website that I would need the “Server” version, right? (Because I would be serving up images for my users to see on their client computers…)

If so, how much would that cost me?

Sincerely,

Debbie

If I estimate that a human should take at least 20 seconds to enter a Name, Username, Confirmed Username, Email, Confirmed Email, Password, Confirmed Password, and Location, does that sound reasonable?

If it is quicker than that, then I display an error message and ask the “user” to try again. (I am assuming that spam bots would complete the form in under 5 seconds…)

Sincerely,

Debbie

20 may be a bit excessive, but I really don’t know the size of your registration form, so it is hard to say.

DD,

Bots take approx NO TIME to complete a form and submit so include a time loaded and compare with time submitted. Stephen had used that in another thread months ago and that (along with a hidden item in the form that a bot would complete and a human would not even see) should beat CAPTCHA any day.
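A minimal Python sketch of the time-loaded versus time-submitted comparison, assuming the load time is kept in a server-side session (a dict stands in for it here) so a bot cannot simply post a fake timestamp. The 5-second threshold is a hypothetical placeholder; as suggested earlier in the thread, time yourself filling out your own form a few times and pick a value below that.

```python
import time
from typing import Optional

# Hypothetical threshold in seconds; tune it by timing your own form.
MIN_FILL_SECONDS = 5.0


def record_form_load(session: dict) -> None:
    """Call when the registration form is rendered."""
    session["form_loaded_at"] = time.time()


def submitted_too_fast(session: dict, now: Optional[float] = None,
                       min_seconds: float = MIN_FILL_SECONDS) -> bool:
    """True if the form came back faster than a human could plausibly fill it in.

    A missing load time means the form was not loaded through our page, so it is
    treated as suspicious too.
    """
    loaded_at = session.get("form_loaded_at")
    if loaded_at is None:
        return True
    current = time.time() if now is None else now
    return current - loaded_at < min_seconds
```

If this returns True, show an error and the form again rather than silently discarding the data, in case a fast human tripped it.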

Regards,

DK

Okay, thanks!

Debbie

I just read this, which introduces some interesting concepts: https://github.com/subwindow/negative-captcha

Just a comment regarding the “negative-captcha”: please never implement it as explained in the article and the library code. The reason is that disabled users will in most cases also be presented with these fields, which means you will auto-ban them. That said, the idea is quite old and there are many ways of implementing it. Whether it is a successful approach depends on the type of bot.

Is it that spam bots can’t do logic problems?

This depends on the bot, but creating a lexical analyzer that can solve logic problems like these is quite simple. Keep in mind that OCR has reached the stage where there are also bots able to break CAPTCHA images.

However, until your site receives a significant amount of traffic, I would not worry about bots tailored specifically to break your security.

Can you elaborate on this? I have not actually used the gem in any of my projects, as I simply don’t receive enough traffic for spam to be a problem.
Nonetheless, I would be curious to find out how this affects disabled users (presumably you mean those using a screen reader).
Is there one particular method that is bad or worse than the others?
Do you mean the honeypot one, where a text field is positioned off screen, or the real fields where the name is hashed?

Again, I’m not an active proponent of these techniques. I prefer to let anyone use my contact form, then just throw away the junk emails manually.

Yes, I meant users that utilize a screen reader and the field that is positioned off screen. Sorry for not being more specific.

When the screen reader reviews the HTML and then “voices it out” to the user, it will read the field in question, which may cause the user to fill some information into it.

Depending on the screen reader, a version that can work is hiding the input field inside a div with CSS instead of offsetting it; most screen readers will then ignore it. On a similar note, this is why, when you have a company image in the header, you offset the “explanation text” to the side instead of hiding it, since the screen reader will then still be able to see it.

What we have done in the past (though not much in the last few years) is the hash method, where you generate a valid hash each time the form is loaded. The problem with this approach is that it is also not that secure. If you are trying to protect against a random spider bot that automatically looks for forms to fill out, it can work, but only if the bot does not send back the cookie value with its response. If it does, it will bypass the security.

This is why we have mostly stopped doing this. Back in 2006 it stopped all bots dead in their tracks, but bots have since evolved and are in most cases able to bypass it.
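To make the hash method concrete, here is one possible variant sketched in Python: an HMAC-signed token with a timestamp and a random nonce, issued each time the form is loaded. This is an assumption about how such a hash could be built, not necessarily the poster’s implementation, and as the post says, a bot that round-trips the token or cookie will pass it. Tokens can also be replayed until they expire unless you additionally record used nonces.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def _sign(secret: bytes, payload: str) -> str:
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_form_token(secret: bytes, now: Optional[float] = None) -> str:
    """Create a token to embed in the form (or a cookie) each time it is loaded."""
    issued = int(time.time() if now is None else now)
    payload = f"{issued}:{secrets.token_hex(8)}"  # timestamp + random nonce
    return f"{payload}:{_sign(secret, payload)}"


def check_form_token(secret: bytes, token: str, now: Optional[float] = None,
                     max_age: int = 3600) -> bool:
    """Accept only untampered tokens issued within the last max_age seconds."""
    try:
        issued_str, nonce, sig = token.split(":")
        issued = int(issued_str)
    except ValueError:
        return False
    if not hmac.compare_digest(sig, _sign(secret, f"{issued_str}:{nonce}")):
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued <= max_age
```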

The normal approach we use for clients today is similar to what you mention: everyone can use the contact form, with a few modifications. We check for any injected headers, and also whether any forms were sent from the same IP/DNS combination in the last X minutes; if either is true, we don’t send the email. Depending on the website’s traffic, however, the IP/DNS combination is not always a reliable signal, especially if the majority of the site’s users are from the US. In those cases we still check the IP/DNS, but if a contact form was submitted from it in the last X minutes, we show the form again and require a CAPTCHA before sending.
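A rough Python sketch of those two checks, under a few assumptions: the 10-minute window stands in for “X minutes”, `client_key` is a hypothetical string built from the IP/DNS combination, and the in-memory dict would be a database table or cache on a real site. Only fields that end up in mail headers (name, email, subject) are checked for CR/LF, since a message body may legitimately contain line breaks.

```python
import time
from typing import Dict, Iterable, Optional

WINDOW_SECONDS = 10 * 60  # hypothetical value for "the last X minutes"


def has_header_injection(header_fields: Iterable[str]) -> bool:
    """CR or LF in a header-bound field could smuggle in extra headers (e.g. Bcc:)."""
    return any("\r" in f or "\n" in f for f in header_fields)


class RecentSubmissions:
    """Remembers when each client key last got a message through (in memory only)."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS):
        self.window = window_seconds
        self._last_seen: Dict[str, float] = {}

    def seen_recently(self, client_key: str, now: float) -> bool:
        last = self._last_seen.get(client_key)
        return last is not None and now - last < self.window

    def record(self, client_key: str, now: float) -> None:
        self._last_seen[client_key] = now


def decide(header_fields: Iterable[str], client_key: str,
           recent: RecentSubmissions, now: Optional[float] = None) -> str:
    """Return 'reject', 'require_captcha', or 'send'."""
    current = time.time() if now is None else now
    if has_header_injection(header_fields):
        return "reject"
    if recent.seen_recently(client_key, current):
        return "require_captcha"  # show the form again with a CAPTCHA
    recent.record(client_key, current)
    return "send"
```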

Thanks for the reply.

Ah ok. So, it’s not the idea that’s bad, rather the implementation.
Do you think it’s better to use visibility: hidden; and/or display:none;?

This I had heard.
When it comes to headers and images, I use the technique outlined here: http://www.sitepoint.com/new-css-image-replacement-technique/

That’s interesting and seems like a balanced approach.

Yes, CAPTCHAs are annoying, but they are the standard, so everybody knows they just have to go through one to sign up. I really don’t think many people turn around and don’t sign up because of a CAPTCHA.

As for not having one, I really would not recommend that. You will be spammed by thousands of bots every day if you are lucky, if not it will be more.

If I were you I would just implement a CAPTCHA system and you’ll be fine.

Hi Stefan,

I think you’re missing the point.

Badly implemented CAPTCHAs are a real accessibility issue.
I’ve actually witnessed my mum give up on registering for a site, simply because she was unable to fill out the captcha correctly (she is old and has bad eyesight).

This is simply not true.
I don’t have a captcha on the comments or contact form of my blog and receive about ten spam comments per week.
My blog is not massive by any means, but it seems to get between 500 and 1000 unique visitors per week.

That said, if you do find yourself inundated with spam, you can implement a solution as outlined by TheRedDevil.

In the case we are talking about here I would use “display:none;”

Thanks for that.
I thought it only fair to inform the author of this fact, so I submitted an issue: https://github.com/subwindow/negative-captcha/issues/35