Spam prevention from Humans/Bots

I have been struggling lately to prevent spam on my signup form. I want to stop not only bots (with honeypots etc.) but also a real human who writes a script designed specifically for my website to fill my database with dummy registrations (I do not want to use a CAPTCHA). I have the following ideas in mind:
1. Check that email addresses exist (not only that they are valid). I have read that you may get banned if you make lots of such requests. Moreover, the script may contain valid email addresses (for instance, when a university issues student email addresses that differ only slightly).
2. Compare IP and time of request, and if the same IP sends lots of signup requests, consider the user a spammer. For this you can set a threshold beyond which a signup request counts as spam. The problem is that the script may discover the threshold (e.g. 1 second) and send a request every 1.1 seconds. Moreover, someone may use onion routing(?) and I will not be able to ban them.
3. What do you think about random input names?
So which practices are considered good enough to deal with this situation?
Thanks!
UPDATE
I send an email with a confirmation link for activation, but I let users use the site for about 4 days without activating their accounts! I do not want my database to contain spam accounts in the first place!
SOLUTION
For everyone interested, I used honeypots combined with a temporary database! It seems to work fine! Thanks!
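
A minimal sketch of the honeypot half of that solution, assuming the form contains an extra input that is hidden from humans with CSS. The field name "website" and the function name are made up for illustration; the original post does not say how its honeypot was built.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the hidden honeypot field was filled in.
 * Humans never see the field, so any value suggests an automated client.
 * The field name "website" is an assumption, not taken from the post.
 */
function honeypotTriggered(array $post, string $field = 'website'): bool
{
    $value = $post[$field] ?? '';
    if (!is_string($value)) {
        return true; // the real form only ever submits a string here
    }
    return trim($value) !== '';
}
```

In a signup handler this would typically be called as honeypotTriggered($_POST) before anything is written to the database. As the question points out, it only stops generic bots, not a script written for this specific form.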

One more thing: you can create a temporary registration database. If someone verifies their email within 24 hours of registration, their data is moved to the main registration database; any entry whose email has not been verified is deleted after 24 hours.
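
The flow described above could be sketched roughly as follows. This is only an in-memory stand-in: the class name, its methods and the two arrays are hypothetical, and a real site would use two database tables plus a scheduled purge instead.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for a "pending" table and a "users" table.
final class PendingRegistrations
{
    public const TTL_SECONDS = 86400; // 24 hours, as suggested above

    /** @var array<string, array{email: string, created: int}> keyed by token */
    private array $pending = [];

    /** @var string[] verified e-mail addresses (the "main" table) */
    private array $users = [];

    /** Stores the signup as pending and returns the token for the e-mail link. */
    public function register(string $email, int $now): string
    {
        $token = bin2hex(random_bytes(16));
        $this->pending[$token] = ['email' => $email, 'created' => $now];
        return $token;
    }

    /** Moves a pending signup to the main list if its token is still alive. */
    public function verify(string $token, int $now): bool
    {
        $this->purge($now);
        if (!isset($this->pending[$token])) {
            return false;
        }
        $this->users[] = $this->pending[$token]['email'];
        unset($this->pending[$token]);
        return true;
    }

    /** Deletes every pending signup that is 24 hours old or older. */
    public function purge(int $now): void
    {
        foreach ($this->pending as $token => $row) {
            if ($now - $row['created'] >= self::TTL_SECONDS) {
                unset($this->pending[$token]);
            }
        }
    }

    public function users(): array
    {
        return $this->users;
    }

    public function pendingCount(): int
    {
        return count($this->pending);
    }
}
```

Passing the current time in as $now (for example time()) keeps the expiry logic easy to test.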

To validate the email address, you could send a confirmation email with a validation link; the user clicks that link to confirm that it is their email address and that they are not a bot.

In response to the OP's update: if you need to let users use the site for X days without clicking the activation link, perhaps you could also send a set of two 4-digit PINs (or just 6 digits, but that is less secure) in the email (or separately) and have them use that as their temporary password until the account is activated via the original email link. In your database you could record whether the PIN was used, which indicates whether it was a spam account. It could even be a one-time-use PIN.
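
A small sketch of generating such PINs, assuming PHP 7 or later for random_int(). The function names and the "1234-5678" layout are illustrative choices, not something the answer specifies.

```php
<?php
declare(strict_types=1);

/**
 * Returns a zero-padded numeric PIN of the given length.
 * random_int() draws from the system's secure random source.
 * Intended for short PINs only (a handful of digits).
 */
function generatePin(int $digits = 4): string
{
    $max = (10 ** $digits) - 1;
    return str_pad((string) random_int(0, $max), $digits, '0', STR_PAD_LEFT);
}

/** Two 4-digit PINs joined with a dash, used as the temporary password. */
function temporaryPassword(): string
{
    return generatePin() . '-' . generatePin();
}
```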

Related

When sending an e-mail to confirm user registration, at what point is the username/password stored in the database?

I'm designing a user registration form and am working on sending a confirmation e-mail. The script that is responsible for adding the username/password/e-mail address etc. to the database is getting rather long and I wanted to break the code responsible for e-mails into another file. I was thinking about how the two scripts would work together; would the database script include the e-mail script or redirect to it and pass the arguments. Or do I have it backwards? Would it be the e-mail script including/calling the database script?
What happens first? 1) Is an e-mail containing an account activation link sent out before any data is added to the database, 2) is the data put in the database right away with an "activated" field set to false, which is updated to true when the user clicks the link in the e-mail, or 3) does it work some other way?

#2. This is so that other users can't take the same username. If you don't save the information immediately, another user can register and activate the same username in the meantime, and you'll have errors with that.
Most websites put an expiry on their activation so that usernames can't be held for a long period of time.
A column for state, such as user/banned/confirmed/unactivated, would be necessary to keep track of who has activated and who has not. A cron job could be used to sweep the database for old inactive users, based on the timestamp of registration.
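
As a rough sketch of that sweep, assuming each user row carries hypothetical id, status and registered_at (Unix timestamp) fields, a cron-driven script could pick the accounts to delete like this:

```php
<?php
declare(strict_types=1);

/**
 * Returns the ids of accounts that are still 'unactivated' after
 * $maxAgeSeconds. The row layout (id, status, registered_at) is assumed.
 *
 * A comparable SQL statement on a hypothetical users table would be:
 *   DELETE FROM users WHERE status = 'unactivated' AND registered_at < :cutoff
 */
function staleUnactivatedIds(array $users, int $now, int $maxAgeSeconds): array
{
    $stale = [];
    foreach ($users as $user) {
        $age = $now - $user['registered_at'];
        if ($user['status'] === 'unactivated' && $age > $maxAgeSeconds) {
            $stale[] = $user['id'];
        }
    }
    return $stale;
}
```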

Every system I've worked with has just stored the user in the database and kept it there until it's used; if spam becomes an issue, you can look at the other answers.
You need to store the username and password somewhere. Sending them in the email is going to cause issues, without stored data the link in the email won't know which user to activate, and doing wacky things like storing them in the session is going to cause many, many UX issues.
Other than creating a second table for un-activated accounts, and searching both tables in the two calls that need it (creating a new user/email, and changing a username/email), I don't see a better solution.

Secure system for verifying e-mail addresses

From this question I really like @woliveirajr's answer because it solves:
how to protect against releasing e-mail addresses used on a website
verifies the owner of the e-mail address
To avoid this kind of leak, you could also begin the registration process by asking for the e-mail. After entering it, you would send an e-mail with a link so that the user could continue with the registration process. If the e-mail was already registered, you would send an e-mail saying that.
That way, only the owner of the e-mail could register.
Drawbacks:
probably the real, common users will get bored by having so many steps to register.
in very few cases is simply revealing that an e-mail is already registered on a site a problem, especially because it's easy to register at any site providing any e-mail that you want. You just won't receive the e-mail to activate your account, but in general the site will link the account / username to that e-mail.
Where I'm uncertain is how to implement a system where a user can only access the registration page when they click on a link from an e-mail. Would the registration page retrieve data passed to it using GET and verify "a code" to know whether or not the user can register, and this code changes every 30 minutes? For example, the e-mailed registration link could be mysite.com/register.php?secretcode=as18d and register.php checks "the code" as18d but this code would change every 30 minutes. Is this the idea? Would the code be generated by a salted hash based on the system time?
Or, instead of a link, a few letters could be e-mailed which the user enters into the registration page to authenticate, kind of like how CAPTCHAs work but not really.

The general approach to this is to use an unguessable token, such as a GUID that is embedded in the link so that it gets submitted with the GET. This should be secure since it is highly statistically unlikely that someone will randomly guess a GUID for any user regardless of the time they spend trying to guess, thus expiration isn't even really necessary.
It is worth noting this should be done over an SSL link to avoid the possibility of a man in the middle compromising the verification process.

There's no way on the server side to guarantee that an HTTP request came from a link in an email message. You can't trust anything from the client side; it can all be spoofed and manipulated.
What you need is a hard to guess token. Long and random are good starting points.
Disagree with AJ on expiration for several reasons.
If your user db gets large, you don't want to track un-used tokens for years and years.
If someone requests an activation token and doesn't use it after a few days, it's unlikely they will. Might as well remove it.
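
A minimal sketch of a long, random, expiring token along these lines, assuming PHP 7 or later. The function names and the 3-day default lifetime are illustrative, and storing only a hash of the token is an extra precaution that neither answer asks for.

```php
<?php
declare(strict_types=1);

/**
 * Creates an activation token. 'token' goes into the e-mailed link,
 * 'hash' and 'expires_at' are what would be stored server-side.
 */
function newActivationToken(int $now, int $ttlSeconds = 259200): array
{
    $token = bin2hex(random_bytes(32)); // 64 hex characters, 256 bits
    return [
        'token' => $token,
        'hash' => hash('sha256', $token),
        'expires_at' => $now + $ttlSeconds,
    ];
}

/** Checks a token taken from the query string against the stored record. */
function tokenIsValid(string $presented, string $storedHash, int $expiresAt, int $now): bool
{
    if ($now >= $expiresAt) {
        return false; // expired tokens can simply be deleted
    }
    // hash_equals() compares in constant time
    return hash_equals($storedHash, hash('sha256', $presented));
}
```

The link would then look something like mysite.com/register.php?secretcode= followed by the token, and register.php would look the record up and call tokenIsValid().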

What techniques are there for preventing multiple submissions to a competition?

The Project
We have a competition coded in PHP, with CodeIgniter. The form has validation on email addresses and mobile numbers. The page itself is hosted inside an iframe on a different domain (it's an agency-client relationship).
The Problem
We get users with 1000s of entries. We know they are fake because:
They use the same mobile number - assumedly they figure out a mobile number that passes the validation and then use that every time.
The email addresses are all on weird domains, with some of the domains repeated multiple times.
However, the IP addresses are unique, the entries are spread over a few days, the domains themselves have MX records, the user-agents look normal.
The client doesn't want to do anything which could result in fewer entries.
The Question
What are the pros and cons of methods like Captcha? What UI and code patterns have you used that worked?
One method I read about is to allow entries that are suspicious, so that spammers' entries are accepted, but their data has a 'suspicious' flag against it, which is then checked manually. What data can I check to see whether it is suspicious?

Some methods you could use:
Captcha: Stops bots submitting the form
Email Validation: Send them an email with a unique link to activate their competition entry. Stops invalid email addresses.
Mobile Number Validation: Send them a text message with an activation code. Stops invalid phone numbers.
In my opinion your approach should not be to prevent submission of entries but to require a level of validation on the details entered.

CONS of CAPTCHA:
Users hate it, and it can be frustrating when implemented poorly (failed captcha resets other form fields for instance).
Can be difficult for legit users to complete when the letters are hard to read.
Doesn't always work. Someone just scammed Ticketmaster by beating ReCAPTCHA a few months ago for instance*.
Ugly, more code to implement, and it passes the burden or responsibility from you to the users. PROVE YOU ARE HUMAN is not what I want to see when sending a form, very insulting.
@Nick's got the right idea, use text/email validation. IP checking can be OK sometimes, but as you said, you're getting unique IPs with the same mobile number, so it's not reliable.
There are lots of great posts here regarding CAPTCHA alternatives, definitely worth a read if you plan on employing it. You'll probably have to find a balance between making it easy for the user (encouraging submissions) and front end security techniques.
Why though, can't you simply disregard duplicate mobile numbers or phone number + IP combinations? Just because they can submit multiple times doesn't mean you have to accept it. If it is a human, let them think they are sending in multiple votes :)
*Ticketmaster used various means to try to thwart Wiseguy’s operation, at one point switching to a service called reCaptcha, which is also used by Facebook. It’s a third-party Captcha that feeds a Captcha challenge to a site’s visitors. When a customer tries to purchase tickets, Ticketmaster’s network sends a unique code to reCaptcha, which then transmits a Captcha challenge to the customer.
But the defendants allegedly were able to thwart this, as well. They wrote a script that impersonated users trying to access Facebook, and downloaded hundreds of thousands of possible Captcha challenges from reCaptcha, prosecutors maintained. They identified the file ID of each Captcha challenge and created a database of Captcha “answers” to correspond to each ID. The bot would then identify the file ID of a challenge at Ticketmaster and feed back the corresponding answer. The bot also mimicked human behavior by occasionally making mistakes in typing the answer, authorities said.
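
The suggestion earlier in this answer to quietly disregard duplicate mobile numbers could be sketched like this. Every entry is still accepted, but repeats get a 'suspicious' flag. The entry layout and the digit-only normalisation are assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Marks every entry whose mobile number was already seen as suspicious.
 * Numbers are compared on their digits only, so "07700 900123" and
 * "07700900123" count as the same number (an assumed normalisation).
 */
function flagDuplicateMobiles(array $entries): array
{
    $seen = [];
    foreach ($entries as $i => $entry) {
        $key = (string) preg_replace('/\D+/', '', (string) $entry['mobile']);
        $entries[$i]['suspicious'] = isset($seen[$key]);
        $seen[$key] = true;
    }
    return $entries;
}
```

The first entry for a number stays unflagged, so a genuine entrant is not penalised when someone else reuses their number later.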

Captcha works well for spam protection, but it very often confuses people.
But there is a workaround: you can use JavaScript to hide the captcha for real users (whose browsers have JavaScript turned on) while it will always be "visible" to spam bots (which do not run JS). It's quite simple: with JS you set the div that holds the captcha to display:none, and create a hidden input whose value contains the text from the captcha image...
The strongest approach may be email validation, but that sometimes means rewriting the application. When a user submits a reply, you register it as not active and send a validation email to the address provided. If the address is valid, the user clicks the link to validate the reply, and you can then set the reply's status to active...
Also a good workaround for users to prevent the re-submitting of forms on refresh is to redirect users to that same page after the form is submitted and processed... Yes, it takes a second or two longer to view the result, but it's much safer...

create account then verify (or verify then create account)

Most of the examples I see on the web create user accounts in this sequence: the user comes to the site, chooses a username and password, and enters their email. A confirmation email is sent to this address, and if they click the link, the account gets "verified". If they don't verify, the account gets deleted after a while.
I was told about another way: get the user to verify the email first, and when they click the verification link in their email they can start to create a username and password.
Does anyone see any problems with the second way, whether a security concern or anything else? It's not common and I personally cannot find a totally obvious problem with it, but I'd prefer to use it only after many people confirm they don't see problems or loopholes with it either.

Personally I do see an issue that can be inconvenient for the user:
When most people register with a web site, they expect that they will have to answer quite a few questions, spend some time reading the FAQ and the terms of service and then spend some more time setting up some preliminary aspects of their profile.
The traditional flow allows the user to choose the time to go through that process. Afterwards, the user only receives a verification link, which normally is a 3-second process to use and can be done at practically any time.
Your proposed flow forces the potential user of your site to spend time reading your documentation, then wait until they receive the message and then find some more time, potentially after a few days, to fill in the forms. I, for one, would find that at least slightly annoying - if not outright discouraging - especially if the mail takes its sweet time to arrive, as it's often bound to do.
I also don't like the inherent implication of such a scheme:
Traditional flow: "Oh nice, you filled in our forms, just give us an address to send you a proper verification". The user here is merely waiting to complete what is essentially a done deal.
Mail-first flow: "Oh it's you. Well, wait for a while and we will send you an invitation if we want you". Here, on the other hand, the user is left in a limbo of subconscious uncertainty until they receive your message.
I believe that the first approach is far more open and friendly to the user. It's also the current standard flow for these cases, which should be enough of an incentive to use on its own - you should avoid forcing your users into processes they are not used to, unless there is no other way.

Getting an email from a friend with an invite link to access a site is exciting - it feels exclusive and new and fun. I'm being given something - so I gladly sign up.
Being required to enter an email address in order to start using a site feels draconian and restrictive and annoying. I'm being asked to give something up as the first step then possibly (maybe?) get something of value down the road.
It's not logical - in both cases, my email address must be verified before an account can be created. In fact, the first case requires my friend to actively SPAM me with an offer I never requested.
Do you know why I first created this StackOverflow account? Because when I wanted to contribute an answer I could click on the Google logo on the login page and start using the site immediately. No username, password, first name, last name, DOB, or other B.S.
Do you know why I never created an Experts Exchange account? Because the first time I tried to access an answer I was prompted to enter a credit card number, billing address and phone number. Before I could even sample what the site had to offer, I had to give something up.
The point is this: barriers to entry make your site suck. Account creation should be as seamless and painless as possible. Being able to access a site immediately after filling out a single-page signup form and a CAPTCHA is awesome, even if access to other features is restricted until email verification is completed. Maybe I'll even tell you my DOB and favorite color if it unlocks more features.

Personally I don't see a problem with it - it's a matter of choice. I think the key point though is making it clear to the user that they must
1) enter their email address
2) wait for a confirmation email before they can get to step 3
3) sign up for the account.
It potentially reduces the amount of data held and the time invested by the user if they only have to enter a single piece of information (their email address) before filling in the rest of the information you require.
Personally, I'd keep it standard so users don't get confused. The amount of work is the same - get a username/password/email address - wait for users to click the link before they can login to your site.

So how many times would you allow the link sent in the email to be used?
If only once, the user can't create an account if they close the browser before selecting a username.
If multiple times, a lot of people can create accounts using the same link. Publishing this link and using the password recovery feature can be a nice phishing trick.
And if you check for this email in your database and allow it only once, the user would not be able to create two legitimate accounts.

I could see this method being slightly simpler - when the user clicks the verify link in their email, you send them to a form with a hidden pre-generated id number inserted, and then assign a username and password to it afterwards. Blank accounts, with just an id and no other information, are easy to periodically filter out and you're not storing any details whatsoever until the account is successfully created.
However, there's probably a reason why most sites collect username and password before email - you're getting a user invested before you ask for a more personal bit of information. The account is created - now just verify your email. The other way around asks for an email address first and an account second - even though functionally it's the same, perceptually it's not. Also, the advantage of the standard "flow" is that users know what to expect - following conventions means users feel like they know what's happening and don't get confused or lose interest.

I want to share some thoughts about the second approach.
First of all, it is very similar to an invite system, but IT IS NOT the same.
You have to allow more than one registration request to be sent for a single e-mail address. If you don't, a potential user might accidentally delete the message and there will be no way to repeat the procedure. If you do allow it, some angry dudes might use this as a spam tool (sending as many mails as possible to one, or maybe even more, e-mail addresses). Imagine how your company/site would look to a person who got 10k registration requests...
The standard way has one serious advantage: it allows user name reservation without confirming the e-mail (the user might want to register but not want to use, or not have access to, the e-mail server/account).
You MUST consider that your server might delay sending the email for a pretty long time. Possible reasons: out of memory, a DoS attack, email server failure, etc. If you choose the mail-first approach and the user doesn't receive that mail within 5 minutes (for ANY reason), 3 out of 4 potential users will curse your company/site and never complete registration.
There is a reason why it is called the standard way: a lot of small details have been considered.

Both approaches are OK - but if you're going to defer creation of the account, then you're going to have to embed all the required details into the URL - expiry date, username, password and email address - and then encrypt it all to prevent tampering, which makes it rather large.
Actually - you couldn't allow people to pick their own usernames - since you'd have no way of checking whether the username had already previously been requested and not verified. And if you're going to publish usernames, then you'd therefore be publishing email addresses.....not such a good idea?
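
A rough sketch of carrying the details in the URL, under stated assumptions. Note that it signs the data with an HMAC rather than encrypting it: that is enough to detect tampering, but the contents stay readable, so a password should not go into such a payload. The function names, the "exp" field and the secret are all hypothetical.

```php
<?php
declare(strict_types=1);

/** Packs the payload and appends an HMAC-SHA256 signature over it. */
function signPayload(array $payload, string $secret): string
{
    $body = base64_encode(json_encode($payload, JSON_THROW_ON_ERROR));
    return $body . '.' . hash_hmac('sha256', $body, $secret);
}

/** Returns the payload if the signature matches and "exp" is in the future. */
function verifyPayload(string $token, string $secret, int $now): ?array
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$body, $signature] = $parts;
    if (!hash_equals(hash_hmac('sha256', $body, $secret), $signature)) {
        return null; // modified, or signed with a different secret
    }
    $json = base64_decode($body, true);
    if ($json === false) {
        return null;
    }
    $payload = json_decode($json, true);
    if (!is_array($payload) || ($payload['exp'] ?? 0) <= $now) {
        return null; // malformed or expired
    }
    return $payload;
}
```

When the token is placed in a link it should be passed through rawurlencode(), since base64 output can contain "+", "/" and "=". This does not remove the username problem raised above: nothing is stored until verification, so two people can still request the same name.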

Here would be my concerns with this approach.
Email delivery is not guaranteed and can be slow. If the user doesn't get the email right away, they may not complete the registration process. What if they mistype their email address or if the email gets marked as SPAM?

In my experience, it is always better to keep record of the users that try to register to a site.
The problem is that, more often than not, the users do not get the confirmation e-mail.
When that happens they often forget the site and do not come back.
What I do is retry sending the confirmation e-mail after a while, say one week. Often they receive the second e-mail and you end up recovering a registered user that otherwise would be lost.
As a matter of fact, I retry sending the confirmation e-mail once every week until the user confirms or 30 days have passed since the registration attempt.
Even if the user does not confirm after 30 days, I do not delete the account. Often the user comes back and tries to register again. Then I just send the account confirmation once more and encourage the user to contact the site if it does not arrive again.
All this is to maximize the chances of recovering a registered user that otherwise could be lost.
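
The weekly retry rule above might be expressed as a small check that a scheduled job runs for each unconfirmed account. The function name and parameters are illustrative; all timestamps are assumed to be Unix time.

```php
<?php
declare(strict_types=1);

/**
 * Decides whether the confirmation e-mail should be sent again:
 * once a week, for up to 30 days after the registration attempt.
 */
function shouldResendConfirmation(int $registeredAt, int $lastSentAt, bool $confirmed, int $now): bool
{
    $week = 7 * 86400;
    $retryWindow = 30 * 86400;

    if ($confirmed) {
        return false;
    }
    if ($now - $registeredAt > $retryWindow) {
        return false; // stop retrying, but the account itself is kept
    }
    return $now - $lastSentAt >= $week;
}
```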

I would suggest the second option. Let the users verify themselves by clicking the link in their email. Then they can choose their preferred username and password. I hope the usernames are unique in the site.
It would be helpful in the situation where some users forget to verify the link in their emails for a long time and so their usernames are locked. Others cannot choose those usernames (until that record is deleted later). Also this can eradicate spammers from picking their own usernames and locking them for use by others.
Hence I would suggest going with the second option. Let users first verify their email and their existence before they pick a username and lock it for use by others.

There are actually some sites that do that.
You enter your mail.
You get a verification mail with an initial password and a verification link.
Once you click the link, your account is active and you're directed to a form with additional details (full name, etc.), but you may skip them and fill them in any time in the future.
This minimal registration process will help you avoid the loss of potential customers who don't want to bother with filling in too many forms and supplying data before they really need to.

What it comes down to is convenience for the user. If the only reason for them to check their email is to verify the account, then it may seem like an inconvenience. Instead, have the system generate a password for them, email it to them, and instruct them to check their email to get their password. You can allow them to change the password after they log in if they want. This method also helps to make sure "strong" passwords are in use initially.

PHP - How to prevent spam to track all the available email list in the DB?

I designed a registration form that users can use for registration.
When the user registers an account, he/she has to provide an email, and I then make an ajax call to a server PHP script called checkavailableemail.php.
If the provided email exists in the DB, the script returns FALSE, the user sees a red warning message, and the form cannot pass validation.
Here is the question:
Is there a way to prevent a spammer from discovering all the emails in my DB?
For example, a bad guy can write an automated script that continuously sends email validation requests. In the end, the guy can get a full list of all my current customers' email accounts.
Thank you

There are gazillions of email IDs possible.
The bad guy needs to send a gazillion requests to get the list of your current customer email accounts. So, if you get a lot of requests from a single IP, you can temporarily block it.
So, no need to worry.
[EDIT]: OP Clarification:
You can get the IP of the user by this code:
$IP = $_SERVER['REMOTE_ADDR'];
Keep a count of the requests made by each IP in the last 24 hours by logging them into a database table. If the number of requests goes beyond a certain limit, say 500, you can deny requests from this IP by checking the user's IP with the same code.
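
A minimal sketch of that counter, assuming an in-memory array in place of the database table. The class name is made up, and the defaults mirror the 500 requests per 24 hours mentioned above.

```php
<?php
declare(strict_types=1);

// Hypothetical per-IP limiter; a real site would persist the hits in a table.
final class IpRateLimiter
{
    /** @var array<string, int[]> request timestamps per IP */
    private array $hits = [];
    private int $limit;
    private int $windowSeconds;

    public function __construct(int $limit = 500, int $windowSeconds = 86400)
    {
        $this->limit = $limit;
        $this->windowSeconds = $windowSeconds;
    }

    /** Records the request and returns false once the IP is over the limit. */
    public function allow(string $ip, int $now): bool
    {
        $cutoff = $now - $this->windowSeconds;
        // Keep only the hits that are still inside the time window.
        $recent = array_values(array_filter(
            $this->hits[$ip] ?? [],
            static function (int $t) use ($cutoff): bool {
                return $t > $cutoff;
            }
        ));
        if (count($recent) >= $this->limit) {
            $this->hits[$ip] = $recent;
            return false;
        }
        $recent[] = $now;
        $this->hits[$ip] = $recent;
        return true;
    }
}
```

In checkavailableemail.php it would be called along the lines of $limiter->allow($_SERVER['REMOTE_ADDR'], time()), refusing to answer when the result is false.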

You could use a session and increment it with every failed register attempt. Once they hit 10, disable the form for a set amount of time. Sorry I can't post examples, my PHP is not strong.
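
Since no example was given, here is one possible sketch of the session idea. The function names, the session keys and the 15-minute lock are assumptions; in a real script the $session argument would be $_SESSION after session_start().

```php
<?php
declare(strict_types=1);

/** Counts a failed attempt and locks the form once the maximum is reached. */
function registerFailedAttempt(array &$session, int $now, int $maxFailures = 10, int $lockSeconds = 900): void
{
    $session['failures'] = ($session['failures'] ?? 0) + 1;
    if ($session['failures'] >= $maxFailures) {
        $session['locked_until'] = $now + $lockSeconds;
        $session['failures'] = 0; // start counting afresh after the lock
    }
}

/** True while the form should stay disabled for this session. */
function formIsLocked(array $session, int $now): bool
{
    return ($session['locked_until'] ?? 0) > $now;
}
```

A script that discards its cookies gets a fresh session each time, so this complements a per-IP limit rather than replacing it.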

Brian and shamittomar pretty much said everything.
Your concern is almost fictitious since the spam "dudes" are much better off sending emails to random addresses rather than flooding your website validation, to then send spam to those addresses.
Simply establish a limit.
