Related
I would like to add a form that allows my site visitors to sign up for a newsletter from my site.
I started to build the form, but apparently it's not such a good idea and instead I should use prebuilt scripts by other people to do this as they would be more secure.
Would it really be all that bad to make my own and just have it query all the signups into a database? I don't want it to be all that complex, just need something simple, but I definitely don't want to jeopardize the security of people signing up.
Thanks!
You can do it yourself if you know how to write it correctly.
The reason why people use tried and tested code is because you may be black listed by email services due to large volume of emails you would be sending. Once you are black listed then it is difficult to get yourself off the black list. The tried & tested code SHOULD send the newsletters out periodically rather then mass emailing at once.
Other reasons is due to security, if you write the code incorrectly or if you are new to forms which access databases, then it may not be sensible to write your own. It is good practice to learn it though, even if in the mean time you have to use someone else's code.
You also have to verify emails addresses by a confirmation request email and an associated form. In addition if you have a larger list (more than a few hundred members), then you must code bounce handling and unsubscriptions. For a small list you can do these manually. I did build such a web application step-by-step, but if I had to start it now, than I would consider an existing mail list software more seriously. At first I also thought that I only need a small form, and the list managers I checked was indeed difficult to integrate, but in the long run it was too much effort.
I've a website where I'm providing email encryption to users and I'm trying to figure out if there's a way to detect if a user is human or a bot.
I've been digging into $_SESSION in php but it's easy to bypass, I'm also not interested in captcha, useragent or login solutions, any idea of what I need ?
There are other questions very similar to this one in SO but I couldn't find any straight answer...
Any help will be very welcome, thank you all !
This is a hard problem, and no solution I know of is going to be 100% perfect from a bot-defending and usability perspective. If your attacker is really determined to use a bot on your site, they probably will be able to. If you take things far enough to make it impractical for a computer program to access anything on your site, it's likely no human will want to either, but you can strike a good balance.
My point of view on this is partially as a web developer, but more so from the other side of things, having written numerous web crawler programs for clients all over the world. Not all bots have malicious intent, and can be used for things from automating form submissions to populating databases of doctors office addresses or analyzing stock market data. If your site is well designed from a usability standpoint, there should be no need for a bot that "makes things easier" for a user, but there are cases where there are special needs you can't plan for.
Of course there are those who do have malicious intent, which you definitely want to protect your site against as well as possible. There is virtually no site that can't be automated in some way. Most sites are not difficult at all, but here are a few ideas off the top of my head, from other answers or comments on this page, and from my experience writing (non-malicious) bots.
Types of bots
First I should mention that there are two different categories I would put bots into:
General purpose crawlers, indexers, or bots
Special purpose bots, made specifically for your site to perform some task
Usually a general-purpose bot is going to be something like a search engine's indexer, or possibly some hacker's script that looks for a form to submit, uses a dictionary attack to search for a vulnerable URL, or something like this. They can also attack "engine sites", such as Wordpress blogs. If your site is properly secured with good passwords and the like, these aren't usually going to pose much of a risk to you (unless you do use Wordpress, in which case you have to keep up with the latest versions and security updates).
Special purpose "personalized" bots are the kind I've written. A bot made specifically for your site can be made to act very much like a human user of your site, including inserting time delays between form submissions, setting cookies, and so on, so they can be hard to detect. For the most part this is the kind I'm talking about in the rest of this answer.
Captchas
Captchas are probably the most common approach to making sure a user is humanoid, and generally they are difficult to automatically get around. However, if you simply require the captcha as a one-time thing when the user creates an account, for example, it's easy for a human to get past it and then give their shiny new account credentials to a bot to automate usage of the system.
I remember a few years ago reading about a pretty elaborate system to "automate" breaking captchas on a popular gaming site: a separate site was set up that loaded captchas from the gaming site, and presented them to users, where they were essentially crowd-sourced. Users on the second site would get some sort of reward for each correct captcha, and the owners of the site were able to automate tasks on the gaming site using their crowd-sourced captcha data.
Generally the use of a good captcha system will pretty well guarantee one thing: somewhere there is a human who typed the captcha text. What happens before and after that depends on how often you require captcha verification, and how determined the person making a bot is.
Cell-phone / credit-card verification
If you don't want to use Captchas, this type of verification is probably going to be pretty effective against all but the most determined bot-writer. While (just as with the captcha) it won't prevent an already-verified user from creating and using a bot, you can verify that a human being created the account, and if abused block that phone number/credit-card from being used to create another account.
Sites like Facebook and Craigslist have started using cell-phone verification to prevent spamming from bots. For example, in order to create apps on Facebook, you have to have a phone number on record, confirmed via text message or an automated phone call. Unless your attacker has access to a whole lot of active phone numbers, this could be an effective way to verify that a human created the account and that he only creates a limited number of accounts (one for most people).
Credit cards can also be used to confirm that a human is performing an action and limit the number of accounts a single human can create.
Other [less-effective] solutions
Log analysis
Analyzing your request logs will often reveal bots doing the same actions repeatedly, or sometimes using dictionary attacks to look for holes in your site's configuration. So logs will tell you after-the-fact whether a request was made by a bot or a human. This may or may not be useful to you, but if the requests were made on a cell-phone or credit-card verified account, you can lock the account associated with the offending requests to prevent further abuse.
Math/other questions
Math problems or other questions can be answered by a quick google or wolfram alpha search, which can be automated by a bot. Some questions will be harder than others, but the big search companies are working against you here, making their engines better at understanding questions like this, and in turn making this a less viable option for verifying that a user is human.
Hidden form fields
Some sites employ a mechanism where parameters such as the coordinates of the mouse when they clicked the "submit" button are added to the form submission via javascript. These are extremely easy to fake in most cases, but if you see in your logs a whole bunch of requests using the same coordinates, it's likely they are a bot (although a smart bot could easily give different coordinates with each request).
Javascript Cookies
Since most bots don't load or execute javascript, cookies set using javascript instead of a set-cookie HTTP header will make life slightly more difficult for most would-be bot makers. But not so hard as to prevent the bot from manually setting the cookie as well, once the developer figures out how to generate the same value the javascript generates.
IP address
An IP address alone isn't going to tell you if a user is a human. Some sites use IP addresses to try to detect bots though, and it's true that a simple bot might show up as a bunch of requests from the same IP. But IP addresses are cheap, and with Amazon's EC2 service or similar cloud services, you can spawn a server and use it as a proxy. Or spawn 10 or 100 and use them all as proxies.
UserAgent string
This is so easy to manipulate in a crawler that you can't count on it to mark a bot that's trying not to be detected. It's easy to set the UserAgent to the same string one of the major browsers sends, and may even rotate between several different browsers.
Complicated markup
The most difficult site I ever wrote a bot for consisted of frames within frames within frames....about 10 layers deep, on each page, where each frame's src was the same base controller page, but had different parameters as to which actions to perform. The order of the actions was important, so it was tough to keep straight everything that was going on, but eventually (after a week or so) my bot worked, so while this might deter some bot makers, it won't be useful against all. And will probably make your site about a gazillion times harder to maintain.
Disclaimer & Conclusion
Not all bots are "bad". Most of the crawlers/bots I have made were for users who wanted to automate some process on the site, such as data entry, that was too tedious to do manually. So make tedious tasks easy! Or, provide an API for your users. Probably one of the easiest way to discourage someone from writing a bot for your site is to provide API access. If you provide an API, it's a lot less likely someone will go to the effort to create a crawler for it. And you could use API keys to control how heavily someone uses it.
For the purpose of preventing spammers, some combination of captchas and account verification through cell numbers or credit cards is probably going to be the most effective approach. Add some logging analysis to identify and disable any malicious personalized bots, and you should be in pretty good shape.
My favorite way is presenting the "user" with a picture of a cat or a dog and asking, "Is this a cat or a dog?" No human ever gets that wrong; the computer gets it right perhaps 60% of the time (so you have to run it several times). There's a project that will give you bunches of pictures of cats and dogs -- plus, all the animals are available for adoption so if the user likes the pet, he can have it.
It's a Microsoft corporate project, which puts me in a state of cognitive dissonance, as if I found out that Harry Reid likes zydeco music or that George Bush smokes pot. Oh, wait...
I've seen/used a simple arithmetic problem with written numbers ie:
Please answer the following question to prove you are human:
"What is two plus four?"
and similar simple questions which require reading:
"What is man's best friend?"
you can supply an endless stream of questions, should the person attempting access be unfamiliar with the subject matter, and it is accessible to all readers, etc.
There's a reason why companies use captchas or logins. As ugly of a solution as captchas are, they're currently the best (most accurate, least disruptive to users) way of weeding out bots. If a login solution doesn't work for you, I'm afraid the only realistic solution is a captcha.
If users will be filling in a form, honeypot fields are simple to implement, and can be reasonably effective, but nothing is perfect. Create one or more hidden fields in the form, and if they contain anything when the form is posted, reject the form. Spambots will usually attempt to fill in everything.
You do need to be aware of accessibility. Hidden fields probably won't be filled in by those using a standard browser (where the field is not visible), but those using screen readers may be presented with the field. Be sure to label it correctly so that these users do not fill it in. Perhaps with something like "Please help us to prevent spam by leaving this field empty". Also, if you do reject the form, be sure to reject it with helpful error messages, just in case it has been filled in by a human.
I suggest getting the Growmap Anti Spambot Wordpress plugin and seeing what code you can borrow from it or just using the same technique. I've found this plugin to be very effective for curtailing automated spam on my WordPress sites and I've started adapting the same technique for my ASP.NET sites.
The only thing it doesn't deal with are human cut-and-paste spammers.
What is the point in validating your HTML forms using Javascript, if you are always going to need to validate the forms using PHP anyway? I realize that you get a speed boost from this, and its more convenient to the user, but beyond that, is the time spent on it worth it? If anyone has any good evidence on this I would love to hear it.
Thanks for any help!Metropolis
UPDATE
After receiving numerous answers I would like to change the question a little. We all know that javascript is much more convenient for the user and it gives faster feedback. What I am wondering is: Has anyone ever seen any "evidence" that its worth it? Or do we just do it because it makes things a little better and everyone says we should? The speed difference is not that significant, and as the internet gets faster javascript validation will become even more obsolete I would think.
I am starting to wonder if the time spent validating a page using javascript could be better spent.
Ideally, you validate through javascript and (in your case) PHP.
Both validation methods will work in-tandem to ensure you get the most robust and user friendly functionality possible for your end user.
You will use client-side validation to ensure that all fields are filled in, email addresses are valid, etc.. this will provide instant feedback and won't burden your servers or the user's internet connection.
you validate server-side for security. You can control everything on the server and nothing on the client machine. It's here that you ensure that all entered data is non-malicious and correct.
Keep this in mind: if you are only going to go with one type of validation, choose server-side validation because it is more secure. You should never rely on client-side code for any kind of security.
Using both types of validation gives you the best of both worlds (responsiveness and security) while having none of the downsides. Of course, this means you have to write more code, but in my opinion, it's worth it.
EDIT: In response to the comments
Yes, you have to write more code this way... As a rule of thumb, if it's harder for the programmer, it's easier on the user. It might not make sense in some budgets to do both types of validation and that's a call you're going to have to make. Just make sure your server side validation is rock-solid regardless.
Yes, time is money, and time invested in improving the user's experience is time well spent. If you can't afford to do it now (deadlines/schedule/budget) then do it when you can.
It's all about usability. It is much more convenient for the user to read what errors they have made before the page reloads, rather than continuously submit and reload the page. It can also give a nicer look with some AJAX and the likes, rather than a reload of the page and the very ugly looking red error messages, I think. So the advantage? Much more usable than having server side validation alone.
To provide a better user experience.
The feedback on JS validation is faster, and therefore better than server-side validation on form submit.
The main point of JavaScript validation (when available) is that it improves the user experience. A round-trip to the server requires a page load and the associated annoying flicker as it redraws. Validating in JavaScript code allows you to display a message without all that.
That being said, server-side validation is still required since JavaScript isn't always available (NoScript is quite popular) and because a malicious user will bypass the JavaScript.
Particularly for database backed websites, it tends to be that you need to do server side validation anyway. e.g. to make sure you're inputting valid data into a database or other system. Depending on what the website is updating this could be absolutely critical.
However, client side validation can provide a better user experience. It can be used to provide instant feedback. e.g. when you move focus away from a text box a validator can provide instant feedback which is great when you're filling in a long complicated form.
The bottom line is that you will still need to input good data into your database. And the more correct the information stored in there, the less problems with the system you'll have later. You need both.
e.g. What if someone updates the website code in the future and breaks the validation? or someone writes a script to automate inputting data, bypassing your web front end all it's validation?
I'll say it again. You need both.
...i think you're also keeping your karma cleaner, when hundreds or thousands of your users don't wish you burn in hell for making them fill in 5-7 fields (with textarea) to be informed on the next page they mistyped their email so they have to start all over again :D
it doesn't eat up much of my time to incorporate javascript, id say 1-2 minutes maximum for 1 form. and it saves lots of nerve cells of my users. be a humanist! love ur neighbour!))
Client-side validation allows for an increased user experience. The feedback you give to the user leads to less frustration, less errors, more conversion, more money.
You generally have a better response rate with this kind of validation, which is very valuable.
An high quality software needs this. Users feels happy, and they will spread their joy. A user who has a bad experience won't come came and won't tell his friend.
It's not only decoration when you get to business and sales. ;) The return on investment is worth it.
Easy.
Javascript to help the user enter correctly formatted data.
PHP to make sure whatever enters your script gets cleansed before further processing.
Ofcourse you'll have to do both. Users want it, your customers want it and frankly, you think it's fugly getting php errormessages after submit aswell.
I don't think the argument of having to code extra .js which presumably would eat up your time/budget holds any thruth. There's so many libs and scripts outthere, either one will enable you setting up disco validation in no time at all. However, don't get carried away with dealing out eye candy. .js validation is just there to help. Not to impress.
PHP runs serverside, javascript runs clientside. You don't want your server crunching form validation when you can get the clients computer to do so. Plus it saves bandwidth.
How do you stop bots on a page which is accessible to registered users only? 90% page is accessed by real users and 10% are bot.
I do not want to put captcha or verification method on the page because I know that my users wont like this and they lazy also.
Please share your ideas
Edit
I want to make this question more clear
Registration page has captcha
My site allows users to submit contents in other words its UGC site. Spammers copy other users content and put them on my site so blocking them via askimet is not possible.
Possible Solution
Just got one thing in my mind.
When user click on submit button server will generate a random number (using javascript) which will be then used in hidden field for verification.
Do you think this solution is practically applicable?
One trick I like to use is to add a hidden input field to my forms that a real user would never see or change, but that a bot would blindly fill out.
Something like
<input name="spam_stopper" value="DO NOT CHANGE THIS" style="display:none;"/>
and then, in your form handling code, make sure the value of spam_stopper is "DO NOT CHANGE THIS".
A smart bot may ignore display:none, but that's not too likely - many do ignore <input type="hidden"> though, so I wouldn't use that...
Given you have excluded captcha (which isn't 100% bulletproof), you need to check what your users type and allow or forbid their postings.
This task isn't going to be an easy one, so I would suggest to turn your attention to ready-made solutions such as Akismet.
Since these bots don't follow robots.txt, you can always block them with an .htaccess, but it's lot of work (need to maintain the block list) since bots/spammers often change IPs. You also risk to block genuine users.
You can see Block Bad Bots for an example.
It can be useful but it's often too much work to block all of them VS let's say a CAPTCHA or similar system.
Firstly, do you do human-verification on sign-up? That's the first step you should take to prevent spam on your site. Captchas are very effective, and even if you don't want to make users answer a captcha each time they post on the site, having them fill one out to create an account is perfectly reasonable. It only takes 2-3 seconds, and they only need to do it once.
If you're not willing to do that, you're going to have to put up with spam so long as your site is indexed in search engines.
Prevent not sort out the spam
Yes, CAPTCHAs are not user-friendly. There are a few techniques that you can use to prevent spams without using CAPTCHAs which some of them have been already mentioned by others:
Smarter Server-side Validation: This is specific to the form but for example in contact us form you can filter lengthy messages or messages including a lot of URLs. Or if you expect to get an email you can ping the domain.
Blacklist Mechanism: flag spammers by IP or phrases in a blacklist database. If you're using PHP a simple library like Guard can be helpful
Honeypots: This is already mentioned in the accepted answer
Time-based Protection: To check time to post a request is more than X seconds
Score-based Google reCAPTCHA v3: This version is totally re-designed compared to the previous one and detect spams behind the scene.
I've written a post recently and you can find more in depth there.
I have a forum on a website I master, which gets a daily dose of pron spam. Currently I delete the spam and block the IP. But this does not work very well. The list of blocked IP's is growing quickly, but so is the number of spam posts in the forum.
The forum is entirely my own code. It is built in PHP and MySQL.
What are some concrete ways of stopping the spam?
Edit
The thing I forgot to mention is that the forum needs to be open for unregistered users to post. Kinda like a blog comment.
In a guestbook app I wrote, I implemented two features which prevent most of the spam:
Don't allow POST as the first request in a session
Require a valid HTTP Refer(r)er when posting
One way that I know which works is to use JavaScript before submitting the form. For example, to change the method from GET to POST. ;) Spambots are lousy at executing JavaScript. Of course, this also means that non-Javascript people will not be able to use your site... if you care about them that is. ;) (Note: I don't)
In my experience, the best easy defenses come from just doing something "non-standard". If you make your site non-standard, this makes it so that any automated spam would have to be coded specifically for your site, which (no offense) probably isn't worth the effort. Note that if the spam is coming from human spammers, there's not really anything you can do that won't also stop legitimate posters. So the goal is to find a solution that will throw away any "standard" posts - that is, "fill out the whole form and push submit".
A couple examples that come to mind of things that you could try:
Have a hidden form field with a name that sounds like something a spammer would want to fill out, like "website" or "homepage" or something like that. If the form field gets filled out, throw away the message instead of posting it, because it was a bot automatically filling in the whole form, even invisible fields.
You don't have to use a "real" captcha, but even something simple like "Enter the following word backwards: <random backwards word>" or "What is the domain name of this website?". Easy for a human to do, but it would require a fairly complex bot to figure out what to fill in.
You could use a captcha, there are some good scripts like PHPCaptcha or use a spam control service, like Akismet, they have a PHP API.
You might want to look at this question, which has several answers that describe how you could implement a non-intrusive captcha.
Another thing to consider is to require time between posts to prevent massive spamming.
Include a CAPTCHA that is always "orange".
The spams may be by bots or humans - bots are more likely.
To stop the bots, put in a hidden field populated by Javascript - there is a 99.5% chance that a standard, stupid bot that isn't customised to your site will fail to fill that in.
If they fail to fill it in correctly, give them a message that Javascript is required or something, and give them an opportunity to post some other way (e.g. with a captcha or registration). That way anonymous users who aren't spambots can (mostly) still post with no problems, and most spambots (which haven't been tailored for your specific site) won't.
Don't bother blacklisting IP addresses or using third party blacklists, that will just generate false positives. Almost all bots use the same IP addresses as (some) legitimate users.
Another trick is to put in a text field with a plausible sounding name, which is made difficult to see with CSS - anyone filling this field in with anything is considered to be a bot.
Advanced solutions:
Akismet
Defensio
Sblam! (open source clone of the above)
You can try your luck with non-standard form:
fields that must stay empty hidden with CSS
fields with misleading names, e.g. <input name=email> for something that is not an e-mail.
For me CAPTCHA is like giving up to spammers and letting them damage your forum anyway – except that instead of spam damage, you get usability and accessibility damage.
Something I've found to be surprisingly effective: disallow comments that contain too many URLs (more than, say, 5). Since doing that, I've had zero comment spam.
Edit: Since writing the above, I've had recurring comment spam with only one link. I have now added some honeypot fields and have had no commend spam for a few months now.
Don't let anybody post until they respond to an email sent to their registered email address. You'll see lots of forums and mailing lists generate a unique email address or web url that is sent to the new user's given email address, and they have to respond to the email or click on the link to finalize their registration.
Captcha is definitely the easiest method - try KittenAuth if you want something bot-proof (Although I got pandas this time)
Kitten Auth
There is no single answer since Spam is really a matter of economics: how much is it worth it to someone to put their stuff onto the web. There, however, some solutions that seem pretty good
Recaptcha
Use CCS to create an invisible
field that robots fill-in
Create a time-specific hidden field in your form so the
robot can't use the same form over and over again.
I want to say that in most time, a CAPTCHA is enough for you to prevent SPAMers.
But do use a strong one, like http://www.captcha.net/.
Remember that SPAMers do not want to spend much time to deal with a particular site(except heavy traffic sites), they use a tool to post AD on a lot of sites. So make your FORM a little unusual, (e.g. give the user a image says '1.5+2.4=?' and let users to answer, this will block most of the spam tools :) )
The easiest thing I've done to stop spammers with (so far) 100% consistency is to validate the text that was submitted. If you use the php function strstr() to check for "a href" or even a non-clickable http or www, you can then just reroute the spammer elsewhere. I actually have a script then write to my .htaccess file to deny the offending IP address. Not sure if there's any other kind of spam to be concerned about, but links are all I've seen so far.