I have a few forms on my site that have been getting hammered by SPAM bots lately. I've finally got it under control (without the use of a captcha).
Basically, I'm checking the form for various flags. If one is detected, I simply redirect the request to Google.
Is there a way to redirect the bot either back to its own IP address, or into some kind of infinite loop that will possibly slow it down, or at least cause a minor headache for the person behind it?
CLARIFICATION:
I am already blocking the spam; I'm looking for a clever way to irritate the spammer once I redirect them.
One of my teachers once told us that they had developed a sort of anti-spambot honeypot. It was pretty simple: it redirected bots to a dynamically generated page which contained an infinite loop of fake email addresses. The aims were twofold: keep the bots busy and fill their DB with unusable email addresses, damaging the spammers.
This is just an idea, I don't know if it fits your needs, but it's worth a shot ^^
Of course, it's easier to simply drop spambot-related requests if you are able to identify them.
You should be blocking these requests if you can identify them. Block their IP addresses on the server side.
Also, this thread is related to DOS attacks, but might be useful to you.
BOT/Spider Trap Ideas
Technically it is still a captcha, but what about using a static 'general' question with your form?
What is the value of two plus two?
Check that field in your PHP script to ensure the answer is in fact correct. If it is not, stop processing!
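For instance, a minimal PHP sketch of that server-side check might look like this; the field name human_check and the accepted answers are placeholders, not code from the question:

```php
<?php
// A minimal sketch, assuming the question field is named "human_check".
$answer = strtolower(trim($_POST['human_check'] ?? ''));

if (!in_array($answer, ['4', 'four'], true)) {
    // Wrong or missing answer: stop processing the submission.
    http_response_code(400);
    exit('Please answer the anti-spam question correctly.');
}
// ...continue processing the form...
```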
Failing that, and if you have control over your firewall (and proper logging), start dropping requests from the most abusive IP addresses. Be warned, though: this approach might make legit users unable to access your site!
This is what worked for me practically overnight:
I set up an invisible form field that bots fill out with gibberish; if it was filled, I didn't process the form and just returned a success page.
But the posting to this particular form grew on a daily basis. It started with 2 POST requests a day, and in the end there were 20+ requests.
So TL;DR
I send a 404 Not Found HTTP header on this particular page now. Humans and browsers don't see the difference, but as far as I observed, the bot checked the availability of the page first (with a HEAD request) - so the form was still there, but how would the bot know that when it gets a 404 back?
So far, this has shut down the bots' POST activity completely.
I know this solution doesn't work for pages that have to be visible to good bots (Google, etc.) - but for a contact form or login form it works fine.
Maybe one could whitelist "good" bots and send a 404 to everyone else?
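A rough PHP sketch of the two tricks combined might look like this; the honeypot field name "website" and the page structure are assumptions, not the original code:

```php
<?php
// Serve the page with a 404 status; browsers still render it,
// but naive bots that check availability first will skip it.
http_response_code(404);

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Honeypot: a field hidden via CSS that humans never fill in.
    if (!empty($_POST['website'])) {
        // Bot filled the hidden field: pretend everything worked.
        exit('Thank you for your message!');
    }
    // ...process the real submission here...
}
?>
<form method="post">
    <input type="text" name="website" style="display:none" tabindex="-1" autocomplete="off">
    <!-- real form fields go here -->
    <button type="submit">Send</button>
</form>
```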
No.
Spam bots look for obvious email and comment forms. They won't do anything with a redirect. You could set up some server that is a spider trap full of email forms that don't work.
So, you would want auto URL generation mechanisms to define a site tree, with each new URL having another email form. You'd probably want to do this on a dedicated server.
But in the end NO. Think about it: how is your tiny little PHP server ever going to wear out zombies or a 64-core spam server in Russia?
I don't think you understand what a redirect even does. It sets a response code and says content moved 'here'. A spam bot won't care and probably won't do anything if there's no email form there.
If you really want to avoid spam, read this. You can trap them, but if you're dealing with zombies it's ultimately not going to matter.
http://www.neilgunton.com/doc/?doc_id=8580
I've a website where I'm providing email encryption to users and I'm trying to figure out if there's a way to detect if a user is human or a bot.
I've been digging into $_SESSION in PHP, but it's easy to bypass. I'm also not interested in captcha, user-agent, or login solutions. Any idea what I need?
There are other questions very similar to this one on SO, but I couldn't find any straight answer...
Any help will be very welcome, thank you all!
This is a hard problem, and no solution I know of is going to be 100% perfect from a bot-defending and usability perspective. If your attacker is really determined to use a bot on your site, they probably will be able to. If you take things far enough to make it impractical for a computer program to access anything on your site, it's likely no human will want to either, but you can strike a good balance.
My point of view on this is partially as a web developer, but more so from the other side of things, having written numerous web crawler programs for clients all over the world. Not all bots have malicious intent; they can be used for everything from automating form submissions to populating databases of doctors' office addresses or analyzing stock market data. If your site is well designed from a usability standpoint, there should be no need for a bot that "makes things easier" for a user, but there are cases where there are special needs you can't plan for.
Of course there are those who do have malicious intent, which you definitely want to protect your site against as well as possible. There is virtually no site that can't be automated in some way. Most sites are not difficult at all, but here are a few ideas off the top of my head, from other answers or comments on this page, and from my experience writing (non-malicious) bots.
Types of bots
First I should mention that there are two different categories I would put bots into:
General purpose crawlers, indexers, or bots
Special purpose bots, made specifically for your site to perform some task
Usually a general-purpose bot is going to be something like a search engine's indexer, or possibly some hacker's script that looks for a form to submit, uses a dictionary attack to search for a vulnerable URL, or something like this. They can also attack "engine sites", such as WordPress blogs. If your site is properly secured with good passwords and the like, these aren't usually going to pose much of a risk to you (unless you do use WordPress, in which case you have to keep up with the latest versions and security updates).
Special purpose "personalized" bots are the kind I've written. A bot made specifically for your site can be made to act very much like a human user of your site, including inserting time delays between form submissions, setting cookies, and so on, so they can be hard to detect. For the most part this is the kind I'm talking about in the rest of this answer.
Captchas
Captchas are probably the most common approach to making sure a user is humanoid, and generally they are difficult to automatically get around. However, if you simply require the captcha as a one-time thing when the user creates an account, for example, it's easy for a human to get past it and then give their shiny new account credentials to a bot to automate usage of the system.
I remember a few years ago reading about a pretty elaborate system to "automate" breaking captchas on a popular gaming site: a separate site was set up that loaded captchas from the gaming site, and presented them to users, where they were essentially crowd-sourced. Users on the second site would get some sort of reward for each correct captcha, and the owners of the site were able to automate tasks on the gaming site using their crowd-sourced captcha data.
Generally the use of a good captcha system will pretty well guarantee one thing: somewhere there is a human who typed the captcha text. What happens before and after that depends on how often you require captcha verification, and how determined the person making a bot is.
Cell-phone / credit-card verification
If you don't want to use Captchas, this type of verification is probably going to be pretty effective against all but the most determined bot-writer. While (just as with the captcha) it won't prevent an already-verified user from creating and using a bot, you can verify that a human being created the account, and if abused block that phone number/credit-card from being used to create another account.
Sites like Facebook and Craigslist have started using cell-phone verification to prevent spamming from bots. For example, in order to create apps on Facebook, you have to have a phone number on record, confirmed via text message or an automated phone call. Unless your attacker has access to a whole lot of active phone numbers, this could be an effective way to verify that a human created the account and that he only creates a limited number of accounts (one for most people).
Credit cards can also be used to confirm that a human is performing an action and limit the number of accounts a single human can create.
Other [less-effective] solutions
Log analysis
Analyzing your request logs will often reveal bots doing the same actions repeatedly, or sometimes using dictionary attacks to look for holes in your site's configuration. So logs will tell you after-the-fact whether a request was made by a bot or a human. This may or may not be useful to you, but if the requests were made on a cell-phone or credit-card verified account, you can lock the account associated with the offending requests to prevent further abuse.
Math/other questions
Math problems or other questions can be answered by a quick google or wolfram alpha search, which can be automated by a bot. Some questions will be harder than others, but the big search companies are working against you here, making their engines better at understanding questions like this, and in turn making this a less viable option for verifying that a user is human.
Hidden form fields
Some sites employ a mechanism where parameters such as the coordinates of the mouse when they clicked the "submit" button are added to the form submission via javascript. These are extremely easy to fake in most cases, but if you see in your logs a whole bunch of requests using the same coordinates, it's likely they are a bot (although a smart bot could easily give different coordinates with each request).
Javascript Cookies
Since most bots don't load or execute javascript, cookies set using javascript instead of a set-cookie HTTP header will make life slightly more difficult for most would-be bot makers. But not so hard as to prevent the bot from manually setting the cookie as well, once the developer figures out how to generate the same value the javascript generates.
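As a rough single-file sketch of the idea (the cookie name js_check and its value are assumptions), the server can simply refuse submissions that arrive without the JavaScript-set cookie:

```php
<?php
// A rough sketch; the cookie name "js_check" is an assumption.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Simple bots never execute JavaScript, so the cookie will be missing.
    if (empty($_COOKIE['js_check'])) {
        http_response_code(403);
        exit('This form requires JavaScript and cookies.');
    }
    // ...process the submission...
} else {
    // Set the cookie from JavaScript instead of a Set-Cookie header.
    echo '<script>document.cookie = "js_check=1; path=/";</script>';
    // ...render the form here...
}
```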
IP address
An IP address alone isn't going to tell you if a user is a human. Some sites use IP addresses to try to detect bots though, and it's true that a simple bot might show up as a bunch of requests from the same IP. But IP addresses are cheap, and with Amazon's EC2 service or similar cloud services, you can spawn a server and use it as a proxy. Or spawn 10 or 100 and use them all as proxies.
UserAgent string
This is so easy to manipulate in a crawler that you can't count on it to mark a bot that's trying not to be detected. It's easy to set the UserAgent to the same string one of the major browsers sends, and a bot may even rotate between several different browser strings.
Complicated markup
The most difficult site I ever wrote a bot for consisted of frames within frames within frames....about 10 layers deep, on each page, where each frame's src was the same base controller page, but had different parameters as to which actions to perform. The order of the actions was important, so it was tough to keep straight everything that was going on, but eventually (after a week or so) my bot worked, so while this might deter some bot makers, it won't be useful against all. And will probably make your site about a gazillion times harder to maintain.
Disclaimer & Conclusion
Not all bots are "bad". Most of the crawlers/bots I have made were for users who wanted to automate some process on the site, such as data entry, that was too tedious to do manually. So make tedious tasks easy! Or, provide an API for your users. Probably one of the easiest ways to discourage someone from writing a bot for your site is to provide API access. If you provide an API, it's a lot less likely someone will go to the effort to create a crawler for it. And you could use API keys to control how heavily someone uses it.
For the purpose of preventing spammers, some combination of captchas and account verification through cell numbers or credit cards is probably going to be the most effective approach. Add some logging analysis to identify and disable any malicious personalized bots, and you should be in pretty good shape.
My favorite way is presenting the "user" with a picture of a cat or a dog and asking, "Is this a cat or a dog?" No human ever gets that wrong; the computer gets it right perhaps 60% of the time (so you have to run it several times). There's a project that will give you bunches of pictures of cats and dogs -- plus, all the animals are available for adoption so if the user likes the pet, he can have it.
It's a Microsoft corporate project, which puts me in a state of cognitive dissonance, as if I found out that Harry Reid likes zydeco music or that George Bush smokes pot. Oh, wait...
I've seen/used a simple arithmetic problem with written numbers, i.e.:
Please answer the following question to prove you are human:
"What is two plus four?"
and similar simple questions which require reading:
"What is man's best friend?"
You can supply an endless stream of such questions in case the person attempting access is unfamiliar with a particular subject, and it remains accessible to all readers, etc.
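A minimal PHP sketch of such a question pool might look like this; the questions, answers, and field name are only placeholders:

```php
<?php
// Pick a random question, remember the expected answer in the session,
// and check it on submission.
session_start();

$questions = [
    'What is two plus four?'     => 'six',
    "What is man's best friend?" => 'dog',
];

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $given    = strtolower(trim($_POST['answer'] ?? ''));
    $expected = $_SESSION['expected_answer'] ?? null;
    if ($expected === null || $given !== $expected) {
        exit('Sorry, wrong answer to the anti-spam question.');
    }
    // ...process the form...
} else {
    $question = array_rand($questions);
    $_SESSION['expected_answer'] = $questions[$question];
    echo '<form method="post">'
       . htmlspecialchars($question)
       . ' <input type="text" name="answer">'
       . ' <button type="submit">Send</button></form>';
}
```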
There's a reason why companies use captchas or logins. As ugly of a solution as captchas are, they're currently the best (most accurate, least disruptive to users) way of weeding out bots. If a login solution doesn't work for you, I'm afraid the only realistic solution is a captcha.
If users will be filling in a form, honeypot fields are simple to implement, and can be reasonably effective, but nothing is perfect. Create one or more hidden fields in the form, and if they contain anything when the form is posted, reject the form. Spambots will usually attempt to fill in everything.
You do need to be aware of accessibility. Hidden fields probably won't be filled in by those using a standard browser (where the field is not visible), but those using screen readers may be presented with the field. Be sure to label it correctly so that these users do not fill it in. Perhaps with something like "Please help us to prevent spam by leaving this field empty". Also, if you do reject the form, be sure to reject it with helpful error messages, just in case it has been filled in by a human.
I suggest getting the Growmap Anti Spambot WordPress plugin and seeing what code you can borrow from it, or just using the same technique. I've found this plugin to be very effective for curtailing automated spam on my WordPress sites, and I've started adapting the same technique for my ASP.NET sites.
The only thing it doesn't deal with are human cut-and-paste spammers.
I'm looking to set up a whistleblowing/anonymous tip website, but I've run into some problems. The basic idea is that you navigate to a splash page, fill in a few fields (name and location optionally, and then the message), then fire it off. At that point the message gets sent to a specific email inbox so that our team can look at it.
I've done a bit of research and PHP seems like my best bet, but I would also like to be able to log IP addresses for every message (or, more ideally, append them to the email before it is sent) so that I can be sure I'm not getting trolled or spammed. Can anyone point me in the right direction with this? I'm kind of a PHP noob, but willing to learn.
Thanks!
The remote IP address will be available within your PHP script using the superglobal $_SERVER['REMOTE_ADDR']. You can append that to your mail.
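For example, a minimal sketch using PHP's built-in mail(); the recipient address and the form field name are placeholders:

```php
<?php
// Append the sender's IP address to the message before sending it.
$message  = $_POST['message'] ?? '';
$message .= "\n\n-- \nSender IP: " . $_SERVER['REMOTE_ADDR'];

mail('tips@example.com', 'Anonymous tip', $message);
```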
Just to mention: if you log the IP address of the sender, you are missing something important if you want the sender to be ANONYMOUS. If you log the IP, that's not really the case any more.
Problem
Spambots most of the time have a network of (hacked!) computers behind them, so blocking IP addresses usually does not work. I would also like to point out that some legitimate user who is not aware of the malware on his PC may be unable to use your service because you are blocking his IP address. Otherwise CAPTCHAs would NOT be necessary at all, and Google and Yahoo! would not be using them, because as you most likely know these images are sometimes hard to read.
Solution
You should just have a good spam filter in place (Gmail's works very well) and use Akismet to detect spam messages instead. They have very decent libraries available, so you don't have to do much coding at all, and it is going to work a lot better than what you were about to implement.
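For reference, a rough sketch of calling Akismet's comment-check endpoint directly with cURL (normally you would use one of their client libraries); the API key, blog URL, and field name are placeholders:

```php
<?php
// Ask Akismet whether a submitted message looks like spam.
$apiKey = 'YOUR_API_KEY';
$fields = [
    'blog'            => 'https://www.example.com/',
    'user_ip'         => $_SERVER['REMOTE_ADDR'],
    'user_agent'      => $_SERVER['HTTP_USER_AGENT'] ?? '',
    'comment_type'    => 'comment',
    'comment_content' => $_POST['message'] ?? '',
];

$ch = curl_init("https://{$apiKey}.rest.akismet.com/1.1/comment-check");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$isSpam = (curl_exec($ch) === 'true');   // Akismet answers "true" or "false"
curl_close($ch);

if ($isSpam) {
    exit('Your message looks like spam.');
}
```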
We're having an issue on one of our fairly large websites with spam bots. It appears the bots are creating user accounts and then posting journal entries which lead to various spam links.
It appears they are bypassing our captcha somehow -- either it's been cracked or they're using another method to create accounts.
We're looking to do email activation for the accounts, but we're about a week away from implementing such changes (due to busy schedules).
However, I don't feel like this will be enough if they're using an SQL exploit somewhere on the site and doing the whole cross-site scripting thing. So my question to you:
If they are using some kind of XSS exploit, how can I find it? I'm securing statements where I can, but, again, it's a fairly large site and it'd take me a while to actively clean up SQL statements to prevent XSS. Can you recommend anything to help our situation?
1) As mentioned above reCAPTCHA is a good start.
2) Akismet is a great way to flag spam before it is published. It's what WordPress uses to stop spam, and it is extremely effective. You could then reject or queue the entry for moderation based on the results. Its API is ridiculously easy to use, too (I have PHP code if you need it). You might need a commercial license, although I am sure you can get started using the free version.
3) Verification of email addresses is definitely a good idea, as it requires a valid email account, which many spammers do not have. Just make sure you make verifying the email address easy; if it is too difficult, it can turn legitimate users away as well.
If the bots were exploiting a hole in a script somewhere, there should be evidence of that in the logs. Check for direct POSTs to the user creation scripts and the journal entry creation scripts without the usual "normal" surfing activity prior to the hit: the bots may have crawled the site only once and may now be bypassing the step of pulling down the forms and pretending to fill them in. Look for GET requests with obvious XSS-type data in the query strings.
You could also embed a random token in a hidden field within the forms and require that token to be present for the activation/posting to go through. If the bots only parsed your signup scripts once and are doing direct posts, this will stop them in their tracks until the bot creators catch on and look for the token. But it would give you some breathing space to implement a better system.
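A minimal PHP sketch of that token idea might look like this; the field and session names are placeholders:

```php
<?php
// Issue a random token with the form and require it on submission.
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Reject posts whose token is missing or doesn't match the one we issued.
    if (empty($_POST['form_token'])
        || !hash_equals($_SESSION['form_token'] ?? '', $_POST['form_token'])) {
        http_response_code(403);
        exit('Invalid form token.');
    }
    unset($_SESSION['form_token']);   // single use
    // ...create the account / save the journal entry...
} else {
    $_SESSION['form_token'] = bin2hex(random_bytes(16));
    echo '<form method="post">'
       . '<input type="hidden" name="form_token" value="' . $_SESSION['form_token'] . '">'
       . '<!-- visible fields here --><button type="submit">Sign up</button></form>';
}
```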
If your user account tables don't have some kind of time-of-creation time stamp on them, put one in and have the server create the timestamp, not your user scripts. This way you can narrow down the time period(s) to scan the logs for bot activity and see what they're doing. And if nothing else, you could block the IPs the bots are posting from.
I'm surprised that someone can advise Akismet and that it is accepted as the answer:
Engaging Akismet is illegal in the EU, as it infringes privacy protection laws;
Any blacklisting system helps criminals by making the internet unusable for legit users;
Systems that collect spam in order to analyze it are doomed to act retroactively, always lagging behind advances in spammer techniques and bot development.
Why accumulate and analyse spam fed by bots instead of blocking the spam bots themselves?
I have a forum on a website I administer, which gets a daily dose of pron spam. Currently I delete the spam and block the IP, but this does not work very well. The list of blocked IPs is growing quickly, but so is the number of spam posts in the forum.
The forum is entirely my own code. It is built in PHP and MySQL.
What are some concrete ways of stopping the spam?
Edit
The thing I forgot to mention is that the forum needs to be open for unregistered users to post. Kinda like a blog comment.
In a guestbook app I wrote, I implemented two features which prevent most of the spam (a rough sketch of both follows below):
Don't allow POST as the first request in a session
Require a valid HTTP Refer(r)er when posting
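A minimal PHP sketch of both checks; the session flag name and the host comparison are assumptions, not the original guestbook code:

```php
<?php
// Two cheap checks: the form must have been loaded first in this session,
// and the Referer must point back to this site.
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // 1) Don't allow POST as the first request in a session.
    if (empty($_SESSION['seen_form'])) {
        http_response_code(403);
        exit('Please load the form before submitting it.');
    }

    // 2) Require a Referer header that points back to this host.
    $referer = $_SERVER['HTTP_REFERER'] ?? '';
    if (parse_url($referer, PHP_URL_HOST) !== $_SERVER['HTTP_HOST']) {
        http_response_code(403);
        exit('Invalid referer.');
    }

    // ...save the guestbook entry...
} else {
    $_SESSION['seen_form'] = true;
    // ...render the form...
}
```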
One way that I know works is to use JavaScript before submitting the form. For example, to change the method from GET to POST. ;) Spambots are lousy at executing JavaScript. Of course, this also means that non-JavaScript people will not be able to use your site... if you care about them, that is. ;) (Note: I don't)
In my experience, the best easy defenses come from just doing something "non-standard". If you make your site non-standard, this makes it so that any automated spam would have to be coded specifically for your site, which (no offense) probably isn't worth the effort. Note that if the spam is coming from human spammers, there's not really anything you can do that won't also stop legitimate posters. So the goal is to find a solution that will throw away any "standard" posts - that is, "fill out the whole form and push submit".
A couple examples that come to mind of things that you could try:
Have a hidden form field with a name that sounds like something a spammer would want to fill out, like "website" or "homepage" or something like that. If the form field gets filled out, throw away the message instead of posting it, because it was a bot automatically filling in the whole form, even invisible fields.
You don't have to use a "real" captcha, but even something simple like "Enter the following word backwards: <random backwards word>" or "What is the domain name of this website?". Easy for a human to do, but it would require a fairly complex bot to figure out what to fill in.
You could use a captcha (there are some good scripts like PHPCaptcha), or use a spam control service like Akismet, which has a PHP API.
You might want to look at this question, which has several answers that describe how you could implement a non-intrusive captcha.
Another thing to consider is to require time between posts to prevent massive spamming.
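A minimal sketch of such a per-session posting delay in PHP; the 30-second interval and session key are arbitrary placeholders:

```php
<?php
// Enforce a minimum delay between posts from the same session.
session_start();

$minInterval = 30; // seconds required between posts
$lastPost    = $_SESSION['last_post_time'] ?? 0;

if (time() - $lastPost < $minInterval) {
    exit('You are posting too quickly. Please wait a moment and try again.');
}

$_SESSION['last_post_time'] = time();
// ...accept the post...
```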
Include a CAPTCHA that is always "orange".
The spam may come from bots or humans - bots are more likely.
To stop the bots, put in a hidden field populated by Javascript - there is a 99.5% chance that a standard, stupid bot that isn't customised to your site will fail to fill that in.
If they fail to fill it in correctly, give them a message that Javascript is required or something, and give them an opportunity to post some other way (e.g. with a captcha or registration). That way anonymous users who aren't spambots can (mostly) still post with no problems, and most spambots (which haven't been tailored for your specific site) won't.
Don't bother blacklisting IP addresses or using third-party blacklists; that will just generate false positives. Almost all bots use the same IP addresses as (some) legitimate users.
Another trick is to put in a text field with a plausible sounding name, which is made difficult to see with CSS - anyone filling this field in with anything is considered to be a bot.
Advanced solutions:
Akismet
Defensio
Sblam! (open source clone of the above)
You can try your luck with non-standard form fields:
fields that must stay empty hidden with CSS
fields with misleading names, e.g. <input name=email> for something that is not an e-mail.
For me CAPTCHA is like giving up to spammers and letting them damage your forum anyway – except that instead of spam damage, you get usability and accessibility damage.
Something I've found to be surprisingly effective: disallow comments that contain too many URLs (more than, say, 5). Since doing that, I've had zero comment spam.
Edit: Since writing the above, I've had recurring comment spam with only one link. I have now added some honeypot fields and have had no comment spam for a few months now.
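A minimal PHP sketch of the URL-count rule; the limit of 5 matches the figure mentioned above, and the field name is a placeholder:

```php
<?php
// Count link-like patterns in the comment and reject it if there are too many.
$comment  = $_POST['comment'] ?? '';
$urlCount = preg_match_all('#https?://|www\.#i', $comment);

if ($urlCount > 5) {
    exit('Your comment contains too many links.');
}
// ...accept the comment...
```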
Don't let anybody post until they respond to an email sent to their registered email address. You'll see lots of forums and mailing lists generate a unique email address or web URL that is sent to the new user's given email address, and they have to respond to the email or click on the link to finalize their registration.
Captcha is definitely the easiest method - try KittenAuth if you want something bot-proof (Although I got pandas this time)
Kitten Auth
There is no single answer, since spam is really a matter of economics: how much is it worth to someone to put their stuff onto the web? There are, however, some solutions that seem pretty good:
Recaptcha
Use CSS to create an invisible field that robots fill in
Create a time-specific hidden field in your form so the robot can't use the same form over and over again
I want to say that most of the time, a CAPTCHA is enough to stop spammers.
But do use a strong one, like http://www.captcha.net/.
Remember that spammers do not want to spend much time dealing with a particular site (except heavy-traffic sites); they use a tool to post ads on a lot of sites. So make your FORM a little unusual (e.g. give the user an image that says '1.5+2.4=?' and let them answer; this will block most of the spam tools :) ).
The easiest thing I've done to stop spammers with (so far) 100% consistency is to validate the text that was submitted. If you use the PHP function strstr() to check for "a href" or even a non-clickable http or www, you can then just reroute the spammer elsewhere. I actually have a script that then writes to my .htaccess file to deny the offending IP address. Not sure if there's any other kind of spam to be concerned about, but links are all I've seen so far.
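A rough PHP sketch of that strstr() check; the redirect target and the field name are placeholders:

```php
<?php
// Reject submissions that contain link-like text and reroute the sender elsewhere.
$text = strtolower($_POST['message'] ?? '');

if (strstr($text, 'a href') || strstr($text, 'http') || strstr($text, 'www.')) {
    header('Location: https://www.google.com/');
    exit;
}
// ...process the legitimate submission...
```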