I have a forum on a website I administer, which gets a daily dose of porn spam. Currently I delete the spam and block the IP, but this does not work very well. The list of blocked IPs is growing quickly, but so is the number of spam posts in the forum.
The forum is entirely my own code. It is built in PHP and MySQL.
What are some concrete ways of stopping the spam?
Edit
The thing I forgot to mention is that the forum needs to be open for unregistered users to post. Kinda like a blog comment.
In a guestbook app I wrote, I implemented two features which prevent most of the spam:
Don't allow POST as the first request in a session
Require a valid HTTP Refer(r)er when posting
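A minimal sketch of those two guestbook checks, assuming the page that renders the form sets a session flag first. The function name is_plausible_post() and the 'seen_before' key are made up for illustration, and some legitimate browsers and proxies strip the Referer header, so treat a failure as a hint rather than proof.

```php
<?php
// Sketch only: is_plausible_post() and the 'seen_before' key are hypothetical names.

/**
 * Decide whether a POST looks like it came from a visitor who loaded the form first.
 *
 * @param array  $session session data (normally $_SESSION)
 * @param array  $server  server variables (normally $_SERVER)
 * @param string $ownHost host name the form is served from
 */
function is_plausible_post(array $session, array $server, string $ownHost): bool
{
    // Rule 1: a POST must not be the first request of the session.
    // The page that renders the form is assumed to set 'seen_before' to true.
    if (empty($session['seen_before'])) {
        return false;
    }

    // Rule 2: the Referer header must be present and point at our own host.
    $referer = $server['HTTP_REFERER'] ?? '';
    $refererHost = parse_url($referer, PHP_URL_HOST);

    return is_string($refererHost) && strcasecmp($refererHost, $ownHost) === 0;
}
```

In a real handler you would call it as is_plausible_post($_SESSION, $_SERVER, 'your.host.name') before saving the post.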
One way that I know which works is to use JavaScript before submitting the form. For example, to change the method from GET to POST. ;) Spambots are lousy at executing JavaScript. Of course, this also means that non-Javascript people will not be able to use your site... if you care about them that is. ;) (Note: I don't)
In my experience, the best easy defenses come from just doing something "non-standard". If you make your site non-standard, this makes it so that any automated spam would have to be coded specifically for your site, which (no offense) probably isn't worth the effort. Note that if the spam is coming from human spammers, there's not really anything you can do that won't also stop legitimate posters. So the goal is to find a solution that will throw away any "standard" posts - that is, "fill out the whole form and push submit".
A couple examples that come to mind of things that you could try:
Have a hidden form field with a name that sounds like something a spammer would want to fill out, like "website" or "homepage" or something like that. If the form field gets filled out, throw away the message instead of posting it, because it was a bot automatically filling in the whole form, even invisible fields.
You don't have to use a "real" captcha, but even something simple like "Enter the following word backwards: <random backwards word>" or "What is the domain name of this website?". Easy for a human to do, but it would require a fairly complex bot to figure out what to fill in.
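The hidden-field idea could look roughly like this on the server side. This is only a sketch: the field name "website" comes from the suggestion above, while is_honeypot_tripped() is a hypothetical helper name.

```php
<?php
// Sketch only: is_honeypot_tripped() is a hypothetical helper name.

/**
 * Return true when the hidden decoy field contains anything at all.
 * The form is assumed to contain something like:
 *   <input type="text" name="website" style="display:none" autocomplete="off">
 */
function is_honeypot_tripped(array $post, string $decoyField = 'website'): bool
{
    $value = $post[$decoyField] ?? '';

    // The real form never sends an array here (e.g. website[]=x), so treat that as a bot.
    if (!is_string($value)) {
        return true;
    }

    // A human never sees the field, so any non-blank content means a bot filled it in.
    return trim($value) !== '';
}
```

If is_honeypot_tripped($_POST) returns true, throw the message away instead of posting it.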
You could use a captcha; there are some good scripts like PHPCaptcha. Or you could use a spam control service like Akismet, which has a PHP API.
You might want to look at this question, which has several answers that describe how you could implement a non-intrusive captcha.
Another thing to consider is to require time between posts to prevent massive spamming.
Include a CAPTCHA that is always "orange".
The spams may be by bots or humans - bots are more likely.
To stop the bots, put in a hidden field populated by Javascript - there is a 99.5% chance that a standard, stupid bot that isn't customised to your site will fail to fill that in.
If they fail to fill it in correctly, give them a message that Javascript is required or something, and give them an opportunity to post some other way (e.g. with a captcha or registration). That way anonymous users who aren't spambots can (mostly) still post with no problems, and most spambots (which haven't been tailored for your specific site) won't.
Don't bother blacklisting IP addresses or using third party blacklists, that will just generate false positives. Almost all bots use the same IP addresses as (some) legitimate users.
Another trick is to put in a text field with a plausible sounding name, which is made difficult to see with CSS - anyone filling this field in with anything is considered to be a bot.
Advanced solutions:
Akismet
Defensio
Sblam! (open source clone of the above)
You can try your luck with a non-standard form:
fields that must stay empty, hidden with CSS
fields with misleading names, e.g. <input name=email> for something that is not an e-mail.
For me CAPTCHA is like giving up to spammers and letting them damage your forum anyway – except that instead of spam damage, you get usability and accessibility damage.
Something I've found to be surprisingly effective: disallow comments that contain too many URLs (more than, say, 5). Since doing that, I've had zero comment spam.
Edit: Since writing the above, I've had recurring comment spam with only one link. I have now added some honeypot fields and have had no comment spam for a few months now.
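A rough sketch of the URL-count check, assuming that anything starting with http://, https:// or www. counts as a link. The function names and the regular expression are illustrative; the limit of 5 is the figure mentioned above.

```php
<?php
// Sketch only: count_urls() and has_too_many_urls() are hypothetical names.

/** Count substrings that look like links (http://, https:// or a bare www. prefix). */
function count_urls(string $text): int
{
    // \S+ consumes the rest of the link, so "http://www.example.com" counts once.
    $matches = preg_match_all('~\b(?:https?://|www\.)\S+~i', $text);

    return $matches === false ? 0 : $matches;
}

/** Return true when the comment contains more links than allowed. */
function has_too_many_urls(string $text, int $maxUrls = 5): bool
{
    return count_urls($text) > $maxUrls;
}
```

Lowering $maxUrls to 0 turns this into a "no links at all" filter, which may be too strict for some forums.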
Don't let anybody post until they respond to an email sent to their registered email address. You'll see lots of forums and mailing lists generate a unique email address or web url that is sent to the new user's given email address, and they have to respond to the email or click on the link to finalize their registration.
Captcha is definitely the easiest method - try KittenAuth if you want something bot-proof (Although I got pandas this time)
Kitten Auth
There is no single answer, since spam is really a matter of economics: how much is it worth to someone to put their stuff onto the web? There are, however, some solutions that seem pretty good:
Recaptcha
Use CSS to create an invisible field that robots fill in
Create a time-specific hidden field in your form so the robot can't use the same form over and over again.
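One way to sketch the time-specific hidden field is a signed timestamp. Everything named here (FORM_SECRET, make_form_token(), check_form_token(), the one-hour lifetime) is an assumption for illustration, not something from the answer above.

```php
<?php
// Sketch only: FORM_SECRET and both function names are hypothetical.
const FORM_SECRET = 'replace-with-a-long-random-secret';

/** Build "timestamp:signature" to embed in a hidden field when the form is rendered. */
function make_form_token(int $now): string
{
    return $now . ':' . hash_hmac('sha256', (string) $now, FORM_SECRET);
}

/**
 * Accept the token only if the signature is valid and the form was submitted
 * between $minAge and $maxAge seconds after it was rendered.
 */
function check_form_token(string $token, int $now, int $minAge = 0, int $maxAge = 3600): bool
{
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false;
    }
    [$issuedAt, $signature] = $parts;

    // Recompute the signature so a bot cannot simply invent a fresh timestamp.
    $expected = hash_hmac('sha256', $issuedAt, FORM_SECRET);
    if (!hash_equals($expected, $signature)) {
        return false;
    }

    $age = $now - (int) $issuedAt;

    return $age >= $minAge && $age <= $maxAge;
}
```

Render the form with make_form_token(time()) in a hidden input and call check_form_token($_POST['token'] ?? '', time()) on submit. Note that a token can still be replayed until it expires; making it strictly single-use would also need server-side storage of used tokens.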
I would say that most of the time, a CAPTCHA is enough to keep spammers out.
But do use a strong one, like http://www.captcha.net/.
Remember that spammers do not want to spend much time on a particular site (except for heavy-traffic sites); they use a tool to post ads on a lot of sites. So make your form a little unusual (e.g. show the user an image that says '1.5+2.4=?' and ask for the answer); this will block most of the spam tools :)
The easiest thing I've done to stop spammers with (so far) 100% consistency is to validate the text that was submitted. If you use the PHP function strstr() to check for "a href" or even a non-clickable http or www, you can then just reroute the spammer elsewhere. I also have the script write to my .htaccess file to deny the offending IP address. I'm not sure if there's any other kind of spam to be concerned about, but links are all I've seen so far.
Related
I have a few forms on my site that have been getting hammered by SPAM bots lately. I've finally got it under control (without the use of a captcha).
Basically, I'm checking the form for various flags. If one is detected, I simply redirect the request to Google.
Is there a way to redirect the bot either back to its own IP address, or into some kind of infinite loop that will possibly slow it down, or at least cause a minor headache for the person behind it?
CLARIFICATION:
I am already blocking the SPAM, I'm looking for a clever way to irritate the spammer once I redirect them.
One of my teachers once told us that they developed a sort of anti-spambot honeypot. It was pretty simple: it redirected bots to a dynamically generated page which contained an infinite loop of fake addresses. There were two aims: keep the bots busy and fill their database with unusable email addresses, damaging the spammers.
This was just an idea. I don't know if it fits your needs, but it's worth a shot ^^
Of course, it's easier to simply drop spambot-related requests if you are able to identify them.
You should be blocking these requests if you can identify them. Block their IP addresses on the server side.
Also, this thread is related to DOS attacks, but might be useful to you.
BOT/Spider Trap Ideas
Technically it is still a captcha, but what about using a static 'general' question with your form.
What is the value of two plus two?
Check that field in your PHP script to ensure the answer is in fact correct. If it is not, stop processing!
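A small sketch of that check, assuming you want to accept both "4" and "four" regardless of case and surrounding spaces. The helper name is_correct_answer() is made up.

```php
<?php
// Sketch only: is_correct_answer() is a hypothetical helper name.

/** Compare the visitor's answer with the accepted answers to the static question. */
function is_correct_answer(string $submitted, array $accepted = ['4', 'four']): bool
{
    // Normalise case and whitespace so " Four " is as good as "four".
    $normalised = strtolower(trim($submitted));

    return in_array($normalised, $accepted, true);
}
```

If it returns false, stop processing the form before anything is stored or mailed.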
Failing that, and if you have control over your firewall (and proper logging), start dropping requests from the most abusive IP addresses. Be warned though, this approach might make legit users unable to access your site!
This is what worked for me, from one day to the next:
I set up an invisible form field that bots fill out with gibberish, and if it was filled in, I didn't process the form and just returned a success page.
But the number of posts to this particular form grew on a daily basis. It started with 2 POST requests a day and at the end there were 20+ requests.
So TL;DR
I now send a 404 Not Found HTTP header on this particular page. Humans and browsers don't see the difference, but as far as I observed, the bot checked the availability of the page first (HEAD request). The form is still there, but how can the bot know that when it gets a 404 back?
So far this has stopped the bots' POST requests completely.
I know this solution doesn't work for pages that have to be visible to good bots (Google etc.), but for a contact form or login form it works fine.
Maybe one can whitelist "good" bots and send a 404 to everyone else?
No.
Spam bots look for obvious email and comment forms. They won't do anything with a redirect. You could set up a server that is a spider trap full of email forms that don't work.
So, you would want auto URL generation mechanisms to define a site tree, with each new url having another email form. You'd probably want to do this on a dedicated server.
But in the end NO. Think about it: how is your tiny little PHP server ever going to wear out zombies or a 64-core spam server in Russia?
I don't think you understand what a redirect even does. It sets a response code and says the content moved 'here'. A spam bot won't care and probably won't do anything if there's no email form there.
If you really want to avoid spam, read this. You can trap them, but if you're dealing with zombies it's ultimately not going to matter.
http://www.neilgunton.com/doc/?doc_id=8580
I have made a simple PHP contact form following this tutorial:
http://www.catswhocode.com/blog/how-to-create-a-built-in-contact-form-for-your-wordpress-theme
The big problem is that this form processing is not safe, I have heard people can use it to send spam and/or hack my server.
What are the basic steps needed to make this form more secure?
Ps: I don't want to use re-captcha if it can be avoided...
Edit: I need suggestions on which PHP functions to use to filter the input and make sure the form is submitted "the right way", and not altered and/or used to hack my site or to send email to other people (using the site to send spam). Do I just need to use stripslashes()? Or is there a better way?
One way: If you're not a huge site, it's not likely anyone is going to figure this out/take the time to.
You could use some tricky JS to handle tokens on click. Your server issues token IDs to clickable/focusable elements on the page during the backend render phase and logs them in a database or data file. Then, when users click around and submit, you can compare the IDs sent via the onclick() function. You could also apply some heuristics to determine whether the history of clicks is reasonably paced, that is, whether posts arrive too fast to be human. Even if someone scripted the hijacking of the token IDs and auto-submitted, you could check whether the time between click events looks automated. Signed up for a Twitter account lately? They use passive human detection that, while not 100% foolproof, is slower and more difficult to break. Somebody would REALLY have to want to hack/spam your site.
Important Step 2: strip out/URL-encode strange characters if you think they will break your page. Common ones that break things are " and ' and :
Another Way: http://areyouahuman.com/
As long as you are using encrypted methods, verifying humanity without crappy CAPTCHA is possible. I mean, don't ignore your headers either. These are complementary ways.
The key is to have enough complexity to make for an NP-Complete problem. http://en.wikipedia.org/wiki/NP-complete
When the day comes when AI can solve multiple complex Human problems on their own, we will have other things to worry about than request tampering.
http://louisville.academia.edu/RomanYampolskiy/Papers/1467394/AI-Complete_AI-Hard_or_AI-Easy_Classification_of_Problems_in_Artificial
Another company doing interesting research is http://www.vouchsafe.com/play-games they actually use games designed to trick the RTT into training the RTT how to be more solvable by only humans!
Here's a great article on NP-Hard problems. I can see a huge possibility here: http://www.i-programmer.info/news/112-theory/3896-classic-nintendo-games-are-np-hard.html
I have a classifieds website...
Each classified is displayed in a php page called show_ad.php
I am working on a "tip a friend" function, where users enter their own name, the friends email and a short message to the friend.
The above is no problem, however, I need to make sure bots don't use this form for "spam" etc...
One way is captcha, but I was thinking about creating my own captcha, so here is my plan, and I need you to tell me if it has any flaws:
1- On load of the show_ad.php file, I generate a random number, say 5 digits.
2- I output the number to the user, and tell users to enter this number in a form text input.
3- The number is also put into a hidden input.
4- User presses "send" button.
5- I use ajax to call a php file called send_tip.php, and I fetch the value of the hidden input, and compare it to the text-input the user entered, and see if they match, and then send the email.
Nothing is ever safe enough, but is the above enough for a classifieds website?
Thanks
UPDATE:
6- I add a table to MySQL which records the IP addresses of users who send email, and if one exceeds, say, 3 emails per minute AND 30 emails per day, I stop them... Although then maybe just the email limit is enough, and I should skip the first steps with the random number? What do you think?
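The per-IP limit from step 6 might be sketched like this. It assumes you have already loaded the timestamps of earlier emails for one IP address from your MySQL table, and it blocks as soon as either limit is reached, which is one possible reading of the rule. The function name may_send_tip() is hypothetical.

```php
<?php
// Sketch only: may_send_tip() is a hypothetical name; loading $sentAt from MySQL is left out.

/**
 * Decide whether one more email may be sent from an IP address.
 *
 * @param int[] $sentAt Unix timestamps of earlier emails sent from that IP
 * @param int   $now    current Unix timestamp
 */
function may_send_tip(array $sentAt, int $now, int $perMinute = 3, int $perDay = 30): bool
{
    $lastMinute = 0;
    $lastDay = 0;

    foreach ($sentAt as $timestamp) {
        $age = $now - $timestamp;
        if ($age < 0) {
            continue; // ignore timestamps from the future (clock skew)
        }
        if ($age < 60) {
            $lastMinute++;
        }
        if ($age < 86400) {
            $lastDay++;
        }
    }

    // Allow the email only while both counters are still below their limits.
    return $lastMinute < $perMinute && $lastDay < $perDay;
}
```

Keep in mind that several legitimate users can share one IP address, so limits that are too tight may block real visitors.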
You might want to consider using reCAPTCHA instead of reinventing the wheel and making your own CAPTCHA.
As a nice side effect, you're helping to digitize books!
One could easily write a bot that looks at the hidden field and submits the right data.
So no, it's not secure.
No CAPTCHAs are 100% bot-proof, but 99% bot-proof is enough.
AJAX will be a huge roadblock to bots, which is secure enough.
You should give misleading names to form fields. For example, the hidden field for your "number" could be named "message", so that a bot will fill it in wrongly.
However, if your site is big enough, bot programmers will re-program their bots to cope with your site...
See also: Practical non-image based CAPTCHA approaches?
It's not very safe. A better solution would be to generate that 5-digit number and store it in the session. Then generate an image that shows the number. Any bot that needs to hack this captcha needs to be able to OCR the image, which is far more complex.
Another slight advantage to this approach is that it works without the need for AJAX, although that might be a disadvantage as well, because AJAX is an extra obstacle for bots. You can, if you want, still use AJAX to request the image.
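Here is a sketch of the session part of that idea, leaving the image rendering out. The names issue_challenge(), verify_challenge() and the 'tip_code' key are assumptions for illustration; in real use you would pass $_SESSION and draw the returned code into an image instead of printing it as text.

```php
<?php
// Sketch only: function names and the 'tip_code' session key are hypothetical.

/** Generate a 5-digit code, remember it in the session and return it for display. */
function issue_challenge(array &$session): string
{
    $code = (string) random_int(10000, 99999);
    $session['tip_code'] = $code;

    return $code;
}

/** Compare the submitted value with the stored code; each code works only once. */
function verify_challenge(array &$session, string $submitted): bool
{
    $expected = $session['tip_code'] ?? null;

    // Remove the code whether or not it matched, so it cannot be replayed.
    unset($session['tip_code']);

    return is_string($expected) && hash_equals($expected, trim($submitted));
}
```

Because the expected value never leaves the server, a bot can no longer read it from a hidden input the way it could in the original plan.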
[edit]
One very great advantage of writing your own captcha, is that someone needs to write a specific bot for it. Common captchas can be hacked by generic bots that just look for signs. I've had success with protecting some of my forms by replacing a complex captcha with a simple custom made one that shows just plain text and even always requires the same answer!
How do you stop bots on a page which is accessible to registered users only? About 90% of the page's traffic comes from real users and 10% from bots.
I do not want to put a captcha or other verification method on the page, because I know that my users won't like it, and they are lazy too.
Please share your ideas
Edit
I want to make this question more clear
Registration page has captcha
My site allows users to submit content; in other words, it's a UGC site. Spammers copy other users' content and put it on my site, so blocking them via Akismet is not possible.
Possible Solution
Just got one thing in my mind.
When the user clicks on the submit button, the server will generate a random number (using JavaScript), which will then be used in a hidden field for verification.
Do you think this solution is practically applicable?
One trick I like to use is to add a hidden input field to my forms that a real user would never see or change, but that a bot would blindly fill out.
Something like
<input name="spam_stopper" value="DO NOT CHANGE THIS" style="display:none;"/>
and then, in your form handling code, make sure the value of spam_stopper is "DO NOT CHANGE THIS".
A smart bot may ignore display:none, but that's not too likely - many do ignore <input type="hidden"> though, so I wouldn't use that...
Given you have excluded captcha (which isn't 100% bulletproof), you need to check what your users type and allow or forbid their postings.
This task isn't going to be an easy one, so I would suggest to turn your attention to ready-made solutions such as Akismet.
Since these bots don't follow robots.txt, you can always block them with an .htaccess file, but it's a lot of work (you need to maintain the block list) since bots/spammers often change IPs. You also risk blocking genuine users.
You can see Block Bad Bots for an example.
It can be useful, but it's often too much work to block all of them compared with, let's say, a CAPTCHA or similar system.
Firstly, do you do human-verification on sign-up? That's the first step you should take to prevent spam on your site. Captchas are very effective, and even if you don't want to make users answer a captcha each time they post on the site, having them fill one out to create an account is perfectly reasonable. It only takes 2-3 seconds, and they only need to do it once.
If you're not willing to do that, you're going to have to put up with spam so long as your site is indexed in search engines.
Prevent the spam rather than sorting it out afterwards
Yes, CAPTCHAs are not user-friendly. There are a few techniques you can use to prevent spam without CAPTCHAs, some of which have already been mentioned by others:
Smarter Server-side Validation: This is specific to the form, but in a contact form, for example, you can filter out lengthy messages or messages that include a lot of URLs. Or, if you expect to get an email address, you can ping the domain.
Blacklist Mechanism: Flag spammers by IP or by phrases in a blacklist database. If you're using PHP, a simple library like Guard can be helpful.
Honeypots: This is already mentioned in the accepted answer.
Time-based Protection: Check that the time taken to post a request is more than X seconds.
Score-based Google reCAPTCHA v3: This version is totally redesigned compared to the previous one and detects spam behind the scenes.
I've written a post recently and you can find more in depth there.
I'm currently working on a little chat/forum site that I roughed out in a weekend, and it has anonymous entries (i.e.: no usernames or passwords). This looks like it could be easy-cake for a spammer to ruin, but I don't want to bother the user with captchas or similar anti-spam inputs.
Are there any invisible-to-the-user alternatives to these? Thanks for your help.
One thing you should know about spammers is they always go for the low-hanging fruit. Same with hackers. By this I mean they'll pick the easiest to hit targets that affect the most users. This is why PHP and Windows vulnerabilities are often exploited: they affect so many users that if you find such a weakness/exploit your target "market" is huge.
It's also a big part of the reason why Linux and Mac OSs remain relatively unscathed by viruses for example: the target market is much smaller than Windows. Now I'm not equating the security and robustness of Windows with Mac/Linux but even though the security model of the latter two is much better the number of attacks against the former is still disproportionate with the deficiencies it has.
I say this because one of the best ways to avoid these kinds of problems is not to use popular software. phpBB, for example, has had lots of attacks made against it just because it's so popular.
So by doing your own chat/forum system you're at a disadvantage, because you have a system that doesn't have the field-testing something popular does, but you also have an advantage in that it isn't worth most spammers' time to exploit it. So what you need to watch out for is what automated systems can do against you. Contact forms on websites tend to have recognizable markers (like name, email and comment fields).
So I would advise:
Ignoring responses that come within say 5-10 seconds of sending the form to the user;
Using a honeypot (CSS/JS hidden fields as described elsewhere);
Using Javascript where applicable to render, reorder or display the form;
Using non-predictable form field names; and
Throttling bad responses by IP.
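As an illustration of the non-predictable field names item, here is a sketch that gives every rendered form its own random input names and maps them back on submit. The function names and the 'field_map' session key are hypothetical; in real use you would pass $_SESSION and $_POST.

```php
<?php
// Sketch only: make_field_names(), read_fields() and 'field_map' are hypothetical names.

/** Create a random input name for each logical field and remember the mapping. */
function make_field_names(array &$session, array $logicalNames): array
{
    $map = [];
    foreach ($logicalNames as $logical) {
        $map[$logical] = 'f_' . bin2hex(random_bytes(8));
    }
    $session['field_map'] = $map;

    return $map; // use $map['comment'] etc. as the name="" attributes in the form
}

/** Translate a submitted form back to logical names, or return null if it does not fit. */
function read_fields(array $session, array $post): ?array
{
    $map = $session['field_map'] ?? null;
    if (!is_array($map)) {
        return null; // the form was never rendered in this session
    }

    $values = [];
    foreach ($map as $logical => $random) {
        if (!isset($post[$random]) || !is_string($post[$random])) {
            return null; // a bot posting to guessed names like "comment" ends up here
        }
        $values[$logical] = $post[$random];
    }

    return $values;
}
```

A bot that replays a recorded request or posts to conventional names such as "name" and "comment" gets null back, while a browser that loaded the form is unaffected.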
Not a bomb-proof solution, but you can have some hidden input fields. If those are not left empty, you caught a bot.
Bots tend to fill in all input fields, while users will surely leave fields they don't see empty.
This has worked 100% of the time for me:
<input type="text" style="display:none" name="email" value="do not fill this in it is for spam catching" />
Then server side (PHP):
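// "?? ''" avoids an undefined-index notice when the field is missing entirely
if (($_POST['email'] ?? '') !== 'do not fill this in it is for spam catching') {
    // spam: the hidden field was changed or removed, so discard the post
}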
As mentioned earlier, most bots fill everything in, especially inputs named "email".
The idea of captchas is that they are very easy for humans to pass but very difficult for bots etc. to get around. If you don't want this kind of solution, what will keep those spam bots from posting to your site?
It's like you would like your computer to be safe but you don't want to use an antivirus and firewall.
I think you could create a session for every user that enters your site and, the first time they want to post something, show them the captcha (don't require them to log in, just to pass the captcha). If they pass it, just store a flag in the session saying that they are human. As long as they keep their browser open they can post and reply on your site as they like. Bots are unlikely to pass this first test.
There are two classes of anti-spam protection.
The first is to make it difficult for automated bots to push data into your site. The hidden form field method is frequently mentioned for this, and is suitable for low-traffic sites. These protections can be trivially defeated by a spam bot written for your site. However, if you are too small a target, this won't happen.
The second is the "bothersome" type. This usually involves a captcha, registration, or email confirmation of a post. You can use a few approaches to make this less bothersome, but it requires much more effort on the bot's side to post spam.
Note that both of these approaches can often impede disabled and mobile users.