Blocking spam in a PHP site without bothering the user

Blocking spam in a PHP site without bothering the user - php

I'm currently working on a little chat/forum site that I roughed out in a weekend, and it has anonymous entries (i.e.: no usernames or passwords). This looks like it could be easy-cake for a spammer to ruin, but I don't want to bother the user with captchas or similar anti-spam inputs.
Are there any invisible-to-the-user alternatives to these? Thanks for your help.

One thing you should know about spammers is they always go for the low-hanging fruit. Same with hackers. By this I mean they'll pick the easiest to hit targets that affect the most users. This is why PHP and Windows vulnerabilities are often exploited: they affect so many users that if you find such a weakness/exploit your target "market" is huge.
It's also a big part of the reason why Linux and Mac OSs remain relatively unscathed by viruses for example: the target market is much smaller than Windows. Now I'm not equating the security and robustness of Windows with Mac/Linux but even though the security model of the latter two is much better the number of attacks against the former is still disproportionate with the deficiencies it has.
I say this because one of the best ways to avoid these kinds of problems is not to use popular softare. phpBB for example has had lots of attacks made against it just because it's so popular.
So by doing your own chat/forum system you're at a disadvantage because you have a system that doesn't have the field-testing something popular does but you also have an advantage in that it isn't worth most spammer's time to exploit it. So what you need to watch out for is what can automated systems do against you. Contact forms on Websites tend to have recognizable markers (like name, email and comment fields).
So I would advise:
Ignoring responses that come within say 5-10 seconds of sending the form to the user;
Using a honeypot (CSS/JS hidden fields as described elsewhere);
Using Javascript where applicable to render, reorder or display the form;
Using non-predictable form field names; and
Throttle bad responses by IP.

Not a bomb-proof solution, but you can have some hidden input fields. If those are not left empty, you caught a bot.
Bots tend to fill all input fields, while users will sure leave fields they don't see empty.

This has worked 100% of the time for me:
<input type="text" style="display:none" name="email" value="do not fill this in it is for spam catching" />
Then server side (PHP):
if($_POST['email'] != 'do not fill this in it is for spam catching') {
// spam
}
As mentioned earlier, most bots fill everything in, especially inputs named "email".

The idea of capchas is that they are very easy for humans to pass but very diffucult for bots etc. to avoid. If you don't want this kind of solution what will keep those spam-bots from posting to your site?
It's like you would like your computer to be safe but you don't want to use an antivirus and firewall.
I think you could create a session for every user that enters your site and first time they want to post something show them the capcha (don't require to log in, just pass capcha). If they pass it just store a flag in session that they are human. As long as they have their browser opened they can post and reply on your site what they want. Bots will unlikely pass this first test.

There are two classes of anti-spam protection.
The first is to make it difficult for automated bots to stumble data into your site. The hidden form field method is frequently mentioned for this, and is suitable for low traffic sites. These protections can be trivially defeated by a spam bot written for your site. However if you are too small a target, this won't happen.
The second is the "bothersome" types. This usually involves a captcha, registration, or email confirmation of post. You can use a few approaches to make this less bothersome, but requires much more effort on the bot's side to post spam.
Note that both of these approaches can often impede disabled and mobile users.

Related

Securing a php contact form

i have made a simple php contact form following this tutorial:
http://www.catswhocode.com/blog/how-to-create-a-built-in-contact-form-for-your-wordpress-theme
The big problem is that this form processing is not safe, I have heard people can use it to send spam and/or hack my server.
What are the basic steps needed to make this form more secure?
Ps: I don't want to use re-captcha if it can be avoided...
Edit: I need suggestions to what php functions are used to filter and secure that the form is submitted "the right way" and not altered and/or used to hack my site or send email to other people (using the site to send spam to other people). Do i just need to use strip_slashes? or is there a better way?

One way: If you're not a huge site, it's not likely anyone is going to figure this out/take the time to.
You could use some tricky JS to handle tokens on click. So your server issues token-id's to clickable/focus-able elements on the page during the backend render phase. Log these in a database or data file. Then, when users click around and submit, you can compare the id's sent via the onclick() function. You could also apply some heuristics to determine if the history of clicks is reasonably paced. Posts are too fast to be a human or not, that is, even if they scripted the hijacking of the token-ids and auto submitted, you could check that the time between click events appears automated. Signed up for a twitter account lately? They use passive human detection that while not 100% foolproof, it is slower and more difficult to break. Somebody would REALLY want to hack/spam your site.
Important Step 2: strip out/URLEncode strange characters if you think this will break your page. common ones that break things are " and ' and :
Another Way: http://areyouahuman.com/
As long as you are using encrypted methods verifying humanity without crappy CAPTCHA is possible.I mean, don't ignore your headers either. These are complimentary ways.
The key is to have enough complexity to make for an NP-Complete problem. http://en.wikipedia.org/wiki/NP-complete
When the day comes when AI can solve multiple complex Human problems on their own, we will have other things to worry about than request tampering.
http://louisville.academia.edu/RomanYampolskiy/Papers/1467394/AI-Complete_AI-Hard_or_AI-Easy_Classification_of_Problems_in_Artificial
Another company doing interesting research is http://www.vouchsafe.com/play-games they actually use games designed to trick the RTT into training the RTT how to be more solvable by only humans!
Here's a great article on NP-Hard problems. I can see a huge possibility here: http://www.i-programmer.info/news/112-theory/3896-classic-nintendo-games-are-np-hard.html

How to avoid repeated post submissions?

I have a form that allows users to paste input, like on StackOverflow. But if users know the format I send to the server, they can keep sending requests to me. How can I ensure it is a real user sending a request instead of some kind of machine attack to insert information?

There are a load of ways to do this and employing a large variety of different things is a good idea to protect against spam.
What stackoverflow does (from my experience) is if there is an abnormal amount of posting, or maybe the posts are very short, or something else is a bit suspicious then they use a capcha.
You can monitor this by using cookies, for instance monitoring the time between posts is a good indicator that someone is spamming. Similarly if the lengths of the messages posted are all about the same length, or include the same url/link or something you can also display a capcha to test if the user is a human or not.

You can use a captcha. That's probably the most common approach.

There are different techniques allowing you to achieve this with more or less success. Using a Captcha is one popular way used by many sites.

Detect if user is human without captcha or useragent

I've a website where I'm providing email encryption to users and I'm trying to figure out if there's a way to detect if a user is human or a bot.
I've been digging into $_SESSION in php but it's easy to bypass, I'm also not interested in captcha, useragent or login solutions, any idea of what I need ?
There are other questions very similar to this one in SO but I couldn't find any straight answer...
Any help will be very welcome, thank you all !

This is a hard problem, and no solution I know of is going to be 100% perfect from a bot-defending and usability perspective. If your attacker is really determined to use a bot on your site, they probably will be able to. If you take things far enough to make it impractical for a computer program to access anything on your site, it's likely no human will want to either, but you can strike a good balance.
My point of view on this is partially as a web developer, but more so from the other side of things, having written numerous web crawler programs for clients all over the world. Not all bots have malicious intent, and can be used for things from automating form submissions to populating databases of doctors office addresses or analyzing stock market data. If your site is well designed from a usability standpoint, there should be no need for a bot that "makes things easier" for a user, but there are cases where there are special needs you can't plan for.
Of course there are those who do have malicious intent, which you definitely want to protect your site against as well as possible. There is virtually no site that can't be automated in some way. Most sites are not difficult at all, but here are a few ideas off the top of my head, from other answers or comments on this page, and from my experience writing (non-malicious) bots.
Types of bots
First I should mention that there are two different categories I would put bots into:
General purpose crawlers, indexers, or bots
Special purpose bots, made specifically for your site to perform some task
Usually a general-purpose bot is going to be something like a search engine's indexer, or possibly some hacker's script that looks for a form to submit, uses a dictionary attack to search for a vulnerable URL, or something like this. They can also attack "engine sites", such as Wordpress blogs. If your site is properly secured with good passwords and the like, these aren't usually going to pose much of a risk to you (unless you do use Wordpress, in which case you have to keep up with the latest versions and security updates).
Special purpose "personalized" bots are the kind I've written. A bot made specifically for your site can be made to act very much like a human user of your site, including inserting time delays between form submissions, setting cookies, and so on, so they can be hard to detect. For the most part this is the kind I'm talking about in the rest of this answer.
Captchas
Captchas are probably the most common approach to making sure a user is humanoid, and generally they are difficult to automatically get around. However, if you simply require the captcha as a one-time thing when the user creates an account, for example, it's easy for a human to get past it and then give their shiny new account credentials to a bot to automate usage of the system.
I remember a few years ago reading about a pretty elaborate system to "automate" breaking captchas on a popular gaming site: a separate site was set up that loaded captchas from the gaming site, and presented them to users, where they were essentially crowd-sourced. Users on the second site would get some sort of reward for each correct captcha, and the owners of the site were able to automate tasks on the gaming site using their crowd-sourced captcha data.
Generally the use of a good captcha system will pretty well guarantee one thing: somewhere there is a human who typed the captcha text. What happens before and after that depends on how often you require captcha verification, and how determined the person making a bot is.
Cell-phone / credit-card verification
If you don't want to use Captchas, this type of verification is probably going to be pretty effective against all but the most determined bot-writer. While (just as with the captcha) it won't prevent an already-verified user from creating and using a bot, you can verify that a human being created the account, and if abused block that phone number/credit-card from being used to create another account.
Sites like Facebook and Craigslist have started using cell-phone verification to prevent spamming from bots. For example, in order to create apps on Facebook, you have to have a phone number on record, confirmed via text message or an automated phone call. Unless your attacker has access to a whole lot of active phone numbers, this could be an effective way to verify that a human created the account and that he only creates a limited number of accounts (one for most people).
Credit cards can also be used to confirm that a human is performing an action and limit the number of accounts a single human can create.
Other [less-effective] solutions
Log analysis
Analyzing your request logs will often reveal bots doing the same actions repeatedly, or sometimes using dictionary attacks to look for holes in your site's configuration. So logs will tell you after-the-fact whether a request was made by a bot or a human. This may or may not be useful to you, but if the requests were made on a cell-phone or credit-card verified account, you can lock the account associated with the offending requests to prevent further abuse.
Math/other questions
Math problems or other questions can be answered by a quick google or wolfram alpha search, which can be automated by a bot. Some questions will be harder than others, but the big search companies are working against you here, making their engines better at understanding questions like this, and in turn making this a less viable option for verifying that a user is human.
Hidden form fields
Some sites employ a mechanism where parameters such as the coordinates of the mouse when they clicked the "submit" button are added to the form submission via javascript. These are extremely easy to fake in most cases, but if you see in your logs a whole bunch of requests using the same coordinates, it's likely they are a bot (although a smart bot could easily give different coordinates with each request).
Javascript Cookies
Since most bots don't load or execute javascript, cookies set using javascript instead of a set-cookie HTTP header will make life slightly more difficult for most would-be bot makers. But not so hard as to prevent the bot from manually setting the cookie as well, once the developer figures out how to generate the same value the javascript generates.
IP address
An IP address alone isn't going to tell you if a user is a human. Some sites use IP addresses to try to detect bots though, and it's true that a simple bot might show up as a bunch of requests from the same IP. But IP addresses are cheap, and with Amazon's EC2 service or similar cloud services, you can spawn a server and use it as a proxy. Or spawn 10 or 100 and use them all as proxies.
UserAgent string
This is so easy to manipulate in a crawler that you can't count on it to mark a bot that's trying not to be detected. It's easy to set the UserAgent to the same string one of the major browsers sends, and may even rotate between several different browsers.
Complicated markup
The most difficult site I ever wrote a bot for consisted of frames within frames within frames....about 10 layers deep, on each page, where each frame's src was the same base controller page, but had different parameters as to which actions to perform. The order of the actions was important, so it was tough to keep straight everything that was going on, but eventually (after a week or so) my bot worked, so while this might deter some bot makers, it won't be useful against all. And will probably make your site about a gazillion times harder to maintain.
Disclaimer & Conclusion
Not all bots are "bad". Most of the crawlers/bots I have made were for users who wanted to automate some process on the site, such as data entry, that was too tedious to do manually. So make tedious tasks easy! Or, provide an API for your users. Probably one of the easiest way to discourage someone from writing a bot for your site is to provide API access. If you provide an API, it's a lot less likely someone will go to the effort to create a crawler for it. And you could use API keys to control how heavily someone uses it.
For the purpose of preventing spammers, some combination of captchas and account verification through cell numbers or credit cards is probably going to be the most effective approach. Add some logging analysis to identify and disable any malicious personalized bots, and you should be in pretty good shape.

My favorite way is presenting the "user" with a picture of a cat or a dog and asking, "Is this a cat or a dog?" No human ever gets that wrong; the computer gets it right perhaps 60% of the time (so you have to run it several times). There's a project that will give you bunches of pictures of cats and dogs -- plus, all the animals are available for adoption so if the user likes the pet, he can have it.
It's a Microsoft corporate project, which puts me in a state of cognitive dissonance, as if I found out that Harry Reid likes zydeco music or that George Bush smokes pot. Oh, wait...

I've seen/used a simple arithmetic problem with written numbers ie:
Please answer the following question to prove you are human:
"What is two plus four?"
and similar simple questions which require reading:
"What is man's best friend?"
you can supply an endless stream of questions, should the person attempting access be unfamiliar with the subject matter, and it is accessible to all readers, etc.

There's a reason why companies use captchas or logins. As ugly of a solution as captchas are, they're currently the best (most accurate, least disruptive to users) way of weeding out bots. If a login solution doesn't work for you, I'm afraid the only realistic solution is a captcha.

If users will be filling in a form, honeypot fields are simple to implement, and can be reasonably effective, but nothing is perfect. Create one or more hidden fields in the form, and if they contain anything when the form is posted, reject the form. Spambots will usually attempt to fill in everything.
You do need to be aware of accessibility. Hidden fields probably won't be filled in by those using a standard browser (where the field is not visible), but those using screen readers may be presented with the field. Be sure to label it correctly so that these users do not fill it in. Perhaps with something like "Please help us to prevent spam by leaving this field empty". Also, if you do reject the form, be sure to reject it with helpful error messages, just in case it has been filled in by a human.

I suggest getting the Growmap Anti Spambot Wordpress plugin and seeing what code you can borrow from it or just using the same technique. I've found this plugin to be very effective for curtailing automated spam on my WordPress sites and I've started adapting the same technique for my ASP.NET sites.
The only thing it doesn't deal with are human cut-and-paste spammers.

Preventing Spam

How do you stop bots on a page which is accessible to registered users only? 90% page is accessed by real users and 10% are bot.
I do not want to put captcha or verification method on the page because I know that my users wont like this and they lazy also.
Please share your ideas
Edit
I want to make this question more clear
Registration page has captcha
My site allows users to submit contents in other words its UGC site. Spammers copy other users content and put them on my site so blocking them via askimet is not possible.
Possible Solution
Just got one thing in my mind.
When user click on submit button server will generate a random number (using javascript) which will be then used in hidden field for verification.
Do you think this solution is practically applicable?

One trick I like to use is to add a hidden input field to my forms that a real user would never see or change, but that a bot would blindly fill out.
Something like
<input name="spam_stopper" value="DO NOT CHANGE THIS" style="display:none;"/>
and then, in your form handling code, make sure the value of spam_stopper is "DO NOT CHANGE THIS".
A smart bot may ignore display:none, but that's not too likely - many do ignore <input type="hidden"> though, so I wouldn't use that...

Given you have excluded captcha (which isn't 100% bulletproof), you need to check what your users type and allow or forbid their postings.
This task isn't going to be an easy one, so I would suggest to turn your attention to ready-made solutions such as Akismet.

Since these bots don't follow robots.txt, you can always block them with an .htaccess, but it's lot of work (need to maintain the block list) since bots/spammers often change IPs. You also risk to block genuine users.
You can see Block Bad Bots for an example.
It can be useful but it's often too much work to block all of them VS let's say a CAPTCHA or similar system.

Firstly, do you do human-verification on sign-up? That's the first step you should take to prevent spam on your site. Captchas are very effective, and even if you don't want to make users answer a captcha each time they post on the site, having them fill one out to create an account is perfectly reasonable. It only takes 2-3 seconds, and they only need to do it once.
If you're not willing to do that, you're going to have to put up with spam so long as your site is indexed in search engines.

Prevent not sort out the spam

Yes, CAPTCHAs are not user-friendly. There are a few techniques that you can use to prevent spams without using CAPTCHAs which some of them have been already mentioned by others:
Smarter Server-side Validation: This is specific to the form but for example in contact us form you can filter lengthy messages or messages including a lot of URLs. Or if you expect to get an email you can ping the domain.
Blacklist Mechanism: flag spammers by IP or phrases in a blacklist database. If you're using PHP a simple library like Guard can be helpful
Honeypots: This is already mentioned in the accepted answer
Time-based Protection: To check time to post a request is more than X seconds
Score-based Google reCAPTCHA v3: This version is totally re-designed compared to the previous one and detect spams behind the scene.
I've written a post recently and you can find more in depth there.

How do I protect my forum against spam?

I have a forum on a website I master, which gets a daily dose of pron spam. Currently I delete the spam and block the IP. But this does not work very well. The list of blocked IP's is growing quickly, but so is the number of spam posts in the forum.
The forum is entirely my own code. It is built in PHP and MySQL.
What are some concrete ways of stopping the spam?
Edit
The thing I forgot to mention is that the forum needs to be open for unregistered users to post. Kinda like a blog comment.

In a guestbook app I wrote, I implemented two features which prevent most of the spam:
Don't allow POST as the first request in a session
Require a valid HTTP Refer(r)er when posting

One way that I know which works is to use JavaScript before submitting the form. For example, to change the method from GET to POST. ;) Spambots are lousy at executing JavaScript. Of course, this also means that non-Javascript people will not be able to use your site... if you care about them that is. ;) (Note: I don't)

In my experience, the best easy defenses come from just doing something "non-standard". If you make your site non-standard, this makes it so that any automated spam would have to be coded specifically for your site, which (no offense) probably isn't worth the effort. Note that if the spam is coming from human spammers, there's not really anything you can do that won't also stop legitimate posters. So the goal is to find a solution that will throw away any "standard" posts - that is, "fill out the whole form and push submit".
A couple examples that come to mind of things that you could try:
Have a hidden form field with a name that sounds like something a spammer would want to fill out, like "website" or "homepage" or something like that. If the form field gets filled out, throw away the message instead of posting it, because it was a bot automatically filling in the whole form, even invisible fields.
You don't have to use a "real" captcha, but even something simple like "Enter the following word backwards: <random backwards word>" or "What is the domain name of this website?". Easy for a human to do, but it would require a fairly complex bot to figure out what to fill in.

You could use a captcha, there are some good scripts like PHPCaptcha or use a spam control service, like Akismet, they have a PHP API.

You might want to look at this question, which has several answers that describe how you could implement a non-intrusive captcha.
Another thing to consider is to require time between posts to prevent massive spamming.

Include a CAPTCHA that is always "orange".

The spams may be by bots or humans - bots are more likely.
To stop the bots, put in a hidden field populated by Javascript - there is a 99.5% chance that a standard, stupid bot that isn't customised to your site will fail to fill that in.
If they fail to fill it in correctly, give them a message that Javascript is required or something, and give them an opportunity to post some other way (e.g. with a captcha or registration). That way anonymous users who aren't spambots can (mostly) still post with no problems, and most spambots (which haven't been tailored for your specific site) won't.
Don't bother blacklisting IP addresses or using third party blacklists, that will just generate false positives. Almost all bots use the same IP addresses as (some) legitimate users.
Another trick is to put in a text field with a plausible sounding name, which is made difficult to see with CSS - anyone filling this field in with anything is considered to be a bot.

Advanced solutions:
Akismet
Defensio
Sblam! (open source clone of the above)
You can try your luck with non-standard form:
fields that must stay empty hidden with CSS
fields with misleading names, e.g. <input name=email> for something that is not an e-mail.
For me CAPTCHA is like giving up to spammers and letting them damage your forum anyway – except that instead of spam damage, you get usability and accessibility damage.

Something I've found to be surprisingly effective: disallow comments that contain too many URLs (more than, say, 5). Since doing that, I've had zero comment spam.
Edit: Since writing the above, I've had recurring comment spam with only one link. I have now added some honeypot fields and have had no commend spam for a few months now.

Don't let anybody post until they respond to an email sent to their registered email address. You'll see lots of forums and mailing lists generate a unique email address or web url that is sent to the new user's given email address, and they have to respond to the email or click on the link to finalize their registration.

Captcha is definitely the easiest method - try KittenAuth if you want something bot-proof (Although I got pandas this time)
Kitten Auth

There is no single answer since Spam is really a matter of economics: how much is it worth it to someone to put their stuff onto the web. There, however, some solutions that seem pretty good
Recaptcha
Use CCS to create an invisible
field that robots fill-in
Create a time-specific hidden field in your form so the
robot can't use the same form over and over again.

I want to say that in most time, a CAPTCHA is enough for you to prevent SPAMers.
But do use a strong one, like http://www.captcha.net/.
Remember that SPAMers do not want to spend much time to deal with a particular site(except heavy traffic sites), they use a tool to post AD on a lot of sites. So make your FORM a little unusual, (e.g. give the user a image says '1.5+2.4=?' and let users to answer, this will block most of the spam tools :) )

The easiest thing I've done to stop spammers with (so far) 100% consistency is to validate the text that was submitted. If you use the php function strstr() to check for "a href" or even a non-clickable http or www, you can then just reroute the spammer elsewhere. I actually have a script then write to my .htaccess file to deny the offending IP address. Not sure if there's any other kind of spam to be concerned about, but links are all I've seen so far.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.