Preventing spam bots on site?

Preventing spam bots on site? - php

We're having an issue on one of our fairly large websites with spam bots. It appears the bots are creating user accounts and then posting journal entries which lead to various spam links.
It appears they are bypassing our captcha somehow -- either it's been cracked or they're using another method to create accounts.
We're looking to do email activation for the accounts, but we're about a week away from implementing such changes (due to busy schedules).
However, I don't feel like this will be enough if they're using an SQL exploit somewhere on the site and doing the whole cross site scripting thing. So my question to you:
If they are using some kind of XSS exploit, how can I find it? I'm securing statements where I can but, again, its a fairly large site and it'd take me awhile to actively clean up SQL statements to prevent XSS. Can you recommend anything to help our situation?

1) As mentioned above reCAPTCHA is a good start.
2) Askimet is a great way to flag spam before it is published. It's what Wordpress uses to stop spam and it is extremely effective. You then could reject or queue the entry for moderation based on the results. It's API is ridiculously easy to use, too. (I have PHP code if you need it). You might need a commercial license although I am sure you can get started using the free version.
3) Verification of email addresses is definitely a good idea as it requires a valid email account which many spammers do not have. Just make sure you make verifying the email address easy as if it is too difficult it can turn legitimate users away as well.

If the bots were exploiting a hole in a script somewhere, there should be evidence of that in the logs. Check for direct POSTs to user creation scripts and the journal entry creation scripts without the usual "normal" surfing activity prior to the hit: The bots may have trolled the site only once and are bypassing the step of pulling down the forms and pretending to fill them in. Look for GET requests with obvious XSS-type data in the query strings.
You could also embed a random token in an hidden field within the forms and require that token to be present for the activation/posting to go through. If the bots only parsed your signup scripts once and are doing direct posts, this will stop them in their tracks until the bot creators catch on and look for the token. But it would give you some breathing space to implement a better system.
If your user account tables don't have some kind of time-of-creation time stamp on them, put one in and have the server create the timestamp, not your user scripts. This way you can narrow down the time period(s) to scan the logs for bot activity and see what they're doing. And if nothing else, you could block the IPs the bots are posting from.

I'm surprised that someone can advise Akismet and it is accepted as answer:
Akismet engagement is illegal in EU as infringing privacy protection laws;
Any blacklisting system helps criminals, making internet unusable by legit users;
Systems collecting spam to analyze it are doomed to act retroactively always lagging behind spammer techniques advances and bot development;
Why to accumulate and analyse spam fed by bots instead of blocking spam bots?

Related

Email spams php function getting bypassed

I have just added a CAPTCHA to a page to block spams but we are getting spams as usual.
The website is using Html, Php, Javascript and unsecured http only and nothing else.
I am generating and comparing captchas in Php using if statement. I am also adding both the captchas (generated and typed) in a comment for testing. So while genuine mails are received with both the generated and typed captcha. In the spams mails both capchas are blank (the spammers are at work so mystery and confusion).
I have checked all the files on website they are exactly as I had uploaded. I do not understand what spammers have done and how?
Some guidelines are needed. So, I can start studying books and websites.

You can basically chaulk that up to...
https://dynomapper.com/blog/514-online-captcha-solving-services-and-available-captcha-types
https://github.com/imagetyperz-api/imagetyperz-api-nodejs
https://github.com/bestcaptchasolver/bestcaptchasolver-php
The list goes on and on but I think you get the idea.
Your going to want to stack methods of spam prevention that don't annoy real users.
https://www.lifewire.com/solutions-to-protect-web-forms-from-spam-3467469
Good list there, my personal goto has always been honeypots, just a hidden field that looks and feels like a real field, but if someone fills it out you immediately know its a bot. Maybe throw it 9999px of the page to the right, but to a bot they still see it right under the other fields in your code.
Also if you are bored and have a bit of time and another great way of banning most of the current active botnets from your site, or maybe setting up a seperate site on the same hosting provider just to harvest all the active IPs of botnets trolling your hosting providers IP range.
Make a robots.txt file like this
User-agent: *
Disallow: /secret/
Any honest bot like google bot won't follow the path to /secret/ on your domain, but let me tell you that if its a bot you do not want on your site one of the first things its going to be programmed to do is check files like this for queues on where the paths on your site are, especially private ones.
Then just set up a script to automatically IP ban all traffic to /secret
Now odviously this is doing to end up banning some legit people who end up using the IP after the bot etc, etc. But honestly tell me it doesn't sound like a fun idea.

Would it be a good idea to set up a newsletter sign-up form in simple PHP?

I would like to add a form that allows my site visitors to sign up for a newsletter from my site.
I started to build the form, but apparently it's not such a good idea and instead I should use prebuilt scripts by other people to do this as they would be more secure.
Would it really be all that bad to make my own and just have it query all the signups into a database? I don't want it to be all that complex, just need something simple, but I definitely don't want to jeopardize the security of people signing up.
Thanks!

You can do it yourself if you know how to write it correctly.
The reason why people use tried and tested code is because you may be black listed by email services due to large volume of emails you would be sending. Once you are black listed then it is difficult to get yourself off the black list. The tried & tested code SHOULD send the newsletters out periodically rather then mass emailing at once.
Other reasons is due to security, if you write the code incorrectly or if you are new to forms which access databases, then it may not be sensible to write your own. It is good practice to learn it though, even if in the mean time you have to use someone else's code.

You also have to verify emails addresses by a confirmation request email and an associated form. In addition if you have a larger list (more than a few hundred members), then you must code bounce handling and unsubscriptions. For a small list you can do these manually. I did build such a web application step-by-step, but if I had to start it now, than I would consider an existing mail list software more seriously. At first I also thought that I only need a small form, and the list managers I checked was indeed difficult to integrate, but in the long run it was too much effort.

Detect if user is human without captcha or useragent

I've a website where I'm providing email encryption to users and I'm trying to figure out if there's a way to detect if a user is human or a bot.
I've been digging into $_SESSION in php but it's easy to bypass, I'm also not interested in captcha, useragent or login solutions, any idea of what I need ?
There are other questions very similar to this one in SO but I couldn't find any straight answer...
Any help will be very welcome, thank you all !

This is a hard problem, and no solution I know of is going to be 100% perfect from a bot-defending and usability perspective. If your attacker is really determined to use a bot on your site, they probably will be able to. If you take things far enough to make it impractical for a computer program to access anything on your site, it's likely no human will want to either, but you can strike a good balance.
My point of view on this is partially as a web developer, but more so from the other side of things, having written numerous web crawler programs for clients all over the world. Not all bots have malicious intent, and can be used for things from automating form submissions to populating databases of doctors office addresses or analyzing stock market data. If your site is well designed from a usability standpoint, there should be no need for a bot that "makes things easier" for a user, but there are cases where there are special needs you can't plan for.
Of course there are those who do have malicious intent, which you definitely want to protect your site against as well as possible. There is virtually no site that can't be automated in some way. Most sites are not difficult at all, but here are a few ideas off the top of my head, from other answers or comments on this page, and from my experience writing (non-malicious) bots.
Types of bots
First I should mention that there are two different categories I would put bots into:
General purpose crawlers, indexers, or bots
Special purpose bots, made specifically for your site to perform some task
Usually a general-purpose bot is going to be something like a search engine's indexer, or possibly some hacker's script that looks for a form to submit, uses a dictionary attack to search for a vulnerable URL, or something like this. They can also attack "engine sites", such as Wordpress blogs. If your site is properly secured with good passwords and the like, these aren't usually going to pose much of a risk to you (unless you do use Wordpress, in which case you have to keep up with the latest versions and security updates).
Special purpose "personalized" bots are the kind I've written. A bot made specifically for your site can be made to act very much like a human user of your site, including inserting time delays between form submissions, setting cookies, and so on, so they can be hard to detect. For the most part this is the kind I'm talking about in the rest of this answer.
Captchas
Captchas are probably the most common approach to making sure a user is humanoid, and generally they are difficult to automatically get around. However, if you simply require the captcha as a one-time thing when the user creates an account, for example, it's easy for a human to get past it and then give their shiny new account credentials to a bot to automate usage of the system.
I remember a few years ago reading about a pretty elaborate system to "automate" breaking captchas on a popular gaming site: a separate site was set up that loaded captchas from the gaming site, and presented them to users, where they were essentially crowd-sourced. Users on the second site would get some sort of reward for each correct captcha, and the owners of the site were able to automate tasks on the gaming site using their crowd-sourced captcha data.
Generally the use of a good captcha system will pretty well guarantee one thing: somewhere there is a human who typed the captcha text. What happens before and after that depends on how often you require captcha verification, and how determined the person making a bot is.
Cell-phone / credit-card verification
If you don't want to use Captchas, this type of verification is probably going to be pretty effective against all but the most determined bot-writer. While (just as with the captcha) it won't prevent an already-verified user from creating and using a bot, you can verify that a human being created the account, and if abused block that phone number/credit-card from being used to create another account.
Sites like Facebook and Craigslist have started using cell-phone verification to prevent spamming from bots. For example, in order to create apps on Facebook, you have to have a phone number on record, confirmed via text message or an automated phone call. Unless your attacker has access to a whole lot of active phone numbers, this could be an effective way to verify that a human created the account and that he only creates a limited number of accounts (one for most people).
Credit cards can also be used to confirm that a human is performing an action and limit the number of accounts a single human can create.
Other [less-effective] solutions
Log analysis
Analyzing your request logs will often reveal bots doing the same actions repeatedly, or sometimes using dictionary attacks to look for holes in your site's configuration. So logs will tell you after-the-fact whether a request was made by a bot or a human. This may or may not be useful to you, but if the requests were made on a cell-phone or credit-card verified account, you can lock the account associated with the offending requests to prevent further abuse.
Math/other questions
Math problems or other questions can be answered by a quick google or wolfram alpha search, which can be automated by a bot. Some questions will be harder than others, but the big search companies are working against you here, making their engines better at understanding questions like this, and in turn making this a less viable option for verifying that a user is human.
Hidden form fields
Some sites employ a mechanism where parameters such as the coordinates of the mouse when they clicked the "submit" button are added to the form submission via javascript. These are extremely easy to fake in most cases, but if you see in your logs a whole bunch of requests using the same coordinates, it's likely they are a bot (although a smart bot could easily give different coordinates with each request).
Javascript Cookies
Since most bots don't load or execute javascript, cookies set using javascript instead of a set-cookie HTTP header will make life slightly more difficult for most would-be bot makers. But not so hard as to prevent the bot from manually setting the cookie as well, once the developer figures out how to generate the same value the javascript generates.
IP address
An IP address alone isn't going to tell you if a user is a human. Some sites use IP addresses to try to detect bots though, and it's true that a simple bot might show up as a bunch of requests from the same IP. But IP addresses are cheap, and with Amazon's EC2 service or similar cloud services, you can spawn a server and use it as a proxy. Or spawn 10 or 100 and use them all as proxies.
UserAgent string
This is so easy to manipulate in a crawler that you can't count on it to mark a bot that's trying not to be detected. It's easy to set the UserAgent to the same string one of the major browsers sends, and may even rotate between several different browsers.
Complicated markup
The most difficult site I ever wrote a bot for consisted of frames within frames within frames....about 10 layers deep, on each page, where each frame's src was the same base controller page, but had different parameters as to which actions to perform. The order of the actions was important, so it was tough to keep straight everything that was going on, but eventually (after a week or so) my bot worked, so while this might deter some bot makers, it won't be useful against all. And will probably make your site about a gazillion times harder to maintain.
Disclaimer & Conclusion
Not all bots are "bad". Most of the crawlers/bots I have made were for users who wanted to automate some process on the site, such as data entry, that was too tedious to do manually. So make tedious tasks easy! Or, provide an API for your users. Probably one of the easiest way to discourage someone from writing a bot for your site is to provide API access. If you provide an API, it's a lot less likely someone will go to the effort to create a crawler for it. And you could use API keys to control how heavily someone uses it.
For the purpose of preventing spammers, some combination of captchas and account verification through cell numbers or credit cards is probably going to be the most effective approach. Add some logging analysis to identify and disable any malicious personalized bots, and you should be in pretty good shape.

My favorite way is presenting the "user" with a picture of a cat or a dog and asking, "Is this a cat or a dog?" No human ever gets that wrong; the computer gets it right perhaps 60% of the time (so you have to run it several times). There's a project that will give you bunches of pictures of cats and dogs -- plus, all the animals are available for adoption so if the user likes the pet, he can have it.
It's a Microsoft corporate project, which puts me in a state of cognitive dissonance, as if I found out that Harry Reid likes zydeco music or that George Bush smokes pot. Oh, wait...

I've seen/used a simple arithmetic problem with written numbers ie:
Please answer the following question to prove you are human:
"What is two plus four?"
and similar simple questions which require reading:
"What is man's best friend?"
you can supply an endless stream of questions, should the person attempting access be unfamiliar with the subject matter, and it is accessible to all readers, etc.

There's a reason why companies use captchas or logins. As ugly of a solution as captchas are, they're currently the best (most accurate, least disruptive to users) way of weeding out bots. If a login solution doesn't work for you, I'm afraid the only realistic solution is a captcha.

If users will be filling in a form, honeypot fields are simple to implement, and can be reasonably effective, but nothing is perfect. Create one or more hidden fields in the form, and if they contain anything when the form is posted, reject the form. Spambots will usually attempt to fill in everything.
You do need to be aware of accessibility. Hidden fields probably won't be filled in by those using a standard browser (where the field is not visible), but those using screen readers may be presented with the field. Be sure to label it correctly so that these users do not fill it in. Perhaps with something like "Please help us to prevent spam by leaving this field empty". Also, if you do reject the form, be sure to reject it with helpful error messages, just in case it has been filled in by a human.

I suggest getting the Growmap Anti Spambot Wordpress plugin and seeing what code you can borrow from it or just using the same technique. I've found this plugin to be very effective for curtailing automated spam on my WordPress sites and I've started adapting the same technique for my ASP.NET sites.
The only thing it doesn't deal with are human cut-and-paste spammers.

Preventing Spam

How do you stop bots on a page which is accessible to registered users only? 90% page is accessed by real users and 10% are bot.
I do not want to put captcha or verification method on the page because I know that my users wont like this and they lazy also.
Please share your ideas
Edit
I want to make this question more clear
Registration page has captcha
My site allows users to submit contents in other words its UGC site. Spammers copy other users content and put them on my site so blocking them via askimet is not possible.
Possible Solution
Just got one thing in my mind.
When user click on submit button server will generate a random number (using javascript) which will be then used in hidden field for verification.
Do you think this solution is practically applicable?

One trick I like to use is to add a hidden input field to my forms that a real user would never see or change, but that a bot would blindly fill out.
Something like
<input name="spam_stopper" value="DO NOT CHANGE THIS" style="display:none;"/>
and then, in your form handling code, make sure the value of spam_stopper is "DO NOT CHANGE THIS".
A smart bot may ignore display:none, but that's not too likely - many do ignore <input type="hidden"> though, so I wouldn't use that...

Given you have excluded captcha (which isn't 100% bulletproof), you need to check what your users type and allow or forbid their postings.
This task isn't going to be an easy one, so I would suggest to turn your attention to ready-made solutions such as Akismet.

Since these bots don't follow robots.txt, you can always block them with an .htaccess, but it's lot of work (need to maintain the block list) since bots/spammers often change IPs. You also risk to block genuine users.
You can see Block Bad Bots for an example.
It can be useful but it's often too much work to block all of them VS let's say a CAPTCHA or similar system.

Firstly, do you do human-verification on sign-up? That's the first step you should take to prevent spam on your site. Captchas are very effective, and even if you don't want to make users answer a captcha each time they post on the site, having them fill one out to create an account is perfectly reasonable. It only takes 2-3 seconds, and they only need to do it once.
If you're not willing to do that, you're going to have to put up with spam so long as your site is indexed in search engines.

Prevent not sort out the spam

Yes, CAPTCHAs are not user-friendly. There are a few techniques that you can use to prevent spams without using CAPTCHAs which some of them have been already mentioned by others:
Smarter Server-side Validation: This is specific to the form but for example in contact us form you can filter lengthy messages or messages including a lot of URLs. Or if you expect to get an email you can ping the domain.
Blacklist Mechanism: flag spammers by IP or phrases in a blacklist database. If you're using PHP a simple library like Guard can be helpful
Honeypots: This is already mentioned in the accepted answer
Time-based Protection: To check time to post a request is more than X seconds
Score-based Google reCAPTCHA v3: This version is totally re-designed compared to the previous one and detect spams behind the scene.
I've written a post recently and you can find more in depth there.

Blocking spam in a PHP site without bothering the user

I'm currently working on a little chat/forum site that I roughed out in a weekend, and it has anonymous entries (i.e.: no usernames or passwords). This looks like it could be easy-cake for a spammer to ruin, but I don't want to bother the user with captchas or similar anti-spam inputs.
Are there any invisible-to-the-user alternatives to these? Thanks for your help.

One thing you should know about spammers is they always go for the low-hanging fruit. Same with hackers. By this I mean they'll pick the easiest to hit targets that affect the most users. This is why PHP and Windows vulnerabilities are often exploited: they affect so many users that if you find such a weakness/exploit your target "market" is huge.
It's also a big part of the reason why Linux and Mac OSs remain relatively unscathed by viruses for example: the target market is much smaller than Windows. Now I'm not equating the security and robustness of Windows with Mac/Linux but even though the security model of the latter two is much better the number of attacks against the former is still disproportionate with the deficiencies it has.
I say this because one of the best ways to avoid these kinds of problems is not to use popular softare. phpBB for example has had lots of attacks made against it just because it's so popular.
So by doing your own chat/forum system you're at a disadvantage because you have a system that doesn't have the field-testing something popular does but you also have an advantage in that it isn't worth most spammer's time to exploit it. So what you need to watch out for is what can automated systems do against you. Contact forms on Websites tend to have recognizable markers (like name, email and comment fields).
So I would advise:
Ignoring responses that come within say 5-10 seconds of sending the form to the user;
Using a honeypot (CSS/JS hidden fields as described elsewhere);
Using Javascript where applicable to render, reorder or display the form;
Using non-predictable form field names; and
Throttle bad responses by IP.

Not a bomb-proof solution, but you can have some hidden input fields. If those are not left empty, you caught a bot.
Bots tend to fill all input fields, while users will sure leave fields they don't see empty.

This has worked 100% of the time for me:
<input type="text" style="display:none" name="email" value="do not fill this in it is for spam catching" />
Then server side (PHP):
if($_POST['email'] != 'do not fill this in it is for spam catching') {
// spam
}
As mentioned earlier, most bots fill everything in, especially inputs named "email".

The idea of capchas is that they are very easy for humans to pass but very diffucult for bots etc. to avoid. If you don't want this kind of solution what will keep those spam-bots from posting to your site?
It's like you would like your computer to be safe but you don't want to use an antivirus and firewall.
I think you could create a session for every user that enters your site and first time they want to post something show them the capcha (don't require to log in, just pass capcha). If they pass it just store a flag in session that they are human. As long as they have their browser opened they can post and reply on your site what they want. Bots will unlikely pass this first test.

There are two classes of anti-spam protection.
The first is to make it difficult for automated bots to stumble data into your site. The hidden form field method is frequently mentioned for this, and is suitable for low traffic sites. These protections can be trivially defeated by a spam bot written for your site. However if you are too small a target, this won't happen.
The second is the "bothersome" types. This usually involves a captcha, registration, or email confirmation of post. You can use a few approaches to make this less bothersome, but requires much more effort on the bot's side to post spam.
Note that both of these approaches can often impede disabled and mobile users.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.