Blocking comment spam without using captcha [closed] - php

Closed 10 years ago.
What are some non-captcha methods for blocking spam on my comments?

In my experience the currently most effective methods are honeypot input fields that are made invisible to users via CSS (best use several different methods, such as visibility:hidden, setting a size of 0 pixels, and absolute positioning far outside the browser window); if they're filled anyway you can assume it's a spambot.
This blog describes a rather complex method that I've tried out myself (with 100% success so far), but I suspect that you could get the same result by skipping all the stuff with hashed field names and just add some simple honeypot fields.

1) Add session-related information to the form. Example:
<input type="hidden" name="sh" value="<?php echo dechex(crc32(session_id())); ?>" />
Then, on postback, check that the submitted value matches the current session.
2) JavaScript-only: inject the value with JavaScript at submission. Example:
<input type="hidden" id="txtKey" name="key" value="" />
<input type="submit" value="Go" onclick="document.getElementById('txtKey').value = '<?php echo dechex(crc32(session_id())) ?>';" />
3) Time limit per IP, user, or session. This is quite straightforward.
4) Randomize field names:
<?php
$fieldkey = dechex(crc32(mt_rand().dechex(crc32(time()))));
$_SESSION['fieldkey'] = $fieldkey;
?>
<input type="text" name="name<?php echo $fieldkey; ?>" value="" />
<input type="text" name="address<?php echo $fieldkey; ?>" value="" />
Then, on the server side, you can check the submitted field names against the key stored in the session.
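Point 3 is left as "straightforward" above, so here is a minimal sketch of a per-session time limit. The function name allow_post, the last_post_time key, and the 30-second default are my own assumptions, not part of the answer; $store stands in for $_SESSION so the function can be tried without a web server.

```php
<?php
// Hypothetical helper: returns true if a post is allowed and records its time.
// $store is any array that persists between requests (e.g. $_SESSION).
function allow_post(array &$store, int $now, int $minInterval = 30): bool
{
    $last = $store['last_post_time'] ?? null;
    if ($last !== null && ($now - $last) < $minInterval) {
        return false; // too soon after the previous accepted post
    }
    $store['last_post_time'] = $now;
    return true;
}
```

In a real page you would call session_start() first and then something like allow_post($_SESSION, time()). A limit per IP works the same way, but needs storage shared between sessions.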

Akismet has an API. Someone wrote a wrapper class (BSD license) for it over at: http://cesars.users.phpclasses.org/browse/package/4401.html
There's also a Bayesian filter class (BSD license as well):
http://cesars.users.phpclasses.org/browse/package/4236.html

This is a simple trick to block spam bots or brute-force attacks without using a captcha.
Put this in your form:
<input type="hidden" name="hash" value="<?php echo md5($secret_key.time()).','.time(); ?>" />
Put this in your PHP code:
$human_typing_time = 5; /** page load (1s) + submit (1s) + typing time (3s) */
$vars = explode(',', $_POST['hash']);
// reject if the hash doesn't match the timestamp, or the form came back too quickly
if (count($vars) != 2 || md5($secret_key.$vars[1]) != $vars[0] || time() < $vars[1] + $human_typing_time) {
    // bot?
    exit();
}
Depending on the weight of the form, you can increase or decrease $human_typing_time.
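A variant of the same idea, sketched with hash_hmac() and hash_equals() instead of md5($secret_key.time()). This is a swap on my part, not what the answer uses: an HMAC is the usual way to sign a value with a secret, and hash_equals() compares in constant time. The function names and the $now parameter (passed in so the check can be tried with fixed timestamps) are hypothetical.

```php
<?php
// Build "signature,timestamp" for the hidden field. $secret is a server-side key.
function make_form_token(string $secret, int $now): string
{
    return hash_hmac('sha256', (string) $now, $secret) . ',' . $now;
}

// Returns true only if the signature matches and at least $minSeconds have passed.
function check_form_token(string $secret, string $token, int $now, int $minSeconds = 5): bool
{
    $parts = explode(',', $token);
    if (count($parts) !== 2 || !ctype_digit($parts[1])) {
        return false; // malformed token
    }
    [$sig, $issued] = $parts;
    $expected = hash_hmac('sha256', $issued, $secret);
    if (!hash_equals($expected, $sig)) {
        return false; // forged or altered timestamp
    }
    return $now >= (int) $issued + $minSeconds;
}
```

In the form you would echo make_form_token($secret_key, time()), and on submit call check_form_token($secret_key, $_POST['hash'], time()). Note that nothing here stops a bot from replaying an old, valid token; an upper age limit would be a reasonable addition.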

Naive Bayesian filters, of course:
http://blog.liip.ch/archive/2005/03/30/php-naive-bayesian-filter.html

There is the Honey Pot Theory as well. I enjoy coupling honey pots with other forms of spam reduction for best results.
http://www.projecthoneypot.org/

Another common approach is to give the user a simple question ("is fire hot or cold?" "what is 2 plus 7?" etc.). It is a little captcha-like, but it is more accessible to users with vision disabilities using screen readers. I think there must be a WordPress plugin that does this, because I see it very frequently on WordPress blogs.

As a lot of people have already proposed: use a honey pot input field. But there are two other things you need to do.
First, randomize the name / id of every input field, including the honey pot. Store the mapping of the useful fields in the session (as well as a form token, used against CSRF attacks). For example, say you need these fields: name, email, message. In your form, you will have
"token" which is your token, "jzefkl46" which is the name field for this form, "ofdizhae" for email, "45sd4s2" for message and "fgdfg5qsd4" for the honey pot.
In the user session, you can have something like
array("forms" => array("your-token-value" => array("jzefkl46" => "name",
"ofdizhae" => "email",
"45sd4s2" => "message",
"fgdfg5qsd4" => "honey")));
You just have to re-associate the names when you get your form data.
Second, as the robot has a good chance of avoiding your honey pot field (25% chances), multiply the number of pots. With 10 or 20 of them, you make things harder for the bots without adding too much overhead to your HTML.
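The two steps above could be sketched roughly like this. The helper names build_field_map and decode_submission, the "f" prefix, and the 4-byte random names are assumptions of mine; the map returned by the first function is what you would store in the session under the form token.

```php
<?php
// Hypothetical helper: random field name => logical name, with $potCount honey pots.
function build_field_map(array $realFields, int $potCount): array
{
    $logical = array_merge($realFields, array_fill(0, $potCount, 'honey'));
    shuffle($logical); // so the pots are not always last in the form
    $map = [];
    foreach ($logical as $name) {
        do {
            // the "f" prefix keeps PHP from treating an all-digit name as an integer key
            $random = 'f' . bin2hex(random_bytes(4));
        } while (isset($map[$random]));
        $map[$random] = $name;
    }
    return $map;
}

// Re-associate posted values with logical names; returns null if any pot was filled.
function decode_submission(array $map, array $post): ?array
{
    $clean = [];
    foreach ($map as $random => $name) {
        $value = $post[$random] ?? '';
        if (!is_string($value)) {
            return null; // arrays are never expected here
        }
        if ($name === 'honey') {
            if ($value !== '') {
                return null; // a honey pot was filled in: treat as spam
            }
            continue;
        }
        $clean[$name] = $value;
    }
    return $clean;
}
```

When rendering the form, loop over the stored map, output a normal input for each real field and a CSS-hidden input for each "honey" entry.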

Sblam! is an open-source filter similar to Akismet.
It uses naive bayesian filtering, checks sender's IP and links in multiple distributed blacklists, checks correctness of HTTP requests, and uses presence of JS as a hint (but not requirement).

Regular CAPTCHAs are spam-bot solvable now.
Consider instead "text CAPTCHAs" : a logic or common knowledge question, like "What's 1 + 1 ?" or "What color is General Custard's white horse?" The question can even be static (same question for every try).
(Taken from http://matthewhutchinson.net/2010/4/21/actsastextcaptcha )
I think Jeff Atwood even uses a validation like this on his blog. (Correct me if I'm wrong)
Some resources:
Text Captcha site & services : http://textcaptcha.com/demo
A plugin : http://matthewhutchinson.net/2010/4/21/actsastextcaptcha
More about text CAPTCHAs, with non-working code : http://www.thesamet.com/blog/2006/12/21/fighting-spam-on-phpbb-forums/

You could try looking at using a third party like Akismet. API keys are free for personal use. Also, The Zend Framework has a package for this.

Most bots simply fill out the whole form and send it to you. A simple trick that works is to create a normal field that you usually hide with the aid of javascript. On the server side just check whether this field has been filled. If so -- then it is spam for sure.

Disallow links. Without links, spam is useless.
[EDIT] As a middle way, only allow links to "good" sites (usually your own). There are only a handful of them, so you can either add them at the request of your users or hold a comment until you verified the link. When it's good, add it.
After a while, you can turn this off and automatically reject comments with links and wait for users to complain.
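A rough sketch of the "only allow links to good sites" idea. The function name and the regex are my own assumptions; it only looks at plain http:// and https:// URLs, so it will miss obfuscated links.

```php
<?php
// Returns the hosts of all http(s) links in $text that are not on the allowlist.
function disallowed_link_hosts(string $text, array $allowedHosts): array
{
    preg_match_all('~https?://[^\s<>"\']+~i', $text, $matches);
    $bad = [];
    foreach ($matches[0] as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        $host = is_string($host) ? strtolower($host) : '';
        if (!in_array($host, $allowedHosts, true)) {
            $bad[] = $host;
        }
    }
    return array_values(array_unique($bad));
}
```

If the returned array is empty, the comment can be published; otherwise hold it for review or reject it, depending on which of the two policies above you are using.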

I have reduced about 99% of spam on my website through a simple mathematical question like the following:
What is 2+4 [TextBox]
The user will be able to submit the question/comment if they answer "6".
Works for me and similar solution works for Jeff Atwood from Coding Horror!

On my blog, I have a kind of compromise captcha: I only use a captcha if the post contains a link. I also use a honeypot input field. So far, this has been nearly 100% effective. Every now and then there will be a spammer that submits something to every form which contains no links (usually something like "nice site!"). I can only assume that these people think I will e-mail them to find out who they are (using the e-mail address that only I see).

Along with using honey pot fields, we can automatically ban the bots' IPs (which doesn't work for dynamic IPs), and especially block any links posted by bots.

Akismet is a good alternative: it checks your posts for spam and works very efficiently.
You just need to load their library.
http://akismet.com/development/

Check out some WordPress antispam plugins for examples and ideas.
There are many nice antispam plugins that don't use a captcha.
Some I'd recommend: hashcash, nospamnx, typepad antispam.
These all use different methods to block spam, and I use them all. hashcash + nospamnx block almost all spambots, and typepad antispam blocks most human-typed spam.
These are also good ones: spambam, wp-spamfree, anti-captcha, bad-behaviour, httpbl, etc.
You can also use a simple .htaccess rule that blocks any direct bot POST that does not come from your own site (check the referer).
Or simply outsource your comment system to Disqus and sleep tight.

Related

How can I reduce spam posted via a simple comment / review system?

I know there are hundreds of examples of the question I'm about to ask, but none of them worked for me the way I wanted.
So, I have a textarea, in which people can add comments/ reviews. But the commenting box keeps getting spammed.
I guess the regular expression might be the most efficient way to keep spammers out, but I stink at Regex.
Is there any other way to keep the spam out?
Edit: the spammers keep posting something like this:
Brianna
Looking for work Lolita Pics it would of been better if she was fucking in front of the mirror!
its more sexy seeing yourself getting f##$.
just getting horny thinking about it
Preteens Nn Models omg if that
(spoilered, lightly censored to avoid causing folks problems at work)
So I want to block posts based on the hyperlinks in the string.
There are many different ways to get rid of spam:
Captcha - for example ReCaptcha, but nowadays you can buy about 1000 solved captchas for less than $3.
Questions in your language about the most known facts - you can ask your users about some facts that they know, but spammers don't.
Antispam filters - for example Sblam!, Akismet or other anti-spam services. I think it would work best for you.
A lot of captchas are now bot-solvable, and if you're trying to avoid captchas, then one quick suggestion is to use a simple text trap.
Under your text area, add a question, such as;
"How many days are in a week?"
Then add another text box, and compare this to say;
7 or Seven etc.
If the test fails, then reject the entry...
You may need to vary your question over time, or even have a list of different questions, but this is a simple and easy method to implement.
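A minimal sketch of that text trap with a small question list. The $questions array and the function name is_correct_answer are hypothetical; the second question is just another example of the same kind.

```php
<?php
// Hypothetical question list: each question maps to its accepted answers (lowercase).
$questions = [
    'How many days are in a week?' => ['7', 'seven'],
    'What is 2 plus 7?'            => ['9', 'nine'],
];

// Compares the trimmed, lowercased answer against the accepted answers.
function is_correct_answer(array $questions, string $question, string $answer): bool
{
    $accepted = $questions[$question] ?? [];
    return in_array(strtolower(trim($answer)), $accepted, true);
}
```

Keep track of which question was shown on the server side (for example in the session) rather than trusting a question sent back by the form, otherwise a bot can always submit the one question it knows.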
The answers here are good, but sometimes fooling bots is a good first step.
The vast majority of bots just read the source code and will fill in all the input fields they can find with garbage, send the request, and then hope it worked. They are pretty stupid, so something like this may fool most bots:
<p style="display:none">Screen readers: Use the next textarea (the first is used to confuse spam bots).</p>
<textarea name="comment" style="display:none"></textarea>
<textarea name="real_comment"></textarea>
And then in your script:
if (isset($_POST['comment']) && strlen($_POST['comment']) > 0) {
die('Bots begone!');
}
$comment = $_POST['real_comment'];
In other words, put a dummy textarea in the HTML, hide it using CSS, and then wait for bots to fill it in.
The simplest way to do what you want is to search for the string http://
The following if-statement allows up to 3 links in $text:
if (substr_count($text,"http://") > 3)
But that's not really a sufficient check, because there is a lot of spam that doesn't contain any links at all, just rubbish.
So the second thing you have to do is keep a blacklist of "bad words":
$lower = strtolower($text);
foreach ($blockword as $word) {
if (strpos($lower, strtolower($word)) !== false) { // !== false also catches a match at position 0
//handle spam here..
}
}
And after all that, you have to maintain a growing list of words and still have to delete a lot of spam.
So you have the option to add an invisible field with a random value, which is stored in the session, and check whether this value is submitted correctly:
$_SESSION["random_secret"] = bin2hex(random_bytes(16)); // create random string
and later check
if (isset($_POST["secret"]) && $_POST["secret"] == $_SESSION["random_secret"])
With this check, you get rid of a lot of automated spam (but still not all),
and so, after all, you end up with captchas.
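The link count and the word list from this answer can be combined into one function. This is only a sketch; the name looks_like_spam and the decision to count https:// links as well are my additions.

```php
<?php
// Returns true if $text has more than $maxLinks links or contains a blocked word.
function looks_like_spam(string $text, array $blockwords, int $maxLinks = 3): bool
{
    $lower = strtolower($text);
    $links = substr_count($lower, 'http://') + substr_count($lower, 'https://');
    if ($links > $maxLinks) {
        return true;
    }
    foreach ($blockwords as $word) {
        // strpos() returns 0 for a match at the start, so compare against false
        if ($word !== '' && strpos($lower, strtolower($word)) !== false) {
            return true;
        }
    }
    return false;
}
```

A plain substring match like this also fires inside longer words, so expect false positives with short entries in the word list.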

PHP and post arrays

Is there anything bad in using post arrays for post variables?
<input type="text" id="stuff" name="stuff[text]"/>
instead of
<input type="text" id="stuff" name="stuff"/>
Tips when to use them?
No, there is no reason not to do this.
However, PHP is pretty much the only language which allows you to create arrays like this, so if you ever change your backend to a different language you might have to change things.
It isn't a bad way. Typically programmers use arrays in this manner to post an array of undetermined length. Still, I don't know for certain, but if you want to change the method to GET, IE <= 8 has a limit of 2048 characters on the address length, and a dynamically generated array can easily exhaust this limit. On other browsers the limit is much higher or there is none.
Another drawback of this method is that PHP will process it correctly, but other server-side languages may not. This isn't specified in the official HTML docs, as far as I know.
So it is more convenient to put the value in a single cell in the post array than to build a subarray. If you want to do some namespacing, then you can write the name this way:
name="stuff.text"
as some forum engines do (Vanilla 2 does, for certain).
If it makes no difference to you, I would stick to single variable names in HTML names, mostly because of the backend.
As for tips on how to use them, I would recommend using such arrays to cover dynamically generated content on the site. It can still be handled with normal names, but that is pretty ugly. If you want to build a picture-adding system, I would name each file input "pic[]" and iterate over the whole array on the server.
The same goes for generating documents on the client side. I would then use names like "content[][name]" "content[][type]" "content[][value]" and so on. Whatever I have in the document part class I would put into this kind of naming, and on the server just check what is set and do certain things for each block of the document.
This could be discussed for a long time, since every programmer has their own techniques and tends to stick with them. For example, I throw a in every form I have on the site, and each action is parsed by a general controller and then passed on to specific controllers.
Nothing bad. If it is useful for you, use it.
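To see what PHP builds from such names without setting up a form, parse_str() can be used on a hand-written request body; as far as I know it applies the same bracket rules that PHP uses when filling $_POST. The field names are the ones from this thread, the values are made up.

```php
<?php
// A request body as a browser would send it for stuff[text] and two pic[] inputs.
$body = 'stuff[text]=hello&pic[]=a.jpg&pic[]=b.jpg';
parse_str($body, $post);

// $post['stuff'] is an associative array, $post['pic'] is a numerically indexed list.
$text  = $post['stuff']['text'];
$count = count($post['pic']);
```

So "stuff[text]" arrives as a nested array, and every "pic[]" input appends one element that you can loop over with foreach.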

How to avoid automated posting on classifieds? [duplicate]

Closed 10 years ago.
Possible Duplicate:
When the bots attack!
My free classifieds portal is coded in PHP. Now I am facing a series of ads which seem to be posted by tools like an auto-submitter. So I need some automated way to detect and block this type of ad.
Have you considered using a "Captcha"?
http://www.captcha.net/
They are pretty easy to integrate.
There are several ways.
A simple one is to place a text box on the input form that says
Type the word "human": [ ]
If the trimmed uppercased version of that input != "HUMAN", then you either have a dumb human, or a dumb bot.
Obviously, this is super-easy for someone to defeat, but only if they take the time to code something for your site. In other words, it helps but is not foolproof.
On a more advanced level, integrate a recaptcha system, such as http://www.google.com/recaptcha
One of the simplest solutions is honeypots: the idea is to include some fields with common names like email, address, etc., hide them from the user, and use something less common for the names of the real fields. If any of the hidden fields isn't empty, the form was submitted by a bot.
To make it more complicated, you can dynamically obfuscate every field name using md5, for example:
$encodedFieldName = md5('email_address' . session_id());
Of course you need to create an array of allowed fields and map the names back before saving the results to the database.
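The "array of allowed fields" step might look roughly like this. All three function names are hypothetical, and the session id is passed in as a plain string so the code can be tried outside a web request.

```php
<?php
// Map md5-derived form names back to the real field names.
function hashed_field_names(array $allowed, string $sessionId): array
{
    $map = [];
    foreach ($allowed as $field) {
        $map[md5($field . $sessionId)] = $field;
    }
    return $map;
}

// Keep only posted values whose hashed name is in the map.
function restore_fields(array $map, array $post): array
{
    $clean = [];
    foreach ($map as $hashed => $field) {
        if (isset($post[$hashed]) && is_string($post[$hashed])) {
            $clean[$field] = $post[$hashed];
        }
    }
    return $clean;
}

// True if any of the hidden, commonly named fields was filled in.
function honeypot_filled(array $potNames, array $post): bool
{
    foreach ($potNames as $name) {
        if (isset($post[$name]) && $post[$name] !== '') {
            return true;
        }
    }
    return false;
}
```

On submit, check honeypot_filled(['email', 'address'], $_POST) first, then pass restore_fields(hashed_field_names($allowed, session_id()), $_POST) on to your normal validation.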

Tracking Quiz Results with URL, No Database Allowed!

I need to create a 10 page quiz for a mobile browser. It is only a mobile webpage, so no considerations need to be taken for other browsers.
Here's the problem I'm having: I can't use JavaScript, because not every mobile browser supports it. I'm not very skilled in other languages, but I thought perhaps something could be done in PHP as it is server-side.
If my first URL is domain and I enter the correct quiz answer, the URL to the next page could be domain/?p=1. The URL doesn't need to do anything but hold a count of the number of correct results.
As for the actual code, I was thinking it could be included in the HTML itself, as I'm not very concerned about people viewing the source on their mobile phones.
Is it possible to write a line of code that increments the 'p=' attribute in the URL by one when clicked and only attach it to the correct answers?
Here's an image of what I mean: http://i.imgur.com/HbJ5U.jpg
And, what's to stop me from manually incrementing the "correct answer" counter in my address bar?
Do you not want to use a database because you don't have one available to you in your hosting, or because you don't know how?
I'm not a fan of the idea, but you can get the number of "correct answers" with the following code.
<?php
/* Gets the current correct-answer count (0 on the first page, when "p" is absent) */
$answer_count = isset($_GET["p"]) ? (int) $_GET["p"] : 0;
/* Checks whether the submitted answer is the same as the correct answer */
if (isset($_POST["submitted-answer"]) && $_POST["submitted-answer"] == "correct-answer") {
    $answer_count++;
}
?>
Now, you just add the modified answer count to the link to the next question.
Next Question
If this is "just for fun" I don't see why you couldn't do it like this. It's definitely a simple way to solve the problem.
The standard way to do this is to store things in hidden form variables. Of course, if there is anything riding on this, that's a terrible way to do it, because it's really easy for the end user to put his own values in those hidden form values.
Aren't file-based sessions the obvious answer here?
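A sketch of what the session approach could look like, with the score kept on the server so it cannot be edited in the address bar. The function name, the array keys, and the answer key are made up; $store stands in for $_SESSION.

```php
<?php
// Hypothetical answer key: question number => correct option.
$answerKey = [1 => 'b', 2 => 'd'];

// Records one answer and returns the running score.
// $store is any array that persists between requests (e.g. $_SESSION).
function record_answer(array &$store, array $answerKey, int $question, string $answer): int
{
    $store['answered'] = $store['answered'] ?? [];
    $store['score']    = $store['score'] ?? 0;
    // Count each question only once, so reloading a page cannot inflate the score.
    if (!isset($store['answered'][$question])) {
        $store['answered'][$question] = true;
        if (($answerKey[$question] ?? null) === $answer) {
            $store['score']++;
        }
    }
    return $store['score'];
}
```

On each quiz page you would call session_start() and then record_answer($_SESSION, $answerKey, $question, $_POST['submitted-answer']). PHP sessions normally rely on a cookie, so this assumes the target mobile browsers accept cookies.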

best approach to analyze text in PHP?

I need to analyze a user's post and categorize it. For example: I have to categorize every post as a "buy" post or a "sell" post based on the text - "I'm looking to sell my house" is categorized as "sell". The problem is that often it's not so simple - "I'm looking to get rid of my old house" also needs to be categorized as "sell". "I'm looking for a house" becomes "buy". I would also like to categorize these posts based on the item in question - for example, the post above would be categorized as "buy" and as "house".
Can anyone recommend a good approach / good framework / technique when it comes to analyzing and understanding user input?
Thanks.
What you're talking about is basically a Bayesian filtering problem, also used for spam filtering. See also this talk. It's a reasonably complicated area.
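To give an idea of what Bayesian filtering means here, this is a very small multinomial naive Bayes sketch with add-one smoothing. It is illustrative only: the function names and the training sentences are made up, and a real classifier needs far more training data than this.

```php
<?php
// Split text into lowercase word tokens.
function tokenize(string $text): array
{
    preg_match_all('/[a-z\']+/', strtolower($text), $m);
    return $m[0];
}

// $samples is a list of [text, label] pairs; returns the word counts per label.
function train(array $samples): array
{
    $model = ['docs' => [], 'words' => [], 'totals' => [], 'vocab' => []];
    foreach ($samples as [$text, $label]) {
        $model['docs'][$label] = ($model['docs'][$label] ?? 0) + 1;
        foreach (tokenize($text) as $word) {
            $model['words'][$label][$word] = ($model['words'][$label][$word] ?? 0) + 1;
            $model['totals'][$label] = ($model['totals'][$label] ?? 0) + 1;
            $model['vocab'][$word] = true;
        }
    }
    return $model;
}

// Returns the label with the highest log-probability for $text.
function classify(array $model, string $text): string
{
    $docCount  = array_sum($model['docs']);
    $vocabSize = count($model['vocab']);
    $best      = '';
    $bestScore = -INF;
    foreach ($model['docs'] as $label => $n) {
        $score = log($n / $docCount); // prior for this label
        $total = $model['totals'][$label] ?? 0;
        foreach (tokenize($text) as $word) {
            $count  = $model['words'][$label][$word] ?? 0;
            $score += log(($count + 1) / ($total + $vocabSize)); // add-one smoothing
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best      = (string) $label;
        }
    }
    return $best;
}

// Made-up training data in the spirit of the question.
$model = train([
    ["I'm looking to sell my house", 'sell'],
    ["I want to get rid of my old car", 'sell'],
    ["I'm looking for a house", 'buy'],
    ["I want to buy a car", 'buy'],
]);
$label = classify($model, "I'm looking to get rid of my old house");
```

Categorizing by item ("house", "car") could use a second model trained with item labels instead of buy/sell labels.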
You're right; it's a hard thing to do.
Yahoo! has a Term Extraction API/Web service you can use. It's a pretty good way to use language analysis on your own text without writing a million lines of code to do it yourself. I haven't used it, so I've no idea how well it works with similar meanings, as your question asks.
