BOT/Spider Trap Ideas

BOT/Spider Trap Ideas - php

I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal looking user agents with random IPs but they're flipping through pages too fast to be human. They also don't appear to be requesting any images. I can't seem to find any pattern and my suspicion is it's a fleet of Windows Zombies.
The clients had issues in the past with SPAM attacks--even had to point MX at Postini to get the 6.7 GB/day of junk to stop server-side.
I want to setup a BOT trap in a directory disallowed by robots.txt... just never attempted anything like this before, hoping someone out there has a creative ideas for trapping BOTs!
EDIT: I already have plenty of ideas for catching one.. it's what to do to it when lands in the trap.

You can set up a PHP script whose URL is explicitly forbidden by robots.txt. In that script, you can pull the source IP of the suspected bot hitting you (via $_SERVER['REMOTE_ADDR']), and then add that IP to a database blacklist table.
Then, in your main app, you can check the source IP, do a lookup for that IP in your blacklist table, and if you find it, throw a 403 page instead. (Perhaps with a message like, "We've detected abuse coming from your IP, if you feel this is in error, contact us at ...")
On the upside, you get automatic blacklisting of bad bots. On the downside, it's not terribly efficient, and it can be dangerous. (One person innocently checking that page out of curiosity can result in the ban of a large swath of users.)
Edit: Alternatively (or additionally, I suppose) you can fairly simply add a GeoIP check to your app, and reject hits based on country of origin.

What you can do is get another box (a kind of sacrificial lamb) not on the same pipe as your main host then have that host a page which redirects to itself (but with a randomized page name in the url). this could get the bot stuck in a infinite loop tieing up the cpu and bandwith on your sacrificial lamb but not on your main box.

I tend to think this is a problem better solved with network security more so than coding, but I see the logic in your approach/question.
There are a number of questions and discussions about this on server fault which may be worthy of investigating.
https://serverfault.com/search?q=block+bots

Well I must say, kinda disappointed--I was hoping for some creative ideas. I did find the ideal solutions here.. http://www.kloth.net/internet/bottrap.php
<html>
<head><title> </title></head>
<body>
<p>There is nothing here to see. So what are you doing here ?</p>
<p>Go home.</p>
<?php
/* whitelist: end processing end exit */
if (preg_match("/10\.22\.33\.44/",$_SERVER['REMOTE_ADDR'])) { exit; }
if (preg_match("Super Tool",$_SERVER['HTTP_USER_AGENT'])) { exit; }
/* end of whitelist */
$badbot = 0;
/* scan the blacklist.dat file for addresses of SPAM robots
to prevent filling it up with duplicates */
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die ("Error opening file ... <br>\n");
while ($line = fgets($fp,255)) {
$u = explode(" ",$line);
$u0 = $u[0];
if (preg_match("/$u0/",$_SERVER['REMOTE_ADDR'])) {$badbot++;}
}
fclose($fp);
if ($badbot == 0) { /* we just see a new bad bot not yet listed ! */
/* send a mail to hostmaster */
$tmestamp = time();
$datum = date("Y-m-d (D) H:i:s",$tmestamp);
$from = "badbot-watch#domain.tld";
$to = "hostmaster#domain.tld";
$subject = "domain-tld alert: bad robot";
$msg = "A bad robot hit $_SERVER['REQUEST_URI'] $datum \n";
$msg .= "address is $_SERVER['REMOTE_ADDR'], agent is $_SERVER['HTTP_USER_AGENT']\n";
mail($to, $subject, $msg, "From: $from");
/* append bad bot address data to blacklist log file: */
$fp = fopen($filename,'a+');
fwrite($fp,"$_SERVER['REMOTE_ADDR'] - - [$datum] \"$_SERVER['REQUEST_METHOD'] $_SERVER['REQUEST_URI'] $_SERVER['SERVER_PROTOCOL']\" $_SERVER['HTTP_REFERER'] $_SERVER['HTTP_USER_AGENT']\n");
fclose($fp);
}
?>
</body>
</html>
Then to protect pages throw <?php include($DOCUMENT_ROOT . "/blacklist.php"); ?> on the first line of every page.. blacklist.php contains:
<?php
$badbot = 0;
/* look for the IP address in the blacklist file */
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die ("Error opening file ... <br>\n");
while ($line = fgets($fp,255)) {
$u = explode(" ",$line);
$u0 = $u[0];
if (preg_match("/$u0/",$_SERVER['REMOTE_ADDR'])) {$badbot++;}
}
fclose($fp);
if ($badbot > 0) { /* this is a bad bot, reject it */
sleep(12);
print ("<html><head>\n");
print ("<title>Site unavailable, sorry</title>\n");
print ("</head><body>\n");
print ("<center><h1>Welcome ...</h1></center>\n");
print ("<p><center>Unfortunately, due to abuse, this site is temporarily not available ...</center></p>\n");
print ("<p><center>If you feel this in error, send a mail to the hostmaster at this site,<br>
if you are an anti-social ill-behaving SPAM-bot, then just go away.</center></p>\n");
print ("</body></html>\n");
exit;
}
?>
I plan to take Scott Chamberlain's advice and to be safe I plan to implement Captcha on the script. If user answers correctly then it'll just die or redirect back to site root. Just for fun I'm throwing the trap in a directory named /admin/ and of coursed adding Disallow: /admin/ to robots.txt.
EDIT: In addition I am redirecting the bot ignoring the rules to this page: http://www.seastory.us/bot_this.htm

You could first take a look at where the ip's are coming from. My guess is that they are all coming from one country like china or Nigeria, in which case you could set up something in htaccess to disallow all ip's from those two countries, as for creating a trap for bots, i havent the slightest idea

Related

Limiting Resource Usage for an IP on a Shared Host

Recently, I had a single IP call the same page 15,000 times in quick succession which used up a lot of server resources (with resulting warning email from Host service). I am on a shared host so can't load new modules and therefore have no real options to truly limit bandwidth to an IP.
So, I'm trying to figure out how I can use the least amount of resources in spotting an offending IP and redirecting it to a 403 Forbidden page. I am already checking for common hacks and using Project HoneyPot, but to do this for each of the 15,000 page hits is not efficient (and, like this one, it doesn't catch them all).
I currently log access to each page to a mysql table called visitors. I can imagine a couple of ways to go about this:
Option 1: Using MySql:
1) Query the visitors table for the number of hits from the IP over the last 10 seconds.
2) If greater than a certain number (15?), flag the last entry in visitors as blocked for this IP.
3) With each subsequent page request, the query on the visitors table will show the ip as blocked and I can then redirect to 403 Forbidden page.
Option 2: Modifying an Include File on the fly which contains blacklisted IPs:
1) Include a file which returns an array of blacklisted IPs
2) If the current IP is not on the list, query the visitors table as in Option 1 to see if the number of hits from the IP over the last 10 seconds is greater than a certain number.
3) If the IP is offending, modify the include file to include this IP address as shown below.
In essence, my question is: Which uses more resources (x 15,000): a query to the database, or the code below which uses include to read a file and then array_search(), OR is there a better way to do this?
<?php
$ip = $_SERVER['REMOTE_ADDR'];
$filename='__blacklist.php';
if (file_exists($filename)){
// get array of excluded ip addresses
$array = (include $filename);
// if current address is in list, send to 403 forbidden page
var_dump($array);
if (is_array($array) && array_search($ip, $array) !== false){
blockAccess("Stop Bugging Me!");
}
} else {
echo "$filename does not exist";
}
// evaluate some condition which if true will cause IP to be added to blacklist - this will be a query to a MySql table determining number of hits to the site over a period of time like the last 10 seconds.
if (TRUE){
blockip($ip);
}
// debug - let's see what is blocked
// $array = (include $filename);
// var_dump($array);
// add the ip to the blacklist
function blockip($ip){
$filename='__blacklist.php';
if (! file_exists($filename)){
// create the include file
$handle = fopen($filename, "w+");
// write beginning of file - 111.111.111.111 is a placeholder so all new ips can be added
fwrite($handle, '<?php return array("111.111.111.111"');
} else {
// let's block the current IP
$handle = fopen($filename, 'r+');
// Don't use filesize() on files that may be accessed and updated by parallel processes or threads
// (as the filesize() return value is maintained in a cache).
// use fseek & ftell instead
fseek($handle, 0 ,SEEK_END);
$filesize = ftell($handle);
if ($filesize > 20){
// remove ); from end of file so new ip can be added
ftruncate($handle, $filesize-2);
// go to end of file
fseek($handle, 0 ,SEEK_END);
} else {
// invalid file size - truncate file
$handle = fopen($filename, "w+");
// write beginning of file with a placeholder so a new ip can be added
fwrite($handle, '<?php return array("111.111.111.111"');
}
}
//add new ip and closing of array
fwrite($handle, "," . PHP_EOL . '"' . $ip . '");');
fclose($handle);
}
function blockAccess($message) {
header("HTTP/1.1 403 Forbidden");
echo "<!DOCTYPE html>\n<html>\n<head>\n<meta charset='UTF-8' />\n<title>403 Forbidden</title>\n</head>\n<body>\n" .
"<h1>Forbidden</h1><p>You don't have access to this page.</p>" .
"\n</body>\n</html>";
die();
}
?>

There are a lot of points to address here.
Query vs Include
This is basically going to come down to the server its hosted on. Shared hosting is not known for good IO and you will have to test this. This also depends on how many IPs you will be blacklisting.
Blocking Malicious IPs
Ideally you don't want malicious users to hit PHP once you determine that they are malicious. The only real way to do that in a shared environment on apache is to block them from htaccess. Its not recommended but it is possible to modify htaccess from PHP.
Order Deny,Allow
Deny from xxx.xxx.xxx.xxx
Caching
The main concern I have from reading your question is that you seem to not understand the problem. If you are receiving 15,000 hits in a timeframe of seconds, you should not have 15,000 database connections and you should not have all of those requests hitting PHP. You need to cache these requests. If this is happening your system is fundamentally flawed. It should not be physically possible for 1 user on home internet to spike your resource usage that much.
Shared hosting is a bad idea
I suggest in your situation to get a VPS or something else that will allow you to use a reverse proxy and employ far more caching/blacklisting/resource monitoring.

How to validate phone numbers and stop injection headers - PHP Secure form

What else is needed to:
make this php script send an auto-response back?
sanitize and check the phone number and email that is not junk as my current formmail from dbmasters I get junk like dasawewdjz89)$%&*_sasa779%7fmsdls in almost every field including the input areas.
It is mentioned to take out the bcc and cc code, yet, I had code to sent to a different recipient based on the state, so is there a way to keep the bcc and cc fields too without compromising security?
Maybe this is 3 questions in 1, but this is essentially building upon the answer here
Replacing deprecated eregi() with stristr(). Is this php mail script secure from header injections? since it is a deprecated form and I get error logs each day now.
I believe I only need validation on input fields NOT select or radio fields, right?
I am an html/css guy so would this actual code go into the php page or as a separate contact.php page.
EDIT: The script I cannot post for some reason here with the code given (like in other forums). so I made a link to it in BOLD
..Validate without Javascript

To answer your questions:
Question 1: Don't quite understand what you mean here. Once you are in your script you can send output to the screen, generate and email, etc. This question is very vague.
Question 2: You can use regular expressions to validate various pieces of information. For example this will check a phone number in the format of XXX-XXX-XXXX and tell you if it is valid.
function validatePhone($number)
{
$test = "/^\d{3}-\d{3}-\d{4}$/";
return (preg_match($test, $number) != 0) ? true : false;
}
var_dump(validatePhone("815-555-1234"));
var_dump(validatePhone("8158791359"));
var_dump(validatePhone("blah blah 209#&$#)(##1;llkajsdf"));
This will produce:
bool(true)
bool(false)
bool(false)
Keep in mind this function is far from robust. Valid phone numbers in different formats will fail (e.g. 815 555-8846), so you will need to adjust the regexp or craft multiple regexps to meet your needs. But that should be enough to illustrate the process.
Question 3: For email, I don't really see how the BCC and CC fields are going to compromise security. What you need to focus on in that area is preventing email header injections.

Spammers have recently been using mail header injection to send spam e-mail from contact forms that have in the past viewed as secure.
If you are a webmaster you can edit your forums to ensure they are secure and safe from spammers
Anyway, I have several websites that all use a common contact form. Every contact form posts to the same script.
This is how I defend against header injections. (I typically use this script as an include file)
This script requires your html form to use action="post". Make sure this is only used on the script that the html form will be posted to. If you use this script on a regular page request, it will die().
More error checking should be done when testing posted values for bad strings. Possibly a regular expression.
<?php
// First, make sure the form was posted from a browser.
// For basic web-forms, we don't care about anything
// other than requests from a browser:
if(!isset($_SERVER['HTTP_USER_AGENT'])){
die("Forbidden - You are not authorized to view this page");
exit;
}
// Make sure the form was indeed POST'ed:
// (requires your html form to use: action="post")
if(!$_SERVER['REQUEST_METHOD'] == "POST"){
die("Forbidden - You are not authorized to view this page");
exit;
}
// Host names from where the form is authorized
// to be posted from:
$authHosts = array("domain.com", "domain2.com", "domain3.com");
// Where have we been posted from?
$fromArray = parse_url(strtolower($_SERVER['HTTP_REFERER']));
// Test to see if the $fromArray used www to get here.
$wwwUsed = strpos($fromArray['host'], "www.");
// Make sure the form was posted from an approved host name.
if(!in_array(($wwwUsed === false ? $fromArray['host'] : substr(stristr($fromArray['host'], '.'), 1)), $authHosts)){
logBadRequest();
header("HTTP/1.0 403 Forbidden");
exit;
}
// Attempt to defend against header injections:
$badStrings = array("Content-Type:",
"MIME-Version:",
"Content-Transfer-Encoding:",
"bcc:",
"cc:");
// Loop through each POST'ed value and test if it contains
// one of the $badStrings:
foreach($_POST as $k => $v){
foreach($badStrings as $v2){
if(strpos($v, $v2) !== false){
logBadRequest();
header("HTTP/1.0 403 Forbidden");
exit;
}
}
}
// Made it past spammer test, free up some memory
// and continue rest of script:
unset($k, $v, $v2, $badStrings, $authHosts, $fromArray, $wwwUsed);
?>

How to send a message securely via html and php (post)

I use a php code on my server to send messages to my clients. The programming tool I use (Game Maker) allows me to send messages via php by executing a shell so that the link appears in a browser.
Example is here ...
with all the other stuff added. So in effect, the message I'm sending and all the stuff I'm sending are seen in the browser. I use the php get method. everything works perfectly now, except that it may not be secured. Someone suggested php post method, but when I replaced get in my php cod on my server to post, and pasted the same thing in the browser, my code didn't work. It's hard to explain, but here's the php code on my server:
<?php
// Some checks on $_SERVER['HTTP_X_REFERRER'] and similar headers
// might be in order
// The input form has an hidden field called email. Most spambot will
// fall for the trap and try filling it. And if ever their lord and master checks the bot logs,
// why not make him think we're morons that misspelled 'smtp'?
if (!isset($_GET['email']))
die("Missing recipient address");
if ('' != $_GET['email'])
{
// A bot, are you?
sleep(2);
die('DNS error: cannot resolve smpt.gmail.com');
// Yes, this IS security through obscurity, but it's only an added layer which comes almost for free.
}
$newline = $_GET['message'];
$newline = str_replace("[N]","\n","$newline");
$newline = str_replace("[n]","\n","$newline");
// Add some last-ditch info
$newline .= <<<DIAGNOSTIC_INFO
---
Mail sent from $_SERVER[REMOTE_ADDR]:$_SERVER[REMOTE_PORT]
DIAGNOSTIC_INFO;
mail('info#site.com','missing Password Report',$newline,"From: ".$_GET['from']);
header( 'Location: http://site.com/report.html' ) ;
?>
I then call this php code on my site. so that in the end, the whole thing ends up in the browser address bar. I hope this makes sense. How do I make things more secured by using post so that at least the sent information cannot be seen in users history and all that.

If you replace to POST in your form you need to replace the request to POST too:
<?php
// Some checks on $_SERVER['HTTP_X_REFERRER'] and similar headers
// might be in order
// The input form has an hidden field called email. Most spambot will
// fall for the trap and try filling it. And if ever their lord and master checks the bot logs,
// why not make him think we're morons that misspelled 'smtp'?
if (!isset($_POST['email']))
die("Missing recipient address");
if ('' != $_POST['email'])
{ // A bot, are you?
sleep(2);
die('DNS error: cannot resolve smpt.gmail.com');
// Yes, this IS security through obscurity, but it's only an added layer which comes almost for free.
}
$newline = $_POST['message'];
$newline = str_replace("[N]","\n","$newline");
$newline = str_replace("[n]","\n","$newline");
// Add some last-ditch info
$newline .= <<<DIAGNOSTIC_INFO
---
Mail sent from $_SERVER[REMOTE_ADDR]:$_SERVER[REMOTE_PORT]
DIAGNOSTIC_INFO;
mail('info#site.com','missing Password Report',$newline,"From: ".$_POST['from']);
header( 'Location: http://site.com/report.html' ) ;
?>
Unless you are sending it with real GET parameters like http://www.mysite.com/send.php?email=etc; in this case you do need to set it to GET to retrieve the variables.

php header location wont work

I tried to make an ip based deny list which stored ip numbers in mysql. Basicly i try to header redirect if user's ip in array but it wont work. What is wrong?
$ip_array = array();
$ip_ban_query = mysql_query("SELECT ip FROM banned_ips");
while ($deny = mysql_fetch_assoc($ip_ban_query)){
$add_ip = $deny['ip'];
$ip_array[] = $add_ip;
}
if (in_array ($_SERVER['REMOTE_ADDR'], $ip_array)) {
header("Location: http://www.google.com/");
exit();
}

We can greatly simplify your code here.. reducing complexity almost always flushes away bugs. :)
// There are several different methods to accomplish this, and you really
// should be using statements here, but both are out of scope of this question.
$ipBanQuery = sprintf("SELECT ip FROM banned_ips WHERE ip = '%s'", mysql_real_escape_string($_SERVER['REMOTE_ADDR']));
$result = mysql_query($ipBanQuery);
if (mysql_num_rows($result)) {
header('Location: http://www.google.com/');
exit();
}
It also depends on where you're calling this code. Be extra-sure that this is being called before any output to the browser - stray spaces, HTML, or other debugging info will prevent any additional headers from being sent. Check your webserver's error log to see if there's something wonky going on.

small php code for detecting real search spiders from a spammer

hi I just want your opinions about this code I found on a website for detect real search spiders from spammer is it good?? and do you have any recommendations for other scripts or methods for this subject
<?php
$ua = $_SERVER['HTTP_USER_AGENT'];
$spiders=array('msnbot','googlebot','yahoo');
$pattern=array("/\.google\.com$/","/search\.live\.com$/","/\.yahoo\.com$/");
for($i=0;$i < count($spiders) and $i < count($pattern);$i++)
{
if(stristr($ua, $spiders[$i])){
//it's pretending to be MSN's bot or Google's bot
$ip = $_SERVER['REMOTE_ADDR'];
$hostname = gethostbyaddr($ip);
if(!preg_match($pattern[$i], $hostname))
{
//the hostname does not belong to either live.com or googlebot.com.
//Remember the UA already said it is either MSNBot or Googlebot.
//So it's a spammer.
echo "spammer";
exit;
}
else{
//Now we have a hit that half-passes the check. One last go:
$real_ip = gethostbyname($hostname);
if($ip != $real_ip){
//spammer!
echo "Please leave Now spammr";
break;
}
else{
//real bot
}
}
}
else
{
echo "hello user";
}
}
note: it used user agent switcher with this code and it worked perfectly but am not sure if it will work in real world, so what do you think??

What would keep a spammer from simply giving an entirely correct user agent string?
I think this is fairly pointless. You would have to at least compare IP ranges (or their name servers) as well in order to get reliable results. This is possible for Google:
Google Webmaster Central: How to verify Googlebot
but even if you test for Google and Bing this way, a spambot can enter your site simply by giving a browser user-agent. Therefore, it is ultimately impossible to detect a spam-bot. They are a reality, and there is no good way to keep them out from a web site.

you can also have htaccess so that things like this will be prevented just like on this tutorial
http://perishablepress.com/press/2007/06/28/ultimate-htaccess-blacklist/

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

BOT/Spider Trap Ideas - php

I tend to think this is a problem better solved with network security more so than coding, but I see the logic in your approach/question. There are a number of questions and discussions about this on server fault which may be worthy of investigating. https://serverfault.com/search?q=block+bots

You could first take a look at where the ip's are coming from. My guess is that they are all coming from one country like china or Nigeria, in which case you could set up something in htaccess to disallow all ip's from those two countries, as for creating a trap for bots, i havent the slightest idea

Related

Limiting Resource Usage for an IP on a Shared Host

How to validate phone numbers and stop injection headers - PHP Secure form

How to send a message securely via html and php (post)

php header location wont work

small php code for detecting real search spiders from a spammer

Categories

Resources