Hi, I just want your opinions about this code I found on a website for telling real search spiders apart from spammers. Is it good? And do you have any recommendations for other scripts or methods for this kind of thing?
<?php
$ua = $_SERVER['HTTP_USER_AGENT'];
$spiders = array('msnbot', 'googlebot', 'yahoo');
$pattern = array("/\.google\.com$/", "/search\.live\.com$/", "/\.yahoo\.com$/");

for ($i = 0; $i < count($spiders) and $i < count($pattern); $i++) {
    if (stristr($ua, $spiders[$i])) {
        // it's claiming to be MSN's bot, Google's bot, or Yahoo's bot
        $ip = $_SERVER['REMOTE_ADDR'];
        $hostname = gethostbyaddr($ip);
        if (!preg_match($pattern[$i], $hostname)) {
            // the hostname does not belong to google.com, live.com, or yahoo.com.
            // Remember the UA already said it is one of their bots.
            // So it's a spammer.
            echo "spammer";
            exit;
        } else {
            // Now we have a hit that half-passes the check. One last go:
            $real_ip = gethostbyname($hostname);
            if ($ip != $real_ip) {
                // spammer!
                echo "Please leave now, spammer";
                break;
            } else {
                // real bot
            }
        }
    } else {
        echo "hello user";
    }
}
Note: I tested this with a user-agent switcher and it worked perfectly, but I'm not sure if it will work in the real world. What do you think?
What would keep a spammer from simply giving an entirely correct user agent string?
I think this is fairly pointless. You would have to at least compare IP ranges (or their name servers) as well in order to get reliable results. This is possible for Google:
Google Webmaster Central: How to verify Googlebot
But even if you test for Google and Bing this way, a spambot can enter your site simply by sending a normal browser user-agent. So it is ultimately impossible to reliably detect a spam bot from the user agent alone. They are a reality, and there is no good way to keep them out of a website.
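Still, for bots that do claim to be Googlebot, the verification Google describes boils down to a double DNS lookup, which is easy to do in PHP. A minimal sketch, with the function name and regex being my own rather than anything from the linked article:

<?php
// Sketch of the double DNS lookup: reverse-resolve the IP, check the domain,
// then forward-resolve the hostname and make sure it maps back to the same IP.
function isVerifiedGooglebot($ip) {
    $hostname = gethostbyaddr($ip);   // reverse DNS
    if (!preg_match('/\.(googlebot|google)\.com$/i', $hostname)) {
        return false;                 // not a google.com / googlebot.com host
    }
    return gethostbyname($hostname) === $ip;   // forward-confirm
}

if (stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false
    && !isVerifiedGooglebot($_SERVER['REMOTE_ADDR'])) {
    // claims to be Googlebot, but the DNS check failed
}
?>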
You can also use .htaccess to block known bad bots, as in this tutorial:
http://perishablepress.com/press/2007/06/28/ultimate-htaccess-blacklist/
Related
I want to reward users if they refer a friend. I've been using the following code to do it, but I'm worried that it might not be secure (users could make fake accounts to game it). Can I improve this code? Are there any other alternative scripts that do this better?
if (isset($_GET['refer']) || isset($_GET['r'])) {
    global $database, $session;
    if (!$session->logged_in) {
        $username = mysql_safe($_GET['refer']);
        if ($database->usernameTaken($username)) {
            $userip = getRealIP();
            $q = "SELECT uname FROM " . TBL_USERS . " WHERE ipad = '$userip'";
            $result = mysql_query($q, $database->connection);
            $result = mysql_num_rows($result);
            if ($result == 0) {
                $_SESSION['referer'] = $username;
            }
        }
    }
}

function getRealIP()
{
    if (!empty($_SERVER['HTTP_CLIENT_IP'])) {                 // check IP from shared internet
        $ip = $_SERVER['HTTP_CLIENT_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {     // check if IP is passed from a proxy
        $ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
    } else {
        $ip = $_SERVER['REMOTE_ADDR'];
    }
    return $ip;
}
It depends on what level of abuse you're expecting.
Non-technically:
Are rewards transferable, and are they tangible or not? Can I just create a bunch of accounts, then use ALL of them to refer a bunch of other accounts, and reap rewards on my fake accounts and send them to my main? If I create 20 accounts, and use each to refer once, do I wind up with 20x the rewards (spread across my fake accounts)?
I can create fake accounts and log in from different places, easily.
Options: make it harder to claim the reward. If the user just has to create an account, it's trivial. If they have to log in and then do X, Y, and Z, it's harder to do, and you'll see fewer fakes.
Technically:
First off: you're relying on headers for IPs (X_FORWARDED_FOR, etc.) which can be spoofed, fairly easily. So if you're trying to limit it to one-per-IP, this is one flaw.
Second: while you appear to be sanitizing the username, you do not appear to be sanitizing the IP before using it in a query. If you're going to do manual sanitization, do it consistently, or you have gaping holes. In this case, the IP string can be spoofed - I don't know what PHP will do with a bogus string, but if it doesn't barf on it, you're asking for attacks (see the sketch after these points).
Thirdly: I can come from an array of sites. If I hard reset my DSL, I get a new IP most of the time. I can log in from work. I can log in from my webserver box. All have unique IPs. I can find proxies which may or may not actually set those fields.
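On that second point, here's a rough sketch of consistent handling: validate the IP with filter_var() and bind it as a parameter instead of interpolating it into the query. Note this swaps in mysqli prepared statements rather than the old mysql_* functions, and $mysqli plus the table and column names are placeholders, not your actual $database object:

<?php
// Sketch only: validate the IP before it touches SQL, and bind it as a
// parameter so nothing is interpolated into the query string.
// $mysqli, the table name, and the columns are hypothetical placeholders.
$userip = getRealIP();
if (filter_var($userip, FILTER_VALIDATE_IP) === false) {
    $userip = $_SERVER['REMOTE_ADDR'];   // fall back to the TCP-level address
}

$stmt = $mysqli->prepare("SELECT uname FROM users WHERE ipad = ?");
$stmt->bind_param("s", $userip);
$stmt->execute();
$stmt->store_result();

if ($stmt->num_rows === 0) {
    $_SESSION['referer'] = $username;    // no existing account from this IP
}
$stmt->close();
?>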
You can look at other forms of identification. Simplest: cookies. Far more complex: browser fingerprinting along the lines of https://panopticlick.eff.org/index.php?action=log&js=yes
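For the cookie idea, a minimal sketch would be to tag each browser with a random token and tie referrals to it. The cookie name, lifetime, and storage here are arbitrary choices of mine:

<?php
// Sketch: tag the browser with a random token so several "new" accounts from
// the same browser can be linked even if the IP changes.
// Requires PHP 7+ for random_bytes(); use another CSPRNG on older versions.
if (empty($_COOKIE['visitor_token'])) {
    $token = bin2hex(random_bytes(16));
    setcookie('visitor_token', $token, time() + 60 * 60 * 24 * 365, '/');
} else {
    $token = $_COOKIE['visitor_token'];
}
// Store $token alongside the referral and refuse to pay out twice for the same token.
?>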
I am building my first website. It is an online real estate agency. Users can create a profile and then insert an ad and upload pictures.
I was told that I should detect multiple login attempts to protect against brute-force attacks. Well, with the following code I detect the IPs:
if (isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
    $ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
} else {
    $ip = $_SERVER['REMOTE_ADDR'];
}
The system counts failed login attempts within a certain time window and keeps a ban list in a DB.
It works great... at least when I test it myself!
But since I was told "beware of attacks through false IPs", I get the impression the protection system mentioned above may be ineffective.
There are:
1) software available to attackers that includes a proxy which can hide their real IP
2) proxies on the web that can also hide real IPs.
What's the difference between 1) and 2)?
I would like to know how proxies can be used and what they make possible in terms of illicit practices.
Can somebody change their IP at will?
Can somebody in China or in Russia 'simulate' a Western Europe or US IP?
Can I do more than what I've done to detect any suspicious activity?
Thanks a lot.
Anyone can change their apparent IP with a proxy, a VPN, and so on.
I use this function to get the real IP address, if it's valid:
function getrealip() {
    if (getenv('HTTP_CLIENT_IP') && long2ip(ip2long(getenv('HTTP_CLIENT_IP'))) == getenv('HTTP_CLIENT_IP') && validip(getenv('HTTP_CLIENT_IP')))
        return getenv('HTTP_CLIENT_IP');
    if (getenv('HTTP_X_FORWARDED_FOR') && long2ip(ip2long(getenv('HTTP_X_FORWARDED_FOR'))) == getenv('HTTP_X_FORWARDED_FOR') && validip(getenv('HTTP_X_FORWARDED_FOR')))
        return getenv('HTTP_X_FORWARDED_FOR');
    if (getenv('HTTP_X_FORWARDED') && long2ip(ip2long(getenv('HTTP_X_FORWARDED'))) == getenv('HTTP_X_FORWARDED') && validip(getenv('HTTP_X_FORWARDED')))
        return getenv('HTTP_X_FORWARDED');
    if (getenv('HTTP_FORWARDED_FOR') && long2ip(ip2long(getenv('HTTP_FORWARDED_FOR'))) == getenv('HTTP_FORWARDED_FOR') && validip(getenv('HTTP_FORWARDED_FOR')))
        return getenv('HTTP_FORWARDED_FOR');
    if (getenv('HTTP_FORWARDED') && long2ip(ip2long(getenv('HTTP_FORWARDED'))) == getenv('HTTP_FORWARDED') && validip(getenv('HTTP_FORWARDED')))
        return getenv('HTTP_FORWARDED');

    $ip = htmlspecialchars($_SERVER['REMOTE_ADDR']);
    /* Added support for IPv6 connections, otherwise ip returns null */
    if (strpos($ip, '::') === 0) {
        $ip = substr($ip, strrpos($ip, ':') + 1);
    }
    return long2ip(ip2long($ip));
}
More info on the X-Forwarded-For header.
A proxy is a server that can mask your IP. It sends your request as if it were its own and then relays back the response it gets.
Can somebody change their IP at will?
No, they can't just change their IP to whatever they like, but they can mask it.
Can somebody in China or in Russia 'simulate' a Western Europe or US IP?
Yes
Can I do more than what I've done to detect any suspicious activity?
If you detect that some user name is getting too many wrong-password attempts in a brute-force pattern, you could slow the attacker down with the sleep function. That way you won't cut off users who are behind a proxy without bad intent, and you will still slow down the brute-force attempts.
if ($wrongAttempts > 5) {
    sleep(3);   // sleep() takes seconds; a few seconds per attempt is enough
}
if ($password == $_GET['pass'])
{
    // ...
}
You could also start including captcha images to raise security or block the account for some time.
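As a rough sketch of the "block the account for some time" idea, something like the following could work. The $pdo connection, the failed_logins table, and the thresholds are all hypothetical:

<?php
// Sketch: refuse logins for a username that has failed 5+ times in the
// last 15 minutes. $pdo, the table, and the thresholds are made up.
$stmt = $pdo->prepare(
    "SELECT COUNT(*) FROM failed_logins
     WHERE username = ? AND attempted_at > NOW() - INTERVAL 15 MINUTE");
$stmt->execute(array($username));
if ((int) $stmt->fetchColumn() >= 5) {
    exit('Too many failed logins. Please try again in a few minutes.');
}

// ...verify the password; if it is wrong, record the attempt:
$pdo->prepare("INSERT INTO failed_logins (username, attempted_at) VALUES (?, NOW())")
    ->execute(array($username));
?>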
As Dagon says, IP address is a pretty weak way of identifying users, and hackers will almost certainly not use their own IP address, but rather a stolen machine, or a botnet; on the other hand, many corporate users may appear to all come from the same IP address, and you could easily end up blocking every user from that building/company if someone forgets their password.
The first defense against a brute force attack is to have a strong password policy; commonly, this is assumed to be at least 7 characters, with at least one number and one punctuation mark. This often annoys users, and makes them hate your site.
The next defense - if you think you're really at risk - is CAPTCHA; this makes users hate you even more.
The bottom line is: if you are building your first website, I'd look at an off-the-shelf framework, rather than inventing everything yourself. Consider PEAR::Auth.
I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal looking user agents with random IPs but they're flipping through pages too fast to be human. They also don't appear to be requesting any images. I can't seem to find any pattern and my suspicion is it's a fleet of Windows Zombies.
The clients had issues in the past with SPAM attacks--even had to point MX at Postini to get the 6.7 GB/day of junk to stop server-side.
I want to set up a BOT trap in a directory disallowed by robots.txt... I've just never attempted anything like this before, so I'm hoping someone out there has creative ideas for trapping BOTs!
EDIT: I already have plenty of ideas for catching one... it's what to do with it once it lands in the trap.
You can set up a PHP script whose URL is explicitly forbidden by robots.txt. In that script, you can pull the source IP of the suspected bot hitting you (via $_SERVER['REMOTE_ADDR']), and then add that IP to a database blacklist table.
Then, in your main app, you can check the source IP, do a lookup for that IP in your blacklist table, and if you find it, throw a 403 page instead. (Perhaps with a message like, "We've detected abuse coming from your IP, if you feel this is in error, contact us at ...")
On the upside, you get automatic blacklisting of bad bots. On the downside, it's not terribly efficient, and it can be dangerous. (One person innocently checking that page out of curiosity can result in the ban of a large swath of users.)
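Roughly, the trap page and the check could look like this. The $pdo connection and the bot_blacklist table are placeholders, and the INSERT assumes MySQL with a UNIQUE key on the ip column:

<?php
// trap.php -- a URL that robots.txt disallows; anything requesting it is
// assumed to be ignoring robots.txt. ($pdo and the table are placeholders.)
$pdo->prepare("INSERT IGNORE INTO bot_blacklist (ip) VALUES (?)")
    ->execute(array($_SERVER['REMOTE_ADDR']));
?>

<?php
// At the top of the main app: send blacklisted IPs a 403 instead of the page.
$stmt = $pdo->prepare("SELECT 1 FROM bot_blacklist WHERE ip = ?");
$stmt->execute(array($_SERVER['REMOTE_ADDR']));
if ($stmt->fetchColumn()) {
    header('HTTP/1.0 403 Forbidden');
    exit("We've detected abuse coming from your IP. If you feel this is in error, contact us.");
}
?>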
Edit: Alternatively (or additionally, I suppose) you can fairly simply add a GeoIP check to your app, and reject hits based on country of origin.
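For the GeoIP idea, if the PECL geoip extension is installed, a minimal check might look like this (the list of blocked country codes is purely illustrative):

<?php
// Sketch: reject requests by country of origin. Requires the PECL geoip extension.
$country = @geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);
$blocked = array('XX', 'YY');   // ISO country codes you decide to reject
if ($country !== false && in_array($country, $blocked, true)) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}
?>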
What you can do is get another box (a kind of sacrificial lamb) that is not on the same pipe as your main host, then have it serve a page which redirects to itself (but with a randomized page name in the URL). This could get the bot stuck in an infinite loop, tying up CPU and bandwidth on the sacrificial lamb but not on your main box.
I tend to think this is a problem better solved with network security more so than coding, but I see the logic in your approach/question.
There are a number of questions and discussions about this on server fault which may be worthy of investigating.
https://serverfault.com/search?q=block+bots
Well I must say, I'm kinda disappointed - I was hoping for some creative ideas. I did find an ideal solution here: http://www.kloth.net/internet/bottrap.php
<html>
<head><title> </title></head>
<body>
<p>There is nothing here to see. So what are you doing here ?</p>
<p>Go home.</p>
<?php
/* whitelist: end processing and exit */
if (preg_match("/10\.22\.33\.44/", $_SERVER['REMOTE_ADDR'])) { exit; }
if (preg_match("/Super Tool/", $_SERVER['HTTP_USER_AGENT'])) { exit; }
/* end of whitelist */

$badbot = 0;
/* scan the blacklist.dat file for addresses of SPAM robots
   to prevent filling it up with duplicates */
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
while ($line = fgets($fp, 255)) {
    $u = explode(" ", $line);
    $u0 = $u[0];
    if (preg_match("/$u0/", $_SERVER['REMOTE_ADDR'])) { $badbot++; }
}
fclose($fp);

if ($badbot == 0) { /* we just saw a new bad bot not yet listed! */
    /* send a mail to the hostmaster */
    $tmestamp = time();
    $datum = date("Y-m-d (D) H:i:s", $tmestamp);
    $from = "badbot-watch@domain.tld";
    $to = "hostmaster@domain.tld";
    $subject = "domain-tld alert: bad robot";
    $msg = "A bad robot hit {$_SERVER['REQUEST_URI']} $datum \n";
    $msg .= "address is {$_SERVER['REMOTE_ADDR']}, agent is {$_SERVER['HTTP_USER_AGENT']}\n";
    mail($to, $subject, $msg, "From: $from");
    /* append bad bot address data to blacklist log file: */
    $fp = fopen($filename, 'a+');
    fwrite($fp, "{$_SERVER['REMOTE_ADDR']} - - [$datum] \"{$_SERVER['REQUEST_METHOD']} {$_SERVER['REQUEST_URI']} {$_SERVER['SERVER_PROTOCOL']}\" {$_SERVER['HTTP_REFERER']} {$_SERVER['HTTP_USER_AGENT']}\n");
    fclose($fp);
}
?>
</body>
</html>
Then, to protect pages, put <?php include($_SERVER['DOCUMENT_ROOT'] . "/blacklist.php"); ?> on the first line of every page. blacklist.php contains:
<?php
$badbot = 0;
/* look for the IP address in the blacklist file */
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
while ($line = fgets($fp, 255)) {
    $u = explode(" ", $line);
    $u0 = $u[0];
    if (preg_match("/$u0/", $_SERVER['REMOTE_ADDR'])) { $badbot++; }
}
fclose($fp);
if ($badbot > 0) { /* this is a bad bot, reject it */
    sleep(12);
    print ("<html><head>\n");
    print ("<title>Site unavailable, sorry</title>\n");
    print ("</head><body>\n");
    print ("<center><h1>Welcome ...</h1></center>\n");
    print ("<p><center>Unfortunately, due to abuse, this site is temporarily not available ...</center></p>\n");
    print ("<p><center>If you feel this is in error, send a mail to the hostmaster at this site,<br>
    if you are an anti-social ill-behaving SPAM-bot, then just go away.</center></p>\n");
    print ("</body></html>\n");
    exit;
}
?>
I plan to take Scott Chamberlain's advice, and to be safe I plan to implement a CAPTCHA on the script. If a user answers correctly, it'll just die or redirect back to the site root. Just for fun I'm throwing the trap in a directory named /admin/, and of course adding Disallow: /admin/ to robots.txt.
EDIT: In addition, I am redirecting any bot that ignores the rules to this page: http://www.seastory.us/bot_this.htm
You could first take a look at where the IPs are coming from. My guess is that they are all coming from one country, like China or Nigeria, in which case you could set up something in .htaccess to disallow all IPs from those countries. As for creating a trap for bots, I haven't the slightest idea.
I'm using the following snippet to redirect an array of IP addresses. I was wondering how I would go about adding an entire range/block of IP addresses to my disallowed array...
<?php // Let's redirect certain IP addresses to a "Page Not Found"
$disallowed = array("76.105.99.106");
$ip = $_SERVER['REMOTE_ADDR'];
if(in_array($ip, $disallowed)) {
header("Location: http://google.com");
exit;
}
?>
I tried using "76.105.99.*", "76.105.99", "76.105.99.0-76.105.99.255" without any luck.
I need to use PHP rather than mod_rewrite and .htaccess for other reasons.
Here's an example of how you could check a particular network/mask combination:
$network=ip2long("76.105.99.0");
$mask=ip2long("255.255.255.0");
$remote=ip2long($_SERVER['REMOTE_ADDR']);
if (($remote & $mask)==$network)
{
header("Location: http://example.com");
exit;
}
This is better than using a string-based match, as it also lets you test masks that don't fall on octet boundaries, e.g. a /20 block of IPs.
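For example, to test a /20 only the mask changes. 76.105.96.0/20 is just an illustrative block that happens to contain 76.105.99.*:

$network = ip2long("76.105.96.0");
$mask    = ip2long("255.255.240.0");   // a /20 mask
$remote  = ip2long($_SERVER['REMOTE_ADDR']);
if (($remote & $mask) == $network)
{
    header("Location: http://example.com");
    exit;
}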
Try the substr function:
$ip = '76.105.99.';
if (substr($_SERVER['REMOTE_ADDR'], 0, strlen($ip)) === $ip) {
// deny access
}
You can approach the problem in a different way.
If you want to ban 76.105.99.* you could do:
if (strpos($_SERVER['REMOTE_ADDR'], "76.105.99.") === 0)   // === 0 so only addresses that start with the prefix match
{
    header('Location: http://google.com');
    exit;
}
Who exactly are you interested in blocking? You can use PHP or apache to block (or allow) a bunch of specific IP addresses.
If you are interested in blocking people from an entire country for example, then there are tools that give you the IP addresses you need to block. Unfortunately, it's not as simple as just specifying a range.
Check out http://www.blockacountry.com/ which generates a bunch of ip addresses you can stick in your .htaccess to block whole countries.
What you need to do is to have a test to see if a particular address lives inside a particular address range as defined by CIDR
So for instance, you need to be able to say
is 192.168.1.5
inside
192.168.1.0/24
That function is easy to write, assuming you have some basic tools to do CIDR work.
Assuming you are on a 32-bit system, this class should do the CIDR work: http://snipplr.com/view/15557/cidr-class-for-ipv4/
Pay attention to the IPisWithinCIDR function
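If you'd rather not pull in the whole class, here's a rough IPv4-only sketch of such a check (the helper name is mine, not from the linked class):

// Minimal IPv4-only sketch: is $ip inside the CIDR block $cidr?
function ipInCidr($ip, $cidr) {
    list($subnet, $bits) = explode('/', $cidr);
    $mask = -1 << (32 - (int) $bits);          // e.g. /24 -> 0xFFFFFF00
    return (ip2long($ip) & $mask) === (ip2long($subnet) & $mask);
}

var_dump(ipInCidr('192.168.1.5', '192.168.1.0/24'));   // bool(true)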
It would be better to do this in Apache (or whatever other server you use).
I believe that you'll need to create a for loop to add each IP address (within the range) to your array.
For example, in PHP:
for ($i = 0; $i <= 255; $i++) {
    $disallowed[] = "76.105.99." . $i;
}
$blocked_ip_range_array = array('109.237.108.0', '109.238.0.0');
for ($i = 0; $i < count($blocked_ip_range_array); $i++) {
    $network = ip2long($blocked_ip_range_array[$i]);
    $blipr = explode(".", $blocked_ip_range_array[$i]);
    if ($blipr[2] == '0') {
        $mask = ip2long("255.255.0.0");
    } else {
        $mask = ip2long("255.255.255.0");
    }
    $remote = ip2long($_SERVER['REMOTE_ADDR']);
    if (($remote & $mask) == $network) {
        header("Location: http://xurcun.info");
        exit;
    }
}
Below is a URL showing something rather similar to what Mr. Dixon and Ameer are discussing:
http://www.blackdog.ie/blog/blocking-ip-ranges-with-php/
Hope this helps.
Respectfully,
Wil
This may be a stupid question, but is it possible to capture what a user typed into a Google search box, so that this can then be used to generate a dynamic page on the landing page on my Web site?
For example, let's say someone searches Google for "hot dog", and my site comes up as one of the search result links. If the user clicks the link that directs them to my Web site, is it possible for me to somehow know or capture the "hot dog" text from the Google search box, so that I can call a script that searches my local database for content related to hot dogs, and then display that? It seems totally impossible to me, but I don't really know. Thanks.
I'd do it like this
$referringPage = parse_url( $_SERVER['HTTP_REFERER'] );
if ( stristr( $referringPage['host'], 'google.' ) )
{
parse_str( $referringPage['query'], $queryVars );
echo $queryVars['q']; // This is the search term used
}
This is an old question and the answer has changed since the original question was asked and answered. As of October 2011 Google is encrypting this referral information for anyone who is logged into a Google account: http://googleblog.blogspot.com/2011/10/making-search-more-secure.html
For users not logged into Google, the search keywords are still found in the referral URL and the answers above still apply. However, for authenticated Google users, there is no way to for a website to see their search keywords.
However, by creating dedicated landing pages it might still be possible to make an intelligent guess. (Visitors to the "Dignified charcoal sketches of Jabba the Hutt" page are probably...well, insane.)
Yes, it is possible. See the HTTP Referer header. The Referer header will contain the URL of the Google search results page.
When a user clicks a link on a Google search results page, the browser will make a request to your site with this kind of HTTP header:
Referer: http://www.google.fi/search?hl=en&q=http+header+referer&btnG=Google-search&meta=&aq=f&oq=
Just parse the URL from the request header; the search term the user entered will be in the q parameter. The search term used in the example above is "http header referer".
The same kind of approach usually works for other search engines too; they just have a different kind of URL in the Referer header.
This answer shows how to implement this in PHP.
The browser may forge the Referer header, or the header might be missing altogether, so do not make any serious decisions based on it.
This is an old question, but I found out that Google no longer gives out the query term, because it redirects every user to HTTPS by default, which does not pass along the q parameter. Unless someone manually enters the Google URL with http (http://google.com) and then searches, there is currently no way to get the q parameter.
Yes, it comes in the url:
http://www.google.com/search?hl=es&q=hot+dog&lr=&aq=f&oq=
Here is an example. Google sends many visitors to your site; if you want to get the keywords they used to come to your site - maybe to impress them by displaying the keywords back on the page, or just to store them in a database - here's the PHP code I use:
// take the referer
$thereferer = strtolower($_SERVER['HTTP_REFERER']);
// see if it comes from google
if (strpos($thereferer,"google")) {
// delete all before q=
$a = substr($thereferer, strpos($thereferer,"q="));
// delete q=
$a = substr($a,2);
// delete all FROM the next & onwards
if (strpos($a,"&")) {
$a = substr($a, 0,strpos($a,"&"));
}
// we have the results.
$mygooglekeyword = urldecode($a);
}
and we can use <?= $mygooglekeyword ?> when we want to output the keywords.
You can grab the referring URL and grab the search term from the query string. The search will be in the query as "q=searchTerm" where searchTerm is the text you want.
Same thing, but with some error handling:
<?php
if (@$_SERVER['HTTP_REFERER']) {
    $referringPage = parse_url($_SERVER['HTTP_REFERER']);
    if (stristr($referringPage['host'], 'google.')) {
        parse_str($referringPage['query'], $queryVars);
        $google = $queryVars['q'];
        $google = str_replace("+", " ", $google);
    } else {
        $google = false;
    }
} else {
    $google = false;
}

if ($google) {
    echo "You searched for " . $google . " at Google then came here!";
} else {
    echo "You didn't come here from Google";
}
?>
Sorry, a little more:
This adds support for Bing and Yahoo:
<?php
if (@$_SERVER['HTTP_REFERER']) {
    $referringPage = parse_url($_SERVER['HTTP_REFERER']);
    if (stristr($referringPage['host'], 'google.')
        || stristr($referringPage['host'], 'bing.')
        || stristr($referringPage['host'], 'yahoo.')) {
        parse_str($referringPage['query'], $queryVars);
        if (stristr($referringPage['host'], 'google.')
            || stristr($referringPage['host'], 'bing.')) {
            $search = $queryVars['q'];
        } elseif (stristr($referringPage['host'], 'yahoo.')) {
            $search = $queryVars['p'];
        } else {
            $search = false;
        }
        if ($search) {
            $search = str_replace("+", " ", $search);
        }
    } else {
        $search = false;
    }
} else {
    $search = false;
}

if ($search) {
    echo "You're in the right place for " . $search;
}
?>