php crawler detection

I'm trying to write a sitemap.php which acts differently depending on who is looking.
I want to redirect crawlers to my sitemap.xml, as that will be the most up-to-date page and will contain all the info they need, but I want my regular readers to be shown an HTML sitemap on the PHP page.
This will all be controlled from within the PHP header. I've found this code on the web which, by the looks of it, should work, but it doesn't. Can anyone help crack this for me?
function getIsCrawler($userAgent) {
    $crawlers = 'firefox|Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
        'AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|' .
        'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
    $isCrawler = (preg_match("/$crawlers/i", $userAgent) > 0);
    return $isCrawler;
}
$iscrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);
if ($isCrawler) {
    header('Location: http://www.website.com/sitemap.xml');
    exit;
} else {
    echo "not crawler!";
}
It looks pretty simple, but as you can see I've added firefox to the agent list, and sure enough I'm not being redirected.
Thanks for any help :)

You have a mistake in your code:
    $iscrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);
should be
    $isCrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT']);
PHP variable names are case-sensitive, so the if statement tests a variable that was never assigned. If you develop with notices on you'll catch these errors much more easily.
Also, you probably want to exit after the header.
Warning: Cloaking can get you in trouble with search providers. This article explains why.
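For reference, here is a minimal sketch of the corrected check. It keeps the function name from the question, but the array-based list, the null handling, and the use of preg_quote() are my own assumptions rather than part of the original code, and 'firefox' is left out because it was only there for testing.

```php
<?php
// Report notices and warnings during development so variable typos surface early.
error_reporting(E_ALL);

// Returns true when the user agent contains one of the listed crawler names.
// The list is the one from the question, minus the 'firefox' test entry.
function getIsCrawler(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false; // some clients send no User-Agent header at all
    }
    $crawlers = [
        'Google', 'msnbot', 'Rambler', 'Yahoo', 'AbachoBOT', 'accoona',
        'AcioRobot', 'ASPSeek', 'CocoCrawler', 'Dumbot', 'FAST-WebCrawler',
        'GeonaBot', 'Gigabot', 'Lycos', 'MSRBOT', 'Scooter', 'AltaVista',
        'IDBot', 'eStyle', 'Scrubby',
    ];
    // Escape each name so characters such as '-' are matched literally.
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $crawlers);
    $pattern = '/' . implode('|', $quoted) . '/i';
    return preg_match($pattern, $userAgent) === 1;
}

// Usage (note that the same variable name is assigned and tested):
// $isCrawler = getIsCrawler($_SERVER['HTTP_USER_AGENT'] ?? null);
// if ($isCrawler) {
//     header('Location: http://www.website.com/sitemap.xml');
//     exit;
// }
```

User-agent strings are trivially spoofed, so this is only a heuristic.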

http://develobert.blogspot.com/2008/11/php-robot-check.html

Related

Simple PHP code not working on mobile

This may be super basic, but I have not been able to resolve it after spending hours!
I am running PHP 7 on Ubuntu 16.1.
The PHP file is EXACTLY as follows:
<?php
header("Content-type: application/javascript");
header("HTTP/1.1 200 OK");
ExpandShortLink();
function ExpandShortLink()
{
    // get URL
    $URL_To_Expand = $_REQUEST['url'];
    // for short links, get the full links
    // get full URL
    $arr_URL_Header = get_headers($URL_To_Expand, 1);
    $strLink = $arr_URL_Header['Location'];
    //echo $URL_To_Expand;
    //print_r($arr_URL_Header);
    if ($strLink) {
        if (is_array($strLink)) {
            $Full_URL = array_pop($strLink);
        } else {
            $Full_URL = $strLink;
        }
    } else {
        $Full_URL = $URL_To_Expand;
    }
    echo $Full_URL;
}
This produces the URL I enter as the "url" parameter on desktop, but nothing on mobile!
After some reading, I found that PHP sometimes interprets everything after "//" as a comment, and that may be happening here. But then why does it happen on mobile only? Also, any suggestions on resolving this would be great!
Thanks much for your help,
You can see this live here
If you click this on desktop, you will see http:// example. com. However, on mobile it will return http:

Not sure if this qualifies as an answer, but I wanted to put a note here for anyone else who may be facing a similar problem.
I was using the PHP pasted above for an ajax call. I tried using text/plain instead of application/javascript and now it works across all browsers and all devices (as far as I could test).
Not sure why application/javascript was causing problems on mobile Chrome, but I think text/plain makes sense as I was just passing back a text string rather than JavaScript.
As I said, it is probably not a complete answer, but hopefully it helps someone in the future!
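A rough sketch of the endpoint with the text/plain change applied. The helper names pickFinalUrl and expandShortLink are hypothetical, and the case-insensitive header lookup is my own addition, not something the original script did.

```php
<?php
// Hypothetical helper: pick the final redirect target out of the array
// returned by get_headers() in associative mode, or fall back to the input.
function pickFinalUrl(array $headers, string $fallback): string
{
    // Servers differ in header capitalisation, so compare keys in lower case.
    $headers = array_change_key_case($headers, CASE_LOWER);
    $location = $headers['location'] ?? null;
    if (is_array($location)) {
        // Several redirects were followed: the last Location is the final URL.
        $location = end($location);
    }
    return (is_string($location) && $location !== '') ? $location : $fallback;
}

// Hypothetical wrapper around get_headers(); returns the input URL unchanged
// when the request fails or no redirect is reported.
function expandShortLink(string $url): string
{
    if ($url === '') {
        return '';
    }
    $headers = @get_headers($url, true);
    return is_array($headers) ? pickFinalUrl($headers, $url) : $url;
}

// Usage in the endpoint, sending plain text because the body is just a URL:
// header('Content-Type: text/plain; charset=utf-8');
// echo expandShortLink($_REQUEST['url'] ?? '');
```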

PHP / jQuery Ajax / Sessions Strange behaviour only on Chrome for Android

I have been struggling with a couple of strange issues in a free-time project I'm working on. It's my first "big" PHP / JS project, and to be honest I'm using ajax for the first time, so I might just be missing something.
Anyway, here is how it is. I'm programming a very simple invoicing system using PHP and jQuery, with the mPDF library to generate a PDF file from HTML/CSS. I'm mainly using session variables inside the template that gets sent to mPDF to generate a PDF invoice.
The issue I'm experiencing is on Chrome for Android, tested on the latest version on a OnePlus One. The session variables are not showing in the PDF itself. I think it worked once or twice, totally at random. A friend with an Android device and Google Chrome confirms the same issue.
test.php:
<?php
session_start();
error_reporting(E_ALL);
if (!isset($_SESSION['GLO_IS_LOGGED_IN'])) {
    header("Location: index.php");
    exit;
}
include('libs/mPDF/mpdf.php');
ob_start();
include('protected/templates/template.php');
$data = ob_get_clean();
$mpdf = new mPDF();
$mpdf->WriteHTML($data);
$mpdf->Output('protected/invoices/Faktura ' . date('j-m-Y-H-i-s') . '.pdf');
$mpdf->Output('Faktura ' . date('j-m-Y-H-i-s') . '.pdf', 'D');
unset($_SESSION['VAR_DESCRIPTION_ARRAY']);
unset($_SESSION['VAR_AMOUNT_ARRAY']);
unset($_SESSION['VAR_PRICE_ARRAY']);
unset($_SESSION['VAR_TO_ADDRESS']);
unset($_SESSION['VAR_INVOICE_NUMBER']);
Here is the generateInvoice.php file that you might have noticed in invoice-script.js:
<?php
session_start();
error_reporting(E_ALL);
if (!isset($_SESSION['GLO_IS_LOGGED_IN'])) {
    header("Location: index.php");
    exit;
}
if (!empty($_POST['invoice-number'])) {
    $_SESSION['VAR_INVOICE_NUMBER'] = trim($_POST['invoice-number']);
} else {
    echo('Please add invoice number');
    exit;
}
if (!empty($_POST['to-address'])) {
    $_SESSION['VAR_TO_ADDRESS'] = ($_POST['to-address']);
} else {
    echo('Internal Error');
    exit;
}
$_SESSION['VAR_DESCRIPTION_ARRAY'] = $_POST['invoice-description'];
$_SESSION['VAR_AMOUNT_ARRAY'] = $_POST['invoice-amount'];
$_SESSION['VAR_PRICE_ARRAY'] = $_POST['invoice-price'];
I don't want to make this post very long, so I'll stop posting code snippets here. Believe me that I have done everything I could to find out what's going on myself, and it feels really bad that I can't figure it out and need to ask others for help. Anyway, thanks for any feedback and help. Cheers!

'invoice-form' doesn't contain any fields - the input tags should be within the form

OK people, I think I have found a solution for this.
Commenting out all of the unset calls at the end of test.php solved the Chrome for Android issue.
I don't understand why this was happening in the first place. Why were the session variables unset BEFORE the invoice was generated? They shouldn't be, right? Or am I really missing something? I know I shouldn't ask for clarification in my own answer, but I think at this point I really need to.
Cheers and thanks to IanMcL for solving my Edge issue!

How to validate user input

I'm using a script, written in PHP and jQuery, that scrapes a static website:
<?php
if (isset($_GET['site'])) {
    $f = fopen($_GET['site'], 'r');
    $html = '';
    while (!feof($f)) {
        $html .= fread($f, 24000);
    }
    fclose($f);
    echo $html;
}
?>
The jQuery part:
$(function(){
    var site = $(input).val();
    $.get('proxy.php', { site:site }, function(data){
        $('#myDiv').append(data);
    }, 'html');
});
As you can see, the website to be scraped is taken from the value of an input. I want to give my visitors the ability to set their own website to be scraped.
The problem is that I can't figure out how to secure the PHP part. As I understand it, the input value is a big security risk because anything can be sent as the value. I have already experienced slow performance and several 'PC crashes' while working with this code. I'm not sure if the crashes are related, but they only happen when I work on the code.
Anyway, I would really like to know how to validate the value (from the input) sent to my server; only REAL URLs should be allowed. I googled for days but I can't figure it out (new at PHP).
PS: If you spot any other security risks, please let me know.

I think your main security issue is that you're using fopen to read the content of the URL. If a user wants to read a file on your system, all they have to do is send the path to that file, and if the script has enough permissions they'll be able to access the contents of your hard drive.
I would recommend using another method such as cURL, or at least validating the user input to make sure that it's a valid URL. For this, I would check out some regular expressions.
Good luck with your code!
Edit on validation
Here is a little example of what I meant by validation.
<?php
if (isset($_GET['site'])) {
    if (validURL($_GET['site'])) {
        $f = fopen($_GET['site'], 'r');
        $html = '';
        while (!feof($f)) {
            $html .= fread($f, 24000);
        }
        fclose($f);
        echo $html;
    } else {
        echo "Invalid URL, please enter a valid web url (i.e: http://www.google.com)";
    }
}
function validURL($url){
    //here goes your validation code, returns true if the url is valid
}
?>
But if you're too new to understand this, I would suggest going for simpler examples, since this is pretty basic logic.
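One possible way to fill in the validURL() placeholder is sketched below. It uses PHP's built-in filter_var() rather than a hand-written regular expression, and restricting the scheme to http and https is my own assumption about what "REAL urls" should mean here.

```php
<?php
// Returns true only for syntactically valid http:// or https:// URLs.
// Local paths and other stream wrappers (file://, php://, ...) are rejected.
function validURL($url)
{
    // $_GET values can be arrays (e.g. ?site[]=x), so check the type first.
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

This only checks the shape of the URL. It does not stop a visitor from pointing the proxy at hosts on your internal network, and the fetched HTML is still echoed unfiltered, so treat it as a first step rather than complete protection.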

It's so sad that you could not find anything on the internet about this topic. It's a common thing. Please refer to the links below; they may be of help.
PHP validate input alphanumeric plus a few symbols
http://phpmaster.com/input-validation-using-filter-functions/

Facebook GraphAPI via another webpage php?

First: please forgive me - I'm a bit of a novice at some of this...
I have a working test site which runs the PHP Facebook SDK to perform some simple Graph API requests successfully: namely, read the feed of a group which the user is a member of, process it, and display it back on a webpage.
This all works fine. The problem I have encountered is when trying to perform the same request via a PHP cURL POST to another webpage (on the same domain). It seems that the SDK does not carry the expected session over to another page when a POST request is made (see "AUTH ERROR2" in the code). This works fine when the following file is included via a "require_once", but not when a cURL request is made.
I would much rather do a "curl", as I'm finding that when a "require_once" is done from a page at a different directory level, I get PHP errors about the page not being found - which is expected.
I may just be tackling this problem all wrong... there may be a simpler way to make sure that when files are included, their correct directory level remains intact, or there may be a way to send over the currently authorised Facebook SDK session via a cURL POST. I have tried all of this to no avail, and I would really appreciate any help or advice.
Thank you for your time.
//readGroupPosts.inc.php
function readGroupPosts($postVars)
{
    //$access_token = $postVars[0];
    // ^-- I'm presuming I need this? I have been experimenting appending it to
    //     the graphAPI request to no success...
    $groupID = $postVars[1];
    $limit = $postVars[2];
    require_once("authFb.inc.php"); //link to the facebookSDK & other stuff
    if ($user) {
        try {
            $groupFeed = $facebook->api("/$groupID/feed?limit=$limit"); //limit=0 returns all;
            $groupFeed = $groupFeed['data']; //removes first tier of array for simpler access
            $postArray;
            for($i=0; $i<count($groupFeed); $i++)
            {
                $postArray[$i] = array($groupFeed[$i]['from']['name'], $groupFeed[$i]['message'], $groupFeed[$i]['updated_time'], count($groupFeed[$i]['likes']['data']));
            }
            return $postArray;
        } catch (FacebookApiException $e) {
            error_log($e);
            $user = null;
            return "AUTH ERROR1"; //for testing..
        }
    }
    else
    {
        return "AUTH ERROR2"; //no user is authenticated i.e. $user == null..
    }
}

> I would much rather do a "curl" as Im finding when a "require_once" is done from a page in a different directory level, Im getting php errors of the page not being found - which is expected.
> I may just be tackling this problem all wrong...
Definitively.
Using cURL as a “workaround” just because you’re not able to find your way around your server’s file system is an outrageous idea. Don’t do it. Stop even thinking about it. Now.
> there may be a simpler way to make sure when files are includes, their correct directly level remains intact
Yes – for example, to use absolute paths instead of relative ones. Prefixing the path with the value of $_SERVER['DOCUMENT_ROOT'] for example – that way, once you’ve given the path correctly in respect to this “base path”, it does not matter where you’re requiring the file from, because an absolute path is the same no matter from where you look at it.
(And since this is not a Facebook-related problem at all, but just concerns basics of PHP and server-side programming, I’ll edit the tags.)
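A small sketch of the absolute-path approach described above. The helper name absoluteIncludePath and the includes/ directory are hypothetical, chosen only for illustration; authFb.inc.php is the file name from the question.

```php
<?php
// Hypothetical helper: join a base directory and a site-relative path so the
// result is the same no matter which script performs the include.
function absoluteIncludePath(string $baseDir, string $relativePath): string
{
    return rtrim($baseDir, '/\\') . '/' . ltrim($relativePath, '/\\');
}

// Usage from any directory level (the includes/ location is an assumption):
// require_once absoluteIncludePath($_SERVER['DOCUMENT_ROOT'], 'includes/authFb.inc.php');
//
// For a file that sits next to the current script, __DIR__ works as well:
// require_once __DIR__ . '/authFb.inc.php';
```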

Mediawiki custom tag Stops page parsing

I created a few MediaWiki custom tags, using the guide found here
http://www.mediawiki.org/wiki/Manual:Tag_extensions
I will post my code below. The problem is that when the parser hits the first custom tag in the page, it calls it and prints the response, but nothing that comes after it in the wikitext is output. It seems it just stops parsing the page.
Any ideas?
if ( defined( 'MW_SUPPORTS_PARSERFIRSTCALLINIT' ) ) {
    $wgHooks['ParserFirstCallInit'][] = 'tagregister';
} else { // Otherwise do things the old fashioned way
    $wgExtensionFunctions[] = 'tagregister';
}
function tagregister(){
    global $wgParser;
    $wgParser->setHook('tag1','tag1func');
    $wgParser->setHook('tag2','tag2func');
    return true;
}
function tag1func($input,$params)
{
    return "It called me";
}
function tag2func($input,$params)
{
    return "It called me -- 2";
}
Update: @George Mauer -- I have seen that as well, but this does not stop the page from rendering, just the MediaWiki engine from parsing the rest of the wikitext. It's as if hitting the custom function signals to MediaWiki that processing is done. I am in the process of diving into the rabbit hole but was hoping someone else has seen this behavior.

Never used MediaWiki, but in my experience that sort of problem is indicative of a PHP error that occurred but was suppressed, either with the @ operator or because PHP error output to screen is turned off.
I hate to resort to this debugging method but when absolutely and utterly frustrated in PHP I will just start putting echo statements every few lines (always with a marker so I remember to remove them later), to figure out exactly where the error is coming from. Eventually, you'll get to the bottom of the rabbit hole and figure out exactly what the problematic line of code is.
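Before scattering echo statements, it may be worth making sure PHP errors are actually shown. A minimal sketch follows; the function name enableVerboseErrors is hypothetical, and on a development wiki the two calls could go near the top of LocalSettings.php.

```php
<?php
// Hypothetical helper: report every error level and print errors to the page.
// Intended for development only; do not leave this on in production.
function enableVerboseErrors(): void
{
    error_reporting(E_ALL);
    ini_set('display_errors', '1');
}
```

Note that this does not affect errors silenced with the @ operator inside third-party code.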

Silly me.
I had to close the tags.
Instead of <tag1> I had to use <tag1 /> or <tag1></tag1>.
Now it all works!
