How to handle ?_escaped_fragment_= for AJAX crawlers?

How to handle ?_escaped_fragment_= for AJAX crawlers? - php

I'm struggling to make AJAX-based website SEO-friendly. As recommended in tutorials on the web, I've added "pretty" href attributes to links: контакт and, in a div where content is loaded with AJAX by default, a PHP script for crawlers:
$files = glob('./pages/*.php');
foreach ($files as &$file) {
$file = substr($file, 8, -4);
}
if (isset($_GET['site'])) {
if (in_array($_GET['site'], $files)) {
include ("./pages/".$_GET['site'].".php");
}
}
I have a feeling that at the beginning I need to additionaly cut the _escaped_fragment_= part from (...)/index.php?_escaped_fragment_=site=about because otherwise the script won't be able to GET the site value from URL , am I right?
but, anyway, how do I know that the crawler transforms pretty links (those with #!) to ugly links (containing ?_escaped_fragment_=)? I've been told that it happens automatically and I don't need to provide this mapping, but Fetch as Googlebot doesn't provide me with any information about what happens to URL.

Google bot will automatically query for ?_escaped_fragment_= urls.
So from www.example.com/index.php#!site=about
Google bot will query: www.example.com/index.php?_escaped_fragment_=site=about
On PHP site you will get it as $_GET['_escaped_fragment_'] = "site=about"
If you want to get the value of the "site" you need to do something like this:
if(isset($_GET['_escaped_fragment_'])){
$escaped = explode("=", $_GET['_escaped_fragment_']);
if(isset($escaped[1]) && in_array($escaped[1], $files)){
include ("./pages/".$escaped[1].".php");
}
}
Take a look at the documentation:
https://developers.google.com/webmasters/ajax-crawling/docs/specification

Related

query string in php url which fetches values from files in directories

for security reasons we need to disable a php/mysql for a non-profit site as it has a lot of vulnerabilities. It's a small site so we want to just rebuild the site without database and bypass the vulnerability of an admin page.
The website just needs to stay alive and remain dormant. We do not need to keep updating the site in future so we're looking for a static-ish design.
Our current URL structure is such that it has query strings in the url which fetches values from the database.
e.g. artist.php?id=2
I'm looking for a easy and quick way change artist.php so instead of fetching values from a database it would just include data from a flat html file so.
artist.php?id=1 = fetch data from /artist/1.html
artist.php?id=2 = fetch data from /artist/2.html
artist.php?id=3 = fetch data from /artist/3.html
artist.php?id=4 = fetch data from /artist/4.html
artist.php?id=5 = fetch data from /artist/5.html
The reason for doing it this way is that we need to preserve the URL structure for SEO purposes. So I do not want to use the html files for the public.
What basic php code would I need to achieve this?

To do it exactly as you ask would be like this:
$id = intval($_GET['id']);
$page = file_get_contents("/artist/$id.html");
In case $id === 0 there was something else besides numbers in the query parameter. You could also have the artist information in an array:
// datafile.php
return array(
1 => "Artist 1 is this and that",
2 => "Artist 2..."
)
And then in your artist.php
$data = include('datafile.php');
if (array_key_exists($_GET['id'], $data)) {
$page = $data[$_GET['id']];
} else {
// 404
}

HTML isn't your best option, but its cousin is THE BEST for static data files.
Let me introduce you to XML! (documentation to PHP parser)
XML is similar to HTML as structure, but it's made to store data rather than webpages.
If instead your html pages are already completed and you just need to serve them, you can use the url rewriting from your webserver (if you're using Apache, see mod_rewrite)
At last, a pure PHP solution (which I don't recommend)
<?php
//protect from displaying unwanted webpages or other vulnerabilities:
//we NEVER trust user input, and we NEVER use it directly without checking it up.
$valid_ids = array(1,2,3,4,5 /*etc*/);
if(in_array($_REQUEST['id'])){
$id = $_REQUEST['id'];
} else {
echo "missing artist!"; die;
}
//read the html file
$html_page = file_get_contents("/artist/$id.html");
//display the html file
echo $html_page;

using href to two links(php and a webpage) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
While($enreg=mysql_fetch_array($res))
{
$link_d.="<font color=\"red\">clic here to download</font></td>"
}
i want to use the href so it leads to download link, also to send the id to a php file so i can get how many times the files have been downloaded !
How can we use href to multiple links !

You can't. A link can only point to one resource.
Instead, what you should do is have your PHP script redirect to the file. The link points at your PHP script with the counter, and then set a Location: header (which automatically sets a 302 status code for redirection) with the value being the URL you want to redirect to.
Also, you should really use htmlspecialchars() around any variable data you use in an HTML context, to ensure you are generating valid HTML.

Ideally you would have some checks to see if it's a human downloading (Web crawlers may trigger it - we will put no-follow in the link which will help though). You could also use a database but that gets more complicated. My preferred way would be to use Google Analytics Events. But here is a simple PHP script that might fulfill your needs without the complexity of the other solutions.
First modify your links to have a tracker script and to urlencode
$link_d.= '<a style="color:red" href="tracker.php?url='.urlencode($enreg[link]).'" target="_blank">click here to download</a>';
}
Then create a script that will record downloads (tracker.php)
<?php
// keep stats in a file - you can change the path to have it be below server root
// or just use a secret name - must be writeable by server
$statsfile = 'stats.txt';
// only do something if there is a url
if(isset($_GET['url'])) {
//decode it
$url = urldecode($_GET['url']);
// Do whatever check you want here to see if it's a valud link - you can add a regex for a URL for example
if(strpos($url, 'http') != -1) {
// load current data into an array
$lines = file($statsfile);
// parse array into something useable by php
$stats = array();
foreach($lines as $line) {
$bits = explode('|', $line);
$stats[(string)$bits[0]] = (int)$bits[1];
}
// see if there is an entry already
if(!isset($stats[$url])) {
// no so let's add it with a count of 1
$stats[$url] = 1;
} else {
// yes let's increment
$stats[$url]++;
}
// get a blank string to write to file
$data = null;
// get our array into some human readabke format
foreach($stats as $url => $count) {
$data .= $url.'|'.$count."\n";
}
// and write to file
file_put_contents($statsfile, $data);
}
// now redirect to file
header('Location: ' . $url);
}

You can't.
Anchor are meant to lead to one ressource.
What you want to do is tipically addressed by using an intermediate script that count the hit and redirect to the ressource.
eg.
Click here to download
redirect.php
// Increment for example, a database :
// UPDATE downloads SET hits = (hits + 1) WHERE id=42
// Get the URI
// SELECT uri FROM downloads WHERE id=42
// Redirect to the URI
// (You may also need to set Content-type header for file downloads)
header( "Location: $uri" );
You may optimize this by passing the uri as a second parameter so that you won't need to fetch it at redirect time.
Click here to download
Another way of collecting this kind of statistics is to use some javascript tools provided by your statistics provider, like Google Analytics or Piwik, adding a listener to the click event.
It is less invasive for your base code but won't let you easily reuse collected data in your site (for example if you want to show a "top download" list).

Create a file with download script for example download.php and route all your downloads through it. Update your counter in this page and use appropriate headers for download.
eg:
url may be download.php?id=1&file=yourfile
in download.php
//get id and file
//database operation to update your count
//headers for download

Changing base URL on part of a page only

I have a page on my site that fetches and displays news items from the database of another (legacy) site on the same server. Some of the items contain relative links that should be fixed so that they direct to the external site instead of causing 404 errors on the main site.
I first considered using the <base> tag on the fetched news items, but this changes the base URL of the whole page, breaking the relative links in the main navigation - and it feels pretty hackish too.
I'm currently thinking of creating a regex to find the relative URLs (they all start with /index.php?) and prepending them with the desired base URL. Are there any more elegant solutions to this? The site is built on Symfony 2 and uses jQuery.

Here is how I would tackle the problem:
function prepend_url ($prefix, $path) {
// Prepend $prefix to $path if $path is not a full URL
$parts = parse_url($path);
return empty($parts['scheme']) ? rtrim($prefix, '/').'/'.ltrim($path, '/') : $path;
}
// The URL scheme and domain name of the other site
$otherDomain = 'http://othersite.tld';
// Create a DOM object
$dom = new DOMDocument('1.0');
$dom->loadHTML($inHtml); // $inHtml is an HTML string obtained from the database
// Create an XPath object
$xpath = new DOMXPath($dom);
// Find candidate nodes
$nodesToInspect = $xpath->query('//*[#src or #href]');
// Loop candidate nodes and update attributes
foreach ($nodesToInspect as $node) {
if ($node->hasAttribute('src')) {
$node->setAttribute('src', prepend_url($otherDomain, $node->getAttribute('src')));
}
if ($node->hasAttribute('href')) {
$node->setAttribute('href', prepend_url($otherDomain, $node->getAttribute('href')));
}
}
// Find all nodes to export
$nodesToExport = $xpath->query('/html/body/*');
// Iterate and stringify them
$outHtml = '';
foreach ($nodesToExport as $node) {
$outHtml .= $node->C14N();
}
// $outHtml now contains the "fixed" HTML as a string
See it working

You can override the base tag by putting http:\\ in front of the link. That is, give a full url, not a relative URL.

Well, not actually a solution, but mostly a tip...
You could start playing aroung with ExceptionController.
There, just for example, you could seek for 404 error and check query string appended to request:
$request = $this->container->get('request');
....
if (404 === $exception->getStatusCode()) {
$query = $request->server->get('QUERY_STRING');
//...handle your logic
}
The other solution would be to define special route with its controller for such purposes, which would catch requests to index.php and do redirects and so on. Just define index.php in requirements of route and move this route on the top of your routing.
Not a clearest answer ever, but at least I hope I gave you a direction...
Cheers ;)

Clean URLS in MVC structure using PHP

I am creating a website using the MVC structure. Below is a code I have used to use clean URLS and load the appropriate files. However it only works for the first level.
Say I wanted to visit mywebsite.com/admin it would work, however mywebsite.com/admin/dashboard would not. The problem is in the arrays, how could I get the array to load content after the 2nd level along with the second level.
Would it be best to create an array like this?
Array
- controller
- view
- dashboard
Any help here would be great. Also as a side question. What would be the best way to set up "custom" urls. So if I were to put in mywebsite.com/announcement it would check to see if its got controllers, failing that, check to see if it's got custom content (maybe a file of the same name in "customs" folder, and then if there's nothing execute the 404 page not found stuff) This isn't a priority question though, but loosely associated in how the code works so I thought it best to add.
function hook() {
$params = parse_params();
$url = $_SERVER['REQUEST_URI'];
$url = str_replace('?'.$_SERVER['QUERY_STRING'], '', $url);
$urlArray = array();
$urlArray = explode("/",$url);
var_dump($urlArray);
if (isset($urlArray[2]) & !empty($urlArray[2])) {
$route['controller'] = $urlArray[2];
} else {
$route['controller'] = 'front'; // Default Action
}
if (isset($urlArray[3]) & !empty($urlArray[3])) {
$route['view'] = $urlArray[3];
} else {
$route['view'] = 'index'; // Default Action
}
include(CONTROLLER_PATH.$route['controller'].'.php');
include(VIEW_PATH.$route['controller'].DS.$route['view'].'.php');
var_dump($route['controller']);
var_dump($route['view']);
var_dump($urlArray);
var_dump($params);
// reseting messages
$_SESSION['flash']['notice'] = '';
$_SESSION['flash']['warning'] = '';
}
// Return form array
function parse_params() {
$params = array();
if(!empty($_POST)) {
$params = array_merge($params, $_POST);
}
if(!empty($_GET)) {
$params = array_merge($params, $_GET);
}
return $params;
}

Can you clarify this: "The problem is in the arrays, how could I get the array to load content after the 2nd level along with the second level."
I don't understand how you want this thing to work. I checked your code and it works. Maybe you just need to put $urlArray[1] instead of $urlArray[2] and 2 instead of 3? First element in the array is at index 0.
Usually it's done like this:
Url format:
/controller/action/param1/param2/...
-controller- should be a class. That class has a method/function called -action-.
ex. /shoes/show/121/ --> this will load controller shoes
and execute the method/function show(121)
that will show the shoes that have the id 121 in the
database.
ex. /shoes/list/sport --> this will load controller shoes
and execute function list('sport') that will list all
shoes in the sport category.
As you can see, you only load one controller and from that controller you run only one function and that function will get the rest of the path and use it as parameters.
If you want to have multiple controllers for one URL, then the rest of the controllers will have to be loaded from the main controller. Most MVCs (like CodeIgniter) load only one controller per URL.
Second question:
Best way for pretty urls would be to save them in the db. This means you can have URLs like this:
/I-can-write-anything-here-No-need-to-add-ids-or-controller-names
Then you take this URL and search it in db and get the -controller- and -action- that you need for this URL.
But I have yet to see a popular MVC framework do this. I guess the reason is that the db will get a lot of queries for text matches and that will slow things down.
Popular MVC frameworks use:
/controller/action/param1/param2
This has the benefit that you can directly find the controller/action from the url.
The downside is that you will get urls like:
/shoes/list/sport
//when what you really want is
/shoes/sport
//or just
/sport //if the website only sells shoes
This can be fixed by redirecting /shoes/sport to /shoes/list/sport
If you make your own MVC then you should use OOP because if not, thing will get ugly quick: all actions/functions are in the same namespace.

Personally I would recommend that you use one of the many PHP frameworks that exist as that will take care of the routing for you and let you concentrate on writing your application. CakePHP is one that I've used for a while and it makes my life so much easier.

What I do:
I create a .htaccess file that redirects an url like www.example.com/url/path/or/something to www.example.com/index.php?url=url/path/or/something, so it will be pretty easy to do an explode on your $_GET['url']
Second, it's better because everything a user input, will be redirected to your index.php, so you have FULL control over EVERYTHING.
If you want I can PM you the url to my mvc (bitbucket) so you can have a look on how I do this ;)
(Sorry for the others, but I don't like to put url's to my site in public)
edit:
To be more precise to your particular question; It will solve your problem, because everything goes to index.php and you have full control over the requested url.

Checking up on a link exchange

I have made a link exchange with another site. 3 days later, the site has removed my link.
Is there a simple php script to help me control link exchanges and notify me if my link has been removed?
I need it as simple as possible, and not a whole ad. system manager.

If you know the URL of webpage where your ad(link) exists then you can use Simple HTML DOM Parser to get all links of that webpage in array and then use php in_array function to check that your link exists in that array or not. You can run this script on daily bases using crontab.
// Create DOM from URL
$html = file_get_html('http://www.example.com/');
// Find all links
$allLinks = array();
foreach($html->find('a') as $element) {
$allLinks[] = $element->href;
}
// Check your link.
$adLink = "http://www.mylink.com";
if ( in_array($adLink , $allLinks ) ) {
echo "My link exists.";
} else {
echo "My link is removed.";
}

Technically there's no way to know if someone's website has a link to yours unless you have traffic directed from their website or you look at their website.
Your best bet would be either:
A script which records every time they link to your image. This is simple enough by mixing PHP and .htaccess
.htaccess:
RewriteRule path/to/myImage.jpg path/to/myScript.php
myScript.php:
/* Record (database, file, or however) that they accessed */
header("Content-type: image/jpeg");
echo file_get_contents("path/to/myImage.jpg");
Or a script which looks at their website every X amount of minutes/hours/days and searches the returned HTML for the link to your image. The challenge here is making the script run periodically. This can be done with crontab or similar
myScript.php:
$html = file_get_contents("http://www.theirsite.com");
if(strpos($html, 'path/to/myImage.jpg') !== FALSE)
/* Happiness */
else
/* ALERT! */

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How to handle ?_escaped_fragment_= for AJAX crawlers? - php

Related

query string in php url which fetches values from files in directories

using href to two links(php and a webpage) [closed]

Changing base URL on part of a page only

Clean URLS in MVC structure using PHP

Checking up on a link exchange

Categories

Resources