Extract common path from URI and Web Root


I have a self-built MVC framework with a router that maps URLs in the common example.com/controller/action form. I'm running into issues when my application is deployed within a subdirectory, such as
example.com/my_app/controller/action/?var=value
My router now thinks my_app is the name of the controller and controller is the method.
My current solution is to manually ask for any subdirectory name in a config at install time. I'd like to do this automatically. See my question below and let me know if I'm going about solving this the wrong way and asking the wrong question.
My question:
If I have two paths, how do I find the piece that is common to the end of one and the beginning of the other, and remove it from the beginning of the other?
A = /var/www/my_app/pub
B = /my_app/pub/cntrl/actn
What's your quickest one-liner to remove /my_app/pub from B and be left with /cntrl/actn?
Basically, I'm looking for a Perl-esque way of getting the "common denominator" of the two strings.
Thanks for any input.

my @physical_parts = split qr{/}, $physical_path;
my @logical_parts = split qr{/}, $logical_path;
my @physical_suffix;
my @logical_prefix;
my $found = 0;
while (@physical_parts && @logical_parts) {
    unshift @physical_suffix, pop(@physical_parts);
    push @logical_prefix, shift(@logical_parts);
    if (@physical_suffix ~~ @logical_prefix) {
        $found = 1;
        last;
    }
}
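
Since the question is tagged PHP, the same idea can be sketched in PHP. This is only an illustration under stated assumptions: the function name stripCommonPath is hypothetical, both paths are assumed to use "/" as the separator, and the overlap is assumed to fall on whole path segments. Unlike the Perl loop above, it tries the longest possible overlap first.

```php
<?php
// Hypothetical helper: removes from the start of $logicalPath the segments
// that also form the end of $physicalPath.
function stripCommonPath(string $physicalPath, string $logicalPath): string
{
    // Trim the outer slashes so that no empty segments get in the way
    $physical = explode('/', trim($physicalPath, '/'));
    $logical  = explode('/', trim($logicalPath, '/'));

    // Try the longest possible overlap first, then shorter ones
    for ($n = min(count($physical), count($logical)); $n > 0; $n--) {
        if (array_slice($physical, -$n) === array_slice($logical, 0, $n)) {
            return '/' . implode('/', array_slice($logical, $n));
        }
    }

    // No overlap found: return the logical path unchanged
    return $logicalPath;
}
```

With the paths from the question, stripCommonPath('/var/www/my_app/pub', '/my_app/pub/cntrl/actn') gives /cntrl/actn.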

The way I would solve this is by adding the following logic to the front controller (the file to which your server sends all requests for nonexistent files, usually index.php).
$frontControllerPath = $_SERVER['SCRIPT_NAME'];
$frontControllerPathLength = strlen($frontControllerPath);
$frontControllerFileName = basename($frontControllerPath);
$frontControllerFileNameLength = strlen($frontControllerFileName);
$subdirectoryLength = $frontControllerPathLength - $frontControllerFileNameLength;
$url = substr($_SERVER['REQUEST_URI'], $subdirectoryLength - 1);
Here's a codepad demo.
What does this do? If the front controller is located (relative to the www root) in /subdir/myapp/, then its $_SERVER['SCRIPT_NAME'] would be /subdir/myapp/index.php. The actual request URI is contained in $_SERVER['REQUEST_URI']. Let's say, for example, that it is /subdir/myapp/controller/action?extras=stuff. To remove the subdirectory prefix we need to find its length. That is found by subtracting the length of the script's file name (retrieved with basename()) from the length of the script's path relative to the www root. The extra - 1 in the substr() call keeps the slash in front of the controller name.
File that receives request: /subdir/myapp/index.php   (length = 23)
Filename:                   index.php               - (length =  9)
-------------------------------------------------------------------
                                                      14 chars to remove

/subdir/myapp/controller/action?extras=stuff
             ^
             Cut off everything before here
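
The calculation above can be wrapped in a small function so that it can be tried without a web server. This is a minimal sketch: the function name stripBasePath is hypothetical, and the two arguments stand in for $_SERVER['SCRIPT_NAME'] and $_SERVER['REQUEST_URI'].

```php
<?php
// Hypothetical helper: strips the front controller's directory from a request URI.
function stripBasePath(string $scriptName, string $requestUri): string
{
    // Length of the directory part including its trailing slash,
    // e.g. "/subdir/myapp/" has length 14
    $baseLength = strlen($scriptName) - strlen(basename($scriptName));

    // Start one character earlier to keep the slash in front of the route
    return substr($requestUri, $baseLength - 1);
}
```

For the example above, stripBasePath('/subdir/myapp/index.php', '/subdir/myapp/controller/action?extras=stuff') gives /controller/action?extras=stuff. When the application sits in the web root (/index.php), the request URI comes back unchanged.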

Related

PHP - load a file depending on the name of the last directory in URL

How can I load a PHP file depending on the name of the last directory in the URL?
Example 1 URL: http://www.example.com/en/auto/
Example 2 URL: http://www.example.com/en/auto/mercedes/
If the last directory in the URL is auto (as in the first example), then load the file auto.php.
If the last directory in the URL is mercedes (as in the second example), then load the file mercedes.php.

Apart from the above solution, you can try the following:
$url = trim('http://www.example.com/en/auto', '/');
$loadUrl=substr($url, strrpos($url, '/')+1);
header("Location:".$loadUrl.".php");

As indicated by Michael Berkowski, .htaccess is the right way to do it. However, you could achieve a (very) hacky version with something similar to this (note: I have no idea how you intend to "load" the file and from where, so I am using include()):
<?php
// Remove trailing forward slash
$str = rtrim('http://www.example.com/test/is/best/',"/");
// Simple match characters from back
$exp = preg_match("!/([a-zA-Z0-9]{1,})$!",$str,$match);
// $match[1] gives you: best
if(isset($match[1])) {
// $redirect gives you: best.php
if(is_file($redirect = $match[1].".php")) {
// Include the best.php
include($redirect);
}
}
?>
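
A variation on the same idea is to take the last path segment with parse_url() and basename() and only accept names from a fixed list, so that the URL cannot be used to include arbitrary files. This is a sketch under assumptions: the function name pageForUrl and the list of allowed pages are hypothetical.

```php
<?php
// Hypothetical helper: maps the last directory of a URL to a PHP file name,
// but only if that directory is in the list of allowed pages.
function pageForUrl(string $url, array $allowedPages): ?string
{
    // Extract the path component, e.g. "/en/auto/"
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // Last segment of the path without the trailing slash, e.g. "auto"
    $last = basename(rtrim($path, '/'));

    return in_array($last, $allowedPages, true) ? $last . '.php' : null;
}
```

With the allowed pages 'auto' and 'mercedes', the two URLs from the question map to auto.php and mercedes.php, and any other URL maps to null. The result could then be passed to include().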

Changing base URL on part of a page only

I have a page on my site that fetches and displays news items from the database of another (legacy) site on the same server. Some of the items contain relative links that should be fixed so that they direct to the external site instead of causing 404 errors on the main site.
I first considered using the <base> tag on the fetched news items, but this changes the base URL of the whole page, breaking the relative links in the main navigation - and it feels pretty hackish too.
I'm currently thinking of creating a regex to find the relative URLs (they all start with /index.php?) and prepending them with the desired base URL. Are there any more elegant solutions to this? The site is built on Symfony 2 and uses jQuery.

Here is how I would tackle the problem:
function prepend_url ($prefix, $path) {
// Prepend $prefix to $path if $path is not a full URL
$parts = parse_url($path);
return empty($parts['scheme']) ? rtrim($prefix, '/').'/'.ltrim($path, '/') : $path;
}
// The URL scheme and domain name of the other site
$otherDomain = 'http://othersite.tld';
// Create a DOM object
$dom = new DOMDocument('1.0');
$dom->loadHTML($inHtml); // $inHtml is an HTML string obtained from the database
// Create an XPath object
$xpath = new DOMXPath($dom);
// Find candidate nodes
$nodesToInspect = $xpath->query('//*[@src or @href]');
// Loop candidate nodes and update attributes
foreach ($nodesToInspect as $node) {
if ($node->hasAttribute('src')) {
$node->setAttribute('src', prepend_url($otherDomain, $node->getAttribute('src')));
}
if ($node->hasAttribute('href')) {
$node->setAttribute('href', prepend_url($otherDomain, $node->getAttribute('href')));
}
}
// Find all nodes to export
$nodesToExport = $xpath->query('/html/body/*');
// Iterate and stringify them
$outHtml = '';
foreach ($nodesToExport as $node) {
$outHtml .= $node->C14N();
}
// $outHtml now contains the "fixed" HTML as a string
See it working

You can override the base tag by putting http:// in front of the link. That is, give a full URL, not a relative URL.

Well, not actually a solution, but mostly a tip...
You could start playing around with ExceptionController.
There, for example, you could look for a 404 error and check the query string appended to the request:
$request = $this->container->get('request');
....
if (404 === $exception->getStatusCode()) {
$query = $request->server->get('QUERY_STRING');
//...handle your logic
}
The other solution would be to define a special route, with its own controller, that catches requests to index.php and does the redirects and so on. Just define index.php in the requirements of the route and move this route to the top of your routing.
Not the clearest answer ever, but at least I hope I gave you a direction...
Cheers ;)

PHP + HTACCESS + mod_rewrite + different length url segments

Right, Good afternoon all (well, it is afternoon here in the UK!)
I am in the process of writing a (PHP/MySQL) site that uses friendly URLs.
I have set up my htaccess (mod_rewrite enabled) and have a basic script that can handle "/" and then everything else is handled after "?" in the same script. I.e. I am able to work out whether a user has tried to put example.com/about, example.com/about/the-team or example.com/join/?hash=abc123etc etc.
My question is how do I handle variable length URLs such as (examples):
example.com/about (node only)
example.com/about/the-team (node + seo-page-title)
example.com/projects (node only)
example.com/projects/project-x (node + sub-node)
example.com/projects/project-x/specification (node + sub-node + seo-friendly-title)
example.com/news/article/new-article (node + sub-node + seo-friendly-title)
example.com/join/?hash=abc123etc&this=that (node + query pair)
BUT, the "nodes" (first argument), "sub-nodes" (second argument) or "seo-friendly page titles" may be missing or unknown (database controlled), so I cannot put the processing in .htaccess specifically. Remember: I have already (I think!) got a working htaccess that forwards everything correctly to my PHP processing script. Everything not found will be forwarded to a CMS "404".
I think my client will have a maximum of THREE arguments (and then everything else will be after "?").
Has anyone tried this, or does anyone have a place to start with a database structure or a way to work out which of the above possibilities has been requested?
I have tried this in a previous project, but have always had to resort to writing the CMS so that it forces the user (whilst adding pages) to have at least a node OR a node + subnode + seo-friendly-title, which I would like to get away from...
I don't want a script that will put too much strain on the database by searching for every single possible combination of the arguments until a match is found... or is this the only way if I want to implement what I'm asking?
Many Thanks!

You can cater for different numbers of matches like this:
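RewriteRule ^/([^/]+)/?$ /content.php?part1=$1 [L,QSA,NC]
RewriteRule ^/([^/]+)/([^/]+)/?$ /content.php?part1=$1&part2=$2 [L,QSA,NC]
RewriteRule ^/([^/]+)/([^/]+)/([^/]+)/?$ /content.php?part1=$1&part2=$2&part3=$3 [L,QSA,NC]
Here [^/]+ matches one or more characters other than '/'. Because that term is enclosed in () brackets, the matched text can be used in the rewritten URL as $1, $2 and $3. The trailing /?$ anchors each rule so that it only matches URLs with exactly that number of segments. (In a .htaccess file the leading slash is not part of the matched path, so the patterns would start with ^ rather than ^/.)
QSA appends the original query string to the rewritten URL, so everything after "?" is passed through as well.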
How you match up the parts with things that you know about is up to you but I imagine that something like this would be sensible:
$knownNodes = array(
'about',
'projects',
'news',
'join',
);
$knownSubNodes = array(
'the-team',
'project-x',
'the-team'
);
$node = FALSE;
$subNode = FALSE;
$seoLinks = array();
if(isset($part1) == TRUE){
if(in_array($part1, $knownNodes) == TRUE){
$node = $part1;
}
else{
$seoLinks[] = $part1;
}
}
if(isset($part2) == TRUE){
if(in_array($part2, $knownSubNodes) == TRUE){
$subNode = $part2;
}
else{
$seoLinks[] = $part2;
}
}
if(isset($part3) == TRUE){
$seoLinks[] = $part3;
}
if(isset($part4) == TRUE){
$seoLinks[] = $part4;
}
Obviously the list of nodes and subNodes could be pulled from a DB rather than being hard-coded. The exact details of how you match up the known things with the free text is really up to you.
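
The matching described above can also be done on the path as a whole, without depending on how many partN parameters the rewrite rules produce. The following is a minimal sketch, not a definitive implementation: the function name classifySegments and the shape of the returned array are assumptions made for illustration, and the known nodes and sub-nodes are passed in as plain arrays (they could equally come from a database).

```php
<?php
// Hypothetical helper: splits a path into node, sub-node and SEO title parts.
function classifySegments(string $path, array $knownNodes, array $knownSubNodes): array
{
    // Split on "/" and drop empty segments caused by leading or trailing slashes
    $parts = array_values(array_filter(explode('/', $path), 'strlen'));

    $result = ['node' => null, 'subNode' => null, 'seo' => []];
    foreach ($parts as $i => $part) {
        if ($i === 0 && in_array($part, $knownNodes, true)) {
            $result['node'] = $part;       // first segment is a known node
        } elseif ($i === 1 && in_array($part, $knownSubNodes, true)) {
            $result['subNode'] = $part;    // second segment is a known sub-node
        } else {
            $result['seo'][] = $part;      // anything else is treated as free text
        }
    }
    return $result;
}
```

For /projects/project-x/specification, with 'projects' as a known node and 'project-x' as a known sub-node, this returns node = projects, subNode = project-x and seo = ['specification']. Because the lists are looked up once and compared in memory, no per-segment database query is needed.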

In which structure does the PHP script get the information?
If the structure for 'example.com/news/article/new-article' is
$_GET['a'] = news
$_GET['b'] = article
$_GET['c'] = new-article
you could check whether $_GET['c'] is empty; if not, the real site is $_GET['b'], and so on...
Another way is for $_GET['a'] to return something like 'news_article_new-article'.
In this case you have a unique name for the DB search.
I hope I understood you right.

Including pages based on URL in PHP

Is this a terrible way to include pages based on the URL? (using mod_rewrite through index.php)
if($url === '/index.php/'.$user['username']) {
include('app/user/page.inc.php');
}
// Upload *
else if($url === '/index.php/'.$user['username'].'/Upload') {
include('app/user/upload.inc.php');
}
// Another page *
else if($url === '/index.php/AnotherPage') {
include('page/another_page.inc.php');
}
I'm using $_GET['variables'] through mod_rewrite for
^(.+)$ index.php?user=$1 [NC]
and a couple other base pages. But, those are just for the first argument on base files. The above if / else examples are also case sensitive which is really not good.
What are your thoughts on this?
How would I mod_rewrite these 2nd / 3rd etc. argument off of the index.php?
Would that be totally SEO incompatible with the aforementioned example?

I don't fully understand your question, per se.
What do you mean by "these 2nd / 3rd etc. argument"?
You can do the same steps in a more readable/maintainable manner as follows:
$urls = array(
'/index.php/'.$user['username'] => 'app/user/page.inc.php',
'/index.php/'.$user['username'].'/Upload' => 'app/user/upload.inc.php',
'/index.php/AnotherPage' => 'page/another_page.inc.php'
);
$url = $urls[$url];
If the '.inc.php' suffix is consistent, you can remove it from each item of the array and append it at the end:
$url = $urls[$url].'.inc.php';
Along the same lines, you can write the array in reverse (switch the keys and values in the above array) and use preg_grep to search it. This allows you to search for the URL without being case sensitive, as well as allowing wildcards. Note that the pattern uses # as the delimiter, because the URL itself contains slashes.
$url = key(preg_grep('#' . $url . '#i', $urls));
See Here for a live interactive example.
Note that this is far less efficient, though for wildcard matches it is the best way.
(And for most pages, the inefficiency is livable.)
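
If only case-insensitive matching is needed and no wildcards, a plain loop over the original array (URL => file) avoids regular expressions altogether. This is a sketch under assumptions: the function name resolveRoute is hypothetical, and a trailing slash on the requested URL is ignored.

```php
<?php
// Hypothetical helper: looks up a URL in a route table, ignoring case.
function resolveRoute(string $url, array $routes): ?string
{
    // Normalise the requested URL: lower case, no trailing slash
    $wanted = strtolower(rtrim($url, '/'));

    foreach ($routes as $pattern => $file) {
        if (strtolower($pattern) === $wanted) {
            return $file;
        }
    }

    // No route matched
    return null;
}
```

A request for /index.php/anotherpage then resolves to the same file as /index.php/AnotherPage, and an unknown URL gives null, which the caller can turn into a 404 page.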

How do I detect subdomain and filter it?

I have a domain search function. In the search box you can enter any kind of domain name. What I am looking for is a way to filter subdomains out of the search, or else to trim the subdomain and keep only the main domain.
For example, if a user enters mail.yahoo.com, it should be converted to yahoo.com, or it can be omitted from the search.

Here's a more concise way to grab the domain and a likely subdomain from a URL.
function find_subdomain($url) {
$parts = parse_url($url);
$domain_parts = explode('.', $parts['host']);
while(count($domain_parts) > 4)
array_shift($domain_parts);
return join('.', $domain_parts);
}
Keep in mind that not everything that looks like a subdomain is really a subdomain. Some countries have their own country-specific domains that everyone uses, like .co.uk and .com.au. You can not rely on the number of dots in the URL to tell you what is and is not a subdomain. In fact, you might need the opposite approach - first remove the top-level domain, then see what's left. Unfortunately then you're left with the second-level domain problem.
Can you tell us more about what exactly you are trying to accomplish? Why are you trying to detect subdomains? You mentioned a search box. What is being searched?
Edit: I have updated the function to return up to four of the right-most parts of the domain. Given "http://one.two.three.four.five.six.com" it will return 'four.five.six.com'.
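
To illustrate the point about country-specific suffixes, here is a minimal sketch that keeps three labels instead of two when the host ends in a known two-part suffix. The function name registrableDomain and the tiny default suffix list are assumptions for illustration only; a real implementation would need a complete list, such as the Public Suffix List.

```php
<?php
// Hypothetical helper: trims subdomains from a host name.
// $multiPartSuffixes is a deliberately incomplete example list.
function registrableDomain(string $host, array $multiPartSuffixes = ['co.uk', 'com.au']): string
{
    $labels = explode('.', strtolower($host));

    // Normally keep the last two labels (e.g. yahoo.com)
    $keep = 2;

    // Keep three labels if the last two form a known suffix (e.g. bbc.co.uk)
    $lastTwo = implode('.', array_slice($labels, -2));
    if (in_array($lastTwo, $multiPartSuffixes, true)) {
        $keep = 3;
    }

    return implode('.', array_slice($labels, -$keep));
}
```

With this sketch, mail.yahoo.com becomes yahoo.com and www.bbc.co.uk becomes bbc.co.uk. Hosts with a suffix that is not in the list are still cut down to their last two labels, which is exactly the limitation described above.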

I customized a utility function that I'm using. It's close to perfect (as close as you can get without hard-coding the full list of possible domain extensions).
Here's the catch: it assumes that the main domain contains at least 4 characters. For example, for sub.mail.com it returns mail.com, but for sub.aol.com it returns sub.aol.com.
function get_main_domain($host='') {
if(empty($host))$host=$_SERVER['HTTP_HOST'];
$domain_parts = explode('.',$host);
$count=count($domain_parts);
if($count<=2)return $host;
$permit=0;
for($i=$count-1;$i>=0;$i--){
$permit++;
if(strlen($domain_parts[$i])>3)break;
}
while(count($domain_parts) >$permit)array_shift($domain_parts);
return join('.', $domain_parts);
}

Well, that doesn't work for every domain if you forget to list it in the array...
Here is my solution, but I need to compress it to a few lines. Is that possible?
function subdomain($domainb) {
    // Strip the scheme, if any, and keep only the host
    $bits = explode('/', $domainb);
    if ($bits[0] == 'http:' || $bits[0] == 'https:') {
        $domainb = $bits[2];
    } else {
        $domainb = $bits[0];
    }
    $bits = explode('.', $domainb);
    // Drop everything except the four right-most labels
    $idz = count($bits) - 4;
    for ($idy = 0; $idy < $idz; $idy++) {
        unset($bits[$idy]);
    }
    // Re-index the remaining labels
    $part = array_values($bits);
    // A second label longer than four characters is taken as the main domain
    if (strlen($part[1]) > 4) {
        unset($part[0]);
    }
    return implode('.', $part);
}
