strange .htaccess rule malfunction - php

I have the following data structure in PHP (shortened for brevity) in the array $job:
(
[jobid] => 33541166
[country] => South Africa
[subcounty] => Somerset West
[position] => Administrator (R7 500 p.m.)
)
I have the following .htaccess rule:
RewriteRule job/(.*)/(.*)/(.*)$ /info.php?jobid=$1&position=$2&city=$3
This rule intermittently works, but the reason why eludes me. Take the examples below:
https://site.co.za/job/33541166/administrator_r_7_500_pm/somerset_west
https://site.co.za/job/33541166/administrator_r_7_500_px/somerset_west
The job structure in the PHP array is the same for both. The URL is purely cosmetic, as the only criterion I use to retrieve the job from the database is the job ID (e.g., 33541168).
As you can see, the first URL has "pm" and the second has "px"; otherwise they are the same. The first link DOES NOT display the job and instead redirects to the homepage. The second one DOES display the job correctly, even though "px" does not appear in the position string above.
Then there is a completely different job:
https://site.co.za/job/33541168/receptionist_for_general_practitioner/durban_north
And it works 100% with no anomalies.
The code that constructs the URL on the page is below ($jobid is substituted for $position and $city whenever these fields are not present, because of historical data issues):
if (!empty($v['position'])) {
$position = preg_replace("/\p{P}/", '', trim($v['position']));
$position = strtolower(str_replace(' ', '_', $position));
} else {
$position = $jobid;
}
if (!empty($v['subcounty'])) {
$city = preg_replace("/\p{P}/", '', trim($v['subcounty']));
$city = strtolower(str_replace(' ', '_', $city));
} else {
if (!empty($v['country'])) {
$city = preg_replace("/\p{P}/", '', trim($v['country']));
$city = strtolower(str_replace(' ', '_', $city));
} else {
$city = $jobid;
}
}
And the link is structured as follows:
<a style="color: #ffffff !important;"
href="<?php echo $fullurl; ?>/job/<?php echo $job['jobid']; ?>/<?php echo $position; ?>/<?php echo $city; ?>">
<?php echo $job['position']; ?>
</a>
Notes, in case needed:
`$job` is the PHP array containing the entire job particulars, as shown in shortened fashion above,
`$position` and `$city` are the modified strings for use in the URL.
I initially thought it might be a duplicate ID (even though the DB has autoincrement on the jobid column) or a duplicate description, but that does not appear to be the case. I also considered that the . and the ( and ) in the administrator job position might be causing havoc, but I believe they shouldn't because of the regex I used in my PHP code. To confirm that, I removed these characters in a test, and it still does not work consistently.
Every job I have checked shows the URL in the correct format in the browser, so I do not think the PHP code above is fundamentally flawed, though it may not be optimal.
I do not have access to Apache server logs at this time.
Any ideas will be much appreciated.

The issue was conflicting .htaccess rules. The job rule came before some obscure rules added by a previous developer:
RewriteRule .* / [R=301,L]
RewriteRule m/$ / [R=301,L]
RewriteRule m$ / [R=301,L]
RewriteRule m/(.*)$ / [R=301,L]
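Not sure what the intention of those rules was in the first place. The unanchored pattern m/(.*)$ matches any URL that contains "m/", which is why the "pm/" URL was redirected to the homepage while the "px/" one was not. I had not noticed these rules because they sat between a bunch of deny from statements near the bottom of the 10 KB file, which suggests they were possibly an attempt to mitigate some sort of attack.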
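To see why only the "pm" URL was affected, the pattern from the last of those rules can be tried against the three request paths from the question. This is only a sketch: it assumes the rule was evaluated against the original request path, and the helper name hitsRedirectRule is made up for illustration.

```php
<?php
// Hypothetical helper: does the unanchored pattern from
// "RewriteRule m/(.*)$ / [R=301,L]" match this request path?
function hitsRedirectRule(string $path): bool
{
    // '#' is the delimiter so the '/' inside the pattern needs no escaping.
    return preg_match('#m/(.*)$#', $path) === 1;
}

// The three request paths from the question (assumed to be what the rule sees).
$paths = [
    'job/33541166/administrator_r_7_500_pm/somerset_west',
    'job/33541166/administrator_r_7_500_px/somerset_west',
    'job/33541168/receptionist_for_general_practitioner/durban_north',
];

foreach ($paths as $path) {
    echo $path, ' => ', hitsRedirectRule($path) ? 'redirected' : 'not redirected', PHP_EOL;
}
```

Only the first path contains "m/" (from "pm/"), so it is the only one the pattern matches.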

Related

Issue with & in a string submitted with $_GET

I'm building an "away" page for my website: when a user posts a link to another website, each visitor clicking that link is first redirected to away.php, which shows a notice that I am not responsible for the content of the linked website.
The code in away.php to fetch the incoming browser URI is:
$goto = $_GET['to'];
So far it works; however, there's a logical issue with dynamic URIs. For example,
www.mydomain.com/away.php?to=http://example.com
works, but dynamic URIs like
www.mydomain.com/away.php?to=http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0
don't work, since the linked URL contains an &, which ends the $_GET['to'] string too early.
The $goto variable contains only the part until the first &:
echo $_GET['to'];
===> "http://www.youtube.com/watch?feature=fvwp"
I understand why, but I'm looking for a solution, since I haven't found one on the internet yet.
Try using urlencode:
$link = urlencode("http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0") ;
echo $link;
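The function percent-encodes characters that have a special meaning in URLs (such as &, ? and =) so they can be carried safely inside a parameter value.
The result looks like this and can be appended to a GET parameter:
http%3A%2F%2Fwww.youtube.com%2Fwatch%3Ffeature%3Dfvwp%26v%3Dj1p0_R8ZLB0
To get the special characters back (for example to output the link) there is the function urldecode. Note that PHP already decodes the values in $_GET, so you do not need to call urldecode on them.
The function htmlentities may also be useful when printing the link into HTML.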
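As a small illustration of the round trip, the sketch below encodes the YouTube link from the question and then decodes the resulting query string with parse_str(), which parses a string as if it were the query string of a URL. The variable names are only for illustration.

```php
<?php
// Target link containing its own query string with an '&'.
$target = 'http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0';

// Encode the target before placing it in the "to" parameter.
$query = 'to=' . urlencode($target);

// parse_str() parses the string as if it were a URL query string,
// so the percent-encoded value comes back decoded.
parse_str($query, $params);

echo $query, PHP_EOL;        // the encoded form that goes into the link
echo $params['to'], PHP_EOL; // the original target, '&' included
```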
You can test with this:
$link = urlencode("http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0") ;
$redirect = "{$_SERVER['PHP_SELF']}?to={$link}" ;
if (!isset($_GET['to'])){
header("Location: $redirect") ;
} else {
echo $_GET['to'];
}
EDIT:
OK, I have a solution for your particular situation.
It works only if the to parameter is the last one in the query string.
if (preg_match("/to=(.+)/", $redirect, $parts)){ //We got a parameter TO
echo $parts[1]; //Get everything after TO
}
So, $parts[1] will be your link.
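Applied to an unencoded link, the same idea could look like the sketch below. It assumes the raw query string is read from $_SERVER['QUERY_STRING'] in away.php; the helper name extractTarget is hypothetical, and the approach still only holds when to is the last parameter.

```php
<?php
// Hypothetical helper: return everything after "to=" in a raw query string,
// or null when the parameter is absent. Only valid when "to" comes last.
function extractTarget(string $rawQuery): ?string
{
    // (?:^|&) makes sure we match the "to" parameter itself, not e.g. "goto=".
    if (preg_match('/(?:^|&)to=(.+)$/', $rawQuery, $parts) === 1) {
        return $parts[1];
    }
    return null;
}

// In away.php this would be called with $_SERVER['QUERY_STRING'] (assumption).
echo extractTarget('to=http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0'), PHP_EOL;
```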

Reduce link (URL) size

Is it possible to reduce the size of a link (in text form) by PHP or JS?
E.g. I might have links like these:
http://www.example.com/index.html <- Redirects to the root
http://www.example.com/folder1/page.html?start=true <- Redirects to page.html
http://www.example.com/folder1/page.html?start=false <- Redirects to page.html?start=false
The purpose is to find out whether the link can be shortened and still point to the same location. In these examples the first two links can be reduced, because the first points to the root, and the second has parameters that can be omitted.
The third link is then the case, where the parameters can't be omitted, meaning that it can't be reduced further than to remove the http://.
So the above links would be reduced like this:
Before: http://www.example.com/index.html
After: www.example.com
Before: http://www.example.com/folder1/page.html?start=true
After: www.example.com/folder1/page.html
Before: http://www.example.com/folder1/page.html?start=false
After: www.example.com/folder1/page.html?start=false
Is this possible by PHP or JS?
Note:
www.example.com is not a domain I own or have access to besides through the URL. The links are potentially unknown, and I'm looking for something like an automatic link shortener that can work by getting the URL and nothing else.
Actually I was thinking of something like a linkchecker that could check if the link works before and after the automatic trim, and if it doesn't then the check will be done again at a less trimmed version of the link. But that seemed like overkill...
Since you want to do this automatically, and you don't know how the parameters change the behaviour, you will have to do this by trial and error: try to remove parts from a URL, and see if the server responds with a different page.
In the simplest case this could work something like this:
<?php
$originalUrl = "http://stackoverflow.com/questions/14135342/reduce-link-url-size";
$originalContent = file_get_contents($originalUrl);
$trimmedUrl = $originalUrl;
while($trimmedUrl) {
$trialUrl = dirname($trimmedUrl);
$trialContent = file_get_contents($trialUrl);
if ($trialContent == $originalContent) {
$trimmedUrl = $trialUrl;
} else {
break;
}
}
echo "Shortest equivalent URL: " . $trimmedUrl;
// output: Shortest equivalent URL: http://stackoverflow.com/questions/14135342
?>
For your usage scenario, your code would be a bit more complicated, as you would have to test for each parameter in turn to see if it is necessary. For a starting point, see the parse_url() and parse_str() functions.
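As a starting point for the per-parameter test, the sketch below uses parse_url() and parse_str() to build the candidate URLs obtained by dropping one query parameter at a time. It makes no requests; each candidate would still have to be fetched and compared as in the loop above. The function name candidateUrls is hypothetical, and ports, credentials and fragments are ignored for brevity.

```php
<?php
// Hypothetical helper: list the URLs obtained by dropping one query
// parameter at a time. Ports, user info and fragments are ignored here.
function candidateUrls(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return [];
    }

    $base = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '/');

    // Split the query string into name => value pairs.
    parse_str($parts['query'] ?? '', $params);

    $candidates = [];
    foreach (array_keys($params) as $name) {
        $remaining = $params;
        unset($remaining[$name]);
        $query = http_build_query($remaining);
        $candidates[] = $query === '' ? $base : $base . '?' . $query;
    }
    return $candidates;
}

print_r(candidateUrls('http://www.example.com/folder1/page.html?start=true&x=1'));
```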
A word of caution: this code is very slow, as it performs many requests for every URL you want to shorten. Also, it will likely fail to shorten many URLs because the server might include things like timestamps in the response, so two fetches of the same page differ. This makes the problem very hard, and that's the reason why companies like Google have many engineers that think about stuff like this :).
Yea, that's possible:
JS:
var url = 'http://www.example.com/folder1/page.html?start=true';
url = url.replace('http://','').replace('?start=true','').replace('/index.html','');
php:
$url = 'http://www.example.com/folder1/page.html?start=true';
$url = str_replace(array('http://', '?start=true', '/index.html'), "", $url);
(Each item in the array() will be replaced with "")
Here is a JS for you.
function trimURL(url, trimToRoot, trimParam){
var myRegexp = /(http:\/\/|https:\/\/)(.*)/g;
var match = myRegexp.exec(url);
url = match[2];
//alert(url); // www.google.com
if(trimParam===true){
url = url.split('?')[0];
}
if(trimToRoot === true){
url = url.split('/')[0];
}
return url
}
alert(trimURL('https://www.google.com/one/two.php?f=1'));
alert(trimURL('https://www.google.com/one/two.php?f=1', true));
alert(trimURL('https://www.google.com/one/two.php?f=1', false, true));
Fiddle: http://jsfiddle.net/5aRpQ/

PHP + HTACCESS + mod_rewrite + different length url segments

Right, Good afternoon all (well, it is afternoon here in the UK!)
I am in the process of writing a (PHP/MySQL) site that uses friendly URLs.
I have set up my htaccess (mod_rewrite enabled) and have a basic script that can handle "/" and then everything else is handled after "?" in the same script. I.e. I am able to work out whether a user has tried to put example.com/about, example.com/about/the-team or example.com/join/?hash=abc123etc etc.
My question is how do I handle variable length URLs such as (examples):
example.com/about (node only)
example.com/about/the-team (node + seo-page-title)
example.com/projects (node only)
example.com/projects/project-x (node + sub-node)
example.com/projects/project-x/specification (node + sub-node + seo-friendly-title)
example.com/news/article/new-article (node + sub-node + seo-friendly-title)
example.com/join/?hash=abc123etc&this=that (node + query pair)
BUT, the "nodes" (first argument), "sub-nodes" (second argument) or "seo-friendly page titles" may be missing or unknown (database controlled), so I cannot put the processing in .htaccess specifically. Remember: I have already (I think!) got a working htaccess that forwards everything correctly to my PHP processing script. Everything not found will be forwarded to a CMS "404".
I think my client will have a maximum of THREE arguments (and then everything else will be after "?").
Has anyone tried this, or does anyone have a starting point for a database structure or for working out which of the above possibilities was requested?
I have tried in a previous project but have always had to resort to writing the CMS to force the user to have (whilst adding pages) at least a node OR a node + subnode + seo-friendly-title which I would like to get away from...
I don't want a script that will put too much strain on database searches by trying to find every single possibility of the arguments until a match is found... or is this the only way if I want to implement what I'm asking?
Many Thanks!
You can cater for different numbers of matches like this:
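RewriteRule ^/?content\.php$ - [L]
RewriteRule ^/?([^/]+)/?$ /content.php?part1=$1 [L,QSA,NC]
RewriteRule ^/?([^/]+)/([^/]+)/?$ /content.php?part1=$1&part2=$2 [L,QSA,NC]
RewriteRule ^/?([^/]+)/([^/]+)/([^/]+)/?$ /content.php?part1=$1&part2=$2&part3=$3 [L,QSA,NC]
Here [^/]+ matches one or more characters other than '/'. Because each term is enclosed in () brackets, the captured text can be used in the rewritten URL as $1, $2 and $3. The patterns are anchored with ^ and $ so that each rule matches exactly one, two or three segments, and the first rule leaves requests for content.php itself untouched so they are not rewritten again.
QSA appends any existing query-string parameters to the rewritten URL.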
How you match up the parts with things that you know about is up to you but I imagine that something like this would be sensible:
$knownNodes = array(
'about',
'projects',
'news',
'join',
);
$knownSubNodes = array(
'the-team',
'project-x',
'the-team'
);
$node = FALSE;
$subNode = FALSE;
$seoLinks = array();
if(isset($part1) == TRUE){
if(in_array($part1, $knownNodes) == TRUE){
$node = $part1;
}
else{
$seoLinks[] = $part1;
}
}
if(isset($part2) == TRUE){
if(in_array($part2, $knownSubNodes) == TRUE){
$subNode = $part2;
}
else{
$seoLinks[] = $part2;
}
}
if(isset($part3) == TRUE){
$seoLinks[] = $part3;
}
if(isset($part4) == TRUE){
$seoLinks[] = $part4;
}
Obviously the list of nodes and subNodes could be pulled from a DB rather than being hard-coded. The exact details of how you match up the known things with the free text is really up to you.
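If the rewrite instead forwards the whole request to one script, the segments can also be split in PHP. The following is a minimal sketch under that assumption; the function name splitPath and the limit of three segments (taken from the question) are illustrative only.

```php
<?php
// Hypothetical helper: split a request URI into at most $maxSegments
// non-empty path segments, ignoring the query string.
function splitPath(string $requestUri, int $maxSegments = 3): array
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }

    // Drop the empty strings produced by leading, trailing or doubled slashes.
    $segments = array_values(array_filter(
        explode('/', $path),
        static function ($segment) {
            return $segment !== '';
        }
    ));

    return array_slice($segments, 0, $maxSegments);
}

print_r(splitPath('/projects/project-x/specification'));
print_r(splitPath('/join/?hash=abc123etc&this=that'));
```

The resulting array can then be matched against the known nodes and sub-nodes in the same way as $part1, $part2 and $part3 above.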
In which structure does the PHP script get the information?
If the structure for 'example.com/news/article/new-article' is
$_GET[a]=news
$_GET[b]=article
$_GET[c]=new-article
you could check whether $_GET[c] is empty; if it is not, the real site is $_GET[b], and so on...
Another way is for $_GET[a] to return something like 'news_article_new-article'.
In this case you have a unique name for the DB search.
I hope I understood you correctly.

Including pages based on URL in PHP

Is this a terrible way to include pages based on the URL? (using mod_rewrite through index.php)
if($url === '/index.php/'.$user['username']) {
include('app/user/page.inc.php');
}
// Upload *
else if($url === '/index.php/'.$user['username'].'/Upload') {
include('app/user/upload.inc.php');
}
// Another page *
else if($url === '/index.php/AnotherPage') {
include('page/another_page.inc.php');
}
I'm using $_GET['variables'] through mod_rewrite for
^(.+)$ index.php?user=$1 [NC]
and a couple other base pages. But, those are just for the first argument on base files. The above if / else examples are also case sensitive which is really not good.
What are your thoughts on this?
How would I mod_rewrite these 2nd / 3rd etc. argument off of the index.php?
Would that be totally SEO incompatible with the aforementioned example?
I don't fully understand your question, per se.
What do you mean by "these 2nd / 3rd etc. argument"?
You can do the same steps in a more readable/maintainable manner as follows:
$urls = array(
'/index.php/'.$user['username'] => 'app/user/page.inc.php',
'/index.php/'.$user['username'].'/Upload' => 'app/user/upload.inc.php',
'/index.php/AnotherPage' => 'page/another_page.inc.php'
);
$url = $urls[$url];
If the '.inc.php' is consistent, you can remove it from each item of the array and add it at the end:
$url = $urls[$url].'.inc.php';
Along the same lines, you can write the array in reverse (switch the keys and values in the above array) and use preg_grep to search it. This lets you search the URL case-insensitively and allows wildcards. Use a delimiter other than / for the pattern, because the URL itself contains slashes:
$url = key(preg_grep("#$url#i", $urls));
Note that this is far less efficient, though for wildcard matches it is the best way.
(And for most pages, the inefficiency is livable.)
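If only case-insensitivity is needed and not wildcards, a plain array lookup with lower-cased keys avoids the regex entirely. This is a sketch, not the method from the answer above; the function name resolveInclude is hypothetical, and the route table reuses a path from the question.

```php
<?php
// Hypothetical helper: look up the include file for a URL, ignoring case
// and a trailing slash. Returns null when no route matches.
function resolveInclude(string $url, array $routes): ?string
{
    // Lower-case the route keys once so the lookup is case-insensitive.
    $normalized = array_change_key_case($routes, CASE_LOWER);
    $key = strtolower(rtrim($url, '/'));

    return $normalized[$key] ?? null;
}

$routes = [
    '/index.php/AnotherPage' => 'page/another_page.inc.php',
];

var_dump(resolveInclude('/INDEX.php/anotherpage/', $routes));
var_dump(resolveInclude('/index.php/Missing', $routes));
```

Returning null for unknown URLs lets the caller decide whether to show a 404 page instead of including an undefined file.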

How do I create dynamic URLs?

I have a social network that allows users to write blogs and ask questions. I want to create dynamic URLs, via PHP, that put the title of the blog or question at the end of the URL.
Example:
www.blah.com/the_title_here
Looking for the cleanest most efficient way to accomplish this.
You would usually store the URL-friendly "slug" in the database row, and then have a PHP script that finds posts matching that slug.
For example, if you have a script called index.php that took a parameter called slug...
<?php
if (isset($_GET['slug'])) {
$sql = "SELECT * FROM `your_table` WHERE slug = ? LIMIT 1";
$smt = $pdo->prepare($sql);
$smt->execute(array($_GET['slug']));
$row = $smt->fetchObject();
// do something with the matching record here...
}
else {
// display home page
}
...You could then re-write requests using .htaccess:
RewriteEngine on
RewriteRule ^(.+)$ index.php?slug=$1
Using the database to do this would be sad :(
There may be many cases where you do not need to look up the database, yet with this method you will, e.g. www.blah.com/signup (no point here). And DB connections eat up resources, serious resources...
RewriteEngine on
RewriteRule ^(.+)$ index.php?slug=$1
as shown by Martin gets you the path or slug.
Most frameworks use the filesystem to achieve cleaner URLs: one folder holds all the files, and the dispatch logic is similar in theory to
<?php
$default = "home";
// clean() is assumed to be your own function that makes sure the slug is clean, i.e. does not contain ../ or similar
if(isset($_GET['slug'])) $slug = clean($_GET['slug']);
if(!isset($slug)) $slug = $default;
$files = explode('/',$slug);// or any other function according to your choice
$file = "./commands/".$files[0].".php";
if(file_exists($file))
require_once($file);
else
require_once("./commands/".$default.".php");
You can make this as simple or as complicated as you want. You can even use the database to determine the default case like Martin did, but that should be in the $default and not the first logic you use...
Advantages of doing it this way:
It is way faster than querying the database.
You can scale this a lot vertically, e.g. site.com/users/piyushmishra and site.com/forums/mykickassforum, or even on deeper levels like site.com/category/category-name/post-name/comments/page-3.
You can set up libraries and packages more easily by scaling horizontally (add more directories to check, and each directory can have one or more modules set up), e.g. ./ACLcommands/users.php, ./XMLRPC/ping.php.
There is a lot of open source software that does this; you can look at WordPress.org or MediaWiki.org to see how. You'll need a combination of .htaccess or Apache configuration settings to add mod_rewrite rules to them.
Next, you'll want a controller file as Martin Bean wrote to look up the post... but make sure you escape/sanitize/validate input properly, otherwise you can be vulnerable to SQL injection or XSS if you have JavaScript on your site.
So it's better to use the id method and only use the slug for pretty-URL purposes. WordPress.org software also suggests that going only by the slug makes it slow once you have a lot of posts. So, you can use a combination of www.blah.com/slug-phrase-goes-before-the-numeric_id and write a RegExp to match the trailing id: (\d+)$
Alternatively, put the id first and look up only by $id:
"www.blah.com/$id/".preg_replace('/[^a-z-]+/','',preg_replace('/[ ,;.]+/','-',strtolower($title)))
From the title
"How do I create dynamic URLs?"
it creates the URL
www.blah.com/15/how-do-i-create-dynamic-urls
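The same id-plus-slug idea can be wrapped in two small functions. This is a sketch rather than the exact expression above: the names slugify and postUrl are hypothetical, and unlike the one-liner it keeps digits in the slug and trims stray hyphens.

```php
<?php
// Hypothetical helper: turn a title into a lowercase, hyphen-separated slug.
function slugify(string $title): string
{
    $slug = strtolower($title);
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove hyphens left over at either end (e.g. from a trailing "?").
    return trim($slug, '-');
}

// Hypothetical helper: the id carries the lookup, the slug is cosmetic.
function postUrl(int $id, string $title): string
{
    return 'www.blah.com/' . $id . '/' . slugify($title);
}

echo postUrl(15, 'How do I create dynamic URLs?'), PHP_EOL;
// prints: www.blah.com/15/how-do-i-create-dynamic-urls
```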
