Couple of questions about .htaccess and friendly urls - php

I've always been bad at Apache and used very simple solutions. I have now built a CMS, but the .htaccess is starting to become a major downside.
I will first explain how my friendly URLs work and what they look like. My language switch is URL based and always contains two characters. It looks like this: stackoverflow.com/en/. This makes switching really easy, and since it is URL based it works well in SEO terms. Also, if no language ID is set, the default language is used (stackoverflow.com/).
There are no numeric page IDs. I have unique page IDs in text: stackoverflow.com/services.html, with .html at the end for SEO and to avoid conflicts with folders and directories.
For subpages I have "$current_page" and "$parent_page" style variables: stackoverflow.com/services/translating.html, with services being the parent and translating being the current page.
Some sample code too (I nerfed it a lot, so don't think it's incomplete):
RewriteRule ^(et|en|fi)\/(.+)\/(.+)\.html index.php?language=$1&pagelink=$3&parentlink=$2 [L,NC,QSA]
RewriteRule ^(.+)\/(.+)\.html index.php?language=0&pagelink=$2&parentlink=$1 [L,NC,QSA]
RewriteRule ^(.+)\.html index.php?language=0&pagelink=$1&parentlink=0 [L,NC,QSA]
How can I make the language-switch part more dynamic?
This method, ..^(et|en|fi)\/.., means that when I set up the CMS I must manually set the list of languages. The best bet would be to set it somehow from the CMS settings, because this way there are no conflicts with folders. Is it possible to set a global Apache variable via PHP and then use it in the .htaccess file? Something like this: ..^(LANGUAGELISTS)\/..? If this isn't possible, then the next best thing would be to match 2 characters in that location and pass them as $_GET['language'].
How can I have unlimited parents dynamically?
Meaning, that the "$parent_page" is not set statically and I have unlimited children, similar to this: stackoverflow.com/services/translating/english/somesubpage.html. If that is possible, then also, how will it be used in the php, with an array?
Bounty edit
First part of the question is basically solved, unless somebody comes up with some php -> apache-array -> .htaccess way.
However, the second part of the question is still not solved. Since this has been a problem in all my projects and could possibly help somebody else in the future, I decided to add a bounty to this question.

To answer your first question:
You could use RewriteRule ^([a-zA-Z]{2})([/]?)(.*)$ path/file.php?language=$1
This limits the first string to two characters and passes it on to $_GET['language']
Edit: adding RewriteCond %{REQUEST_FILENAME} !-f
and RewriteCond %{REQUEST_FILENAME} !-d will prevent conflicts with existing directories / files
Second question is much more difficult..
Update:
What Shad and toopay say is a good start in my opinion.
Using explode() to separate levels and comparing them to the slug is quite simple.
But it's getting complicated once you want to add flexibility to the script.
function get_URL_items() {
    // Use the path only; REQUEST_URI may also carry a query string
    $path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    // Split on "/" and drop the empty segments left by leading/trailing slashes
    $items = array_filter(explode("/", $path), function ($segment) {
        return strlen(trim($segment)) > 0;
    });
    // Reindex from 0 so callers can use $items[0], $items[1], ...
    return array_values($items);
}
Let's say you've got a website with a sub-section called "Festival" and a database filled with info for 100+ artists, and you want your URLs to look like website.com/festival/<artistgenre>/<artistname>/.
You don't want to create 100+ pages in your CMS so <artistgenre> and <artistname> are some kind of wildcards.
I found it hard to achieve this without a lot of if/else statements like:
$item = get_URL_items();
if(is_user($item[2]) && is_genre($item[1]) && is_festival($item[0])) {
// do mysql stuff here
}
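One way to keep the if/else chain from growing is a small route table in which some segments are wildcards. The following is only a sketch under assumptions: the `:name` wildcard syntax, the route names and the function `match_route` are invented for illustration, and `$items` is assumed to be a 0-indexed list of path segments. The captured values would still have to be checked against the database (for example with is_genre()).

```php
<?php
// Hypothetical route table: each pattern is a list of segments, where a
// segment starting with ":" is a wildcard that captures the URL segment.
$routes = [
    'festival_artist' => ['festival', ':genre', ':artist'],
    'festival_genre'  => ['festival', ':genre'],
];

// Returns [route name, captured wildcards] or null when nothing matches.
function match_route(array $items, array $routes): ?array
{
    foreach ($routes as $name => $pattern) {
        if (count($pattern) !== count($items)) {
            continue; // different depth, cannot match
        }
        $params = [];
        foreach ($pattern as $i => $segment) {
            if ($segment !== '' && $segment[0] === ':') {
                // Wildcard: remember the value under the wildcard's name
                $params[substr($segment, 1)] = $items[$i];
            } elseif ($segment !== $items[$i]) {
                continue 2; // fixed segment differs, try the next route
            }
        }
        return [$name, $params];
    }
    return null;
}

$match = match_route(['festival', 'rock', 'some-band'], $routes);
```

Adding a new URL shape then means adding one line to the table rather than another branch.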

If I were you, I would use something like this:
.htaccess:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !/main.php$
RewriteRule ^(?:([a-zA-Z]{2})/)?(.*)$ main.php?lang=$1&path=$2 [L,QSA]
main.php:
$langs = array('en','de','ru'); // list of supported languages
$defaultLang = 'en'; // fallback language (assumed here; pick your own)
$currentLang = isset($_GET['lang']) && in_array($_GET['lang'], $langs, true) ? $_GET['lang'] : $defaultLang; // current selected language
$path = isset($_GET['path']) ? $_GET['path'] : ''; // current path
Then, in main.php, you can parse the path according to your needs.

In answer to your bounty question I would use this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([A-Z]{2}\/)*(([A-Z]+\/)*)([A-Z]+)\.html$ index.php?lang=$1&parents=$2&pagelink=$4 [NC,QSA,L]
Since you want to be able to handle any number of generations/levels in your URL, have you thought about how you want to catch them in your PHP script?
You definitely don't want to be going and checking isset($_GET['parent1']);isset($_GET['parent2']) etc etc etc.
As some of the other responses have indicated, you really need to be parsing your URL inside your PHP script; to that end, my RewriteRule hands off the entire 'parents' section of the URL, which your script can then explode and parse; but doesn't interfere with normal no-parent urls. =)
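A minimal sketch of the PHP side, assuming the rule above is in place so that `lang` arrives as something like `en/` and `parents` as something like `services/translating/`. The function name `parse_page_request`, the default language and the returned array keys are made up for illustration.

```php
<?php
// Hypothetical helper: turns the three query parameters set by the
// RewriteRule into a language code, a list of parents and a page slug.
function parse_page_request(array $get, string $defaultLang = 'en'): array
{
    // "lang" arrives with its trailing slash (e.g. "en/"), or empty.
    $lang = strtolower(trim($get['lang'] ?? '', '/'));
    if ($lang === '') {
        $lang = $defaultLang;
    }

    // "parents" arrives as "services/translating/"; split it into levels
    // and drop the empty strings produced by leading or trailing slashes.
    $parents = array_values(array_filter(
        explode('/', $get['parents'] ?? ''),
        'strlen'
    ));

    return [
        'lang'    => $lang,
        'parents' => $parents,            // ordered from the top level down
        'page'    => $get['pagelink'] ?? '',
    ];
}

$request = parse_page_request([
    'lang'     => 'en/',
    'parents'  => 'services/translating/english/',
    'pagelink' => 'somesubpage',
]);
```

In the real script the argument would be `$_GET`, and the resulting `parents` array can be walked level by level against the page tree in the database.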

I somehow think this answer won't be very popular but here goes anyway. :)
mod_rewrite reaches a point where using it the old fashioned way with regular expressions becomes annoying. I suggest you skip all the pain and swap to using an external program/script to do your rewrites. I wouldn't suggest you do rewrites on all files using this method, but instead just for the urls that most users will see and type. As long as you know how to write efficient code you can even redirect to a php script to do the rewrites (as I have done in the past on a very high traffic site) and it will not have a noticeable effect on load times. If you ever reach a point where the rewrites are the main thing slowing down your site you can then switch it out for a program written in a quicker language, however I'd be surprised if you reach that.
Some things to be aware of:
You need to set a rewrite lock directive or you will get lots of crazy output.
Remember that the rewrite script is a command line PHP script. It has no knowledge of things such as the $_SERVER global. This is surprisingly easy to forget.
This script is loaded at server start so any changes to it require a server restart before they take effect.
Always test this on the command line by passing a url and checking the output before restarting the server. If your script is broken restarting the server will result in anything from non functioning rewrites to the server not starting at all.
It's a bit more hassle in the beginning, but once you have set this up you will find adding new rewrite rules to be an absolute breeze and a hell of a lot more flexible.
Here is the only tutorial I was able to find on how to do this using PHP...
Using MySQL to control mod_rewrite via PHP
This is far from the standard way of doing rewrites so I imagine I'm going to cop a lot of flak for this answer. Oh well. :)

Well, for the SEO part, I think it's better to have a slug for each article (assuming you are using this for a CMS). That means your database has a "translation" table which translates the requested URI/slug and associates it with $parent_page.

Related

Rewrite URL to all lower case letters and redirect some URL to home page using .htaccess file

My problem is that I want to write my own rules for an .htaccess file. I have searched the internet, but nothing told me how to write my own specific rule. All I know is that it uses regular expressions, and I have little knowledge of regex.
What I am looking for: I have the following links
http://example.com/work/project-1
http://example.com/work/project-2
When a user enters a URL like this
http://example.com/work/PrOJect-2
I want to redirect them to (converting all caps to lower case)
http://example.com/work/project-2
But when a user enters a URL like this
http://example.com/work/
or
http://example.com/work
I want to redirect to the home page.
For this I have an .htaccess file
with some rules like
RewriteEngine On
RewriteRule ^work(.*)/([^/]*) work-single.php$1?slug=$2 [QSA]
Please help me to write these rules.
What you are trying to achieve is called URL rewriting.
Here's an example of a working .htaccess file:
RewriteEngine ON
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?url_params=$1 [L,QSA]
The result of using the above rule set is that the requested URL path is passed on the server side to index.php, where the string can be fetched using:
$urlString = $_GET['url_params'];
and further processed, for example by slicing the string into an array of tokens:
$urlTokens = explode('/', $urlString);
foreach($urlTokens as $token){
// do something with each token
}
and finally analyse the tokens (a task that most people these days don't do manually because frameworks such as Zend Framework take care of it automatically). For each valid "token" you take the appropriate action (in modern frameworks this is called "routing").
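Applied to the case in the question, the front controller could decide on redirects before any routing happens. This is a sketch only: `redirect_target` is a made-up name, and it assumes the catch-all rule shown earlier, so that the path arrives in `$_GET['url_params']`.

```php
<?php
// Hypothetical helper: given the value of $_GET['url_params'], returns the
// URL to redirect to, or null when the request can be served as-is.
function redirect_target(string $urlParams): ?string
{
    $path  = trim($urlParams, '/');
    $lower = strtolower($path);

    // "/work" and "/work/" carry no project slug: send them to the home page.
    if ($lower === 'work') {
        return '/';
    }

    // Any upper-case character means the canonical URL is the lower-case one.
    if ($lower !== $path) {
        return '/' . $lower;
    }

    return null;
}

$target = redirect_target($_GET['url_params'] ?? '');
if ($target !== null) {
    header('Location: ' . $target, true, 301); // permanent redirect
    exit;
}
```

Doing this in PHP avoids the need for a lower-casing map in the server configuration, at the cost of one extra request for mixed-case URLs.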
For a complete tutorial on how to setup url rewriting, you can check:
https://code.tutsplus.com/tutorials/using-htaccess-files-for-pretty-urls--net-6049
It can be pretty straightforward or quite complicated, depending on your needs.
Bear in mind that this is only an example and might not work for your exact situation. You might want to customise it to fit your needs, or at least learn from it and solve your problem.
I would recommend that you read and learn more about the subject; it will make the technique easier to use.
Hope it helps a bit.

Using the url query parameter with a controller

I am attempting to implement an oembed provider using the Silverstripe framework but have come across an issue.
I have a controller routed from the url /omebed.json and it works fine if I call something like /omebed.json?mediaurl=mymovie.mp4.
However the Oembed standard states it should be /omebed.json?url=mymovie.mp4
But Silverstripe internally checks the $_GET['url'] variable and will attempt to route to that page/controller.
So SilverStripe is trying to route to /mymovie.mp4 skipping my controller and hitting the ErrorPage_Controller creating a 404.
I'm thinking I'm going to have to extend the ErrorPage_Controller and rejig it if the url is oembed.json, but this seems a little hackish.
Any suggestions?
Cheers
Extending on @Stephen's answer, here is a way to get around that issue without duplicating main.php and without modifying it directly.
What I did was create a _ss_environment.php file which is added early on in the loading process of Silverstripe.
_ss_environment.php
global $url;
$url = $_GET['raw_url'];
if (isset($_GET['url']))
{
unset($_GET['url']);
}
// IIS includes get variables in url
$i = strpos($url, '?');
if($i !== false)
{
$url = substr($url, 0, $i);
}
.htaccess
RewriteCond %{REQUEST_URI} ^(.*)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !\.php$
RewriteRule .* framework/main.php?raw_url=%1 [QSA]
So here is what is happening:
The .htaccess is now using raw_url instead of url
_ss_environment.php is being called early in the loading process, setting the global $url variable that main.php normally sets. This is set with raw_url rather than url.
To prevent main.php from simply overriding it again when it sees your url query string parameter, it is unset (Silverstripe seems to reset this later, as far as my test is concerned).
Lastly is a little block of code that main.php would normally run if $_GET['url'] is set, copied as-is for apparent support in IIS. (If you don't use IIS, you likely won't need it.)
This has a few benefits:
No update to main.php allows upgrading Silverstripe slightly easier in the future
Runs the minimal amount of code needed to "trick" Silverstripe into thinking it is running normally.
The one obvious drawback to any solution that changes away from the url query string parameter is that it breaks anything that looks at the parameter directly. With how Silverstripe works, it is more likely that code uses the $url global variable or the Director class rather than looking at the query string for the current URL.
I tested this on a 3.1 site by doing the changes I mentioned and:
Creating a controller called TestController
In the init function of the controller, I am running the following:
var_dump($_GET['url']);
var_dump($this->getRequest()->getVars());
Visited /TestController?url=abc123, saw the value of both dumps have "abc123" as the value for the URL parameter.
Navigated to a few other custom pages on the site to make sure they were still working (no issues that I saw)
Unfortunately, I haven't been able to find documentation for the order of inclusion in regards to _config.php and _ss_environment.php. However, after browsing through the code, I have worked out it is this:
main.php runs, first main task is to require core/Constants.php
Constants.php's first task is to search for _ss_environment.php in the base folder and potential parent folders. If it finds it, it will be included.
Going back to main.php (and after the $_GET['url'] check is done in main.php), it starts an ErrorControlChain, inside which it does another require for core/Core.php
Inside Core.php, it performs calls for the config manifest
ConfigManifest.php exposes the functions to actually add _config.php files and for them to be required.
I could probably go on however I think this gives a pretty good picture of what is going on. I don't really see a way around not using the _ss_environment.php file. Nothing else gets included early enough that you can hook into without modifying core code.
I had a quick play with this the other day. Looking at what main.php does, it might be best to hack away at it rather than ErrorPage_Controller.
For starters, SS's default .htaccess file does this:
<IfModule mod_rewrite.c>
SetEnv HTTP_MOD_REWRITE On
RewriteEngine On
# RewriteBase /silverstripe
RewriteCond %{REQUEST_URI} ^(.*)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* framework/main.php?url=%1&%{QUERY_STRING} [L]
</IfModule>
Note the ?url: changing that to something else, and changing main.php's usage to match, may help, or it may cause a heap of extra errors and sadness.
To avoid hacking the core/framework, you could change the .htaccess to target a copy of main.php in mysite (with appropriate include changes).

redirect all http requests

My company uses Xerox Docushare for document management. We are consolidating 2 Docushare servers into one. Assuming users have a lot of Docushare pages bookmarked in their browsers, is it possible to place a PHP file in the root folder which will receive all these requests and perform a redirect?
For example
http://old-server/docushare/dsweb/View/Collection-xxxx
would get redirected to
http://new-server/docushare/dsweb/View/Collection-yyyy
The collection-xxxx to collection-yyyy would probably come from a file we intend to generate as part of the conversion.
I did take a look at
http://php.net/manual/en/function.header.php
but that works at the level of a single URL, whereas I am looking to convert all requests on the older path.
Thanks.
In my opinion, the simplest way is to put an .htaccess file in your document root:
RewriteEngine on
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^old-server
RewriteRule ^docushare/dsweb/View/(.*)$ http://new-server/docushare/dsweb/View/$1 [R=301,L]
For more inspiration check this page
The PHP way
In the front controller, or whatever the web server hits first, there would be a condition using the $_SERVER variable, similar to this:
$viewPath = '/docushare/dsweb/View/';
if ($_SERVER['SERVER_NAME'] == 'old-server' && strpos($_SERVER['REQUEST_URI'], $viewPath) === 0)
{
    // Same path on the new host; header() takes the status code as its third argument
    $redirectionPath = 'http://new-server' . $_SERVER['REQUEST_URI'];
    header(sprintf('Location: %s', $redirectionPath), true, 301);
    exit;
}
This is the ugly way and you should not use it unless you have no other choice. Not to mention that I wrote this code blind ;)
I don't know exactly what situation you are in, but I think the .htaccess solution solves the issue you are experiencing.
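Neither variant above changes Collection-xxxx into Collection-yyyy, which the question says would come from a generated file. A sketch of that lookup in PHP, under assumptions: the handles in `$collectionMap` are invented placeholders, the function name `new_docushare_url` is made up, and in practice the array would be loaded from the conversion file.

```php
<?php
// Hypothetical map produced by the conversion: old handle => new handle.
$collectionMap = [
    'Collection-1001' => 'Collection-2001',
    'Collection-1002' => 'Collection-2417',
];

// Returns the URL on the new server, or null if the request is not a
// View URL or the handle is not in the map.
function new_docushare_url(string $requestUri, array $map): ?string
{
    $prefix = '/docushare/dsweb/View/';
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    if (strpos($path, $prefix) !== 0) {
        return null; // not a View URL
    }
    $handle = substr($path, strlen($prefix));
    if (!isset($map[$handle])) {
        return null; // unknown handle, nothing to redirect to
    }
    return 'http://new-server' . $prefix . $map[$handle];
}

$target = new_docushare_url($_SERVER['REQUEST_URI'] ?? '', $collectionMap);
if ($target !== null) {
    header('Location: ' . $target, true, 301); // permanent redirect
    exit;
}
```

Requests with no mapping fall through, so the script can show a "document moved" page or a 404 for them.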

Safe routing in my own framework using PHP - how

Like almost every programmer, I'm writing my own PHP framework for educational purposes. And now I'm looking at the problem of parsing URLs for MVC routing.
The framework will use friendly URLs everywhere. But the question is how to parse them in the front controller. For example, the link /foo/bar/aaa/bbb may mean "Call the controller foo's action bar and pass parameter aaa with value bbb". But in case someone installs the framework into a subdirectory of the domain root, the directory part should be stripped before determining the controller name and action name. And I'm looking for a way to do it safely.
Also I would like to support a fallback case if URL rewriting is not supported on the server.
On different systems different sets of $_SERVER variables are defined. For example, on my local machine from the set of PATH_INFO, REQUEST_URI, REQUEST_URL, ORIG_REQUEST_URI, SCRIPT_NAME, PHP_SELF only REQUEST_URI, SCRIPT_NAME and PHP_SELF are defined. I wonder, if I can rely on them.
Mature frameworks like Symfony or ZF have some complicated algorithms for parsing URLs (at least it seemed so to me). So I can't just take a part from there for mine.
Two workarounds:
Add a config variable with the URL / installation directory to your application, and strip it from $_SERVER['REQUEST_URI']
Make Apache rewrite it into a GET variable
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php?myrequest=$1 [QSA,L]
I'm currently doing the same research. But everything I see is so complicated that I'll most probably continue using mod_rewrite anyway. After all, you end up with the same thing whether you use SEF with PHP or mod_rewrite with Apache. Anyway, I'll be monitoring this topic.. it's interesting :)
Hope the php gurus around here have some more info about this :)
Edit:
It really depends on what you want to do. For my needs I hardcoded most of the pages so they looked SEF. But something like the example below should work as well.
RewriteEngine on
RewriteRule ^/?posts/([A-Za-z0-9_\-]+)/([A-Za-z0-9_\-]+)/?$ posts.php?$1=$2 [NC,L]
With the example above:
http://localhost/posts/view/23
http://localhost/posts/delete/23
is equivalent to:
http://localhost/posts.php?view=23
http://localhost/posts.php?delete=23
It really depends on what exactly you're doing :)
The example above should work, but I haven't tested it.
I usually use the following for determining an application's base URL path, assuming all your requests always go through the same gateway script:
$base = dirname($_SERVER['PHP_SELF']);
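Building on that idea, here is a sketch of stripping the installation directory and reading the rest of the path as /controller/action/name/value. It rests on assumptions: it takes SCRIPT_NAME instead of PHP_SELF (PHP_SELF can include trailing path info), the function name `parse_route` and the `index` defaults are invented, and it does not cover the no-rewrite fallback.

```php
<?php
// Hypothetical parser: strips the installation directory from the request
// path and reads the rest as /controller/action/name/value/...
function parse_route(string $requestUri, string $scriptName): array
{
    $path = (string) parse_url($requestUri, PHP_URL_PATH);

    // Directory of the gateway script, e.g. "/myapp" for "/myapp/index.php".
    // Backslashes are normalised because dirname() can return "\" on Windows.
    $base = rtrim(str_replace('\\', '/', dirname($scriptName)), '/');
    if ($base !== '' && strpos($path, $base . '/') === 0) {
        $path = substr($path, strlen($base));
    }

    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    $route = [
        'controller' => $segments[0] ?? 'index',
        'action'     => $segments[1] ?? 'index',
        'params'     => [],
    ];
    // Remaining segments are read as name/value pairs.
    for ($i = 2; $i + 1 < count($segments); $i += 2) {
        $route['params'][$segments[$i]] = $segments[$i + 1];
    }
    return $route;
}

$route = parse_route('/myapp/foo/bar/aaa/bbb?x=1', '/myapp/index.php');
```

In a front controller the call would be `parse_route($_SERVER['REQUEST_URI'], $_SERVER['SCRIPT_NAME'])`; the controller and action names should still be validated against a whitelist before anything is instantiated.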
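For your second question, if you want to check whether mod_rewrite is enabled, you can use the following (apache_get_modules() is not defined under every SAPI, for example it is missing under the CLI, hence the function_exists() guard):
if (function_exists('apache_get_modules') && in_array('mod_rewrite', apache_get_modules())) {
// rewrite is enabled
}
However, that doesn't necessarily mean that RewriteEngine is enabled, so you should probably add an extra condition:
if (function_exists('apache_get_modules') &&
in_array('mod_rewrite', apache_get_modules()) &&
preg_match('/RewriteEngine +On/i', file_get_contents('/path/to/.htaccess'))) {
// rewrite is enabled and active
}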
Maybe you could take PHP_SELF and remove the first n chars where n is the length of SCRIPT_NAME.
Edit: Oops... seems like you can just take PHP_SELF: http://php.about.com/od/learnphp/qt/_SERVER_PHP.htm

mod_rewrite to text/type/id

My current code is something like this
store.php?storeid=12&page=3
and I'm looking to translate it to something like this
mysite.com/roberts-clothing-store/store/12/3
and something like this:
profile.php?userid=19
to
mysite.com/robert-ashcroft/user/19
I understand that it's best to have the SEO-friendly text as far left as possible, ie not
mysite.com/user/19/robert-ashcroft
(what stackoverflow does)
I can't figure out how to do this in apache's mod_rewrite.
Actually, you may have to think "upside-down" with mod_rewrite.
The easiest way is to make your PHP emit the rewritten mysite.com/roberts-clothing-store/store/12/3 links.
mod_rewrite will then pass every request to a single PHP page, rewrite.php?path=roberts-clothing-store/store/12/3, that decodes the URL, sets the arguments (here storeid and page)
and dynamically includes the correct PHP file, or just emits a 301 for renamed pages.
A pure solution with mod_rewrite is possible, but this one is much easier to get right, especially when you haven't mastered mod_rewrite.
The main problem could be the overhead, which might be significant but is the price of simplicity and flexibility; pure mod_rewrite is much faster.
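The decoding step in rewrite.php could look roughly like this. It is a sketch under assumptions: the function name `decode_path` and the shape of the returned array are invented, and the slug is only extracted here, not yet compared with the canonical one from the database.

```php
<?php
// Hypothetical decoder for rewrite.php?path=roberts-clothing-store/store/12/3
// Returns the script to include, the slug and the arguments, or null.
function decode_path(string $path): ?array
{
    // <slug>/store/<storeid>/<page>
    if (preg_match('#^([^/]+)/store/(\d+)/(\d+)$#', $path, $m)) {
        return [
            'script' => 'store.php',
            'slug'   => $m[1],
            'args'   => ['storeid' => (int) $m[2], 'page' => (int) $m[3]],
        ];
    }
    // <slug>/user/<userid>
    if (preg_match('#^([^/]+)/user/(\d+)$#', $path, $m)) {
        return [
            'script' => 'profile.php',
            'slug'   => $m[1],
            'args'   => ['userid' => (int) $m[2]],
        ];
    }
    return null; // unknown URL: the caller should answer with a 404
}

$decoded = decode_path('roberts-clothing-store/store/12/3');
```

The caller can then look up the current slug for the given id and, if it differs from `$decoded['slug']`, answer with a 301 to the canonical URL instead of including the script.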
Update:
The other posts do answer the question, but they don't solve the typical duplicate-content problem that is avoided by having canonical URLs (and using 301/404 for all those URLs that seem OK but aren't).
Try these rules:
RewriteRule ^[^/]+/store/([0-9]+)/([0-9]+)$ store.php?storeid=$1&page=$2
RewriteRule ^[^/]+/user/([0-9]+)/?$ profile.php?userid=$1
But I wouldn’t use such URLs. They don’t make sense when you think of the URL path as a hierarchy and the path segments as their levels.
RewriteRule ^roberts-clothing-store/store/([^.]+)/([^.]+)$ store.php?id=$1&page=$2
RewriteRule ^robert-ashcroft/user/([^.]+)$ profile.php?userid=$1
Then you can just use RewriteRule directive in a .htaccess like:
RewriteRule roberts-clothing-store/store/(\d+)/(\d+)$ store.php?storeid=$1&page=$2 [L]
See http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html for help, or google.
My approach is to make the .htaccess as easy as possible and to do all the hard work in PHP:
RewriteRule ^(.*?)$ index.php?$1
This basically means taking everything and rerouting it to my index.php file (in css/javascript/image directories I simply use "RewriteEngine off" to grant access to these files). In PHP I then just split("/", $param, 5) the string and run a foreach() to check all the parameters. Encapsulated in a nice function this works fine for me.
Update:
For this simple case I highly recommend explode() instead of split(), because explode() doesn't carry the overhead of regular expressions. (split() was also deprecated in PHP 5.3 and removed in PHP 7.0.)
