Detecting language and keeping current url schema - php

Currently I have just one language on my site,
and I implemented the friendly URLs via .htaccess, like:
RewriteRule ^post/(.+)/(.+) post.php?id=$2&friendly=1
So:
domain.com is the homepage and domain.com/the-title/5 is the page for the post with ID 5.
Now I would like to keep those as the URLs for the default language and add further languages, for example:
domain.com/es is the homepage and domain.com/es/the-title/6 is the page for the post with ID 6 in Spanish (but the previous rule should still work).
Question is,
How should I adapt my (or additional) rewrite rules to check for the 2 first chars of the url (first split) and add it as a param, like: &lan=es and if it's not found then don't add this parameter?
Let's say:
^post/(.+)/(.+) post.php?id=$2&friendly=1 (english)
^es/post/(.+)/(.+) post.php?id=$2&friendly=1&lan=es (spanish)
But if possible, I would like it:
To work with more languages (adding the extra parameter when needed),
To work with other rules, like:
^es/photo/(.+)/(.+) photo.php?id=$2&friendly=1&lan=es (spanish)
Any suggestions?

Something like this might work. I haven't tested it, but you can use RewriteCond to check for a specific structure of the URI and, if it matches, apply the rule that follows it. If it doesn't match, processing continues to the original rule.
#Does the uri match 2 characters followed by /post/? (REQUEST_URI starts with a slash)
RewriteCond %{REQUEST_URI} ^/../post/
#then use this rule and stop processing rules
RewriteRule ^(..)/post/(.+)/(.+) post.php?id=$3&friendly=1&lan=$1 [L]
#Else use this rule
RewriteRule ^post/(.+)/(.+) post.php?id=$2&friendly=1&lan=en
Edit: I added a default language to the end of the second rule. This way there is always a $_GET['lan'] parameter. You could leave it off and set a default in php. Your choice, no difference.
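If you go the "set a default in PHP" route, a minimal sketch could look like the following. The helper name resolveLanguage and the list of supported codes are assumptions for illustration, not part of the original setup.

```php
<?php
// Hypothetical helper: resolve the language from the "lan" query parameter.
// The supported codes and the default are assumptions for illustration.
function resolveLanguage(array $query, array $supported = ['en', 'es'], string $default = 'en'): string
{
    $lan = $query['lan'] ?? $default;
    // Fall back to the default for anything that is not a known code.
    return in_array($lan, $supported, true) ? $lan : $default;
}

$lan = resolveLanguage($_GET);
```

Checking against a whitelist also means an unknown prefix such as /xx/post/... quietly falls back to the default language instead of reaching your templates.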

I can only answer with general advice, because we would need more context...
Use the default pages to do a temporary redirect (302) to the default language or the user's language.
Always use the same scheme, so the language is read from the same position in the URL (http://mydomain.com/en/mypage.php).
Use complete language codes if you will have a large audience or a lot of content, like en_US, fr_FR, fr_CA ...
Prefer negated character classes in your regex so that a group does not capture the characters that follow it, like "before/([^/]+)/after"; in some cases this is mandatory.
If the language information is missing, the user did not come from a valid URL; redirect them to a page that carries it (default or user language).
If the user requests a PHP file directly, redirect them to the official link to avoid duplicate content. You can use $_SERVER['REQUEST_URI'] to check for this.
Use a framework to manage this, or at least a base to control the routes.
With this advice in mind, you could use a single rewrite rule for your whole website:
RewriteRule ^([^/]+)/([^.]+)\.([^.]+)$ index.php?lang=$1&route=$2&format=$3 [L,QSA]
Here I capture the language (es, en, en_US, fr...), the route (post/5, gotabeer, cats/postit/thumb/2) and the format (html, json, jpeg...).
(I didn't try the rewrite rule, but it should work.)
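On the PHP side, the lang, route and format parameters from that rule could be consumed by a small front controller. This is only a sketch: the dispatch function, the route table and the handler output are all made up for illustration.

```php
<?php
// Hypothetical index.php front controller for the single catch-all rule.
// The route table and handler output below are made up for illustration.
function dispatch(array $query): string
{
    $lang   = $query['lang'] ?? 'en';
    $route  = trim($query['route'] ?? '', '/');
    $format = $query['format'] ?? 'html';

    // The first path segment selects the handler, the rest are its arguments.
    $segments = $route === '' ? [] : explode('/', $route);
    $name = array_shift($segments) ?? 'home';

    $handlers = [
        'home' => fn(array $args) => 'homepage',
        'post' => fn(array $args) => 'post #' . (int) ($args[0] ?? 0),
    ];

    if (!isset($handlers[$name])) {
        return "404 ($lang)";
    }
    return $handlers[$name]($segments) . " [$lang, $format]";
}

echo dispatch($_GET), "\n";
```

A real application would send proper headers and render templates; the point here is only that one rewrite rule plus one lookup table can replace a rule per page.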

Here is what I would suggest:
RewriteRule ^/?((en|es)/)?post/(.+)/(.+)$ post.php?id=$4&friendly=1&lan=$2
Here /? allows an optional forward slash at the beginning of the string. This lets the rule be moved interchangeably between .htaccess directory context and httpd.conf server context.
((en|es)/)? allows for optional specification of one of two accepted language codes.
Note that I did not suggest a wildcard for the language part, as I assume you are only working with a known subset of languages, so using something other than a known language code (or omitting it entirely) should fall through to handling by other rules (or perhaps result in a 404).
If this is not the case, you can change the first portion of the regex from (en|es) to (.{2}) if you expect exactly two characters, or perhaps (.{2}(?:-.{2})?) if you also expect to handle language codes like es-ES. The inner group is non-capturing so that $3 and $4 keep their meaning.
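To see which backreference ends up holding what, the pattern can be tried out in PHP, whose preg functions use the same PCRE syntax. This is only a stand-in for mod_rewrite's matching; the rewriteTarget helper is hypothetical.

```php
<?php
// Same pattern as the rule above, wrapped in '#' delimiters for preg_match().
$pattern = '#^/?((en|es)/)?post/(.+)/(.+)$#';

// Hypothetical helper that mimics the substitution part of the rule.
function rewriteTarget(string $pattern, string $path): ?string
{
    if (!preg_match($pattern, $path, $m)) {
        return null; // the rule would not apply
    }
    // $m[2] is the language code (empty when the prefix is absent), $m[4] the ID.
    return "post.php?id={$m[4]}&friendly=1&lan={$m[2]}";
}

echo rewriteTarget($pattern, 'es/post/the-title/6'), "\n";
```

Note that without a language prefix the lan parameter is still present but empty, so the PHP side has to treat an empty value as "default language".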

This should work for you:
RewriteEngine On
RewriteRule ^([a-z]{2})/post/([^/]+)/([0-9]+)/?$ post.php?id=$3&friendly=1&lan=$1 [L,QSA]
RewriteRule ^post/([^/]+)/([0-9]+)/?$ post.php?id=$2&friendly=1&lan=en [L,QSA]

Related

How do I change ugly URLs to pretty URLs using .htaccess? [duplicate]

I need to grab some of my website's old URLs and do a 301 redirect to the new ones, since they are already indexed and we don't want to lose relevance after the change. The old URL is in fact very ugly, and for some reason everything I try to do to rewrite it does not work. Here it is:
http://www.mywebsite.com/ExibeCurso.asp?Comando=TreinamentoGeral&codCurso=136&Titulo=Como%20Estruturar%20um%20Sistema%20Gerencial%20de%20Controles%20Organizacionais,13
Basically, I need to translate it into something like:
http://www.mywebsite.com/curso/136
From the old URL I need to check whether the user requested "ExibeCurso.asp"; then I know I must send them to /curso. I must also grab the integer that was in the query string parameter "codCurso" (136). What regular expression must I use for this? I am using ISAPI_Rewrite 3, which basically implements htaccess on IIS, so there should be no difference in terms of syntax. Thanks.
Try this rule:
RewriteCond %{QUERY_STRING} ^([^&]*&)*codCurso=([0-9]+)(&.*)?$
RewriteRule ^/ExibeCurso\.asp$ /curso/%2? [L,R=301]
But I’m not sure whether ISAPI Rewrite requires the pattern to begin with a slash.
Off the top of my head, something like this should work:
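RewriteRule ^ExibeCurso\.asp$ http://www.mywebsite.com/curso/ [L,R=301]
That would at least send the traffic to /curso/ with all parameters attached: in Apache's mod_rewrite the rule pattern never sees the query string, and the original query string is passed through unchanged when the substitution has none of its own. Maybe it's best to process it from there.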

How to load a specific page for any given pathname URL

Let's say I have a web-page called www.mysite.com
How can I make it so whenever a page is loaded like www.mysite.com/58640 (or any random number) it redirects to www.mysite.com/myPHPpage.php?id=58640.
I'm very new to website development so I don't even really know if I asked this question right or what languages to tag in it...
If it helps I use a UNIX server for my web hosting with NetWorkSolutions
Add this to your .htaccess file in the main directory of your website.
RewriteEngine on
RewriteBase /
RewriteRule ^([0-9]+)$ myPHPpage.php?id=$1 [L]
Brief explanation: it says to match:
^ from start of query/page
[0-9] match numbers
+ any matches of 1 or more
$ end of page requested
The parentheses tell it to capture that part and store it. I can then refer to the captured value as $1 in the new URL. If I had more than one parenthesized group, I would use $2, $3 and so on.
If you experience issues with the .htaccess file please refer to this as permissions can cause problems.
If you needed to capture something else such as alphanumeric characters you'd probably want to explore regex a bit. You can do things such as:
RewriteRule ^(.+)$ myPHPpage.php?id=$1 [NC,L]
which matches anything, or get more specific with things like [a-zA-Z0-9], etc.
Edit: and @Jonathon has a point. In your PHP file, wherever you handle $_GET['id'], be sure to sanitize it if it is used in anything resembling an SQL query or mail. Since you are using only numbers, that makes it easy:
$id = (int)$_GET['id']; // cast as integer - any weird strings will give 0
Keep in mind that if you are not going to just use numbers then you will have to look for some sanitizing function (which abound on google - search for 'php sanitize') to ensure you don't fall to an sql injection attack.
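As a slightly stricter variant of the integer cast, PHP's built-in filter_var() can reject bad input outright instead of turning it into 0. A minimal sketch; the readId helper is a hypothetical name.

```php
<?php
// Hypothetical helper: accept only positive integer IDs, reject everything else.
function readId(array $query): ?int
{
    $id = filter_var(
        $query['id'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    // filter_var() returns false when validation fails.
    return $id === false ? null : $id;
}

$id = readId($_GET);
if ($id === null) {
    // e.g. respond with a 404 here instead of querying the database
}
```

For values that go into SQL, validation like this is best combined with parameterized queries rather than relied on by itself.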

Rewrite URL in PHP

I would like to rewrite the following URL
www.mysite.com/mypage.php?userid=ca49b6ff-9e90-446e-8a92-38804f3405e7&roleid=037a0e55-d10e-4302-951e-a7864f5e563e
to
www.mysite.com/mypage/userid/ca49b6ff-9e90-446e-8a92-38804f3405e7/roleid/037a0e55-d10e-4302-951e-a7864f5e563e
The problem here is that the PHP file can be anything. Do I have to specify rules for each page in the .htaccess file?
How can I do this using the rewrite engine?
To get the rewrite rule to work, you have to add this to your apache configs (in the virtualhost block):
RewriteEngine On
RewriteRule ^/([^/]*)/userid/([^/]*)/roleid/(.*)$ /$1.php?userid=$2&roleid=$3 [L,NS]
RewriteRule basically accepts two arguments. The first one is a regex describing what it should match. Here it is looking for the user requesting a URL like /<mypage>/userid/<uid>/roleid/<rid> (in virtual host context the matched path starts with a slash). The second argument is where it should actually go on your server to do the request (in this case, it is your PHP file that is handling the request). It refers back to the groups in the regex using $1, $2, and $3.
RewriteEngine on
RewriteBase /
RewriteRule ^mypage/userid/([a-z0-9-]+)/roleid/([a-z0-9-]+)$ mypage.php?userid=$1&roleid=$2 [L]
No, you don't need a separate rule for every PHP file; you can make the filename a variable in your regex, something like this:
RewriteRule ^([a-z0-9]+)/userid/([a-z0-9-]+)/roleid/([a-z0-9-]+)$ $1.php?userid=$2&roleid=$3 [L]
If you want to rewrite the latter URL that is entered in the browser TO the first format, you would want to use a .htaccess file.
However, if you want to produce the pretty URLs in PHP (e.g. for use in link tags), then you have two options.
First, you could simply build the URL directly (instead of converting) which in my opinion is preferred.
Second, you could convert the first (ugly) URL into the latter (pretty) one. You would then need to use preg_replace() in PHP. See http://php.net/manual/en/function.preg-replace.php for more info. Basically, you would want to use something like
$rewrittenurl = preg_replace('#mysite\.com/mypage\.php\?userid=([a-z0-9-]+)&roleid=([a-z0-9-]+)$#', 'mysite.com/mypage/userid/$1/roleid/$2', $firsturl);
Good luck!
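For the first option (building the pretty URL directly), a sketch might look like this. The prettyUrl helper is hypothetical; it assumes every parameter becomes a /key/value pair after the page name.

```php
<?php
// Hypothetical helper: build /page/key/value/... from a page name and parameters.
function prettyUrl(string $page, array $params): string
{
    $url = '/' . rawurlencode($page);
    foreach ($params as $key => $value) {
        // rawurlencode() keeps letters, digits and hyphens as they are.
        $url .= '/' . rawurlencode((string) $key) . '/' . rawurlencode((string) $value);
    }
    return $url;
}

echo prettyUrl('mypage', [
    'userid' => 'ca49b6ff-9e90-446e-8a92-38804f3405e7',
    'roleid' => '037a0e55-d10e-4302-951e-a7864f5e563e',
]), "\n";
```

Because the page name is a parameter, the same helper works for any PHP file, mirroring the generic rewrite rule on the Apache side.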

PHP - How to add a pages title to the URL? And how to create a clean url using PHP

I was wondering how I can create clean URLs using PHP. Do I do this all in PHP, or do I need to use mod_rewrite in some way? Can someone explain this to me in layman's terms?
Here is the URL in my current a element link, as it looks in the browser:
http://www.example.com/members/1/posts/page.php?aid=123
But I want it to read the pages title.
http://www.example.com/members/1/posts/title-of-current-page/
First you need to generate "title-of-current-page" from PHP, using a function such as this one:
function google($string){
    // lowercase, then turn every run of non-alphanumeric characters into one hyphen
    $string = strtolower($string);
    $string = preg_replace('/[^a-z0-9]+/', '-', $string);
    // drop hyphens left at the start or end (e.g. from trailing punctuation)
    return trim($string, '-');
}
Second, you need a rewrite rule, but you should keep the aid in the URL, in the form "/123-title-of-current-page".
Rewrite would go something like this (I am ignoring your entire URL)
RewriteRule ^([0-9]+)-(.*?)$ page.php?aid=$1 [L,QSA]
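The links themselves then have to be generated in that "ID-title" shape. A minimal sketch, with slugify as a compact stand-in for the slug function and postLink as a hypothetical helper name:

```php
<?php
// Hypothetical helpers; the names are illustrative, not from the original answer.
function slugify(string $title): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
    return trim($slug, '-');
}

function postLink(int $aid, string $title): string
{
    // The numeric ID comes first so the rewrite rule can capture it as $1.
    return '/' . $aid . '-' . slugify($title);
}

echo postLink(123, 'Title of Current Page!'), "\n";
```

Only the ID is passed to page.php, so the title part is purely cosmetic and the page keeps working even if the title changes later.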
You can do this using mod_rewrite:
You'll need to edit a file called .htaccess at the top level of your web folder. This is where you can specify certain settings to control the way Apache accesses items in this folder and below.
First things first. Let's turn on mod_rewrite: RewriteEngine On
RewriteRule ^([a-z]+)/([a-z\-]+)$ /$1/$2.php [L]
The rule matches any URL which is formed of lower case letters, followed by a /, then more lower case letters and/or hyphens, and appends .php to the end. It keeps track of anything wrapped in parentheses () and refers to them later as $1 and $2, i.e. the first and second match. So if someone visits this URL:
http://example.com/weblog/archive
it will be converted to the following:
http://example.com/weblog/archive.php
You will find more details at:
http://wettone.com/code/clean-urls
You have to use a rewrite to direct all requests to an existing php file, otherwise you get all 404 not found errors because you are trying to get a page that simply is not there.
Unless you rewrite your 404 page to handle all requests, and you definitely don't want to go there...

mod_rewrite: no ? and # in REQUEST_URI

What I'm trying to do:
have pretty URLs in the format 'http://domain.tld/one/two/three', that get handled by a PHP script (index.php) by looking at the REQUEST_URI server variable.
In my example, the REQUEST_URI would be '/one/two/three'. (Btw., is this a good idea in general?)
I'm using Apache's mod_rewrite to achieve that.
Here's the RewriteRule I use in my .htaccess:
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
This works really well thus far; it forwards every REQUEST_URI that consists of a-z, A-Z or a '/' to /index.php, where it is processed.
Only drawback: '?' (question marks) and '#' (hash keys) seem to still be allowed in the REQUEST_URI, maybe even more characters that I've yet to find.
Is it possible to restrict those via my .htaccess and an adequate addition to the RewriteRule?
Thanks!
The fragment identifier, e.g. #some-anchor, is controlled by the browser, not the server. JavaScript would be needed to redirect and remove this, although why you would want to do so I am not sure.
[SNIPPED after clarification]
To rewrite only when the query string is empty:
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
In mod_rewrite and PHP the variable REQUEST_URI refers to two different parts of the URI. In mod_rewrite, %{REQUEST_URI} contains the current URI path; in PHP, $_SERVER['REQUEST_URI'] contains the URI path and query. But in both cases the URI fragment is absent, as this part of the URI is not transmitted to the server but only used by the client.
So, when /one/two/three?foo#bar is requested, mod_rewrite’s %{REQUEST_URI} contains /one/two/three and PHP’s $_SERVER['REQUEST_URI'] contains /one/two/three?foo.
The $_SERVER['REQUEST_URI'] variable will contain the original REQUEST_URI as received by the server, before you perform the rewrite. Therefore it's impossible (as far as I know this early in the morning) to remove the query string portion from the REQUEST_URI's attribute, but you naturally have the option of removing it when you process the $_SERVER['REQUEST_URI'] variable in your script.
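Removing the query string in the script is straightforward with PHP's parse_url(). A minimal sketch; routeSegments is a hypothetical helper name.

```php
<?php
// Hypothetical helper: split a request URI into path segments, ignoring the query string.
function routeSegments(string $requestUri): array
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = ''; // parse_url() gives null or false when there is no usable path
    }
    // array_filter drops empty segments; array_values reindexes from 0.
    return array_values(array_filter(explode('/', $path), 'strlen'));
}

$segments = routeSegments($_SERVER['REQUEST_URI'] ?? '/');
```

With this, /one/two/three and /one/two/three?foo both route the same way, while the query parameters stay available through $_GET.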
If you want to only perform your RewriteRule when the query string is not specified, the following should work:
RewriteCond %{QUERY_STRING} !^.+$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
Note that this might be problematic though, since if there's accidentally a query string in a URL that someone uses to link to your site, your script wouldn't be handling it (since the rewrite never happens), so they'll get a 404 response (or whatever the case may be) that might not be as user-friendly as if you had just chosen to silently ignore the trailing information.
If I understand correctly, you want to forbid the use of ? and # on your site?
You shouldn't do that, because:
the hash (#) is used in Google's AJAX URL specification,
the question mark (?) is used, for example, by Google AdWords and Analytics or any affiliate program,
So if you force Apache to reject URL requests containing a question mark, people who click on your ad in AdWords will only see a 404 error page.
There is nothing wrong with letting people use both of them. What matters is protecting your site against XSS attacks.
Btw, there is another very important character, the percent sign (%), which is used to encode special characters (like Polish or German national letters).
