mod_rewrite: no ? and # in REQUEST_URI - php

What I'm trying to do:
have pretty URLs in the format 'http://domain.tld/one/two/three', that get handled by a PHP script (index.php) by looking at the REQUEST_URI server variable.
In my example, the REQUEST_URI would be '/one/two/three'. (Btw., is this a good idea in general?)
I'm using Apache's mod_rewrite to achieve that.
Here's the RewriteRule I use in my .htaccess:
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
This works really well thus far; it forwards every REQUEST_URI that consists of a-z, A-Z or a '/' to /index.php, where it is processed.
Only drawback: '?' (question marks) and '#' (hash keys) seem to still be allowed in the REQUEST_URI, maybe even more characters that I've yet to find.
Is it possible to restrict those via my .htaccess and an adequate addition to the RewriteRule?
Thanks!

The fragment identifier, e.g. #some-anchor, is controlled by the browser, not the server. JavaScript would be needed to redirect and remove it, although I am not sure why you would want to do so.
[SNIPPED after clarification]
To rewrite only when the query string is empty:
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]

In mod_rewrite and PHP, the variable REQUEST_URI refers to two different parts of the URI. In mod_rewrite, %{REQUEST_URI} contains the current URI path; in PHP, $_SERVER['REQUEST_URI'] contains the URI path and query. In both cases the URI fragment is missing, because that part of the URI is never transmitted to the server; it is used only by the client.
So, when /one/two/three?foo#bar is requested, mod_rewrite’s %{REQUEST_URI} contains /one/two/three and PHP’s $_SERVER['REQUEST_URI'] contains /one/two/three?foo.
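As a rough illustration of that split (Python's standard urllib.parse is used here only as a neutral URL parser; the variable names are made up for this sketch and do not come from Apache or PHP):

```python
from urllib.parse import urlsplit

# The example URL as it would be typed into the browser.
parts = urlsplit("http://domain.tld/one/two/three?foo#bar")

# What mod_rewrite's %{REQUEST_URI} would hold: the path only.
rewrite_request_uri = parts.path

# What PHP's $_SERVER['REQUEST_URI'] would hold: path plus query.
php_request_uri = parts.path + ("?" + parts.query if parts.query else "")

# The fragment stays in the browser and is never sent to the server.
client_only_fragment = parts.fragment

print(rewrite_request_uri, php_request_uri, client_only_fragment)
```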

The $_SERVER['REQUEST_URI'] variable will contain the original REQUEST_URI as received by the server, before you perform the rewrite. Therefore it's impossible (as far as I know this early in the morning) to remove the query string portion from the REQUEST_URI's attribute, but you naturally have the option of removing it when you process the $_SERVER['REQUEST_URI'] variable in your script.
If you want to only perform your RewriteRule when the query string is not specified, the following should work:
RewriteCond %{QUERY_STRING} !^.+$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
Note that this might be problematic though, since if there's accidentally a query string in a URL that someone uses to link to your site, your script wouldn't be handling it (since the rewrite never happens), so they'll get a 404 response (or whatever the case may be) that might not be as user-friendly as if you had just chosen to silently ignore the trailing information.
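A minimal sketch of the "silently ignore the trailing information" option, written in Python rather than PHP purely for illustration. The helper name route_segments is hypothetical, and its argument stands in for the value of $_SERVER['REQUEST_URI']:

```python
def route_segments(request_uri):
    """Drop any query string, then split the path into its segments."""
    # Everything from the first "?" onward is the query string.
    path = request_uri.split("?", 1)[0]
    # Ignore empty pieces caused by leading, trailing or doubled slashes.
    return [segment for segment in path.split("/") if segment]

print(route_segments("/one/two/three?utm_source=ad"))
```

With this approach a link that carries an unexpected query string still reaches the same handler instead of falling through to a 404.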

If I understand correctly, you want to forbid the use of ? and # on your site?
You shouldn't do that, because:
the hash (#) is used in Google's specification for AJAX URLs,
the question mark (?) is used, for example, by Google AdWords and Analytics and by any affiliate program.
So if you force Apache to reject URL requests containing a question mark, people who click on your ad in AdWords will only see a 404 error page.
There is nothing wrong with letting people use both of them. What matters is protecting your site against XSS attacks.
Btw., there is another very important character: the percent sign (%), which is used to encode special characters (like Polish or German national letters).
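To see why the percent sign matters, here is a small Python sketch of percent-encoding. The sample path is invented for illustration, and UTF-8 is assumed as the encoding (it is the default of urllib.parse.quote):

```python
from urllib.parse import quote, unquote

# A path containing Polish and German letters.
original = "/książka/Größe"

# Non-ASCII characters become %XX sequences of their UTF-8 bytes;
# the slash is left alone because it is in quote()'s default safe set.
encoded = quote(original)

# Decoding restores the original text.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

A rule that only allows a-z, A-Z and '/' would therefore have to reject or special-case such URLs.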

Related

URL Decoded Prior to htaccess Rewrite Rule

I have the following rewrite rule in .htaccess :-
RewriteRule ^.*/-y.* /handleurl.php [L]
Its purpose is to display appropriate pages depending on the values in the url, for example:
example.com/books/BookA/-y?act=x will display bookA page
the variable holding the book name is encoded such that ...
example.com/books/Book B/-y?act=x becomes example.com/books/book+B/-y?act=x
... which is fine (it's decoded in handleurl.php)
however if the book is called Book A/B I have ...
example.com/books/Book A/B/-y?act=x which becomes example.com/books/Book+A%2FB/-y?act=x
It appears that htaccess decodes this before the rewrite rule, so the rewrite rule sees too many elements in the URL delineated by the /.
Is there any way I can get the rewrite rule to ignore the encoded / as intended?
I have seen a previous response to a similar question, but I only need the / to be ignored, not other encoded characters.
It appears that htaccess decodes this before the rewrite rule, so the rewrite rule sees too many elements in the URL delineated by the /
This is not the problem. Whether the URL-path /books/Book+A%2FB/-y is decoded or not makes no difference here*1. Both forms would match the (rather generous) regex ^.*/-y.* in the RewriteRule pattern.
(*1 But yes, the URL-path matched by the RewriteRule pattern is URL decoded, ie. %-decoded.)
The problem is likely to be that Apache (by default) rejects - with a 404 - any URL that contains a %-encoded slash ie. %2F (or backslash %5C) in the URL-path portion of the URL. This is a security feature, that otherwise "could potentially allow unsafe paths" (source).
However, this can be overridden with the AllowEncodedSlashes directive. But this directive can only be used in a server or virtualhost context. It cannot be used in .htaccess.
You need to either set AllowEncodedSlashes On, which allows encoded slashes and also decodes them, as with other characters, or set AllowEncodedSlashes NoDecode, which permits encoded slashes but does not decode them. The latter is preferred and is probably what you are expecting.
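The practical difference for a downstream script that splits the path on "/" can be sketched in Python. This only mimics the effect of decoding before or after splitting; it is not how Apache implements the directive, and the variable names are made up:

```python
from urllib.parse import unquote

# The URL-path from the question, with the slash in "A/B" encoded as %2F.
raw_path = "/books/Book+A%2FB/-y"

# Decoding first turns %2F into a real slash, so the book name
# is split across two segments.
decoded_then_split = unquote(raw_path).split("/")

# Splitting first and decoding each piece keeps "A/B" in one segment.
split_then_decoded = [unquote(piece) for piece in raw_path.split("/")]

print(decoded_then_split)
print(split_then_decoded)
```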
Aside#1:
RewriteRule ^.*/-y.* /handleurl.php [L]
The regex ^.*/-y.* is very generic, possibly too generic. This is the same as simply /-y. What is the .* after -y intended to match? From your example URLs it looks like -y is always at the end of the URL-path, so this could be anchored, eg. /-y$. And if the URL that you need to match always starts /books/ then maybe this should also be included in the regex?
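A quick way to convince yourself of this is to run the three patterns against a few paths. The check below uses Python's re module, which behaves like Apache's PCRE for these simple patterns; the sample paths are invented:

```python
import re

generic = re.compile(r"^.*/-y.*")   # the pattern from the question
bare = re.compile(r"/-y")           # the equivalent short form
anchored = re.compile(r"/-y$")      # only matches when -y ends the path

paths = ["books/BookA/-y", "books/BookA/-yes/more", "books/BookA"]

# For each path: (generic matches, bare matches, anchored matches)
results = {
    p: (bool(generic.search(p)), bool(bare.search(p)), bool(anchored.search(p)))
    for p in paths
}

for path, outcome in results.items():
    print(path, outcome)
```

The first two columns always agree, while the anchored form rejects the path where something follows -y.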
Aside#2:
...the book name is encoded such that ...
example.com/books/Book B/-y?act=x becomes example.com/books/book+B/-y?act=x ... which is fine (it's decoded in handleurl.php)
This isn't strictly "URL encoded", you have converted the space into a + in the URL-path. The + is a valid "URL encoding" for a space when used in the query string only. A + in the URL-path is a literal + (and will be seen by search engines as such). In the URL-path, a space would be URL encoded as %20. (You may have used the wrong PHP encoding functions, eg. urlencode() instead of rawurlencode()?)
Of course, you are free to convert/encode the URL however you wish to create a more readable URL - providing it's valid.
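The two encoding styles can be compared side by side in Python, where quote() with an empty safe set behaves roughly like PHP's rawurlencode() and quote_plus() roughly like urlencode(). That correspondence is an approximation for illustration, not an exact equivalence:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "Book A/B"

# Path-style encoding: space becomes %20, slash becomes %2F.
path_style = quote(name, safe="")

# Query-string-style encoding: space becomes +, slash becomes %2F.
query_style = quote_plus(name)

# A path-style decoder leaves "+" alone; only a query-style
# decoder turns it back into a space.
plus_as_path = unquote("Book+B")
plus_as_query = unquote_plus("Book+B")

print(path_style, query_style, plus_as_path, plus_as_query)
```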
The rewrite rule was never the problem. I think it was Apache not liking the encoded '/' and the fact that the downstream URL handling program was using '/' as a delimiter when identifying the individual URL elements. I have to work out: 1) whether I want to allow '/' in the variables that make up the elements of the friendly URL, and 2) if so, how to pass it without upsetting Apache and how to subsequently dissect the URL. Maybe I will convert '/' to '~' for the benefit of the URL, then convert back to '/' prior to subsequent display. Thank you Mr White.

.htaccess RewriteRule needed

I am trying to trap old URLs of the form:
http://www.example.com/mpn_engine.php%3Ffamilyname%3Djiyalal+goswami%26menuopt%3D2%26submenuopt%3D1%26Search%3Dstuff
In my .htaccess file, with the help of various wise StackOverflowers as RegEx is alien to me, I have arranged to catch the PHP script 'mpn_engine.php' (both .php3 and newer .php copies) wherever it might be found (in any sub folder) and redirect visitors to the index page.
RewriteRule (^|/)mpn_engine\.php$ /index.html? [L,NC,R=301]
RewriteRule (^|/)mpn_engine\.php3$ /index.html? [L,NC,R=301]
The odd thing I am finding is that the above seems to work provided I request the PHP files exactly, or if I supply conventional parameters of the form:
http://www.example.com/lang/mpn_engine.php?x=fred
but as soon as I substitute a percent mark for the question mark, i.e. something like the following:
http://www.example.com/lang/mpn_engine.php%x=fred
The rewrite fails, and I get unpredictable results, usually a 404 but occasionally a 'Bad Gateway'.
How can I rewrite this RewriteRule to catch this .php file in any folder it might be looked for and with any trailing characters, including a percent sign, and redirect it gracefully to the index page?
Thanks!
Your question has a number of sub-questions:
If you want to "catch this .php file in any folder it might be looked for", then as long as your .htaccess file is in the root folder of your website (and not in a subfolder), you are covered.
If you want to cover ANY trailing character, then you can make one of two changes to your rewrite rule:
Remove the ending $:
RewriteRule (^|/)mpn_engine\.php /index.html? [L,NC,R=301]
or
Add a wildcard after "php":
RewriteRule (^|/)mpn_engine\.php(.*)$ /index.html? [L,NC,R=301]
In the first case, the $, when present, tells Apache to match ONLY if "php" is at the end of the URL, so removing it lifts that restriction. In the second case, the rule tells Apache to match if "php" is followed by zero or more of any other characters up to the end of the URL. In either case, you do not need your second rewrite rule concerning "php3": either of the rules above will match those requests as well.
The reason your first example with the "%" worked but subsequent attempts gave 404 errors is because the server translates "%3F" to "?", and "?" has a special meaning for web servers and is essentially ignored by your regex matcher -- thus the server acts as if "php" is the final part of the URL, and the rewrite succeeds.
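The effect of the trailing $ on the three patterns can be checked in isolation with Python's re module. This only exercises the regular expressions; it says nothing about how Apache itself treats a malformed percent escape before the rules run. The sample paths are taken from or modelled on the question:

```python
import re

FLAGS = re.IGNORECASE  # stands in for the [NC] flag

original = re.compile(r"(^|/)mpn_engine\.php$", FLAGS)      # rule from the question
no_anchor = re.compile(r"(^|/)mpn_engine\.php", FLAGS)      # ending $ removed
wildcard = re.compile(r"(^|/)mpn_engine\.php(.*)$", FLAGS)  # wildcard added

samples = [
    "mpn_engine.php",
    "lang/mpn_engine.php3",
    "lang/mpn_engine.php%x=fred",
]

# For each sample: (original matches, no_anchor matches, wildcard matches)
results = {
    s: (bool(original.search(s)), bool(no_anchor.search(s)), bool(wildcard.search(s)))
    for s in samples
}

for sample, outcome in results.items():
    print(sample, outcome)
```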

How do I change ugly URLs to pretty URLs using .htaccess? [duplicate]

I need to grab some of my website's old URLs and do a 301 redirect to the new ones, since they are already indexed and we don't want to lose relevance after the change. The old URL is in fact very ugly, and for some reason everything I try to do to rewrite it does not work. Here it is:
http://www.mywebsite.com/ExibeCurso.asp?Comando=TreinamentoGeral&codCurso=136&Titulo=Como%20Estruturar%20um%20Sistema%20Gerencial%20de%20Controles%20Organizacionais,13
Basically, I need to translate it into something like:
http://www.mywebsite.com/curso/136
From the old URL I need to check if the user typed "ExibeCurso.asp"; then I know I must send him here: /curso. I must also grab the integer that was in the query string parameter "codCurso" (136). What is the regular expression I must use for this? I am using ISAPI_Rewrite 3, which basically implements htaccess on IIS, so there should be no difference in terms of syntax. Thanks.
Try this rule:
RewriteCond %{QUERY_STRING} ^([^&]*&)*codCurso=([0-9]+)(&.*)?$
RewriteRule ^/ExibeCurso\.asp$ /curso/%2? [L,R=301]
But I’m not sure whether ISAPI Rewrite requires the pattern to begin with a slash.
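The query-string pattern can be tried out on the URL from the question with Python's re module. The group numbering matches the %2 back-reference in the rule; the variable names are made up for this sketch:

```python
import re

# Same expression as in the RewriteCond above.
pattern = re.compile(r"^([^&]*&)*codCurso=([0-9]+)(&.*)?$")

# Query string of the old URL from the question.
query = (
    "Comando=TreinamentoGeral&codCurso=136"
    "&Titulo=Como%20Estruturar%20um%20Sistema%20Gerencial"
    "%20de%20Controles%20Organizacionais,13"
)

match = pattern.match(query)

# Group 2 is the number after codCurso=, i.e. what %2 refers to.
course_id = match.group(2)
target = "/curso/" + course_id

print(target)
```

Because the optional prefix group must end in "&", a parameter whose name merely ends in codCurso is not picked up by mistake.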
Off the top of my head, something like this should work:
RewriteRule ^ExibeCurso\.asp(.*)$ http://www.mywebsite.com/curso/$1 [L,R=301]
That would at least send the traffic to /curso/ with all parameters attached. Maybe it's best to process it from there.

Detecting language and keeping current url schema

Currently I just have one language in my site,
And I implemented the friendly URLs via the .htaccess, like:
RewriteRule ^post/(.+)/(.+) post.php?id=$2&friendly=1
So:
domain.com is the homepage and domain.com/the-title/5 is the page for the post with ID 5.
Now I would like to make that as the default language urls, and for example, next language would be:
domain.com/es is the homepage and domain.com/es/the-title/6 is the page for the post with ID 6 in spanish. (but previous rule should work, too)
Question is,
How should I adapt my (or additional) rewrite rules to check for the 2 first chars of the url (first split) and add it as a param, like: &lan=es and if it's not found then don't add this parameter?
Let's say:
^post/(.+)/(.+) post.php?id=$2&friendly=1 (english)
^es/post/(.+)/(.+) post.php?id=$2&friendly=1&lan=es (spanish)
But if possible,
it should just work with more languages (and add, if needed, the extra parameter),
and it should just work with other rules, like:
^es/photo/(.+)/(.+) photo.php?id=$2&friendly=1&lan=es (spanish)
Any suggestions?
Something like this might work. I haven't tested it but you can use RewriteCond to check for a specific structure of the uri and if it matches, use the following rule. If it doesn't then continue on to the original rule.
#Does the uri match 2 characters followed by /post/?
RewriteCond %{REQUEST_URI} ^/../post/
#then use this rule and stop processing rules
RewriteRule ^(..)/post/(.+)/(.+) post.php?id=$3&friendly=1&lan=$1 [L]
#Else use this rule
RewriteRule ^post/(.+)/(.+) post.php?id=$2&friendly=1&lan=en
Edit: I added a default language to the end of the second rule. This way there is always a $_GET['lan'] parameter. You could leave it off and set a default in php. Your choice, no difference.
I can only answer with general advice, because we need more context...
Use default pages to do a temporary redirect (302) to the default language or the user's language.
Always use the same scheme so the language comes from the same pattern (http://mydomain.com/en/mypage.php)
Use complete language codes if you will have a large audience or a lot of content, like en_US, fr_FR, fr_CA ...
Prefer negated character classes in your regex to avoid capturing the characters that follow, like "before/([^/]+)/after"; in some cases this is mandatory.
If you don't have the language information, the user is not coming from a valid URL; redirect them to a page with language information (default or user language).
If the user is using a direct PHP link, redirect them to the official link to avoid duplicate content. You can use $_SERVER['REQUEST_URI'] to check it.
Use a framework to manage it, or at least a base to control the routes.
With this advice, you could use just the following rewrite rule for your whole website:
RewriteRule ^([^\/]+)/([^\.]+)\.([^\.]+)$ index.php?lang=$1&route=$2&format=$3 [L,QSA]
Here I capture the language (es, en, en_US, fr...), the route (post/5, gotabeer, cats/postit/thumb/2) and the format (html, json, jpeg...).
(I didn't try the rewrite rule but it should work)
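To see how such a single catch-all pattern splits a URL, here is a small Python check. Note one assumption: the last group is written as ([^.]+), meaning anything but a dot, since a class that matches only dots could never capture a format such as html. The sample paths are invented:

```python
import re

# language / route . format
route = re.compile(r"^([^/]+)/([^.]+)\.([^.]+)$")

lang, path, fmt = route.match("en_US/cats/postit/thumb/2.json").groups()
print(lang, path, fmt)

# Why negated classes are preferred over (.+): the greedy version
# swallows extra slashes, the negated version stops at the next one.
sample = "post/the-title/5/extra"
greedy = re.match(r"^post/(.+)/(.+)", sample).groups()
strict = re.match(r"^post/([^/]+)/([^/]+)", sample).groups()
print(greedy, strict)
```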
Here is what I would suggest:
RewriteRule ^/?((en|es)/)?post/(.+)/(.+)$ post.php?id=$4&friendly=1&lan=$2
Where /? allows an optional forward slash at the beginning of the string. This lets the rule be moved interchangeably between .htaccess directory context and httpd.conf server context.
((en|es)/)? Allows for optional specification of one of two accepted language codes.
Note that I did not suggest a wildcard for the language part, as I assume you are only working with a known subset of languages, so using something other than a known language code (or missing the entire thing) should fall through to handling by other rules (or perhaps result in a 404).
If this is not the case you can change the first portion of the regex from (en|es) to (.{2}) if you expect exactly two characters, or perhaps (.{2}(-.{2})) if you expect to also handle language codes like es-ES.
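The group numbering of the suggested pattern is easy to get wrong, so here is a small Python sketch that applies it to a few paths. The rewrite helper is hypothetical, and an unmatched language group is assumed to expand to an empty string, as the lan=$2 substitution would when no prefix is present:

```python
import re

# Same expression as in the suggested RewriteRule.
rule = re.compile(r"^/?((en|es)/)?post/(.+)/(.+)$")

def rewrite(path):
    """Return the substituted target, or None if the rule does not match."""
    m = rule.match(path)
    if m is None:
        return None
    lang = m.group(2) or ""  # group 2 is the bare language code
    return "post.php?id=" + m.group(4) + "&friendly=1&lan=" + lang

print(rewrite("es/post/the-title/6"))
print(rewrite("post/the-title/5"))
print(rewrite("fr/post/the-title/7"))
```

The last call returns None because fr is not in the list of accepted codes, so such a request would fall through to other rules.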
This should work for you:
RewriteEngine On
RewriteRule ^([a-z]{2})/post/([^/]+)/([0-9]+)/?$ post.php?id=$3&friendly=1&lan=$1 [L,QSA]
RewriteRule ^post/([^/]+)/([0-9]+)/?$ post.php?id=$2&friendly=1&lan=en [L,QSA]

Is it possible to use .htaccess to send six digit number URLs to a script but handle all other invalid URLs as 404s?

Is it possible to use .htaccess to process all six digit URLs by sending them to a script, but handle every other invalid URL as an error 404?
For example:
http://mywebsite.com/132483
would be sent to:
http://mywebsite.com/scriptname.php?no=132483
but
http://mywebsite.com/132483a or
http://mywebsite.com/asdf
would be handled as a 404 error.
I presently have this working via a custom PHP 404 script but it's kind of kludgy. Seems to me that .htaccess might be a more elegant solution, but I haven't been able to figure out if it's even possible.
In your htaccess file, put the following
RewriteEngine On
RewriteRule ^([0-9]{6})$ /scriptname.php?no=$1 [L]
The first line turns the mod_rewrite engine on. The () brackets put the contents into $1 - successive () would populate $2, $3... and so on. The [0-9]{6} says look for a string precisely 6 characters long containing only characters 0-9.
The [L] at the end makes this the last rule - if it applies, rule processing will stop.
Oh, the ^ and $ mark the start and end of the incoming uri.
Hope that helps!
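The behaviour of the pattern on the example URLs from the question can be checked with Python's re module. The target_for helper is made up for this sketch; a return value of None stands for "rule does not apply", which in Apache would leave the request to end in a 404:

```python
import re

# Same expression as in the RewriteRule above.
six_digits = re.compile(r"^([0-9]{6})$")

def target_for(path):
    """Return the script URL for a six digit path, else None."""
    m = six_digits.match(path)
    return "/scriptname.php?no=" + m.group(1) if m else None

for candidate in ["132483", "132483a", "asdf", "1324830"]:
    print(candidate, target_for(candidate))
```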
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^([0-9]{6})$ scriptname.php?no=$1 [L]
</IfModule>
To preserve the clean URL
http://mywebsite.com/132483
while serving scriptname.php use only [L].
Using [R=301] will redirect you to your scriptname.php?no=xxx
You may find this useful http://www.addedbytes.com/download/mod_rewrite-cheat-sheet-v2/pdf/
Yes, it's possible with mod_rewrite. There are tons of good mod_rewrite tutorials online; a quick Google search should turn up your answer in no time.
Basically, you want to ensure that the regular expression you use looks only for digits and no other characters, and that the length is exactly 6. Then you'll redirect to scriptname.php?no= with the number you captured.
Hope this helps!
