Maybe I'm doing something stupid, but I can't get rid of an issue with htaccess.
I'm trying to match a function name in a documentation site and I'm getting errors I can't understand. I should point out that I (think I) know about regular expression escaping, and I know what dot and backslash-dot mean.
So: I want to allow all of these:
example.com/foofunction
example.com/foofunction.php
example.com/function.foofunction
example.com/function.foofunction.php
These are the lines that I've tried. I don't understand the ones that cause errors, so many thanks to anyone who can explain any of them to me:
^function\.([A-Za-z0-9_-]+)(\.php)?$ -> works, but makes function. mandatory
^(function\.)?([A-Za-z0-9_-]+)(\.php)?$ -> internal error... ok, let's not escape dot, in the end, it will match any character and will work...
^(function.)?([A-Za-z0-9_-]+)(\.php)?$ -> internal error too! ok, just for trying, dot outside conditional?
^(function)?\.([A-Za-z0-9_-]+)(\.php)?$ -> works, ok, but it makes dot mandatory. By the way, more crazy things:
^(function)?.([A-Za-z0-9_-]+)(\.php)?$ -> if dot isn't escaped (imagine I want to allow any character), internal error too. Now I'll try to make dot optional separately
^(function)?(\.)?([A-Za-z0-9_-]+)(\.php)?$ -> internal error too, I'm going crazy...
These are my tries up to now. I'm going to try an optional lookbehind and update with results... anyway, I'd love to understand why those regexes cause an internal error.
And if anyone knows of an "htaccess special regex exceptions" reference or something like that which I should read, it will be very welcome.
Thanks in advance to all of you guys.
Use non-capturing groups for everything apart from the actual function name:
^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$
Let's break that down:
^ # assert start of string
(?:function\.)? # optionally allow the string "function."
([A-Za-z0-9_\-]+) # capture the function name - this could be shortened to ([-\w]+)
(?:\.php)? # optionally allow the string ".php"
$ # assert end of string
So your .htaccess would look (I guess) something like this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$ doc.php?functionname=$1 [L,QSA]
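As a quick sanity check outside Apache, the same expression can be tried in Python. This is only a sketch: it assumes Python's re module treats this particular pattern the same way as the PCRE engine behind mod_rewrite, and that the rule sees the path without its leading slash (as it does in .htaccess). The names PATTERN, paths and captured are mine, not part of any Apache API.

```python
import re

# Same expression as in the RewriteRule above.
PATTERN = re.compile(r"^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$")

# The four URL forms from the question, as the rule would see them.
paths = [
    "foofunction",
    "foofunction.php",
    "function.foofunction",
    "function.foofunction.php",
]

# group(1) is what mod_rewrite would expose as $1.
captured = [PATTERN.match(p).group(1) for p in paths]
print(captured)
```

All four forms should yield the same captured name. Note that the pattern also matches doc.php itself (capturing doc), which matters for the rule's target.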
IMPORTANT POINT and the actual solution in this case:
You must use a sensible combination of RewriteCond and (usually) the [L] flag to ensure that the rule matches only once.
mod_rewrite behaves in a slightly counter-intuitive way that is not always immediately apparent: it keeps running the rules over and over until there are no more matches. So, let's say I use the rule outlined above:
RewriteRule ^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$ doc.php?functionname=$1
...and I supply to this rule the input function.myfunc.php. First, it will be rewritten to:
doc.php?functionname=myfunc
However, next time it will match again. And it will be rewritten to:
doc.php?functionname=doc
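...and this will keep happening over and over until Apache's internal redirect limit is reached (LimitInternalRecursion in current versions; older releases used RewriteOptions MaxRedirects), at which point Apache throws an error, which you will see on the client side as a 500 response.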
The solution to this depends on your exact use case, but a common solution (the one I used above) is to check whether the requested file exists before applying the rewrite rule. By doing this, on the second iteration the rule will not be applied, and the request will be allowed to fall through for further processing.
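The re-matching described above can be sketched as a toy model in Python. This is not how Apache is implemented internally; it only replays the description: the rule is re-applied to the rewritten path, and an optional "file exists" check stands in for RewriteCond %{REQUEST_FILENAME} !-f. EXISTING_FILES, rewrite and max_rounds are hypothetical names, and max_rounds merely imitates the server's internal redirect limit.

```python
import re

RULE = re.compile(r"^(?:function\.)?([A-Za-z0-9_\-]+)(?:\.php)?$")
EXISTING_FILES = {"doc.php"}  # hypothetical contents of the document root


def rewrite(path, check_exists, max_rounds=10):
    """Re-run the single rule until it stops applying or max_rounds is hit."""
    history = []
    for _ in range(max_rounds):
        # Stand-in for: RewriteCond %{REQUEST_FILENAME} !-f
        if check_exists and path in EXISTING_FILES:
            break
        match = RULE.match(path)
        if match is None:
            break
        history.append("doc.php?functionname=" + match.group(1))
        # The next round only sees the path, never the query string.
        path = "doc.php"
    return history


print(rewrite("function.myfunc.php", check_exists=False, max_rounds=3))
print(rewrite("function.myfunc.php", check_exists=True))
```

Without the check, every round after the first captures doc; with the check, the model stops after the first rewrite because doc.php exists.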
The [L] flag is also commonly (over)used - this causes the current iteration of the rewrite process to stop, and start again at the next iteration. It effectively does the same thing as continue does to a loop in PHP.
Since Apache 2.3 (2.3.9, to be exact), a much more useful flag for this situation is available: [END]. This gives the behaviour most people expect from [L]: it causes the rewrite process to halt immediately with no further iterations, like the break construct in PHP. Using this would mean that the aforementioned RewriteConds are no longer necessary. However, because this is only available in 2.3+, it can't be safely used unless you know for certain it will be available in every environment you run on.
Related
I am trying to trap old URLs of the form:
http://www.example.com/mpn_engine.php%3Ffamilyname%3Djiyalal+goswami%26menuopt%3D2%26submenuopt%3D1%26Search%3Dstuff
In my .htaccess file, with the help of various wise StackOverflowers as RegEx is alien to me, I have arranged to catch the PHP script 'mpn_engine.php' (both .php3 and newer .php copies) wherever it might be found (in any sub folder) and redirect visitors to the index page.
RewriteRule (^|/)mpn_engine\.php$ /index.html? [L,NC,R=301]
RewriteRule (^|/)mpn_engine\.php3$ /index.html? [L,NC,R=301]
The odd thing I am finding is that the above seems to work provided I request the php files exactly, or if I supply conventional parameters of the form:
http://www.example.com/lang/mpn_engine.php?x=fred
but as soon as I substitute a percent mark for the question mark, i.e. something like the following:
http://www.example.com/lang/mpn_engine.php%x=fred
The rewrite fails, and I get unpredictable results, usually a 404 but occasionally a 'Bad Gateway'.
How can I rewrite this ReWriteRule to catch this .php file in any folder it might be looked for and with any trailing characters, including a percent sign, and redirect it gracefully to the index page?
Thanks!
Your question has a number of sub-questions:
If you want to "catch this .php file in any folder it might be looked for" then as long as your .htaccess file is in the root folder of your website (and not in a subfolder), then you are covered.
If you want to cover ANY trailing character, then you can make one of two changes to your rewrite rule:
Remove the ending $:
RewriteRule (^|/)mpn_engine\.php /index.html? [L,NC,R=301]
or
Add a wildcard after "php":
RewriteRule (^|/)mpn_engine\.php(.*)$ /index.html? [L,NC,R=301]
In the original rule, the $ tells Apache to match ONLY if "php" is at the very end of the URL path, so removing it lets anything follow. The second version tells Apache to match if "php" is followed by zero or more of any other characters up to the end of the URL path. In either case, you do not need your second rewrite rule concerning "php3": either of the rules above will match those requests as well.
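The difference between the anchored rule and the two alternatives can be checked with a small Python sketch. It assumes re.IGNORECASE is a fair stand-in for the [NC] flag and that re.search mimics how RewriteRule looks for the pattern anywhere in the path. The sample paths are taken from the question; the variable names are made up for the illustration.

```python
import re

anchored = re.compile(r"(^|/)mpn_engine\.php$", re.IGNORECASE)        # original rule
unanchored = re.compile(r"(^|/)mpn_engine\.php", re.IGNORECASE)       # $ removed
wildcard = re.compile(r"(^|/)mpn_engine\.php(.*)$", re.IGNORECASE)    # (.*) added

paths = ["mpn_engine.php", "lang/mpn_engine.php3", "lang/mpn_engine.php%x=fred"]

# For each path: (anchored matches?, unanchored matches?, wildcard matches?)
results = {
    p: (bool(anchored.search(p)), bool(unanchored.search(p)), bool(wildcard.search(p)))
    for p in paths
}

for path, flags in results.items():
    print(path, flags)
```

Only the exact .php path satisfies the anchored pattern, while the two relaxed versions accept all three paths, including the .php3 one.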
The reason your example with a real "?" worked but the other attempts gave 404 errors is that "?" has a special meaning for web servers: it separates the path from the query string, and the query string is never seen by the RewriteRule pattern. The server therefore acts as if "php" is the final part of the URL, and the rewrite succeeds. An encoded "%3F", on the other hand, is decoded to a literal "?" that stays part of the path, so the $-anchored pattern no longer matches.
Let's say I have a web-page called www.mysite.com
How can I make it so whenever a page is loaded like www.mysite.com/58640 (or any random number) it redirects to www.mysite.com/myPHPpage.php?id=58640.
I'm very new to website development so I don't even really know if I asked this question right or what languages to tag in it...
If it helps I use a UNIX server for my web hosting with NetWorkSolutions
Add this to your .htaccess file in the main directory of your website.
RewriteEngine on
RewriteBase /
RewriteRule ^([0-9]+)$ myPHPpage.php?id=$1 [L]
Brief explanation: it says to match:
^ from start of query/page
[0-9] match numbers
+ any matches of 1 or more
$ end of page requested
The parentheses say to capture that part and store it. I can then refer to the captured value in the new URL as $1. If I had more than one parenthesized group then I would use $2, $3 and so on.
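To see the capture and substitution at work, here is a rough Python equivalent. It is only a sketch: Python writes the backreference as \1 where mod_rewrite uses $1, and the helper name rewrite is made up for the illustration.

```python
import re

RULE = re.compile(r"^([0-9]+)$")


def rewrite(path):
    """Return the rewritten target, or None if the rule does not apply."""
    match = RULE.match(path)
    if match is None:
        return None
    # \1 in the template plays the role of $1 in the RewriteRule.
    return match.expand(r"myPHPpage.php?id=\1")


print(rewrite("58640"))
print(rewrite("58640a"))
```

A purely numeric path is rewritten to the PHP page, and anything containing a non-digit is left alone.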
If you experience issues with the .htaccess file please refer to this as permissions can cause problems.
If you needed to capture something else, such as alphanumeric characters, you'd probably want to explore regex a bit. You can do things such as:
RewriteRule ^(.+)$ myPHPpage.php?id=$1 [NC,L]
which matches anything, or get more specific with things like [a-zA-Z0-9], etc. (Note that the flags must not contain a space: [NC, L] makes Apache reject the rule.)
Edit: and @Jonathon has a point. In your PHP file, wherever you handle $_GET['id'], be sure to sanitize it if it is used in anything resembling an SQL query or mail. Since you are using only numbers, that makes it easy:
$id = (int)$_GET['id']; // cast as integer - any weird strings will give 0
Keep in mind that if you are not going to use just numbers then you will have to look for some sanitizing function (which abound on Google - search for 'php sanitize') to ensure you don't fall to an SQL injection attack.
I know roughly how htaccess works, but I am always confused by the syntax, and I would appreciate it if anyone could help me solve the htaccess issue below.
I have a couple of pages linking to something like
http://mydomain.com.au/product-details.php/142/categoryAbstract
but due to the mistakes of a previous developer the images do not load unless the URL is
http://mydomain.com.au/product-details.html/142/categoryAbstract
He converted all the php pages to html (I really don't know what his intention was in doing that), but
now the URL should work even if it is http://mydomain.com.au/product-details.php/142/categoryAbstract
He used the htaccess rule below for this, but it's not working. If I manually change the URL from .php to .html, everything works fine.
RewriteRule ^product-details.html/(.*)/(.*)$ product-details.php?productid=$1&category=$2
I need a working line of code so that the URL http://mydomain.com.au/product-details.php/142/categoryAbstract works too.
You will just need an OR group (a|b) to account for both possibilities. Because that group is also a capture group, the product ID and category move to $2 and $3:
RewriteRule ^product-details\.(html|php)/(.*)/(.*)$ product-details.php?productid=$2&category=$3
#---------------------------^^^^^^^^^^^
That can be improved a little though. The (.*) are greedy matches. You are better served to use ([^/]+) as the first grouping to match everything up to the next /. I have also escaped the dot as \. so it is matched as a literal instead of any character.
RewriteRule ^product-details\.(html|php)/([^/]+)/(.*)$ product-details.php?productid=$2&category=$3
The .php extension is commonly hidden, either through rewriting or through actual file renaming plus server configuration to parse .html as .php, in order to keep some server-side information from end users, so that they cannot tell what technologies the site runs on the back end. It is less common to actually rename files to .html than to use URL rewriting to hide the .php, however.
RewriteRule ^product-details.html/(.*)/(.*)$ product-details.php?productid=$1&category=$2
What this rule does is take everything after product-details.html/ and before the last / as the first bit, and everything after the last / until the end of the line as the second bit. Then it takes those bits and puts them where the $1 and $2 are.
To change it so it accepts both .html and .php, you can use
RewriteRule ^product-details(\.html|\.php)/(.*)/(.*)$ product-details.php?productid=$2&category=$3
Because it looks like the first bit you are grabbing is a number, and (.*) is a greedy selector, it may be better to replace it with ([0-9]*), which will only select digits. That way, if you ever have /s in your category, you'll be fine, giving you:
RewriteRule ^product-details(\.html|\.php)/([0-9]*)/(.*)$ product-details.php?productid=$2&category=$3
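The effect of the greedy (.*) is easiest to see with a category that contains a slash. Below is a small Python sketch; the sample path is hypothetical, and Python's re is assumed to behave like mod_rewrite's regex engine for these patterns. The tuples list the groups in order, so the first entry corresponds to $1, the second to $2 and the third to $3.

```python
import re

greedy = re.compile(r"^product-details\.(html|php)/(.*)/(.*)$")
bounded = re.compile(r"^product-details\.(html|php)/([^/]+)/(.*)$")

# Hypothetical request whose category itself contains a slash.
path = "product-details.php/142/category/Abstract"

greedy_groups = greedy.match(path).groups()
bounded_groups = bounded.match(path).groups()

print(greedy_groups)
print(bounded_groups)
```

With the greedy version the product ID swallows part of the category, while the bounded version stops at the first slash. It also shows that the extension group occupies $1, so the ID and category are $2 and $3.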
I'm having a brain cramp. I'm using htaccess to rewrite a page and sometimes the variable that gets passed through will have a / (forward slash) in the variable. Sometimes there will be a slash and sometimes there won't but it is super important that all of this is treated as one variable. I'd really rather not reprogram all my pages with a str_replace() to switch a - for a / and then make a call to a database. For example:
http://www.example.com/accounting/finance.htm
Accounting/Finance is one variable that I need.....it is not in an accounting directory and then there's a page called finance.htm in accounting. So far I've got something like
RewriteRule ^([A-Za-z]+.*[A-Za-z]*)\.htm$ mypage.php?page=$1 [L,NC]
But it doesn't like it.
Can someone help me out?
Thanks in advance.
REPLY TO COMMENTS/ANSWERS
The specific rule that I'm looking for is something like this.....
[start of string]...1 or more letters...[possibility of a / followed by 1 or more letters].htm[end of string]
The two answers given below aren't working...I'm pretty sure it keeps treating it as a directory and not an actual "filename". As soon as I remove the forward slash the page works just fine...
If I get you right, you just need this one:
([A-Za-z/]*)\.htm
it should work with every combination of / or not-/
e.g.
accounting/finance.htm
test.htm
A slash is just another character. Apart from that, your regexp looks unnecessarily complex. For instance, .*[A-Za-z]* is not different from .* and also [A-Za-z] can be shortened to [a-z] if you use the [NC] flag.
Your precise rules are not entirely clear, but you probably want something along these lines:
RewriteRule ^([a-z/]+)\.htm$ mypage.php?page=$1 [L,NC]
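A quick way to convince yourself that the slash really is "just another character" is to try the expression outside Apache. This Python sketch assumes re.IGNORECASE corresponds to the [NC] flag; page_for is a made-up helper name.

```python
import re

RULE = re.compile(r"^([a-z/]+)\.htm", re.IGNORECASE)


def page_for(path):
    """Return what would end up in the page parameter, or None if no match."""
    match = RULE.match(path)
    return match.group(1) if match else None


print(page_for("accounting/finance.htm"))
print(page_for("test.htm"))
```

Both the path with a slash and the one without are captured as a single value. If the rule still misbehaves on the server, the cause is probably somewhere other than the regex itself.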
Is it possible to use .htaccess to process all six digit URLs by sending them to a script, but handle every other invalid URL as an error 404?
For example:
http://mywebsite.com/132483
would be sent to:
http://mywebsite.com/scriptname.php?no=132483
but
http://mywebsite.com/132483a or
http://mywebsite.com/asdf
would be handled as a 404 error.
I presently have this working via a custom PHP 404 script but it's kind of kludgy. Seems to me that .htaccess might be a more elegant solution, but I haven't been able to figure out if it's even possible.
In your htaccess file, put the following
RewriteEngine On
RewriteRule ^([0-9]{6})$ /scriptname.php?no=$1 [L]
The first line turns the mod_rewrite engine on. The () brackets put the contents into $1 - successive () would populate $2, $3... and so on. The [0-9]{6} says look for a string precisely 6 characters long containing only characters 0-9.
The [L] at the end makes this the last rule - if it applies, rule processing will stop.
Oh, the ^ and $ mark the start and end of the incoming uri.
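If you want to try the pattern before touching the server, here is a minimal Python sketch of the same match. It assumes the rule sees the path without the leading slash, as it does in .htaccess; target_for is a hypothetical helper, not anything Apache provides.

```python
import re

RULE = re.compile(r"^([0-9]{6})$")


def target_for(path):
    """Return the rewrite target for a six-digit path, else None (falls through to 404)."""
    match = RULE.match(path)
    return f"/scriptname.php?no={match.group(1)}" if match else None


for candidate in ["132483", "132483a", "asdf", "1324830"]:
    print(candidate, "->", target_for(candidate))
```

Only the exact six-digit path gets a target; the others are untouched, so Apache would handle them as ordinary missing files.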
Hope that helps!
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^([0-9]{6})$ scriptname.php?no=$1 [L]
</IfModule>
To preserve the clean URL
http://mywebsite.com/132483
while serving scriptname.php use only [L].
Using [R=301] will redirect you to your scriptname.php?no=xxx
You may find this useful http://www.addedbytes.com/download/mod_rewrite-cheat-sheet-v2/pdf/
Yes, it's possible with mod_rewrite. There are tons of good mod_rewrite tutorials online; a quick Google search should turn up your answer in no time.
Basically, what you're going to want to do is ensure that the regular expression you use looks only for digits, with no other characters, and that the length is exactly 6. Then you'll rewrite to scriptname.php?no= with the number you captured.
Hope this helps!