.htaccess url_rewrite difficulties - php

I have a problem with the .htaccess configuration of a small website that I'm working on.
I want all pages to be redirected to index.php?page=REQUEST, and that file will look up the content for the requested page in the database.
The problem started when I installed a forum: I want the forum pages to be redirected to index.php?page=forum&params
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{REQUEST_URI} /(.*).html
RewriteRule ^(.*)forum/category/(.*)?$ index\.php?page=forum&lang=$1&category=$2 [L]
RewriteRule ^(.*)/(.*)(\.html?)$ index\.php?lang=$1&page=$2 [L]
RewriteRule ^(.*)(\.html?)$ index\.php?page=$1 [L]
Everything works fine, except the forum part. How do I need to change the .htaccess?

RewriteEngine on
RewriteRule \.(jpg|png|gif|svg|css|js)$ - [L]
RewriteRule ^(.*)/forum/topic/(.*)?$ index\.php?page=forum&lang=$1&topic=$2 [L]
RewriteRule ^(.*)/forum/category/(.*)?$ index\.php?page=forum&lang=$1&category=$2 [L]
RewriteRule ^(.*)/(.*)(\.html?)$ index\.php?lang=$1&page=$2 [L]
RewriteRule ^(.*)(\.html?)$ index\.php?page=$1 [L]

The problem appears to be that your RewriteCond is matching requests that end in .html. As your forum URLs don't end in .html the condition for the subsequent RewriteRule is never met.
There are some other possible problems too:
^(.*)forum will capture en/ (including the trailing slash) when it looks like you probably just want en
category/(.*) will match any characters, including forward slashes and the like. Presumably you just want it to match a decimal identifier.
Links to things that aren't covered by your rewrite config e.g. images
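The first two points can be checked outside Apache. Below is a minimal sketch using Python's re module as a stand-in for Apache's regex engine. The stricter pattern and the helper name captures are assumptions made for illustration, not part of the original rules.

```python
import re

# The forum pattern from the question, and a stricter variant (an assumption:
# one path segment for the language, digits only for the category).
LOOSE = re.compile(r"^(.*)forum/category/(.*)?$")
STRICT = re.compile(r"^([^/]+)/forum/category/([0-9]+)$")


def captures(pattern, path):
    """Return the captured groups for path, or None when there is no match."""
    match = pattern.match(path)
    return match.groups() if match else None


# In a .htaccess file the path is matched without its leading slash.
for path in ("en/forum/category/12", "en/forum/category/12/extra/path"):
    print(path, captures(LOOSE, path), captures(STRICT, path))
```

The loose pattern keeps the trailing slash in the language group and lets the category group swallow further path segments; the stricter one rejects such URLs outright.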
I'd probably rewrite your config to look something like this (N.B. not tested in Apache; only in a regex debugger):
RewriteEngine on
# only match forum URLs
# e.g. url.com/en/forum/category/12345
RewriteCond %{REQUEST_URI} ^/.+/forum/category/[0-9]+
# in .htaccess the RewriteRule pattern sees the path without its leading slash
RewriteRule ^(.+)/forum/category/([0-9]+) index.php?page=forum&lang=$1&category=$2 [L]
# match all URLs ending in .html
# e.g. url.com/en/foo.html
# and url.com/foo.html
RewriteCond %{REQUEST_URI} ^/.+\.html$
# a bit complicated, this matches both
# apage.html
# folder/apage.html
RewriteRule ^(?:(.+)/)?(.+)\.html$ index.php?lang=$1&page=$2 [L]
The second RewriteRule should always provide a value for page but only provide a value for lang if the URL is of the form /lang/page.html. This should be OK if your index.php file can accept an empty lang parameter or supply a default value.
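To see how the optional language group behaves, here is a rough sketch in Python. It assumes the per-directory (.htaccess) form of the pattern, where the path arrives without a leading slash, and it assumes that an unmatched group expands to an empty string in the substitution. The function name to_query is hypothetical.

```python
import re

# Per-directory form of the ".html" pattern: no leading slash on the path.
# The optional non-capturing group wraps the language prefix.
HTML_RULE = re.compile(r"^(?:(.+)/)?(.+)\.html$")


def to_query(path):
    """Mimic the substitution index.php?lang=$1&page=$2 for one path."""
    match = HTML_RULE.match(path)
    if match is None:
        return None
    lang = match.group(1) or ""  # assumption: unmatched group becomes empty
    return f"index.php?lang={lang}&page={match.group(2)}"


for path in ("en/foo.html", "foo.html", "en/forum/category/12"):
    print(path, "->", to_query(path))
```

A path without a language prefix still yields a page value, and lang comes through empty, which is the case index.php has to tolerate.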
Alternatively, if you don't mind keeping your existing regex and it's only images, CSS etc. that you want to bypass in URL rewriting, you can add a rule at the start to skip them, e.g.
RewriteEngine on
# don't actually rewrite, and stop processing rules
RewriteRule \.(jpg|png|css|js)$ - [L]
# only match forum URLs
# e.g url.com/en/forum/category/12345
RewriteCond %{REQUEST_URI} ^/.+/forum/category/[0-9]+
# in .htaccess the RewriteRule pattern sees the path without its leading slash
RewriteRule ^(.+)/forum/category/([0-9]+) index.php?page=forum&lang=$1&category=$2 [L]
etc...

Related

blocking crawlers on specific directory

I have a situation similar to a previous question that uses the following in the accepted answer:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
However, the rules from that answer seem to block access to everything (including the homepage level)
www.example.com/tbd_templates/
www.example.com/custom_post/
What I really need is to block access only to the directories I specified (/tbd_templates/, /custom_post/, etc.) with status code 403, while allowing access to the rest of the site structure.
My .htaccess is:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Can anyone help me?
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
As mentioned in the linked answer, this code would need to go in the .htaccess file inside the directory you are trying to protect - so that it only applies to everything in that directory (denoted by the .* regex).
However, that is impractical if you need to protect several directories. In this case you should change the RewriteRule pattern to target the specific subdirectories you want to protect (touched on in the linked answer, but no example given).
For example, the following would need to go before the WordPress code block (i.e., before the # BEGIN WordPress comment marker). (You do not need to repeat the RewriteEngine directive, which already occurs later in the file.)
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule ^(tbd_templates|custom_post)($|/) - [F]
The first argument to the RewriteRule directive (the pattern) is a regular expression that matches against the requested URL-path, less the slash prefix.
The regex ^(tbd_templates|custom_post)($|/) matches requests for /tbd_templates or /custom_post (using regex alternation) or /tbd_templates/<anything> or /custom_post/<anything>.
The F flag is short for R=403. The L flag is not required here, it is implied when using F (or R=403).
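The matching logic of these two directives can be sanity-checked outside Apache. The following is only a sketch in Python: the function name is_forbidden and the sample user-agent strings are made up for illustration, and Python's re module stands in for Apache's regex engine.

```python
import re

# Condition on the user agent; re.IGNORECASE plays the role of the NC flag.
BOT_UA = re.compile(r"(googlebot|bingbot|Baiduspider)", re.IGNORECASE)
# Pattern from the RewriteRule, matched against the path without its slash prefix.
PROTECTED = re.compile(r"^(tbd_templates|custom_post)($|/)")


def is_forbidden(url_path, user_agent):
    """Return True when the condition and the rule would both match (403)."""
    relative = url_path[1:] if url_path.startswith("/") else url_path
    return bool(BOT_UA.search(user_agent)) and bool(PROTECTED.search(relative))


sample_bot = "Mozilla/5.0 (compatible; Googlebot/2.1)"  # illustrative string
for path in ("/", "/tbd_templates/", "/custom_post", "/custom_post_archive/"):
    print(path, is_forbidden(path, sample_bot))
```

Only the two named directories (and anything below them) are refused for the listed crawlers; the homepage and similarly named directories such as /custom_post_archive/ stay reachable.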

Changing URLs when using $_GET to determine webpage

I currently use $_GET['base'] to determine which homepage the user visits.
This results in localhost/?base=administrator or localhost/?base=guest
I am also using this to control which page the user is on, such as
localhost/?base=guest&page=register
Is there any way to use mod_rewrite, or htaccess, to change how this system works?
Modifying my code is not an issue. Is this possible?
EDIT:
I am trying to achieve this:
localhost/?base=guest to localhost/guest
localhost/?base=admin to localhost/admin
localhost/?base=guest&page=register to localhost/guest/register
Below is my htaccess file
RewriteEngine On
RewriteRule ^([^/]*)/([^/]*)$ /?base=$1&page=$2 [L]
RewriteRule ^([^/]*)$ /?base=$1 [L]
Will the document path affect how it is called? I am using a switch/case to include whichever items are needed.
This works for localhost, however, but it loops every other address back to main.
RewriteEngine On
RewriteRule ^$ /index.php?base=guest[L]
But it did not give the expected result.
Your rules in .htaccess need to be in reverse order, like below:
RewriteRule ^([^/]*)/([^/]*)$ /?base=$1&page=$2 [L]
RewriteRule ^([^/]*)$ /?base=$1 [L]
That is because if it is kept in the order you have it, both localhost/?base=guest&page=register & localhost/?base=administrator will match the rule RewriteRule ^([^/]*)$ /?base=$1.
Having them in reverse order ensures that the first rule is matched only for localhost/?base=guest&page=register. It won't match the first rule for localhost/?base=administrator. I hope that helps.
You need to exclude your existing files and folders from the rule:
RewriteEngine On
# if the request is a dir
RewriteCond %{REQUEST_FILENAME} -d [OR]
# or file
RewriteCond %{REQUEST_FILENAME} -f
#do nothing
RewriteRule ^ - [L]
RewriteRule ^([^/]*)/([^/]*)$ /?base=$1&page=$2 [L]
RewriteRule ^([^/]*)$ /?base=$1 [L]
So you can use this simple code:
RewriteEngine on
RewriteRule ^(\w+)$ index.php?base=$1 [L]
RewriteRule ^(\w+)/(\w+)$ index.php?base=$1&page=$2 [L]
\w matches letters (a-z and A-Z), digits (0-9) and the underscore _. I think those characters are enough for your case, but it is easy to extend the character set if you need more.
Also, in this case you don't need to change your code, because you still get the base and page parameters in the $_GET array.
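As a quick check of what these two patterns accept, here is a small sketch in Python. The route helper is hypothetical, and re.ASCII is used on the assumption that Apache's \w is limited to ASCII word characters.

```python
import re

# The two patterns from the rules above, tried in the same order.
# re.ASCII keeps \w to a-z, A-Z, 0-9 and underscore.
RULES = [
    (re.compile(r"^(\w+)$", re.ASCII), ("base",)),
    (re.compile(r"^(\w+)/(\w+)$", re.ASCII), ("base", "page")),
]


def route(path):
    """Return the query parameters the first matching rule would produce."""
    for pattern, names in RULES:
        match = pattern.match(path)
        if match:
            return dict(zip(names, match.groups()))
    return None  # no rule matched, the request is left alone


for path in ("guest", "guest/register", "index.php", "guest/register/extra"):
    print(path, "->", route(path))
```

Because \w does not match a dot, the rewritten target index.php matches neither pattern, so the rules do not feed on their own output.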
UPDATE:
To disable the query string params page and base (other params may still be needed), add these two lines at the bottom of the code:
RewriteCond %{THE_REQUEST} (\?|&)(page|base) [NC]
RewriteRule .* - [L,R=404]

Remove File Extension and URL Variable using .htaccess

I'm building a small blog for myself and I found a useful .htaccess file to remove file extensions:
AddType text/x-component .htc
RewriteEngine On
RewriteBase /
# remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.php\ HTTP
RewriteRule (.*)\.php$ $1 [R=301]
# remove index
RewriteRule (.*)/index$ $1/ [R=301]
# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]
# add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1\.php
This works just fine and all pages show up without .php. I now wanted to extend this so that when I click a link to a specific blog post (say /blog/index.php?art=1) it just shows in the URL as website/blog/1. I thought I could tack this on to the end of the .htaccess file:
RewriteRule ^blog/(.*)$ blog/index.php?art=$0 [L]
But that doesn't seem to be working. EDIT: Actually it breaks the blog page, so no snippets are pulled through from the DB.
My .htaccess file is in the root directory and the blog files are at /root/blog/index.php
Any help would be greatly appreciated.
In mod_rewrite, backreferences to capture groups start at $1; $0 refers to the entire matched string (here blog/1), not to the first group. To access the first captured group, you should use $1, not $0.
The following should work:
RewriteRule ^blog/(.*)$ blog/index.php?art=$1 [L]
It might also be worthwhile to add some validation there. For example, you might only want numerical values passed to art, which also stops the rule from matching the already rewritten blog/index.php on a later pass through the rules:
RewriteRule ^blog/([0-9]+)$ blog/index.php?art=$1 [L]
Also, it might be worthwhile to add the QSA flag, since this will also preserve any query string that is passed in the original URL:
RewriteRule ^blog/([0-9]+)$ blog/index.php?art=$1 [L,QSA]
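The difference between $0 and $1, and the effect of restricting the group to digits, can be illustrated with Python's re module, where group(0) is likewise the whole match and group(1) the first capture group. This is only a sketch; the helper name art_param is made up.

```python
import re


def art_param(pattern, path, group):
    """Return what the given backreference would hold, or None if no match."""
    match = re.match(pattern, path)
    return match.group(group) if match else None


# $0 versus $1 with the original pattern
print(art_param(r"^blog/(.*)$", "blog/1", 0))
print(art_param(r"^blog/(.*)$", "blog/1", 1))

# (.*) also matches the rewritten target itself, while [0-9]+ does not
print(art_param(r"^blog/(.*)$", "blog/index.php", 1))
print(art_param(r"^blog/([0-9]+)$", "blog/index.php", 1))
```

With (.*), a request that has already been rewritten to blog/index.php still matches and would end up with art=index.php; the digits-only pattern leaves it alone.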

htaccess rewrite url and check for 404

I've got an online shop and want to use .htaccess to shorten links.
There are 3 kinds of URLs:
shop.com/shop/18 (number) - products.php?categoryid=$1
shop.com/shop/18/page-2 (number)/(page+number) - products.php?categoryid=$1&page=$2
shop.com/shop/18/9877 (number)/(number) - description.php?categoryid=$1&productid=$2
My attempt:
RewriteRule ^shop/?$ shop.php
RewriteRule ^shop/(.*)/([0-9]+)/?$ description.php?categoryid=$1&productid=$2
RewriteRule ^shop/(.*)/page-(.*)/?$ products.php?categoryid=$1&page=$2
RewriteRule ^shop/(.*)/?$ products.php?categoryid=$1
With my attempt, 1 works, 2 works, 3 doesn't work.
How can I rewrite the URLs to cover all three cases?
How can I redirect to a 404 page if, e.g., there is no such category or product (I guess by checking with PHP and MySQL and then redirecting)?
There are a number of ways that this can be dealt with:
All in htaccess (gets messy with multiple depths)
Combined htaccess and server side code
The best approach is the one that suits you based on how your store is coded. I personally feel that handling it in the server side code is better: it simplifies the htaccess file and gives you more control over validating data, and over what is sent where and how it's processed when it gets there.
For example, in my htaccess file I have:
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine on
#
# Do not apply rewrite rules for non required areas
RewriteCond %{REQUEST_URI} "/hidden-areas/" [OR]
RewriteCond %{REQUEST_URI} "/other-areas/"
RewriteRule (.*) $1 [L]
# Do Not apply if a specific file or folder exists
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The rules on how to rewrite the urls
RewriteRule (.*) /index.php?url=$1 [QSA,L]
</IfModule>
To explain this in a nutshell: I don't rewrite anything for certain folders; I forward those requests straight on. This is so that scripts called from outside, and separately installed systems in those folders, can be accessed without issue.
I then forward the entire URL as a string to my index page and deal with what comes through using PHP; an example is below.
// collect the passed url (empty string if the parameter is missing)
$url = $_GET['url'] ?? '';
// split the url into parts
$url_parts = explode('/', $url);
/*
* start sorting what is what in the url
*/
// count how many parts there are
$url_parts_count = count($url_parts);
// determine the class/module
$class = $url_parts[0]; // generally the class/method/module depending on your system, though it could be a category, so run some checks
// determine the index of the last part in the array
$last_url_part = ($url_parts_count - 1);
// set the last part of the url to be used
$slug = $url_parts[$last_url_part]; // generally the slug; it will be empty if there's a trailing slash
// etc.
This is just a summary; I do far more, as this is taken from a CMS I wrote, but it should give you a very good starting point should you wish to get your hands dirty. Of course, I'm happy to elaborate further if necessary.
The caveat of course, is if you are using an off-the-shelf system, they should provide you with this code already ;)
I have added something below based on your updated question; this will help if you still plan to go the way you are :)
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
#
# Do not apply rewrite rules for non required areas
RewriteCond %{REQUEST_URI} "/hidden-areas/" [OR]
RewriteCond %{REQUEST_URI} "/other-areas/"
RewriteRule (.*) $1 [L]
# Do Not apply if a specific file or folder exists
# RewriteCond %{REQUEST_FILENAME} !-f
# RewriteCond %{REQUEST_FILENAME} !-d
# The rules on how to rewrite the urls
RewriteRule ^([a-zA-Z0-9_-]+)$ /index.php?slug=$1 [QSA,L]
RewriteRule ^([a-zA-Z0-9_-]+)/$ /index.php?type=$1 [QSA,L]
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$ /index.php?type=$1&slug=$2 [QSA,L]
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)/$ /index.php?type=$1&cat=$2 [QSA,L]
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$ /index.php?type=$1&cat=$2&slug=$3 [QSA,L]
</IfModule>
Thanks, I ended up with:
RewriteRule ^shop$ shop.php [L]
RewriteRule ^shop/([0-9]+)$ products.php?categoryid=$1 [L]
RewriteRule ^shop/([0-9]+)/(page-[0-9]+)$ products.php?categoryid=$1&page=$2 [L]
RewriteRule ^shop/([0-9]+)/([0-9]+)$ description.php?categoryid=$1&productid=$2 [L]
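To see which of these four rules picks up which URL, here is a rough simulation in Python. The rewrite helper is hypothetical, and \1 and \2 stand in for $1 and $2; it only mimics first-match-wins behaviour with [L], not Apache itself.

```python
import re

# The four final rules, in order. The first rule that matches wins.
RULES = [
    (r"^shop$", r"shop.php"),
    (r"^shop/([0-9]+)$", r"products.php?categoryid=\1"),
    (r"^shop/([0-9]+)/(page-[0-9]+)$", r"products.php?categoryid=\1&page=\2"),
    (r"^shop/([0-9]+)/([0-9]+)$", r"description.php?categoryid=\1&productid=\2"),
]


def rewrite(path):
    """Return the rewritten target for path, or None if no rule matches."""
    for pattern, target in RULES:
        match = re.match(pattern, path)
        if match:
            return match.expand(target)
    return None


for path in ("shop", "shop/18", "shop/18/page-2", "shop/18/9877", "shop/abc"):
    print(path, "->", rewrite(path))
```

Note that the page parameter receives page-2 rather than 2, because the capture group includes the page- prefix; placing the parentheses around [0-9]+ only would pass just the number.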

.htaccess mod_rewrite check if querystring var is avail

Basically I want to rewrite my URLs so that they look like website.com/folder/, but sometimes I also need to rewrite website.com/folder/page/
Currently I have it working with just website.com/folder/, but I cannot get it to check whether there is a page. If I just create another rule under the folder one, it uses that one and gives me an empty page var, which breaks my PHP. I struggle with .htaccess, and any help would be appreciated.
Here is what I have that works with just the folder, but I cannot include a page:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/?(css|js|images|html|docs)/
RewriteRule ^([^/]*)/$ /?folder=$1 [QSA]
Here is what I tried in order to make it work with either just a folder, or a folder and a page:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/?(css|js|images|html|doc)/
RewriteRule ^([^/]*)/$ /?folder=$1 [QSA]
RewriteCond %{REQUEST_URI} !^/?(css|js|images|html|doc)/
RewriteRule ^([^/]*)/([^/]*)/$ /?folder=$1&page=$2 [L,QSA]
Please Help!
According to the RewriteRule docs, you should reverse the order of the rules in your rule set. Because both rules have the same RewriteCond in your configuration, the most specific rule (folder + page) should come first and the most general rule should come last. Otherwise, when the first rule matches, the URL is rewritten and the second rule never matches. Also, you probably want to remove the trailing forward slash from the pattern of your folder + page rule (assuming that the second group in the pattern matches a page, not a folder), and to use + rather than * in the groups so that a request for folder/ cannot match the folder + page rule with an empty page. Note too that the flag list must not contain spaces. So I think the whole thing should read:
RewriteCond %{REQUEST_URI} !^/?(css|js|images|html|doc)/
RewriteRule ^([^/]+)/([^/]+)$ /?folder=$1&page=$2 [L,QSA]
RewriteCond %{REQUEST_URI} !^/?(css|js|images|html|doc)/
RewriteRule ^([^/]+)/$ /?folder=$1 [L,QSA]
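One detail worth checking in the folder + page pattern is the quantifier: with * a group may be empty, so folder/ satisfies a two-group pattern with an empty page, whereas + requires at least one character. A small sketch in Python (the names STAR, PLUS and groups are made up for this illustration):

```python
import re

# Folder + page pattern with "*" (groups may be empty) and with "+"
# (each group needs at least one character).
STAR = re.compile(r"^([^/]*)/([^/]*)$")
PLUS = re.compile(r"^([^/]+)/([^/]+)$")


def groups(pattern, path):
    """Return the captured (folder, page) pair, or None when nothing matches."""
    match = pattern.match(path)
    return match.groups() if match else None


for path in ("folder/", "folder/page"):
    print(path, groups(STAR, path), groups(PLUS, path))
```

With the * version placed first, a plain folder/ request would already be consumed by the folder + page rule and arrive with an empty page parameter.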
