apache mod_rewrite to collect request info - php

I have some existing PHP code on my server. Now I want to log complete information about requests that come to my server, and I don't want to make any changes to the existing code, so I am using Apache mod_rewrite for this. I have a sample PHP script, stats.php, which looks something like this:
<?php
/* NOTE: This was pseudo code!!! A concrete version might look like: */
// open database connection (credentials are placeholders)
$db = new PDO('mysql:host=localhost;dbname=stats', 'user', 'password');
// add server info, referer info, script name and argument info to the database
$stmt = $db->prepare('INSERT INTO requests (server, referer, script, args) VALUES (?, ?, ?, ?)');
$stmt->execute(array(
    $_SERVER['SERVER_NAME'],
    isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
    $_SERVER['SCRIPT_NAME'],
    $_SERVER['QUERY_STRING'],
));
// convert characters in the request from UTF-16 to UTF-8 where needed
// (e.g. with mb_convert_encoding()), then redirect back to the requested URI
header('Location: ' . $_SERVER['REQUEST_URI']);
?>
In the httpd.conf file:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_URI} !/stats\.php
RewriteCond %{REQUEST_URI} !\/favicon\.php
RewriteRule ^/(.*)$ /stats.php?$1 [L]
RewriteLog "logs/error_log"
RewriteLogLevel 3
</IfModule>
The problem is, I am afraid this may not be best from an SEO perspective and may also be buggy. Are there any better ways to do this? For example, could I use a script that processes the access_log file instead?

Say for example, if you go to http://your-domain.com/some-page.html, you'll get a loop:
1. Browser contacts the server with request URI /some-page.html
2. mod_rewrite rewrites the URI to /stats.php?some-page.html
3. stats.php does its thing, then redirects the browser to /some-page.html
4. Browser contacts the server with request URI /some-page.html
5. Repeat, starting at step 2.
What you need to do, instead of responding with the Location: header, is to read the contents of the some-page.html file and return that to the browser, essentially "proxying" the request for the browser. The browser therefore doesn't get redirected.
As for how to do that in PHP, there are plenty of Google results and even plenty of answers on Stack Overflow.
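For a rough idea, here is a hedged sketch of such a "proxying" stats.php (the logging part is omitted and the paths are assumptions):
<?php
// stats.php - log the request, then serve the requested file directly
// instead of redirecting the browser back to it.
$requested = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$file = realpath($_SERVER['DOCUMENT_ROOT'] . $requested);
// ... open the database connection and record $requested here ...
// Only serve files that really live inside the document root.
if ($file !== false && strpos($file, $_SERVER['DOCUMENT_ROOT']) === 0 && is_file($file)) {
    readfile($file); // return the page content; no Location header, no loop
} else {
    header('HTTP/1.0 404 Not Found');
}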

I figured out what I should do. I did the following:
1) Added a custom LogFormat to the httpd.conf file.
2) Added a CustomLog directive and piped its output to stats.php.
3) stats.php takes care of adding the data to the database (a rough sketch of this setup follows below).
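A hedged sketch of what that can look like; the format string, file paths, table name and credentials below are assumptions, not the actual setup:
# httpd.conf - define a log format and pipe every log line to a PHP script
LogFormat "%h %t \"%r\" %>s \"%{Referer}i\" \"%{User-Agent}i\"" stats_format
CustomLog "|/usr/bin/php /path/to/stats.php" stats_format
stats.php then reads the piped lines from standard input and stores them:
<?php
// stats.php - runs for the lifetime of Apache, reading piped log lines
$db = new PDO('mysql:host=localhost;dbname=stats', 'user', 'password');
$stmt = $db->prepare('INSERT INTO access_log (line) VALUES (?)');
while (($line = fgets(STDIN)) !== false) {
    $stmt->execute(array(rtrim($line, "\n")));
}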

Related

Redirect any GET request to a single php script

After many hours messing with .htaccess I've arrived at the conclusion of sending any request to a single PHP script that would handle:
Generation of HTML (whichever way, includes or dynamic)
301 redirections with a lot more flexibility in the logic (for a dumb .htaccess-eer)
404 errors, finally, if the request makes no sense.
Leaving only minimal functionality in .htaccess.
After some tests it seems quite feasible and, from my point of view, preferable. So much so that I wonder: what's wrong, or what can go wrong, with this approach?
Server performance?
In terms of SEO I don't see any issue as the procedure would be "transparent" to the bots.
The redirector.php would expect a query string consisting of the actual request.
What would be the .htaccess code to send everything there?
I prefer to move all your PHP files into another directory and put only one PHP file in your htdocs path, which handles all requests. Files that you want to serve without PHP can also be placed in that folder, with this .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php/$0 [L]
Existing files (JPGs, JS or whatever) are still reachable without PHP. That's the most flexible way to do it (see the sketch after the example below).
Example:
- /scripts/ # Your PHP Files
- /htdocs/index.php # HTTP reachable Path
- /htdocs/images/test.jpg # reachable without PHP
- /private_files/images/test.jpg # only reachable over a PHP script
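For illustration, a minimal sketch of what the single /htdocs/index.php front controller could look like, following the directory names above (the routing itself is just a placeholder):
<?php
// /htdocs/index.php - the rewritten path arrives as PATH_INFO, e.g. /images/test.jpg
$path = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '/';
// Serve protected files that live outside the document root...
$base = realpath(__DIR__ . '/../private_files');
$file = ($base !== false) ? realpath($base . $path) : false;
if ($file !== false && strpos($file, $base) === 0 && is_file($file)) {
    readfile($file);
} else {
    // ...or hand the request over to the PHP files in /scripts/ here.
    echo 'Routing ' . htmlspecialchars($path);
}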
You can use this code to rewrite all requests to one file:
RewriteEngine on
RewriteRule ^.*$ myfile.php
Note that all requests (including stylesheets, images, ...) will be rewritten as well. There are of course other possibilities (rules), but this is the one I am using and it keeps the query string intact (a RewriteRule pattern never matches the query string; it is passed through to myfile.php automatically). If you don't need the query string, you can discard it with a trailing question mark:
RewriteEngine on
RewriteRule ^.*$ myfile.php?
This is a common technique as the bots and even users only see their requested URL and not how it is handled internally. Server performance is not a problem at all.
Because you rewrite all URLs to one PHP file, there is no 404 page anymore; every request is handled by your .php file. So make sure you handle invalid URLs correctly.
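A minimal sketch of that, assuming myfile.php acts as the front controller (the routing table is purely illustrative):
<?php
// myfile.php - emit a genuine 404 status for URLs you don't recognize
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$routes = array('/' => 'Home page', '/about' => 'About page');
if (isset($routes[$path])) {
    echo $routes[$path];              // render the real page here
} else {
    header('HTTP/1.1 404 Not Found'); // tell browsers and bots the page really doesn't exist
    echo 'Page not found';
}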

301 redirect in .htaccess for 30,000 errors

I've been tasked to clean up 30,000 or so URL errors left behind from an old website as the result of a redesign and development.
I normally use .htaccess to do this, but I doubt it would be wise to have 30,000 301 redirects inside the .htaccess file!
What methods have some of you used to solve this problem?
Thanks in advance.
Here is how you can do it with Apache httpd:
RewriteMap escape int:escape
RewriteMap lowercase int:tolower
RewriteMap my_redir_map txt:map_rewrite.txt
RewriteCond ${my_redir_map:${lowercase:${escape:%{HTTP_HOST}%{REQUEST_URI}}}} ^(.+)$
RewriteRule .* http://%1 [R=301,L]
I usually use these rewrite rules directly inside the Apache httpd configuration.
Inside the map_rewrite.txt file you have a tab-delimited file with the list of redirects in the following format:
www.example.it/tag/nozze www.example.it/categoria/matrimonio
www.example.it/tag/pippo www.example.it/pluto
www.example.it/tag/ancora www.google.com
It would be much easier if you could generalize the approach because the redirects follow a common pattern. But if not, you only need to add each redirected URL to the list.
Take care to study the RewriteMap configuration, because you can also store the list in a different format, for example as a database table.
Please pay attention to this: I have added escape and lowercase only because there are accents in the URLs I need to rewrite. If your URLs don't have accents, you can remove both.
If you want to implement these redirects in PHP, here is the code you need:
<?php
$dest_url = "http://example.com/path...";
header("HTTP/1.1 301 Moved Permanently");
header("Location: ".$dest_url);
exit;
Create a PHP page to operate as a 404 handler. It should inspect the incoming URL, check if it should map from an old page to a new page, then issue a 301. If there is no mapping then present a 404.
Simply set this page as the 404 handler in your .htaccess and there you go. IIRC this is how Wordpress used to handle 'clean' URLs on IIS before IIS7 brought in URL rewriting without needing a 3rd-party dll.
I have made a redirect class that is on the 404 page that will check the database if there is a valid page to 301 redirect to and redirect it instead of giving the 404 page. If it can't figure that out, it marks it in the database as a 404 page, so it can be fixed later.
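A hedged sketch of such a 404 handler; the mapping array and domain are placeholders, and the lookup could just as well be a database query:
<?php
// 404 handler - map old URLs to new ones, otherwise answer with a real 404
$old = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$map = array('/tag/nozze' => '/categoria/matrimonio'); // or query your redirect table
if (isset($map[$old])) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.it' . $map[$old]);
} else {
    header('HTTP/1.1 404 Not Found');
    echo 'Page not found'; // or include your normal 404 template
}
exit;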
Thanks for help guys. I've carried out the suggested course of action from freedev but have created a separate config file within Apache.
Within the httpd.conf file I have added:
# Map settings
Include "conf/extra/map.conf"
The map.conf file:
RewriteEngine On
RewriteMap url_rewrite_map txt:conf/map.map
RewriteCond ${url_rewrite_map:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^(.*) http://website.com/${url_rewrite_map:$1} [R=301]
The map.map file is formatted as:
/oldname/ /newname
I've added quite a few of the URLs for redirection and so far, so good; it isn't having the massive impact on the server that it had when they were added to .htaccess.

how to redirect to another page if a file exists in root?

I'm trying to make a script to put on all the pages of my site so that if a certain file exists in the root directory ("/") it will auto-redirect to it, and if the file isn't there it does nothing.
I'm using this so I can set up a maintenance mode for the site and take it down while I'm working on it. I have already made the maintenance page; I just don't know how to set up the script. The file name is maintenance.html and I only want it to live in the root directory. I don't want to have to upload it to every directory to take the site down.
The file URL would be http://domain.tld/maintenance.html, and the script should redirect to that file if it is there, and do nothing if it's not.
I know the redirect code is (in HTML)
<meta HTTP-EQUIV="REFRESH" content="0; url=http://domain.tld">
You should try this in your .htaccess file:
RedirectPermanent / /maintenance.html
RedirectPermanent /page2.html /maintenance.html
RedirectPermanent /anotherpage.html /maintenance.html
And so on. So just do this for each page of your site, on a new line for each.
This will redirect each of your pages right away to the maintenance page.
.htaccess is the best way to do it in my opinion. (better than JavaScript)
Hope this helps.
EDIT:
To use it, first you put:
RedirectPermanent
And then a space and then the page you want to redirect to the maintenance page:
/page.html
And then another space and then the page you want to redirect to:
/maintenance.html
So, all together, here's an example:
RedirectPermanent /page.html /maintenance.html
Note the space in between RedirectPermanent, the page redirecting from and the page redirecting to.
As for how it works internally, I don't know. This isn't a script; it's .htaccess configuration.
You can use something like:
if (file_exists($_SERVER['DOCUMENT_ROOT'] . '/maintenance.html'))
{
    // the file exists, so send visitors to the maintenance page
    header('Location: /maintenance.html');
    exit;
}
A better way would be to put a file named .htaccess in your root folder with the following content:
ErrorDocument 404 /maintenance.html
This automatically shows the maintenance page if the requested page does not exist.
A set of redirection rules for your webserver is what you need, methinks. If you're running Apache, mod_rewrite is the magic word, if you're running something else, well, then, I wouldn't know the magic word, but something similar exists for most servers, if not all.
But, using Apache's brilliant mod_rewrite, to redirect ALL traffic to a set page or address, e.g. during maintenance, is as simple as:
<IfModule mod_rewrite.c>
# Use mod_rewrite
RewriteEngine on
# If you want, you can exclude yourself by adding a condition for the redirection,
# i.e. if the RewriteCond matches, proceed with the RewriteRule
# This statement checks that the IP of the client isn't 123.456.789.012
RewriteCond %{REMOTE_ADDR} !123.456.789.012
# Redirect all traffic to /maintenance.html with a "307 Temporary Redirect",
# except traffic to the maintenance page.
RewriteCond %{REQUEST_FILENAME} !maintenance.html
RewriteRule .* /maintenance.html [R=307,L]
</IfModule>
Where should these instructions be, you ask? Well, since it's a temporary thing, the most logical place would be a .htaccess file in your webroot. But it's also possible to include the same in your server's/virtual host's global configuration, which for a permanent ruleset would make sense from an optimization standpoint.
To disable the redirection, it's enough to comment out either the RewriteEngine on statement or the RewriteRule statement. You could also rename your .htaccess to something else or delete it.
It's not very efficient to write your own server-side script to check for a file when your webserver can do it for you. Use Apache's mod_rewrite capability in an .htaccess file; you'll enable (i.e. uncomment) your rewrite rules when you want to put your page in maintenance mode. Doing it this way would also allow you to access the website while you work on it if you put in a rule to allow access from your own IP.
If this is free hosting -- which it seems like it is -- then you may not be able to do this, but I don't see why it would be a major issue to do it. Most webserver software has some sort of rewrite function, and this is a fairly trivial rewrite.
Alternatively you could use a quick-and-dirty bit of JavaScript similar to this (might not be exactly this):
<script type="text/javascript">location = 'http://www.yoursite.com/maintenance.html';</script>
It'd be better to use rewrites, though.

Deny ajax file access using htaccess

There are some scripts that I use only via ajax and I do not want the user to run these scripts directly from the browser. I use jQuery for making all ajax calls and I keep all of my ajax files in a folder named ajax.
So, I was hoping to create an htaccess file which checks for ajax request (HTTP_X_REQUESTED_WITH) and deny all other requests in that folder. (I know that http header can be faked but I can not think of a better solution). I tried this:
ReWriteCond %{HTTP_X_REQUESTED_WITH} ^$
ReWriteCond %{SERVER_URL} ^/ajax/.php$
ReWriteRule ^.*$ - [F]
But it is not working. What am I doing wrong? Is there any other way to achieve similar results? (I do not want to check for the header in every script.)
The Bad: Apache :-(
X-Requested-With is not a standard HTTP header.
You can't read it in Apache at all (neither by
ReWriteCond %{HTTP_X_REQUESTED_WITH}
nor by
%{HTTP:X-Requested-With}), so it's impossible to check it in .htaccess or the like. :-(
The Ugly: Script :-(
It's only accessible in the script itself (e.g. PHP), but you said you don't want to include a PHP file in all of your scripts because of the number of files.
The Good: auto_prepend_file :-)
But ... there's a simple trick to solve it :-)
auto_prepend_file specifies the name of a file that is automatically parsed before the main file. You can use it to include a "checker" script automatically.
So create a .htaccess in the ajax folder:
php_value auto_prepend_file check.php
and create check.php as you want:
<?php
if (empty($_SERVER['HTTP_X_REQUESTED_WITH'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
?>
You can customize it as you want.
I'm assuming you have all your AJAX scripts in a directory ajax, because you refer to ^/ajax/.php$ in your non-working example.
In this folder /ajax/ place a .htaccess file with this content:
SetEnvIfNoCase X-Requested-With XMLHttpRequest ajax
Order Deny,Allow
Deny from all
Allow from env=ajax
What this does is deny any request that does not carry the X-Requested-With: XMLHttpRequest header.
There are only a few predefined HTTP_* variables mapping to HTTP headers that you can use in a RewriteCond. For any other HTTP headers, you need to use a %{HTTP:header} variable.
Just change
ReWriteCond %{HTTP_X_REQUESTED_WITH} ^$
To:
ReWriteCond %{HTTP:X-Requested-With} ^$
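Combined with a corrected path check, the whole block would look something like this (an untested sketch in the same spirit as the question):
ReWriteCond %{HTTP:X-Requested-With} ^$
ReWriteCond %{REQUEST_URI} ^/ajax/.*\.php$
ReWriteRule ^.*$ - [F]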
Just check for if ($_SERVER['HTTP_X_REQUESTED_WITH'] == 'XMLHttpRequest') { at the beginning of the script; if the header isn't set, don't return anything.
edit
Here's why: http://github.com/jquery/jquery/blob/master/src/ajax.js#L370
edit 2
My bad, I just read through your post again. You can alternatively make a folder inaccessible to the web and then just have a standard ajax.php file that has include('./private/scripts.php'), as your server will still be able to access it, but no one will be able to view it from their browser.
An alternative to using .htaccess is to use the $_SERVER['HTTP_REFERER'] variable to test that the script is being accessed from your page, rather than from another site, etc.
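A brief sketch of that referer test (the domain is a placeholder; note that the Referer header can be faked just like X-Requested-With):
<?php
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (parse_url($referer, PHP_URL_HOST) !== 'your-domain.com') {
    header('HTTP/1.1 403 Forbidden'); // not called from one of our own pages
    exit;
}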

Protecting HTML files with .htaccess

My PHP app uses 404 Documents to generate HTML files so that multiple queries to the same HTML file only cause the generation to run once.
I'd like to intercept requests to the HTML files so that the user needs to have an established PHP Session in order to pull up the files.
Ideally, the session ID would be used in the URL so that it could serve as a further layer of authentication. For example, logging in would issue you a session ID and make only certain HTML files accessible to you.
I'm aware that by changing my cookies I could spoof a request, but that's fine.
How would I go about doing this?
Something like this could work (I haven't tested it):
RewriteCond %{HTTP_COOKIE} PHPSESSID=([a-zA-Z0-9]+)
RewriteCond %{REQUEST_FILENAME}-%1.html -f
# repeat the cookie condition so %1 in the rule refers to the captured session ID
RewriteCond %{HTTP_COOKIE} PHPSESSID=([a-zA-Z0-9]+)
RewriteRule ^ %{REQUEST_FILENAME}-%1.html [L]
It assumes that you append "-$session_id.html" to filenames ($session_id is PHP's session ID).
It should be safe, and the benefit is that files are served by the web server directly without invoking PHP at all.
SetEnvIf HTTP_COOKIE "PHPSESSID" let_me_in
<Directory /www/static/htmls>
Order Deny,Allow
Deny from all
Allow from env=let_me_in
</Directory>
Of course, a user can manually create such a cookie in their browser (there are extensions which do that for Firefox, and you can always edit your browser's cookie store).
You could use the Apache module mod_rewrite to redirect requests of .html URLs to a PHP script:
RewriteEngine on
RewriteRule \.html$ script.php [L]
The requested URI path and query are then available in the $_SERVER['REQUEST_URI'] variable.
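For example, script.php could look roughly like this; the session flag and cache directory are assumptions:
<?php
// script.php - serve a generated HTML file only when a PHP session is established
session_start();
if (empty($_SESSION['authenticated'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
$base = '/path/to/generated-html'; // wherever your generator writes the files
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$file = realpath($base . $path);
if ($file !== false && strpos($file, $base) === 0 && is_file($file)) {
    readfile($file);
} else {
    header('HTTP/1.1 404 Not Found');
}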
Put your cached files outside your web root, but still in a place where PHP can access them.
