Htaccess: Trigger PHP or write to log

I want a redirect that is as fast as possible, so I decided to use an .htaccess redirect, because it responds before the PHP interpreter is even initialized. But I also want to log the redirects and write something to the database.
I tried to redirect and call RewriteMap just to trigger a PHP file, but it only throws a 500 error.
I would be OK with just creating a log file, even if the log processing is delayed. The only important things are: a fast redirect, and tracking/logging each redirect.
Have you got any ideas or recommendations on this?
Thank you in advance.
all the best,
emre

You could use RewriteLog to log rewriting actions to a file -- that would be done by Apache, without invoking PHP.
=> Quite fast; but it logs only to a file, not a database. Still, as you said, the log processing can be delayed and done by a script run from the crontab.
See also RewriteLogLevel, to configure how verbose that log should be.
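A minimal sketch of that crontab-driven processing might look like this (the log path, offset file, table name and DB credentials are illustrative assumptions, and log rotation is not handled):

<?php
// process_redirect_log.php -- run from cron; pushes new log lines into MySQL.
// Assumes a plain text log with one redirect entry per line.
$logFile    = '/var/log/apache2/redirect.log';   // assumed path
$offsetFile = '/var/tmp/redirect.log.offset';    // remembers how far the last run got

$offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;
$handle = fopen($logFile, 'r');
if ($handle === false) {
    exit(1);
}
fseek($handle, $offset);

$pdo  = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO redirects (line, logged_at) VALUES (?, NOW())');

while (($line = fgets($handle)) !== false) {
    $stmt->execute(array(rtrim($line, "\n")));
}

file_put_contents($offsetFile, ftell($handle));
fclose($handle);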

This can be used in the Apache/vhost config, but not in .htaccess (so when you put it there, remember to reload Apache):
RewriteEngine On
RewriteRule /foo http://www.example.com [R=301,E=redirected]
CustomLog /path/to/log combined env=redirected
'combined' is a default log format, but you can define your own

If your goal is simply to be as fast as possible while still logging, then the primary concern is keeping disk I/O to a minimum. By relying on .htaccess, you're doing a directory scan at each level of the URL (http://muffinresearch.co.uk/archives/2008/04/07/avoiding-the-use-of-htaccess-for-performance/).
If you could set up your RewriteRule in Apache's conf and point your redirects to a PHP file, then you could have PHP running with APC and have the logging done to a memcache object. That way the entire client access could occur purely in fast-access memory, and you could have a cron job that routinely takes the data from memcache and pushes it to the database (this way you'd still have long-term storage, but the client's access should never require the disk to be read).
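As a minimal sketch of such a redirect endpoint, assuming the pecl memcached extension (server address, key name and target URL are illustrative, not part of a tested setup):

<?php
// redirect.php -- log the hit into memcache so a cron job can drain it into
// the database later; the client only waits for the redirect itself.

// Send the redirect first; under PHP-FPM you could also call
// fastcgi_finish_request() right after this to release the client even sooner.
header('Location: http://www.example.com/target', true, 301);

$cache = new Memcached();
$cache->setOption(Memcached::OPT_COMPRESSION, false); // append() needs compression off
$cache->addServer('127.0.0.1', 11211);

// Queue the hit in memory as one line of JSON.
$hit = json_encode(array('uri' => $_SERVER['REQUEST_URI'], 'time' => time())) . "\n";
if (!$cache->append('redirect_hits', $hit)) {
    $cache->set('redirect_hits', $hit);   // first hit: key did not exist yet
}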
Obviously, if you're flexible on the database, you could use a Couchbase-style solution that would essentially give you the speed of writing to memcache without storing the information in volatile memory, but I'm guessing you're locked into the database you're currently using.

You might want to try an Apache log-to-MySQL module like:
http://www.outoforder.cc/projects/apache/mod_log_sql/
Here is a howto for Debian:
http://www.howtoforge.com/apache2-logging-to-a-mysql-database-with-mod_log_sql-on-debian-etch
However, I'm not 100% sure this works with the latest Apache.
Good Luck

Related

PHP Sessions: Make non-PHP files require login

I am setting up a little page with a quite simple login system via PHP sessions. Business as usual: the correct username/password combination sets $_SESSION['login'] to true. When /page1.php or /page2.php is opened, a few lines of PHP check the login state.
As I would like to have it nice and secure, I would also like to keep unauthorized visitors from accessing files other than .php, for instance my JavaScript or CSS files.
I thought of a few ways to do that:
htaccess/htpasswd is the most obvious option, but I am searching for something more fancy. You know, having a custom UI etc...
mod_rewrite could redirect everything to a PHP file like fetch.php?url=script.js, which could then execute my PHP before echoing the content of script.js (a rough sketch of what I mean follows below). But this way I would have to mess around with MIME types, and it would bypass all other kinds of .htaccess protection. Seems like a security risk to me.
declaring an auto_prepend_file in my .htaccess would do a similar job, yet it does not create any MIME-type problems or security issues. I couldn't really get it to work on my server, though; it is probably deactivated by my hosting provider.
Do you have any additional ideas? I assume this is a common problem, so there should be a solution for it. Thanks in advance!
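For illustration, here is a rough sketch of that fetch.php idea; the asset directory, rewrite rule and MIME table are assumptions, not working code from my setup:

<?php
// fetch.php -- sketch of the mod_rewrite gatekeeper idea above.
// Assumes an .htaccess rule that rewrites asset requests to this script
// with the relative path in the 'url' query parameter.
session_start();

if (empty($_SESSION['login'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

$relative = isset($_GET['url']) ? $_GET['url'] : '';
$base     = realpath(__DIR__ . '/assets');   // assumed asset directory
$target   = realpath($base . '/' . $relative);

// Refuse anything that resolves outside the assets directory.
if ($target === false || strpos($target, $base . DIRECTORY_SEPARATOR) !== 0) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// Crude MIME handling -- exactly the part that feels messy to me.
$types = array('js' => 'application/javascript', 'css' => 'text/css');
$ext   = strtolower(pathinfo($target, PATHINFO_EXTENSION));
header('Content-Type: ' . (isset($types[$ext]) ? $types[$ext] : 'application/octet-stream'));

readfile($target);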
To keep it consistent (not have one login via PHP and one via basic auth) you'll need to run all your assets through PHP. However, I highly recommend against this for several performance reasons:
It incurs a lot more latency to serve files via PHP vs. the server daemon (nginx/Apache)
It adds unnecessary load on server CPU and memory
It wastes time locking up processes that could be used to serve more requests
You'll never be able to use a CDN with this logged-in-only requirement
I think the main suggestion in my answer is to rethink what you're doing that requires your client to be logged in to access CSS and JS assets. Are you putting passwords in the JS or something? If so, I recommend a deeper evaluation of your architecture over passing all assets through PHP.

Count downloads without `echo file_get_contents($file)`?

I currently have download links on my server that point directly to files. I have a set of quite complicated rewrite rules, but they don't affect what I am asking for.
What I want to do is count the number of downloads. I know I could write a PHP script that echoes the content, plus a rewrite rule so that the PHP script processes all downloads.
However, there are a few points that I am worried about:
There is a chance that some dangerous paths (e.g. /etc/passwd, ../../index.php) will not be blocked due to carelessness or unnoticed bugs
I would need to handle the HTTP 404 Not Found response (and others) in the script, which I would prefer to let Apache handle (I have an error handler script that relies on server redirect variables)
HTTP headers (like content type or modified time) may not be correctly set
A PHP script doesn't usually send an HTTP 304 Not Modified response, so browser caching becomes useless and re-downloads consume extra bandwidth. (Actually I could check for that, but it would require some more coding and debugging.)
A PHP script uses more processing power than having Apache serve the file directly
So, I would like to find some other way to gather statistics. Can I, for example, make Apache trigger a script when certain files (in certain directories) are being requested and downloaded?
This may not be quite what you're looking for, but in the spirit of using the right tool for the job you could easily use Google Analytics (or probably any other analytics package) to track this. Take a look at https://support.google.com/analytics/bin/answer.py?hl=en-GB&answer=1136922.
Edit:
It would require the ability to modify the vhost setup for your site, but you could create a separate Apache log file for your downloads. Let's say you've got a downloads folder storing the files that are available for download; you could add something like this to your vhost:
SetEnvIf Request_URI "^/downloads/.+$" download
LogFormat "%U" download-log
CustomLog download-tracking.log download-log env=download
Now, any time something is requested from the /downloads/ folder, it will be logged in the download-tracking.log file.
A few things to know:
You can have as many SetEnvIf lines as you need. As long as they all set the download environment variable, the request will be logged to the CustomLog
The LogFormat I've shown will log only the URI requested, but you can easily customize that to log much more than just the URI, see http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#logformat for more details.
If you're providing PDF files, be aware that some browsers/plugins will make a separate request for each page of the PDF so you would need to account for that when you read the logs.
The primary benefit of this method is that it does not require any coding, just a simple config change and you're ready to go. The downside, of course, is that you'd have to do some kind of log processing. It just depends what is most important to you.
Another option would be to use a PHP script and the readfile function. This makes it much easier to log requests to a database, but it does come with the other issues you mentioned earlier.
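For example, a bare-bones version of that script might look like the following (the rewrite rule, base directory, table and credentials are illustrative assumptions):

<?php
// download.php -- bare-bones sketch of the readfile() approach above.
// Assumes a rewrite rule that maps download URLs to download.php?f=<relative path>.
$base = realpath('/var/www/files');                 // assumed download directory
$file = realpath($base . '/' . (isset($_GET['f']) ? $_GET['f'] : ''));

// Block traversal attempts such as ../../index.php or /etc/passwd.
if ($file === false || strpos($file, $base . DIRECTORY_SEPARATOR) !== 0) {
    http_response_code(404);
    exit;
}

// Count the download.
$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$pdo->prepare('INSERT INTO downloads (path, requested_at) VALUES (?, NOW())')
    ->execute(array(substr($file, strlen($base))));

// Serve the file with sensible headers; 304/conditional handling is omitted,
// which is one of the drawbacks the question already points out.
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($file));
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
readfile($file);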
There are ways to pipe Apache logs to MySQL, but from what I've seen it can be tricky. Depending on what you're doing, it may be worth the effort... but then again it might not.
You can parse the Apache log files.
Apache's mod_lua is probably the most general, flexible and effective approach to hooking your own code into the request processing inside Apache. Usually you choose the language that offers the most direct approach to the task at hand, and Lua is much better at interacting with C/C++ than anything else.
However there certainly are other strategies, so be creative. Two things come to my mind immediately:
some creative use of PAM, if you are on some sort of Unix-like system: configure some kind of dummy authentication requirement and set up PAM to process it. Inside the PAM configuration you can do whatever you like. The advantage: you see the requests and can filter for yourself what to count and what not. You have to make sure the PAM response does not create a valid session, though, so that you really get a tick for each request a client makes, not only the first one.
there are other Apache modules that allow you to hook into request processing. Have a look at the forensic logging module (mod_log_forensic) or the external filter module (mod_ext_filter). Both allow you to hook external logic into request processing. You will need CLI-based PHP configured for that.

PHP request file caching in load-balanced server environment

I'm looking to write a basic PHP file caching driver in a PHP application that routes all traffic to a front controller. For example's sake, assume a simplified setup using Apache mod_proxy_balancer: a front-facing load balancer proxying requests to several back-end PHP servers.
In a single-server environment I would cache request responses on disk in a directory structure matching the request URI. Then, simple apache rewrite rules like the following could allow apache to return static cache files (if they exist) and avoid the PHP process altogether:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /front_controller.php [L]
Obviously, this is problematic in a load-balanced environment because the cache file would only be written to disk on the specific PHP server where the request was served up and the results cached.
Solving the issue ...
So, to solve this problem, I figured I could knock out some code to have the individual back-end PHP servers write/delete cache data to the load balancer. However, being mostly ignorant as to the capabilities of mod_proxy_balancer (and any other load balancing options, really), I need some outside verification for the following questions:
And the questions ...
Is it possible to do some form of checking like the above RewriteRules to have the front-facing load balancer serve up a static file before sending off requests to one of the backend servers?
Is this even advisable? Should the load balancer be allowed to route traffic exclusively and not be bothered with serving up static content?
Would it be better to just use an acceptable TTL on the cached files at the PHP server level and deal with an accepted level of stale cache overlap?
Finally, apologies if this is too broad or has already been answered; I'm not really sure what to search for as a result of my aforementioned ignorance on the load-balancing subject.
For the simplest solution, you can use NFS. Mount a file system via NFS on all of the PHP servers and it acts like local storage, but is the same for all servers. To get a little more sophisticated, use something like Nginx or Varnish that can cache what is on the NFS file system.
Using memcache, a distributed memory-based storage system, is also a viable alternative. The nice thing about memcache is that you don't need to manage cache clearing or purging if you don't want to. You can set a TTL for each cached item, or if memcache gets full, it automatically purges cached items.
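A bare-bones sketch of that shared-memcache approach in the front controller might look like this (server names, key scheme, TTL and the render_page() stub are illustrative assumptions):

<?php
// front_controller.php -- sketch of the shared-memcache idea above, so that
// every balanced PHP node reads and writes the same cache.

function render_page($uri)
{
    // Placeholder for the real application dispatch.
    return "<html><body>Rendered " . htmlspecialchars($uri) . "</body></html>";
}

$cache = new Memcached();
$cache->addServers(array(
    array('cache1.internal', 11211),
    array('cache2.internal', 11211),
));

$key  = 'page:' . $_SERVER['REQUEST_URI'];
$html = $cache->get($key);

if ($html === false) {
    // Cache miss: build the response and store it with a TTL, so stale
    // entries expire on their own and no explicit purging is needed.
    $html = render_page($_SERVER['REQUEST_URI']);
    $cache->set($key, $html, 300); // five-minute TTL
}

echo $html;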
This sounds like something Nginx could do easily, and would remove the need to write to files on disk.
Nginx can do the load balancing and caching, here's a tutorial on it:
http://nathanvangheem.com/news/nginx-with-built-in-load-balancing-and-caching

Routing .htaccess to GitHub

I was wondering if there was a way to basically host a site on your server so you can run PHP, but have the actual code hosted on GitHub. In other words...
If a HTTP request went to:
http://mysite.com/docs.html
It'd request and pull in the content (via file_get_contents() or something):
https://raw.github.com/OscarGodson/Core.js/master/docs.html
Or, if they went to:
http://mysite.com/somedir/another/core.js
It'd pull down:
https://raw.github.com/OscarGodson/Core.js/master/somedir/another/core.js
I know GitHub has its own DNS servers, but I'd rather host it on mine so I can run server-side code. What would the .htaccess code look like for this?
This is beyond the capabilities of .htaccess files, if the requirement is to run the PHP embedded in the HTML stored on github.com on your own server simply via a configuration line like a redirect in the .htaccess file.
A .htaccess file is typically used to provide directives to the Apache web server. These directives can indicate, for example, access permissions, popup password protection, linkages between URLs and the server's file system, handlers for certain types of files when fetched by the server before delivery to the browser, and redirects from one URL to another URL.
An .htaccess file can issue redirects for http://mysite.com/somedir/another/core.js to https://raw.github.com.... but then the browser will be pointed to raw.github.com, not mysite.com. Tricks can be done with frames to make this redirection less transparent to the human at the browser, but these don't affect the fact that the data comes from github.com without ever going to the server at mysite.com.
In particular, PHP tags embedded in the HTML on github.com are never received by mysite.com's server and therefore will not run. Probably not what you want. Unless some big changes have occurred in Apache, .htaccess files will not set up that workflow. It might be possible for some expert to write an Apache module to do it, but I am not sure.
What you can do is put a cron job on mysite.com that does a git pull from github.com every few minutes. Perhaps that is what you want to do instead?
If the server can run PHP code, you can do this.
Basically, in the .htaccess file you use a RewriteRule to send all paths to a PHP script on your server. For example, a request for /somedir/anotherdir/core.js becomes /my-script.php/somedir/anotherdir/core.js. This is how a lot of app frameworks operate. When my-script.php runs, the "real" path is in the PATH_INFO variable.
From that point the script could then fetch the file from GitHub. If it is HTML or JavaScript or an image, it could just pass it along to the client. (To do things properly, though, you'll want to pass along all the right headers too, like ETag and Last-Modified, and also honor the matching conditional request headers, so that caching works properly and you don't spend a lot of time transferring files that don't need to be transferred again and again. Otherwise your site will be really slow.)
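For illustration, a stripped-down my-script.php along those lines might look like this; the repository URL comes from the question, while error handling, MIME mapping and the ETag/Last-Modified support mentioned above are deliberately left out:

<?php
// my-script.php -- stripped-down sketch of the proxy approach described above.
// Assumes an .htaccess RewriteRule sends all paths here so the original
// request path arrives in PATH_INFO.
$path = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '/';

// Refuse anything that tries to escape the repository tree.
if (strpos($path, '..') !== false) {
    http_response_code(400);
    exit;
}

// Repository URL taken from the question.
$remote = 'https://raw.github.com/OscarGodson/Core.js/master' . $path;

$ch = curl_init($remote);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
));
$body   = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$type   = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);

if ($body === false || $status !== 200) {
    http_response_code(404);
    exit;
}

// GitHub's raw host often reports text/plain, so a real version would map
// file extensions to MIME types and pass caching headers as noted above.
header('Content-Type: ' . ($type ? $type : 'application/octet-stream'));
echo $body;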
If the file is a PHP file, you could download it locally, then include it into the script in order to execute it. In this case, though, you need to make sure that every PHP file is self-contained, because you don't know which files have been fetched from GitHub yet; if one file includes another, you need to make sure the files it depends on are downloaded too, and the files those files depend on, and so on.
So, in short, the .htaccess part of this is really simple: it's just a single RewriteRule. The complexity is in the PHP script that fetches files from GitHub. If you just do the simplest thing possible, your site might not work, or it will work but painfully slowly. And if you do a ton of genius-level work on that script, you could make it run OK.
Now, what is the goal here? To save yourself the trouble of logging into the server and typing git pull to update the server files? I hope I've convinced you that trying to fetch files on demand from GitHub will be even more trouble than that.

Deny access to a certain folder using PHP

Is it possible to "deny from all", Apache .htaccess style, using PHP?
I can't use .htaccess because I'm using a different webserver, so I want to use PHP to work around it.
So let's say a user is trying to access the folder named 'david'; all of its content and subdirectories should be denied from viewing.
No.
PHP cannot be used to protect folders.
It is not PHP that serves requests, but the web server.
You can move this directory above the document root to prevent web access to it.
But permissions alone will not help you.
Use chmod to change the permissions on that directory. Note that the user running PHP needs to own it in that case.
If you just want to prevent the folder from being indexed (listed), you can create an index.php file that does a simple redirection. Note: requests for a valid filename will still be let through.
<?php
header("Location: /"); // redirect the user to the root directory
exit;                  // stop any further output
Without cooperation from the webserver, the only way to protect your files is
to encrypt them, in an archive maybe, of which your script would know the password and tell no one - that will end up wasting CPU, as the server will be decrypting it all the time, or
to use an incredibly deranged file naming scheme, a file naming scheme you won't ever describe to anyone, and that only your PHP script can sort through.
Still, data could be downloaded, bandwidth could go to waste, and encrypted files could be decrypted.
It all depends on how much that data matters. And how much your time costs, as these convoluted layers of somewhat penetrable obfuscation will likely eat huge chunks of developer time.
Now, as I said... that would be without cooperation from the webserver... but what if the webserver is cooperating and doesn't know?
I've seen some Apache webservers (can anyone confirm it's in the standard distribution?) come preloaded with a rule denying access to files whose names start with .ht: not only .htaccess but everything similar - .htproxy, .htcache, .htwhatever_comes_to_mind, .htyourmama...
Chances are your server could be one of those.
If that's the case... rename your hidden files .hthidden-<filename1>, .hthidden-<filename2>... and you'll get access to them only through PHP file functions, like readfile().
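For illustration, a tiny sketch of that trick (the file name is an assumption):

<?php
// Relies on the stock Apache rule that refuses to serve files matching ^\.ht,
// so this file can never be fetched directly by URL, while PHP can still
// read and stream it after doing whatever checks you like.
$secret = __DIR__ . '/.hthidden-data.csv';

if (!is_file($secret)) {
    http_response_code(404);
    exit;
}

header('Content-Type: text/csv');
readfile($secret);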
