Count downloads without `echo file_get_contents($file)`? - php

I currently have download links on my server that point directly to files. I have a set of quite complicated rewrite rules, but they don't affect what I am asking about.
What I want to do is count the number of downloads. I know I could write a PHP script that echoes the file content, with a rewrite rule so that the PHP script processes all downloads.
However, there are a few points that I am worried about:
There is a chance that some dangerous paths (e.g. /etc/passwd, ../../index.php) will not be blocked due to carelessness or unnoticed bugs
The script would need to handle HTTP 404 Not Found (and other error responses), which I would prefer to leave to Apache (I have an error handler script that relies on server redirect variables)
HTTP headers (like content type or modified time) may not be correctly set
Using a PHP script doesn't usually allow an HTTP 304 Not Modified response, so browser caching becomes useless and re-downloads consume extra bandwidth. Actually I can check for that myself, but it would require some more coding and debugging (a minimal sketch of such a check follows this list)
A PHP script uses more processing power than having Apache serve the file directly
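For reference, a minimal, hedged sketch of the conditional-GET check mentioned in the fourth point could look like this (the file path is a placeholder, and the script assumes the path has already been validated):
<?php
// Minimal If-Modified-Since check; $file is assumed to be an
// already-validated path to the downloadable file.
$file  = '/var/www/downloads/example.zip';  // hypothetical path
$mtime = filemtime($file);

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($file));
readfile($file);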
So I would like to find some other way to gather these statistics. Can I, for example, make Apache trigger a script when certain files (in certain directories) are requested and downloaded?

This may not be quite what you're looking for, but in the spirit of using the right tool for the job you could easily use Google Analytics (or probably any other analytics package) to track this. Take a look at https://support.google.com/analytics/bin/answer.py?hl=en-GB&answer=1136922.
Edit:
It would require the ability to modify the vhost setup for your site, but you could create a separate Apache log file for your downloads. Let's say you've got a downloads folder to store the files that are available for download; you could add something like this to your vhost:
SetEnvIf Request_URI "^/downloads/.+$" download
LogFormat "%U" download-log
CustomLog download-tracking.log download-log env=download
Now, any time something is requested from the /downloads/ folder, it will be logged in the download-tracking.log file.
A few things to know:
You can have as many SetEnvIf lines as you need. As long as they all set the download environment variable, the request will be logged to the CustomLog
The LogFormat I've shown will log only the requested URI, but you can easily customize it to log much more than that; see http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#logformat for more details.
If you're providing PDF files, be aware that some browsers/plugins will make a separate request for each page of the PDF so you would need to account for that when you read the logs.
The primary benefit of this method is that it does not require any coding, just a simple config change and you're ready to go. The downside, of course, is that you'd have to do some kind of log processing. It just depends what is most important to you.
Another option would be to use a PHP script and the readfile function. This makes it much easier to log requests to a database, but it does come with the other issues you mentioned earlier.
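As a rough sketch of that approach (the whitelist, table name, and database credentials below are all assumptions, not anything from the question):
<?php
// Hypothetical download.php: log the request to a database, then
// stream the file with readfile().
$allowed = array('report.pdf', 'setup.zip');   // whitelist of downloadable files
$name = isset($_GET['file']) ? basename($_GET['file']) : '';  // basename() drops any path parts

if (!in_array($name, $allowed, true)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$pdo->prepare('INSERT INTO downloads (file, ts) VALUES (?, NOW())')
    ->execute(array($name));

$path = __DIR__ . '/downloads/' . $name;
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
readfile($path);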
There are ways to pipe Apache logs to MySQL, but from what I've seen it can be tricky. Depending on what you're doing, it may be worth the effort... but then again it might not.

You can parse the Apache log files.
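For instance, with the single-%U LogFormat from the answer above, counting downloads is a few lines of PHP (the log file name is an assumption):
<?php
// Tally requests per URI from the custom download log.
$counts = array();
$lines = file('download-tracking.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $uri) {
    $counts[$uri] = isset($counts[$uri]) ? $counts[$uri] + 1 : 1;
}
arsort($counts);  // most-downloaded first
foreach ($counts as $uri => $n) {
    echo "$n  $uri\n";
}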

Apache's mod_lua is probably the most general, flexible, and efficient approach to hooking your own code into Apache's request processing. Usually you choose the language that offers the most direct approach to the task, and Lua is much better at interacting with C/C++ than almost anything else.
However there certainly are other strategies, so be creative. Two things come to my mind immediately:
some creative use of PAM, if you are on some sort of Unix-like system: configure a dummy authentication requirement and set up PAM to process it. Inside the PAM configuration you can do whatever you like. The advantage: you see the requests and can filter for yourself what to count and what not. You do have to make sure the PAM response does not create a valid session, so that you really get a tick for each request made by a client, not only the first one.
there are other Apache modules that allow hooking into request processing. Have a look at the forensic logging module (mod_log_forensic) or the external filter module (mod_ext_filter). Both let you hook external logic into request processing. You will need CLI-based PHP configured for that.
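As a very rough sketch of the external filter idea: on the Apache side you would define the filter with something like ExtFilterDefine counter mode=output cmd="/usr/bin/php /opt/scripts/count-filter.php" and apply it with SetOutputFilter counter (the paths here are made up). The CLI script then passes the response body through unchanged while recording a hit:
<?php
// count-filter.php: runs once per filtered response. Appends a
// timestamp to a counter file, then copies STDIN to STDOUT so the
// client still receives the file. The log path is hypothetical.
file_put_contents('/var/log/download-hits.log', time() . "\n", FILE_APPEND | LOCK_EX);
fpassthru(STDIN);  // pass the response body through untouched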

Related

PHP Sessions: Make non-PHP files require login

I am setting up a little page with a quite simple login system via PHP sessions. Business as usual: the correct username/password combination sets $_SESSION['login'] to true. When opening /page1.php or /page2.php, a few lines of PHP check the login state.
As I would like to have it nice and secure, I would also like to keep unauthorized visitors from accessing files other than .php, for instance my JavaScript or CSS files.
I thought of a few ways to do that:
htaccess/htpasswd is the most obvious option, but I am searching for something fancier. You know, having a custom UI etc...
mod_rewrite could redirect everything to a PHP file like fetch.php?url=script.js, which could then execute my PHP before echoing the content of script.js. But this way I would have to mess around with MIME types, and it would bypass all other kinds of .htaccess protection. Seems like a security risk to me. (A sketch of such a script follows this list.)
declaring an auto_prepend_file in my .htaccess would do a similar job, yet it does not create any MIME-type problems or security issues. I couldn't really get it to work on my server, though; it was probably deactivated by my hosting provider.
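For what it's worth, here is a minimal sketch of the fetch.php idea from the second bullet (the assets/ directory and the extension map are assumptions); it handles the MIME-type concern with a small lookup table and the path concern with realpath():
<?php
// Hypothetical fetch.php, called via mod_rewrite as fetch.php?url=script.js
session_start();
if (empty($_SESSION['login'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

$base = __DIR__ . '/assets/';                 // assumed asset directory
$url  = isset($_GET['url']) ? $_GET['url'] : '';
$path = realpath($base . $url);               // realpath() resolves ../ tricks

if ($path === false || strpos($path, $base) !== 0) {
    header('HTTP/1.1 404 Not Found');         // anything outside assets/ is refused
    exit;
}

$types = array('js' => 'application/javascript', 'css' => 'text/css');
$ext   = strtolower(pathinfo($path, PATHINFO_EXTENSION));
header('Content-Type: ' . (isset($types[$ext]) ? $types[$ext] : 'application/octet-stream'));
readfile($path);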
Do you have any additional idea? I assume this is a common problem, so there should be a solution for it. Thanks in advance!
To keep it consistent (not have one login via PHP and one via basic auth) you'll need to run all your assets through PHP. However, I highly recommend against this for several performance reasons:
It incurs a lot more latency to serve files via PHP than via the server daemon (nginx/Apache)
It adds unnecessary load on server CPU and memory
It ties up processes that could be used to serve more requests
You'll never be able to use a CDN with this logged-in-only requirement
The main suggestion in my answer is to rethink what you're doing that requires your client to be logged in to access CSS and JS assets. Are you putting passwords in the JS or something? If so, I recommend a deeper evaluation of your architecture rather than passing all assets through PHP.

How to implement hit counter for online video

I have placed a video file (mp4) on an Apache server, which will be accessed from an Android application. I need to know how many times the video has been viewed. The solutions I can think of are:
View the Apache logs. But I have very limited access to them.
Call a PHP file, then redirect to the video file.
Any other better solutions apart from above two?
The third option is to have a PHP file that registers the download and then delivers the file by reading it and sending it to the client.
(See http://www.gayadesign.com/diy/download-counter-in-php-using-htaccess/)
Performance-wise this is somewhat worse than either the log or redirect methods, but it is the most reliable, as the only way a client can access the file is via the PHP script. Furthermore, you can do this without any access to the logs (it is Apache-independent). You also have more control (e.g. you can count a download only once per IP), though the other methods allow that too, with some modifications. I am not sure there is any other way to do it effectively besides the two you've listed and the one I suggest; maybe there is a way with PHP or Apache extensions that I am just not aware of.
So either go with the redirect or this.
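A minimal sketch of the redirect variant (the counter file and video path are placeholders): the script records the hit, then lets Apache do the actual delivery. A flat file is fine for a sketch, though under real concurrency a database counter would be safer:
<?php
// counter.php: register one view, then redirect to the real file.
$count = (int) @file_get_contents('views.txt');   // 0 if the file doesn't exist yet
file_put_contents('views.txt', $count + 1, LOCK_EX);

header('Location: /videos/clip.mp4');             // Apache serves the video itself
exit;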

Replace AuthUserFile with custom PHP script in .htaccess

I've been using HTTP authentication through .htaccess files every time I've needed quick and dirty password protection for a complete directory (most of the time, in order to hide third-party apps I install for private use). Now I've written some PHP code to replace local passwords with OpenID. That allows me to get rid of HTTP auth in my PHP sites. However, I'm still trying to figure out a trick I can use for non-PHP stuff (from third-party programs to random stuff).
Apache does not seem to support authentication with custom scripts by default (and whatever I do has to work with my hosting provider). That leaves the obvious solution of using mod_rewrite to route everything through a PHP script that checks credentials and reads the target file, but (1) it looks like a performance killer, and (2) it will interfere with dynamic content, such as other PHP scripts.
I'm wondering whether there's a way to tune up the router approach so the script does not need to send the file, or if I'm overlooking some other approach. Any idea?
I think your mod_rewrite approach would be the only way to do this. But instead of using readfile() (as I guess you are, based on what you say about interfering with dynamic stuff such as other PHP scripts), you can just include() the files, so that raw files are written straight to output and PHP code is executed.
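A sketch of that idea, assuming a rewrite rule such as RewriteRule ^(.*)$ router.php?p=$1 [L] and an is_authenticated() helper that wraps the OpenID check (both names are made up):
<?php
// Hypothetical router.php: authenticate, then include() the target so
// PHP files execute while raw files go straight to output.
require 'openid-auth.php';   // assumed to define is_authenticated()

if (!is_authenticated()) {
    header('HTTP/1.1 401 Unauthorized');
    exit;
}

$root = realpath(__DIR__ . '/protected');   // assumed protected directory
$rel  = isset($_GET['p']) ? $_GET['p'] : '';
$file = realpath($root . '/' . $rel);

if ($file === false || strpos($file, $root . '/') !== 0) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

include $file;   // note: you would still need to set Content-Type for CSS/JS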
You may use PHP's HTTP authentication: http://php.net/manual/en/features.http-auth.php
If OpenID is all you need, consider using mod_auth_openid for Apache.

Routing .htaccess to GitHub

I was wondering if there was a way to basically host a site on your server so you can run PHP, but have the actual code hosted on GitHub. In other words...
If a HTTP request went to:
http://mysite.com/docs.html
It'd request and pull in the content (via file_get_contents() or something):
https://raw.github.com/OscarGodson/Core.js/master/docs.html
Or, if they went to:
http://mysite.com/somedir/another/core.js
It'd pull down:
https://raw.github.com/OscarGodson/Core.js/master/somedir/another/core.js
I know GitHub has its own DNS servers, but I'd rather host it on mine so I can run server-side code. What would the .htaccess code look like for this?
This is beyond the capabilities of .htaccess files, if the requirement is to run the PHP embedded in the HTML stored on github.com on the server at yourserver.com simply by adding a configuration line, like a redirect, to the .htaccess file.
A .htaccess file is typically used to provide directives to the Apache web server. These directives can indicate, for example, access permissions, popup password protection, linkages between URLs and the server's file system, handlers for certain types of files when fetched by the server before delivery to the browser, and redirects from one URL to another URL.
An .htaccess file can issue redirects from http://mysite.com/somedir/another/core.js to https://raw.github.com..., but then the browser will be pointed to raw.github.com, not mysite.com. Tricks can be done with frames to make this redirection less obvious to the person at the browser, but these don't change the fact that the data comes from github.com without ever passing through the server at mysite.com.
In particular, PHP tags embedded in the HTML on github.com are never received by mysite.com's server and therefore will not run. Probably not what you want. Unless some big changes have occurred in Apache, .htaccess files will not set up that workflow. It might be possible for some expert to write an Apache module to do it, but I am not sure.
What you can do is set up a cron job on mysite.com that runs git pull from github.com every few minutes. Perhaps that is what you want to do instead?
If the server can run PHP code, you can do this.
Basically, in the .htaccess file you use a RewriteRule to send all paths to a PHP script on your server. For example, a request for /somedir/anotherdir/core.js becomes /my-script.php/somedir/anotherdir/core.js. This is how a lot of app frameworks operate. When my-script.php runs, the "real" path is in the PATH_INFO variable.
From that point the script could then fetch the file from GitHub. If it was HTML or JavaScript or an image, it could just pass it along to the client. (To do things properly, though, you'll want to pass along all the right headers, too, like ETag and Last-Modified and then also check those files, so that caching works properly and you don't spend a lot of time transferring files that don't need to be transferred again and again. Otherwise your site will be really slow.)
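A bare-bones sketch of such a proxy, assuming a rule like RewriteRule ^(.*)$ my-script.php/$1 [L] (the extension map is an assumption, and a production version would forward and check the caching headers described above):
<?php
// my-script.php: fetch the requested path from GitHub and relay it.
$base = 'https://raw.github.com/OscarGodson/Core.js/master';
$path = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '/';

$body = @file_get_contents($base . $path);
if ($body === false) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// Naive content-type guess; a real version would also pass along
// ETag / Last-Modified so client caching keeps working.
$types = array('js' => 'application/javascript', 'css' => 'text/css', 'html' => 'text/html');
$ext   = strtolower(pathinfo($path, PATHINFO_EXTENSION));
header('Content-Type: ' . (isset($types[$ext]) ? $types[$ext] : 'text/plain'));
echo $body;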
If the file is a PHP file, you could download it locally, then include it into the script in order to execute it. In this case, though, you need to make sure that every PHP file is self-contained, because you don't know which files have been fetched from GitHub yet, so if one file includes another you need to make sure the files dependent on the first file are downloaded, too. And the files dependent on those files, also.
So, in short, the .htaccess part of this is really simple, it's just a single RewriteRule. The complexity is in the PHP script that fetches files from GitHub. And if you just do the simplest thing possible, your site might not work, or it will work but really painfully slowly. And if you do a ton of genius level work on that script, you could make it run OK.
Now, what is the goal here? To save yourself the trouble of logging into the server and typing git pull to update the server files? I hope I've convinced you that trying to fetch files on demand from GitHub will be even more trouble than that.

How can I protect my site from being leeched?

I am using PHP's header() function to send the file to the browser, with some small code. It works well, and I have it set up so that if anyone requests the file with a referer other than my site, they are redirected to a page first.
Unfortunately it's not working with Internet Download Manager.
What I want to know is how sites like RapidShare and 4shared do this.
You could use sessions to make sure the download is being requested by a valid user.
Not all browsers or other software that fetch web pages will send a Referer header to your server. Some sites build a browser "fingerprint", usually hashed, which might be the Referer, User-Agent and a couple of other headers strung together to make a unique identifier for that user, and thus restrict access as you describe.
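Such a fingerprint is straightforward to build; a hedged sketch (which headers to include, and the salt, are your choice):
<?php
// Build a simple hashed fingerprint from request headers.
$parts = array(
    isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
    isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
    isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '',
    $_SERVER['REMOTE_ADDR'],
);
$fingerprint = sha1('some-site-salt|' . implode('|', $parts));
// Store this with a one-time token when the download page is rendered,
// then compare on the actual download request.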
Of course, I may have completely missed the point of your post!
A typical design pattern is using a front controller to have a single entry point for all requests. By having a front controller, you can control exactly what the client sees.
You can configure this in Apache so that all requests go through a single file (it's been a while since I've done this because I now concentrate on Java). I think you would need to look at pathinfo documentation for Apache.
This might require a significant change in the rest of your application code. But, the code will be more secure and maintainable in the long run.
I've served images and other binary files through this pattern. This allowed me to easily verify users were authenticated before actually sending them the file. Obfuscation is not security, so if you rely on obfuscating your URL, an attacker may be delayed in getting in, but it is just a matter of time.
Walter
The problem probably is that sending the file through a PHP script (with the headers you mentioned) doesn't support starting the download at an arbitrary position. Download managers use this feature to download a file in several simultaneous threads (assuming the server limits each connection to a certain speed).
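If you want to keep serving through PHP, you can support resuming by honouring the Range header yourself. A simplified sketch that handles only the common "bytes=START-" form (multipart ranges are ignored; the file path is a placeholder):
<?php
// Serve a file with minimal support for resumed downloads.
$file  = '/var/www/files/big-download.zip';  // hypothetical path
$size  = filesize($file);
$start = 0;

header('Accept-Ranges: bytes');

if (isset($_SERVER['HTTP_RANGE']) &&
    preg_match('/bytes=(\d+)-/', $_SERVER['HTTP_RANGE'], $m)) {
    $start = (int) $m[1];
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-" . ($size - 1) . "/$size");
}

header('Content-Type: application/octet-stream');
header('Content-Length: ' . ($size - $start));

$fp = fopen($file, 'rb');
fseek($fp, $start);
fpassthru($fp);   // stream from the requested offset to the end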
For a small project I would recommend making a copy of the file with a unique filename just for the duration of the download, and redirecting the user to this copy. This way the user gets the full download features of the server, and it doesn't load the processor the way PHP does. Disadvantages: more disk space is required, and you need to clean up the download directory.
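A sketch of the copy-and-redirect idea (directory names are made up; you would still need a cron job or similar to purge old copies):
<?php
// Copy the file under a throwaway name, then redirect so Apache
// serves it with full resume/range support.
$source = '/var/www/files/big-download.zip';   // hypothetical source file
$token  = uniqid('dl_', true) . '.zip';

copy($source, __DIR__ . '/tmp-downloads/' . $token);

// Count the download here (database insert, log line, ...), then hand off.
header('Location: /tmp-downloads/' . $token);
exit;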
