Apache Reverse Proxy - run a script first

Apache Reverse Proxy - run a script first - php

I'm experimenting with some video content delivery using VLC and Apache Reverse Proxy. Since VLC can support http streaming, I'm sure that it will work with a Apache Reverse Proxy (I haven't tried this yet, but I don't see why it wouldn't work).
Before letting Apache proxy the http video stream, I would like to run a script first. Is there an option in Apache to do this?
If not, can someone think of a way for PHP to do some magic first, and then somehow redirect to the http video stream, without making a VLC or Windows Media Player client cry? By doing it this way, the Apache Reverse Proxy would just have to point to the PHP script only.
Either way, the idea of the script it to start the VLC streaming server.
Thanks

if you really want to do it in apache you can always write your own module :)
alternatively you can use mod_rewrite with the prg option (rewrite map).
where you basically have a rewrite rule processed by an external program.
you can do whatever you want there (logging, etc).
don't forget to set a rewritelock file, or you will experience strange behaviour.
you could also do "everything" in php and then use the apache module mod_xsendfile where you just pass a header in php containing the locatin of the file in the filesystem.
it will not be disclosed to the client but catched by the apache module and served by apache. your php process will terminate regularely.
theese are the best out of the box options i can think of.
if nothing of this works because you need to catch some stuff during or at the end of the transfer you could just echo the files content with php. with correct output buffering you can achieve accetable performance on that.
or you could do some logfile postprocessing if this solves your problem.

Related

How do I set up my server to work with PHP

I want to use PHP with a server that i've coded my self. (Not Apache etc) I guess I have to send the http request and some additional data from my server to php, but I dont know how to make that connection and how to format that message.
I know I can run scripts with php.exe, the problem is that way it won't work with php sessions.

If you Google for the CGI specification, this will give you a reasonable idea of what environment variables to set up. Usually you exec the php process with the script filename as an argument, supplying it with the new environment variables. Are you familiar with fork, exec* system calls?
It's a little more complicated than that, since it involves fiddling with standard input, output and error streams. To give you an idea, check out the thttpd source code, in the file libhttpd.c, function cgi_child.
If your server source code is not in C or C++, you will have to dig around for spawning a child process to handle the PHP script being called in your server, and send the output to the browser. Some of that output will be HTTP headers, and it may stop executing after that (e.g. HTTP 2xx no content, HTTP 3xx redirect) in which case you send that back to the browser with \r\n\r\n to terminate the output.
To send the information to the PHP process, just set up the environment and possibly argument variables unless the process reads the environment variable SCRIPT_FILENAME you set up (see CGI environment variables). Change into the directory where the binary is or into the document root, whichever makes sense, and before spawning, handle incoming data (e.g. from POST request) and let the spawned process read it from stdin (from php-fpm, it probably reads from the socket you set up, but not entirely sure, so that's left as an exerciese).
It might be easier to run or spawn php-cgi which is php-fpm, which listens on a default or specified address and port. Run php-fpm -h for options and give it a whirl. This is definitely the way to go for session support I think. Also make sure the process knows where to look for the php.ini file.
As well as the CGI spec, it would be helpful to read all about HTTP. Or at least a good chunk of it.

Is it possible to return a HTTP request to apache after php processing?

What I want to do is the following:
A User initates a HTTP GET request for a static file (for example an image or an mp3 file for HTML5 audio)
A PHP script intercepts these requests (for example via mod_rewrite) and checks the cookie for valid authentification
If successful, the file is delivered to the User, or the User gets a 401
Now, it would be perfect and most simple if the PHP script could "pass through" a successful request and return control to Apache, so that Apache can deliver the file as if it was the original request. Since we have also caching and the need to deliver audio files (including chunked requests) this would free the php script from manually creating the response headers etc.
Is this possible at all?
And if not, what would be the easiest, simplest solution to achieve this, so that caching and (chunked) requests for audio files still works as intended?

I think what you want is X-Sendfile.
Indeed, you can use mod_rewrite to let PHP deal with whatever-mime files, and then return an error or the wanted file via the X-Sendfile header (note that you’ll have to activate the eponymous module within apache).
Didn’t test it though. It seems to be a pretty important issue. OwnCloud, for example, still use PHP to serve files...

Count downloads without `echo file_get_contents($file)`?

I am now having download links on my server that directly points to files. I have a set of quite complicated rewrite rules but they don't affect what I am asking for.
What I want to do is to count the number of downloads. I know I could write a PHP script to echo the content and with a rewrite rule so that the PHP script will process all downloads.
However, there are a few points that I am worried about:
There is a chance that some dangerous paths (e.g. /etc/passwd, ../../index.php) will not be blocked due to carelessness or unnoticed bugs
Need to handle HTTP 404 Not Found response (and others) in the script which I prefer letting Apache handle them (I have an error handler script that rely on server redirect variables)
HTTP headers (like content type or modified time) may not be correctly set
Using a PHP script doesn't usually allow HTTP 304 Unmodified response so that browser caching will be useless, and re-download can consume extra bandwidth Actually I can check for that, but would require some more coding and debugging.
PHP script uses more processing power than directly loading the file directly by Apache
So, I would like to find some other ways to perform statistics. Can I, for example, make Apache trigger a script when certain files (in certain directories) are being requested and downloaded?

This may not be quite what you're looking for, but in the spirit of using the right tool for the job you could easily use Google Analytics (or probably any other analytics package) to track this. Take a look at https://support.google.com/analytics/bin/answer.py?hl=en-GB&answer=1136922.
Edit:
It would require the ability to modify the vhost setup for your site, but you could create a separate apache log file for your downloads. Let's say you've got a downloads folder to store the files that are available for download, you could add something like this to your vhost:
SetEnvIf Request_URI "^/downloads/.+$" download
LogFormat "%U" download-log
CustomLog download-tracking.log download-log env=download
Now, any time something is requested from the /downloads/ folder, it will be logged in the download-tracking.log file.
A few things to know:
You can have as many SentEnvIf lines as you need. As long as they all set the download environment variable, the request will be logged to the CustomLog
The LogFormat I've shown will log only the URI requested, but you can easily customize that to log much more than just the URI, see http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#logformat for more details.
If you're providing PDF files, be aware that some browsers/plugins will make a separate request for each page of the PDF so you would need to account for that when you read the logs.
The primary benefit of this method is that it does not require any coding, just a simple config change and you're ready to go. The downside, of course, is that you'd have to do some kind of log processing. It just depends what is most important to you.
Another option would be to use a PHP script and the readfile function. This makes it much easier to log requests to a database, but it does come with the other issues you mentioned earlier.
There are ways to pipe Apache logs to MySQL, but from what I've seen it can be tricky. Depending on what you're doing, it may be worth the effort... but then again it might not.

You can parse the Apache log files.

Apaches mod_lua probably is the most general, flexible and effective approach to hooking own code into the request processing inside apache. Usually you chose that language for the task that offers the most direct approach. And lua is much better in teracting with c/c++ than anything else.
However there certainly are other strategies, so be creative. Two things come to my mind immediately:
some creative use of PAM if you are under some sort of unix like system: configure some kind of dummy authentication requirement and setup PAM for processing. Inside the PAM configuration you can do whatever you like. The avantage: you get requests and can filter yourself what to count and what not. You have to make sure the PAM response does not create a valid session though, so that you really get a tick for each request done by a client, not only the first one.
there are other apache modules that allow to do request processing. Have a look at the forensic module or the external filter module. Both allow to hook external logic into request processing. You will need cli based php configured for that.

PHP redirect file post stream

I'm writing an API using php to wrap a website functionality and returning everything in json\xml. I've been using curl and so far it's working great.
The website has a standard file upload Post that accepts file(s) up to 1GB.
So the problem is how to redirect the file upload stream to the correspondent website?
I could download the file and after that upload it, but I'm limited by my server to just 20MG. And it seems a poor solution.
Is it even possible to control the stream and redirect it directly to the website?

I preserverd original at the bottom for posterity, but as it turns out, there is a way to do this
What you need to use is a combination of HTTP put method (which unfortunately isn't available in native browser forms), the PHP php://input wrapper, and a streaming php Socket. This gets around several limitations - PHP disallows php://input for post data, but it does nothing with regards to PUT filedata - clever!
If you're going to attempt this with apache, you're going to need mod_actions installed an activated. You're also going to need to specify a PUT script with the Script directive in your virtualhost/.htaccess.
http://blog.haraldkraft.de/2009/08/invalid-command-script-in-apache-configuration/
This allows put methods only for one url endpoint. This is the file that will open the socket and forward its data elsewhere. In my example case below, this is just index.php
I've prepared a boilerplate example of this using the python requests module as the client sending the put request with an image file. If you run the remote_server.py it will open a service that just listens on a port and awaits the forwarded message from php. put.py sends the actual put request to PHP. You're going to need to set the hosts put.py and index.php to the ones you define in your virtual host depending on your setup.
Running put.py will open the included image file, send it to your php virtual host, which will, in turn, open a socket and stream the received data to the python pseudo-service and print it to stdout in the terminal. Streaming PHP forwarder!
There's nothing stopping you from using any remote service that listens on a TCP port in the same way, in another language entirely. The client could be rewritten the same way, so long as it can send a PUT request.
The complete example is here:
https://github.com/DeaconDesperado/php-streamer
I actually had a lot of fun with this problem. Please let me know how it works and we can patch it together.
Begin original answer
There is no native way in php to pass a file asynchronously as it comes in with the request body without saving its state down to disc in some manner. This means you are hard bound by the memory limit on your server (20MB). The manner in which the $_FILES superglobal is initialized after the request is received depends upon this, as it will attempt to migrate that multipart data to a tmp directory.
Something similar can be acheived with the use of sockets, as this will circumvent the HTTP protocol at least, but if the file is passed in the HTTP request, php is still going to attempt to save it statefully in memory before it does anything at all with it. You'd have the tail end of the process set up with no practical way of getting that far.
There is the Stream library comes close, but still relies on reading the file out of memory on the server side - it's got to already be there.
What you are describing is a little bit outside of the HTTP protocol, especially since the request body is so large. HTTP is a request/response based mechanism, and one depends upon the other... it's very difficult to accomplish a in-place, streaming upload at an intermediary point since this would imply some protocol that uploads while the bits are streamed in.
One might argue this is more a limitation of HTTP than PHP, and since PHP is designed expressedly with the HTTP protocol in mind, you are moving about outside its comfort zone.
Deployments like this are regularly attempted with high success using other scripting languages (Twisted in Python for example, lot of people are getting onboard with NodeJS for its concurrent design patter, there are alternatives in Ruby or Java that I know much less about.)

Routing .htaccess to GitHub

I was wondering if there was a way to basically host a site on your server so you can run PHP, but have the actual code hosted on GitHub. In other words...
If a HTTP request went to:
http://mysite.com/docs.html
It'd request and pull in the content (via file_get_contents() or something):
https://raw.github.com/OscarGodson/Core.js/master/docs.html
Or, if they went to:
http://mysite.com/somedir/another/core.js
It'd pull down:
https://raw.github.com/OscarGodson/Core.js/master/somedir/another/core.js
I know GitHub has their own DNS servers, but id rather host it on my so i can run server side code. What would the htaccess code look like for this?

This is beyond the capabilities of .htaccess files, if the requirement is to run the PHP embedded in the HTML stored on github.com at the server on yourserver.com simply by a configuration line like a redirect in the .htaccess file.
A .htaccess file is typically used to provide directives to the Apache web server. These directives can indicate, for example, access permissions, popup password protection, linkages between URLs and the server's file system, handlers for certain types of files when fetched by the server before delivery to the browser, and redirects from one URL to another URL.
An .htaccess file can issue redirects for http://mysite.com/somedir/another/core.js to https://raw.github.com.... but then the browser will be pointed to raw.github.com, not mysite.com. Tricks can be done with frames to make this redirection less transparent to the human at the browser... but these dont affect the fact that the data comes from github.com without ever going to the server at mysite.com
In particular, PHP tags embedded in the HTML on github.com are never received by mysite.com's server and therefore will not run. Probably not want you want. Unless some big changes have occurred in Apache, .htaccess files will not set up that workflow. It might be possible for some expert to write an apache module to do it, but I am not sure.
What you can do is put a cron job on mysite.com that git pull's from github.com every few minutes. Perhaps that is what you want to do instead?

If the server can run PHP code, you can do this.
Basically, in the .htaccess file you use a RewriteRule to send all paths to a PHP script on your server. For example, a request for /somedir/anotherdir/core.js becomes /my-script.php/somedir/anotherdir/core.js. This is how a lot of app frameworks operate. When my-script.php runs the "real" path is in the PATH_INFO variable.
From that point the script could then fetch the file from GitHub. If it was HTML or JavaScript or an image, it could just pass it along to the client. (To do things properly, though, you'll want to pass along all the right headers, too, like ETag and Last-Modified and then also check those files, so that caching works properly and you don't spend a lot of time transferring files that don't need to be transferred again and again. Otherwise your site will be really slow.)
If the file is a PHP file, you could download it locally, then include it into the script in order to execute it. In this case, though, you need to make sure that every PHP file is self-contained, because you don't know which files have been fetched from GitHub yet, so if one file includes another you need to make sure the files dependent on the first file are downloaded, too. And the files dependent on those files, also.
So, in short, the .htaccess part of this is really simple, it's just a single RewriteRule. The complexity is in the PHP script that fetches files from GitHub. And if you just do the simplest thing possible, your site might not work, or it will work but really painfully slowly. And if you do a ton of genius level work on that script, you could make it run OK.
Now, what is the goal here? To save yourself the trouble of logging into the server and typing git pull to update the server files? I hope I've convinced you that trying to fetch files on demand from GitHub will be even more trouble than that.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.