Apache, PHP caching - php

I have set up an internal proxy of sorts using cURL and PHP. The setup is like this:
The proxy server is a rather cheap VPS (which has slow disk I/O at times). All requests to this server are handled by a single index.php script. The index.php fetches data from another, fast server and displays it to the user.
The data transfer between the two servers is very fast, and the only bottleneck is the disk I/O on the proxy server. Since there is only one index.php, I want to know:
1) How do I ensure that index.php is permanently "cached" in Apache on the proxy server? (Googling for php cache, I found many custom solutions that cache the "data" output by PHP. I want to know if there are any pre-built modules in Apache that will cache the PHP script itself.)
2) Is the data fetched from the backend server always in RAM/cache on the proxy server (assuming there is enough memory)?
3) Does Apache read any config files or other files from disk when handling requests?
4) Does Apache wait for logs to be written to disk before serving the content? If so, I will disable logging on the proxy server (or is there a way to ensure content is served first without waiting for logs to be written?).
Basically, I want to eliminate disk I/O altogether on the 'proxy' server.
Thanks,
JP

1) Install APC (http://pecl.php.net/apc). It compiles your PHP script once and keeps the compiled code in shared memory for the lifetime of the webserver process (or a given TTL).
2) If your script fetches data and does not cache/store it on the filesystem, it will be in RAM, yes. But only for the duration of the request. PHP uses a 'share-nothing' strategy which means -all- memory is released after a request. If you do cache data on the filesystem, consider using memcached (http://memcached.org/) instead to bypass file i/o.
3) If you have .htaccess support activated, Apache will search for those in each path leading to your php file. See Why can't I disable .htaccess in Apache? for more info.
4) Not 100% sure, but it probably does wait.
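As a minimal sketch of point 2, the proxy script could keep fetched responses in an in-memory store such as memcached instead of on disk. The function name fetchThroughCache, the key prefix, and the 60-second TTL are hypothetical and not part of the original setup; the cache and the fetcher are passed in as callables so the logic does not depend on one particular store.

```php
<?php
/**
 * Return the body for $url, consulting an in-memory cache first.
 *
 * $cacheGet(string $key)                          returns false on a miss
 * $cacheSet(string $key, string $value, int $ttl) stores the value
 * $fetch(string $url): string                     performs the real backend request
 */
function fetchThroughCache(string $url, callable $cacheGet, callable $cacheSet, callable $fetch, int $ttl = 60): string
{
    $key = 'proxy:' . md5($url); // short, fixed-length key (hypothetical prefix)

    $cached = $cacheGet($key);
    if ($cached !== false) {
        return $cached; // served from RAM, no backend round trip
    }

    $body = $fetch($url);
    $cacheSet($key, $body, $ttl);
    return $body;
}
```

With the Memcached extension, [$mc, 'get'] and [$mc, 'set'] of a connected Memcached object could be passed as the two cache callables, since Memcached::get returns false on a miss and Memcached::set takes key, value, and expiration.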

Why not use something like Varnish which is explicitly built for this type of task and does not carry the overhead of Apache?

I would recommend "tinyproxy" for this purpose.
It does everything you want very efficiently.

Related

Is Memcached caching static files like images, css, js..?

I've just installed Memcached and I'd like to know whether it can cache images, JS, CSS, font files, etc. on my server, or whether it only works with a scripting language.
Does it cache automatically, or does it have to be configured?
If not, how can I cache static files using PHP (exactly like variable values)?
No, it doesn't, but there is also zero need to do this on a properly configured server: frequently accessed files will already be in the operating system's cache / memory buffers, and especially if they're static and your server has enough memory, they will stay there for quite a while. Trying to serve them with Memcache will create MORE overhead, not less.
Your best option may be to use a caching layer like nginx for HTTP traffic (either as a proxy for apache or as the primary HTTP server). If you just want a proxy, Varnish is also a decent choice.
If you're stuck using Apache, here is a starting point for getting memory-based caching working: http://httpd.apache.org/docs/2.2/caching.html#inmemory
Also, you may want to look more into setting cache headers on your files so that multiple requests by the same users will not mean more file and network IO. This could be a bigger savings than explicitly caching things in memory as Linux will do some of that work for you.
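If files are delivered through a PHP script rather than directly by the web server, the cache headers mentioned above could be built along these lines. This is only a sketch: the function name staticCacheHeaders, the ETag scheme (modification time plus size), and the one-day default max-age are assumptions for illustration.

```php
<?php
/**
 * Build cache headers for a static file and decide between 200 and 304.
 * $ifNoneMatch is the value of the request's If-None-Match header, if any.
 */
function staticCacheHeaders(int $mtime, int $size, int $maxAge = 86400, ?string $ifNoneMatch = null): array
{
    // A cheap validator derived from modification time and size (hypothetical scheme).
    $etag = '"' . dechex($mtime) . '-' . dechex($size) . '"';

    $headers = [
        'Cache-Control: public, max-age=' . $maxAge,
        'ETag: ' . $etag,
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
    ];

    // If the browser already holds this version, answer 304 without a body.
    $status = ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) ? 304 : 200;

    return ['status' => $status, 'headers' => $headers];
}
```

The caller would pass each entry of 'headers' to header() and skip reading the file entirely when 'status' is 304, which is where the saving in file and network I/O comes from.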
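Convert the file to a string and save it in memcached:
<?php
// Assumes a memcached server on 127.0.0.1:11211; the key name is arbitrary.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
$memcached->set('image.jpg', file_get_contents('/path/to/image.jpg'));
?>
The docs are at http://php.net/manual/en/book.memcached.php.
You can save binary data this way, but memcached is more efficient for storing generated complex data, such as DB results.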

Flushing the HTML document early - with ini_set( 'zlib.output_compression','ON')?

My goal is to flush the header part of my website early, while my PHP script is stitching the rest of the page together, and to send the rest once it's done. It is important that the chunks are sent to the browser compressed. (I am using Apache/2.2 and PHP/5.3.4.)
Right now I am trying to achieve this by calling ini_set("zlib.output_compression", "On") in my PHP script. But if I use flush() anywhere in my script, even at the end, the compression no longer works.
Questions are:
a) With this method, will zlib flush the output buffer and send the compressed chunk to the browser once the size of this output buffer is reached?
b) If so, is there any way to finely control when zlib sends my chunk, other than by setting zlib's internal buffer size? The default is 4KB.
c) Are there any good alternatives to achieve an early compressed flush maybe with more fine control regarding the time when I want to flush it? Maybe I am totally on the wrong path :)
It's been a LONG time since I had to use zlib compression on OB (more on why later). However, let me try to convince you to turn OFF zlib compression on OB in PHP. First of all, a little background to ensure we are on the same page.
HOW DOES OB WORK
Every time PHP prints something without OB, it is sent straight to Apache and from Apache to the browser. With OB, the output is instead held in a buffer until it is flushed (to the browser) or until the script ends, at which point it is flushed automatically. This saves quite a lot of time and resources when generating a page, because the output reaches the Apache-to-browser stage in fewer, larger pieces.
WHY NOT TO USE OB COMPRESSION IN PHP
Why would you make PHP compress it? It should be the server's job to do such tasks (as well as compressing JS files, for example). What you "should" do to drastically free up Apache for processing PHP is to install NGINX as the public-facing front end. It's VERY easy to set up as a reverse proxy, and you can even install it on the SAME server as PHP and Apache.
So put NGINX on port 80 and Apache on, say, 8080 (and only allow nginx to connect to it. Don't worry if you leave it public for a little while: it was already public, and bypassing nginx is great for debugging, so no new security issues should arise, but I recommend you don't leave it public for too long). Then make nginx reverse proxy to Apache and cache all static files, which offloads them from Apache (because nginx serves them instead), meaning Apache can handle more PHP requests. Also get nginx to perform OUTPUT COMPRESSION ;) freeing up Apache and PHP to handle even more requests. As an added benefit, nginx can also serve static files much faster than Apache, uses much less RAM, and can handle many more connections.
Even an nginx newbie could get nginx set up after reading a few tutorials online and complete everything I just described within 1 day. 1 day well spent as well.
Remember to KEEP output buffering ON for PHP to Apache, however, but turn zlib compression OFF in PHP and enable it in nginx instead.
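If compression has to stay in PHP after all, one possible alternative for question c) is to compress each fragment incrementally and decide yourself when a fragment goes out. This is only a sketch: it assumes PHP 7 or later with the zlib extension (deflate_init and deflate_add are not available in the PHP 5.3 mentioned in the question), and the function name gzipChunks is hypothetical.

```php
<?php
/**
 * Compress page fragments one at a time so that each compressed chunk
 * can be sent as soon as its fragment is ready.
 */
function gzipChunks(iterable $fragments): Generator
{
    $context = deflate_init(ZLIB_ENCODING_GZIP);

    foreach ($fragments as $fragment) {
        // ZLIB_SYNC_FLUSH emits everything buffered so far, so the browser
        // can decode this fragment without waiting for later ones.
        yield deflate_add($context, $fragment, ZLIB_SYNC_FLUSH);
    }

    // Finish the stream: this writes the gzip trailer (checksum and length).
    yield deflate_add($context, '', ZLIB_FINISH);
}
```

Each yielded chunk could then be sent with echo followed by flush(), after a Content-Encoding: gzip header, with zlib.output_compression left off so the output is not compressed twice.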

PHP request file caching in load-balanced server environment

I'm looking to write a basic PHP file caching driver in a PHP application that routes all traffic to a front controller. For example's sake, assume the following simplified setup using apache mod_proxy_balancer:
In a single-server environment I would cache request responses on disk in a directory structure matching the request URI. Then, simple apache rewrite rules like the following could allow apache to return static cache files (if they exist) and avoid the PHP process altogether:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /front_controller.php [L]
Obviously, this is problematic in a load-balanced environment because the cache file would only be written to disk on the specific PHP server where the request was served up and the results cached.
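The single-server scheme described above could be sketched as follows on the PHP side. The function names cachePathForUri and writeCacheFile are hypothetical, and the sketch assumes the cache root is the document root, so that the -f test in the rewrite conditions finds the file on the next request.

```php
<?php
/**
 * Map a request URI to a cache file path below $cacheRoot, or return null
 * if the URI should not be cached (site root, "..", or an unparsable URI).
 */
function cachePathForUri(string $cacheRoot, string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || strpos($path, '..') !== false || strpos($path, "\0") !== false) {
        return null;
    }
    $path = trim($path, '/');
    if ($path === '') {
        return null; // "/" maps to a directory, not a file
    }
    return rtrim($cacheRoot, '/') . '/' . $path;
}

/** Write a rendered response where the web server can find it later. */
function writeCacheFile(string $file, string $body): bool
{
    $dir = dirname($file);
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    // Write to a temporary file, then rename, so readers never see a partial file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    return file_put_contents($tmp, $body) !== false && rename($tmp, $file);
}
```

The front controller would call writeCacheFile() with the rendered output at the end of a cacheable request; deleting the file invalidates the entry.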
Solving the issue ...
So, to solve this problem, I figured I could knock out some code to have the individual back-end PHP servers write/delete cache data to the load balancer. However, being mostly ignorant as to the capabilities of mod_proxy_balancer (and any other load balancing options, really), I need some outside verification for the following questions:
And the questions ...
Is it possible to do some form of checking like the above RewriteRules to have the front-facing load balancer serve up a static file before sending off requests to one of the backend servers?
Is this even advisable? Should the load balancer be allowed to route traffic exclusively and not be bothered with serving up static content?
Would it be better to just use an acceptable TTL on the cached files at the PHP server level and deal with an accepted level of stale cache overlap?
Finally, apologies if this is too broad or has already been answered; I'm not really sure what to search for as a result of my aforementioned ignorance on the load-balancing subject.
For the simplest solution, you can use NFS. Mount a file system via NFS on all of the PHP servers and it acts like local storage, but is the same for all servers. To get a little more sophisticated, use something like Nginx or Varnish that can cache what is on the NFS file system.
Memcache, a distributed memory-based storage system, is also a viable alternative. The nice thing about memcache is that you don't need to manage cache clearing or purging if you don't want to. You can set a TTL for each cached item, and if memcache gets full, it automatically purges cached items.
This sounds like something Nginx could do easily, and would remove the need to write to files on disk.
Nginx can do the load balancing and caching, here's a tutorial on it:
http://nathanvangheem.com/news/nginx-with-built-in-load-balancing-and-caching

Load balancing and APC

I am interested in a scenario where the webservers serving a PHP application are set up behind a load balancer.
There will be multiple webservers with APC behind the load balancer. All requests will have to go through the load balancer, which then sends it to one of the web servers to process.
I understand that memcached should be used for distributed caching, but I think having the APC cache on each machine cache things like application configurations and other objects that will NOT be different across any of the servers would yield even better performance.
There is also an administrator area for this application. It is also accessed via the load balancer (for example, site.com/admin). In a case like this, how can I call apc_clear_cache to clear the APC object cache on ALL servers?
Externally, you have one public IP that routes all requests to your load balancer, which distributes the load round robin. From the outside, you therefore cannot make a request to clear the cache on each server one at a time, because you don't know which server is being used at any given time. Within your network, however, each machine has its own internal IP and can be called directly. Knowing this, you can do some funny/weird things that do work externally.
A solution I like is to be able to hit a single URL and get everything done such as http://www.mywebsite/clearcache.php or something like that. If you like that as well, read on. Remember you can have this authenticated if you like so your admin can hit this or however you protect it.
You could create logic where you can externally make one request to clear your cache on all servers. Whichever server receives the request to clear cache will have the same logic to talk to all servers to clear their cache. This sounds weird and a bit frankenstein but here goes the logic assuming we have 3 servers with IPs 10.232.12.1, 10.232.12.2, 10.232.12.3 internally:
1) All servers would have two files called "initiate_clear_cache.php" and "clear_cache.php" that would be the same copies for all servers.
2) "initiate_clear_cache.php" would do a file_get_contents for each machine in the network calling "clear_cache.php" which would include itself
for example:
file_get_contents('http://10.232.12.1/clear_cache.php');
file_get_contents('http://10.232.12.2/clear_cache.php');
file_get_contents('http://10.232.12.3/clear_cache.php');
3) The file called "clear_cache.php" is actually doing the cache clearing for its respective machine.
4) You now only need to make a single request, such as http://www.mywebsite/initiate_clear_cache.php, and you are done.
Let me know if this works for you. I've done something similar in .NET and Node.js but haven't tried it in PHP yet; I'm sure the concept is the same, though. :)
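The fan-out in step 2 could be sketched like this. The function name clearCacheOnAllServers and the 3-second timeout are assumptions; the optional $request callable only exists so the HTTP call can be swapped out.

```php
<?php
/**
 * Ask every server in $hosts to clear its local cache.
 * Returns a map of host => true/false so failures are visible.
 */
function clearCacheOnAllServers(array $hosts, ?callable $request = null): array
{
    // Default requester: a plain HTTP GET with a short timeout, so one
    // unreachable server cannot stall the whole run.
    $request = $request ?? function (string $url): bool {
        $context = stream_context_create(['http' => ['timeout' => 3]]);
        return @file_get_contents($url, false, $context) !== false;
    };

    $results = [];
    foreach ($hosts as $host) {
        $results[$host] = $request('http://' . $host . '/clear_cache.php');
    }
    return $results;
}
```

initiate_clear_cache.php would call this with the three internal IPs, while clear_cache.php itself would call apc_clear_cache('user') and should only accept requests from internal addresses.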

Reduce PHP IO load

I have a PHP script that serves a lot of small files (>100,000), each up to 10 MB in size. It basically loads the requested file into memory and serves it to the client. Because of access control, I cannot serve these files by Apache directly and need a script wrapped around them.
If there is high traffic (>150 Mbit), my HDD is heavily used and becomes the limit for scaling. I had the idea of using memcached to reduce the HDD load, since I have 10 GB of RAM available, but memcached has a maximum item size of 1 MB. Then I thought I could use PHP-APC, but its behaviour when the cache runs out of memory (a complete reset) isn't acceptable.
What would you do to reduce the IO load?
Thanks
I have never worked with it myself, but the X-Sendfile method may be helpful for taking away a bit of the load. It passes the task of actually serving the file back to Apache.
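A minimal sketch of the X-Sendfile approach, assuming Apache with mod_xsendfile installed and enabled (XSendFile On). The function name xSendfileHeaders is hypothetical, and the access check itself is left to the existing script.

```php
<?php
/**
 * Build the response headers that hand a file over to mod_xsendfile.
 * Returns null if the requested name escapes $baseDir or does not exist.
 */
function xSendfileHeaders(string $baseDir, string $requested): ?array
{
    $base = realpath($baseDir);
    $file = realpath($baseDir . '/' . $requested);

    // realpath() resolves ".." and symlinks, so a simple prefix test is
    // enough to reject paths outside the base directory.
    if ($base === false || $file === false
        || strpos($file, $base . DIRECTORY_SEPARATOR) !== 0 || !is_file($file)) {
        return null;
    }

    return [
        'X-Sendfile: ' . $file,
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . basename($file) . '"',
    ];
}
```

After the access check passes, the script would send each header with header() and exit without reading the file; Apache then streams the file itself, so PHP never loads it into memory.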
I think you can't do this unless you have 2 HDD which would split these files.
I would use PHP-APC to load these files into the cache.
apc_add(), apc_fetch() and apc_delete() are what you want. You can ensure you don't overflow by using apc_cache_info() to determine free memory levels. You can also set apc.user_ttl INI setting to prevent total cache clearing on fill.
Set things up on a test server, subject it to high load (with ab or the like) and check your stats using apc.php. Tweak, tweak and tweak some more!
You could use a CDN that supports access control.
If you want to continue serving the files yourself, there are various approaches you could take, none of them very elegant. In every case you'll want to avoid serving the file through PHP, because that is your bottleneck.
Store the files outside of the HTTP root and generate a new symlink every X minutes. The symlinks are deleted after Y time. Your PHP authentication script would then simply redirect the user to the URL in which the (temporarily valid) symlink exists. Very short PHP execution time, files served by Apache.
Keep the files inside the HTTP root, but change the rewrite rules in a .htaccess file instead to achieve the same thing.
Reduce IO load by storing the most frequently accessed files in a ramdisk, integrate them with the regular file system by some mounting magic or symlinks and then let Apache handle the rest.
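The first approach above (temporary symlinks) could be sketched as follows. The function names and the random-token naming scheme are assumptions; the sketch needs PHP 7 or later for random_bytes() and a platform where symlink() is available.

```php
<?php
/**
 * Create a temporary, hard-to-guess symlink to $target inside $publicDir
 * and return the link name, or null on failure.
 */
function createTemporaryLink(string $target, string $publicDir): ?string
{
    $name = bin2hex(random_bytes(16)) . '-' . basename($target);
    return symlink($target, $publicDir . '/' . $name) ? $name : null;
}

/** Delete links in $publicDir older than $maxAge seconds; returns the count removed. */
function removeExpiredLinks(string $publicDir, int $maxAge, ?int $now = null): int
{
    $now = $now ?? time();
    $removed = 0;
    foreach (scandir($publicDir) ?: [] as $entry) {
        $path = $publicDir . '/' . $entry;
        // lstat() inspects the link itself rather than the file it points to.
        if (is_link($path) && $now - lstat($path)['mtime'] > $maxAge && unlink($path)) {
            $removed++;
        }
    }
    return $removed;
}
```

The authentication script would redirect to the returned link name with a Location header, and a cron job could call removeExpiredLinks() periodically. Apache must be allowed to follow symlinks in that directory (Options FollowSymLinks).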
You need either mod_xsendfile for Apache2 or nginx with X-Accel-Redirect. There is also a similar solution for lighttpd. Nginx can also serve from memcached.
If you're thinking about storing frequently used files in tmpfs, don't. That's not a real solution, because even if you serve files straight from disk, subsequent requests will hit the system cache and you'll get speed similar to tmpfs.
