Reduce PHP IO load - php

I have a PHP script that serves alot of smaller files (>100,000) with sizes up to 10mb. It basically loads the requested file into memory and serves it to the client. Because of access control I cannot serve these files by apache directly and need a script wrapped around it.
If there is high traffic (>150mbit) my hdd is heavily used and represents a limit for scaling. I had the idea that I could use memcached to reduce the hdd load since I have 10gig of ram available but memcached has a max item size of 1MB. Then I thought I could use PHP-APC but its behaviour if the cache runs out of memory (complete reset) isn't acceptable.
What would you do to reduce the IO load?
Thanks

What would you do to reduce the IO load?
I have never worked with it myself, but the X-Sendfile method may be helpful for taking away a bit of the load. It passes the task of actually serving the file back to Apache.

I think you can't do this unless you have 2 HDD which would split these files.

I would use PHP-APC to load these files into the cache.
apc_add(), apc_fetch() and apc_delete() are what you want. You can ensure you don't overflow by using apc_cache_info() to determine free memory levels. You can also set apc.user_ttl INI setting to prevent total cache clearing on fill.
Set things up on a test server, subject it to high load (with ab or the like) and check your stats using apc.php. Tweak, tweak and tweak some more!

You could use a CDN that supports access control.
If you want to continue serving it yourself, though, there are various approaches you could take. You'll always want to avoid serving the file through PHP though, because that is your bottleneck. None of these are very elegant though.
Store the files outside of the HTTP root and generate a new symlink every X minutes. The symlinks are deleted after Y time. Your PHP authentication script would then simply redirect the user to the URL in which the (temporarily valid) symlink exists. Very short PHP execution time, files served by Apache.
Keep the files inside the HTTP root, but change the rewrite rules in a .htacess file instead to achieve the same thing.
Reduce IO load by storing the most frequently accessed files in a ramdisk, integrate them with the regular file system by some mounting magic or symlinks and then let Apache handle the rest.

You need either mod_xsendfile for Apache2 or nginx with X-Accel-Redirect. There is also similar a solution for lighttpd. Nginx can also serve from memcached.
If you're thinking about storing frequently used files in tmpfs, don't. That's not a real solution because even if you serve files right from the disk subsequent requests will hit the system cache and you'll get similar to tmpfs speed.

Related

In an image host script, how can I measure an image's bandwidth consumption?

I'm enhancing a PHP image hosting script, but I'm facing a problem: the client has requested a function to limit the amount of used bandwidth, for each image, to 100MB / day
This includes thumbnails
Up to now, thumbnails have been served directly without using PHP (with nginx)
I've tried using a php script instead of a direct .jpg, with a VERY heavily cached code (using APC's cache to store the amount of bandwidth used, and serving the image with the X-Sendfile header), but the server just chokes due to the very,very high amount of connections
Is there a more efficient way to measure bandwidth used by thumbnails without choking the CPU and RAM?
How is this problem usually tackled?
Thanks
Parse the server access logs. I believe in nginx they are at /var/log/nginx/access.log . You'll likely need to change permissions to allow access, or you could use cron to regularly copy that file into an accessible folder if you aren't comfortable doing that.
I guess that if you cached results, and only checked once an hour or something then you could do this using grep -c.
I'd be interested to know, if you pick a file you know has been accessed, how long does it take to run:
grep -c filename.jpg /var/log/nginx.access.log
You could then simply multiply that count by the filesize to get the bandwidth for that image.

Choosing a PHP caching technique: output caching into files vs. opcode caching

I've heard of two caching techniques for the PHP code:
When a PHP script generates output it stores it into local files. When the script is called again it check whether the file with previous output exists and if true returns the content of this file. It's mostly done with playing around the "output buffer". Somthing like this is described in this article.
Using a kind of opcode caching plugin, where the compiled PHP code is stored in memory. The most popular of this one is APC, also eAccelerator.
Now the question is whether it make any sense to use both of the techniques or just use one of them. I think that the first method is a bit complicated and time consuming in the implementation, when the second one seem to be a simple one where you just need to install the module.
I use PHP 5.3 (PHP-FPM) on Ubuntu/Debian.
BTW, are there any other methods to cache PHP code or output, which I didn't mention here? Are they worth considering?
You should always have an opcode cache like APC. Its purpose is to speed up the parsing of your code, and will be bundled into PHP in a future version. For now, it's a simple install on any server and doesn't require you write or change any code.
However, caching opcodes doesn't do anything to speed up the actual execution of your code. Your bottlenecks are usually time spent talking to databases or reading to/from disk. Caching the output of your program avoids unnecessary resource usage and can speed up responses by orders of magnitude.
You can do output caching many different ways at many different places along your stack. The first place you can do it is in your own code, as you suggested, by buffering output, writing it to a file, and reading from that file on subsequent requests.
That still requires executing your PHP code on each request, though. You can cache output at the web server level to skip that as well. Crafting a set of mod_rewrite rules will allow Apache to serve the static files instead of the PHP code when they exist, but you'll have to regenerate the cached versions manually or with a scheduled task, since your PHP code won't be running on each request to do so.
You can also stick a proxy in front of your web server and use that to cache output. Varnish is a popular choice these days and can serve hundreds of times more request per second with caching than Apache running your PHP script on the same server. The cache is created and configured at the proxy level, so when it expires, the request passes through to your script which runs as it normally would to generate the new version of the page.
You know, for me, optcache , filecache .. etc only use for reduce database calls.
They can't speed up your code. However, they improve the page load by using cache to serve your visitors.
With me, APC is good enough for VPS or Dedicated Server when I need to cache widgets, $object to save my mySQL Server.
If I have more than 2 Servers, I like to used Memcache , they are good on using memory to cache. However it is up to you, not everyone like memcached, and not everyone like APC.
For caching whole web page, I ran a lot of wordpress, and I used APC, Memcache, Filecache on some Cache Plugins like W3Total Cache. And I see ( my own exp ): Filecache is good for caching whole website, memory cache is good for caching $object
Filecache will increase your CPU if your hard drive is slow, and Memory cache is terrible if you don't have enough memory on your VPS.
An SSD HDD will be super good speed to read / write file, but Memory is always faster. However, Human can't see what is difference between these speed. You only pick one method base on your project and your server ( RAM, HDD ) or are you on a shared web hosting?
If I am on a shared hosting, without root permission, without php.ini, I like to use phpFastCache, it a simple file cache method with set, get, stats, delete only.
In Addition, I like to use .htaccess to cache static files like images, js, css or by html headers. They will help visitors speed up your page, and save your server bandwidth.
And If you can use .htaccess to redirect to static .html cache if you cache whole page is a great thing.
In future, APC or some Optcache will be bundle into PHP version, but I am sure all the cache can't speed up your code, they use to:
Reduce Database / Query calls.
Improve the speed of page load by use cache to serve.
Save your API Transactions ( like Bing ) or cURL request...
etc...
A lot of times, when it comes to PHP web applications, the database is the bottleneck. As such, one of the best things you can do is to use memcached to cache results in memory. You can also use something like xhprof to profile your code, and really dial in on what's taking the most time.
Yes, those are two different cache-techniques, and you've understood them correctly.
but beware on 1):
1.) Caching script generated output to files or proxies may render problems
if content change rapidly.
2.) x-cache exists too and is easy to install on ubuntu.
regards,
/t
I don't know if this really would work, but I came across a performance problem with a PHP script that I had. I have a plain text file that stores data as a title and a URL tab separated with each record separated by a new line. My script grabs the file at each URL and saves it to its own folder.
Then I have another page that actually displays the local files (in this case, pictures) and I use a preg_replace() to change the output of each line from the remote url to a relative one so that it can be displayed by the server. My tab separated file is now over 1 MB and it takes a few SECONDS to do the preg_replace(), so I decided to look into output caching. I couldn't find anything definitive, so I figured I would try my own hand at it and here's what I came up with:
When I request the page to view stuff locally, I try to read it from a variable in a global scope. If this is empty, it might be that this application hasn't run yet and this global needs populated. If it was empty, read from an output file (plain html file that literally shows everything to output) and save the contents to the global variable and then display the output from the global.
Now, when the script runs to update the tab separated file, it updates the output file and the global variable. This way, the portion of the script that actually does the stuff that runs slowly only runs when the data is being updated.
Now I haven't tried this yet, but theoretically, this should improve my performance a lot, although it does actually still run the script, but the data would never be out of date and I should get a much better load time.
Hope this helps.

Apache, PHP caching

A have setup an internal proxy kind of thing using Curl and PHP. The setup is like this:
The proxy server is a rather cheap VPS (which has slow disk i/o at times). All requests to this server are handled by a single index.php script. The index.php fetches data from another, fast server and displays to the user.
The data transfer between the two servers is very fast and the bottleneck is only the disk i/o on the proxy server. Since there is only one index.php - I want to know
1) How do I ensure that index.php is permanently "cahced" in Apache on the proxy server? (Googling for php cache, I found many custom solutions that will cache the "data" output by php I want to know if there are any pre-build modules in apache that will cache the php-script itself?).
2) Is the data fetched from the backend server alway in the RAM/cache on the proxy server? (assuming there is enough memory)
3) Does apache read any config files or other files from disk when handling requests?
4) Does apache wait for logs to be written to disk before serving the content - if so I will disable logging on the proxy server (or is there way to ensure content is first served without waiting for logs to be written).?
Basically, I want to eliminate disk i/o all together on the 'proxy' server.
Thanks,
JP
1) Install APC (http://pecl.php.net/apc), this will compile your PHP script once and keep it in shared memory for the lifetime of the webserver process (or a given TTL).
2) If your script fetches data and does not cache/store it on the filesystem, it will be in RAM, yes. But only for the duration of the request. PHP uses a 'share-nothing' strategy which means -all- memory is released after a request. If you do cache data on the filesystem, consider using memcached (http://memcached.org/) instead to bypass file i/o.
3) If you have .htaccess support activated, Apache will search for those in each path leading to your php file. See Why can't I disable .htaccess in Apache? for more info.
4) Not 100% sure, but it probably does wait.
Why not use something like Varnish which is explicitly built for this type of task and does not carry the overhead of Apache?
I would recommend "tinyproxy" for this puprose.
Does everything you want very efficeintly.

Read files via php

You all know about restrictions that exist in shared environment, so with that in mind, please suggest me a php function or something with the help of which I could stream my videos and other files. I have a lot of videos on the server, unlimited bandwidth and disk space, but I am limited in ram and cpu.
Don't use php to stream the data. Use a header redirect to point to the URL of the actual file. This will offload the work onto the webserver which might run under a different user id and is better optimized for this task.
Hmm, there is XMoov that acts as a "streaming server" but does not much more than serve a file byte by byte, with a few additional options and settings. It promises random access (i.e. arbitrary skipping within a video) but I haven't used it myself yet.
As a server administrator, though, I would frown on anybody using PHP to serve huge files like that because of the strain it puts on the server. I would generally not regard this to be a good idea, and rent a streaming server instead if at all possible. Use at your own risk.
You can use a while loop to load bits of the file, and then sleep for some time, and then output more, and sleep... (that would be the only way to limit the CPU usage).
RAM shouldn't be a problem, as you will just dump parts of the file, so you don't need to load it into RAM.

mod_rewrite in apache

I have image hosting. All image requests redirect to a PHP script with mod_rewrite. PHP script uses function fread() and displays picture from another file. I want to know, does this use a lot of processor time or not?
It depends on how much you think "a lot of processor time" is, but from what you're describing, the processing time required by mod_rewrite and PHP is trivial compared to the I/O time to read the image from disk and send it over the network.
If you're concerned about speed, caching the images in memory will probably have the most benefit.
Yes, this is putting considerable strain on the web server, because the PHP interpreter has to be initialized for every small resource request, and passes through the data. The consensus is that this is not a good thing to do on a high-traffic website.
Why are you doing this, are you resizing images?
You will run out of memory before hitting CPU limit :-)
Reading/writing file is not CPU-intensive task, but each apache process created for that can eat up to 50 MB of RAM.
If you want to send images fast and secure, you should look into X-SendFile - this allows your php scripts to tell your Webserver to send files not directly accessible by url using something like header('X-SendFile: /path/to/the/file');
For apache there is mod_xsendfile (http://tn123.ath.cx/mod_xsendfile/) though labeled beta, it has proven to be very stable in production, and the its sourcecode is rather small and can be audited easily.

Categories