I'm building a feature of a site that will generate a PDF (using TCPDF) into a booklet of 500+ pages. The layout is very simple but just due to the number of records I think it qualifies as a "long running php process". This will only need to be done a handful of times per year and if I could just have it run in the background and email the admin when done, that would be perfect. Considered Cron but it is a user-generated type of feature.
What can I do to keep my PDF rendering for as long as it takes? I am "good" with PHP but not so much with *nix. Even a tutorial link would be helpful.
Honestly you should avoid doing this entirely from a scalability perspective. I'd use a database table to "schedule" the job with the parameters, have a script that is continuously checking this table. Then use JavaScript to poll your application for the file to be "ready", when the file is ready then let the JavaScript pull down the file to the client.
It will be incredibly hard to maintain/troubleshoot this process while you're wondering why is my web server so slow all of a sudden. Apache doesn't make it easy to determine what process is eating up what CPU.
Also by using a database you can do things like limit the number of concurrent threads, or even provide faster rendering time by letting multiple processes render each PDF page and then re-assemble them together with yet another process... etc.
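To make the scheduling idea concrete, here is a minimal sketch of the decision a worker script would make each time it checks the job table. It is only an illustration: the rows are held in a plain PHP array instead of a real table, and the function name nextJobToRun(), the status values and the 'params' field are all made up for this example.

```php
<?php
// Hypothetical job rows; in practice these would live in a database table.
// Each job has an id, a status ('pending', 'running', 'done') and parameters.

/**
 * Return the id of the next pending job, or null if nothing may start:
 * either no job is pending or $maxRunning jobs are already running.
 */
function nextJobToRun(array $jobs, int $maxRunning): ?int
{
    $running = 0;
    foreach ($jobs as $job) {
        if ($job['status'] === 'running') {
            $running++;
        }
    }
    if ($running >= $maxRunning) {
        return null; // concurrency limit reached
    }
    foreach ($jobs as $job) {
        if ($job['status'] === 'pending') {
            return $job['id']; // rows are assumed to be ordered oldest first
        }
    }
    return null;
}

$jobs = [
    ['id' => 1, 'status' => 'done',    'params' => ['booklet' => 'spring']],
    ['id' => 2, 'status' => 'running', 'params' => ['booklet' => 'summer']],
    ['id' => 3, 'status' => 'pending', 'params' => ['booklet' => 'autumn']],
];
```

A worker started from the command line would call something like this, mark the returned job as running, render the PDF, mark it done and then send the email. With a real table the "find and mark" step should be a single atomic query so two workers cannot pick the same job.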
Good luck!
What you need is to change the allowed maximum execution time for PHP scripts. You can do that by several means from the script itself (you should prefer this if it would work) or by changing php.ini.
BEWARE - Changing execution time might seriously lower the performance of your server. A script is allowed to run only a certain time (30 seconds by default) before it is terminated by the parser. This helps prevent poorly written scripts from tying up the server. You should know exactly what you are doing before you change this.
You can find some more info about:
setting max-execution-time in php.ini here http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
limiting the maximum execution time by set_time_limit() here http://php.net/manual/en/function.set-time-limit.php
PS: This should work if you use PHP to generate the PDF. It will not work if you use some stuff outside of the script (called by exec(), system() and similar).
This question is already answered, but as a result of other questions / answers here, here is what I did and it worked great: (I did the same thing using pdftk, but on a smaller scale!)
I put the following code in an iframe:
set_time_limit(0); // ignore php timeout
//ignore_user_abort(true); // optional - keep going even if the user pulls the plug
while (ob_get_level()) ob_end_clean(); // remove output buffers
ob_implicit_flush(true);
This avoided the page load timeout. You might want to put a countdown or progress bar on the parent page. I originally had the iframe issuing progress updates back to the parent, but browser updates broke that.
I have a PHP function that I want to make available publicly on the web - but it uses a lot of server resources each time it is called.
What I'd like to happen is that a user who calls this function is forced to wait for some time, before the function is called (or, at the least, before they can call it a second time).
I'd greatly prefer this 'wait' to be enforced on the server-side, so that it can't be overridden by dubious clients.
I plan to insist that users log into an online account.
Is there an efficient way I can make the user wait, without using server resources?
Would 'sleep()' be an appropriate way to do this?
Are there any known problems with using sleep()?
Is there a better solution to this?
Excuse my ignorance, and thanks!
sleep would be fine if you were using PHP as a command line tool for example. For a website though, your sleep will hold the connection open. Your webserver will only have a finite number of concurrent connections, so this could be used to DOS your site.
A better - but more involved - way would be to use a job queue. Add the task to a queue which is processed by a scheduled script and update the web page using AJAX or a meta-refresh.
sleep() is a bad idea in almost all possible situations. In your case, it's bad because it keeps the connection to the client open, and most webservers have a limit of open connections.
sleep() will not help you at all. The user could just load the page twice at the same time, and the command would be executed twice right after each other.
Instead, you could save a timestamp in your database for when your function was last invoked. Then, before invoking it, you should check the database to see if a suitable amount of time has passed. If it has, invoke the function and update the timestamp in the database.
If you're planning on enforcing a user login, then the problem just got a whole lot simpler.
Keep a record in the database listing users and the last time they used your resource-consuming service, and measure the time difference between then and now. If the time difference is too small, deny access and display an error message.
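A minimal sketch of that check, assuming the last-use time is stored as a Unix timestamp per user. The function name mayInvoke() and its parameters are invented for this example; fetching and updating the timestamp in the database is left out.

```php
<?php
/**
 * Decide whether a user may run the expensive function again.
 *
 * $lastUsedAt is the Unix timestamp stored for the user (null if never used),
 * $now is the current Unix timestamp, $cooldown the minimum gap in seconds.
 */
function mayInvoke(?int $lastUsedAt, int $now, int $cooldown): bool
{
    if ($lastUsedAt === null) {
        return true; // first use, nothing to wait for
    }
    return ($now - $lastUsedAt) >= $cooldown;
}
```

In a real page you would call it as mayInvoke($row['last_used_at'], time(), 60) and update the stored timestamp when it returns true. To stop two simultaneous requests from both passing the check, the check and the update should happen in one atomic database statement.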
This is best handled at the server level. No reason to even invoke PHP for repeat requests.
Like many sites, I use Nginx, and you can use its rate limiting to block repeat requests over a certain number: say, three requests per IP per hour.
Which server-side programming language is, without a single doubt, THE FASTEST at outputting a file's content? (I am looking at ~20k file hits / second, so YES it does matter if a certain language X can output a file 1ms faster than PHP.)
Because PHP was my language of choice, I have read the following links before I posted this question (but suddenly it raised a question, which server side programming language that is faster than PHP?)
http://raditha.com/wiki/Readfile_vs_include
When you state your answer, please also tell me the method that is used to read the file. So don't just say FASTCGI/PHP, but also name the method used to read the file, such as readfile() in this case.
(I am looking at ~20k file hits / second, which is why I have totally abandoned the idea of using Apache at all, and I really don't want a poor choice of server-side programming language to slow down the file output, so YES it does matter if a certain language X can output a file 1ms faster than PHP.)
The thing is, are all of those 20k hits/second going to be requiring generation of the file? That seems unlikely. After the first generation of a static file, you can just configure nginx to cache it, so all of the requests after that will hit the cached version and never invoke your server-side language at all.
I also need a server side script to check if this file existed or not
That's the point of having a proxy cache like nginx there in the first place.
So are you sure you're really chasing the right problem here? The numbers you should be giving us are not how many hits you expect per second, but rather how many cache misses you expect per second. After all, if you're serving, say, 600 files that change once every minute, that's only going to be on the order of 10 cache misses per second, which is a much more manageable number for the actual server-side program to handle (and makes the choice of language less of an issue).
So, do tell us more: what's your cache hit/miss rate going to be like? A 10% cache miss rate is a lot different than a 1% cache miss rate, and so on.
Ok, I am seeking a way for a shoutbox to trigger ONLY when a message is inputted into it by anyone.
Basically, we have control over what happens when someone submits a shout, but I don't want to use a Javascript interval (setTimeout() or setInterval()) functions to call this every x amount of seconds. Because every x amount of seconds it is hitting the database trying to determine if a shout has been made or not.
There MUST be a better method to do this that doesn't hit the database every x seconds in order to update the Shoutbox for everyone to see the shout.
This is where I need help, because our shoutbox is using too many resources on the server and causing the server to overload at times.
I have thought about a file instead... For example, when a shout has been posted to the shoutbox, a filename on the server changes to something, this filename then gets checked (instead of the database), and if the filename is different, then it should load up the shout. Perhaps it can even place the shout within the file, instead of the database, and ONLY stick it into the database after another shout has been added to the shoutbox. Therefore, the last shout will always remain in the file, in text format.
Would this be a better approach for handling server overloads? But even still I would need to call an interval to determine if the file name changed. So this is still calling an interval, but do you think it would be better on the server this way? If so, what name should I use? Should I use the php time() function for this to name the file?
Are there any other ideas that someone could recommend to handle this. Preferably a way without using any intervals to update the shoutbox?
Please help, thanks guys :)
You might want to look into HTML5's WebSockets, but for non-compatible browsers I think you're stuck with continuously polling the server on an interval.
What you are getting close to with your descriptions of a possible solution is more like a server push than a page get.
I know you can achieve this sort of behavior using node.js with Transfer-Encoding: chunked. Check out the video at http://www.nodejs.org/ for a better example.
Question Part A ▉ (100-point bounty, awarded)
The main question was how to make this site load faster. First we needed to read these waterfalls. Thanks all for your suggestions on the waterfall readout analysis. Evident from the various waterfall graphs shown here is the main bottleneck: the PHP-generated thumbnails. The protocol-less jQuery loading from a CDN advised by David got my bounty, although it made my site only 3% faster overall and did not address the site's main bottleneck. Time for a clarification of my question, and another bounty:
Question Part B ▉ (100-point bounty, awarded)
The new focus was to solve the problem with the 6 jpg images, which are causing most of the loading delay. These 6 images are PHP-generated thumbnails, tiny and only 3~5 kb, but loading relatively very slowly. Notice the "time to first byte" on the various graphs. The problem remained unsolved, but a bounty went to James, who fixed the header error that RedBot underlined: "An If-Modified-Since conditional request returned the full content unchanged.".
Question Part C ▉ (my last bounty: 250 points)
Unfortunately, even after the RedBot.org header error was fixed, the delay caused by the PHP-generated images remained untouched. What on earth are these tiny puny 3~5 kb thumbnails thinking? All that header information could send a rocket to the moon and back. Any suggestions on this bottleneck are much appreciated and treated as a possible answer, since I have been stuck on this bottleneck for seven months now.
[Some background info on my site: CSS is at the top. JS at the bottom (jQuery, jQuery UI, bought menu awm/menu.js engines, tabs js engine, video swfobject.js). The black lines on the second image show what's initiating what to load. The angry robot is my pet "ZAM". He is harmless and often happier.]
Load Waterfall: Chronological | http://webpagetest.org
Parallel Domains Grouped | http://webpagetest.org
Site-Perf Waterfall | http://site-perf.com
Pingdom Tools Waterfall | http://tools.pingdom.com
GTmetrix Waterfall | http://gtmetrix.com
First, using those multiple domains requires several DNS lookups. You'd be better off combining many of those images into a sprite instead of spreading the requests.
Second, when I load your page, I see most of the blocking (~1.25s) on all.js. I see that begins with (an old version of) jQuery. You should reference that from the Google CDN, to not only decrease load time, but potentially avoid an HTTP request for it entirely.
Specifically, the most current jQuery and jQuery UI libraries can be referenced at these URLs (see this post if you're interested why I omitted the http:):
//ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
//ajax.googleapis.com/ajax/libs/jqueryui/1.8.9/jquery-ui.min.js
If you're using one of the default jQuery UI themes, you can also pull its CSS and images off the Google CDN.
With the jQuery hosting optimized, you should also combine awmlib2.js and tooltiplib.js into a single file.
If you address those things, you should see a significant improvement.
I had a similar problem a few days ago and I found head.js.
It's a JavaScript plugin which allows you to load all JS files in parallel.
Hope that helps.
I am far from an expert but...
In regards to this:
"An If-Modified-Since conditional request returned the full content unchanged."
and my comments.
The code used to generate the thumbnails should be checking for the following:
- Is there a cached version of the thumbnail?
- Is the cached version newer than the original image?
If either of these is false, the thumbnail should be generated and returned no matter what. If they are both true, then the following checks should be made:
- Is there an HTTP_IF_MODIFIED_SINCE header?
- Is the cached version's last modified time the same as the HTTP_IF_MODIFIED_SINCE value?
If either of these is false, the cached thumbnail should be returned.
If both of these are true then a 304 HTTP status should be returned. I'm not sure if it's required, but I also personally return the Cache-Control, Expires and Last-Modified headers along with the 304.
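The checks above can be sketched as one small decision function. This is only an illustration of the logic, not the asker's generator: the name thumbnailAction() and the three return values are invented here, and the caller is assumed to pass in the file modification times and the raw header value.

```php
<?php
/**
 * Decide how to answer a thumbnail request.
 * Returns 'generate', 'send_cached' or 'not_modified'.
 *
 * $cachedMtime:     mtime of the cached thumbnail, or null if there is none.
 * $originalMtime:   mtime of the original image.
 * $ifModifiedSince: raw If-Modified-Since header value, or null if absent.
 */
function thumbnailAction(?int $cachedMtime, int $originalMtime, ?string $ifModifiedSince): string
{
    // No cached copy, or the cached copy is not newer than the original.
    if ($cachedMtime === null || $cachedMtime <= $originalMtime) {
        return 'generate';
    }
    // The client sent no conditional header: send the cached file.
    if ($ifModifiedSince === null) {
        return 'send_cached';
    }
    // strtotime() returns false for unparsable dates, which never equals an int.
    if (strtotime($ifModifiedSince) !== $cachedMtime) {
        return 'send_cached';
    }
    return 'not_modified'; // answer with 304 and no body
}
```

The calling script would then map 'not_modified' to a 304 status followed by exit, and the other two results to sending image data with a Last-Modified header equal to the cached file's mtime.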
In regards to GZipping, I've been informed that there is no need to GZip images so ignore that part of my comment.
Edit: I didn't notice your addition to your post.
session_cache_limiter('public');
header("Content-type: " . $this->_mime);
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 2419200) . " GMT");
// I'm sure Last-Modified should be a static value. not dynamic as you have it here.
header("Last-Modified: " . gmdate("D, d M Y H:i:s",time() - 404800000) . " GMT");
I'm also sure that your code needs to check for the HTTP_IF_MODIFIED_SINCE header and react to it. Just setting these headers and your .htaccess file won't provide the required result.
I think you need something like this:
$date = 'D, d M Y H:i:s \G\M\T'; // HTTP date format; HTTP dates are always in GMT
$modified = filemtime($filename);
$expires = strtotime('+1 year'); // 1 year from now
header(sprintf('Cache-Control: %s, max-age=%s', 'public', $expires - time()));
// gmdate() instead of date(), so the server's local timezone never leaks into the headers
header(sprintf('Expires: %s', gmdate($date, $expires)));
header(sprintf('Last-Modified: %s', gmdate($date, $modified)));
header(sprintf('Content-Type: %s', $mime));
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    if (strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) === $modified) {
        header('HTTP/1.1 304 Not Modified', true, 304);
        // Should have been an exit not a return. After sending the not modified http
        // code, the script should end and return no content.
        exit();
    }
}
// Render image data
Wow, it's hard to explain things using that image. But here are some attempts:
- files 33-36 load that late because they are dynamically loaded within the swf, and the swf (25) is loaded completely before it loads any additional content
- files 20 & 21 are maybe (I don't know, because I don't know your code) libraries that are loaded by all.js (11), but for 11 to execute, it waits for the whole page (and assets) to load (you should change that to domready)
- files 22-32 are loaded by those two libraries, again after those are completely loaded
Just a simple guess because this kind of analysis requires a lot of A/B testing: your .ch domain seems to be hard to reach (long, green bands before the first byte arrives).
This would mean that either the .ch website is poorly hosted or that your ISP does not have a good route to them.
Given the diagrams, this could explain a big performance hit.
On a side note, there is this cool tool, cuzillion, that could help you sort things out depending on the order in which your resources load.
Try running Y!Slow and Page Speed tests on your site/page, and follow the guidelines to sort out possible performance bottlenecks. You should be getting huge performance gains once you score higher in Y!Slow or Page Speed.
These tests will tell you what's wrong and what to change.
So your PHP script is generating the thumbnails on every page load? First off, if the images that are being thumbnailed are not changing that often, could you set up a cache so that they don't have to be regenerated each time the page loads? Secondly, is your PHP script using something like imagecopyresampled() to create the thumbnails? That's a non-trivial downsample, and the PHP script won't return anything until it's done shrinking things down. Using imagecopyresized() instead will reduce the quality of the image but speed up the process. And how much of a reduction are you doing? Are these thumbnails 5% the size of the original image or 50%? A larger original image is likely to cause a slowdown, since the PHP script has to load the original image into memory before it can shrink it and output a smaller thumbnail.
I've found the URL of your website and checked an individual jpg file from the homepage.
While the loading time is reasonable now (161ms), it's waiting for 126ms, which is far too much.
Your last-modified headers are all set to Sat, 01 Jan 2011 12:00:00 GMT, which looks too "round" to be the real date of generation ;-)
Since Cache-Control is "public, max-age=14515200", arbitrary Last-Modified headers could cause problems after 168 days.
Anyway, this is not the real reason for delays.
You have to check what your thumbnail generator does when the thumbnail already exists, and what could consume so much time checking and delivering the picture.
You could install xdebug to profile the script and see where the bottlenecks are.
Maybe the whole thing uses a framework or connects to some database for nothing. I've seen very slow mysql_connect() on some servers, mostly because they were connecting using TCP and not socket, sometimes with some DNS issues.
I understand you can't post your paid generator here but I'm afraid there are too many possible issues...
If there isn't a really good reason (usually there isn't) your images shouldn't invoke the PHP interpreter.
Create a rewrite rule for your web server that serves the image directly if it is found on the file system. If it's not, redirect to your PHP script to generate the image. When you edit the image, change the image's filename to force users that have a cached version to fetch the newly edited image.
If it doesn't work, at least you will know it doesn't have anything to do with the way the images are created and checked.
Investigate PHP's usage of session data. Maybe (just maybe), the image-generating PHP script is waiting to get a lock on the session data, which is locked by the still-rendering main page or other image-rendering scripts. This would make all the JavaScript/browser optimizations almost irrelevant, since the browser's waiting for the server.
PHP locks the session data for every script running, from the moment the session handling starts, to the moment the script finishes, or when session_write_close() is called. This effectively serializes things. Check out the PHP page on sessions, especially the comments, like this one.
This is just a wild guess since I haven't looked at your code but I suspect sessions may be playing a role here, the following is from the PHP Manual entry on session_write_close():
"Session data is usually stored after your script terminated without the need to call session_write_close(), but as session data is locked to prevent concurrent writes only one script may operate on a session at any time. When using framesets together with sessions you will experience the frames loading one by one due to this locking. You can reduce the time needed to load all the frames by ending the session as soon as all changes to session variables are done."
Like I said, I don't know what your code is doing, but those graphs look suspicious. I had a similar issue when I coded a multipart file serving function: when serving a large file I couldn't get the multipart functionality to work, nor could I open another page until the download was completed. Calling session_write_close() fixed both problems.
Have you tried replacing the PHP-generated thumbnails with regular images to see if there is any difference?
The problem could be around
- a bug in your php code leading to a regeneration of the thumbnail upon each server invocation
- a delay in your code ( sleep()?) associated with a clock problem
- a hard drive issue causing a very bad race condition since all the thumbnails get loaded/generated at the same time.
I think instead of using that thumbnail-generator script you must give TinySRC a try for rapid fast and cloud-hosted thumbnail generation.
It has a very simple and easy to use API, you can use like:-
http://i.tinysrc.mobi/ [height] / [width] /http://domain.tld/path_to_img.jpg
[width] (optional):-
This is a width in pixels (which overrides the adaptive- or family-sizing). If prefixed with ‘-’ or ‘x’, it will subtract from, or shrink to a percentage of, the determined size.
[height] (optional):-
This is a height in pixels, if width is also present. It also overrides adaptive- or family-sizing and can be prefixed with ‘-’ or ‘x’.
You can check the API summary here
FAQ
What does tinySrc cost me?
Nothing.
When can I start using tinySrc?
Now.
How reliable is the service?
We make no guarantees about the tinySrc service. However, it runs on a major, distributed cloud infrastructure, so it provides high availability worldwide. It should be sufficient for all your needs.
How fast is it?
tinySrc caches resized images in memory and in our datastore for up to 24 hours, and it will not fetch your original image each time. This makes the services blazingly fast from the user’s perspective. (And reduces your server load as a nice side-effect.)
Good Luck. Just a suggestion, since u ain't showing us the code :p
As some browsers only allow 2 parallel downloads per domain, could you not add additional domains to shard the requests over two or three different hostnames, e.g. 1.imagecdn.com and 2.imagecdn.com?
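If you try sharding, make sure a given image always comes from the same hostname, otherwise the browser cache is defeated. A rough sketch of one way to do that (the function name shardHost() is made up, and hashing the path with md5() is just one possible choice):

```php
<?php
/**
 * Pick a hostname for an asset path. The same path always maps to the
 * same host, so a cached copy stays valid across page loads.
 */
function shardHost(string $path, array $hosts): string
{
    // First 4 hex digits of the md5 hash give a number from 0 to 65535.
    $index = hexdec(substr(md5($path), 0, 4)) % count($hosts);
    return $hosts[$index];
}

$hosts = ['1.imagecdn.com', '2.imagecdn.com'];
$url = '//' . shardHost('/thumbs/photo1.jpg', $hosts) . '/thumbs/photo1.jpg';
```

The hostnames are the ones from the example above; the /thumbs/photo1.jpg path is only a placeholder.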
First of all, you need to handle If-Modified-Since requests and such appropriately, as James said. That error states that: "When I ask your server if that image is modified since the last time, it sends the whole image instead of a simple yes/no".
The time between the connection and the first byte is generally the time your PHP script takes to run. It is apparent that something is happening when that script starts to run.
Have you considered profiling it? It may have some issues.
Combined with the above issue, your script may be running many more times than needed. Ideally, it should generate thumbs only if the original image is modified and send cached thumbs for every other request. Have you checked that the script is generating the images unnecessarily (e.g. for each request)?
Generating proper headers through the application is a bit tricky, plus they may get overwritten by the server. And you are exposed to abuse as anyone sending some no-cache request headers will cause your thumbnail generator to run continuously (and raise loads). So, if possible, try to save those generated thumbs, call the saved images directly from your pages and manage headers from .htaccess. In this case, you wouldn't even need anything in your .htaccess if your server is configured properly.
Other than these, you can apply some of the bright optimization ideas from the performance parts of this overall nice SO question on how to do websites the right way, like splitting your resources into cookieless subdomains, etc. But at any rate, a 3k image shouldn't take a second to load, this is apparent when compared to other items in the graphs. You should try to spot the problem before optimizing.
Have you tried to set up several subdomains under NGINX webserver specially for serving static data like images and stylesheets? Something helpful could be already found in this topic.
Regarding the delayed thumbnails, try putting a call to flush() immediately after the last call to header() in your thumbnail generation script. Once done, regenerate your waterfall graph and see if the delay is now on the body instead of the headers. If so you need to take a long look at the logic that generates and/or outputs the image data.
The script that handles the thumbnails should hopefully use some sort of caching so that whatever actions it takes on the images you're serving will only happen when absolutely necessary. It looks like some expensive operation is taking place every time you serve the thumbnails which is delaying any output (including the headers) from the script.
The majority of the slowness is your TTFB (time to first byte) being too high. This is a hard one to tackle without getting intimate with your server config files, code and underlying hardware, but I can see it's rampant on every request. You've got too many green bars (bad) and very few blue bars (good). You might want to stop optimizing the frontend for a bit, as I believe you've done much in that area. Despite the adage that "80%-90% of the end-user response time is spent on the frontend", I believe yours is occurring in the backend.
TTFB is backend stuff, server stuff, pre-processing prior to output and handshaking.
Time your code execution to find slow stuff like slow database queries, and time entering and exiting functions/methods to find slow functions. If you use PHP, try FirePHP. Sometimes it is one or two slow queries being run during startup or initialization, like pulling session info or checking authentication and whatnot. Optimizing queries can lead to some good perf gains. Sometimes code is run using PHP's auto-prepend or SPL autoload, so it runs on every request. Other times it is a misconfigured Apache conf, and tweaking that saves the day.
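For the timing part, something as simple as the following is often enough to find the slow spot. It is a bare-bones sketch; the helper name timed() is invented here, and a profiler will give you far more detail.

```php
<?php
/**
 * Run $fn and return [its result, elapsed wall-clock seconds].
 */
function timed(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Example: wrap a suspect piece of work and log how long it took.
[$sum, $seconds] = timed(function () {
    return array_sum(range(1, 1000));
});
error_log(sprintf('summing took %.6f s', $seconds));
```

Wrap the database connect, the cache lookup and the image output separately, and the log will show which of them accounts for the time to first byte.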
Look for inefficient loops. Look for slow fetching calls of caches or slow i/o operations caused by faulty disk drives or high disk space usage. Look for memory usages and what's being used and where. Run a webpagetest repeated test of 10 runs on a single image or file using only first view from different locations around the world and not the same location. And read your access and error logs, too many developers ignore them and rely only on outputted onscreen errors. If your web host has support, ask them for help, if they don't maybe politely ask them for help anyway, it won't hurt.
You can try DNS Prefetching to combat the many domains and resources, http://html5boilerplate.com/docs/DNS-Prefetching/
Is the server you own a good/decent server? Sometimes a better server can solve a lot of problems. I am a fan of the 'hardware is cheap, programmers are expensive' mentality; if you have the chance and the money, upgrade the server. And/or use a CDN like maxcdn or cloudflare or similar.
Good Luck!
(p.s. i don't work for any of these companies. Also the cloudflare link above will argue that TTFB is not that important, I threw that in there so you can get another take.)
Sorry to say, you provide too little data. And you already had some good suggestions.
How are you serving those images ? If you're streaming those via PHP you're doing a very bad thing, even if they are already generated.
NEVER STREAM IMAGES WITH PHP. It will slow down your server, no matter the way you use it.
Put them in an accessible folder, with a meaningful URI. Then call them directly with their real URI.
If you need on-the-fly generation, you should put an .htaccess in the images directory which redirects to a generator PHP script only if the requested image is missing. (This is called a cache-on-request strategy.)
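The PHP side of that strategy could look roughly like this. It is a sketch under assumptions: the function name generateMissingThumbnail() is made up, and $generate stands for whatever code actually produces the image bytes. The rewrite rule that sends missing files to the script is not shown.

```php
<?php
/**
 * Called only when the web server could not find the requested thumbnail.
 * Generates it once and stores it where the web server will find it next time.
 *
 * $generate is a hypothetical callable returning the image bytes for $name.
 */
function generateMissingThumbnail(string $cacheDir, string $name, callable $generate): string
{
    // basename() strips directory parts, so "../" cannot escape the cache dir.
    $target = $cacheDir . '/' . basename($name);
    if (!is_file($target)) {
        $bytes = $generate($name);
        // Write to a temp file, then rename, so readers never see a partial file.
        $tmp = $target . '.' . uniqid('', true) . '.tmp';
        file_put_contents($tmp, $bytes);
        rename($tmp, $target);
    }
    return $target;
}
```

After the first request the file exists on disk, so every later request is served by the web server alone and PHP is never started for it.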
Doing that will fix php session, browser-proxy, caching, ETAGS, whatever all at once.
WP-Supercache uses this strategy, if properly configured.
I wrote this some time ago ( http://code.google.com/p/cache-on-request/source/detail?r=8 ), last revisions are broken, but I guess 8 or less should work and you can grab the .htaccess as an example just to test things out (although there are better ways to configure the .htaccess than the way I used to).
I described that strategy in this blog post ( http://www.stefanoforenza.com/need-for-cache/ ). It is probably badly written, but it may help clarify things.
Further reading: http://meta.wikimedia.org/wiki/404_handler_caching