Cached, PHP-generated thumbnails load slowly - php

Question Part A (100-point bounty, awarded)
The main question was how to make this site load faster. First we needed to read these waterfalls. Thanks all for your suggestions on the waterfall analysis. Evident from the various waterfall graphs shown here is the main bottleneck: the PHP-generated thumbnails. The protocol-relative jQuery loading from the CDN advised by David got my bounty, although it made my site only 3% faster overall and did not address the site's main bottleneck. Time for a clarification of my question, and another bounty:
Question Part B (100-point bounty, awarded)
The new focus was to solve the problem with the 6 JPG images, which cause most of the loading delay. These 6 images are PHP-generated thumbnails, tiny at only 3~5 KB, yet they load relatively very slowly. Notice the "time to first byte" on the various graphs. The problem remained unsolved, but a bounty went to James, who fixed the header error that REDbot highlighted: "An If-Modified-Since conditional request returned the full content unchanged."
Question Part C (my last bounty: 250 points)
Unfortunately, even after the REDbot.org header error was fixed, the delay caused by the PHP-generated images remained untouched. What on earth are these tiny 3~5 KB thumbnails thinking? All that header information could send a rocket to the moon and back. Any suggestion on this bottleneck is much appreciated and will be treated as a possible answer, since I have been stuck on this bottleneck for seven months now.
[Some background info on my site: CSS is at the top, JS at the bottom (jQuery, jQuery UI, a purchased menu engine awm/menu.js, a tabs JS engine, video swfobject.js). The black lines on the second image show what is initiating what to load. The angry robot is my pet "ZAM". He is harmless and often happier.]
Load Waterfall: Chronological | http://webpagetest.org
Parallel Domains Grouped | http://webpagetest.org
Site-Perf Waterfall | http://site-perf.com
Pingdom Tools Waterfall | http://tools.pingdom.com
GTmetrix Waterfall | http://gtmetrix.com

First, using those multiple domains requires several DNS lookups. You'd be better off combining many of those images into a sprite instead of spreading the requests.
Second, when I load your page, I see most of the blocking (~1.25s) on all.js. I see that it begins with (an old version of) jQuery. You should reference that from the Google CDN, to not only decrease load time, but potentially avoid an HTTP request for it entirely.
Specifically, the most current jQuery and jQuery UI libraries can be referenced at these URLs (see this post if you're interested why I omitted the http:):
//ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js
//ajax.googleapis.com/ajax/libs/jqueryui/1.8.9/jquery-ui.min.js
If you're using one of the default jQuery UI themes, you can also pull its CSS and images off the Google CDN.
With the jQuery hosting optimized, you should also combine awmlib2.js and tooltiplib.js into a single file.
If you address those things, you should see a significant improvement.

I had a similar problem a few days ago and I found head.js.
It's a JavaScript plugin which allows you to load all JS files in parallel.
Hope that helps.

I am far from an expert but...
In regards to this:
"An If-Modified-Since conditional request returned the full content unchanged."
and my comments.
The code used to generate the thumbnails should be checking the following:
- Is there a cached version of the thumbnail?
- Is the cached version newer than the original image?
If either of these is false, the thumbnail should be generated and returned no matter what. If they are both true, then the following checks should be made:
- Is there an HTTP_IF_MODIFIED_SINCE header?
- Is the cached version's last-modified time the same as HTTP_IF_MODIFIED_SINCE?
If either of these is false, the cached thumbnail should be returned.
If both of these are true, then a 304 HTTP status should be returned. I'm not sure if it's required, but I also personally return the Cache-Control, Expires and Last-Modified headers along with the 304.
In regards to GZipping, I've been informed that there is no need to GZip images so ignore that part of my comment.
Edit: I didn't notice your addition to your post.
session_cache_limiter('public');
header("Content-type: " . $this->_mime);
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 2419200) . " GMT");
// I'm sure Last-Modified should be a static value, not dynamic as you have it here.
header("Last-Modified: " . gmdate("D, d M Y H:i:s",time() - 404800000) . " GMT");
I'm also sure that your code needs to check for the HTTP_IF_MODIFIED_SINCE header and react to it. Just setting these headers and your .htaccess file won't provide the required result.
I think you need something like this:
$date     = 'D, d M Y H:i:s T'; // HTTP date format
$modified = filemtime($filename);
$expires  = strtotime('+1 year');

header(sprintf('Cache-Control: %s, max-age=%s', 'public', $expires - time()));
header(sprintf('Expires: %s', gmdate($date, $expires)));
header(sprintf('Last-Modified: %s', gmdate($date, $modified)));
header(sprintf('Content-Type: %s', $mime));

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    if (strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) === $modified) {
        header('HTTP/1.1 304 Not Modified', true, 304);
        // Should be an exit, not a return: after sending the 304,
        // the script should end and send no content.
        exit();
    }
}

// Render image data

Wow, it's hard to explain things using that image... but here are some tries:
- Files 33-36 load that late because they are dynamically loaded within the SWF, and the SWF (25) is loaded completely before it loads any additional content.
- Files 20 & 21 are maybe (I don't know, because I don't know your code) libraries that are loaded by all.js (11), but for 11 to execute, it waits for the whole page (and assets) to load (you should change that to domready).
- Files 22-32 are loaded by those two libraries, again only after those have completely loaded.

Just a simple guess because this kind of analysis requires a lot of A/B testing: your .ch domain seems to be hard to reach (long, green bands before the first byte arrives).
This would mean that either the .ch website is poorly hosted or that your ISP does not have a good route to it.
Given the diagrams, this could explain a big performance hit.
On a side note, there is this cool tool, Cuzillion, that could help you sort things out depending on your ordering of resource loading.

Try running Y!Slow and Page Speed tests on your site/page, and follow the guidelines to sort out possible performance bottlenecks. You should be getting huge performance gains once you score higher in Y!Slow or Page Speed.
These tests will tell you what's wrong and what to change.

So your PHP script is generating the thumbnails on every page load? First off, if the images being thumbnailed do not change often, could you set up a cache so that they don't have to be processed on every page load? Secondly, is your PHP script using something like imagecopyresampled() to create the thumbnails? That's a non-trivial downsample, and the PHP script won't return anything until it's done shrinking things down. Using imagecopyresized() instead will reduce the quality of the image, but speed up the process. And how much of a reduction are you doing? Are these thumbnails 5% the size of the original image or 50%? A larger original image will likely lead to a slowdown, since the PHP script has to load the original into memory before it can shrink it and output a smaller thumbnail.
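As a rough illustration of that caching idea (not the asker's actual generator; the paths and sizes below are made up), the expensive GD work can be done once, written to disk, and skipped on later requests:
<?php
// Hypothetical paths/sizes, for illustration only.
$src   = 'photos/original.jpg';
$thumb = 'cache/original_120.jpg';

if (!file_exists($thumb) || filemtime($thumb) < filemtime($src)) {
    list($w, $h) = getimagesize($src);
    $tw = 120;
    $th = (int) round($h * $tw / $w);

    $in  = imagecreatefromjpeg($src);
    $out = imagecreatetruecolor($tw, $th);
    // imagecopyresampled() = better quality; imagecopyresized() = faster but coarser.
    imagecopyresampled($out, $in, 0, 0, 0, 0, $tw, $th, $w, $h);
    imagejpeg($out, $thumb, 85);
    imagedestroy($in);
    imagedestroy($out);
}

header('Content-Type: image/jpeg');
readfile($thumb);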

I've found the URL of your website and checked an individual jpg file from the homepage.
While the loading time is reasonable now (161ms), it's waiting for 126ms, which is far too much.
Your last-modified headers are all set to Sat, 01 Jan 2011 12:00:00 GMT, which looks too "round" to be the real date of generation ;-)
Since Cache-Control is "public, max-age=14515200", arbitrary Last-Modified headers could cause problems after 168 days.
Anyway, this is not the real reason for the delays.
You have to check what your thumbnail generator does when the thumbnail already exists, and what could consume so much time checking and delivering the picture.
You could install xdebug to profile the script and see where the bottlenecks are.
Maybe the whole thing uses a framework or connects to some database for nothing. I've seen very slow mysql_connect() on some servers, mostly because they were connecting using TCP and not socket, sometimes with some DNS issues.
I understand you can't post your paid generator here but I'm afraid there are too many possible issues...

If there isn't a really good reason (usually there isn't), your images shouldn't invoke the PHP interpreter.
Create a rewrite rule for your web server that serves the image directly if it is found on the file system. If it's not, redirect to your PHP script to generate the image. When you edit the image, change the image's filename to force users who have a cached version to fetch the newly edited image.
If it doesn't work, at least you will know it doesn't have anything to do with the way the images are created and checked.

Investigate PHP's usage of session data. Maybe (just maybe), the image-generating PHP script is waiting to get a lock on the session data, which is locked by the still-rendering main page or other image-rendering scripts. This would make all the JavaScript/browser optimizations almost irrelevant, since the browser's waiting for the server.
PHP locks the session data for every script running, from the moment the session handling starts, to the moment the script finishes, or when session_write_close() is called. This effectively serializes things. Check out the PHP page on sessions, especially the comments, like this one.
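A minimal sketch of the fix, assuming the thumbnail script starts a session at all ($_SESSION['user_id'] is just a placeholder; only the placement of session_write_close() matters here):
<?php
session_start();

// Read whatever the script actually needs from the session...
$userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null;

// ...then release the session lock *before* the slow image work,
// so parallel thumbnail requests are no longer serialized.
session_write_close();

// Slow part: generate and output the thumbnail here.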

This is just a wild guess since I haven't looked at your code, but I suspect sessions may be playing a role here. The following is from the PHP manual entry on session_write_close():
Session data is usually stored after your script terminated without the need to call session_write_close(), but as session data is locked to prevent concurrent writes only one script may operate on a session at any time. When using framesets together with sessions you will experience the frames loading one by one due to this locking. You can reduce the time needed to load all the frames by ending the session as soon as all changes to session variables are done.
Like I said, I don't know what your code is doing, but those graphs seem oddly suspicious. I had a similar issue when I coded a multipart file-serving function: when serving a large file I couldn't get the multipart functionality to work, nor could I open another page until the download was completed. Calling session_write_close() fixed both problems.

Have you tried replacing the PHP-generated thumbnails with regular images to see if there is any difference?
The problem could be around
- a bug in your PHP code leading to regeneration of the thumbnail on each request
- a delay in your code (sleep()?) associated with a clock problem
- a hard drive issue causing a very bad race condition, since all the thumbnails get loaded/generated at the same time.

I think that instead of using that thumbnail-generator script, you should give tinySrc a try for fast, cloud-hosted thumbnail generation.
It has a very simple and easy-to-use API; you can use it like this:
http://i.tinysrc.mobi/ [height] / [width] /http://domain.tld/path_to_img.jpg
[width] (optional):
This is a width in pixels (which overrides the adaptive- or family-sizing). If prefixed with ‘-’ or ‘x’, it will subtract from, or shrink to a percentage of, the determined size.
[height] (optional):
This is a height in pixels, if width is also present. It also overrides adaptive- or family-sizing and can be prefixed with ‘-’ or ‘x’.
You can check the API summary here
FAQ
What does tinySrc cost me?
Nothing.
When can I start using tinySrc?
Now.
How reliable is the service?
We make no guarantees about the tinySrc service. However, it runs on a major, distributed cloud infrastructure, so it provides high availability worldwide. It should be sufficient for all your needs.
How fast is it?
tinySrc caches resized images in memory and in our datastore for up to 24 hours, and it will not fetch your original image each time. This makes the services blazingly fast from the user’s perspective. (And reduces your server load as a nice side-effect.)
Good luck. Just a suggestion, since you aren't showing us the code :p

As some browsers only make 2 parallel downloads per domain, could you not add additional domains to shard the requests over two or three different hostnames? e.g. 1.imagecdn.com, 2.imagecdn.com

First of all, you need to handle If-Modified-Since requests and such appropriately, as James said. That error states that: "When I ask your server if that image is modified since the last time, it sends the whole image instead of a simple yes/no".
The time between the connection and the first byte is generally the time your PHP script takes to run. It is apparent that something is happening when that script starts to run.
Have you considered profiling it? It may have some issues.
Combined with the above issue, your script may be running many more times than needed. Ideally, it should generate thumbs only if the original image has been modified and send cached thumbs for every other request. Have you checked whether the script is generating the images unnecessarily (e.g. for each request)?
Generating proper headers through the application is a bit tricky, plus they may get overwritten by the server. And you are exposed to abuse, as anyone sending no-cache request headers will cause your thumbnail generator to run continuously (and raise the load). So, if possible, try to save the generated thumbs, call the saved images directly from your pages, and manage the headers from .htaccess. In this case, you wouldn't even need anything in your .htaccess if your server is configured properly.
Other than these, you can apply some of the bright optimization ideas from the performance parts of this overall nice SO question on how to do websites the right way, like splitting your resources into cookieless subdomains, etc. But at any rate, a 3 KB image shouldn't take a second to load; this is apparent when compared to other items in the graphs. You should try to spot the problem before optimizing.

Have you tried setting up several subdomains under the NGINX web server, specifically for serving static data like images and stylesheets? Something helpful can already be found in this topic.

Regarding the delayed thumbnails, try putting a call to flush() immediately after the last call to header() in your thumbnail generation script. Once done, regenerate your waterfall graph and see if the delay is now on the body instead of the headers. If so, you need to take a long look at the logic that generates and/or outputs the image data.
The script that handles the thumbnails should hopefully use some sort of caching, so that whatever actions it takes on the images you're serving only happen when absolutely necessary. It looks like some expensive operation is taking place every time you serve the thumbnails, which is delaying any output (including the headers) from the script.
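A minimal sketch of the flush-after-headers idea from the first paragraph (the MIME type and the cache header are placeholders, not taken from the asker's script):
header('Content-Type: image/jpeg');
header('Cache-Control: public, max-age=2419200');

// Push the headers (and anything already echoed) out to the browser right away,
// so the waterfall shows whether the remaining delay sits in the body.
flush();

// ...expensive thumbnail generation/output happens after this point...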

The majority of the slowness is your TTFB (time to first byte) being too high. This is a hard one to tackle without getting intimate with your server config files, code and underlying hardware, but I can see it's rampant on every request. You have too many green bars (bad) and very few blue bars (good). You might want to stop optimizing the frontend for a bit, as I believe you've already done much in that area. Despite the adage that "80%-90% of the end-user response time is spent on the frontend", I believe yours is occurring in the backend.
TTFB is backend stuff, server stuff, pre-processing prior to output and handshaking.
Time your code execution to find slow things like slow database queries, and time entering and exiting functions/methods to find slow functions. If you use PHP, try FirePHP. Sometimes it is one or two slow queries being run during startup or initialization, like pulling session info or checking authentication and whatnot. Optimizing queries can lead to some good performance gains. Sometimes code is run via a PHP prepend (auto_prepend_file) or SPL autoload, so it runs on everything. Other times it can be a misconfigured Apache conf, and tweaking that saves the day.
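If a profiler isn't an option, even a crude timing probe around the suspect sections narrows things down (a sketch, assuming you can edit the thumbnail script; the log messages are made up):
$t0 = microtime(true);

// ...existing startup work: includes, session, DB connect, image lookup...
error_log(sprintf('thumb setup took %.1f ms', (microtime(true) - $t0) * 1000));

// ...existing resize/output work...
error_log(sprintf('thumb total took %.1f ms', (microtime(true) - $t0) * 1000));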
Look for inefficient loops. Look for slow cache fetches or slow I/O operations caused by faulty disk drives or high disk space usage. Look at memory usage: what's being used and where. Run a repeated WebPagetest of 10 runs on a single image or file, using only first view, from different locations around the world rather than the same location. And read your access and error logs; too many developers ignore them and rely only on on-screen errors. If your web host offers support, ask them for help; if they don't, maybe politely ask them anyway, it won't hurt.
You can try DNS Prefetching to combat the many domains and resources, http://html5boilerplate.com/docs/DNS-Prefetching/
Is your server a good/decent one? Sometimes a better server can solve a lot of problems. I am a fan of the 'hardware is cheap, programmers are expensive' mentality; if you have the chance and the money, upgrade the server. And/or use a CDN like MaxCDN or CloudFlare or similar.
Good luck!
(P.S. I don't work for any of these companies. Also, the CloudFlare link above argues that TTFB is not that important; I threw that in so you can get another take.)

Sorry to say, you provide too little data. And you have already had some good suggestions.
How are you serving those images? If you're streaming them via PHP, you're doing a very bad thing, even if they are already generated.
NEVER STREAM IMAGES WITH PHP. It will slow down your server, no matter the way you use it.
Put them in an accessible folder with a meaningful URI. Then call them directly with their real URI.
If you need on-the-fly generation, you should put an .htaccess in the images directory which redirects to a generator PHP script only if the requested image is missing (this is called the cache-on-request strategy).
Doing that will fix PHP sessions, browser/proxy caching, ETags, whatever, all at once.
WP-Supercache uses this strategy, if properly configured.
I wrote this some time ago ( http://code.google.com/p/cache-on-request/source/detail?r=8 ); the last revisions are broken, but I guess revision 8 or earlier should work, and you can grab the .htaccess as an example just to test things out (although there are better ways to configure the .htaccess than the way I did).
I described that strategy in this blog post ( http://www.stefanoforenza.com/need-for-cache/ ). It is probably badly written, but it may help clarify things.
Further reading: http://meta.wikimedia.org/wiki/404_handler_caching

Related

Very bad TTFB time [duplicate]

I have a query which involves getting a list of users from a table, sorted by when each was created. I got the following timing diagram from the Chrome developer tools.
You can see that TTFB (time to first byte) is too high.
I am not sure whether it is because of the SQL sort. If that is the reason, how can I reduce this time?
Or is it because of the TTFB? I saw blogs which say that TTFB should be low (< 1 sec), but for me it shows > 1 sec. Is it because of my query or something else?
I am not sure how I can reduce this time.
I am using Angular. Should I use Angular to sort the table instead of an SQL sort? (Many posts say that shouldn't be the issue.)
What I want to know is how I can reduce TTFB. I am actually new to this; it is a task given to me by my team members. I saw many posts, but I was not able to understand them properly. What is TTFB? Is it the time taken by the server?
The TTFB is not the time to first byte of the body of the response (i.e., the useful data, such as: json, xml, etc.), but rather the time to first byte of the response received from the server. This byte is the start of the response headers.
For example, if the server sends the headers before doing the hard work (like heavy SQL), you will get a very low TTFB, but it isn't "true".
In your case, TTFB represents the time you spend processing data on the server.
To reduce the TTFB, you need to do the server-side work faster.
I have met the same problem. My project runs on a local server. I checked my PHP code:
$db = mysqli_connect('localhost', 'root', 'root', 'smart');
I use localhost to connect to my local database. That may be the cause of the problem which you're describing. You can modify your HOSTS file. Add the line
127.0.0.1 localhost.
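For illustration (reusing the example credentials above, which are not from the asker's code), connecting by IP sidesteps the localhost hostname lookup altogether:
// Same connection as above, but by IP, so no "localhost" resolution is involved.
// Note: on Unix this switches from the local socket to TCP, which may behave differently.
$db = mysqli_connect('127.0.0.1', 'root', 'root', 'smart');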
TTFB is something that happens behind the scenes. Your browser knows nothing about what happens behind the scenes.
You need to look into what queries are being run and how the website connects to the server.
This article might help understand TTFB, but otherwise you need to dig deeper into your application.
If you are using PHP, try using <?php flush(); ?> after </head> and before </body>, or after whatever section you want to output quickly (like the header or content). It will output the actual content without waiting for PHP to finish. Don't use this function all the time, or the speed increase won't be noticeable.
More info
I would suggest you read this article and focus more on how to optimize the overall response to the user request (whether a page, a search result, etc.).
A good argument for this is the example they give about using gzip to compress the page. Even though TTFB is faster when you do not compress, the overall experience of the user is worse because it takes longer to download content that is not zipped.
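For completeness, a one-line way to try that in PHP (assuming zlib is available and the web server isn't already compressing the output):
// Compress everything echoed after this point, if the client accepts gzip.
ob_start('ob_gzhandler');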

How to break (stop with an error) a stream-based download in the browser

In our app (PHP), users can download files whose content is based on data from a DB. This data can be very big (gigabytes), so we start streaming the file right away while generating it (which includes processing the data). Thus, we are unable to set Content-Length, because we don't know it yet. Of course, some exceptions are possible: first of all a DB (Mongo, to be exact) timeout or connection loss, or something in the data-to-content processing, or something else. Now if an exception occurs, from the user's point of view the download simply stops (as if it had finished). That is very unpleasant for us, as the user may think the download was a success, when in reality the file contains 10,000 lines instead of 20,000.
So we'd like to break the stream in such a way that the browser would show an error on the download. Currently the streaming is handled by Silex as described in the docs (http://silex.sensiolabs.org/doc/master/usage.html#streaming), but we can switch to native PHP instruments if it matters. Googling hasn't helped much yet, but probably I'm using the wrong keywords. So the question is: is it possible, and if so, how? The language is a secondary concern; the most important thing here is the general approach in terms of HTTP, I think.

Creating a long TCPDF document without timeout (so long running php process)

I'm building a feature of a site that will generate a PDF (using TCPDF) as a booklet of 500+ pages. The layout is very simple, but just due to the number of records I think it qualifies as a "long-running PHP process". This will only need to be done a handful of times per year, and if I could just have it run in the background and email the admin when done, that would be perfect. I considered cron, but it is a user-triggered type of feature.
What can I do to keep my PDF rendering for as long as it takes? I am "good" with PHP but not so much with *nix. Even a tutorial link would be helpful.
Honestly, you should avoid doing this inline entirely, from a scalability perspective. I'd use a database table to "schedule" the job with its parameters, and have a script that continuously checks this table. Then use JavaScript to poll your application until the file is "ready"; when it is, let the JavaScript pull the file down to the client.
It will be incredibly hard to maintain/troubleshoot this process while you're wondering why your web server is suddenly so slow. Apache doesn't make it easy to determine which process is eating up which CPU.
Also, by using a database you can do things like limit the number of concurrent workers, or even provide faster rendering by letting multiple processes each render PDF pages and then reassembling them with yet another process, etc.
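A rough sketch of that flow (the table, columns, parameters and status endpoint named here are invented for illustration, not part of the question):
<?php
// 1) In the page handler: record the job instead of rendering the PDF inline.
$pdo  = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare("INSERT INTO pdf_jobs (params, status) VALUES (?, 'pending')");
$stmt->execute(array(json_encode(array('booklet_year' => 2011))));

// 2) A separate CLI worker (cron/supervisor) picks up 'pending' rows, builds the
//    booklet with TCPDF, saves it to disk and marks the row 'done' with the file path.
// 3) The browser polls a small status endpoint until the job is 'done',
//    then links to (or downloads) the saved file.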
Good luck!
What you need is to change the maximum allowed execution time for PHP scripts. You can do that by several means: from the script itself (prefer this if it works for you) or by changing php.ini.
BEWARE: changing the execution time might seriously lower the performance of your server. A script is allowed to run only for a certain time (30 seconds by default) before it is terminated by the parser; this helps prevent poorly written scripts from tying up the server. You should know exactly what you are doing before you change this.
You can find some more info about:
setting max-execution-time in php.ini here http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
limiting the maximum execution time by set_time_limit() here http://php.net/manual/en/function.set-time-limit.php
PS: This should work if you use PHP to generate the PDF. It will not work if you use some stuff outside of the script (called by exec(), system() and similar).
This question is already answered, but as a result of other questions/answers here, here is what I did, and it worked great (I did the same thing using pdftk, but on a smaller scale!):
I put the following code in an iframe:
set_time_limit(0);            // ignore the PHP execution time limit
// ignore_user_abort(true);   // optional: keep going even if the user pulls the plug
while (ob_get_level()) {      // remove any output buffers
    ob_end_clean();
}
ob_implicit_flush(true);
This avoided the page load timeout. You might want to put a countdown or progress bar on the parent page. I originally had the iframe issuing progress updates back to the parent, but browser updates broke that.

Any programming language which is faster at reading a file than PHP's readfile('test.html')? (~20,000 hits/second)

Which server-side programming language is, without a single doubt, THE FASTEST at outputting a file's content? (I am looking at ~20k file hits/second, so YES, it does matter if a certain language X can output a file 1 ms faster than PHP.)
Because PHP was my language of choice, I read the following link before I posted this question (but it suddenly raised the question: which server-side programming language is faster than PHP?):
http://raditha.com/wiki/Readfile_vs_include
When you state your answer, please also tell me the method used to read the file. So don't just say FastCGI/PHP; also give the method used to read the file, such as, in this case, readfile().
(I am looking at ~20k file hits/second; that is why I have totally abandoned the idea of using Apache at all, and I really don't want a poor choice of server-side programming language to slow down the file output, so YES, it does matter if a certain language X can output a file 1 ms faster than PHP.)
The thing is, are all of those 20k hits/second going to be requiring generation of the file? That seems unlikely. After the first generation of a static file, you can just configure nginx to cache it, so all of the requests after that will hit the cached version and never invoke your server-side language at all.
I also need a server side script to check if this file existed or not
That's the point of having a proxy cache like nginx there in the first place.
So are you sure you're really chasing the right problem here? The numbers you should be giving us are not how many hits you expect per second, but rather how many cache misses you expect per second. After all, if you're serving, say, 600 files that change once every minute, that's only going to be on the order of 10 cache misses per second, which is a much more manageable number for the actual server-side program to handle (and makes the choice of language less of an issue).
So, do tell us more: what's your cache hit/miss rate going to be like? A 10% cache miss rate is a lot different than a 1% cache miss rate, and so on.

PHP speed optimisation

I'm wondering about speed optimization in PHP.
I have a series of files that are requested on every page load. On average there are 20 files. Each file must be read and parsed if it has changed. And this is excluding the standard files required for a web page (HTML, CSS, images, etc.).
E.g.: client requests page -> server outputs HTML, CSS, images -> server outputs dynamic content (20 +/- files, combined and minified).
What would be the best way to serve these files as fast as possible?
Before wondering about speed optimization, one should wonder about profiling, which consists of two parts:
- Decide whether we need any speed optimization at all.
- If so, determine the particular part of our application that has become a "bottleneck" and demands optimization, unlike any other part.
The latter could lie surprisingly far, far away from the one you dreamed about.
You've not provided enough information here to give a sensible answer. By far the biggest benefit is going to come from effective caching of content, but you probably need to look at more than just the headers you are sending: you probably need to start tweaking filenames (or the query part of the URL) to effectively allow the browser to use newer content in place of cached (and not expired) content.
I have a series of files that are requested every page load. On average there are 20 files.
Are these all PHP scripts? Are they all referenced by the HTML page?
Each file must be read an parsed if they have changed
Do you mean they must be read and parsed on the server? Why? Read this post for details on how to identify and supply new versions of cacheable static content.
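One common way to do that versioning in PHP (a sketch; the stylesheet path is just an example) is to key the URL on the file's modification time, so the browser can cache aggressively yet still pick up changes immediately:
// Any static asset that changes occasionally works the same way.
$cssFile = 'styles/main.css';
$cssUrl  = $cssFile . '?v=' . filemtime($cssFile);
echo '<link rel="stylesheet" href="' . htmlspecialchars($cssUrl) . '">';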
Have you looked at server-side caching? It's not that hard:
<?php
$cache_dir = '/path/to/cache/';                 // wherever the cache files live
$cached    = md5($_SERVER['REQUEST_URI']);
$age       = time() - @filemtime($cache_dir . $cached);

if (file_exists($cache_dir . $cached) && ($age < 60)) {
    // A cache file exists and is less than a minute old: serve it as-is.
    print file_get_contents($cache_dir . $cached);
    exit;
} else {
    ob_start();
    // ... do slow expensive stuff ...
    $output = ob_get_clean();
    print $output;
    file_put_contents($cache_dir . $cached, $output);
}
?>
Note that server-side caching is not nearly as effective as client-side caching.
Once you've got the caching optimal, you then need to look at serving your content from multiple hostnames, even if they all point to the same server: a browser limits the number of connections it makes to each hostname, so requests are effectively queued up when they could be running in parallel.
For solving your problem further, we'd need to know a lot more about your application.
C.
If you are talking about PHP files, use eAccelerator. If you are talking about other files, check the filemtime to see if they have changed and whether you have to parse them again.
Also, use YSlow to determine why your website is slow.
There are several ways of keeping an opcode cache for PHP files that automatically checks for file modifications. APC is one that I very much like.
