I've written a PHP script that iterates through a given folder, extracts all the images from it and displays them on an HTML page (as <img> tags). The page itself is only about 14 KB, but it takes almost 15 seconds to load.
Here's the code:
function displayGallery( $gallery, $rel, $first_image ) {
    $prefix = "http://www.example.com/";
    $path_to_gallery = "gallery_albums/" . $gallery . "/";
    $handler = opendir( $path_to_gallery ); // opens directory
    while ( ( $file = readdir( $handler ) ) !== false ) {
        if ( strcmp( $file, "." ) != 0 && strcmp( $file, ".." ) != 0 ) {
            // skip the "." and ".." entries
            if ( isImage( $prefix . $path_to_gallery . $file ) ) {
                echo '<img src="' . $prefix . $path_to_gallery . $file . '" />';
            }
        }
    }
    closedir( $handler ); // closes directory
}
function isImage($image_file) {
    if (getimagesize($image_file) !== false) {
        return true;
    } else {
        return false;
    }
}
I looked at other posts, but most of them deal with SQL queries, and that's not my case.
Any suggestions on how to optimize this?
You can use a PHP profiler like http://xdebug.org/docs/profiler to find what part of the script is taking forever to run. It might be overkill for this issue, but long-term you may be glad you took the time now to set it up.
I suppose that's because you've added $prefix in the isImage invocation. That way the function actually downloads all your images from your web server instead of looking them up locally.
You can keep using getimagesize(); it issues an E_NOTICE and returns FALSE when the file is not a known image type.
An out-of-left-field suggestion here. You don't state how you are clocking the execution time. If you are clocking it in the browser, as in the page taking 15 seconds to load from a link, the problem could have nothing at all to do with your script. I have seen people in the past create similar pages that display full-size images scaled down, and they take forever to load because even though each image is displayed at thumbnail size or smaller, the file itself is still 800 x 600 or something. I know it sounds daft, but make sure that you are not just displaying large images at a small size. It would be perfectly reasonable for a page to need 15 seconds to load and display 76 800 x 600 JPEGs.
My assumption is that isImage is the problem. I've never seen it before. Why not just check for particular file extensions? That's pretty quick.
Update: You might also try switching to exif_imagetype(), which is likely faster than getimagesize(). Putting that check into the top function is also going to be faster. Neither of those functions was meant to be run over a web connection - avoid that altogether. Best to stick with the file extension.
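A rough sketch of that extension check (the list of allowed extensions here is just an assumption):
// hypothetical helper: trust the extension instead of probing the file contents
function hasImageExtension($file) {
    $allowed = array('jpg', 'jpeg', 'png', 'gif'); // assumed list, adjust to taste
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}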
Do you not already have access to the files directly? Every time you look something up over the web, it's going to take a while - you need to wait for the entire file to download. Look up the files directly on your system.
Use scandir to get all the filenames at once into an array and walk through them. That will likely speed things up as I assume there won't be a back and forth to get things individually.
Instead of doing strcmp for . and .. just do $file != '.' && $file != '..'
Also, the speed is going to depend on the number of entries returned; if there are a lot, it's going to be slow. The OS itself can slow down with too many files in a directory. And you're looping over all files and directories, not just images, so that's the number that counts.
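A minimal sketch combining the scandir and plain string-comparison suggestions above ($prefix and $path_to_gallery come from the question's code; the extension list is an assumption):
foreach (scandir($path_to_gallery) as $file) {
    if ($file == '.' || $file == '..') {
        continue; // skip the dot entries without strcmp
    }
    // cheap extension check instead of getimagesize() over HTTP
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    if (in_array($ext, array('jpg', 'jpeg', 'png', 'gif'), true)) {
        echo '<img src="' . $prefix . $path_to_gallery . $file . '" />';
    }
}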
getimagesize is the problem; it took 99.1% of the script time.
Version #1 - Original case
Version #2 - If you really need to use getimagesize() with a URL (http://), a faster alternative is the one found in http://www.php.net/manual/en/function.getimagesize.php#88793 . It reads only the first X bytes of the image. XHProf shows it is about 10x faster. Another idea could be using curl_multi for parallel downloads: https://stackoverflow.com/search?q=getimagesize+alternative
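The linked manual comment isn't reproduced here, but the general idea is roughly this (the 32 KB limit and the temp-file trick are my own assumptions, and truncated data may not work for every image format):
// probe only the first ~32 KB of the remote file instead of downloading the whole image
function isImagePartial($url, $bytes = 32768) {
    $in = @fopen($url, 'rb');
    if ($in === false) {
        return false;
    }
    $data = fread($in, $bytes);
    fclose($in);
    if ($data === false || $data === '') {
        return false;
    }
    $tmp = tempnam(sys_get_temp_dir(), 'img');
    file_put_contents($tmp, $data);
    $info = @getimagesize($tmp); // header data is usually within the first bytes
    unlink($tmp);
    return $info !== false;
}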
Version #3 - I think the best fit for your case is to open the files directly from the local filesystem, without http://. Per XHProf this is about 100x faster.
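A sketch of that change against the question's code (only the call inside the loop changes; the URL emitted into the page can stay public):
// probe the file on the local filesystem instead of via http://
if ( isImage( $path_to_gallery . $file ) ) {
    echo '<img src="' . $prefix . $path_to_gallery . $file . '" />';
}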
Related
In my cache system, when a new page is requested I want to check whether a file already exists: if it doesn't, a copy is stored on the server; if it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-length: " . strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better set of functions meant for performance while still achieving the same functionality. All caching files, including sitemodification.file, reside on a ramdisk. I added a flush before exit in the hope that the content is output faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that can execute the code I provided faster by at least a few milliseconds, especially the loading files code?
I'm trying to keep my time to first byte low.
First, prefer is_file to file_exists and use file_put_contents:
if ( !is_file($filename) ) {
file_put_contents($filename,$c);
}
Then, use the proper function for this kind of work, readfile:
if ( ($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file') ) {
    header('Content-length: ' . filesize($file));
    readfile($file);
}
You should see a little improvement, but keep in mind that file access is slow, and you hit the filesystem three times before sending any content.
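If you serve more than one cached file per request, a further small tweak (my assumption, not part of the question) is to look up the sitemodification.file timestamp once and reuse it:
$siteMtime = filemtime('sitemodification.file'); // stat this once and reuse it
if ( ($m = @filemtime($file)) !== false && $m >= $siteMtime ) {
    header('Content-length: ' . filesize($file));
    readfile($file);
    exit();
}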
I have this PHP code that resizes images perfectly: smart_resize_image('path/to/image.jpg');
Now I want to run it every hour on all images inside a folder, say parent, and all of its subfolders child1, child2, etc.
Is this something I could do on a shared host? Yes, I can set up cron, but I'm not sure how to run it on a folder and its subfolders.
Thanks!
You're looking for a recursive algorithm, to drill down through folders:
resizing( '/folder' );

function resizing( $folder )
{
    foreach( scandir( $folder ) as $file )
    {
        $filepath = $folder . '/' . $file;
        if( preg_match( '/^\./', $file ) ) continue; // skip . and .. (and other dot entries)
        if( is_dir( $filepath ) ) { resizing( $filepath ); continue; } // send it off to drill - with this same function!!
        // It's time to resize
        smart_resize_image( $filepath );
    }
}
you may lookup for files using glob function and then loop the results through your resize method. Be aware as it must be done by cron as it probably will take more time that allowed in a webserver script context.
You have to build a recursive function.
It is quite simple.
Just start reading the first directory, and for each directory you find, have the function call itself to parse that folder.
It's not a very expensive function.
See this: http://www.codingforums.com/showthread.php?t=71882
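If you would rather not write the recursion by hand, PHP's SPL iterators (PHP 5.3+) can do the drilling for you; a minimal sketch, with the extension filter being an assumption:
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/path/to/parent', FilesystemIterator::SKIP_DOTS)
);
foreach ($it as $fileInfo) {
    if ($fileInfo->isFile() && preg_match('/\.(jpe?g|png|gif)$/i', $fileInfo->getFilename())) {
        smart_resize_image($fileInfo->getPathname());
    }
}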
Dump all files matching your desired extensions into an array, then run your resize function on them.
To avoid any memory issues you can call the resize via CLI.
Just make a file that takes the file path as a command-line argument (an argument, not GET) and call it with shell_exec().
This way it will run them nicely one by one in a separate process without affecting the main one.
Multi-processing in PHP .. hehe
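A sketch of that setup (the file name resize_one.php is hypothetical):
// resize_one.php - invoked as: php resize_one.php /path/to/image.jpg
// (include whatever defines smart_resize_image here)
if (isset($argv[1]) && is_file($argv[1])) {
    smart_resize_image($argv[1]);
}

// in the main cron script: one separate process per image
foreach ($images as $image) {
    shell_exec('php resize_one.php ' . escapeshellarg($image));
}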
I've seen many questions about how to efficiently use PHP to download files rather than allowing direct HTTP requests (to keep files secure, to track downloads, etc.).
The answer is almost always PHP readfile().
Downloading large files reliably in PHP
How to force download of big files without using too much memory?
Best way to transparently log downloads?
BUT, although it works great during testing with huge files, when it's on a live site with hundreds of users, downloads start to hang and PHP memory limits are exhausted.
So what is it about how readfile() works that causes memory to blow up so bad when traffic is high? I thought it's supposed to bypass heavy use of PHP memory by writing directly to the output buffer?
EDIT: (To clarify, I'm looking for a "why", not "what can I do". I think that Apache's mod_xsendfile is the best way to circumvent it.)
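(For reference, a rough sketch of the X-Sendfile approach I mean, assuming mod_xsendfile is installed and enabled; the file name and path are made up:)
// PHP only authorizes and sends headers; Apache streams the file itself
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="bigfile.zip"');
header('X-Sendfile: /path/outside/webroot/bigfile.zip');
exit;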
Description
int readfile ( string $filename [, bool $use_include_path = false [, resource $context ]] )
Reads a file and writes it to the output buffer*.
PHP has to read the file and it writes to the output buffer.
So, for a 300 MB file, no matter what implementation you write (many small segments, or one big chunk), PHP has to read through all 300 MB of the file eventually.
If multiple users have to download the file, there will be a problem.
(On a shared server, hosting providers will limit the memory given to each hosting user. With such limited memory, using the buffer is not going to be a good idea.)
I think using the direct link to download a file is a much better approach for big files.
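If you do have to push it through PHP, the "many small segments" variant mentioned above looks roughly like this (the 8 KB chunk size is an arbitrary assumption):
$fp = fopen($path, 'rb');
header('Content-Length: ' . filesize($path));
while (!feof($fp)) {
    echo fread($fp, 8192); // send one small chunk at a time
    flush();               // push it to the client instead of letting it pile up
}
fclose($fp);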
If you have output buffering on, then use ob_end_flush() right before the call to readfile():
header(...);
ob_end_flush();
@readfile($file);
As mentioned here: "Allowed memory .. exhausted" when using readfile, the following block of code at the top of the PHP file did the trick for me.
It checks whether PHP output buffering is active and, if so, turns it off.
if (ob_get_level()) {
ob_end_clean();
}
You might want to turn off output buffering altogether for that particular location, using PHP's output_buffering configuration directive.
Apache example:
<Directory "/your/downloadable/files">
...
php_admin_value output_buffering "0"
...
</Directory>
"Off" as the value seems to work as well, while it really should throw an error. At least according to how other types are converted to booleans in PHP. *shrugs*
Came up with this idea in the past (as part of my library) to avoid high memory usage:
function suTunnelStream( $sUrl, $sMimeType, $sCharType = null )
{
  $f = @fopen( $sUrl, 'rb' );
  if( $f === false )
    { return false; }

  $b = false; // have the headers been sent yet?
  $u = true;  // result of the last read

  while( $u !== false && !feof($f ))
  {
    $u = @fread( $f, 1024 ); // stream in small 1 KB chunks
    if( $u !== false )
    {
      if( !$b )
      { $b = true;
        // suClearOutputBuffers(), suCachedHeader(), suIsValidString() and
        // suUniqueId() are helpers from my own library
        suClearOutputBuffers();
        suCachedHeader( 0, $sMimeType, $sCharType, null, !suIsValidString($sCharType)?('content-disposition: attachment; filename="'.suUniqueId($sUrl).'"'):null );
      }
      echo $u; // push the chunk straight to the client
    }
  }

  @fclose( $f );
  return ( $b && $u !== false );
}
Maybe this can give you some inspiration.
Well, it is a memory-intensive function. I would point users to a static server that has a specific rule set in place to control downloads, instead of using readfile().
If that's not an option, add more RAM to satisfy the load, or introduce a queuing system that gracefully controls server usage.
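One concrete way to hand the download over to the front-end web server (assuming nginx in front, which this answer does not actually specify) is an internal location plus the X-Accel-Redirect header:
// PHP decides who may download; nginx does the actual streaming
// (requires a matching "internal" location block in the nginx config)
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="bigfile.zip"'); // hypothetical name
header('X-Accel-Redirect: /protected/bigfile.zip');                // hypothetical internal path
exit;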
I want to delete cache files in a directory; the directory can contain up to 50,000 files. I currently use this function.
// Deletes all files in $type directory that start with $start
function clearCache($type, $start)
{
    $open = opendir($GLOBALS['DOC_ROOT']."/cache/".$type."/");
    while ( ($file = readdir($open)) !== false )
    {
        if ( strpos($file, $start) !== false )
        {
            unlink($GLOBALS['DOC_ROOT']."/cache/".$type."/".$file);
        }
    }
    closedir($open);
}
This works fine and it is fast, but is there any faster way to do this? (scandir seems to be slow.) I can obviously move the cache to memory.
Thanks,
hamlet
You may want to take a look at the glob function, as it may be even faster... it relies on the C library's glob() to do its work.
I haven't tested this, but I think this would work:
foreach (glob($GLOBALS['DOC_ROOT']."/cache/".$type."/".$start."*") as $file) {
    unlink($file);
}
Edit: glob returns each match with the same path prefix that was used in the pattern, so $file here is already the full path.
Either glob as suggested before or, if you can be certain there won't be malicious input, by issuing the command directly to the system via exec(sprintf('rm %s/sess*', realpath($path)));, which should be fastest.
I have the PHP code below which helps me get a photo's thumbnail image path in a script.
It will take a value supplied from a MySQL DB like this: '2/34/12/thepicture.jpg'
It will then turn it into this: '2/34/12/thepicture_thumb1.jpg'
I am sure there is a better-performing way of doing this and I am open to any help, please.
Also, on a page with 50 users this would run 50 times to get 50 different photos.
// the photo as it is pulled from the DB; it has the folders and filename as one string
$photo_url = '2/34/12/thepicture.jpg';
//build the full photo filepath
$file = $site_path. 'images/userphoto/' . $photo_url;
// make sure file name is not empty and the file exist
if ($photo_url != '' && file_exists($file)) {
//get file info
$fil_ext1 = pathinfo($file);
$fil_ext = $fil_ext1['extension'];
$fil_explode = '.' . $fil_ext;
$arr = explode($fil_explode, $photo_url);
// add "_thumb" or else "_thumb1" inbetween
// the file name and the file extension 2/45/12/photo.jpg becomes 2/45/12/photo_thumb1.jpg
$pic1 = $arr[0] . "_thumb" . $fil_explode;
//make sure the thumbnail image exist
if (file_exists("images/userphoto/" . $pic1)) {
//return the thumbnail image url
$img_name = $pic1;
}
}
One thing I am curious about is how it uses pathinfo() to get the file's extension; since the extension will always be 3 characters, would other methods of getting this value perform better?
Is there a performance problem with this code, or are you just optimizing prematurely? Unless the performance is bad enough to be a usability issue and the profiler tells you that this code is to blame, there are much more pressing issues with this code.
To answer the question: "How can I improve this PHP code?" Add whitespace.
Performance-wise, if you're calling built-in PHP functions the performance is excellent because you're running compiled code behind the scenes.
Of course, calling all these functions when you don't need to isn't a good idea. In your case, the pathinfo function returns the various paths you need. You call the explode function on the original name when you can build the file name like this (note, the 'filename' is only available since PHP 5.2):
$fInfo = pathinfo($file);
$thumb_name = $fInfo['dirname'] . '/' . $fInfo['filename'] . '_thumb.' . $fInfo['extension'];
If you don't have PHP 5.2, then the simplest way is to ignore that function and use strrpos and substr:
// gets the position of the last dot
$lastDot = strrpos($file, '.');
// first bit gets everything before the dot,
// second gets everything from the dot onwards
$thumbName = substr($file, 0, $lastDot) . '_thumb1' . substr($file, $lastDot);
The best optimization for this code is to increase its readability:
// make sure file name is not empty and the file exist
if ( $photo_url != '' && file_exists($file) ) {
// Get information about the file path
$path_info = pathinfo($file);
// determine the thumbnail name
// add "_thumb" or else "_thumb1" inbetween
// the file name and the file extension 2/45/12/photo.jpg
// becomes 2/45/12/photo_thumb.jpg
$pic1 = "{$path_info['dirname']}/{$path_info['filename']}_thumb.{$path_info['extension']}";
// if this calculated thumbnail file exists, use it in place of
// the image name
if ( file_exists( "images/userphoto/" . $pic1 ) ) {
$img_name = $pic1;
}
}
I have broken up the components of the function using line breaks, and used the information returned from pathinfo() to simplify the process of determining the thumbnail name.
Updated to incorporate feedback from #DisgruntledGoat
Why are you even concerned about the performance of this function? Assuming you call it only once (say, when the "main" filename is generated) and store the result, its runtime should be essentially zero compared to DB and filesystem access. If you're calling it on every access to re-compute the thumbnail path, well, that's wasteful, but it's still not going to significantly impact your runtime.
Now, if you want it to look nicer and be more maintainable, that's a worthwhile goal.
The easiest way to fix this is to thumbnail all user profile pics beforehand and keep them around so you don't keep resizing.
$img_name = preg_replace('/^(.*)(\..*?)$/', '\1_thumb\2', $file);
Edit: bbcode disappeared with \.