PHP, search and delete files from directory - performance - php

I want to delete cache files in a directory; the directory can contain up to 50,000 files. Currently I use this function.
// Deletes all files in the $type directory whose names start with $start
function clearCache($type, $start)
{
    $dir = $GLOBALS['DOC_ROOT'] . "/cache/" . $type . "/";
    $open = opendir($dir);
    while (($file = readdir($open)) !== false)
    {
        // strpos() === 0 means the filename actually begins with $start
        if (strpos($file, $start) === 0)
        {
            unlink($dir . $file);
        }
    }
    closedir($open);
}
This works fine and it is fast, but is there any faster way to do this? (scandir() seems to be slow.) Obviously I could move the cache to memory.
Thanks,
hamlet

You may want to take a look at the glob function, as it may be even faster... it depends on the C library's glob command to do its work.
I haven't tested this, but I think this would work:
foreach (glob($GLOBALS['DOC_ROOT']."/cache/".$type."/".$start."*") as $file) {
    unlink($file);
}
Edit: glob returns each match with the directory prefix from the pattern included, so $file can be passed straight to unlink.
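Since the question is about speed, it may also help to pass the GLOB_NOSORT flag so glob skips alphabetically sorting the matches; a minimal sketch, with $pattern standing in for the same "/cache/" pattern as above:
foreach (glob($pattern, GLOB_NOSORT) as $file) {
    unlink($file);
}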

Either glob as suggested above or, if you can be certain there won't be malicious input, by issuing a command directly to the system via exec(sprintf('rm %s/sess*', realpath($path)));, which should be fastest.
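If the path isn't fully trusted, a slightly safer variant of that call (my own sketch, not part of the original suggestion) quotes it with escapeshellarg and leaves only the wildcard outside the quotes:
exec('rm ' . escapeshellarg(realpath($path)) . '/sess*');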

Related

PHP scandir - multiple directories

I am creating a WordPress plugin which allows a user to apply sorting rules to a particular template (page, archive, single etc). I am populating a list of pages using PHP scandir like so:
$files = scandir(get_template_directory());
The problem is that I keep single.php templates in a '/single' subfolder, so these templates are not picked up by the call above.
How can I use multiple directories within the scandir function (perhaps an array?) or will I need a different solution?
So basically I am trying to:
$files = scandir( get_template_directory() AND get_template_directory().'/single' );
My current solution (not very elegant, as it requires two foreach loops):
function query_caller_is_template_file_get_template_files()
{
    $template_files_list = array();
    $files = scandir(get_template_directory());
    $singlefiles = scandir(get_template_directory().'/single');
    foreach ($files as $file)
    {
        if (strpos($file, '.php') === FALSE)
            continue;
        $template_files_list[] = $file;
    }
    foreach ($singlefiles as $singlefile)
    {
        // Note: this must test $singlefile, not $file
        if (strpos($singlefile, '.php') === FALSE)
            continue;
        $template_files_list[] = $singlefile;
    }
    return $template_files_list;
}
First, there's not really anything wrong about what you're doing. You have two directories, so you do the same thing twice. Of course you could make it look a little cleaner and avoid the blatant copy paste:
$files = array_merge(
    scandir(get_template_directory()),
    scandir(get_template_directory().'/single')
);
Now just iterate over the single array.
In your case, getting the file list recursively doesn't make sense, as there may be subdirectories you don't want to check. If you did want to recurse into subdirectories, opendir() and readdir() along with is_dir() would allow you to build a recursive scan function, as in the sketch below.
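A rough, untested sketch of such a recursive scan (the function name is made up):
function scanRecursive($dir)
{
    $results = array();
    $handle = opendir($dir);
    while (($entry = readdir($handle)) !== false) {
        if ($entry === '.' || $entry === '..')
            continue;
        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            // Recurse into the subdirectory and merge its results
            $results = array_merge($results, scanRecursive($path));
        } else {
            $results[] = $path;
        }
    }
    closedir($handle);
    return $results;
}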
You could even tighten up the '.php' filter part a bit with array_filter().
$files = array_filter($files, function($file){
    return strpos($file, '.php');
});
Here I'm assuming that, should a file start with .php, you're not really interested in it making your list (as strpos() will return the falsy value 0 in that case). I'm also assuming that you're sure there will be no files that have .php in the middle somewhere.
Like template.php.bak, because you'll be using version control for something like that.
If however there is a chance of that, you may want to tighten up your check a bit to ensure the .php is at the end of the filename, as sketched below.
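One untested way to do that is to compare the last four characters:
$files = array_filter($files, function($file){
    return substr($file, -4) === '.php';
});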

unlink files with a case-insensitive (glob-like) pattern

I have two folders: one holds the videos, and the other holds the configuration files for each video (3 files per video). Right now, if I want to delete a video, I have to delete the files by hand.
I found this :
<?php
$filename = 'name.of.the.video.xml';
$term = str_replace(".xml", "", $filename);
$dirPath = "D:/test/";
foreach (glob($dirPath.$term.".*") as $removeFile)
{
    unlink($removeFile);
}
?>
An echo will return:
D:/test/name.of.the.video.jpg
D:/test/name.of.the.video.srt
D:/test/name.of.the.video.xml
This is OK and it helps me a lot, but I have a problem.
Not all the files use the same case, e.g.:
Name.of.The.video.jpg
Name.Of.The.Video.xml
If I echo the folder looking for that string and it is not identical to $filename, it will return empty.
So, my question is: how can I make that search case-insensitive?
Thank you.
You are using the glob function, which is case sensitive, so it is the wrong function for getting this list of files.
You could first normalize the filenames in the directory so they all share the same case (e.g. all lowercase), or you need another method to get the directory listing case-insensitively. I suggest the first; however, if that is not an option, why not glob for all files first and then filter the list using preg_grep, which allows you to specify case-insensitive patterns? (A sketch follows.)
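An untested sketch of that glob-plus-preg_grep filtering, using the variable names from the question:
$term = basename($filename, ".xml");
$pattern = sprintf('{/%s\..*$}i', preg_quote($term));
foreach (preg_grep($pattern, glob($dirPath . '*')) as $removeFile)
{
    unlink($removeFile);
}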
Which leads me to the point that it's more practical to use a DirectoryIterator with a RegexIterator:
$filename = 'name.of.the.video.xml';
$term = basename($filename, ".xml");
$files = new DirectoryIterator($dirPath);
$filesFiltered = new RegexIterator($files, sprintf('(^%s\\..*$)i', preg_quote($term)));
foreach ($filesFiltered as $file)
{
    printf("delete: %s\n", $file);
    unlink($file->getPathname());
}
A good example of the flexibility of iterator-based code is your changed requirement: do this for two directories at once. You just create two DirectoryIterators and append one to the other with an AppendIterator. Job done. The rest of the code stays the same:
...
$files = new AppendIterator();
$files->append(new DirectoryIterator($dirPath1));
$files->append(new DirectoryIterator($dirPath2));
...
Voilà. Sounds good? glob is okay for quick jobs that need nothing more. For everything else involving directory operations, start to consider the SPL. It has much more power.
Is strcasecmp() a valid function for this? It's a case-insensitive string comparison function.
Surely if you know the file name and you can echo it out, you can pass it to unlink()?

PHP Script takes almost 15 seconds to load

I've written a PHP script that iterates through a given folder, extracts all the images from there, and displays them on an HTML page (as <img> tags). The size of the page is about 14 KB, but it takes almost 15 seconds to load.
Here's the code:
function displayGallery( $gallery, $rel, $first_image ) {
    $prefix = "http://www.example.com/";
    $path_to_gallery = "gallery_albums/" . $gallery . "/";
    $handler = opendir( $path_to_gallery ); // opens directory
    while ( ( $file = readdir( $handler ) ) !== false ) {
        // skip the "." and ".." entries
        if ( strcmp( $file, "." ) != 0 && strcmp( $file, ".." ) != 0 ) {
            if ( isImage( $prefix . $path_to_gallery . $file ) ) {
                echo '<img src="' . $prefix . $path_to_gallery . $file . '" />';
            }
        }
    }
    closedir( $handler ); // closes directory
}
function isImage($image_file) {
    if (getimagesize($image_file) !== false) {
        return true;
    } else {
        return false;
    }
}
I looked at other posts, but most of them deal with SQL queries, and that's not my case.
Any suggestions how to optimize this?
You can use a PHP profiler like http://xdebug.org/docs/profiler to find what part of the script is taking forever to run. It might be overkill for this issue, but long-term you may be glad you took the time now to set it up.
I suppose that's because you've added $prefix in the isImage invocation. That way the function actually downloads all your images from your webserver instead of looking them up locally.
You may use getimagesize(); it issues an E_NOTICE and returns FALSE when the file is not a known image type.
An out-of-left-field suggestion here. You don't state how you are clocking the execution time. If you are clocking it in the browser, as in the page taking 15 seconds to load from a link, the problem could have nothing at all to do with your script. I have seen people in the past create similar pages that embed full-size images as thumbnails, and they take forever to load because even though the image is displayed at thumbnail size or smaller, the file itself is still 800 x 600 or something. I know it sounds daft, but make sure that you are not just displaying large images at a small size. It would be perfectly reasonable for a page to take 15 seconds to load and display 76 800 x 600 JPEGs.
My assumption is that isImage is the problem; I've never seen it done that way before. Why not just check for particular file extensions? That's pretty quick.
Update: You might also try switching to exif_imagetype(), which is likely faster than getimagesize(). Putting that check into the top function is also going to be faster. Neither of those functions was meant to be used over a web connection, so avoid that altogether. Best to stick with the file extension, as sketched below.
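A minimal, untested sketch of that extension check (the function name and extension list are my own):
function isImageByExtension($file) {
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return in_array($ext, array('jpg', 'jpeg', 'png', 'gif'));
}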
Do you not already have access to the files directly? Every time you look something up over the web, it's going to take a while - you need to wait for the entire file to download. Look up the files directly on your system.
Use scandir to get all the filenames at once into an array and walk through them. That will likely speed things up, as I assume there won't be a back and forth to fetch entries individually. (See the sketch below.)
Instead of doing strcmp for . and .. just do $file != '.' && $file != '..'
Also, the speed is going to depend on the number of files returned; if there are a lot, it's going to be slow. The OS can slow down with too many files in a directory as well. You're looping over all files and directories, not just images, so that's the number that counts, not just the images.
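An untested sketch of the loop rewritten with scandir and plain comparisons:
$path_to_gallery = "gallery_albums/" . $gallery . "/";
foreach (scandir($path_to_gallery) as $file) {
    if ($file != '.' && $file != '..') {
        // check the extension and echo the <img> tag here
    }
}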
getimagesize is the problem; it took 99.1% of the script time.
Version #1 - Original case
Version #2 - If you really need to use getimagesize with a URL (http://), then there is a faster alternative, found in http://www.php.net/manual/en/function.getimagesize.php#88793 . It reads only the first X bytes of the image. XHProf shows it is 10x faster. Another idea could be using curl multi for parallel downloads: https://stackoverflow.com/search?q=getimagesize+alternative
Version #3 - I think the best fit for your case is to open the files as normal filesystem paths, without the http:// prefix. This is up to 100x faster per XHProf; see below.
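In the code from the question, that just means dropping $prefix from the isImage() call so the check hits the local disk (a sketch):
// local filesystem path instead of a URL
if ( isImage( $path_to_gallery . $file ) ) {
    echo '<img src="' . $prefix . $path_to_gallery . $file . '" />';
}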

How to copy all contents in a folder to a dir in php

I am trying to copy all the files from a directory to another directory in php.
$copy_all_files_from = "layouts/";
$copy_to = "Website3/";
Can someone help me do this, please?
Something like this (untested):
<?php
$handle = opendir($copy_all_files_from);
while (false !== ($file = readdir($handle))) {
    if ($file != '.' && $file != '..') {
        // readdir returns bare filenames, so prefix the source directory
        // and give copy() a destination filename as well
        copy($copy_all_files_from . $file, $copy_to . $file);
    }
}
closedir($handle);
edit:
To use Amadan's method, you should be able to use the PHP function shell_exec().
Not sure, since I never need to use server commands.
Easiest:
`cp -r $copy_all_files_from $copy_to`
Unless you're on Windows. Without shelling out, it's a bit more complex: read the directory, iterate over the files (if one is a directory, recurse), open each file, and read and write it block by block until end of file.
UPDATE: doh, PHP has copy...
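An untested sketch of that recursive approach, leaning on PHP's built-in copy() for the per-file work (the function name is made up):
function copyDirectory($src, $dst) {
    if (!is_dir($dst)) {
        mkdir($dst, 0777, true);
    }
    foreach (scandir($src) as $entry) {
        if ($entry == '.' || $entry == '..')
            continue;
        if (is_dir($src . '/' . $entry)) {
            // recurse into subdirectories
            copyDirectory($src . '/' . $entry, $dst . '/' . $entry);
        } else {
            copy($src . '/' . $entry, $dst . '/' . $entry);
        }
    }
}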

All available images under a domain

I'd like to make a gallery of all the images I have under my domain (my internet root folder). These images are in different folders. What's the best way to browse through all the folders and return the images?
Use Google Image Search with site: www.mydomainwithimages.com as the search term and this will show you all your indexed images. This should be everything in your domain as long as your robots.txt file doesn't exclude the Google crawler.
Take a look at opendir. You would want to write a function that gets called in a recursive loop; the function could loop through the files in the given directory, check each file extension, and return the matching files as an array which you merge into a global array.
Depending on the hosting system, you could use the command line with exec or passthru:
find /path/to/website/root/ -type f -name '*.jpg'
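From PHP that might look like this (a sketch; the output array name is arbitrary):
exec("find /path/to/website/root/ -type f -name '*.jpg'", $images);
// $images now holds one file path per line of find's output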
If you can't do such a thing, as fire said, opendir is the way to go.
I would give PHP's DirectoryIterator a spin.
This is untested pseudo-code, but it should work a little bit like this:
function scanDirectoryForImages($dirPath)
{
    $images = array();
    $dirIter = new DirectoryIterator($dirPath);
    foreach ($dirIter as $fileInfo)
    {
        if ($fileInfo->isDot())
            continue;
        // If it's a directory, scan it recursively
        elseif ($fileInfo->isDir())
        {
            // getPathname() gives the subdirectory's own path;
            // getPath() would return the parent and recurse forever
            $images = array_merge(
                $images, scanDirectoryForImages($fileInfo->getPathname())
            );
        }
        elseif ($fileInfo->isFile())
        {
            /* This works only for JPEGs, obviously, but feel free to add
               other extensions */
            if (strpos($fileInfo->getFilename(), '.jpg') !== FALSE)
            {
                $images[] = $fileInfo->getPathname();
            }
        }
    }
    return $images;
}
Please don't sue me if this doesn't work; it's really kind of off the top of my head, but using such a function would be the most elegant way to solve your problem, imho.
// edit: Yeah, that's basically the same as fire pointed out.
