I have a function that finds all files whose names start with a given string and returns them in an array, but it is getting slow because I have around 20,000 files in one particular directory.
I need to optimize this function, but I can't see how. This is the function:
function DetectPrefix ($filePath, $prefix)
{
$dh = opendir($filePath);
while (false !== ($filename = readdir($dh)))
{
$posIni = strpos( $filename, $prefix);
if ($posIni===0):
$files[] = $filename;
endif;
}
if (count($files)>0){
return $files;
} else {
return null;
}
}
What more can I do?
Thanks
http://php.net/glob
$files = glob('/file/path/prefix*');
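As a minimal sketch of how the glob() call could replace the original function: the helper name `find_by_prefix` is made up for illustration, and the prefix is assumed to contain no glob metacharacters such as `*`, `?` or `[`.

```php
<?php
// Hypothetical helper: returns the names of entries in $dir that start with $prefix.
// Assumes $prefix contains no glob metacharacters (*, ?, [).
function find_by_prefix($dir, $prefix)
{
    $paths = glob(rtrim($dir, '/') . '/' . $prefix . '*');
    if ($paths === false) {
        return array(); // glob() signals an error with false
    }
    // glob() returns full paths; keep only the file names
    return array_map('basename', $paths);
}
```

Unlike the original function, this returns an empty array rather than null when nothing matches.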
Wikipedia breaks uploads up by the first couple letters of their filenames, so excelfile.xls would go in a directory like /uploads/e/x while textfile.txt would go in /uploads/t/e.
Not only does this reduce the number of files glob (or any other approach) has to sort through, but it avoids the maximum files in a directory issue others have mentioned.
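The layout described above might be computed along these lines. This is only a sketch: `shard_path` and its `$depth` parameter are hypothetical, and file names are assumed to start with plain single-byte letters or digits.

```php
<?php
// Hypothetical helper: maps a file name to a nested directory built from its
// first $depth characters, e.g. excelfile.xls -> /uploads/e/x/excelfile.xls.
// Assumes names begin with single-byte letters or digits.
function shard_path($base, $filename, $depth = 2)
{
    $parts = array();
    for ($i = 0; $i < $depth && $i < strlen($filename); $i++) {
        $parts[] = strtolower($filename[$i]);
    }
    return rtrim($base, '/') . '/' . implode('/', $parts) . '/' . $filename;
}
```

With such a layout, a prefix search of two or more characters only has to look inside one subdirectory.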
You could use scandir() to list the files in the directory, instead of iterating through them one-by-one using readdir(). scandir() returns an array of the files.
However, it'd be better if you could change your file system organization - do you really need to store 20000+ files in a single directory?
As the other answers mention, I'd look at glob(), scandir(), and/or the DirectoryIterator class; there is no need to reinvent the wheel.
However, watch out! Check your operating system: there may be a limit on the maximum number of files in a single directory. If so, and you keep adding files to the same directory, you will have some downtime and other problems when you reach the limit. The error will probably appear as a permissions or write failure rather than an obvious "you can't write more files in a single directory" message.
I'm not sure, but DirectoryIterator is probably a bit faster. Also add caching, so that the list is regenerated only when files are added or deleted.
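One way the caching idea could look, as a rough sketch only: `cached_listing` and the JSON cache file are assumptions, and the approach relies on the file system updating a directory's modification time whenever an entry is added or removed.

```php
<?php
// Hypothetical cache: reuses the listing stored in $cacheFile while the
// directory's modification time is unchanged.
// Assumes $cacheFile lives OUTSIDE $dir (writing it inside would change the mtime).
function cached_listing($dir, $cacheFile)
{
    clearstatcache();
    $mtime = filemtime($dir);
    if (is_file($cacheFile)) {
        $cached = json_decode(file_get_contents($cacheFile), true);
        if (is_array($cached) && $cached['mtime'] === $mtime) {
            return $cached['files']; // directory unchanged since last scan
        }
    }
    // Cache miss: rescan and store the fresh listing
    $files = array_values(array_diff(scandir($dir), array('.', '..')));
    file_put_contents($cacheFile, json_encode(array('mtime' => $mtime, 'files' => $files)));
    return $files;
}
```

Modification times usually have a resolution of one second, so a change made in the same second as the last scan could be missed.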
You only need to compare the first strlen($prefix) characters of each filename. So try this:
function DetectPrefix($filePath, $prefix) {
$dh = opendir($filePath);
$len = strlen($prefix);
$files = array();
while (false !== ($filename = readdir($dh))) {
if (substr($filename, 0, $len) === $prefix) {
$files[] = $filename;
}
}
if (count($files)) {
return $files;
} else {
return null;
}
}
I'm looking for a way to check whether the queried 'id' has a folder with the same name in the file system. I have it working, but it will become slow in the future as the number of folders grows.
$query = Model::all();
if(Input::get('field') == 'true'){
$filenames = scandir('img/folders');
$query->whereIn('id', $filenames);
}
As you can see, this scans the 'folders' directory and builds an array with the names of all the folders inside it. My app is going to have hundreds of thousands of folders in the future, and I would like to resolve this before it becomes a problem. Thanks for any further help.
PS: other suggestions for doing this differently are welcome.
Do you have good reason to believe that scandir on a directory with a large number of folders will actually slow you down?
You can do your query like this:
if(Input::has('field')){
$filenames = scandir('img/folders');
$query = Model::whereIn('id', $filenames)->get();
}
Edit 1
You may find these links useful:
PHP: scandir() is too slow
Get the Files inside a directory
Edit 2
There are some really good suggestions in the links, which you should be able to use as guidance for your own implementation. As I see it, based on the links from my first edit, your options are DirectoryIterator, readdir, or chunking with scandir.
This is a very basic way of doing it but I guess you could do something with readdir like this:
$ids = Model::lists('id');
$matches = [];
if($handle = opendir('path/to/folders'))
{
while (($entry = readdir($handle)) !== false)
{
if(count($ids) === 0)
{
break;
}
if ($entry != "." && $entry != "..")
{
foreach ($ids as $key => $value)
{
if((string) $value === $entry) // readdir() returns strings, so cast the ID before the strict comparison
{
$matches[] = $entry;
unset($ids[$key]);
}
}
}
}
closedir($handle);
}
return $matches;
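The nested foreach compares every directory entry against every remaining ID. A possible variation, sketched here under the assumption that the IDs are unique and arrive as a plain array (the function name `match_ids_to_folders` is hypothetical), flips the ID list once so that each check becomes a hash lookup:

```php
<?php
// Hypothetical variant of the loop above. $ids is assumed to be a plain array of
// unique IDs; $dir is the directory that holds one folder per ID.
function match_ids_to_folders(array $ids, $dir)
{
    // Flip once so each membership test is a key lookup instead of a scan
    $wanted = array_flip(array_map('strval', $ids));
    $matches = array();
    if ($handle = opendir($dir)) {
        // Stop as soon as every ID has been found or the directory is exhausted
        while ($wanted && ($entry = readdir($handle)) !== false) {
            if (isset($wanted[$entry])) {
                $matches[] = $entry;
                unset($wanted[$entry]);
            }
        }
        closedir($handle);
    }
    return $matches;
}
```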
I have a function to check if a file exists via jQuery which makes a call to a PHP script which I'll use when changing certain images at the click of a button on my index page.
jQuery function:
function fileExists(path){
$.getJSON("/ajax/fileExists.php",{ path: path },
function (data){
return data.path;
});
}
fileExists.php:
$path=$_SERVER['DOCUMENT_ROOT'].'/packs'.$_GET['path'];
if(file_exists($path)){
echo json_encode(TRUE);
}else{
echo json_encode(FALSE);
}
I'm worried about people using this script to list the contents of my server or files which I may not want them to know about so I've used DOCUMENT_ROOT and /packs to try to limit calls to that directory but I think people can simply use ../ within the supplied path to check alternatives.
What is the best way to make this safe, ideally limit it to /packs, and are there any other concerns I should worry about?
Edit: an example call in javascript/jQuery:
if( fileExists('/index.php') ){
alert('Exists');
}else{
alert('Doesn\'t exist');
}
This is how I've handled it in the past:
$path = realpath($_SERVER['DOCUMENT_ROOT'].'/packs'.$_GET['path']);
if (strpos($path, $_SERVER['DOCUMENT_ROOT']) !== 0) {
//It's looking to a path that is outside the document root
}
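To limit the check to /packs rather than the whole document root, the same realpath() idea can be applied to the packs directory itself. The following is a sketch under that assumption; `path_is_inside` is a hypothetical helper, and it treats non-existent paths as outside because realpath() returns false for them.

```php
<?php
// Hypothetical helper: true only if $relative resolves to an existing path
// inside $baseDir once "../" segments and symlinks have been resolved.
function path_is_inside($baseDir, $relative)
{
    $base = realpath($baseDir);
    $target = realpath($baseDir . '/' . ltrim($relative, '/'));
    if ($base === false || $target === false) {
        return false; // realpath() returns false for paths that do not exist
    }
    // The trailing separator stops a sibling such as /packs-private from matching /packs
    return strpos($target, $base . DIRECTORY_SEPARATOR) === 0;
}
```

It could be called as `path_is_inside($_SERVER['DOCUMENT_ROOT'].'/packs', $_GET['path'])` before answering the request.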
You can remove any path traversal from the filename by keeping only its last component:
$path_arr = explode("/", $_GET['path']);
$path = $path_arr[count($path_arr) - 1];
Such a practice is moderately secure and fast (O(1) complexity), but it is not really the best, as you still have to watch out for encoding, character replacement, and the like.
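PHP's built-in basename() performs the same "keep the last component" step. A minimal sketch, assuming only files directly inside /packs need to be checked (`pack_file_exists` is a hypothetical name):

```php
<?php
// Hypothetical helper: checks for a file directly inside $packsDir,
// ignoring any directory part of the user-supplied path.
function pack_file_exists($packsDir, $userPath)
{
    $name = basename($userPath);
    // basename() can still return "." or "..", so reject those explicitly
    if ($name === '' || $name === '.' || $name === '..') {
        return false;
    }
    return is_file(rtrim($packsDir, '/') . '/' . $name);
}
```

Because the directory part is discarded, this variant cannot address files in subdirectories of /packs.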
The overall best practice (though slower depending on your directory size, say O(n) complexity) would be to use readdir() to get a list of all the files in your /packs directory and then check whether the supplied filename is present:
$handle = opendir($path=$_SERVER['DOCUMENT_ROOT'].'/packs');
while (false !== ($entry = readdir($handle))) {
if ($entry === $_GET['path']) {
echo json_encode(TRUE);
return;
}
}
echo json_encode(FALSE);
I have a directory with 1.3 Million files that I need to move into a database. I just need to grab a single filename from the directory WITHOUT scanning the whole directory. It does not matter which file I grab as I will delete it when I am done with it and then move on to the next. Is this possible? All the examples I can find seem to scan the whole directory listing into an array. I only need to grab one at a time for processing... not 1.3 Million every time.
This should do it:
<?php
$h = opendir('./'); //Open the current directory
while (false !== ($entry = readdir($h))) {
if($entry != '.' && $entry != '..') { //Skips over . and ..
echo $entry; //Do whatever you need to do with the file
break; //Exit the loop so no more files are read
}
}
?>
readdir
Returns the name of the next entry in the directory. The entries are returned in the order in which they are stored by the filesystem.
Just obtain the directory's iterator and look for the first entry that is a file:
foreach(new DirectoryIterator('.') as $file)
{
if ($file->isFile()) {
echo $file, "\n";
break;
}
}
This also ensures that your code still works on a file system that behaves differently from the one you expect.
See DirectoryIterator and SplFileInfo.
readdir will do the trick. Check the example on that page, but instead of calling readdir in a loop, just call it once. You'll get the first file in the directory.
Note: you might get ".", "..", and other similar responses depending on the server, so you might want to at least loop until you get a valid file.
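The "loop until you get a valid file" advice might look like the sketch below. The name `first_file` is hypothetical, and "valid" is taken here to mean a regular file.

```php
<?php
// Hypothetical helper: returns the name of the first regular file that
// readdir() yields in $dir, or null if there is none.
function first_file($dir)
{
    $found = null;
    if ($h = opendir($dir)) {
        while (false !== ($entry = readdir($h))) {
            // is_file() is false for ".", ".." and subdirectories, so they are skipped
            if (is_file($dir . '/' . $entry)) {
                $found = $entry;
                break; // stop reading after the first hit
            }
        }
        closedir($h);
    }
    return $found;
}
```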
Do you want to return the first directory, the first file, or both? Use this:
Create a function "pickfirst" with two arguments (the address, and a mode flag that selects directory or file):
function pickfirst($address, $file) { // $file = false: pick first directory; $file = true: pick first file
$h = opendir($address);
$address = rtrim($address, '/') . '/'; // make sure the path ends with a slash
$found = null;
while (false !== ($entry = readdir($h))) {
if ($entry != '.' && $entry != '..' && is_file($address.$entry) == $file) {
$found = $entry;
break;
}
} // end while
closedir($h);
return $found;
} // end function
If you want to pick the first directory in your address, set $file to false; if you want to pick the first file, set $file to true.
good luck :)
I'm building a file browser, and I need to know if a directory has children (but not how many or what type).
What's the most efficient way to find if a directory has children? glob()? scandir() it? Check its tax records?
Edit
It seems I was misunderstood, although I thought I was pretty clear. I'll try to restate my question.
What is the most efficient way to know if a directory is not empty? I'm basically looking for a boolean answer - NOT EMPTY or EMPTY.
I don't need to know:
how many files are in the directory
what the files are
when they were modified
etc.
I do need to know:
does the directory have any files in it at all
efficiently.
I think this is very efficient:
function dir_contains_children($dir) {
$result = false;
if($dh = opendir($dir)) {
while(!$result && ($file = readdir($dh)) !== false) {
$result = $file !== "." && $file !== "..";
}
closedir($dh);
}
return $result;
}
It stops listing the directory's contents as soon as a file or directory is found (not counting . and ..).
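An alternative with the same early-stop behaviour, not taken from the answer above, is SPL's FilesystemIterator, which skips "." and ".." by default. This is a sketch with a hypothetical function name:

```php
<?php
// Alternative sketch: FilesystemIterator skips "." and ".." by default,
// so valid() is true only when the directory has at least one real entry.
// The constructor throws UnexpectedValueException if $dir cannot be opened.
function dir_has_children($dir)
{
    $it = new FilesystemIterator($dir);
    return $it->valid();
}
```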
You could use 'find' to list all empty directories in one step:
exec('find ' . escapeshellarg($dir) . ' -maxdepth 1 -empty -type d', $out, $ret);
print_r($out);
It's not "pure" PHP, but it's simple and fast.
This should do, easy, quick and effective.
<?php
function dir_is_empty($dir) {
$dirItems = count(scandir($dir));
if($dirItems > 2) return false;
else return true;
}
?>
Unfortunately, each solution so far has lacked the brevity and elegance necessary to shine above the rest.
So, I was forced to homebrew a solution myself, which I'll be implementing until something better pops up:
if (count(glob($dir."/*"))) {
echo "NOT EMPTY";
}
Still not sure of the efficiency of this compared to other methods, which was the original question.
I wanted to expand vstm's answer - Check only for child directories (and not files):
/**
* Check if directory contains child directories.
*/
function dir_contains_children_dirs($dir) {
$result = false;
if($dh = opendir($dir)) {
while (!$result && ($file = readdir($dh)) !== false) {
$result = $file !== "." && $file !== ".." && is_dir($dir.'/'.$file);
}
closedir($dh);
}
return $result;
}
I've been trying to replicate GNU find ("find .") in PHP, but it seems impossible to get even close to its speed. The PHP implementations take at least twice as long as find. Are there faster ways of doing this with PHP?
EDIT: I added a code example using the SPL implementation -- its performance is equal to the iterative approach
EDIT2: When calling find from PHP it was actually slower than the native PHP implementation. I guess I should be satisfied with what I've got :)
// measured to 317% of gnu find's speed when run directly from a shell
function list_recursive($dir) {
if ($dh = opendir($dir)) {
while (false !== ($entry = readdir($dh))) {
if ($entry == '.' || $entry == '..') continue;
$path = "$dir/$entry";
echo "$path\n";
if (is_dir($path)) list_recursive($path);
}
closedir($dh);
}
}
// measured to 315% of gnu find's speed when run directly from a shell
function list_iterative($from) {
$dirs = array($from);
while (NULL !== ($dir = array_pop($dirs))) {
if ($dh = opendir($dir)) {
while (false !== ($entry = readdir($dh))) {
if ($entry == '.' || $entry == '..') continue;
$path = "$dir/$entry";
echo "$path\n";
if (is_dir($path)) $dirs[] = $path;
}
closedir($dh);
}
}
}
// measured to 315% of gnu find's speed when run directly from a shell
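function list_recursivedirectoryiterator($path) {
// SKIP_DOTS drops "." and ".."; RecursiveIteratorIterator descends into subdirectories
$it = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
RecursiveIteratorIterator::SELF_FIRST);
foreach ($it as $file) {
echo $file->getPathname(), "\n";
}
}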
// measured to 390% of gnu find's speed when run directly from a shell
function list_gnufind($dir) {
$dir = escapeshellcmd($dir);
$h = popen("/usr/bin/find $dir", "r");
while ('' != ($s = fread($h, 2048))) {
echo $s;
}
pclose($h);
}
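I'm not sure if the performance is better, but you could use a recursive directory iterator to make your code simpler. See RecursiveDirectoryIterator and SplFileInfo.
// RecursiveIteratorIterator is needed to descend into subdirectories; SKIP_DOTS drops "." and ".."
$it = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($from, FilesystemIterator::SKIP_DOTS),
RecursiveIteratorIterator::SELF_FIRST);
foreach ($it as $file)
{
echo $file->getPathname(), "\n";
}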
Before you start changing anything, profile your code.
Use something like Xdebug (plus kcachegrind for a pretty graph) to find out where the slow parts are. If you start changing things blindly, you won't get anywhere.
My only other advice is to use the SPL directory iterators as posted already. Letting the internal C code do the work is almost always faster.
PHP just cannot perform as fast as C, plain and simple.
Why would you expect the interpreted PHP code to be as fast as the compiled C version of find? Being only twice as slow is actually pretty good.
About the only advice I would add is to call ob_start() at the beginning and ob_get_contents() and ob_end_clean() at the end. That might speed things up.
You're keeping N directory streams open, where N is the depth of the directory tree. Instead, try reading an entire directory's worth of entries at once, and then iterate over the entries. At the very least you'll maximize use of the disk I/O caches.
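A sketch of that suggestion, assuming scandir() is used to read each directory in full (the function name `list_tree` is hypothetical, and symlinked directories are deliberately not followed):

```php
<?php
// Sketch: each directory is read in full with scandir(), so no directory
// handle stays open while its subdirectories are being processed.
function list_tree($from)
{
    $paths = array();
    $dirs = array($from); // explicit stack of directories still to visit
    while ($dirs) {
        $dir = array_pop($dirs);
        $entries = scandir($dir);
        if ($entries === false) {
            continue; // unreadable directory
        }
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') continue;
            $path = "$dir/$entry";
            $paths[] = $path;
            if (is_dir($path) && !is_link($path)) $dirs[] = $path;
        }
    }
    return $paths;
}
```

Whether this is actually faster than the readdir() versions would need to be measured.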
You might want to seriously consider just using GNU find. If it's available, and safe mode isn't turned on, you'll probably like the results just fine:
function list_recursive($dir) {
$dir=escapeshellcmd($dir);
$h = popen("/usr/bin/find $dir -type f", "r");
while ($s = fgets($h,1024)) {
echo $s;
}
pclose($h);
}
However, there might be some directory that's so big that you won't want to bother with this either. Consider amortizing the slowness in other ways. Your second try can be checkpointed (for example) by simply saving the directory stack in the session. If you're giving the user a list of files, simply collect a pageful, then save the rest of the state in the session for page 2.
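The checkpointing idea could be sketched as follows. `list_page` is a hypothetical function; as a simplification, a directory is always read in full, so a page may contain somewhat more than `$limit` entries.

```php
<?php
// Hypothetical pager: $stack holds the directories still to visit. Returns the
// collected paths plus the remaining stack, which the caller can store
// (for example in $_SESSION) and pass back in to build the next page.
function list_page(array $stack, $limit)
{
    $paths = array();
    while ($stack && count($paths) < $limit) {
        $dir = array_pop($stack);
        $entries = scandir($dir);
        if ($entries === false) {
            continue; // unreadable directory
        }
        foreach (array_diff($entries, array('.', '..')) as $entry) {
            $path = "$dir/$entry";
            $paths[] = $path;
            if (is_dir($path)) $stack[] = $path; // visit later, possibly on another page
        }
    }
    return array($paths, $stack);
}
```

A first request would call `list_page(array($root), 50)`; later requests pass in the saved stack, and an empty stack means the listing is complete.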
Try using scandir() to read a whole directory at once, as Jason Cohen has suggested. I've based the following code on code from the PHP manual comments for scandir():
function scan( $dir ){
$dirs = array_diff( scandir( $dir ), Array( ".", ".." ));
$dir_array = Array();
foreach( $dirs as $d )
$dir_array[ $d ] = is_dir($dir."/".$d) ? scan( $dir."/".$d) : print $dir."/".$d."\n";
return $dir_array; // nested array: sub-arrays for directories, 1 (print's return value) for files
}