PHP: scandir() is too slow

I have to write a function that lists all subfolders in a folder. I have a filter that excludes files, but the function uses scandir() for the listing, which makes the application very slow. Is there an alternative to scandir(), even a non-native PHP function?
Thanks in advance!

You can use readdir which may be faster, something like this:
function readDirectory($Directory, $Recursive = true)
{
    if (is_dir($Directory) === false)
    {
        return false;
    }
    $Resource = opendir($Directory);
    if ($Resource === false)
    {
        // opendir() signals failure by returning false, not by throwing
        return false;
    }
    $Found = array();
    while (false !== ($Item = readdir($Resource)))
    {
        if ($Item == "." || $Item == "..")
        {
            continue;
        }
        // Build the full path; is_dir() on the bare name is resolved against the working directory
        $Path = rtrim($Directory, "/") . "/" . $Item;
        if ($Recursive === true && is_dir($Path))
        {
            $Found[] = readDirectory($Path, $Recursive);
        }
        else
        {
            $Found[] = $Path;
        }
    }
    closedir($Resource);
    return $Found;
}
It may require some tweaking, but this is essentially what scandir does, and it should be faster. If not, please post an update, as I would like to see if I can make a faster solution.
Another issue: if you're reading a very large directory, you're filling up an array in memory, and that may be where your memory is going.
You could try to create a function that reads in offsets, so that you can return 50 files at a time!
Reading files in chunks would be just as simple to use, like so:
$offset = 0;
while(false !== ($Batch = ReadFilesByOffset("/tmp",$offset)))
{
//Use $Batch here, which contains 50 or fewer files!
//Increment the offset:
$offset += 50;
}
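ReadFilesByOffset() is not defined anywhere in the answer. A minimal sketch of what it could look like is shown here; the $limit parameter and the whole body are assumptions made for illustration. It rescans the directory from the start on every call and relies on readdir() returning entries in the same order each time, so it trades speed for lower memory use.

```php
<?php
// Hypothetical helper (not from the answer): returns up to $limit entry paths
// from $dir starting at position $offset, or false once the offset is past
// the last entry.
function ReadFilesByOffset($dir, $offset, $limit = 50)
{
    $handle = opendir($dir);
    if ($handle === false) {
        return false;
    }
    $batch = array();
    $index = 0;
    while (false !== ($item = readdir($handle))) {
        if ($item === "." || $item === "..") {
            continue;
        }
        if ($index++ < $offset) {
            continue; // skip entries that belong to earlier batches
        }
        $batch[] = rtrim($dir, "/") . "/" . $item;
        if (count($batch) === $limit) {
            break;
        }
    }
    closedir($handle);
    return $batch === array() ? false : $batch;
}
```

With this sketch, the loop above ends when the function returns false for an offset beyond the last entry.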

Don't write your own. PHP has a Recursive Directory Iterator built specifically for this:
http://php.net/manual/en/class.recursivedirectoryiterator.php
As a rule of thumb (aka not 100% of the time), since it's implemented in straight C, anything you build in PHP is going to be slower.
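As a rough sketch of how the iterator could be used for the original task (listing only subfolders): the function name listSubfolders is made up for this example, and whether it is actually faster than scandir() for a given directory tree would have to be measured.

```php
<?php
// Hypothetical helper name; returns the paths of all subfolders below $root.
function listSubfolders($root)
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // visit directories themselves, not only leaves
    );
    $folders = array();
    foreach ($iterator as $path => $info) {
        if ($info->isDir()) {
            $folders[] = $path;
        }
    }
    return $folders;
}
```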

Related

What is the best way to search in JSON files in PHP?

Hi, I have many data files in JSON format in a folder.
Now I want to search for a field in them. My search word may not exist in some of the files, and may exist in only one of them.
I wrote this function; if the value is not found in one file, the function calls itself to read another file.
When I echo the result, it is shown and works fine, but the return does not work and no data is returned.
function get_shenavari_in_files($search,$type)
{
static $counter =1 ;
$darsadi = 0;
$find = false;
$file_name = get_files_in_dir(); // make an array of file names
$file_number = count($file_name)-$counter ;
$file="files/" .$file_name[$file_number];
$file_data = read_json($file);
for($i = 0 ; $i<count($file_data) ; $i++)
{
if($file_data[$i][$type] == $search )
{
$darsadi = $file_data[$i]['darsadi'] ;
$find = true;
echo $darsadi ; //this works and show the data
return $darsadi; // this is my problem no data return.
break;
}
}
if($find == false)
{
$counter ++;
get_shenavari_in_files($search,$type);
}
}
var_dump(get_shenavari_in_files('Euro','symbol')); //return null
Once you recurse into get_shenavari_in_files, any found value is never returned back to the initial caller, i.e. instead of
if($find == false)
{
...
get_shenavari_in_files($search,$type);
}
you simply need to prepend the function call with a return statement:
if($find == false)
{
...
return get_shenavari_in_files($search,$type);
}
Having said that, I would try a much simpler (and thereby less error-prone) approach, e.g.:
function get_shenavari_in_files($search, $type) {
$files = glob("files/*.json"); // Get names of all JSON files in a given path
$matches = [];
foreach ($files as $file) {
$data = json_decode(file_get_contents($file), true);
foreach ($data as $row) {
if (array_key_exists($type, $row) && $row[$type] == $search) {
$matches[$file] = $search;
}
}
}
return $matches;
}
This way, you would be able to eliminate the need for a recursive call to get_shenavari_in_files. Also, the function itself would become more performant because it doesn't have to scan the file system over and over again.
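The function above records which files contain a match. If the goal is the 'darsadi' value itself, as in the question, a variant along these lines might be closer. This is only a sketch: the name find_darsadi and the $dir parameter are assumptions, and files that fail to decode are silently skipped.

```php
<?php
// Hypothetical variant: returns the 'darsadi' value of the first matching
// row found in any JSON file under $dir, or null if nothing matches.
function find_darsadi($search, $type, $dir = "files")
{
    $files = glob($dir . "/*.json") ?: array(); // glob() may return false on error
    foreach ($files as $file) {
        $data = json_decode(file_get_contents($file), true);
        if (!is_array($data)) {
            continue; // skip unreadable or malformed files
        }
        foreach ($data as $row) {
            if (is_array($row) && isset($row[$type]) && $row[$type] == $search) {
                return isset($row['darsadi']) ? $row['darsadi'] : null;
            }
        }
    }
    return null;
}
```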

Need PHP to load files from folders and subfolders, and then sort the results added to an array

I need to search folders and subfolders for files. In this search I need to know the file names and their paths, because I have different folders with different files inside them.
I have a folder named 05-Navy, and inside this folder I have 3 files called 05_Navy_White_BaseColor.jpg, 05_Navy_White_Normal.jpg and 05_Navy_White_OcclusionRoughnessMetallic.jpg.
I need to get only one of them at a time because I need to add them separately to different lists.
Then I came up with the code below:
function getDirContents($dir, &$results = array()) {
$files = scandir($dir);
$findme = '_BaseColor';
$mypathCordas = null;
$findmeCordas = 'Cordas';
foreach ($files as $key => $value) {
$path = realpath($dir . DIRECTORY_SEPARATOR . $value);
$mypathCordas = $path;
$pos = strpos($mypathCordas, $findme);
$posCordas = strpos($mypathCordas, $findmeCordas);
if (!is_dir($path)) {
if($posCordas == true){
if($pos == true){
$results[] = $path;
}
}
}
else if ($value != "." && $value != ".." ) {
if($posCordas == true){
echo "</br>";
getDirContents($path, $results);
//$results[] = $path;
}
}
}
sort( $results );
for($i = 0; $i < count($results); $i++){
echo $results[$i];
echo "</br>";
}
return $results;
}
getDirContents('scenes/Texturas');
As output I get this: Results1
This is not ideal at all. The biggest problem is that the list prints the same values again every time it has to add new ones, and, as you can see, it doesn't sort at all; it shuffles. I tried other things, such as DirectoryIterator, which worked really well, but I couldn't sort the results at all...
The printing each time something new is added to the list might come from my for() loop, but I am relatively new to PHP, so I can't be sure.
Also, it returns the full absolute path. I already tried other methods but only got errors; I only need the path from scenes/texturas/ onward instead of the absolute path...
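One possible direction, sketched under assumptions rather than taken from the thread: collect the matching paths first, then sort and print once, outside the traversal. In the code above, sort() and the printing loop run inside every recursive call, which would explain the repeated output. The helper name findTextures and its $needles parameter are made up for this example. Because no realpath() call is involved, the returned paths start with whatever root was passed in.

```php
<?php
// Hypothetical helper: collects files under $root whose path contains every
// string in $needles, and returns them sorted.
function findTextures($root, array $needles)
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $results = array();
    foreach ($iterator as $path => $info) {
        if (!$info->isFile()) {
            continue;
        }
        foreach ($needles as $needle) {
            // strpos() returns 0 for a match at the start, so compare against false
            if (strpos($path, $needle) === false) {
                continue 2;
            }
        }
        $results[] = $path;
    }
    sort($results); // sort once, after the whole tree has been collected
    return $results;
}

// Usage, assuming the folder layout from the question:
// foreach (findTextures('scenes/Texturas', array('Cordas', '_BaseColor')) as $p) {
//     echo $p, "<br>";
// }
```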

Trouble reading huge CSV file with php fgetcsv - understanding memory consumption

Good morning,
I'm currently learning some hard lessons while trying to handle huge CSV files of up to 4 GB.
The goal is to search for items in a CSV file (an Amazon data feed) by a given browse node and also by given item IDs (ASINs), to get a mix of existing items (in my database) plus some additional new items, since items disappear from the marketplace from time to time. I also filter on the item titles because many items use the same title.
I have read lots of tips here and finally decided to use PHP's fgetcsv(), thinking this function would not exhaust memory, since it reads the file line by line.
But no matter what I try, I always run out of memory.
I cannot understand why my code uses so much memory.
I set the memory limit to 4096 MB and the time limit to 0. The server has 64 GB of RAM and two SSD hard disks.
Could someone please check my code and explain how it is possible that I'm running out of memory and, more importantly, how the memory is used?
private function performSearchByASINs()
{
$found = 0;
$needed = 0;
$minimum = 84;
if(is_array($this->searchASINs) && !empty($this->searchASINs))
{
$needed = count($this->searchASINs);
}
if($this->searchFeed == NULL || $this->searchFeed == '')
{
return false;
}
$csv = fopen($this->searchFeed, 'r');
if($csv)
{
$l = 0;
$title_array = array();
while(($line = fgetcsv($csv, 0, ',', '"')) !== false)
{
$header = array();
if(trim($line[6]) != '')
{
if($l == 0)
{
$header = $line;
}
else
{
$asin = $line[0];
$title = $this->prepTitleDesc($line[6]);
if(is_array($this->searchASINs)
&& !empty($this->searchASINs)
&& in_array($asin, $this->searchASINs)) //search for existing items to get them updated
{
$add = true;
if(in_array($title, $title_array))
{
$add = false;
}
if($add === true)
{
$this->itemsByASIN[$asin] = new stdClass();
foreach($header as $k => $key)
{
if(isset($line[$k]))
{
$this->itemsByASIN[$asin]->$key = trim(strip_tags($line[$k], '<br><br/><ul><li>'));
}
}
$title_array[] = $title;
$found++;
}
}
if(($line[20] == $this->bnid || $line[21] == $this->bnid)
&& count($this->itemsByKey) < $minimum
&& !isset($this->itemsByASIN[$asin])) // searching for new items
{
$add = true;
if(in_array($title, $title_array))
{
$add = false;
}
if($add === true)
{
$this->itemsByKey[$asin] = new stdClass();
foreach($header as $k => $key)
{
if(isset($line[$k]))
{
$this->itemsByKey[$asin]->$key = trim(strip_tags($line[$k], '<br><br/><ul><li>'));
}
}
$title_array[] = $title;
$found++;
}
}
}
$l++;
if($l > 200000 || $found == $minimum)
{
break;
}
}
}
fclose($csv);
}
}
I know my answer is a bit late, but I had a similar problem with fgets() and things based on fgets(), like the SplFileObject->current() function. In my case it was on a Windows system when trying to read a file of more than 800 MB. I think fgets() doesn't free the memory of the previous line in a loop, so every line that was read stayed in memory and led to a fatal out-of-memory error. I fixed it using fread($lineLength) instead, but it is a bit trickier since you must supply the length.
It is very hard to manage large data using arrays without running into timeout issues. Instead, why not load this data feed into a database table and do the heavy lifting from there?
Have you tried this? SplFileObject::fgetcsv
<?php
$file = new SplFileObject("data.csv");
while (!$file->eof()) {
    $row = $file->fgetcsv(); // reads one row and advances the file pointer
    //your code here
}
?>
You are running out of memory because you use variables and never call unset() on them, and you use too many nested foreach loops. You could split that code into more functions.
One solution would be to use a real database instead.
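For comparison, a row-by-row scan that keeps only the matching rows in memory might look like the sketch below. The function name, the $titleCol parameter and the returned structure are assumptions for illustration. It reads the header once before the loop (the question's code resets $header to an empty array on every iteration) and uses isset() lookups on flipped arrays instead of in_array(). Whether this removes the memory problem described in the question is not something the thread confirms.

```php
<?php
// Hypothetical helper: streams $path row by row and returns the rows whose
// first column is one of $asins, keeping at most one row per title.
function findRowsByAsin($path, array $asins, $titleCol = 6)
{
    $wanted = array_flip($asins); // isset() lookup instead of in_array() scans
    $seenTitles = array();
    $rows = array();
    $csv = fopen($path, 'r');
    if ($csv === false) {
        return $rows;
    }
    // Read the header once, outside the loop
    $header = fgetcsv($csv, 0, ',', '"', '\\');
    while ($header !== false && ($line = fgetcsv($csv, 0, ',', '"', '\\')) !== false) {
        if (!isset($line[$titleCol], $wanted[$line[0]])) {
            continue; // short or blank row, or an ASIN we are not looking for
        }
        $title = trim($line[$titleCol]);
        if ($title === '' || isset($seenTitles[$title])) {
            continue;
        }
        $seenTitles[$title] = true;
        // Trim both sides to the same length so array_combine() always succeeds
        $rows[$line[0]] = array_combine(
            array_slice($header, 0, count($line)),
            array_slice($line, 0, count($header))
        );
    }
    fclose($csv);
    return $rows;
}
```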

How can I use PHP to check if a directory is empty?

I am using the following script to read a directory. If there is no file in the directory it should say empty. The problem is, it just keeps saying the directory is empty even though there ARE files inside and vice versa.
<?php
$pid = $_GET["prodref"];
$dir = '/assets/'.$pid.'/v';
$q = (count(glob("$dir/*")) === 0) ? 'Empty' : 'Not empty';
if ($q=="Empty")
echo "the folder is empty";
else
echo "the folder is NOT empty";
?>
It seems that you need scandir instead of glob, as glob can't see unix hidden files.
<?php
$pid = basename($_GET["prodref"]); //let's sanitize it a bit
$dir = "/assets/$pid/v";
if (is_dir_empty($dir)) {
echo "the folder is empty";
}else{
echo "the folder is NOT empty";
}
function is_dir_empty($dir) {
if (!is_readable($dir)) return null;
return (count(scandir($dir)) == 2);
}
?>
Note that this code is not the summit of efficiency, as it's unnecessary to read all the files only to tell if directory is empty. So, the better version would be
function dir_is_empty($dir) {
$handle = opendir($dir);
while (false !== ($entry = readdir($handle))) {
if ($entry != "." && $entry != "..") {
closedir($handle);
return false;
}
}
closedir($handle);
return true;
}
By the way, do not use words to substitute boolean values. The very purpose of the latter is to tell you whether something is empty or not. An
a === b
expression already returns Empty or Non Empty in terms of the programming language, false or true respectively, so you can use the result directly in control structures like if() without any intermediate values.
I think using the FilesystemIterator should be the fastest and easiest way:
// PHP 5 >= 5.3.0
$iterator = new \FilesystemIterator($dir);
$isDirEmpty = !$iterator->valid();
Or using class member access on instantiation:
// PHP 5 >= 5.4.0
$isDirEmpty = !(new \FilesystemIterator($dir))->valid();
This works because a new FilesystemIterator will initially point to the first file in the folder - if there are no files in the folder, valid() will return false. (see documentation here.)
As pointed out by abdulmanov.ilmir, optionally check if the directory exists before using the FileSystemIterator because otherwise it'll throw an UnexpectedValueException.
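A minimal sketch that combines the existence check with the iterator could look like this. The function name isDirEmptySafe and the choice to return null for a missing or unreadable directory are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper name; returns true if the directory is empty, false if
// it is not, and null if $dir is missing, not a directory, or not readable.
function isDirEmptySafe($dir)
{
    if (!is_dir($dir) || !is_readable($dir)) {
        return null;
    }
    // FilesystemIterator skips "." and "..", so valid() is false only when
    // the directory has no other entries
    return !(new FilesystemIterator($dir))->valid();
}
```

Returning null keeps "cannot tell" distinct from "empty", so callers should compare with === true or === false.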
I found a quick solution
<?php
$dir = 'directory'; // dir path assign here
echo (count(glob("$dir/*")) === 0) ? 'Empty' : 'Not empty';
?>
use
if ($q == "Empty")
instead of
if ($q="Empty")
For an object-oriented approach, use the RecursiveDirectoryIterator from the Standard PHP Library (SPL).
<?php
namespace My\Folder;
use FilesystemIterator;
use RecursiveDirectoryIterator;
class FileHelper
{
    /**
     * @param string $dir
     * @return bool
     */
    public static function isEmpty($dir)
    {
        $di = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);
        return iterator_count($di) === 0;
    }
}
No need to make an instance of your FileHelper whenever you need it, you can access this static method wherever you need it like this:
FileHelper::isEmpty($dir);
The FileHelper class can be extended with other useful methods for copying, deleting, renaming, etc.
There is no need to check the validity of the directory inside the method, because if it is invalid the constructor of the RecursiveDirectoryIterator will throw an UnexpectedValueException, which covers that case sufficiently.
This is a very old thread, but I thought I'd give my ten cents. The other solutions didn't work for me.
Here is my solution:
function is_dir_empty($dir) {
foreach (new DirectoryIterator($dir) as $fileInfo) {
if($fileInfo->isDot()) continue;
return false;
}
return true;
}
Short and sweet. Works like a charm.
I used:
if(is_readable($dir)&&count(scandir($dir))==2) ... //then the dir is empty
Try this:
<?php
$dirPath = "Add your path here";
$destdir = $dirPath;
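$handle = opendir($destdir);
$c = 0;
// readdir() also returns "." and "..", so a non-empty directory yields at least 3 entries
while ($c < 3 && false !== ($file = readdir($handle))) {
    $c++;
}
closedir($handle);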
if ($c>2) {
print "Not empty";
} else {
print "Empty";
}
?>
Probably because of assignment operator in if statement.
Change:
if ($q="Empty")
To:
if ($q=="Empty")
@Your Common Sense
I think your performant example could be more performant using strict comparison:
function is_dir_empty($dir) {
if (!is_readable($dir)) return null;
$handle = opendir($dir);
while (false !== ($entry = readdir($handle))) {
if ($entry !== '.' && $entry !== '..') { // <-- better use strict comparison here
closedir($handle); // <-- always clean up! Close the directory stream
return false;
}
}
closedir($handle); // <-- always clean up! Close the directory stream
return true;
}
Using count may be slow on a big array; isset is even faster.
This will work properly on PHP >= 5.4.0 (see Changelog here)
function dir_is_empty($path){ //$path is realpath or relative path
$d = scandir($path, SCANDIR_SORT_NONE ); // get dir; skipping the sort improves performance (see comment below).
if ($d){
// avoid "count($d)", much faster on big array.
// Index 2 means that there is a third element after ".." and "."
return !isset($d[2]);
}
return false; // or throw an error
}
Otherwise, using @Your Common Sense's solution is better, because it avoids loading the file list into RAM.
Thanks and an upvote to @soger too, for improving this answer with the SCANDIR_SORT_NONE option.
Just correct your code like this:
<?php
$pid = $_GET["prodref"];
$dir = '/assets/'.$pid.'/v';
$q = count(glob("$dir/*")) == 0;
if ($q) {
echo "the folder is empty";
} else {
echo "the folder is NOT empty";
}
?>
Even an empty directory contains 2 files . and .., one is a link to the current directory and the second to the parent. Thus, you can use code like this:
$files = scandir("path to directory/");
if(count($files) == 2) {
//do something if empty
}
I use this method in my WordPress CSV 2 POST plugin.
public function does_folder_contain_file_type( $path, $extension ){
$all_files = new RecursiveIteratorIterator( new RecursiveDirectoryIterator( $path ) );
$html_files = new RegexIterator( $all_files, '/\.'.$extension.'/' );
foreach( $html_files as $file) {
return true;// a file with $extension was found
}
return false;// no files with our extension found
}
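It checks for a specific extension, but it is easily changed to suit your needs by removing the "new RegexIterator(" line and counting $all_files instead:
public function does_folder_contain_file_type( $path, $extension ){
    $all_files = new RecursiveIteratorIterator( new RecursiveDirectoryIterator( $path, FilesystemIterator::SKIP_DOTS ) );
    // RecursiveIteratorIterator is not Countable, so use iterator_count() rather than count()
    return iterator_count( $all_files );
}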
I had a similar problem recently, but the highest-upvoted answer did not really work for me, so I had to come up with a similar solution. Again, this may not be the most efficient way to go about the problem.
I created a function like so:
function is_empty_dir($dir)
{
    if (!is_dir($dir))
    {
        return false; // not a directory at all
    }
    foreach (scandir($dir) as $object)
    {
        if ($object != "." && $object != "..")
        {
            // any entry, file or subdirectory, means the directory is not empty
            return false;
        }
    }
    return true;
}
and used it to check for an empty directory like so:
if(is_empty_dir($path)){
rmdir($path);
}
You can use this:
function isEmptyDir($dir)
{
return (($files = @scandir($dir)) && count($files) <= 2);
}
The first question is: when is a directory empty? Every directory contains 2 entries, '.' and '..'.
In addition, on a Mac there may be the file '.DS_Store'. This file is created when some kind of content is added to the directory. If only these 3 entries are in the directory, you may say the directory is empty.
So, to test if a directory is empty (without testing if $dir is a directory):
function isDirEmpty( $dir ) {
$count = 0;
foreach (new DirectoryIterator( $dir ) as $fileInfo) {
if ( $fileInfo->isDot() || $fileInfo->getBasename() == '.DS_Store' ) {
continue;
}
$count++;
}
return ($count === 0);
}
@Your Common Sense, @Enyby
Some improvement of your code:
function dir_is_empty($dir) {
    $handle = opendir($dir);
    $result = true;
    while (false !== ($entry = readdir($handle))) {
        if ($entry != "." && $entry != "..") {
            $result = false;
            break; // "if" is not a loop structure, so a plain break exits the while loop
        }
    }
    closedir($handle);
    return $result;
}
I use a variable for storing the result and initialise it to true.
If the directory is empty, the only entries returned are . and .. (on a Linux server; you could extend the condition for Mac if you need to), so the condition is never true and the result stays true.
Otherwise the result is set to false and break exits the while loop, so the next statement executed is closedir.
Therefore the while loop runs at most 3 times before it ends, regardless of whether the directory is empty or not.
$is_folder_empty = function(string $folder) : bool {
if (!is_dir($folder))
return TRUE;
// This won't work on non-Linux operating systems.
return is_null(shell_exec("ls " . escapeshellarg($folder)));
};
$is_folder_empty2 = function(string $folder) : bool {
if (!is_dir($folder))
return TRUE;
// Empty folders have two files in it. Single dot and
// double dot.
return count(scandir($folder)) === 2;
};
var_dump($is_folder_empty('/tmp/demo'));
var_dump($is_folder_empty2('/tmp/demo'));

Enumerate files in parent directory

I have the following function that enumerates files and directories in a given folder. It works fine for doing subfolders, but for some reason, it doesn't want to work on a parent directory. Any ideas why? I imagine it might be something with PHP's settings or something, but I don't know where to begin. If it is, I'm out of luck since this will be running on a cheap shared hosting setup.
Here's how you use the function. The first parameter is the path to enumerate, and the second parameter is a list of filters to be ignored. I've tried passing the full path as listed below. I've tried passing just .., ./.. and realpath('..'). Nothing seems to work. I know the function isn't silently failing somehow. If I manually add a directory to the dirs array, I get a value returned.
$projFolder = '/hsphere/local/home/customerid/sitename/foldertoindex';
$items = enumerateDirs($projFolder, array(0 => "Admin", 1 => "inc"));
Here's the function itself
function enumerateDirs($directory, $filterList)
{
$handle = opendir($directory);
while (false !== ($item = readdir($handle)))
{
if ($item != "." && $item != ".." && $item != "inc" && array_search($item, $filterList) === false)
{
$path = "{$directory->path}/{$item}";
if (is_dir($item))
{
$tmp['name'] = $item;
$dirs[$item] = $tmp;
unset($tmp);
}
elseif (is_file($item))
{
$tmp['name'] = $item;
$files[] = $tmp;
unset($tmp);
}
}
}
ksort($dirs, SORT_STRING);
sort($dirs);
ksort($files, SORT_STRING);
sort($files);
return array("dirs" => $dirs, "files" => $files);
}
You are mixing up opendir and dir. You also need to pass the full path (including the directory component) to is_dir and is_file. (I assume that's what you meant to do with $path.) Otherwise, the functions will look for the corresponding file system objects in the script file's directory.
Try this for a quick fix:
<?php
function enumerateDirs($directory, $filterList)
{
$handle = dir($directory);
while (false !== ($item = $handle->read()))
{
if ($item != "." && $item != ".." && $item != "inc" && array_search($item, $filterList) === false)
{
$path = "{$handle->path}/{$item}";
$tmp['name'] = $item;
if (is_dir($path))
{
$dirs[] = $tmp;
}
elseif (is_file($path))
{
$files[] = $tmp;
}
unset($tmp);
}
}
$handle->close();
/* Anonymous functions will need PHP 5.3+. If your version is older, take a
* look at create_function
*/
$sortFunc = function ($a, $b) { return strcmp($a['name'], $b['name']); };
usort($dirs, $sortFunc);
usort($files, $sortFunc);
return array("dirs" => $dirs, "files" => $files);
}
$ret = enumerateDirs('../', array());
var_dump($ret);
Note: $files or $dirs might not be set after the while loop (there might be no files or directories). In that case, usort will fail because it is handed an undefined variable. You should guard against that, for example by initialising both to empty arrays before the loop.
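A variant with both arrays initialised up front might look like the following sketch. The name enumerateDirsSafe is made up for this example, and it uses opendir/readdir directly instead of dir(); the sort order (strcmp on the name) is kept from the answer.

```php
<?php
// Hypothetical variant of enumerateDirs(); not the answer's exact code.
function enumerateDirsSafe($directory, array $filterList)
{
    $dirs = array();  // initialised up front so usort() always receives arrays
    $files = array();
    $handle = opendir($directory);
    if ($handle === false) {
        return array("dirs" => $dirs, "files" => $files);
    }
    while (false !== ($item = readdir($handle))) {
        if ($item === "." || $item === ".." || in_array($item, $filterList, true)) {
            continue;
        }
        // Test the full path, not the bare entry name
        $path = rtrim($directory, "/") . "/" . $item;
        if (is_dir($path)) {
            $dirs[] = array("name" => $item);
        } elseif (is_file($path)) {
            $files[] = array("name" => $item);
        }
    }
    closedir($handle);
    $byName = function ($a, $b) { return strcmp($a["name"], $b["name"]); };
    usort($dirs, $byName);
    usort($files, $byName);
    return array("dirs" => $dirs, "files" => $files);
}
```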
