I have a folder with a huge number of pictures (at least 10,000 files) and I need to get the names of all these files using PHP.
The problem is that when I use scandir() I get a memory limit error.
I also tried code like this:
$files = [];
$dir = opendir($this->path);
$i = 0;
while(($file = readdir($dir)) !== false) {
$files[] = $file;
$i++;
if ($i == 100)
break;
}
This code works fine, but it's not what I need. When I try to get all the files, the script still crashes.
I also thought about saving the state of the $dir pointer somehow, so that I could use it later through AJAX requests and get all the files, but I can't find any way to do that.
Is there any way to set a limit and an offset when reading files?
I mean something like pagination.
You can use RecursiveDirectoryIterator with a Generator if memory is a huge issue.
function recursiveDirectoryIterator($path) {
foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path)) as $file) {
if (!$file->isDir()) {
yield $file->getFilename(); // getFilename() already includes the extension
}
}
}
$start = microtime(true);
$instance = recursiveDirectoryIterator('../vendor');
$total_files = 0;
foreach($instance as $value) {
// echo $value
$total_files++;
}
echo "Mem peak usage: " . (memory_get_peak_usage(true)/1024/1024) . " MiB" . PHP_EOL;
echo "Total number of files: " . $total_files . PHP_EOL;
echo "Completed in: ", microtime(true) - $start, " seconds", PHP_EOL;
Here's what I got on my not-so-great laptop.
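For the limit/offset part of the question, here is a minimal sketch built on the same opendir()/readdir() loop the question already uses. The function name listFilesPage is made up for illustration. It assumes the directory does not change between requests, since readdir() returns entries in filesystem order and that order is only stable while the directory is untouched.

```php
<?php
// Hypothetical helper: returns up to $limit entry names, skipping the first $offset.
// Only one page of names is held in memory at a time.
function listFilesPage(string $path, int $offset, int $limit): array
{
    $page = [];
    $dir = opendir($path);
    if ($dir === false) {
        return $page;
    }
    $index = 0;
    while (count($page) < $limit && ($entry = readdir($dir)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the current/parent directory entries
        }
        if ($index++ < $offset) {
            continue; // still before the requested offset
        }
        $page[] = $entry;
    }
    closedir($dir);
    return $page;
}
```

Each AJAX request would then pass its own offset, for example listFilesPage($path, 200, 100) for the third page of 100. Every call re-reads the directory from the start, so later pages cost more time, but memory stays bounded by the page size.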
I have a Unix background, so (assuming you are running your PHP on Linux or Unix) you could do the following:
Make a system call to: /bin/ls -c1 > files.list
You can make the system call as complicated as you want, to sort, parse, edit, ...
Then read files.list and display it.
You could use file_get_contents() to read the file.
I have a number of different hosting accounts set up for clients and need to calculate the amount of storage space being used on each account, which would update regularly.
I have a database set up to record each client's storage usage.
I first attempted this using a PHP file on each account, run by a Cron Job. When I ran it manually, it output the correct size and wrote it to the database, but when run from the Cron Job it output 0.
I then attempted to run this file from a Cron Job on the main account, but figured this wouldn't actually work, as my hosting would block files from another server and I would end up with the same result as before.
I am now experimenting with FTP access to each account from a Cron Job on the main account, which looks something like the code below. The only problem is that I don't know how to calculate directory sizes rather than single file sizes over FTP, and I don't know how to recurse this way. I'm hoping somebody can help before I end up going around in circles.
I will also add the previous first attempt.
$ftp_conn = ftp_connect($ftp_host, 21, 420) or die("Could not connect to server");
$ftp_login = ftp_login($ftp_conn, $ftp_username, 'mypassword');
$total_size = 0;
$contents = ftp_nlist($ftp_conn, ".");
// output $contents
foreach($contents as $folder){
while($search == true){
if($folder == '..' || $folder == '.'){
} else {
$file = $folder;
$res = ftp_size($ftp_conn, $file);
if ($res != -1) {
$total_size = $total_size + $res;
} else {
$total_size = $total_size;
}
}
}
}
ftp_close($ftp_conn);
This doesn't work, as it doesn't calculate folder sizes and I don't know how to recurse into subfolders using this method.
This second script did work, but only when opened manually; it returns 0 when run by the cron job.
class Directory_Calculator {
function calculate_whole_directory($directory)
{
if ($handle = opendir($directory))
{
$size = 0;
$folders = 0;
$files = 0;
while (false !== ($file = readdir($handle)))
{
if ($file != "." && $file != "..")
{
if(is_dir($directory.$file))
{
$array = $this->calculate_whole_directory($directory.$file.'/');
$size += $array['size'];
$files += $array['files'];
$folders += $array['folders'];
}
else
{
$size += filesize($directory.$file);
$files++;
}
}
}
closedir($handle);
}
$folders++;
return array('size' => $size, 'files' => $files, 'folders' => $folders);
}
}
/* Path to Directory - IMPORTANT: with '/' at the end */
$directory = '../public_html/';
// return an array with: size, total files & folders
$directory_size = new Directory_Calculator();
$array = $directory_size->calculate_whole_directory($directory);
$size_of_site = $array['size'];
echo $size_of_site;
Please bear in mind that I am currently testing and none of the MySQLi or PHP scripts are secure yet.
If your server supports the MLSD command and you have PHP 7.2 or newer, you can use the ftp_mlsd function:
function calculate_whole_directory($ftp_conn, $directory)
{
$files = ftp_mlsd($ftp_conn, $directory) or die("Cannot list $directory");
$result = 0;
foreach ($files as $file)
{
if (($file["type"] == "cdir") || ($file["type"] == "pdir"))
{
$size = 0;
}
else if ($file["type"] == "dir")
{
$size = calculate_whole_directory($ftp_conn, $directory."/".$file["name"]);
}
else
{
$size = intval($file["size"]);
}
$result += $size;
}
return $result;
}
If you do not have PHP 7.2, you can try to implement the MLSD command on your own. For a start, see the user comment on the ftp_rawlist manual page:
https://www.php.net/manual/en/function.ftp-rawlist.php#101071
If you cannot use MLSD, you will particularly have problems telling if an entry is a file or folder. While you can use the ftp_size trick, as you do, calling ftp_size for each entry can take ages.
But if you need to work against one specific FTP server only, you can use ftp_rawlist to retrieve a file listing in a platform-specific format and parse that.
The following code assumes a common *nix format.
function calculate_whole_directory($ftp_conn, $directory)
{
$lines = ftp_rawlist($ftp_conn, $directory) or die("Cannot list $directory");
$result = 0;
foreach ($lines as $line)
{
$tokens = preg_split("/\s+/", $line, 9);
$name = $tokens[8];
if ($tokens[0][0] === 'd')
{
$size = calculate_whole_directory($ftp_conn, "$directory/$name");
}
else
{
$size = intval($tokens[4]);
}
$result += $size;
}
return $result;
}
Based on PHP FTP recursive directory listing.
Regarding cron: I'd guess that the cron does not start your script with a correct working directory, so you calculate a size of a non-existing directory.
Use an absolute path here:
$directory = '../public_html/';
Though you had better add some error checking, so that you can see for yourself what goes wrong.
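A minimal sketch of both points, assuming the script sits one level below the directory that contains public_html (that layout is a guess, adjust it to yours). The helper name resolveDirectory is hypothetical. __DIR__ is the directory of the script file itself, so it does not depend on the working directory cron happens to use.

```php
<?php
// Hypothetical helper: turn a path relative to a known base directory into an
// absolute one, and fail loudly instead of silently measuring nothing.
function resolveDirectory(string $baseDir, string $relative): string
{
    $absolute = realpath($baseDir . DIRECTORY_SEPARATOR . $relative);
    if ($absolute === false || !is_dir($absolute)) {
        throw new RuntimeException("Directory not found: $baseDir/$relative");
    }
    // Keep the trailing slash that calculate_whole_directory() expects.
    return rtrim($absolute, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
}

// Usage (assumed layout): resolve relative to this script, not to cron's cwd.
// $directory = resolveDirectory(__DIR__, '../public_html');
```

If the path is wrong, the exception message ends up in the cron output or mail, which is usually enough to see what went wrong.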
So I have this app that processes CSV files. I have a line of code to load the file.
$myFile = "data/FrontlineSMS_Message_Export_20120721.csv"; //The name of the CSV file
$fh = fopen($myFile, 'r'); //Open the file
I would like to find a way in which I could look in the data directory and get the newest file (they all have date tags so they would be in order inside of data) and set the name equal to $myFile.
I really couldn't find and understand the documentation of php directories so any helpful resources would be appreciated as well. Thank you.
Here's an attempt using scandir, assuming the only files in the directory have timestamped filenames:
$files = scandir('data', SCANDIR_SORT_DESCENDING);
$newest_file = $files[0];
We first list all files in the directory in descending order, then, whichever one is first in that list has the "greatest" filename — and therefore the greatest timestamp value — and is therefore the newest.
Note that scandir was added in PHP 5, but its documentation page shows how to implement that behavior in PHP 4.
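If the directory might contain other files as well, a variation is to match only the CSV exports with glob() and take the greatest name. This is only a sketch: newestByName is a made-up helper, and it assumes all file names share the same prefix and a YYYYMMDD date, so that string order matches date order.

```php
<?php
// Hypothetical helper: newest file by name among those matching $pattern,
// or null when nothing matches.
function newestByName(string $dir, string $pattern = '*.csv'): ?string
{
    $files = glob(rtrim($dir, '/') . '/' . $pattern);
    if ($files === false || $files === []) {
        return null;
    }
    rsort($files, SORT_STRING); // greatest name (latest date tag) first
    return $files[0];
}

// Usage sketch:
// $myFile = newestByName('data');
// if ($myFile !== null) { $fh = fopen($myFile, 'r'); }
```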
For a search with wildcard you can use:
<?php
$path = "/var/www/html/*";
$latest_ctime = 0;
$latest_filename = '';
$files = glob($path);
foreach($files as $file)
{
if (is_file($file) && filectime($file) > $latest_ctime)
{
$latest_ctime = filectime($file);
$latest_filename = $file;
}
}
return $latest_filename;
?>
My solution, an improved version of Max Hofmann's:
$ret = [];
$dir = Yii::getAlias("@app") . "/web/uploads/problem-letters/{$this->id}"; // set directory in question
if(is_dir($dir)) {
$ret = array_diff(scandir($dir), array(".", "..")); // get all files in dir as array and remove . and .. from it
}
usort($ret, function ($a, $b) use ($dir) {
if(filectime($dir . "/" . $a) < filectime($dir . "/" . $b)) {
return -1;
} else if(filectime($dir . "/" . $a) == filectime($dir . "/" . $b)) {
return 0;
} else {
return 1;
}
}); // sort array by filectime() (inode change time on Unix, creation time on Windows), oldest first
echo $ret[count($ret)-1]; // filename of the most recently changed file
Here's an example where I felt more confident in using my own validator rather than simply relying on a timestamp with scandir().
In this context, I want to check if my server has a more recent file version than the client's version. So I compare version numbers from the file names.
$clientAppVersion = "1.0.5";
$latestVersionFileName = "";
$directory = "../../download/updates/darwin/";
$arrayOfFiles = scandir($directory);
foreach ($arrayOfFiles as $file) {
if (is_file($directory . $file)) {
// Your custom code here... For example:
$serverFileVersion = getVersionNumberFromFileName($file);
if (isVersionNumberGreater($serverFileVersion, $clientAppVersion)) {
$latestVersionFileName = $file;
}
}
}
// function declarations in my php file (used in the forEach loop)
function getVersionNumberFromFileName($fileName) {
// extract the version number with regEx replacement
return preg_replace("/Finance D - Tenue de livres-darwin-(x64|arm64)-|\.zip/", "", $fileName);
}
function removeAllNonDigits($semanticVersionString) {
// use regex replacement to keep only numeric values in the semantic version string
return preg_replace("/\D+/", "", $semanticVersionString);
}
function isVersionNumberGreater($serverFileVersion, $clientFileVersion): bool {
// receives two semantic versions (1.0.4) and compares their numeric value (104)
// true when server version is greater than client version (105 > 104)
return removeAllNonDigits($serverFileVersion) > removeAllNonDigits($clientFileVersion);
}
Using this manual comparison instead of a timestamp I can achieve a more surgical result. I hope this can give you some useful ideas if you have a similar requirement.
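One caveat with stripping the non-digits: "1.0.10" becomes 1010 and "1.1.0" becomes 110, so the older version compares as greater. PHP's built-in version_compare() compares segment by segment and avoids that. Below is a sketch under the assumption that the version strings have already been extracted from the file names; pickLatestVersion is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the highest version in $versions that is
// strictly newer than $clientVersion, or null if there is none.
function pickLatestVersion(array $versions, string $clientVersion): ?string
{
    $latest = null;
    foreach ($versions as $version) {
        $newerThanClient = version_compare($version, $clientVersion, '>');
        $newerThanLatest = $latest === null || version_compare($version, $latest, '>');
        if ($newerThanClient && $newerThanLatest) {
            $latest = $version;
        }
    }
    return $latest;
}
```

Unlike the loop above, which keeps whichever newer file scandir() happens to list last, this keeps the highest version seen.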
(PS: I took time to post because I was not satisfied with the answers I found relating to the specific requirement I had. Please be kind I'm also not very used to StackOverflow - Thanks!)
I am trying to get a list of directories from an FTP server using PHP. The following code outputs this:
httpdocs/user_images
httpdocs/user_images/inc
httpdocs/user_images/inc/smarty
httpdocs/user_images/header
httpdocs/user_images/header/logo80.jpg
httpdocs/user_images/header/logo80.jpg
httpdocs/user_images/header/logo80.jpg
httpdocs/user_images/header/logo80.jpg
It keeps repeating httpdocs/user_images/header/logo80.jpg over 60 times.
Here is my code:
function ListOfFolder($folder_listarry,$conn_id){
for ($i=0; $i<sizeof($folder_listarry); $i++) {
echo $folder_listarry[$i]."<br>";
$contents = ftp_nlist($conn_id, $folder_listarry[$i]);
ListOfFolder($contents,$conn_id);
}
}
$contents = ftp_nlist($conn_id, "httpdocs/");
ListOfFolder($contents,$conn_id);
I am not sure, this is only a guess: you can try modifying your function with is_dir.
function ListOfFolder($folder_listarry,$conn_id){
for ($i=0; $i<sizeof($folder_listarry); $i++) {
echo $folder_listarry[$i]."<br>";
if (is_dir($folder_listarry[$i]) === false)
{
continue;
}
$contents = ftp_nlist($conn_id, $folder_listarry[$i]);
ListOfFolder($contents,$conn_id);
}
}
ftp_nlist will also return . and .. for the current and parent directory.
You need to exclude them.
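Putting these points together, here is a rough sketch. The repeated logo80.jpg line is probably caused by ftp_nlist() being called on a file, which on many servers just returns that file again, so the recursion never ends. Note also that is_dir() looks at the local filesystem unless it is given an ftp:// URL. The sketch assumes ftp_nlist() returns paths that include the directory prefix (as in the output in the question) and that ftp_size() returns -1 for directories, which depends on the server. The names isDotEntry and listFtpTree are made up.

```php
<?php
// True for the "." and ".." entries, with or without a leading directory part.
function isDotEntry(string $entry): bool
{
    $name = basename($entry);
    return $name === '.' || $name === '..';
}

// Sketch only: prints every entry below $dir and recurses into directories.
function listFtpTree($conn_id, string $dir): void
{
    $entries = ftp_nlist($conn_id, $dir);
    if ($entries === false) {
        return; // listing failed, nothing to print
    }
    foreach ($entries as $entry) {
        if (isDotEntry($entry)) {
            continue;
        }
        echo $entry . "<br>";
        // Assumption: ftp_size() reports -1 for directories on this server.
        if (ftp_size($conn_id, $entry) === -1) {
            listFtpTree($conn_id, $entry);
        }
    }
}
```

Calling ftp_size() for every entry means one round trip per entry, so this can be slow on large trees.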
I've been trying to replicate GNU find ("find .") in PHP, but it seems impossible to get even close to its speed. The PHP implementations take at least twice as long as find. Are there faster ways of doing this with PHP?
EDIT: I added a code example using the SPL implementation -- its performance is equal to the iterative approach
EDIT2: When calling find from PHP it was actually slower than the native PHP implementation. I guess I should be satisfied with what I've got :)
// measured to 317% of gnu find's speed when run directly from a shell
function list_recursive($dir) {
if ($dh = opendir($dir)) {
while (false !== ($entry = readdir($dh))) {
if ($entry == '.' || $entry == '..') continue;
$path = "$dir/$entry";
echo "$path\n";
if (is_dir($path)) list_recursive($path);
}
closedir($dh);
}
}
// measured to 315% of gnu find's speed when run directly from a shell
function list_iterative($from) {
$dirs = array($from);
while (NULL !== ($dir = array_pop($dirs))) {
if ($dh = opendir($dir)) {
while (false !== ($entry = readdir($dh))) {
if ($entry == '.' || $entry == '..') continue;
$path = "$dir/$entry";
echo "$path\n";
if (is_dir($path)) $dirs[] = $path;
}
closedir($dh);
}
}
}
// measured to 315% of gnu find's speed when run directly from a shell
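function list_recursivedirectoryiterator($path) {
// RecursiveIteratorIterator is what actually descends into subdirectories
$it = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
RecursiveIteratorIterator::SELF_FIRST);
foreach ($it as $file) {
echo $file->getPathname(), "\n";
}
}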
// measured to 390% of gnu find's speed when run directly from a shell
function list_gnufind($dir) {
$dir = escapeshellcmd($dir);
$h = popen("/usr/bin/find $dir", "r");
while ('' != ($s = fread($h, 2048))) {
echo $s;
}
pclose($h);
}
I'm not sure if the performance is better, but you could use a recursive directory iterator to make your code simpler... See RecursiveDirectoryIterator and SplFileInfo.
$it = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator($from, FilesystemIterator::SKIP_DOTS),
RecursiveIteratorIterator::SELF_FIRST
);
foreach ($it as $file)
{
// SKIP_DOTS already drops "." and "..", so no isDot() check is needed
echo $file->getPathname(), "\n";
}
Before you start changing anything, profile your code.
Use something like Xdebug (plus kcachegrind for a pretty graph) to find out where the slow parts are. If you start changing things blindly, you won't get anywhere.
My only other advice is to use the SPL directory iterators as posted already. Letting the internal C code do the work is almost always faster.
PHP just cannot perform as fast as C, plain and simple.
Why would you expect the interpreted PHP code to be as fast as the compiled C version of find? Being only twice as slow is actually pretty good.
About the only advice I would add is to call ob_start() at the beginning and ob_get_contents() and ob_end_clean() at the end. That might speed things up.
You're keeping N directory streams open, where N is the depth of the directory tree. Instead, try reading an entire directory's worth of entries at once, and then iterate over the entries. At the very least you'll maximize use of the disk I/O caches.
You might want to seriously consider just using GNU find. If it's available, and safe mode isn't turned on, you'll probably like the results just fine:
function list_recursive($dir) {
$dir = escapeshellarg($dir); // quote the path as a single shell argument
$h = popen("/usr/bin/find $dir -type f", "r");
while ($s = fgets($h,1024)) {
echo $s;
}
pclose($h);
}
However, there might be some directory that's so big that you're not going to want to bother with this either. Consider amortizing the slowness in other ways. Your second try can be checkpointed (for example) by simply saving the directory stack in the session. If you're giving the user a list of files, simply collect a pageful, then save the rest of the state in the session for page 2.
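A rough sketch of that checkpointing idea, based on the iterative version from the question. The function name listPage and the session key are assumptions. It checkpoints at whole-directory granularity, so a page can come out larger than $pageSize when a single directory holds many files.

```php
<?php
// Hypothetical helper: resumes a directory walk from a saved stack of pending
// directories and stops once at least $pageSize files have been collected.
// Returns [files for this page, remaining stack to store for the next request].
function listPage(array $stack, int $pageSize): array
{
    $files = [];
    while ($stack !== [] && count($files) < $pageSize) {
        $dir = array_pop($stack);
        $entries = @scandir($dir);
        if ($entries === false) {
            continue; // unreadable directory: skip it
        }
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') {
                continue;
            }
            $path = "$dir/$entry";
            if (is_dir($path)) {
                $stack[] = $path; // visit on a later iteration or request
            } else {
                $files[] = $path;
            }
        }
    }
    return [$files, $stack];
}

// Usage sketch, with the session as the checkpoint store:
// session_start();
// [$files, $_SESSION['dir_stack']] = listPage($_SESSION['dir_stack'] ?? ['/some/root'], 100);
```

When the returned stack is empty, the walk is finished.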
Try using scandir() to read a whole directory at once, as Jason Cohen has suggested. I've based the following code on code from the php manual comments for scandir()
function scan( $dir ){
$dirs = array_diff( scandir( $dir ), Array( ".", ".." ));
$dir_array = Array();
foreach( $dirs as $d )
$dir_array[ $d ] = is_dir($dir."/".$d) ? scan( $dir."/".$d) : print $dir."/".$d."\n";
}
I'm writing a photo gallery script in PHP and have a single directory where the user will store their pictures. I'm attempting to set up page caching and have the cache refresh only if the contents of the directory have changed. I thought I could do this by caching the last modified time of the directory using the filemtime() function and comparing it to the current modified time of the directory. However, as I've come to realize, the directory modified time does not change as files are added to or removed from that directory (at least on Windows, not sure about Linux machines yet).
So my question is, what is the simplest way to check if the contents of a directory have been modified?
As already mentioned by others, a better way to solve this would be to trigger a function when particular events happen, that changes the folder.
However, if your server runs Linux, you can use inotifywait to watch the directory and then invoke a PHP script.
Here's a simple example:
#!/bin/sh
inotifywait --recursive --monitor --quiet --event modify,create,delete,move --format '%f' /path/to/directory/to/watch |
while read -r FILE ; do
php /path/to/trigger.php "$FILE"
done
See also: http://linux.die.net/man/1/inotifywait
What about touching the directory after a user has submitted his image?
The changelog for touch() says PHP 5.3 is required for this to work on Windows, but I think it should work in all other environments.
With inotifywait inside PHP:
$watchedDir = 'watch';
$in = popen("inotifywait --monitor --quiet --format '%e %f' --event create,moved_to '$watchedDir'", 'r');
if ($in === false)
throw new Exception ('fail start notify');
while (($line = fgets($in)) !== false)
{
list($event, $file) = explode(' ', rtrim($line, PHP_EOL), 2);
echo "$event $file\n";
}
Uh. I'd simply store the md5 of a directory listing. If the contents change, the md5(directory-listing) will change. You might get the very occasional md5 clash, but I think that chance is tiny enough..
Alternatively, you could store a little file in that directory that contains the "last modified" date. But I'd go with md5.
PS. on second thought, seeing as how you're looking at performance (caching) requesting and hashing the directory listing might not be entirely optimal..
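For what it's worth, the md5 idea fits in a few lines. This is only a sketch: directoryFingerprint is a made-up name, and it hashes file names only, so it notices added, removed, or renamed files but not a file whose content changed under the same name.

```php
<?php
// Hypothetical helper: md5 of the directory listing (file names only).
function directoryFingerprint(string $dir): string
{
    // scandir() sorts alphabetically by default, so the result is stable.
    $names = array_values(array_diff(scandir($dir), ['.', '..']));
    return md5(implode("\n", $names));
}

// Usage sketch: compare with the fingerprint stored alongside the cache.
// if (directoryFingerprint($galleryDir) !== $storedFingerprint) { /* rebuild cache */ }
```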
IMO edubem's answer is the way to go, however you can do something like this:
if (sha1(serialize(Map('/path/to/directory/', true))) != /* previous stored hash */)
{
// directory contents has changed
}
Or a weaker but faster version:
if (Size('/path/to/directory/', true) != /* previous stored size */)
{
// directory contents has changed
}
Here are the functions used:
function Map($path, $recursive = false)
{
$result = array();
if (is_dir($path) === true)
{
$path = Path($path);
$files = array_diff(scandir($path), array('.', '..'));
foreach ($files as $file)
{
if (is_dir($path . $file) === true)
{
$result[$file] = ($recursive === true) ? Map($path . $file, $recursive) : Size($path . $file, true);
}
else if (is_file($path . $file) === true)
{
$result[$file] = Size($path . $file);
}
}
}
else if (is_file($path) === true)
{
$result[basename($path)] = Size($path);
}
return $result;
}
function Size($path, $recursive = true)
{
$result = 0;
if (is_dir($path) === true)
{
$path = Path($path);
$files = array_diff(scandir($path), array('.', '..'));
foreach ($files as $file)
{
if (is_dir($path . $file) === true)
{
$result += ($recursive === true) ? Size($path . $file, $recursive) : 0;
}
else if (is_file($path . $file) === true)
{
$result += sprintf('%u', filesize($path . $file));
}
}
}
else if (is_file($path) === true)
{
$result += sprintf('%u', filesize($path));
}
return $result;
}
function Path($path)
{
if (file_exists($path) === true)
{
$path = rtrim(str_replace('\\', '/', realpath($path)), '/');
if (is_dir($path) === true)
{
$path .= '/';
}
return $path;
}
return false;
}
Here's what you may try. Store all pictures in a single directory (or in /username subdirectories inside it, to speed things up and lessen the stress on the FS) and set up Apache (or whatever you're using) to serve them as static content with the Expires header set 100 years in the future. File names should contain some unique prefix or suffix (timestamp, SHA1 hash of the file content, etc.), so whenever a user changes a file its name changes and Apache will serve the new version, which will get cached along the way.
You're thinking the wrong way.
You should execute your directory indexer script as soon as someone's uploaded a new file and it's moved to the target location.
Try deleting the cached version when a user uploads a file to his directory.
When someone tries to view the gallery, look if there's a cached version first. If there's a cached version, load it, otherwise, generate the page, cache it, done.
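In code, that flow might look roughly like this. It is a sketch, not a drop-in implementation: the cache file path and the $render callback that builds the gallery HTML are assumptions, and both function names are made up.

```php
<?php
// Hypothetical helper: serve the cached page if present, otherwise build and cache it.
function getGalleryPage(string $cacheFile, callable $render): string
{
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile); // cached version exists
    }
    $html = $render(); // generate the page
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Hypothetical helper: call this right after a successful upload,
// for example after move_uploaded_file() has succeeded.
function invalidateGalleryCache(string $cacheFile): void
{
    if (is_file($cacheFile)) {
        unlink($cacheFile);
    }
}
```

This way the directory is never scanned just to find out whether it changed; the upload handler is the only place that knows, and it drops the cache.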
I was looking for something similar and I just found this:
http://www.franzone.com/2008/06/05/php-script-to-monitor-ftp-directory-changes/
For me looks like a great solution since I'll have a lot of control (I'll be doing an AJAX call to see if anything changed).
Hope that this helps.
Here is a code sample that returns 0 if the directory has changed.
I use it in backups.
The changed status is determined by presence of files and their filesizes.
You could easily change this, to compare file contents by replacing
$longString .= filesize($file);
with
$longString .= crc32(file_get_contents($file));
but it will affect execution speed.
#!/usr/bin/php
<?php
$dirName = $argv[1];
$basePath = '/var/www/vhosts/majestichorseporn.com/web/';
$dataFile = './backup_dir_if_changed.dat';
# startup checks
if (!is_writable($dataFile))
die($dataFile . ' is not writable!');
if (!is_dir($basePath . $dirName))
die($basePath . $dirName . ' is not a directory');
$dataFileContent = file_get_contents($dataFile);
$data = @unserialize($dataFileContent);
if ($data === false)
$data = array();
# find all files and concatenate their sizes to calculate crc32
$files = glob($basePath . $dirName . '/*', GLOB_BRACE);
$longString = '';
foreach ($files as $file) {
$longString .= filesize($file);
}
$longStringHash = crc32($longString);
# do changed check
if (isset ($data[$dirName]) && $data[$dirName] == $longStringHash)
die('Directory did not change.');
# save hash to the data file
$data[$dirName] = $longStringHash;
file_put_contents($dataFile, serialize($data));
die('0');