Searching for specific file extensions in a folder/directory (PHP)

I'm trying to write a PHP program that finds files with specific extensions (for example .jpg or .shp) in a known directory that contains multiple folders.
Sample code, documentation, or pointers to the functions I will need would be much appreciated.

glob is pretty easy:
<?php
foreach (glob("*.txt") as $filename) {
    echo "$filename size " . filesize($filename) . "\n";
}
?>
There are a few suggestions for recursive descent at the readdir page.

Take a look at PHP's SPL DirectoryIterator.
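A minimal sketch of that suggestion, using the recursive variants of the SPL iterators so that subfolders are covered as well. The function name findByExtension and its parameters are made up for illustration; the extension comparison is case-insensitive, which is an assumption about what you want.

```php
<?php
// Hypothetical helper: every file under $dir whose extension is listed in $extensions.
function findByExtension(string $dir, array $extensions): array
{
    $wanted = array_map('strtolower', $extensions);
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries.
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    $matches = [];
    foreach ($iterator as $file) {
        // $file is an SplFileInfo; getExtension() returns the part after the last dot.
        if ($file->isFile() && in_array(strtolower($file->getExtension()), $wanted, true)) {
            $matches[] = $file->getPathname();
        }
    }
    return $matches;
}
```

Calling findByExtension('/path/to/root', ['jpg', 'shp']) would return the full paths of all matching files below that (placeholder) directory.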

I believe PHP's glob() function is exactly what you are looking for:
http://php.net/manual/en/function.glob.php

Use readdir to get a list of files, and fnmatch to work out if it matches your required filename pattern. Do all this inside a function, and call your function when you find directories. Ask another question if you get stuck implementing this (or comment if you really have no idea where to start).
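If you want a starting point, the outline above could be sketched roughly like this. The name findMatching is hypothetical, and the code assumes that unreadable directories can simply be skipped.

```php
<?php
// Hypothetical helper following the readdir/fnmatch outline above.
function findMatching(string $dir, string $pattern): array
{
    $found = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $found;   // unreadable directory: nothing to report
    }
    while (($entry = readdir($handle)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue;    // never recurse into the current or parent directory
        }
        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            // Recurse into the subdirectory and append its matches.
            $found = array_merge($found, findMatching($path, $pattern));
        } elseif (fnmatch($pattern, $entry)) {
            $found[] = $path;
        }
    }
    closedir($handle);
    return $found;
}
```

For example, findMatching('/path/to/root', '*.jpg') collects every .jpg below the given (placeholder) directory. Note that is_dir() also follows symbolic links, so a link pointing back up the tree would need extra handling.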

glob will get you all the files in a given directory, but not the sub directories. If you need that too, you will need to: 10. get recursive, 20. goto 10.
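Here's a sketch of the idea:
function getFiles($pattern, $dir) {
    // Files in this directory that match the pattern, e.g. "*.jpg".
    $files = glob($dir . '/' . $pattern) ?: [];
    // Subdirectories only; a "*" pattern does not match "." or "..".
    $folders = glob($dir . '/*', GLOB_ONLYDIR) ?: [];
    foreach ($folders as $folder) {
        // array_merge appends; the + operator would drop entries with equal numeric keys.
        $files = array_merge($files, getFiles($pattern, $folder));
    }
    return $files;
}
Treat this as a sketch rather than tested code, but hopefully you get the idea. A "*" pattern does not match "." or "..", so the recursion will not loop on those, but a directory link that points back up the tree could still land you in infinite loop town.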

Related

Remove files which have not filename duplicates

For each document (.pdf, .txt, .docx, etc.) I also have a corresponding JSON file with the same filename.
Example:
file1.json,
file1.pdf,
file2.json,
file2.txt,
filex.json,
filex.pdf,
But I also have some JSON files that are not accompanied by a corresponding document.
I want to delete all JSON files which have no corresponding document. I'm really stuck because I can't find a proper solution to my problem.
I know how to list the directory with scandir() and get the filename and extension from pathinfo(), etc., but the issue is that for each JSON file I find in the directory I have to run another foreach over that directory, excluding all JSON files, to see whether the same filename exists so that I can decide whether to delete it. (This is how I think to solve it.)
The problem here is performance, since there are millions of files and for each JSON file I have to run a foreach over millions of files.
Can anyone guide me to a better solution?
Thank you!
Edit: Since no one will help unless I post a piece of code first (and this approach on Stack Overflow is definitely wrong), here is what I'm trying:
<?php
$dir = "2000/";
$files = scandir($dir);
foreach ($files as $file) {
    $fullName = pathinfo($file);
    if ($fullName['extension'] === 'json') {
        if (!in_array($fullName['filename'].'.pdf', $files)){
            unlink($dir.$file);
        }
    }
}
As you can see, I can only search for one type of document (.pdf in this case). I want to search for every extension except .json, and I don't want to run a foreach/in_array() for each JSON file; I want to achieve all this in a single foreach.
Maybe you should approach it another way? Iterate over the JSON files only, look for a corresponding document for each one, and remove the JSON file if none is found.
It would look like this:
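$dir = "2000/";
foreach (glob($dir . "*.json") as $path) {
    // glob() already returns each path with $dir prepended, so don't add it again.
    $file = new \SplFileInfo($path);
    // "name.*" also matches the JSON file itself, so exactly one hit means no document exists.
    if (count(glob($dir . $file->getBasename('.' . $file->getExtension()) . ".*")) === 1) {
        unlink($file->getPathname());
    }
}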
Manual
PHP: SplFileInfo
PHP: glob
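The loop above calls glob() once per JSON file, and every call has to read the directory again, which may still be slow with millions of files. As an alternative, here is a rough sketch that reads the directory once and uses array keys as a lookup table. The function name findOrphanJson is hypothetical, and the sketch assumes the whole listing fits in memory.

```php
<?php
// Hypothetical helper: names of the .json files in $dir that have no sibling
// document (same base name, any other extension).
function findOrphanJson(string $dir): array
{
    $hasDocument = [];   // base name => true, for every non-JSON file
    $jsonFiles = [];     // base name => JSON file name

    foreach (scandir($dir) as $entry) {
        if (!is_file($dir . '/' . $entry)) {
            continue;    // skips ".", ".." and subdirectories
        }
        $info = pathinfo($entry);
        $ext = strtolower($info['extension'] ?? '');
        if ($ext === 'json') {
            $jsonFiles[$info['filename']] = $entry;
        } else {
            $hasDocument[$info['filename']] = true;
        }
    }

    // Base names present in $jsonFiles but not in $hasDocument are the orphans.
    return array_values(array_diff_key($jsonFiles, $hasDocument));
}
```

The result can then be deleted in a separate loop, for example foreach (findOrphanJson('2000') as $name) { unlink('2000/' . $name); }. Keeping the lookup and the deletion apart makes it easy to print the list first and check it before removing anything.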

easiest way to get only folder names with .zip in php

I have seen a few examples on the web but they seem pretty messy. I am looking for a nice and clean way to say: get me only the files/folders that have .zip in their name. What I have so far is:
foreach(scandir(__DIR__) as $files) {
    var_dump($files);
}
What I wonder is whether I need preg_match, or whether the ZipArchive class has a function that says "return only files with .zip".
This should work for you, you can use glob():
<?php
foreach (glob("*.zip") as $filename) {
    echo $filename . "<br />";
}
?>
Possible output:
test - Kopie.zip
test.zip
test2.zip
For more information about glob() see the manual: http://php.net/manual/en/function.glob.php

Check if a file matches a wildcarded spec, in a given directory, with PHP

I have a directory that files are uploaded to, and I want to be able to display a download link if the file exists. The file however has to match a particular pattern as this is the identifier of who uploaded it.
The pattern starts with /ClientFiles/ and then needs to match all files that start with the user ID. So for example: /ClientFiles/123-UploadData.xls
So it would need to look in the ClientFiles directory and find all files that start with '123-', no matter what comes after.
Cheers
To look for files by a certain pattern you can use glob, then use is_readable to check if you can read the files.
$files = array();
foreach (glob($dirname . DIRECTORY_SEPARATOR . $clientId . '-*') as $file) {
    if (is_readable($file)) {
        $files[] = $file;
    }
}
Simply use the file_exists() function
PHP has a function file_exists(). Use it to decide whether or not to show the link.

unlink files with a case-insensitive (glob-like) pattern

I have two folders: one holds the videos and the other holds the configuration files for each video (3 files per video). Right now, if I want to delete a video I have to delete its files by hand.
I found this:
<?php
$filename = 'name.of.the.video.xml';
$term = str_replace(".xml","", $filename);
$dirPath = ("D:/test/");
foreach (glob($dirPath.$term.".*") as $removeFile)
{
    unlink ($removeFile);
}
?>
An echo returns:
D:/test/name.of.the.video.jpg
D:/test/name.of.the.video.srt
D:/test/name.of.the.video.xml
This works and helps me a lot, but I have a problem.
Not all files use the same case, e.g.:
Name.of.The.video.jpg
Name.Of.The.Video.xml
If I search the folder for that string and a file name is not identical to $filename, the result is empty.
So my question is: how can I make that search case-insensitive?
Thank you.
You are using the glob function, which is case-sensitive, so it is the wrong function for getting this list of files.
You should therefore first normalize the filenames in the directory so they all share the same case (e.g. all lowercase), or use another method that lists the directory case-insensitively. I suggest the first. However, if that is not an option, why don't you glob for all files first and then filter the list with preg_grep, which lets you specify case-insensitive patterns?
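A rough sketch of that glob-then-filter idea. The helper name globInsensitive is made up, and the code assumes paths use forward slashes (as in D:/test/).

```php
<?php
// Hypothetical helper: paths in $dir whose file name is "$term.<anything>",
// compared case-insensitively.
function globInsensitive(string $dir, string $term): array
{
    // List everything first; glob() itself stays case-sensitive.
    $all = glob(rtrim($dir, '/') . '/*') ?: [];
    // Anchor on the last path separator so only the file name is compared.
    $pattern = '~/' . preg_quote($term, '~') . '\.[^/]*$~i';
    return array_values(preg_grep($pattern, $all));
}
```

With this, foreach (globInsensitive('D:/test/', 'name.of.the.video') as $removeFile) { unlink($removeFile); } would also catch Name.Of.The.Video.xml.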
Which leads me to the point that it's more practical to use DirectoryIterator with a RegexIterator:
$filename = 'name.of.the.video.xml';
$term = basename($filename, ".xml");
$files = new DirectoryIterator($dirPath);
$filesFiltered = new RegexIterator($files, sprintf('(^%s\\..*$)i', preg_quote($term)));
foreach($filesFiltered as $file)
{
    printf("delete: %s\n", $file);
    unlink($file->getPathname());
}
Your changed requirements are a good example of how flexible the iterator code is: do the same for two directories at once. You just create two DirectoryIterators and chain them with an AppendIterator. Job done. The rest of the code stays the same:
...
$files = new AppendIterator();
$files->append(new DirectoryIterator($dirPath1));
$files->append(new DirectoryIterator($dirPath2));
...
Voilà. Sounds good? glob is okay for quick jobs that need nothing more. For everything else involving directory operations, start considering the SPL. It has much more power.
Is strcasecmp() a valid function for this? It's a case-insensitive string comparison function.
Surely if you know the file name and you can echo it out, you can pass it to unlink()?

PHP glob() doesnt find .htaccess

Simple question - How to list .htaccess files using glob()?
glob() does list "hidden" files (files starting with ., including the directories . and ..), but only if you explicitly ask for them:
glob(".*");
Filtering the returned glob() array for .htaccess entries with preg_grep:
$files = glob(".*") AND $files = preg_grep('/\.htaccess$/', $files);
The alternative to glob of course would be just using scandir() and a filter (fnmatch or regex):
preg_grep('/^\.\w+/', scandir("."))
In case anybody comes here:
Since the SPL is part of PHP and offers some handy iterators, you can use them to list hidden files such as .htaccess or other hidden Linux files.
Use DirectoryIterator to list all directory contents, excluding . and .., as follows:
$path = 'path/to/dir';
$files = new DirectoryIterator($path);
foreach ($files as $file) {
    // excluding the . and ..
    if ($file->isDot() === false) {
        // make some stuff
    }
}
