Obtaining deepest file path with PHP - php

Does anyone have a brilliant idea how to obtain the elements with the deepest path from an array with file paths? If this sounds weird, imagine the following array:
/a/b
/a
/1/2/3/4
/1/2
/1/2/3/5
/a/b/c/d/e
What I want to obtain is:
/1/2/3/4
/1/2/3/5
/a/b/c/d/e
Wondering what the fastest method is without having to iterate over the whole array over and over again. Language is PHP (5.2).

Following your clarifications, here's a function that would do it. It keeps an array of the "deepest paths" found so far and compares each path against it. The best case is O(n) (if all paths are subpaths of the longest one) and the worst case is O(n^2) (if all paths are completely distinct).
Note that continue 2 means "continue on the outer loop".
<?php
function getDeepestPaths($array)
{
$deepestPaths = array();
foreach ($array as $path)
{
$pathLength = strlen($path);
// look for all the paths we consider the longest
// (note how we're using references to the array members)
foreach ($deepestPaths as &$deepPath)
{
$deepPathLength = strlen($deepPath);
// if $path is prefixed by $deepPath, this means that $path is
// deeper, so we replace $deepPath with $path
if (substr($path, 0, $deepPathLength) == $deepPath)
{
$deepPath = $path;
continue 2;
}
// otherwise, if $deepPath is prefixed by $path, this means that
// $path is shallower; so we should stop looking
else if (substr($deepPath, 0, $pathLength) == $path)
{
continue 2;
}
}
// $path matches nothing currently in $deepestPaths, so we should
// add it to the array
$deepestPaths[] = $path;
}
return $deepestPaths;
}
$paths = array('/a/b', '/a', '/1/2/3/4', '/1/2', '/1/2/3/5', '/a/b/c/d/e');
print_r(getDeepestPaths($paths));
?>
If your folder names don't end with slashes, add a check to both ifs that the character right after the prefix in the deeper path is a slash. Otherwise a path like /foo/bar will be treated as a "deeper path" than /foo/b (and will replace it).
if (substr($path, 0, $deepPathLength) == $deepPath && $path[$deepPathLength] == '/')
if (substr($deepPath, 0, $pathLength) == $path && $deepPath[$pathLength] == '/')
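If the quadratic worst case is a concern, one alternative is to sort first and then make a single pass. This is only a sketch, not the function above: deepestPathsSorted is a hypothetical name, and it assumes Unix-style paths separated by '/'. Each path gets a trailing slash before sorting, so an ancestor always sorts immediately before its descendants and prefix tests respect folder boundaries.

```php
<?php
// Hypothetical helper (not part of the answer above): sort, then drop every
// path that is an ancestor (or duplicate) of the entry that follows it.
function deepestPathsSorted(array $paths)
{
    // Normalise to exactly one trailing slash so '/foo/b' is not
    // mistaken for a prefix of '/foo/bar'.
    $normalised = array();
    foreach ($paths as $path) {
        $normalised[] = rtrim($path, '/') . '/';
    }
    sort($normalised, SORT_STRING);

    $deepest = array();
    $count = count($normalised);
    for ($i = 0; $i < $count; $i++) {
        $current = $normalised[$i];
        $next = ($i + 1 < $count) ? $normalised[$i + 1] : null;
        // If the next entry starts with the current one, the current
        // path is shallower and can be skipped.
        if ($next !== null && strncmp($next, $current, strlen($current)) === 0) {
            continue;
        }
        $deepest[] = rtrim($current, '/');
    }
    return $deepest;
}

$paths = array('/a/b', '/a', '/1/2/3/4', '/1/2', '/1/2/3/5', '/a/b/c/d/e');
print_r(deepestPathsSorted($paths));
```

The result comes back in sorted order rather than input order, and the cost is dominated by the sort.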

$aPathes = array(
'/a/b',
'/a',
'/1/2/3/4',
'/1/2',
'/1/2/3/5',
'/a/b/c/d/e'
);
function getDepth($sPath) {
return substr_count($sPath, '/');
}
$aPathDepths = array_map('getDepth', $aPathes);
arsort($aPathDepths);
foreach ($aPathDepths as $iKey => $iDepth) {
echo $aPathes[$iKey] . "\n";
}
=== UPDATE ===
$aUsed = array();
foreach ($aPathes as $sPath) {
foreach ($aUsed as $iIndex => $sUsed) {
if (substr($sUsed, 0, strlen($sPath)) == $sPath || substr($sPath, 0, strlen($sUsed)) == $sUsed) {
if (strlen($sUsed) < strlen($sPath)) {
array_splice($aUsed, $iIndex, 1);
$aUsed[] = $sPath;
}
continue 2;
}
}
$aUsed[] = $sPath;
}

If you can guarantee that the "spelling" is always the same (i.e. "/a/b c/d" vs. "/a/b\ /c/d"), then you should be able to do a simple string comparison to see whether one of the strings is fully contained within the other. If it is, discard the contained (shorter) string.
Note that you will need to compare in both directions.
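A minimal sketch of that two-way comparison, assuming '/'-separated paths. Both function names are made up for illustration. The trailing slash is added so that the containment test only matches on folder boundaries.

```php
<?php
// Hypothetical helper: true when $ancestor is $path itself or a parent folder of it.
function isSameOrAncestor($ancestor, $path)
{
    $ancestor = rtrim($ancestor, '/') . '/';
    $path = rtrim($path, '/') . '/';
    return strncmp($path, $ancestor, strlen($ancestor)) === 0;
}

// Hypothetical helper: the comparison has to be tried in both directions.
function pathsAreNested($a, $b)
{
    return isSameOrAncestor($a, $b) || isSameOrAncestor($b, $a);
}

var_dump(pathsAreNested('/a/b', '/a'));         // bool(true)
var_dump(pathsAreNested('/foo/b', '/foo/bar')); // bool(false)
```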

Related

Is this the correct way to hide a file or folder in PHP

I am just learning more about using classes in PHP. I know the code below is crap, as I need help. Can someone just let me know if I am going in the right direction?
while($entryName=readdir($myDirectory)) {
$type = array("index.php", "style.css", "sorttable.js", "host-img");
if($entryName != $type[0]){
if($entryName != $type[1]){
if($entryName != $type[2]){
if($entryName != $type[3]){
$dirArray[]=$entryName;
}
}
}
}
}
What you seem to want is a list of all the files in your directory that do not have one of four specific names.
The code that most resembles yours that would do it more efficiently is
$exclude = array("index.php", "style.css", "sorttable.js", "host-img");
$dirArray = [];
while ($entryName = readdir($myDirectory)) {
if (!in_array($entryName, $exclude)) {
$dirArray[] = $entryName;
}
}
Alternatively, you can dispense with the loop. As written, this includes both files and directories from the directory you supply. Note that scandir() takes a directory path, not a handle returned by opendir():
$exclude = array("index.php", "style.css", "sorttable.js", "host-img");
$contents = scandir($myDirectory);
$dirArray = array_diff($contents, $exclude);
Edit to add for posterity:
@arkascha had an answer that used array_filter, and while that example was just an implementation of array_diff, the motivation for that pattern is a good one: there may be times when you want to exclude more than just a simple list. It is entirely reasonable, for instance, to imagine you want to exclude specific files and all directories, so you have to filter directories from your list. And just for fun, let's also not return any file whose name begins with a dot (.).
$exclude = ["index.php", "style.css", "sorttable.js", "host-img"];
$contents = scandir($myDirectory); // $myDirectory is a valid path to the directory
$dirArray = array_filter($contents, function($fileName) use ($myDirectory, $exclude) {
    // keep the entry only if it is not excluded and does not start with a dot
    if (!in_array($fileName, $exclude) && strpos($fileName, '.') !== 0) {
        // join with a separator so is_dir() checks the real path
        return !is_dir(rtrim($myDirectory, '/') . '/' . $fileName);
    } else {
        return false;
    }
});
You actually want to filter your input:
<?php
$input = [".", "..", "folderA", "folderB", "file1", "file2", "file3"];
$blacklist = [".", "..", "folderA", "file1"];
$output = array_filter($input, function($entry) use ($blacklist) {
return !in_array($entry, $blacklist);
});
print_r($output);
The output is:
Array
(
[3] => folderB
[5] => file2
[6] => file3
)
This approach lets you implement more complex filter conditions without having to pass over the input data multiple times. For example, you could add another condition based on file name extension, file creation time, or even file content.
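As a rough illustration of adding a second condition in the same pass, here is a sketch that combines the blacklist with an extension check. The sample file names and the $allowedExtensions list are made up for the example.

```php
<?php
// Sample input and allowed extensions are assumptions for illustration only.
$input = array(".", "..", "notes.txt", "photo.jpg", "index.php", "logo.PNG");
$blacklist = array(".", "..", "index.php");
$allowedExtensions = array("jpg", "png");

$output = array_filter($input, function ($entry) use ($blacklist, $allowedExtensions) {
    // First condition: the entry must not be blacklisted.
    if (in_array($entry, $blacklist, true)) {
        return false;
    }
    // Second condition: the extension must be allowed (case-insensitive).
    $extension = strtolower(pathinfo($entry, PATHINFO_EXTENSION));
    return in_array($extension, $allowedExtensions, true);
});

// array_filter() preserves the original keys; array_values() renumbers them.
print_r(array_values($output));
```

Both checks run inside one callback, so the input is still traversed only once.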

Delete files not matching a list

So I'm trying to make a simple script, it will have a list of predefined files, search for anything that's not on the list and delete it.
I have this for now
<?php
$directory = "/home/user/public_html";
$files = glob($directory . "*.*");
foreach($files as $file)
{
$sql = mysql_query("SELECT id FROM files WHERE FileName='$file'");
if(mysql_num_rows($sql) == 0)
unlink($directory . $file);
}
?>
However, I'd like to avoid the query so I can run the script more often (there are about 60-70 files, and I want to run this every 20 seconds or so). How would I embed a file list into the PHP file and check against that instead of the database?
Thanks!
You are missing a trailing / in glob(): you are giving /home/user/public_html*.* as the argument, and I think you mean /home/user/public_html/*.*.
This is why I bet nothing matches the files in your table.
This won't give an error either, because the syntax is fine.
The unlink() call has a path problem as well: glob() already returns each match with the directory prefix, so prepending $directory again produces a path that does not exist. Pass $file to unlink() directly, and use basename($file) for the lookup if your table stores bare file names.
I like this syntax style: "{$directory}/{$file}" because it's short and more readable. If the / is missing, you see it immediately. You can also change it to $directory . "/" . $file, if you prefer. The same goes for braces around one-line conditional statements. So here it comes:
<?php
$directory = "/home/user/public_html";
$files = glob("{$directory}/*.*");
foreach ($files as $file)
{
    // glob() returns the full path; this assumes the table stores bare file names
    $name = mysql_real_escape_string(basename($file));
    $sql = mysql_query("SELECT id FROM files WHERE FileName='{$name}'");
    if (mysql_num_rows($sql) == 0)
    {
        unlink($file);
    }
}
?>
EDIT: You requested recursion. Here it goes..
You need to make a function that you can run once with a path as its argument. Then you can call that function from inside itself on subdirectories. Like this:
<?php
/*
ListDir list files under directories recursively
Arguments:
$dir = directory to be scanned
$recursive = in how many levels of recursion do you want to search? (0 for none), default: -1 (for "unlimited")
*/
function ListDir($dir, $recursive=-1)
{
// if recursive == -1 do "unlimited" but that's no good on a live server! so let's say 999 is enough..
$recursive = ($recursive == -1 ? 999 : $recursive);
// array to hold return value
$retval = array();
// remove trailing / if it is there and then add it, to make sure there is always just 1 /
$dir = rtrim($dir,"/") . "/*";
// read the directory contents and process each node
foreach(glob($dir) as $node)
{
// skip hidden files
if(substr($node,-1) == ".") continue;
// if $node is a dir and recursive is greater than 0 (meaning not at the last level or disabled)
if(is_dir($node) && $recursive > 0)
{
// pass one level less to the recursive call; decrementing $recursive itself
// would also reduce the depth left for later sibling directories
// run this same function again on itself, merging the return values with the return array
$retval = array_merge($retval, ListDir($node, $recursive - 1));
}
// if $node is a file, we add it to the array that will be returned from this function
elseif(is_file($node))
{
$retval[] = $node;
// NOTE: if you want you can do some action here in your case you can unlink($node) if it matches your requirements..
}
}
return $retval;
}
// Output the result
echo "<pre>";
print_r(ListDir("/path/to/dir/",1));
echo "</pre>";
?>
If the list is not dynamic, store it in an array. Note that glob() returns each match with the directory prefix, so compare base names and pass the returned path to unlink():
$myFiles = array(
    'some.ext',
    'next.ext',
    'more.ext'
);
$directory = "/home/user/public_html/";
$files = glob($directory . "*.*");
foreach ($files as $file)
{
    if (!in_array(basename($file), $myFiles)) {
        unlink($file);
    }
}
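For a script that runs every few seconds, the embedded list can also be turned into a lookup table so each check is a single isset(). The sketch below is only an illustration: deleteUnlisted is a hypothetical helper name, and the usage part runs against a throwaway directory under the system temp folder rather than a real web root.

```php
<?php
// Hypothetical helper: deletes regular files in $directory whose base name
// is not in $keep, and returns the names it removed.
function deleteUnlisted($directory, array $keep)
{
    $keepLookup = array_flip($keep); // name => index, for isset() lookups
    $deleted = array();

    $files = glob(rtrim($directory, '/') . '/*');
    if ($files === false) {
        $files = array(); // glob() returns false on error
    }
    foreach ($files as $file) {
        // glob() returns paths including the directory, so compare base names
        if (is_file($file) && !isset($keepLookup[basename($file)])) {
            if (unlink($file)) {
                $deleted[] = basename($file);
            }
        }
    }
    sort($deleted);
    return $deleted;
}

// Usage against a temporary directory with made-up file names
$dir = sys_get_temp_dir() . '/keep_demo_' . uniqid();
mkdir($dir);
foreach (array('some.ext', 'next.ext', 'stray.tmp') as $name) {
    touch($dir . '/' . $name);
}
print_r(deleteUnlisted($dir, array('some.ext', 'next.ext', 'more.ext')));
```

Subdirectories are left alone because of the is_file() check.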

How can I exclude directories using RecursiveDirectoryIterator

I have the function below. Which goes through directories and recursively searches through them to grab a random image file and then attaches that to a post. What I want to do is exclude some files from the search.
I have a comma separated list which I explode into an array, I tried using a filter but couldn't get this to work.
Current function without filter is
function swmc_get_imgs($start_dir, $ext, $exclude=array()){
$dir = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($start_dir));
$files = array();
// Force array of extensions and make them all lower-case
if ( ! is_array($ext))
{
$ext = (array) $ext;
}
$ext = array_unique(array_map('strtolower', $ext));
foreach($dir as $file)
{
// Skip anything that isn't a file
if ( ! $file->isFile())
continue;
// If the file has one of our desired extensions, add it to files array
if (in_array(strtolower(pathinfo($file->getFilename(), PATHINFO_EXTENSION)), $ext)) {
$files[] = $file->getPathname();
}
}
return $files;
}
So the above works but can be fairly expensive still especially with a lot of directories, as such I want to exclude a list of directories stored in a comma list.
I tried the following
class SwmcOnlyFilter extends RecursiveFilterIterator {
public function accept() {
// Accept the current item if we can recurse into it
// or it is a value starting with "test"
return $this->hasChildren() || !in_array($this->current(), explode(",",get_option('swmc_image_excl')));
}
}
And then changing the first part of the swmc_get_imgs function to
$dirIterator = new RecursiveDirectoryIterator($start_dir);
$filter = new SwmcOnlyFilter($dirIterator);
$dir = new RecursiveIteratorIterator($filter);
However the filter doesn't jump over that directory but instead goes into it.
The directories could look like
/uploads/2009/1/2011_pic.jpg
/uploads/2011/1/john_smith.jpg
and so on.
So I may want to exclude 2011 as a directory but not exclude the image that lives in 2009 with 2011 in its title.
CLARIFICATION:
I could filter out these manually by skipping them in the foreach loop, however this still checks them and wastes memory and time. I would prefer to skip these at the time of the grab if possible.
Figured it out using the following:
function swmc_iterate_imgs($start_dir) {
$directory = $start_dir;
$excludedDirs = explode(",",get_option('swmc_image_excl')); // array of subdirectory paths, relative to $directory, to exclude from search
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($directory));
$fileArr = array(); // numerically indexed array with your files
$x = -1;
while ($it->valid())
{
if (!$it->isDot() && !in_array($it->getSubPath(), $excludedDirs) && preg_match('/(\.(jpg|jpeg|gif|png))$/i', $it->key()) == 1)
{
$fileArr[] = $it->key();
}
$it->next();
}
return $fileArr;
}
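The loop above still walks into the excluded directories and only discards their files afterwards. To prune them before they are descended into, the filter has to reject the directory entry itself. Here is a sketch using RecursiveCallbackFilterIterator, which needs PHP 5.4 or later. The name listImagesExcluding is hypothetical, and it assumes directories are excluded by bare name (e.g. "2011") rather than by relative path.

```php
<?php
// Hypothetical helper: collects image files under $startDir, never descending
// into directories whose name is listed in $excludedDirNames.
function listImagesExcluding($startDir, array $excludedDirNames)
{
    $directory = new RecursiveDirectoryIterator($startDir, FilesystemIterator::SKIP_DOTS);
    $filter = new RecursiveCallbackFilterIterator(
        $directory,
        function ($current, $key, $iterator) use ($excludedDirNames) {
            if ($current->isDir()) {
                // Rejecting a directory here means it is never opened.
                return !in_array($current->getFilename(), $excludedDirNames, true);
            }
            // Files: keep only the image extensions used in the question.
            return preg_match('/\.(jpe?g|gif|png)$/i', $current->getFilename()) === 1;
        }
    );

    $files = array();
    foreach (new RecursiveIteratorIterator($filter) as $file) {
        $files[] = $file->getPathname();
    }
    sort($files);
    return $files;
}
```

With the example layout from the question, excluding "2011" would skip /uploads/2011 entirely while still returning /uploads/2009/1/2011_pic.jpg, because only directory names are compared against the list.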

Replace PHP's realpath()

Apparently, realpath is very buggy. In PHP 5.3.1, it causes random crashes.
In 5.3.0 and less, realpath randomly fails and returns false (for the same string of course), plus it always fails on realpath-ing the same string twice/more (and of course, it works the first time).
Also, it is so buggy in earlier PHP versions, that it is completely unusable. Well...it already is, since it's not consistent.
Anyhow, what options do I have? Maybe rewrite it by myself? Is this advisable?
Thanks to Sven Arduwie's code (pointed out by Pekka) and some modification, I've built a (hopefully) better implementation:
/**
* This function is to replace PHP's extremely buggy realpath().
* @param string The original path, can be relative etc.
* @return string The resolved path, it might not exist.
*/
function truepath($path){
// true when $path does NOT start with '/', i.e. it is not an absolute Unix path
$unipath=strlen($path)==0 || $path[0]!='/';
// attempts to detect if path is relative in which case, add cwd
if(strpos($path,':')===false && $unipath)
$path=getcwd().DIRECTORY_SEPARATOR.$path;
// resolve path parts (single dot, double dot and double delimiters)
$path = str_replace(array('/', '\\'), DIRECTORY_SEPARATOR, $path);
$parts = array_filter(explode(DIRECTORY_SEPARATOR, $path), 'strlen');
$absolutes = array();
foreach ($parts as $part) {
if ('.' == $part) continue;
if ('..' == $part) {
array_pop($absolutes);
} else {
$absolutes[] = $part;
}
}
$path=implode(DIRECTORY_SEPARATOR, $absolutes);
// resolve any symlinks
if(file_exists($path) && linkinfo($path)>0)$path=readlink($path);
// put initial separator that could have been lost
$path=!$unipath ? '/'.$path : $path;
return $path;
}
NB: Unlike PHP's realpath, this function does not return false on error; it returns the path resolved as far as it could manage.
Note 2: Apparently some people can't read properly. Truepath() does not work on network resources including UNC and URLs. It works for the local file system only.
Here is the modified code that supports UNC paths as well:
static public function truepath($path)
{
// true when $path does NOT start with '/', i.e. it is not an absolute Unix path
$unipath = strlen($path)==0 || $path[0]!='/';
$unc = substr($path,0,2)=='\\\\'?true:false;
// attempts to detect if path is relative in which case, add cwd
if(strpos($path,':') === false && $unipath && !$unc){
$path=getcwd().DIRECTORY_SEPARATOR.$path;
if($path[0]=='/'){
$unipath = false;
}
}
// resolve path parts (single dot, double dot and double delimiters)
$path = str_replace(array('/', '\\'), DIRECTORY_SEPARATOR, $path);
$parts = array_filter(explode(DIRECTORY_SEPARATOR, $path), 'strlen');
$absolutes = array();
foreach ($parts as $part) {
if ('.' == $part){
continue;
}
if ('..' == $part) {
array_pop($absolutes);
} else {
$absolutes[] = $part;
}
}
$path = implode(DIRECTORY_SEPARATOR, $absolutes);
// resolve any symlinks
if( function_exists('readlink') && file_exists($path) && linkinfo($path)>0 ){
$path = readlink($path);
}
// put initial separator that could have been lost
$path = !$unipath ? '/'.$path : $path;
$path = $unc ? '\\\\'.$path : $path;
return $path;
}
I know this is an old thread, but it is really helpful.
I ran into a weird Phar::interceptFileFuncs issue when I implemented relative paths in phpctags; realpath() is really buggy inside a phar.
This thread pointed me in the right direction. Here is my implementation, based on christian's implementation from this thread and these comments.
Hope it works for you.
function relativePath($from, $to)
{
$fromPath = absolutePath($from);
$toPath = absolutePath($to);
$fromPathParts = explode(DIRECTORY_SEPARATOR, rtrim($fromPath, DIRECTORY_SEPARATOR));
$toPathParts = explode(DIRECTORY_SEPARATOR, rtrim($toPath, DIRECTORY_SEPARATOR));
while(count($fromPathParts) && count($toPathParts) && ($fromPathParts[0] == $toPathParts[0]))
{
array_shift($fromPathParts);
array_shift($toPathParts);
}
return str_pad("", count($fromPathParts)*3, '..'.DIRECTORY_SEPARATOR).implode(DIRECTORY_SEPARATOR, $toPathParts);
}
function absolutePath($path)
{
$isEmptyPath = (strlen($path) == 0);
// check emptiness first so an empty string is never indexed
$isRelativePath = ($isEmptyPath || $path[0] != '/');
$isWindowsPath = !(strpos($path, ':') === false);
if (($isEmptyPath || $isRelativePath) && !$isWindowsPath)
$path= getcwd().DIRECTORY_SEPARATOR.$path;
// resolve path parts (single dot, double dot and double delimiters)
$path = str_replace(array('/', '\\'), DIRECTORY_SEPARATOR, $path);
$pathParts = array_filter(explode(DIRECTORY_SEPARATOR, $path), 'strlen');
$absolutePathParts = array();
foreach ($pathParts as $part) {
if ($part == '.')
continue;
if ($part == '..') {
array_pop($absolutePathParts);
} else {
$absolutePathParts[] = $part;
}
}
$path = implode(DIRECTORY_SEPARATOR, $absolutePathParts);
// resolve any symlinks
if (file_exists($path) && linkinfo($path)>0)
$path = readlink($path);
// put initial separator that could have been lost
$path= (!$isWindowsPath ? '/'.$path : $path);
return $path;
}
For those Zend users out there, THIS answer may help you, as it did me:
$path = APPLICATION_PATH . "/../directory";
$realpath = new Zend_Filter_RealPath(new Zend_Config(array('exists' => false)));
$realpath = $realpath->filter($path);
I have never heard of such massive problems with realpath() (I always thought that it just interfaces some underlying OS functionality - would be interested in some links), but the User Contributed Notes to the manual page have a number of alternative implementations. Here is one that looks okay.
Of course, it's not guaranteed these implementations take care of all cross-platform quirks and issues, so you'd have to do thorough testing to see whether it suits your needs.
As far as I can see though, none of them returns a canonicalized path, they only resolve relative paths. If you need that, I'm not sure whether you can get around realpath() (except perhaps executing a (system-dependent) console command that gives you the full path.)
On Windows 7, the code works fine. On Linux, there is a problem in that the path generated starts with (in my case) home/xxx when it should start with /home/xxx ... ie the initial /, indicating the root folder, is missing.
The problem is not so much with this function, but with what getcwd returns in Linux.
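One way to avoid losing the root slash is to decide whether the path is absolute from the final string that gets split, and to put the slash back after the parts are joined again. The sketch below is a purely lexical normaliser under that assumption. normalizePath is a hypothetical name, it treats '/' as the root marker (Unix-style), and it does not consult getcwd() or resolve symlinks.

```php
<?php
// Hypothetical helper: lexical normalisation only (no cwd lookup, no symlinks).
function normalizePath($path)
{
    $path = str_replace('\\', '/', $path);
    // Remember the leading slash before it is lost by explode()/implode().
    $isAbsolute = strlen($path) > 0 && $path[0] === '/';

    $parts = array();
    foreach (explode('/', $path) as $part) {
        if ($part === '' || $part === '.') {
            continue; // empty segments come from doubled or leading slashes
        }
        if ($part === '..') {
            if (count($parts) > 0 && end($parts) !== '..') {
                array_pop($parts);   // step back one level
            } elseif (!$isAbsolute) {
                $parts[] = '..';     // relative path climbing above its start
            }
            continue;                // an absolute path never climbs above '/'
        }
        $parts[] = $part;
    }
    return ($isAbsolute ? '/' : '') . implode('/', $parts);
}

echo normalizePath('/home/xxx/./docs/../file.txt'), "\n"; // /home/xxx/file.txt
```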

Extract direct sub directory from path string

I need to extract the name of the direct sub directory from a full path string.
For example, say we have:
$str = "dir1/dir2/dir3/dir4/filename.ext";
$dir = "dir1/dir2";
Then the name of the sub-directory in the $str path relative to $dir would be "dir3". Note that $dir never has '/' at the ends.
So the function should be:
$subdir = getsubdir($str,$dir);
echo $subdir; // Outputs "dir3"
If $dir="dir1" then the output would be "dir2". If $dir="dir1/dir2/dir3/dir4" then the output would be "" (empty). If $dir="" then the output would be "dir1". Etc..
Currently this is what I have, and it works (as far as I've tested it). I'm just wondering if there's a simpler way since I find I'm using a lot of string functions. Maybe there's some magic regexp to do this in one line? (I'm not too good with regexp unfortunately).
function getsubdir($str,$dir) {
// Remove the filename
$str = dirname($str);
// Remove the $dir
if(!empty($dir)){
$str = str_replace($dir,"",$str);
}
// Remove the leading '/' if there is one
$si = stripos($str,"/");
// strict comparison: stripos() returns false when there is no '/', and false == 0 is true
if($si === 0){
$str = substr($str,1);
}
// Remove everything after the subdir (if there is anything)
$lastpart = strchr($str,"/");
$str = str_replace($lastpart,"",$str);
return $str;
}
As you can see, it's a little hacky in order to handle some odd cases (no '/' in input, empty input, etc). I hope all that made sense. Any help/suggestions are welcome.
Update (altered solution):
Well Alix Axel had it spot on. Here's his solution with slight tweaks so that it matches my exact requirements (eg: it must return a string, only directories should be outputted (not files))
function getsubdir($str,$dir) {
$str = dirname($str);
$temp = array_slice(array_diff(explode('/', $str), explode('/', $dir)), 0, 1);
return $temp[0];
}
Here you go:
function getSubDir($dir, $sub)
{
return array_slice(array_diff(explode('/', $dir), explode('/', $sub)), 0, 1);
}
EDIT - Foolproof implementation:
function getSubDirFoolproof($dir, $sub)
{
/*
This is the ONLY WAY we have to make SURE that the
last segment of $dir is a file and not a directory.
*/
if (is_file($dir))
{
$dir = dirname($dir);
}
// Is it necessary to convert to the fully expanded path?
$dir = realpath($dir);
$sub = realpath($sub);
// Do we need to worry about Windows?
$dir = str_replace('\\', '/', $dir);
$sub = str_replace('\\', '/', $sub);
// Here we filter leading, trailing and consecutive slashes.
$dir = array_filter(explode('/', $dir));
$sub = array_filter(explode('/', $sub));
// All done!
return array_slice(array_diff($dir, $sub), 0, 1);
}
How about splitting the whole thing into an array:
$fullpath = explode("/", "dir1/dir2/dir3/dir4/filename.ext");
$fulldir = explode("/", "dir1/dir2");
// Will result in array("dir1","dir2","dir3", "dir4", "filename.ext");
// and array("dir1", "dir2");
you should then be able to use array_diff():
$remainder = array_diff($fullpath, $fulldir);
// Should return array(2 => "dir3", 3 => "dir4", 4 => "filename.ext"); array_diff() preserves keys
then, getting the direct child is easy:
echo reset($remainder); // first remaining element, whatever its key
I can't test this right now but it should work.
Here's a similar "short" solution, this time using string functions rather than array functions. If there is no corresponding part to be gotten from the string, getsubdir will return FALSE. The str_replace call doubles any percent signs in $dir, since a percent sign has special meaning to sscanf. (The three-argument form of strtr maps single characters one to one, so it cannot turn '%' into '%%'.)
function getsubdir($str, $dir) {
return sscanf($str, str_replace('%', '%%', $dir).'/%[^/]', $name) === 1 ? $name : FALSE;
}
And a quick test so you can see how it behaves:
$str = "dir1/dir2/dir3/dir4/filename.ext";
var_dump(
getSubDir($str, "dir1"),
getSubDir($str, "dir1/dir2/dir3"),
getSubDir($str, "cake")
);
// string(4) "dir2"
// string(4) "dir4"
// bool(false)
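For comparison, here is a sketch that compares path segments position by position, so $dir has to match the leading folders of $str exactly rather than just share names with it. getSubdirBySegments is a hypothetical name, and it returns an empty string when there is no direct sub directory, as the question asks.

```php
<?php
// Hypothetical variant: $dir must match the leading segments of the path exactly.
function getSubdirBySegments($str, $dir)
{
    $folder = dirname($str); // drop the file name
    $pathParts = ($folder === '.' || $folder === '') ? array() : explode('/', $folder);
    $dirParts = ($dir === '') ? array() : explode('/', $dir);

    // The first count($dirParts) segments have to be identical to $dir.
    if (array_slice($pathParts, 0, count($dirParts)) !== $dirParts) {
        return '';
    }
    $index = count($dirParts);
    return isset($pathParts[$index]) ? $pathParts[$index] : '';
}

$str = "dir1/dir2/dir3/dir4/filename.ext";
var_dump(
    getSubdirBySegments($str, "dir1/dir2"),
    getSubdirBySegments($str, "dir1/dir2/dir3/dir4"),
    getSubdirBySegments($str, "")
);
// string(4) "dir3"
// string(0) ""
// string(4) "dir1"
```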
