PHP concatenation of paths - php

Is there a PHP internal function for path concatenation? What possibilities do I have to merge several paths (absolute and relative)?
//Example:
$path1="/usr/home/username/www";
$path2="domainname";
$path3="images/thumbnails";
$domain="exampledomain.com";
//As an example: now I want to create a new path (domain + path3) on the fly.
$result = $domain.DIRECTORY_SEPARATOR.$path3;
OK, there is an easy solution for this example, but what if there are different directory separators or some paths are a little bit more complicated?
Is there an existing solution for trimming a path like this: /home/uploads/../uploads/tmp => /home/uploads/tmp ....
And what would a platform-independent version of a path-concat function look like?
Should a relative path start with "./" as a prefix, or is "home/path/img/" the common way?

I ran into this problem myself, primarily regarding the normalization of paths.
Normalization is:
One separator (I've chosen to accept a backslash \ as input, but never to return one)
Resolving indirection: /../
Removing duplicate separators: /home/www/uploads//file.ext
Always remove trailing separator.
I've written a function that achieves this. I don't have access to that code right now, but it's also not that hard to write it yourself.
Whether a path is absolute or not doesn't really matter for the implementation of this normalization function, just watch out for the leading separator and you're good.
I'm not too worried about OS dependence. Both Windows and Linux PHP understand / so for the sake of simplicity I'm just always using that - but I guess it doesn't really matter what separator you use.
To answer your question: path concatenation can be very easy if you just always use / and assume that a directory has no trailing separator. 'no trailing separator' seems like a good assumption because functions like dirname remove the trailing separator.
Then it's always safe to do: $dir . "/" . $file.
And even if the result path is /home/uploads/../uploads//my_uploads/myfile.ext it's still going to work fine.
Normalization becomes useful when you need to store the path somewhere. And because you have this normalization function you can make these assumptions.
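The convention described above (always use /, directories carry no trailing separator) can be sketched as a small helper. This is only a sketch under those assumptions: join_paths is a hypothetical name, not a PHP built-in, and it only joins segments and collapses separators. Resolving .. is left to a separate normalization step.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): joins path segments with exactly
 * one "/" between them. Backslashes are accepted as input separators but
 * never returned. The result is absolute only if the first non-empty
 * segment starts with a separator; later leading separators are ignored.
 */
function join_paths(...$segments) {
    $parts = array();
    $absolute = false;
    $first = true;
    foreach ($segments as $segment) {
        $segment = str_replace('\\', '/', (string) $segment);
        if ($segment === '') {
            continue; // empty segments contribute nothing
        }
        if ($first) {
            $absolute = ($segment[0] === '/');
            $first = false;
        }
        foreach (explode('/', $segment) as $part) {
            if ($part !== '') { // drops duplicate and trailing separators
                $parts[] = $part;
            }
        }
    }
    return ($absolute ? '/' : '') . implode('/', $parts);
}

echo join_paths('/usr/home/username/www', 'domainname', 'images/thumbnails'), "\n";
```

Because the helper never returns a trailing separator, its output can be fed back in as the directory part of the next join.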
An additional useful function is a function to make relative paths.
/files/uploads
/files/uploads/my_uploads/myfile.ext
It can be useful to derive from those two paths, what the relative path to the file is.
realpath
I've found realpath to be extremely performance heavy. It's not so bad if you call it once, but if you call it in a loop somewhere you take a pretty big hit. Keep in mind that each realpath call is a call to the filesystem as well. Also, it simply returns false if you pass in something silly; I'd rather have it throw an Exception.
To me the realpath function is a good example of a BAD function because it does two things: 1. it normalizes the path and 2. it checks whether the path exists. Both are useful, of course, but they should be separate functions. It also doesn't distinguish between files and directories. For Windows this typically isn't a problem, but for Linux it can be.
And I think there is some quirkiness when using realpath("") on Windows. I think it returns \\, which can be profoundly unacceptable.
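If the silent false is the main annoyance, a thin wrapper is one way around it. This is a minimal sketch: realpath_or_fail is a made-up name, and it still hits the filesystem on every call, so it does nothing about the performance concern.

```php
<?php
/**
 * Hypothetical wrapper (not part of PHP): returns what realpath() returns,
 * but raises an exception instead of returning false when the path
 * cannot be resolved.
 */
function realpath_or_fail($path) {
    $resolved = realpath($path);
    if ($resolved === false) {
        throw new RuntimeException("Cannot resolve path: `" . $path . "`");
    }
    return $resolved;
}
```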
/**
 * This function is a proper replacement for realpath
 * It will _only_ normalize the path and resolve indirections (.. and .)
 * Normalization includes:
 * - directory separator is always /
 * - there is never a trailing directory separator
 * @param $path
 * @return String
 */
function normalize_path($path) {
    $parts = preg_split(":[\\\/]:", $path); // split on known directory separators
    // resolve relative paths
    for ($i = 0; $i < count($parts); $i += 1) {
        if ($parts[$i] === "..") { // resolve ..
            if ($i === 0) {
                throw new Exception("Cannot resolve path, path seems invalid: `" . $path . "`");
            }
            unset($parts[$i - 1]);
            unset($parts[$i]);
            $parts = array_values($parts);
            $i -= 2;
        } else if ($parts[$i] === ".") { // resolve .
            unset($parts[$i]);
            $parts = array_values($parts);
            $i -= 1;
        }
        if ($i > 0 && $parts[$i] === "") { // remove empty parts
            unset($parts[$i]);
            $parts = array_values($parts);
            $i -= 1; // re-check this index, which now holds the next part
        }
    }
    return implode("/", $parts);
}
/**
 * Removes base path from longer path. The resulting path will never contain a leading directory separator
 * Base path must occur in longer path
 * Paths will be normalized
 * @throws Exception
 * @param $base_path
 * @param $longer_path
 * @return string normalized relative path
 */
function make_relative_path($base_path, $longer_path) {
    $base_path = normalize_path($base_path);
    $longer_path = normalize_path($longer_path);
    if (0 !== strpos($longer_path, $base_path)) {
        throw new Exception("Can not make relative path, base path does not occur at 0 in longer path: `" . $base_path . "`, `" . $longer_path . "`");
    }
    return substr($longer_path, strlen($base_path) + 1);
}

If the path actually exists, you can use realpath to expand it.
echo realpath("/home/../home/dogbert");
/home/dogbert

Another problem with realpath is that it doesn't work with URLs, and yet the logic of concatenation is essentially the same (there should be exactly one slash between two joined components). Obviously the protocol portion at the beginning with two slashes is an exception to this rule. But still, a function that joined the pieces of a URL together would be really nice.
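A rough sketch of such a URL joiner follows. join_url is a hypothetical name, and the sketch assumes the first argument is the scheme-and-host part (it is left alone apart from trailing slashes) while the remaining arguments are plain path pieces without a query string or fragment.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): joins URL pieces so that exactly
 * one "/" separates them. The "//" after the scheme in $base is untouched
 * because only the path pieces are cleaned up.
 */
function join_url($base, ...$pieces) {
    $url = rtrim($base, '/');
    foreach ($pieces as $piece) {
        // collapse repeated slashes inside the piece, then strip outer ones
        $piece = trim(preg_replace('#/+#', '/', $piece), '/');
        if ($piece !== '') {
            $url .= '/' . $piece;
        }
    }
    return $url;
}

echo join_url('http://exampledomain.com/', '/images/', 'thumbnails'), "\n";
```

Note that this sketch always drops a trailing slash, which may or may not be what you want for directory-style URLs.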

Related

PHP: The fastest way to check if the directory is empty?

On Stack Overflow there are several answers to the question of how to check if the directory is empty, but which is the fastest, which way is the most effective?
Answer 1: https://stackoverflow.com/a/7497848/4437206
function dir_is_empty($dir) {
    $handle = opendir($dir);
    while (false !== ($entry = readdir($handle))) {
        if ($entry != "." && $entry != "..") {
            closedir($handle); // <= I added this
            return FALSE;
        }
    }
    closedir($handle); // <= I added this
    return TRUE;
}
Answer 2: https://stackoverflow.com/a/18856880/4437206
$isDirEmpty = !(new \FilesystemIterator($dir))->valid();
Answer 3: https://stackoverflow.com/a/19243116/4437206
$dir = 'directory'; // dir path assign here
echo (count(glob("$dir/*")) === 0) ? 'Empty' : 'Not empty';
Or, there is a completely different way which is faster and more effective than these three above?
As for the Answer 1, please note that I added closedir($handle);, but I'm not sure if that's necessary (?).
EDIT: Initially I added closedir($dir); instead of closedir($handle);, but I corrected that as @duskwuff stated in his answer.
The opendir()/readdir() and FilesystemIterator approaches are both conceptually equivalent, and perform identical system calls (as tested on PHP 7.2 running under Linux). There's no fundamental reason why either one would be faster than the other, so I would recommend that you run benchmarks if you need to microoptimize.
The approach using glob() will perform worse. glob() returns an array of all the filenames in the directory; constructing that array can take some time. If there are many files in the directory, it will perform much worse, as it must iterate through the entire contents of the directory.
Using glob() will also give incorrect results in a number of situations:
If $dir is a directory name which contains certain special characters, including *, ?, and [ or ]
If $dir contains only dotfiles (i.e, filenames starting with .)
You wrote: "As for the Answer 1, please note that I added closedir($dir);, but I'm not sure if that's necessary (?)."
It's a good idea, but you've implemented it incorrectly. The directory handle that needs to be closed is $handle, not $dir.
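The dotfile case is easy to reproduce. The snippet below is only a demonstration under assumed conditions: it creates a scratch directory under the system temp dir (the directory name is made up), puts a single dotfile in it, and compares the FilesystemIterator check from Answer 2 with the glob() check from Answer 3.

```php
<?php
// Scratch directory that contains nothing but a dotfile.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'dir_empty_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . '.hidden', 'x');

// FilesystemIterator skips "." and ".." by default but still sees dotfiles.
$emptyByIterator = !(new FilesystemIterator($dir))->valid();

// The * wildcard in glob() does not match names that start with a dot.
// glob() may return false on error, so guard before counting.
$matches = glob("$dir/*");
$emptyByGlob = ($matches === false || count($matches) === 0);

var_dump($emptyByIterator); // false: the directory is not empty
var_dump($emptyByGlob);     // true: glob() wrongly reports it as empty

// Clean up the scratch directory.
unlink($dir . DIRECTORY_SEPARATOR . '.hidden');
rmdir($dir);
```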

How to compute the smallest common denominator of a set of paths and symlinks in PHP?

I get an array of paths (combined from default and user settings) and need to perform a recursive search for some data files which can be hidden between tens of thousands of files in any of these paths.
I do the recursive search with a RecursiveDirectoryIterator but it is quite slow and the suggested alternative exec("find") is even slower. To save time, I/O and processing power I'd like to do some preprocessing beforehand to avoid searching directory trees multiple times and compute the smallest common denominator of the given paths. I would appreciate any advice on how to do this.
The catch is that any of the given paths might not only be ancestors of others or just symlinked into each other but might be given as either realpaths or paths to a symlink. At least one may assume that there won't be any circling symlinks (although a check wouldn't be bad).
I need to implement this in PHP and I sketched out the following code, which doesn't cover all cases yet.
// make all given paths absolute and resolve symlinks
$search_paths = array_map( function($path) {
    return realpath( $path ) ?: $path;
}, $search_paths );
// remove all double entries
$search_paths = array_unique( $search_paths );
// sort by length of path, shortest first
usort($search_paths, function($a, $b) {
    return strlen($a) - strlen($b);
});
// iterate over all paths but the last
$count = count( $search_paths );
for ( $i = 0; $i < $count - 1; $i++ ) {
    if ( ! isset( $search_paths[$i] ) ) {
        continue; // already removed as a child of a shorter path
    }
    // iterate over all paths following the current
    for ( $j = $i + 1; $j < $count; $j++ ) {
        if ( isset( $search_paths[$j] ) && strpos ( $search_paths[$j], $search_paths[$i] ) === 0 ) {
            // longer path starts with shorter one, thus it's a child. Nuke it!
            unset( $search_paths[$j] );
        }
    }
}
Where this code falls short:
Imagine these paths in $search_paths
/e/f
/a/b/c/d
/e/f/g/d
with /e/f/g/d being a symlink to /a/b/c/d.
The code above would leave these two:
/e/f
/a/b/c/d
but searching /e/f would actually be sufficient as it covers /a/b/c/d via the symlink /e/f/g/d. This might sound like an edge case but is actually quite likely in my situation.
Tricky, eh?
I'm pretty sure I'm not the only one with this problem but I couldn't find a solution using google. Maybe I just don't get the right wording to the problem.
Thanks for reading this far! :)

Handle $_GET safely PHP

I have a code like this:
$myvar=$_GET['var'];
// a bunch of code without any connection to DB where $myvar is used like this:
$local_directory=dirname(__FILE__).'/images/'.$myvar;
if ($myvar && $handle = opendir($local_directory)) {
    $i=0;
    while (false !== ($entry = readdir($handle))) {
        if(strstr($entry, 'sample_'.$language.'-'.$type)) {
            $result[$i]=$entry;
            $i++;
        }
    }
    closedir($handle);
} else {
    echo 'error';
}
I'm a little confused with a number of stripping and escaping functions, so the question is, what do i need to do with $myvar for this code to be safe? In my case i don't make any database connections.
You are trying to prevent directory traversal attacks, so you don't want the person putting in ./../../../ or something, hoping to read out files or filenames, depending on what you are doing.
I often use something like this:
$myvar = preg_replace("/[^a-zA-Z0-9-]/","",$_GET['var']);
This replaces anything that isn't a-zA-Z0-9- with an empty string, so if the variable contains, say, *, this code would delete that.
I then change the a-zA-Z0-9- to match which characters I want to be allowed in the string. I can then lock it down to only containing numbers or whatever I need.
It's really, really dangerous to do something like: opendir($local_directory) where $local_directory is a value which could come from the outside.
What if someone passes in something like ../../../../../../../../../etc ...or something like that? You risk compromising the security of your host.
You can take a glance here, to start:
http://php.net/manual/en/book.filter.php
IMHO, if you don't create anything on the fly, you should have something like:
$allowed_dirs = array('dir1','dir2', 'dir3');
if (!in_array($myvar, $allowed_dirs)) {
// throw an error and log what has happened
}
You can do this right after you receive your input from "outside". If it's impractical for you to do this because the number of image dirs can vary with time and you're afraid of missing the sync with your codebase, you could also populate the array of valid values making a scan of subdirectories you have into the image folders first.
So, at the end, you could have something like:
$allowed_dirs = array();
if ($handle = opendir(dirname(__FILE__) . '/images')) {
    while (false !== ($entry = readdir($handle))) {
        // never allow the special entries "." and ".."
        if ($entry !== '.' && $entry !== '..') {
            $allowed_dirs[] = $entry;
        }
    }
    closedir($handle);
}
$myvar=$_GET['var'];
// you can deny access to dirs you want to protect like this
// (entries are stored as values, so remove them by value, not by key)
$allowed_dirs = array_diff($allowed_dirs, array('private_stuff'));
// rest of code
$local_directory = dirname(__FILE__) . "/images/.$myvar";
if (in_array(".$myvar", $allowed_dirs) && $handle = opendir($local_directory)) {
    $i=0;
    while (false !== ($entry = readdir($handle))) {
        if(strstr($entry, 'sample_'.$language.'-'.$type)) {
            $result[$i]=$entry;
            $i++;
        }
    }
    closedir($handle);
} else {
    echo 'error';
}
Code above is NOT optimized. But let's avoid premature optimization in this case (stating this to avoid another "nice" downvote); snippet is just to get you the idea of explicitly allowing values VS alternate approach of allowing everything unless matching a certain pattern. I think the former is more secure.
Let me just note for completeness that, if you can be sure your code will only be run on Unixish systems (such as Linux), the only things you need to ensure are that:
$myvar does not contain any slash ("/", U+002F) or null ("\0", U+0000) characters, and that
$myvar is not empty or equal to "." (or, equivalently, that ".$myvar" is not equal to "." or "..").
That's because, on a Unix filesystem, the only directory separator character (and one of the two characters not allowed in filenames, the other being the null character "\0") is the slash, and the only special directory entries pointing upwards in the directory tree are "." and "..".
However, if your code might someday be run on Windows, then you'll need to disallow more characters (at least the backslash, "\\", and probably others too). I'm not familiar enough with Windows filesystem conventions to say exactly which characters you'd need to disallow there, but the safe approach is to do as Rich Bradshaw suggests and only allow characters that you know are safe.
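The Unix-only conditions above are small enough to write down as a check. This is a sketch, not a complete validator: is_safe_unix_dir_name is a hypothetical name, it validates a single directory name used as-is (without the "." prefix from the earlier example), and it deliberately does nothing about Windows.

```php
<?php
/**
 * Hypothetical check (not part of PHP): true if $name can be used as a
 * single directory name on a Unix-like filesystem without leaving the
 * parent directory.
 */
function is_safe_unix_dir_name($name) {
    // reject non-strings, the empty string and the special entries
    if (!is_string($name) || $name === '' || $name === '.' || $name === '..') {
        return false;
    }
    // reject the directory separator and the NUL byte
    return strpos($name, '/') === false && strpos($name, "\0") === false;
}

var_dump(is_safe_unix_dir_name('thumbnails')); // bool(true)
var_dump(is_safe_unix_dir_name('../etc'));     // bool(false)
```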
As with every data that comes from an untrusted source: Validate it before use and encode it properly when passing it to another context.
As for the former, you first need to specify what properties the data must have to be considered valid. This primarily depends on the purpose of its use.
In your case, the value of $myvar should probably be at least a valid directory name but it could also be a valid relative path composed of directory names, depending on your requirements. At this point, you are supposed to specify these requirements.

What kind of special characters can glob pattern handle?

So far I have only worked with *, but is there something like lookaheads or groups?
I would like to get all *.php except controller.php.
What do I have to alter in this glob(dirname(__FILE__) . DIRECTORY_SEPARATOR . '*.php') call to exclude controller.php?
Or should I avoid glob and work with something else instead?
php glob() uses the rules used by the libc glob() function, which is similar to the rules used by common shells. So the patterns that you are allowed to use are rather limited.
glob() returns an array of all the paths that match the given pattern. Filtering controller.php out the result array is one solution.
As per http://www.manpagez.com/man/3/glob/: (the backend behind php's glob()) The glob() function is a pathname generator that implements the rules for file name pattern matching used by the shell.
It is a single filter, no exceptions. If you want *.php, you'll get *.php.
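Filtering the result array could look roughly like this. It is a sketch under assumptions: php_files_except is a made-up helper name, and names are compared via basename() because glob() returns each match with the directory prefix that was in the pattern.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): lists the *.php files in $dir,
 * leaving out the file names given in $excluded.
 */
function php_files_except($dir, array $excluded) {
    $files = glob($dir . DIRECTORY_SEPARATOR . '*.php');
    if ($files === false) {
        return array(); // glob() signals an error with false
    }
    $kept = array_filter($files, function ($path) use ($excluded) {
        return !in_array(basename($path), $excluded, true);
    });
    return array_values($kept); // reindex after filtering
}

print_r(php_files_except(dirname(__FILE__), array('controller.php')));
```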
Try this,
<?php
$availableFiles = glob("*.php");
foreach ($availableFiles as $key => $filename) {
    if (basename($filename) == "controller.php") {
        unset($availableFiles[$key]);
    }
}
echo "<pre>"; print_r($availableFiles);
?>

How can I split a string if it is a MS-DOS type path OR Unix type path?

I have a string that represents a path to a directory. I want to split the string whether it is a Unix type path or an MS-DOS type path.
How can this be done?
For example:
<?php
$a = some_path1/some_path2/some_path3; // unix type path
$b = some_path1\some_path2\some_path3; // MS-DOS type path
$foo = preg_split("something", $a); // what regex can be used here?
// the above should work with $a OR $b
?>
Your regex would be
preg_split('_[\\\\/]_', $a);
The backslashes are escaped once for the string and again for the regular expression engine.
Edit:
If you know what type of path you've got (for example you know it's of the type of the current OS) you can do:
preg_split('_' . preg_quote($pathSep, '_') . '_', $a)
You could use the constant DIRECTORY_SEPARATOR in place of $pathSep depending on your needs.
This would solve the problem pointed out by @Alan Storm
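For the "either separator" case, the same character class can be wrapped in a small function. This is a sketch: split_path is a hypothetical name, and the + quantifier together with PREG_SPLIT_NO_EMPTY is an addition that drops the empty segments produced by leading, trailing or doubled separators. Keep in mind that on Unix-like systems a backslash can legally appear inside a file name, so splitting on both characters is only safe when you know your paths don't use it that way.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): splits a path on "/" or "\"
 * and drops empty segments.
 */
function split_path($path) {
    // '\\\\' in a single-quoted string is two backslashes, which the
    // regex engine reads as one literal backslash inside the class
    return preg_split('_[\\\\/]+_', $path, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_path('some_path1/some_path2/some_path3'));
print_r(split_path('some_path1\\some_path2\\some_path3'));
```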
If you are trying to write portable code (Linux, Windows, Mac OS, etc) and if the separator is always the same as the separator of the server, you can do the following:
<?php
$url = 'what/ever/';
$urlExploded = explode(DIRECTORY_SEPARATOR, $url);
?>
http://www.php.net/manual/en/dir.constants.php
Use a pipe operator to match either a forward slash or a backslash:
$foo = preg_split("/\\\|\//", $a);
In this example, echo $foo[0] of $a will output some_path1 and echo $foo[0] of $b will also output some_path1.
EDIT: You should put quotes around the values that you set $a and $b to. Otherwise, your code will trigger an error.
First, it seems to me that preg_split is overkill here; just use explode with the path separator (which is faster, according to the PHP docs: http://us2.php.net/manual/en/function.preg-split.php ).
Maybe PHP has a way to find the path separator, similar to Python's os.sep, but I couldn't find it. However, you could do something like:
$os = PHP_OS;
if (stristr($os, "WIN")) { $sep = '\\'; } else { $sep = '/'; }
$pathFields = explode($sep, $pathToFile);
Note that this assumes you're only going to run this on Windows or a Unix-like system. If you might run it on some other OS with a different path separator, you'll need to do extra checks to be sure you're on a *nix system.
