I have a few strings to combine to build a full path. e.g.
$base = "http://foo.com";
$subfolder = "product/data";
$filename = "foo.xml";
// How to do this?
$url = append_url_parts($base, $subfolder, $filename); ???
String concatenation won't do, that would omit the necessary forward slashes.
In Win32 I'd use PathCombine() or PathAppend(), which would handle adding any necessary slashes between strings, without doubling them up. In PHP, what should I use?
Try this:
$base = "http://foo.com";
$subfolder = "product/data";
$filename = "foo.xml";
function stripTrailingSlash(&$component) {
$component = rtrim($component, '/');
}
$array = array($base, $subfolder, $filename);
array_walk_recursive($array, 'stripTrailingSlash');
$url = implode('/', $array);
When it comes down to something like this, I like to use a dedicated function that accepts any number of parameters.
define('BASE_URL','http://mysite.com'); //Without last slash
function build_url()
{
return BASE_URL . '/' . implode('/', func_get_args());
}
OR
function build_url()
{
$Path = BASE_URL;
foreach(func_get_args() as $path_part)
{
$Path .= '/' . $path_part;
}
return $Path;
}
So that when I use the function I can do
echo build_url('home'); //http://mysite.com/home
echo build_url('public','css','style.css'); //http://mysite.com/public/css/style.css
echo build_url('index.php'); //http://mysite.com/index.php
Hope this helps; it works really well for me, especially within a framework environment.
To use it with query parameters, you can simply append them to the URL like so:
echo build_url('home') . '?' . http_build_query(array('hello' => 'world'));
Would produce: http://mysite.com/home?hello=world
Not sure why you say string concatenation won't do, because something like this is basically string concatenation (untested, semi-pseudocode):
function append_url_parts($base, $subf, $file) {
$url = sprintf("%s%s%s", $base, (($subf)? "/$subf": ""), (($file)? "/$file": ""));
return $url;
}
With plain string concatenation, we'd have to write a slightly longer block like so:
function append_url_parts($base, $subf, $file) {
$subf = ($subf)? "/$subf": "";
$file = ($file)? "/$file": "";
$url = "$base$subf$file";
return $url;
}
I usually go simple:
<?php
$url = implode('/', array($base, $subfolder, $filename));
Either that or use a framework, and then use whatever route system it has.
There are a few considerations first.
Are you interested in getting the current path of the script or some other path?
How flexible do you need this to be? Is it something that is going to change all the time? Is it something an admin will set once and forget?
Be careful to avoid the trailing-slash bug, where the final URL ends with a slash because the directory parts were not kept separate from the file part. There will only be one file and one base per URL and an unknown number of directories in each path, right? :)
If you want to make sure there are no duplicate slashes within the resulting path, I like this little function. Simply pass it an array of the path parts you want combined and it will return a formatted path, so there is no need to worry whether any of the parts already end with a slash:
function build_url($arr)
{
$url = array();
foreach ( $arr as $path ) $url[] = rtrim ( $path, '/' );
return implode( '/', $url );
}
This should work on all versions of PHP too.
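Note that rtrim() alone leaves leading slashes on later parts in place, so array('a/', '/b') still yields 'a//b'. A minimal sketch of a variant that trims both sides of every part after the base follows; the name join_url_parts is hypothetical and not taken from any of the answers here, and it assumes PHP 7 or later for the type declarations:

```php
<?php
// Hypothetical helper: joins a base URL and any number of parts with single slashes.
function join_url_parts(string $base, string ...$parts): string
{
    // Only strip the right side of the base so that "http://" stays intact.
    $url = rtrim($base, '/');
    foreach ($parts as $part) {
        // Strip slashes on both sides of each part and skip empty parts.
        $part = trim($part, '/');
        if ($part !== '') {
            $url .= '/' . $part;
        }
    }
    return $url;
}

echo join_url_parts('http://foo.com/', '/product/data/', 'foo.xml'), PHP_EOL;
// http://foo.com/product/data/foo.xml
```

Because the base is only trimmed on the right, the double slash after the scheme is never touched.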
Not my code, but a handy function which takes an absolute URL and a relative URL and combines the two to make a new absolute path.
The function has been modified to return the relative argument unchanged when it is already an absolute URL (basically anything that includes a scheme).
$url = "http://www.goat.com/money/dave.html";
$rel = "../images/cheese.jpg";
$com = InternetCombineURL($url,$rel);
function InternetCombineUrl($absolute, $relative) {
$p = parse_url($relative);
if(isset($p["scheme"]))return $relative;
extract(parse_url($absolute));
$path = dirname($path);
if($relative[0] == '/') {
$cparts = array_filter(explode("/", $relative));
}
else {
$aparts = array_filter(explode("/", $path));
$rparts = array_filter(explode("/", $relative));
$cparts = array_merge($aparts, $rparts);
foreach($cparts as $i => $part) {
if($part == '.') {
$cparts[$i] = null;
}
if($part == '..') {
$cparts[$i - 1] = null;
$cparts[$i] = null;
}
}
$cparts = array_filter($cparts);
}
$path = implode("/", $cparts);
$url = "";
if($scheme) {
$url = "$scheme://";
}
if(isset($user)) {
$url .= "$user";
if($pass) {
$url .= ":$pass";
}
$url .= "@";
}
if($host) {
$url .= "$host/";
}
$url .= $path;
return $url;
}
I wrote this function for all cases to combine url parts with no duplicate slashes.
It accepts many arguments or an array of parts.
Some parts may be empty strings, that does not produce double slashes.
It keeps starting and ending slashes if they are present.
function implodePath($parts)
{
if (!is_array($parts)) {
$parts = func_get_args();
if (count($parts) < 2) {
throw new \RuntimeException('implodePath() should take array as a single argument or more than one argument');
}
} elseif (count($parts) == 0) {
return '';
} elseif (count($parts) == 1) {
return $parts[0];
}
$resParts = [];
$first = array_shift($parts);
if ($first === '/') {
$resParts[] = ''; // It will keep one starting slash
} else {
// It may be empty or have some letters
$first = rtrim($first, '/');
if ($first !== '') {
$resParts[] = $first;
}
}
$last = array_pop($parts);
foreach ($parts as $part) {
$part = trim($part, '/');
if ($part !== '') {
$resParts[] = $part;
}
}
if ($last === '/') {
$resParts[] = ''; // To keep trailing slash
} else {
$last = ltrim($last, '/');
if ($last !== '') {
$resParts[] = $last; // Adding last part if not empty
}
}
return implode('/', $resParts);
}
Here is a checklist from the unit test. The left array is the input and the right string is the expected result.
[['/www/', '/eee/'], '/www/eee/'],
[['/www', 'eee/'], '/www/eee/'],
[['www', 'eee'], 'www/eee'],
[['www', ''], 'www'],
[['www', '/'], 'www/'],
[['/www/', '/aaa/', '/eee/'], '/www/aaa/eee/'],
[['/www', 'aaa/', '/eee/'], '/www/aaa/eee/'],
[['/www/', '/aaa/', 'eee/'], '/www/aaa/eee/'],
[['/www', 'aaa', 'eee/'], '/www/aaa/eee/'],
[['/www/', '/aaa/'], '/www/aaa/'],
[['/www', 'aaa/'], '/www/aaa/'],
[['/www/', 'aaa/'], '/www/aaa/'],
[['/www', '/aaa/'], '/www/aaa/'],
[['/www', '', 'eee/'], '/www/eee/'],
[['www/', '/aaa/', '/eee'], 'www/aaa/eee'],
[['/www/', '/aaa', ''], '/www/aaa'],
[['', 'aaa/', '/eee/'], 'aaa/eee/'],
[['', '', ''], ''],
[['aaa', '', '/'], 'aaa/'],
[['aaa', '/', '/'], 'aaa/'],
[['/', 'www', '/'], '/www/'],
It can be used as implodePath('aaa', 'bbb') or implodePath(['aaa', 'bbb'])
I want to convert ../ into full paths. For example, I have the following URLs in the CSS at https://example.com/folder1/folder2/style.css:
img/example1.png
/img/example2.png
../img/example3.png
../../img/example4.png
https://example.com/folder1/folder2/example5.png
I want to convert them into full paths, like this for the examples above:
https://example.com/folder1/folder2/img/example1.png
https://example.com/folder1/folder2/img/example2.png
https://example.com/folder1/img/example3.png
https://example.com/img/example4.png
https://example.com/folder1/folder2/example5.png
I tried something like below
$domain = "https://example.com";
function convertPath($str)
{
global $domain;
if(substr( $str, 0, 4 ) == "http")
{
return $str;
}
if(substr( $str, 0, 1 ) == "/")
{
return $domain.$str;
}
}
I know I am overcomplicating it; there must be an easier way to do this kind of operation. Please guide me. Thank you.
A simple idea:
build an array of folders with the url
when the folder (of the path) is .., pop the last item of the array
when it is ., do nothing
For other folders, push them.
Then you only have to join the folder array with / and to prepend the scheme and the domain.
$url = 'https://example.com/folder1/folder2/style.css';
$paths = [ 'img/example1.png',
'/img/example2.png',
'../img/example3.png',
'../../img/example4.png',
'https://example.com/folder1/folder2/example5.png' ];
$folders = explode('/', trim(parse_url($url, PHP_URL_PATH), '/'));
array_pop($folders);
$prefix = explode('/' . $folders[0] . '/', $url)[0]; // need to be improved using parse_url to re-build
// properly the url with the correct syntax for each scheme.
function getURLFromPath($path, $prefix, $folders) {
if ( parse_url($path, PHP_URL_SCHEME) )
return $path;
foreach (explode('/', ltrim($path, '/')) as $item) {
if ( $item === '..' ) {
array_pop($folders);
} elseif ( $item === '.' ) {
} else {
$folders[] = $item;
}
}
return $prefix . '/' . implode('/', $folders);
}
foreach ($paths as $path) {
echo getURLFromPath($path, $prefix, $folders), PHP_EOL;
}
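The $prefix line above is flagged in its own comment as needing improvement. One possible approach, sketched here under the assumption that user and password components can be ignored, rebuilds the scheme-and-host prefix with parse_url(). The name url_origin is hypothetical:

```php
<?php
// Hypothetical helper: returns "scheme://host[:port]" for an absolute URL, or null otherwise.
function url_origin(string $url): ?string
{
    $p = parse_url($url);
    // parse_url() returns false for seriously malformed URLs;
    // relative paths simply have no scheme or host.
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return null;
    }
    $origin = $p['scheme'] . '://' . $p['host'];
    if (isset($p['port'])) {
        $origin .= ':' . $p['port'];
    }
    return $origin;
}

echo url_origin('https://example.com/folder1/folder2/style.css'), PHP_EOL;
// https://example.com
```

The result could then be passed as $prefix to getURLFromPath() instead of splitting the URL on the first folder name.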
Before I ask my question, I want to say that I have tried every related post I found on Stack Overflow, such as PHP - Convert File system path to URL, and nothing worked for me, meaning I did not get the URL I was looking for.
I need a
function PathToUrl($path)
{
...
}
which returns the actual URL of $path.
Example usage: echo PathToUrl('../songs'); should output http://www.mywebsite.com/files/morefiles/songs.
Note: The problem with the functions I found on Stack Overflow is that they don't work with paths that contain ../. For example, with echo PathToUrl('../songs'); I would get something like http://www.mywebsite.com/files/morefiles/../songs, which is not what I am looking for.
Here you go, I made this function and it works perfectly:
function getPath($path)
{
$url = "http".(!empty($_SERVER['HTTPS'])?"s":"").
"://".$_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
$dirs = explode('/', trim(preg_replace('/\/+/', '/', $path), '/'));
foreach ($dirs as $key => $value)
if (empty($value)) unset($dirs[$key]);
$parsedUrl = parse_url($url);
$pathUrl = explode('/', trim($parsedUrl['path'], '/'));
foreach ($pathUrl as $key => $value)
if (empty($value)) unset($pathUrl[$key]);
$count = count($pathUrl);
foreach ($dirs as $key => $dir)
if ($dir === '..')
if ($count > 0)
array_pop($pathUrl);
else
throw new Exception('Wrong Path');
else if ($dir !== '.')
if (preg_match('/^(\w|\d|\.| |_|-)+$/', $dir)) {
$pathUrl[] = $dir;
++$count;
}
else
throw new Exception('Not Allowed Char');
return $parsedUrl['scheme'].'://'.$parsedUrl['host'].'/'.implode('/', $pathUrl);
}
Example:
Let's say your current location is http://www.mywebsite.com/files/morefiles/movies,
echo getPath('../songs');
OUTPUT
http://www.mywebsite.com/files/morefiles/songs
echo getPath('./songs/whatever');
OUTPUT
http://www.mywebsite.com/files/morefiles/movies/songs/whatever
echo getPath('../../../../../../');
In this case it will throw an exception.
NOTE
I use the regex '/^(\w|\d|\.| |_|-)+$/' to check each directory in the path, which means that only digits, word characters, '.', '-', '_' and ' ' are allowed. An exception will be thrown for other characters.
The path .///path will be corrected by this function.
USEFUL
explode
implode
array_pop
parse_url
trim
preg_match
preg_replace
I see an answer was already posted, however I will post my solution which allows for direct specification of the URL :
echo up('http://www.google.ca/mypage/losethisfolder/',1);
function up($url, $howmany=1) {
$p = explode('/', rtrim($url, '/'));
if($howmany < count($p)) {
$popoff = 0;
while($popoff < $howmany) {
array_pop($p);
$popoff++;
}
}
return rtrim(implode('/', $p), '/') . '/';
}
Following is a much more elegant solution for virtual paths :
$path = '/foo/../../bar/../something/../results/in/skipthis/../this/';
echo virtualpath($path);
function virtualpath($path = null) {
$loopcount = 0;
$vpath = $path;
if(is_null($vpath)) { $vpath = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); }
$vpath = rtrim($vpath,'/') . '/';
while(strpos($vpath,'../') !== false ) {
if(substr($vpath,0,4) == '/../') { $vpath = '/'. substr($vpath,4); }
$vpath = preg_replace('/(\/[\w\d\s\ \-\.]+?\/\.\.\/)/', '/', $vpath );
}
return $vpath;
}
I would like to normalize a path from an external resource to prevent directory traversal attacks. I know about the realpath() function, but sadly this function returns only the path of existing directories. So if the directory doesn't exist (yet) the realpath() function cuts off the whole part of the path which doesn't exist.
So my Question is: Do you know a PHP function which only normalizes the path?
PS: I also don't want to create all possible directories in advance ;-)
There's no built-in PHP function for this. Use something like the following instead:
function removeDots($path) {
$root = ($path[0] === '/') ? '/' : '';
$segments = explode('/', trim($path, '/'));
$ret = array();
foreach($segments as $segment){
if (($segment == '.') || strlen($segment) === 0) {
continue;
}
if ($segment == '..') {
array_pop($ret);
} else {
array_push($ret, $segment);
}
}
return $root . implode('/', $ret);
}
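One thing to keep in mind for the directory traversal use case: removeDots() silently drops any '..' that would climb above the start of the path, so '../../etc/passwd' becomes 'etc/passwd' rather than being rejected. A minimal sketch of a stricter variant that refuses such input and keeps the result under a base directory is shown below. The function names and the /var/uploads base are assumptions for illustration, and the nullable return types assume PHP 7.1 or later:

```php
<?php
// Hypothetical helper: splits a path into clean segments,
// or returns null if ".." would climb above the starting point.
function normalize_segments(string $path): ?array
{
    $stack = [];
    foreach (explode('/', str_replace('\\', '/', $path)) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // empty segments and "." change nothing
        }
        if ($segment === '..') {
            if ($stack === []) {
                return null; // attempted to leave the root
            }
            array_pop($stack);
            continue;
        }
        $stack[] = $segment;
    }
    return $stack;
}

// Hypothetical helper: resolves user input relative to $baseDir without touching the filesystem.
function resolve_inside(string $baseDir, string $userPath): ?string
{
    $segments = normalize_segments($userPath);
    if ($segments === null) {
        return null;
    }
    return rtrim($baseDir, '/') . '/' . implode('/', $segments);
}

var_dump(resolve_inside('/var/uploads', 'a/../b/new.txt'));   // string(22) "/var/uploads/b/new.txt"
var_dump(resolve_inside('/var/uploads', '../../etc/passwd')); // NULL
```

In this sketch a leading slash in the user input is ignored, so the input is always treated as relative to the base directory.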
Thanks to Benubird and Cragmonkey for pointing out that my previous answer didn't work in some situations.
So I wrote a new one with the original goals in mind: good performance, fewer lines, and pure regular expressions.
This time I tested it against a much stricter test case, shown below.
$path = '/var/.////./user/./././..//.//../////../././.././test/////';
function normalizePath($path) {
$patterns = array('~/{2,}~', '~/(\./)+~', '~([^/\.]+/(?R)*\.{2,}/)~', '~\.\./~');
$replacements = array('/', '/', '', '');
return preg_replace($patterns, $replacements, $path);
}
The correct answer would be /test/.
This isn't meant as a competition, but a performance test is a must:
Test case:
A for loop run 100k times on Windows 7, i5-3470 quad core, 3.20 GHz.
mine: 1.746 secs.
Tom Imrei: 4.548 secs.
Benubird: 3.593 secs.
Ursa: 4.334 secs.
That doesn't mean my version is always better. In several situations they perform similarly.
I think Tamas' solution will work, but it is also possible to do it with regex, which may be less efficient but looks neater. Val's solution is incorrect; but this one works.
function normalizePath($path) {
do {
$path = preg_replace(
array('#//|/\./#', '#/([^/.]+)/\.\./#'),
'/', $path, -1, $count
);
} while($count > 0);
return $path;
}
Yes, it does not handle all the possible different encodings of ./\ etc. that there can be, but that is not the purpose of it; one function should do one thing only, so if you want to also convert %2e%2e%2f into ../, run it through a separate function first.
Realpath also resolves symbolic links, which is obviously impossible if the path doesn't exist; but we can strip out the extra '/./', '/../' and '/' characters.
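As a sketch of the "separate function first" idea, percent-encoded input could be decoded with rawurldecode() before it is passed to the normalizer. The wrapper name decodeThenNormalize is hypothetical; normalizePath() is repeated from the answer above so that the snippet runs on its own. This decodes exactly once, so double-encoded input such as %252e would need its own policy:

```php
<?php
// Repeated from the answer above so the example is self-contained.
function normalizePath($path) {
    do {
        $path = preg_replace(
            array('#//|/\./#', '#/([^/.]+)/\.\./#'),
            '/', $path, -1, $count
        );
    } while ($count > 0);
    return $path;
}

// Hypothetical wrapper: decode percent-encoding once, then normalize.
function decodeThenNormalize(string $path): string
{
    // rawurldecode() turns %2e into "." and %2f into "/".
    return normalizePath(rawurldecode($path));
}

echo decodeThenNormalize('/a/b/%2e%2e%2fc/./d'), PHP_EOL;
// /a/c/d
```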
Strict, but safe implementation. If you use only ASCII for file names it would be suitable:
/**
* Normalise a file path string so that it can be checked safely.
*
* @param $path string
* The path to normalise.
* @return string
* Normalised path or FALSE, if $path cannot be normalized (invalid).
*/
function normalisePath($path) {
// Skip invalid input.
if (!isset($path)) {
return FALSE;
}
if ($path === '') {
return '';
}
// Attempt to avoid path encoding problems.
$path = preg_replace("/[^\x20-\x7E]/", '', $path);
$path = str_replace('\\', '/', $path);
// Remember path root.
$prefix = substr($path, 0, 1) === '/' ? '/' : '';
// Process path components
$stack = array();
$parts = explode('/', $path);
foreach ($parts as $part) {
if ($part === '' || $part === '.') {
// No-op: skip empty part.
} elseif ($part !== '..') {
array_push($stack, $part);
} elseif (!empty($stack)) {
array_pop($stack);
} else {
return FALSE; // Out of the root.
}
}
// Return the "clean" path
$path = $prefix . implode('/', $stack);
return $path;
}
My 2 cents. The regexp is used only for empty blocks of path:
<?php
echo path_normalize('/a/b/c/../../../d/e/file.txt');
echo path_normalize('a/b/../c');
echo path_normalize('./../../etc/passwd');
echo path_normalize('/var/user/.///////././.././.././././test/');
function path_normalize($path){
$path = str_replace('\\','/',$path);
$blocks = preg_split('#/#',$path,-1,PREG_SPLIT_NO_EMPTY);
$res = array();
// each() was removed in PHP 8, so iterate with foreach instead
foreach($blocks as $k => $block){
switch($block){
case '.':
if($k == 0)
$res = explode('/',path_normalize(getcwd()));
break;
case '..':
if(!$res) return false;
array_pop($res);
break;
default:
$res[] = $block;
break;
}
}
return implode('/',$res);
}
?>
I have a URL which can be any of the following formats:
http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar
Essentially, I need to be able to match any normal URL. How can I extract example.com (or .net, whatever the tld happens to be. I need this to work with any TLD.) from all of these via a single regex?
Well you can use parse_url to get the host:
$info = parse_url($url);
$host = $info['host'];
Then, you can do some fancy stuff to get only the TLD and the Host
$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
Not very elegant, but should work.
If you want an explanation, here it goes:
First we grab the host, which sits between the scheme (http://, etc.) and the path, by using parse_url's ability to... well... parse URLs. :)
Then we take the host name, and separate it into an array based on where the periods fall, so test.world.hello.myname would become:
array("test", "world", "hello", "myname");
After that, we take the number of elements in the array (4).
Then, we subtract 2 from it to get the second to last string (the hostname, or example, in your example)
Then, we subtract 1 from it to get the last string (because array keys start at 0), also known as the TLD
Then we combine those two parts with a period, and you have your base host name.
It is not possible to get the domain name without using a TLD list to compare against, as there exist many cases with exactly the same structure and length:
nas.db.de (Subdomain)
bbc.co.uk (Top-Level-Domain)
www.uk.com (Subdomain)
big.uk.com (Second-Level-Domain)
Mozilla's public suffix list should be the best option as it is used by all major browsers:
https://publicsuffix.org/list/public_suffix_list.dat
Feel free to use my function:
function tld_list($cache_dir=null) {
// we use "/tmp" if $cache_dir is not set
$cache_dir = isset($cache_dir) ? $cache_dir : sys_get_temp_dir();
$lock_dir = $cache_dir . '/public_suffix_list_lock/';
$list_dir = $cache_dir . '/public_suffix_list/';
// refresh list all 30 days
if (file_exists($list_dir) && @filemtime($list_dir) + 2592000 > time()) {
return $list_dir;
}
// use exclusive lock to avoid race conditions
if (!file_exists($lock_dir) && @mkdir($lock_dir)) {
// read from source
$list = @fopen('https://publicsuffix.org/list/public_suffix_list.dat', 'r');
if ($list) {
// the list is older than 30 days so delete everything first
if (file_exists($list_dir)) {
foreach (glob($list_dir . '*') as $filename) {
unlink($filename);
}
rmdir($list_dir);
}
// now set list directory with new timestamp
mkdir($list_dir);
// read line-by-line to avoid high memory usage
while ($line = fgets($list)) {
// skip comments and empty lines
if ($line[0] == '/' || !$line) {
continue;
}
// remove wildcard
if ($line[0] . $line[1] == '*.') {
$line = substr($line, 2);
}
// remove exclamation mark
if ($line[0] == '!') {
$line = substr($line, 1);
}
// reverse TLD and remove linebreak
$line = implode('.', array_reverse(explode('.', (trim($line)))));
// we split the TLD list to reduce memory usage
touch($list_dir . $line);
}
fclose($list);
}
@rmdir($lock_dir);
}
// repair locks (should never happen)
if (file_exists($lock_dir) && mt_rand(0, 100) == 0 && @filemtime($lock_dir) + 86400 < time()) {
@rmdir($lock_dir);
}
return $list_dir;
}
function get_domain($url=null) {
// obtain location of public suffix list
$tld_dir = tld_list();
// no url = our own host
$url = isset($url) ? $url : $_SERVER['SERVER_NAME'];
// add missing scheme ftp:// http:// ftps:// https://
$url = !isset($url[5]) || ($url[3] != ':' && $url[4] != ':' && $url[5] != ':') ? 'http://' . $url : $url;
// remove "/path/file.html", "/:80", etc.
$url = parse_url($url, PHP_URL_HOST);
// replace absolute domain name by relative (http://www.dns-sd.org/TrailingDotsInDomainNames.html)
$url = trim($url, '.');
// check if TLD exists
$url = explode('.', $url);
$parts = array_reverse($url);
foreach ($parts as $key => $part) {
$tld = implode('.', $parts);
if (file_exists($tld_dir . $tld)) {
return !$key ? '' : implode('.', array_slice($url, $key - 1));
}
// remove last part
array_pop($parts);
}
return '';
}
What makes it special:
it accepts any input, such as URLs, hostnames or domains, with or without a scheme
the list is downloaded row-by-row to avoid high memory usage
it creates a new file per TLD in a cache folder so get_domain() only needs to check through file_exists() if it exists so it does not need to include a huge database on every request like TLDExtract does it.
the list will be automatically updated every 30 days
Test:
$urls = array(
'http://www.example.com',// example.com
'http://subdomain.example.com',// example.com
'http://www.example.uk.com',// example.uk.com
'http://www.example.co.uk',// example.co.uk
'http://www.example.com.ac',// example.com.ac
'http://example.com.ac',// example.com.ac
'http://www.example.accident-prevention.aero',// example.accident-prevention.aero
'http://www.example.sub.ar',// sub.ar
'http://www.congresodelalengua3.ar',// congresodelalengua3.ar
'http://congresodelalengua3.ar',// congresodelalengua3.ar
'http://www.example.pvt.k12.ma.us',// example.pvt.k12.ma.us
'http://www.example.lib.wy.us',// example.lib.wy.us
'com',// empty
'.com',// empty
'http://big.uk.com',// big.uk.com
'uk.com',// empty
'www.uk.com',// www.uk.com
'.uk.com',// empty
'stackoverflow.com',// stackoverflow.com
'.foobarfoo',// empty
'',// empty
false,// empty
' ',// empty
1,// empty
'a',// empty
);
Recent version with explanations (German):
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm
My solution in https://gist.github.com/pocesar/5366899
and the tests are here http://codepad.viper-7.com/GAh1tP
It works with any TLD, and hideous subdomain patterns (up to 3 subdomains).
There's a test included with many domain names.
Won't paste the function here because of the weird indentation for code in StackOverflow (could have fenced code blocks like github)
echo getDomainOnly("http://example.com/foo/bar");
function getDomainOnly($host){
$host = strtolower(trim($host));
$host = preg_replace('#^(https?://)?(www\.)?#', '', $host); // ltrim($host, "www.") would strip any leading "w" or "." characters
$count = substr_count($host, '.');
if($count === 2){
if(strlen(explode('.', $host)[1]) > 3) $host = explode('.', $host, 2)[1];
} else if($count > 2){
$host = getDomainOnly(explode('.', $host, 2)[1]);
}
$host = explode('/',$host);
return $host[0];
}
I recommend using TLDExtract library for all operations with domain name.
I think the best way to handle this problem is:
$second_level_domains_regex = '/\.asn\.au$|\.com\.au$|\.net\.au$|\.id\.au$|\.org\.au$|\.edu\.au$|\.gov\.au$|\.csiro\.au$|\.act\.au$|\.nsw\.au$|\.nt\.au$|\.qld\.au$|\.sa\.au$|\.tas\.au$|\.vic\.au$|\.wa\.au$|\.co\.at$|\.or\.at$|\.priv\.at$|\.ac\.at$|\.avocat\.fr$|\.aeroport\.fr$|\.veterinaire\.fr$|\.co\.hu$|\.film\.hu$|\.lakas\.hu$|\.ingatlan\.hu$|\.sport\.hu$|\.hotel\.hu$|\.ac\.nz$|\.co\.nz$|\.geek\.nz$|\.gen\.nz$|\.kiwi\.nz$|\.maori\.nz$|\.net\.nz$|\.org\.nz$|\.school\.nz$|\.cri\.nz$|\.govt\.nz$|\.health\.nz$|\.iwi\.nz$|\.mil\.nz$|\.parliament\.nz$|\.ac\.za$|\.gov\.za$|\.law\.za$|\.mil\.za$|\.nom\.za$|\.school\.za$|\.net\.za$|\.co\.uk$|\.org\.uk$|\.me\.uk$|\.ltd\.uk$|\.plc\.uk$|\.net\.uk$|\.sch\.uk$|\.ac\.uk$|\.gov\.uk$|\.mod\.uk$|\.mil\.uk$|\.nhs\.uk$|\.police\.uk$/';
$domain = $_SERVER['HTTP_HOST'];
$domain = explode('.', $domain);
$domain = array_reverse($domain);
if (preg_match($second_level_domains_regex, $_SERVER['HTTP_HOST'])) {
$domain = "$domain[2].$domain[1].$domain[0]";
} else {
$domain = "$domain[1].$domain[0]";
}
$onlyHostName = implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2));
Using https://subdomain.domain.com/some/path as example
parse_url($link, PHP_URL_HOST) returns subdomain.domain.com
explode('.', parse_url($link, PHP_URL_HOST)) then breaks subdomain.domain.com into an array:
array(3) {
[0]=>
string(9) "subdomain"
[1]=>
string(6) "domain"
[2]=>
string(3) "com"
}
array_slice then slices the array so only the last 2 values are in the array (signified by the -2):
array(2) {
[0]=>
string(6) "domain"
[1]=>
string(3) "com"
}
implode then combines those two array values back together, ultimately giving you the result of domain.com
Note: this will only work when end domain you're expecting only has one . in it, like something.domain.com or else.something.domain.net
It will not work for something.domain.co.uk where you would expect domain.co.uk
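A minimal sketch of how the same idea could be extended to two-level suffixes is shown below. The suffix list here is a tiny illustrative sample and an assumption of this example; a real implementation would take its data from the Public Suffix List. The name registrable_domain is hypothetical:

```php
<?php
// Illustrative sample only, not a complete list of two-level suffixes.
const TWO_LEVEL_SUFFIXES = ['co.uk', 'org.uk', 'com.au', 'com.br'];

// Hypothetical helper: keeps the last two labels, or the last three
// when the final two labels form a known two-level suffix.
function registrable_domain(string $link): ?string
{
    $host = parse_url($link, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host found, e.g. the scheme is missing
    }
    $labels  = explode('.', strtolower($host));
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep    = in_array($lastTwo, TWO_LEVEL_SUFFIXES, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('https://subdomain.domain.com/some/path'), PHP_EOL; // domain.com
echo registrable_domain('https://something.domain.co.uk/'), PHP_EOL;        // domain.co.uk
```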
There are two ways to extract subdomain from a host:
The first method that is more accurate is to use a database of tlds (like public_suffix_list.dat) and match domain with it. This is a little heavy in some cases. There are some PHP classes for using it like php-domain-parser and TLDExtract.
The second way is not as accurate as the first one, but is very fast and it can give the correct answer in many case, I wrote this function for it:
function get_domaininfo($url) {
// regex can be replaced with parse_url
preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
$parts = explode(".", $matches[2]);
$tld = array_pop($parts);
$host = array_pop($parts);
if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
$tld = "$host.$tld";
$host = array_pop($parts);
}
return array(
'protocol' => $matches[1],
'subdomain' => implode(".", $parts),
'domain' => "$host.$tld",
'host'=>$host,'tld'=>$tld
);
}
Example:
print_r(get_domaininfo('http://mysubdomain.domain.co.uk/index.php'));
Returns:
Array
(
[protocol] => http
[subdomain] => mysubdomain
[domain] => domain.co.uk
[host] => domain
[tld] => co.uk
)
Here's a function I wrote to grab the domain without subdomain(s), regardless of whether the domain is using a ccTLD or a new style long TLD, etc... There is no lookup or huge array of known TLDs, and there's no regex. It can be a lot shorter using the ternary operator and nesting, but I expanded it for readability.
// Per Wikipedia: "All ASCII ccTLD identifiers are two letters long,
// and all two-letter top-level domains are ccTLDs."
function topDomainFromURL($url) {
$url_parts = parse_url($url);
$domain_parts = explode('.', $url_parts['host']);
if (strlen(end($domain_parts)) == 2 ) {
// ccTLD here, get last three parts
$top_domain_parts = array_slice($domain_parts, -3);
} else {
$top_domain_parts = array_slice($domain_parts, -2);
}
$top_domain = implode('.', $top_domain_parts);
return $top_domain;
}
function getDomain($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if(preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)){
return $regs['domain'];
}
return FALSE;
}
echo getDomain("http://example.com"); // outputs 'example.com'
echo getDomain("http://www.example.com"); // outputs 'example.com'
echo getDomain("http://mail.example.co.uk"); // outputs 'example.co.uk'
I had problems with the solution provided by pocesar.
When I would use for instance subdomain.domain.nl it would not return domain.nl. Instead it would return subdomain.domain.nl
Another problem was that domain.com.br would return com.br
I am not sure but i fixed these issues with the following code (i hope it will help someone, if so I am a happy man):
function get_domain($domain, $debug = false){
$original = $domain = strtolower($domain);
if (filter_var($domain, FILTER_VALIDATE_IP)) {
return $domain;
}
$debug ? print('<strong style="color:green">»</strong> Parsing: '.$original) : false;
$arr = array_slice(array_filter(explode('.', $domain, 4), function($value){
return $value !== 'www';
}), 0); //rebuild array indexes
if (count($arr) > 2){
$count = count($arr);
$_sub = explode('.', $count === 4 ? $arr[3] : $arr[2]);
$debug ? print(" (parts count: {$count})") : false;
if (count($_sub) === 2){ // two level TLD
$removed = array_shift($arr);
if ($count === 4){ // got a subdomain acting as a domain
$removed = array_shift($arr);
}
$debug ? print("<br>\n" . '[*] Two level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
}elseif (count($_sub) === 1){ // one level TLD
$removed = array_shift($arr); //remove the subdomain
if (strlen($arr[0]) === 2 && $count === 3){ // TLD domain must be 2 letters
array_unshift($arr, $removed);
}elseif(strlen($arr[0]) === 3 && $count === 3){
array_unshift($arr, $removed);
}else{
// non country TLD according to IANA
$tlds = array(
'aero',
'arpa',
'asia',
'biz',
'cat',
'com',
'coop',
'edu',
'gov',
'info',
'jobs',
'mil',
'mobi',
'museum',
'name',
'net',
'org',
'post',
'pro',
'tel',
'travel',
'xxx',
);
if (count($arr) > 2 && in_array($_sub[0], $tlds) !== false){ //special TLD don't have a country
array_shift($arr);
}
}
$debug ? print("<br>\n" .'[*] One level TLD: <strong>'.join('.', $_sub).'</strong> ') : false;
}else{ // more than 3 levels, something is wrong
for ($i = count($_sub); $i > 1; $i--){
$removed = array_shift($arr);
}
$debug ? print("<br>\n" . '[*] Three level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
}
}elseif (count($arr) === 2){
$arr0 = array_shift($arr);
if (strpos(join('.', $arr), '.') === false && in_array($arr[0], array('localhost','test','invalid')) === false){ // not a reserved domain
$debug ? print("<br>\n" .'Seems invalid domain: <strong>'.join('.', $arr).'</strong> re-adding: <strong>'.$arr0.'</strong> ') : false;
// seems invalid domain, restore it
array_unshift($arr, $arr0);
}
}
$debug ? print("<br>\n".'<strong style="color:gray">«</strong> Done parsing: <span style="color:red">' . $original . '</span> as <span style="color:blue">'. join('.', $arr) ."</span><br>\n") : false;
return join('.', $arr);
}
Here's one that works for all domains, including those with second level domains like "co.uk"
function strip_subdomains($url){
# credits to gavingmiller for maintaining this list
$second_level_domains = file_get_contents("https://raw.githubusercontent.com/gavingmiller/second-level-domains/master/SLDs.csv");
# presume sld first ...
$possible_sld = implode('.', array_slice(explode('.', $url), -2));
# and then verify it
if (strpos($second_level_domains, $possible_sld) !== false){
return implode('.', array_slice(explode('.', $url), -3));
} else {
return implode('.', array_slice(explode('.', $url), -2));
}
}
Looks like there's a duplicate question here: delete-subdomain-from-url-string-if-subdomain-is-found
Very late, I see that you marked regex as a keyword and my function works like a charm, so far I haven't found a url that fails:
function get_domain_regex($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}else{
return false;
}
}
if you want one without regex I have this one, which I am sure I also took from this post
function get_domain($url){
$parseUrl = parse_url($url);
$host = $parseUrl['host'];
$host_array = explode(".", $host);
$domain = $host_array[count($host_array)-2] . "." . $host_array[count($host_array)-1];
return $domain;
}
They both work well, BUT it took me a while to realize that if the URL doesn't start with http:// or https://, they will fail, so make sure the URL string starts with the protocol.
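The reason is that parse_url() treats a string such as example.net/foo/bar as a path only, so no 'host' key is returned. A small sketch of one way around this is to prepend a placeholder scheme when none is present. The name host_from_loose_url is hypothetical, and the choice of http:// as the placeholder is an assumption:

```php
<?php
// Hypothetical helper: extracts the host from a URL that may lack a scheme.
function host_from_loose_url(string $url): ?string
{
    $url = trim($url);
    // Without a "scheme://" prefix, parse_url() would see only a path,
    // so add a placeholder scheme first.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && $host !== '' ? $host : null;
}

echo host_from_loose_url('example.net/foo/bar'), PHP_EOL;       // example.net
echo host_from_loose_url('https://www.example.com/x'), PHP_EOL; // www.example.com
```

The returned host can then be fed to either of the two functions' label-splitting logic.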
Simply try this:
preg_match('/(www\.)?([^.]+\.[^.]+)$/', $yourHost, $matches);
echo "domain name is: {$matches[2]}\n"; // group 2 excludes an optional leading "www."
This works for the majority of domains.
This function returns the domain name without the extension for any given URL, even if you pass a URL without the http:// or https://
You can extend this code
(?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?
with more extensions if you want to handle more second level domainnames.
function get_domain_name($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : $url;
$domain = strtolower($domain);
$domain = preg_replace('/\.international$/', '.com', $domain);
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,90}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
if (preg_match('/(.*?)((?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?(?:\.asn)?.[a-z]{2,6})$/i', $regs['domain'], $matches)) {
return $matches[1];
}else return $regs['domain'];
}else{
return $url;
}
}
I'm using this to achieve the same goal and it has worked for me; I hope it helps others.
$url = 'https://use.fontawesome.com/releases/v5.11.2/css/all.css?ver=2.7.5';
$handle = pathinfo( parse_url( $url )['host'] )['filename'];
$final_handle = substr( $handle , strpos( $handle , '.' ) + 1 );
print_r($final_handle); // fontawesome
Simplest solution
@preg_replace('#\/(.)*#', '', @preg_replace('#^https?://(www.)?#', '', $url))
Simply try this:
<?php
$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
Is there a builtin function in PHP to intelligently join path strings? The function, given abc/de/ and /fg/x.php as arguments, should return abc/de/fg/x.php; the same result should be given using abc/de and fg/x.php as arguments for that function.
If not, is there an available class? It could also be valuable for splitting paths or removing parts of them. If you have written something, may you share your code here?
It is ok to always use /, I am coding for Linux only.
In Python there is os.path.join, which is great.
function join_paths() {
$paths = array();
foreach (func_get_args() as $arg) {
if ($arg !== '') { $paths[] = $arg; }
}
return preg_replace('#/+#','/',join('/', $paths));
}
My solution is simpler and more similar to the way Python's os.path.join works.
Consider these test cases:
array                  my version     @deceze      @david_miller  @mark
['','']                ''             ''           '/'            '/'
['','/']               '/'            ''           '/'            '/'
['/','a']              '/a'           'a'          '//a'          '/a'
['/','/a']             '/a'           'a'          '//a'          '//a'
['abc','def']          'abc/def'      'abc/def'    'abc/def'      'abc/def'
['abc','/def']         'abc/def'      'abc/def'    'abc/def'      'abc//def'
['/abc','def']         '/abc/def'     'abc/def'    '/abc/def'     '/abc/def'
['','foo.jpg']         'foo.jpg'      'foo.jpg'    '/foo.jpg'     '/foo.jpg'
['dir','0','a.jpg']    'dir/0/a.jpg'  'dir/a.jpg'  'dir/0/a.jpg'  'dir/0/a.txt'
Since this seems to be a popular question and the comments are filling with "features suggestions" or "bug reports"... All this code snippet does is join two strings with a slash without duplicating slashes between them. That's all. No more, no less. It does not evaluate actual paths on the hard disk nor does it actually keep the beginning slash (add that back in if needed, at least you can be sure this code always returns a string without starting slash).
join('/', array(trim("abc/de/", '/'), trim("/fg/x.php", '/')));
The end result will always be a path with no slashes at the beginning or end and no double slashes within. Feel free to make a function out of that.
EDIT:
Here's a nice flexible function wrapper for above snippet. You can pass as many path snippets as you want, either as array or separate arguments:
function joinPaths() {
$args = func_get_args();
$paths = array();
foreach ($args as $arg) {
$paths = array_merge($paths, (array)$arg);
}
// create_function() was removed in PHP 8; use an anonymous function instead
$paths = array_map(function ($p) { return trim($p, '/'); }, $paths);
$paths = array_filter($paths);
return join('/', $paths);
}
echo joinPaths(array('my/path', 'is', '/an/array'));
//or
echo joinPaths('my/paths/', '/are/', 'a/r/g/u/m/e/n/t/s/');
:o)
@deceze's function doesn't keep the leading / when trying to join a path that starts with a Unix absolute path, e.g. joinPaths('/var/www', '/vhosts/site');.
function unix_path() {
$args = func_get_args();
$paths = array();
foreach($args as $arg) {
$paths = array_merge($paths, (array)$arg);
}
foreach($paths as &$path) {
$path = trim($path, '/');
}
if (substr($args[0], 0, 1) == '/') {
$paths[0] = '/' . $paths[0];
}
return join('/', $paths);
}
My take:
function trimds($s) {
return rtrim($s,DIRECTORY_SEPARATOR);
}
function joinpaths() {
return implode(DIRECTORY_SEPARATOR, array_map('trimds', func_get_args()));
}
I'd have used an anonymous function for trimds, but older versions of PHP don't support it.
Example:
join_paths('a','\\b','/c','d/','/e/','f.jpg'); // a\b\c\d\e\f.jpg (on Windows)
Updated May 2018 (earlier revisions: April 2013, March 2014):
function join_paths(...$paths) {
return preg_replace('~[/\\\\]+~', DIRECTORY_SEPARATOR, implode(DIRECTORY_SEPARATOR, $paths));
}
This one will correct any slashes to match your OS, won't remove a leading slash, and will clean up any multiple slashes in a row.
If you know the file/directory exists, you can add extra slashes (that may be unnecessary), then call realpath, i.e.
realpath(join('/', $parts));
This is of course not quite the same thing as the Python version, but in many cases may be good enough.
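To illustrate the realpath() approach, here is a small sketch. It assumes the system temp directory exists (so there is something real to resolve); the nonexistent path name is invented purely for the example.

```php
<?php
// Assumption: sys_get_temp_dir() points at an existing directory.
$parts = [sys_get_temp_dir(), '', '.', ''];

// A naive join leaves redundant slashes and a "." segment in the string.
$joined = join('/', $parts);

// realpath() cleans those up, because the directory really exists.
$resolved = realpath($joined);

// For a path that does not exist, realpath() returns false instead of a string.
$missing = realpath(sys_get_temp_dir() . '/no-such-dir-for-this-example/file.txt');
```

Because of that false return value, this only suits paths you already know are on disk; it cannot be used to build the name of a file you are about to create.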
As a fun project, I created yet another solution. Should be universal for all operating systems.
For PHP 7.2+:
<?php
/**
* Join string into a single URL string.
*
* @param string $parts,... The parts of the URL to join.
* @return string The URL string.
*/
function join_paths(...$parts) {
if (sizeof($parts) === 0) return '';
$prefix = ($parts[0] === DIRECTORY_SEPARATOR) ? DIRECTORY_SEPARATOR : '';
$processed = array_filter(array_map(function ($part) {
return rtrim($part, DIRECTORY_SEPARATOR);
}, $parts), function ($part) {
return !empty($part);
});
return $prefix . implode(DIRECTORY_SEPARATOR, $processed);
}
For PHP version before 7.2:
/**
* Join string into a single URL string.
*
* @param string $parts,... The parts of the URL to join.
* @return string The URL string.
*/
function join_paths() {
$parts = func_get_args();
if (sizeof($parts) === 0) return '';
$prefix = ($parts[0] === DIRECTORY_SEPARATOR) ? DIRECTORY_SEPARATOR : '';
$processed = array_filter(array_map(function ($part) {
return rtrim($part, DIRECTORY_SEPARATOR);
}, $parts), function ($part) {
return !empty($part);
});
return $prefix . implode(DIRECTORY_SEPARATOR, $processed);
}
Some test cases for its behaviour:
// relative paths
var_dump(join_paths('hello/', 'world'));
var_dump(join_paths('hello', 'world'));
var_dump(join_paths('hello', '', 'world'));
var_dump(join_paths('', 'hello/world'));
echo "\n";
// absolute paths
var_dump(join_paths('/hello/', 'world'));
var_dump(join_paths('/hello', 'world'));
var_dump(join_paths('/hello/', '', 'world'));
var_dump(join_paths('/hello', '', 'world'));
var_dump(join_paths('', '/hello/world'));
var_dump(join_paths('/', 'hello/world'));
Results:
string(11) "hello/world"
string(11) "hello/world"
string(11) "hello/world"
string(11) "hello/world"
string(12) "/hello/world"
string(12) "/hello/world"
string(12) "/hello/world"
string(12) "/hello/world"
string(12) "/hello/world"
string(12) "/hello/world"
Update: Added a version that supports PHP before 7.2.
An alternative is using implode() and explode().
$a = '/a/bc/def/';
$b = '/q/rs/tuv/path.xml';
$path = implode('/',array_filter(explode('/', $a . $b)));
echo $path; // -> a/bc/def/q/rs/tuv/path.xml
The solution below uses the logic proposed by @RiccardoGalli, but is improved to use the DIRECTORY_SEPARATOR constant, as @Qix and @FélixSaparelli suggested, and, more importantly, to trim each given element so that space-only folder names don't appear in the final path (this was a requirement in my case).
Regarding escaping the directory separator inside the preg_replace() pattern: as you can see, I used the preg_quote() function, which does the job fine.
Furthermore, I replace multiple separators only (RegExp quantifier {2,}).
// PHP 7.+
function paths_join(string ...$parts): string {
$parts = array_map('trim', $parts);
$path = [];
foreach ($parts as $part) {
if ($part !== '') {
$path[] = $part;
}
}
$path = implode(DIRECTORY_SEPARATOR, $path);
return preg_replace(
'#' . preg_quote(DIRECTORY_SEPARATOR) . '{2,}#',
DIRECTORY_SEPARATOR,
$path
);
}
Elegant Python-inspired PHP one-liner way to join path.
This code doesn't use unnecessary array.
Multi-platform
function os_path_join(...$parts) {
// preg_quote() is needed because DIRECTORY_SEPARATOR is a backslash on Windows
return preg_replace('#'.preg_quote(DIRECTORY_SEPARATOR, '#').'+#', DIRECTORY_SEPARATOR, implode(DIRECTORY_SEPARATOR, array_filter($parts)));
}
Unix based systems
function os_path_join(...$parts) {
return preg_replace('#/+#', '/', implode('/', array_filter($parts)));
}
Unix based systems, without rest (variadic) parameters (doesn't respect the explicit PEP8 philosophy):
function os_path_join() {
return preg_replace('#/+#', '/', implode('/', array_filter(func_get_args())));
}
Usage
$path = os_path_join("", "/", "mydir/", "/here/");
Bonus: if you really want to follow Python's os.path.join(), the first argument is required:
function os_path_join($path=null, ...$paths) {
if (is_null($path)) {
throw new Exception("TypeError: join() missing 1 required positional argument: 'path'", 1);
}
$path = rtrim($path, DIRECTORY_SEPARATOR);
foreach ($paths as $key => $current_path) {
$paths[$key] = trim($current_path, DIRECTORY_SEPARATOR);
}
return implode(DIRECTORY_SEPARATOR, array_merge([$path], array_filter($paths)));
}
Check os.path.join() source if you want : https://github.com/python/cpython/blob/master/Lib/ntpath.py
Warning : This solution is not suitable for urls.
For getting parts of paths you can use pathinfo:
http://nz2.php.net/manual/en/function.pathinfo.php
For joining, the response from @deceze looks fine.
A different way of attacking this one:
function joinPaths() {
$paths = array_filter(func_get_args());
return preg_replace('#/{2,}#', '/', implode('/', $paths));
}
This is a corrected version of the function posted by deceze. Without this change, joinPaths('', 'foo.jpg') becomes '/foo.jpg'
function joinPaths() {
$args = func_get_args();
$paths = array();
foreach ($args as $arg)
$paths = array_merge($paths, (array)$arg);
$paths2 = array();
foreach ($paths as $i=>$path)
{ $path = trim($path, '/');
if (strlen($path))
$paths2[]= $path;
}
$result = join('/', $paths2); // If first element of old path was absolute, make this one absolute also
if (strlen($paths[0]) && substr($paths[0], 0, 1) == '/')
return '/'.$result;
return $result;
}
This seems to work quite well, and looks reasonably neat to me.
private function JoinPaths() {
$slash = DIRECTORY_SEPARATOR;
$sections = preg_split(
"#[/\\\\]#",
implode('/', func_get_args()),
-1,
PREG_SPLIT_NO_EMPTY);
return implode($slash, $sections);
}
Best solution found:
function joinPaths($leftHandSide, $rightHandSide) {
return rtrim($leftHandSide, '/') .'/'. ltrim($rightHandSide, '/');
}
NOTE: Copied from the comment by user89021
OS-independent version based on the answer by mpen but encapsulated into a single function and with the option to add a trailing path separator.
function joinPathParts($parts, $trailingSeparator = false){
return implode(
DIRECTORY_SEPARATOR,
array_map(
function($s){
return rtrim($s,DIRECTORY_SEPARATOR);
},
$parts)
)
.($trailingSeparator ? DIRECTORY_SEPARATOR : '');
}
Or for you one-liner lovers:
function joinPathParts($parts, $trailingSeparator = false){
return implode(DIRECTORY_SEPARATOR, array_map(function($s){return rtrim($s,DIRECTORY_SEPARATOR);}, $parts)).($trailingSeparator ? DIRECTORY_SEPARATOR : '');
}
Simply call it with an array of path parts:
// No trailing separator - ex. C:\www\logs\myscript.txt
$logFile = joinPathParts([getcwd(), 'logs', 'myscript.txt']);
// Trailing separator - ex. C:\www\download\images\user1234\
$dir = joinPathParts([getcwd(), 'download', 'images', 'user1234'], true);
Note that the OP is asking for something slightly different from https://docs.python.org/3/library/os.path.html#os.path.join, which does more than just join paths with the right number of separators.
While what they asked for has been answered, for anyone skim-reading the Q&A, these are the differences and ambiguous cases between what was asked for and os.path.join():
Many of the above solutions don't work for the root-only case: ['/'] => '/'
os.path.join drops all args to the left of the rightmost absolute path, e.g. ['a', 'b', '/c'] => '/c'. To be fair, that is probably not the behaviour you want if you are refactoring existing PHP in which many path segments look like absolute paths.
Another difference is that os.path.join won't drop additional separators within a single string: ['a///', 'b', 'c'] => 'a///b/c'
Another special case is one or more trailing empty strings, which produce a trailing slash in os.path.join: ['a', ''] or ['a', '', ''] => 'a/'
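For anyone who does want those semantics, the cases listed above can be reproduced with a short loop. This is a rough sketch of the POSIX flavour of os.path.join only (forward slashes, no Windows drive handling); the function name py_path_join is invented for this example.

```php
<?php
// Hypothetical helper mimicking the POSIX behaviour of Python's os.path.join.
// Assumption: '/' is the only separator (Linux/macOS style paths).
function py_path_join(string $first, string ...$rest): string
{
    $path = $first;
    foreach ($rest as $part) {
        if ($part !== '' && $part[0] === '/') {
            // An absolute part discards everything joined so far.
            $path = $part;
        } elseif ($path === '' || substr($path, -1) === '/') {
            // No separator needed: nothing joined yet, or one is already there.
            $path .= $part;
        } else {
            $path .= '/' . $part;
        }
    }
    return $path;
}
```

Note that nothing is trimmed or collapsed here, which is exactly why 'a///' stays as it is and why a trailing empty string leaves a trailing slash.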
Here's a function that behaves like Node's path.resolve:
function resolve_path() {
$working_dir = getcwd();
foreach(func_get_args() as $p) {
if($p === null || $p === '') continue;
elseif($p[0] === '/') $working_dir = $p;
else $working_dir .= "/$p";
}
$working_dir = preg_replace('~/{2,}~','/', $working_dir);
if($working_dir === '/') return '/';
$out = [];
foreach(explode('/',rtrim($working_dir,'/')) as $p) {
if($p === '.') continue;
if($p === '..') array_pop($out);
else $out[] = $p;
}
return implode('/',$out);
}
Test cases:
resolve_path('/foo/bar','./baz') # /foo/bar/baz
resolve_path('/foo/bar','/tmp/file/') # /tmp/file
resolve_path('/foo/bar','/tmp','file') # /tmp/file
resolve_path('/foo//bar/../baz') # /foo/baz
resolve_path('/','foo') # /foo
resolve_path('/','foo','/') # /
resolve_path('wwwroot', 'static_files/png/', '../gif/image.gif')
# getcwd().'/wwwroot/static_files/gif/image.gif'
Building on the great answer by Riccardo Galli, here is a small improvement to avoid killing the protocol prefix.
The idea is to test for the presence of a protocol in one argument, and maintain it into the result. WARNING: this is a naive implementation!
For example:
array("http://domain.de","/a","/b/")
results to (keeping protocol)
"http://domain.de/a/b/"
instead of (killing protocol)
"http:/domain.de/a/b/"
But http://codepad.org/hzpWmpzk needs a better code writing skill.
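The same idea can be sketched without relying on the linked paste: split off the scheme://host prefix first, normalize only the remainder, then glue the two back together. This is still a naive illustration rather than a full URL parser; the function name join_url_parts is hypothetical, and query strings or fragments are not handled.

```php
<?php
// Hypothetical helper: joins URL or path parts while keeping "scheme://host" intact.
function join_url_parts(string $base, string ...$parts): string
{
    $prefix = '';
    // Split off "scheme://authority" so its double slash is never collapsed.
    if (preg_match('#^([a-z][a-z0-9+.-]*://[^/]*)(.*)$#i', $base, $m)) {
        $prefix = $m[1];
        $base = $m[2];
    }

    $all = array_merge([$base], $parts);
    $last = end($all);

    // Trim slashes from every part and drop the parts that end up empty.
    $segments = [];
    foreach ($all as $part) {
        $part = trim($part, '/');
        if ($part !== '') {
            $segments[] = $part;
        }
    }

    // Keep a leading slash after a URL prefix or for an absolute first part.
    $lead = ($prefix !== '' || strncmp($base, '/', 1) === 0) ? '/' : '';
    // Keep a trailing slash only if the last argument had one.
    $trail = ($segments !== [] && substr($last, -1) === '/') ? '/' : '';

    return $prefix . $lead . implode('/', $segments) . $trail;
}
```

Called as join_url_parts("http://domain.de", "/a", "/b/"), this keeps the protocol and returns "http://domain.de/a/b/".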
I love Riccardo's answer and I think it is the best answer.
I am using it to join paths in url building, but with one small change to handle protocols' double slash:
function joinPath () {
$paths = array();
foreach (func_get_args() as $arg) {
if ($arg !== '') { $paths[] = $arg; }
}
// Collapse repeated slashes, then restore the double slash after the protocol
$paths = preg_replace('#/+#', '/', join('/', $paths));
return preg_replace('#:/#', '://', $paths);
}
function path_combine($paths) {
for ($i = 0; $i < count($paths); ++$i) {
$paths[$i] = trim($paths[$i]);
}
$dirty_paths = explode(DIRECTORY_SEPARATOR, join(DIRECTORY_SEPARATOR, $paths));
for ($i = 0; $i < count($dirty_paths); ++$i) {
$dirty_paths[$i] = trim($dirty_paths[$i]);
}
$unslashed_paths = array();
for ($i = 0; $i < count($dirty_paths); ++$i) {
$path = $dirty_paths[$i];
if (strlen($path) == 0) continue;
array_push($unslashed_paths, $path);
}
$first_not_empty_index = 0;
while(strlen($paths[$first_not_empty_index]) == 0) {
++$first_not_empty_index;
}
$starts_with_slash = $paths[$first_not_empty_index][0] == DIRECTORY_SEPARATOR;
return $starts_with_slash
? DIRECTORY_SEPARATOR . join(DIRECTORY_SEPARATOR, $unslashed_paths)
: join(DIRECTORY_SEPARATOR, $unslashed_paths);
}
Example usage:
$test = path_combine([' ', '/cosecheamo', 'pizze', '///// 4formaggi', 'GORGONZOLA']);
echo $test;
Will output:
/cosecheamo/pizze/4formaggi/GORGONZOLA
Here is my solution:
function joinPath(): string {
$path = '';
foreach (func_get_args() as $numArg => $arg) {
$arg = trim($arg);
$firstChar = substr($arg, 0, 1);
$lastChar = substr($arg, -1);
if ($numArg != 0 && $firstChar != '/') {
$arg = '/'.$arg;
}
# Remove the trailing slash
if ($lastChar == '/') {
$arg = rtrim($arg, '/');
}
$path .= $arg;
}
return $path;
}
Hmmm most seem a bit over complicated. Dunno, this is my take on it:
// Takes any number of arguments, joins them, then collapses repeated slashes
function join_urls() {
$parts = func_get_args();
$url_part = implode("/", $parts);
return preg_replace('/\/{2,}/', '/', $url_part);
}
For people who want a join function that does the Windows backslash and the Linux forward slash.
Usage:
<?php
use App\Util\Paths;
echo Paths::join('a','b'); //Prints 'a/b' on *nix, or 'a\b' on Windows
Class file:
<?php
namespace App\Util;
class Paths
{
public static function join_with_separator($separator, $paths) {
$slash_delimited_path = preg_replace('#\\\\#','/', join('/', $paths));
$duplicates_cleaned_path = preg_replace('#/+#', $separator, $slash_delimited_path);
return $duplicates_cleaned_path;
}
public static function join() {
$paths = array();
foreach (func_get_args() as $arg) {
if ($arg !== '') { $paths[] = $arg; }
}
return Paths::join_with_separator(DIRECTORY_SEPARATOR, $paths);
}
}
Here's the test function:
<?php
namespace Tests\Unit;
use PHPUnit\Framework\TestCase;
use App\Util\Paths;
class PathsTest extends TestCase
{
public function testWindowsPaths()
{
$TEST_INPUTS = [
[],
['a'],
['a','b'],
['C:\\','blah.txt'],
['C:\\subdir','blah.txt'],
['C:\\subdir\\','blah.txt'],
['C:\\subdir','nested','1/2','blah.txt'],
];
$EXPECTED_OUTPUTS = [
'',
'a',
'a\\b',
'C:\\blah.txt',
'C:\\subdir\\blah.txt',
'C:\\subdir\\blah.txt',
'C:\\subdir\\nested\\1\\2\\blah.txt',
];
for ($i = 0; $i < count($TEST_INPUTS); $i++) {
$actualPath = Paths::join_with_separator('\\', $TEST_INPUTS[$i]);
$expectedPath = $EXPECTED_OUTPUTS[$i];
$this->assertEquals($expectedPath, $actualPath);
}
}
public function testNixPaths()
{
$TEST_INPUTS = [
[],
['a'],
['a','b'],
['/home','blah.txt'],
['/home/username','blah.txt'],
['/home/username/','blah.txt'],
['/home/subdir','nested','1\\2','blah.txt'],
];
$EXPECTED_OUTPUTS = [
'',
'a',
'a/b',
'/home/blah.txt',
'/home/username/blah.txt',
'/home/username/blah.txt',
'/home/subdir/nested/1/2/blah.txt',
];
for ($i = 0; $i < count($TEST_INPUTS); $i++) {
$actualPath = Paths::join_with_separator('/', $TEST_INPUTS[$i]);
$expectedPath = $EXPECTED_OUTPUTS[$i];
$this->assertEquals($expectedPath, $actualPath);
}
}
}
$args = [sys_get_temp_dir(), "path1","path2", "filename.pdf"];
$filename = implode( DIRECTORY_SEPARATOR, $args);
// output "C:\Users\User\AppData\Local\Temp\path1\path2\filename.pdf"
I liked several of the solutions presented. But those that replace every '/+' with '/' (regular expressions) forget that os.path.join() from Python can handle this kind of join:
os.path.join('http://example.com/parent/path', 'subdir/file.html')
Result: 'http://example.com/parent/path/subdir/file.html'