PHP - reduce multiple slashes to single slash - php

I have a regular expression that I use to reduce multiple slashes to a single slash. The purpose is to read a URL that has already been converted to a human-readable link by mod_rewrite in Apache, like this:
http://www.website.com/about/me
This works :
$uri = 'about//me';
$uri = preg_replace('#//+#', '/', $uri);
echo $uri; // echoes 'about/me'
This doesn't work :
$uri = '/about//me';
$uri = preg_replace('#//+#', '/', $uri);
echo $uri; // echoes '/about/me'
I need to be able to work with each URL parameter on its own, but in the second example, if I explode on the slash, I get 3 segments instead of 2 because of the leading slash. I could check in PHP whether any of the parameters is empty, but since I'm already using that regular expression, it would be nice if it took care of this for me, so that I don't need to worry about segment validation.
Any thoughts?

str_replace may be faster in this case:
$uri = str_replace("//", "/", $uri);
Secondly, use trim (http://hu.php.net/manual/en/function.trim.php):
$uri = trim($uri, "/");
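A minimal sketch of the two steps above combined, with one caveat worth knowing: a single str_replace pass turns '///' into '//', so the sketch repeats the replacement until no double slash is left. The function name normalize_uri is only an illustration, not something from the question.

```php
<?php
// Hypothetical helper: collapse runs of slashes without a regex, then trim both ends.
function normalize_uri(string $uri): string
{
    // One str_replace pass only halves a run ('///' becomes '//'),
    // so repeat until no double slash remains.
    while (strpos($uri, '//') !== false) {
        $uri = str_replace('//', '/', $uri);
    }
    // Drop leading and trailing slashes so explode() yields no empty segments.
    return trim($uri, '/');
}

// Splits into the two segments 'about' and 'me'.
print_r(explode('/', normalize_uri('/about///me/')));
```

Whether the loop is still faster than a single preg_replace call depends on the input, so it is worth measuring if speed matters.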

This converts double slashes in a string to a single slash, but the advantage of this code is that the slashes in the protocol portion of the string (http://) are kept.
preg_replace("#(^|[^:])//+#", "\\1/", $str);

How about running a second replace on $uri?
$uri = preg_replace('#^/#', '', $uri);
That way the leading slash is removed. Doing it all in one preg_replace beats me :)
Using ltrim could also be a way to go (probably even faster).

"I need to be able to work with each URL parameter alone, but in the second example, if I explode the trailing slash, it would return me 3 segments instead of 2 segments."
One fix for this is to use preg_split with the fourth argument (flags) set to PREG_SPLIT_NO_EMPTY. The third argument is the limit, and -1 means no limit:
$uri = '/about//me';
$uri_segments = preg_split('#/#', $uri, -1, PREG_SPLIT_NO_EMPTY);
// $uri_segments[0] == 'about';
// $uri_segments[1] == 'me';

You can combine all three alternatives into one regexp:
$urls = array(
'about/me',
'/about//me',
'/about///me/',
'////about///me//'
);
print_r(
preg_replace('~^/+|/+$|/(?=/)~', '', $urls)
);
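As a sketch of how the combined pattern might be used to get the segments directly, here is a small wrapper. The name uri_segments and the empty-path handling are assumptions for illustration.

```php
<?php
// Hypothetical wrapper around the combined pattern from the answer above.
function uri_segments(string $uri): array
{
    // Strip leading slashes, trailing slashes, and every slash that is
    // immediately followed by another slash.
    $clean = preg_replace('~^/+|/+$|/(?=/)~', '', $uri);
    // An empty path has no segments; explode('/', '') would return [''].
    return $clean === '' ? [] : explode('/', $clean);
}

var_export(uri_segments('////about///me//'));
```

All four sample URLs from the answer reduce to the same two segments, 'about' and 'me'.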

You may split the string via preg_split instead, skipping the sanitizing altogether. You still have to deal with the empty chunks, though.

Late, but most of the methods above will also collapse the slashes in http://. This one does not:
function to_single_slashes($input) {
return preg_replace('~(^|[^:])//+~', '\\1/', $input);
}
# out: http://localhost/lorem-ipsum/123/456/
print to_single_slashes('http:///////localhost////lorem-ipsum/123/////456/');

Related

Write regular expression in preg_replace

I still don't understand how regular expressions work with preg_replace. I have some URLs in text:
site.com/user/login.php?valid=tru
site.com/eng/page/some_page.php?valid=tru&anothervar=1
I want to change them so they become this:
site.com/user/login/
site.com/eng/page/some_page/
preg_replace(" 'no_mater_what_1'.php'no_mater_what_2' " , 'no_mater_what_1'/ , $some_var);
To avoid traps, like an other .php substring in the path, you can use this replacement:
$url = preg_replace('~\.php(?:[?#]\N*|\z)~i', '', $url, -1, $c);
if (!$c) // not a php file, do something else
or in this way:
if (preg_match('~[^?#]+\.php(?=[?#]|\z)~Ai', $url, $m))
$url = $m[0];
else
// not a php file, do something else
This way ensures that the .php matched is the extension of the file because the regex engine will find the leftmost result that is followed by either a ? for the query part, a # for the fragment part or the end of the string.
pattern elements:
\N: a character that isn't a newline.
\z: anchor for the end of the string.
A: modifier that anchors the pattern at the start of the string
(?=...): lookahead assertion
The advantage of this approach is that it is safe and reasonably efficient.
Another way, with parse_url:
You can use parse_url to separate a URL into its parts. This way is a little tedious because you need to rebuild the URL afterwards (and how you rebuild it depends on which elements are present in the URL), but it is far from impossible and it is also safe.
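A rough sketch of the parse_url route, assuming you only need the scheme, host, port and path back (user info, query and fragment are dropped on purpose). The function name strip_php_extension is made up for this example.

```php
<?php
// Hypothetical helper: remove a trailing ".php" plus any query or fragment.
function strip_php_extension(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['path'])) {
        return $url; // could not parse, leave the input untouched
    }
    $path = $parts['path'];
    // Only act when the path itself ends in ".php" (case-insensitive).
    if (strcasecmp(substr($path, -4), '.php') !== 0) {
        return $url; // not a php file, do something else here if needed
    }
    $path = substr($path, 0, -4) . '/';
    // Rebuild only the pieces this sketch cares about.
    $prefix = '';
    if (isset($parts['host'])) {
        $prefix = (isset($parts['scheme']) ? $parts['scheme'] . '://' : '//') . $parts['host'];
        if (isset($parts['port'])) {
            $prefix .= ':' . $parts['port'];
        }
    }
    return $prefix . $path;
}

echo strip_php_extension('site.com/eng/page/some_page.php?valid=tru&anothervar=1'), "\n";
```

Note that for a URL without a scheme, such as the ones in the question, parse_url puts the host name into the path, which is why the sketch works on both forms.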
But why not simply do this:
$replace = explode('.php',$some_var);
$replace = $replace[0] . '/';
I don't think a regular expression is necessary here, because ".php" is not repeated in the string.
This should work
$subject = 'site.com/eng/page/some_page.php?valid=tru&anothervar=1';
if (preg_match('/(.*)\.php(?:\?.*)?/', $subject, $regs)) {
$result = $regs[1] .'/';
echo $subject .' => '. $result;
} else {
echo 'NOT FOUND';
}
The regular expression doing the magic is this
/(.*)\.php(?:\?.*)?/
by parts:
(.*)\.php
Capture everything until (excluding) ".php"
(?:\?.*)
Search for the pattern "?..."
?
Make that last pattern optional
Because your two examples show up on the same line, this looks a bit confusing. However, it appears that you want to replace everything from .php to the end of the line with a /. So, use:
$new_link = preg_replace('/\.php.*$/', '/', $old_link);
You need the \ in front of the . because . is a special character in a regular expression and must be escaped to match a literal period. The pattern then looks for php, followed by anything up to the end of the string ($ anchors the match at the end; add the m modifier if the subject holds several lines). All of that is replaced with /.

Remove trailing slash on domain extensions without trailing directory

I'm importing data from a csv and I've been looking high and low for a particular regular expression to remove trailing slashes from domain names without a directory after it. See the following example:
example.com/ (remove trailing slash)
example.co.uk/ (remove trailing slash)
example.com/gb/ (do not remove trailing slash)
Can anyone help me out with this or at least point me in the right direction?
Edit: This is my progress so far, I've only matched the extension at the moment but it's picking up those domains with trailing directories.
[a-z0-9\-]+[a-z0-9]\/[a-z]
Many thanks
I don't know how it would compare to a regular expression performance-wise, but you can do it without one.
A simple example:
$string = rtrim ($string, '/');
$string .= (strpos($string, '/') === false) ? '' : '/';
In the second line I'm only adding a / at the end if the string already contains one (to separate domain from folder).
A more solid approach would probably be to only rtrim if the first / found, is the last character of the string.
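A sketch of that more solid variant, assuming the values have no scheme (no http://) in front of them, as in the question's examples. The function name trim_bare_domain_slash is hypothetical.

```php
<?php
// Hypothetical helper: drop the trailing slash only when it is the sole slash in the string.
function trim_bare_domain_slash(string $domain): string
{
    $first = strpos($domain, '/');
    // The first slash is also the last character, so nothing follows the domain.
    if ($first !== false && $first === strlen($domain) - 1) {
        return substr($domain, 0, -1);
    }
    return $domain;
}

echo trim_bare_domain_slash('example.co.uk/'), "\n";
```

Unlike the two-line version, this leaves a value such as example.com/gb untouched instead of appending a slash to it.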
Not sure, but you can try this: remove the slash only if the value is just a $_SERVER['SERVER_NAME'], and keep it otherwise, because $_SERVER['SERVER_NAME'] returns the host name without any directory. Try this:
/^(http|https|ftp)\:\/\/[a-z0-9\-\.]+\.[a-z]{2,3}(:[a-z0-9]*)?\/?([a-z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/i
You could test for a match on /[a-z]/, then remove the last character if it's not found.
This is JavaScript, but it'd be similar in PHP.
/\/[a-z]+\//
var txt = 'example.com/gb/';
var match = txt.match(/\/[a-z]+\//);
if (!match) {
alert(txt.substring(0, txt.length - 1));
}
else {
alert(txt);
}
http://jsfiddle.net/xjKTS/
Try this, it works:
<?php
$result = preg_replace('/^([^\/]+)(\/)$/', '$1', $your_data);
?>
I have tested it like this:
$str1 = 'example.com/';
$str2 = 'example.co.uk/';
$str3 = 'example.com/gb/';
$reg = '/^([^\/]+)(\/)$/';
echo preg_replace($reg, '$1', $str1); // example.com
echo preg_replace($reg, '$1', $str2); // example.co.uk
echo preg_replace($reg, '$1', $str3); // example.com/gb/

Convert absolute to relative url with preg_replace

(I searched, and found lots of questions about converting relative to absolute urls, but nothing for absolute to relative.)
I'd like to take input from a form field and end up with a relative url. Ideally, this would be able to handle any of the following inputs and end up with /page-slug.
http://example.com/page-slug
http://www.example.com/page-slug
https://example.com/page-slug
https://www.example.com/page-slug
example.com/page-slug
/page-slug
And maybe more I'm not thinking of...?
Edit: I'd also like this to work for something where the relative url is e.g. /page/post (i.e. something with more than one slash).
Take a look at parse_url if you are always working with URLs. Specifically:
parse_url($url, PHP_URL_PATH)
FYI, I tested it against all your input, and it worked on all except: example.com/page-slug
Try this regexp.
#^ The start of the string
(
:// Match either ://
| Or
[^/] Not a /
)+ One or more times
#
And replace it with the empty string.
$pattern = '#^(://|[^/])+#';
$replacement = '';
echo preg_replace($pattern, $replacement, $string);
I think you want the part of the URL after the hostname, you can use parse_url:
$path = parse_url($url, PHP_URL_PATH);
Note that this gets the whole of the URL after the hostname, so http://example.com/page/slug will give /page/slug.
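A sketch that also covers the scheme-less input (example.com/page-slug), which parse_url alone reads as one long path. It assumes that anything not starting with / begins with a host name, so a bare slug such as "page-slug" would be misread as a host. The name to_relative_url is hypothetical, and query strings are dropped.

```php
<?php
// Hypothetical helper: reduce a form-field URL to its path.
function to_relative_url(string $input): string
{
    $input = trim($input);
    // Without a scheme, parse_url() treats "example.com/page-slug" as a path,
    // so add a placeholder scheme unless the input already is a path.
    if ($input !== '' && $input[0] !== '/' && !preg_match('#^[a-z][a-z0-9+.-]*://#i', $input)) {
        $input = 'http://' . $input;
    }
    $path = parse_url($input, PHP_URL_PATH);
    // parse_url() returns null when there is no path and false on a malformed URL.
    return is_string($path) && $path !== '' ? $path : '/';
}

echo to_relative_url('example.com/page-slug'), "\n";
```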
I would just do this a little hacky way if you know your application. I would use a regex to search for
[a-z].([(com|org|net)])

How to fix a path with regex in php for PATHS and not break URLs?

I want to replace // but not ://. I'm using this function to fix broken urls:
function fix ($path)
{
return preg_replace( "/\/+/", "/", $path );
}
For example:
Input:
a//a//s/b/d//df//a/s/
Output (collapsed blocks of more than one slash):
a/a/s/b/d/df/a/s/
That is OK, but if I pass a URL I break the http:// part, and end up with http:/. For example:
http://www.domain.com/a/a/s/b/d/df/a/s/
I get:
http:/www.domain.com/a/a/s/b/d/df/a/s/
I want to keep the http:// intact:
http://www.domain.com/a/a/s/b/d/df/a/s/
You can solve it rather easily using a negative lookbehind:
function fix ($path)
{
return preg_replace("#(?<!:)/{2,}#", "/", $path);
}
Note that I've also changed your delimiter from / to #, so you don't have to escape slashes.
Working example: http://ideone.com/6zGBg
This can still match the second slash if you have more than two (file://// -> file://). If this is a problem, you can use #(?<![:/])/{2,}#.
Example: http://ideone.com/T2mlR
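To make the difference between the two patterns concrete, here is a small side-by-side sketch. The name fix_strict is made up for the second variant.

```php
<?php
// Collapse repeated slashes unless the run directly follows a colon.
function fix(string $path): string
{
    return preg_replace('#(?<!:)/{2,}#', '/', $path);
}

// Hypothetical stricter variant: the run must not follow a colon or another
// slash, so a run attached to the scheme separator is left untouched.
function fix_strict(string $path): string
{
    return preg_replace('#(?<![:/])/{2,}#', '/', $path);
}

echo fix('a//a//s/b/d//df//a/s/'), "\n";
echo fix('file:////tmp'), "\n";
echo fix_strict('file:////tmp'), "\n";
```

The first variant trims file://// down to file://, while the strict one leaves it alone. Both collapse ordinary runs such as a//a.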
return preg_replace('/([^:])\/+/', '$1/', $path);

Remove first forward slash in a link?

I need to remove the first forward slash inside link formatted like this:
/directory/link.php
I need to have:
directory/link.php
I'm not literate in regular expressions (preg_replace?) and those slashes are killing me..
I need your help stackoverflow!
Thank you very much!
Just because nobody has mentioned it before:
$uri = "/directory/link.php";
$uri = ltrim($uri, '/');
The benefit of this one is:
compared to the substr() solution: it works also with paths that do not start with a slash. So using the same procedure multiple times on an uri is safe.
compared to the preg_replace() solution: it's certainly much faster. Invoking the regex engine for such a trivial task is, in my opinion, overkill.
preg_replace('/^\//', '', $link);
If it's always the first character, you won't need a regex:
$uri = "/directory/link.php";
$uri = substr($uri, 1);
