Remove trailing slash on domain extensions without trailing directory - php

I'm importing data from a csv and I've been looking high and low for a particular regular expression to remove trailing slashes from domain names without a directory after it. See the following example:
example.com/ (remove trailing slash)
example.co.uk/ (remove trailing slash)
example.com/gb/ (do not remove trailing slash)
Can anyone help me out with this or at least point me in the right direction?
Edit: This is my progress so far, I've only matched the extension at the moment but it's picking up those domains with trailing directories.
[a-z0-9\-]+[a-z0-9]\/[a-z]
Many thanks

I don't know how it would compare to a regular expression performance-wise, but you can do it without one.
A simple example:
$string = rtrim ($string, '/');
$string .= (strpos($string, '/') === false) ? '' : '/';
In the second line I'm only adding a / at the end if the string already contains one (to separate domain from folder).
A more robust approach would probably be to rtrim only if the first / found is also the last character of the string.
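The stricter check described above could be sketched roughly like this. The function name stripBareDomainSlash is made up for illustration, and the sketch assumes the input has no scheme prefix such as http://, as in the question's examples.

```php
<?php
// Hypothetical helper: drop the trailing slash only when it is the
// first (and therefore the only) slash in the string.
function stripBareDomainSlash(string $s): string
{
    $first = strpos($s, '/');
    if ($first !== false && $first === strlen($s) - 1) {
        return substr($s, 0, -1);
    }
    return $s; // no slash at all, or a directory follows the domain
}

echo stripBareDomainSlash('example.com/'), "\n";    // example.com
echo stripBareDomainSlash('example.co.uk/'), "\n";  // example.co.uk
echo stripBareDomainSlash('example.com/gb/'), "\n"; // example.com/gb/
```

Unlike the two-line rtrim version, this leaves a string such as example.com/gb without a trailing slash untouched instead of adding one.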

I'm not sure, but you could try this: remove the slash only if the value is a bare host name such as $_SERVER['SERVER_NAME'], and keep it otherwise, because $_SERVER['SERVER_NAME'] returns the host without any directory.
Alternatively, try this pattern:
/^(http|https|ftp)\:\/\/[a-z0-9\-\.]+\.[a-z]{2,3}(:[a-z0-9]*)?\/?([a-z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/i

You could test for a match on a directory between two slashes, then remove the last character if it's not found.
This is JavaScript, but it would be similar in PHP.
/\/[a-z]+\//
var txt = 'example.com/gb/';
var match = txt.match(/\/[a-z]+\//);
if (!match) {
alert(txt.substring(0, txt.length - 1));
}
else {
alert(txt);
}
http://jsfiddle.net/xjKTS/

Try this, it works:
<?php
$result = preg_replace('/^([^\/]+)(\/)$/','$1',$your_data);
?>
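I tested it like this:
<?php
$reg = '/^([^\/]+)(\/)$/';
$str1 = 'example.com/';
$str2 = 'example.co.uk/';
$str3 = 'example.com/gb/';
echo preg_replace($reg, '$1', $str1), "\n"; // example.com
echo preg_replace($reg, '$1', $str2), "\n"; // example.co.uk
echo preg_replace($reg, '$1', $str3), "\n"; // example.com/gb/
?>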

Related

Regex to Remove Everything After 4th Slash in URL

I'm working in PHP with friendly URL paths in the form of:
/2011/09/here-is-the-title
/2011/09/here-is-the-title/2
I need to standardize these URL paths to remove anything after the fourth slash, including the slash itself. The value after the fourth slash is sometimes a number, but can also be any parameter.
Any thoughts on how I could do this? I imagine regex could handle it, but I'm terrible with it. I also thought a combination of strpos and substr might be able to handle it, but cannot quite figure it out.
You can use explode() function:
$parts = explode('/', '/2011/09/here-is-the-title/2');
$output = implode('/', array_slice($parts, 0, 4));
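Replace
%^((/[^/]*){3}).*%
with $1. (The regexr version of this pattern carries a trailing g flag; PHP's preg functions have no such modifier, so leave it off there.)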
see http://regexr.com?2vlr8 for a live example
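In PHP, the replacement above might look like the following sketch. The function name firstThreeSegments is invented for illustration, and the pattern is written without the regexr-style g flag because preg_replace() has no such modifier and already replaces every match.

```php
<?php
// Hypothetical helper: keep the first three "/segment" groups and
// drop whatever follows them.
function firstThreeSegments(string $path): string
{
    return preg_replace('%^((/[^/]*){3}).*%', '$1', $path);
}

echo firstThreeSegments('/2011/09/here-is-the-title/2'), "\n";
echo firstThreeSegments('/2011/09/here-is-the-title'), "\n";
```

Both calls print /2011/09/here-is-the-title; a path with fewer than three segments does not match and is returned unchanged.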
If your regex implementation supports arbitrary-length look-behind assertions (PHP's PCRE does not allow an unbounded look-behind like this one), you could replace
(?<=^[^/]*(/[^/]*){3})/.*$
with an empty string.
If it does not, you can replace
^([^/]*(?:/[^/]*){3})/.*$
with the contents of the first capturing group. A PHP example for the second one can be found at ideone.com.
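As a rough PHP sketch of the second pattern (not the ideone example itself, which is not reproduced here), with truncateAtFourthSlash as a made-up function name:

```php
<?php
// Hypothetical helper: if the path contains a fourth slash, keep only
// the part before it; otherwise return the path unchanged.
function truncateAtFourthSlash(string $path): string
{
    return preg_replace('~^([^/]*(?:/[^/]*){3})/.*$~', '$1', $path);
}

echo truncateAtFourthSlash('/2011/09/here-is-the-title/2'), "\n";
echo truncateAtFourthSlash('/2011/09/here-is-the-title'), "\n";
```

Using ~ as the delimiter avoids having to escape the slashes inside the pattern.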
You could also use a loop:
$result = '';
$count = 0;
foreach (str_split($url) as $c) {
    if ($c === '/') $count++; // count the slashes seen so far
    if ($count >= 4) break;   // stop at the fourth slash
    $result .= $c;
}

How to fix a path with regex in php for PATHS and not break URLs?

I want to replace // but not ://. I'm using this function to fix broken urls:
function fix ($path)
{
return preg_replace( "/\/+/", "/", $path );
}
For example:
Input:
a//a//s/b/d//df//a/s/
Output (collapsed blocks of more than one slash):
a/a/s/b/d/df/a/s/
That is OK, but if I pass a URL I break the http:// part, and end up with http:/. For example:
http://www.domain.com/a/a/s/b/d/df/a/s/
I get:
http:/www.domain.com/a/a/s/b/d/df/a/s/
I want to keep the http:// intact:
http://www.domain.com/a/a/s/b/d/df/a/s/
You can solve it rather easily using a negative lookbehind:
function fix ($path)
{
return preg_replace("#(?<!:)/{2,}#", "/", $path);
}
Note that I've also changed your delimiter from / to #, so you don't have to escape slashes.
Working example: http://ideone.com/6zGBg
This can still match the second slash if you have more than two (file://// -> file://). If this is a problem, you can use #(?<![:/])/{2,}#.
Example: http://ideone.com/T2mlR
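To make the behaviour of the stricter look-behind concrete, here is a small sketch. The function is renamed fixSlashes (a name chosen here only to avoid clashing with the fix() above), and the inputs are the ones from the question plus the file://// case.

```php
<?php
// Collapse runs of two or more slashes, unless the run directly
// follows a colon or another slash (so "://" stays intact).
function fixSlashes(string $path): string
{
    return preg_replace('#(?<![:/])/{2,}#', '/', $path);
}

echo fixSlashes('a//a//s/b/d//df//a/s/'), "\n";          // a/a/s/b/d/df/a/s/
echo fixSlashes('http://www.domain.com/a//a/s/'), "\n";  // http://www.domain.com/a/a/s/
echo fixSlashes('file:////'), "\n";                      // file://// (left alone)
```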
// Capture the character before the slashes so the replacement does not drop it.
return preg_replace('/([^:])\/+/', '$1/', $path);

PHP: how to add trailing slash to absolute URL

I have a list of absolute URLs. I need to make sure that they all have trailing slashes, as applicable. So:
http://www.domain.com/ <-- does not need a trailing slash
http://www.domain.com <-- needs a trailing slash
http://www.domain.com/index.php <-- does not need a trailing slash
http://www.domain.com/?message=hello <-- does not need a trailing slash
I'm guessing I need to use regex, but matching URLs is a pain. I was hoping for an easier solution. Ideas?
For this very specific problem, not using a regex at all might be an option as well. If your list is long (several thousand URLs) and time is of any concern, you could choose to hand-code this very simple manipulation.
This will do the same:
$str .= (substr($str, -1) == '/' ? '' : '/');
It is of course not nearly as elegant or flexible as a regular expression, but it avoids the overhead of parsing the regular expression string and it will run as fast as PHP is able to do it.
It is arguably less readable than the regex, though this depends on how comfortable the reader is with regex syntax (some people might actually find it more readable).
It will certainly not check that the string is really a well-formed URL (such as e.g. zerkms' regex), but you already know that your strings are URLs anyway, so that is a bit redundant.
Though, if your list is something like 10 or 20 URLs, forget this post. Use a regex, the difference will be zero.
Rather than doing this using regex, you could use parse_url() to do this.
For example:
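$url = parse_url("http://www.example.com/ab/abc.html?a=b#xyz");
if (!isset($url['path'])) $url['path'] = '/';
// Append the query and fragment only when they are present.
$surl = $url['scheme'] . '://' . $url['host'] . $url['path']
      . (isset($url['query']) ? '?' . $url['query'] : '')
      . (isset($url['fragment']) ? '#' . $url['fragment'] : '');
echo $surl;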
$url = 'http://www.domain.com';
$need_to_add_trailing_slash = preg_match('~^https?://[^/]+$~', $url);
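The check above could be wrapped in a small function and run against the four URLs from the question. This is only a sketch: addSlashToBareHost is a made-up name, and a URL such as http://www.domain.com?x=1 (query string directly after the host) is not handled.

```php
<?php
// Hypothetical helper: append "/" only when the URL is a bare
// scheme + host with nothing after it.
function addSlashToBareHost(string $url): string
{
    if (preg_match('~^https?://[^/]+$~', $url) === 1) {
        return $url . '/';
    }
    return $url;
}

echo addSlashToBareHost('http://www.domain.com'), "\n";                 // slash added
echo addSlashToBareHost('http://www.domain.com/'), "\n";                // unchanged
echo addSlashToBareHost('http://www.domain.com/index.php'), "\n";       // unchanged
echo addSlashToBareHost('http://www.domain.com/?message=hello'), "\n";  // unchanged
```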
Try this:
if (!preg_match("/.*\/$/", $url)) {
$url = "$url" . "/";
}
This may not be the most elegant solution, but it works like a charm. First we get the full URL, then check whether it has a trailing slash. If not, we check that there is no query string and that it isn't an actual file or an actual directory. If the URL meets all these conditions, we do a 301 redirect with the trailing slash added.
If you're unfamiliar with PHP headers... note that there cannot be any output - not even whitespace - before this code.
$url = $_SERVER['REQUEST_URI'];
$lastchar = substr( $url, -1 );
if ( $lastchar != '/' ):
if ( !$_SERVER['QUERY_STRING'] and !is_file( $_SERVER['DOCUMENT_ROOT'].$url ) and !is_dir( $_SERVER['DOCUMENT_ROOT'].$url ) ):
header("HTTP/1.1 301 Moved Permanently");
header( "Location: $url/" );
endif;
endif;

removing dots and slashes regex - non relative

How could I remove the leading slashes and dots from a non-root-relative path?
For instance, ../../../somefile/here/ (regardless of how deep it is), so that I just get /somefile/here/
No regex is needed; use ltrim() with the character list "/." instead, like this:
echo "/".ltrim("../../../somefile/here/", "/.");
This outputs:
/somefile/here/
You could use the realpath() function PHP provides. This requires the file to exist, however.
If I understood you correctly:
$path = "/".str_replace("../","","../../../somefile/here/");
This should work:
<?php
echo "/".preg_replace('/\.\.\/+/',"","../../../somefile/here/")
?>
You can test it here.
You could try :
<?php
$str = '../../../somefile/here/';
$str = preg_replace('~(?:\.\./)+~', '/', $str);
echo $str,"\n";
?>
(\.*/)*(?<capturegroup>.*)
The first group matches some number of dots followed by a slash, an unlimited number of times; the second group is the one you're interested in. This will strip your leading slash, so prepend a slash.
Beware that this is doing absolutely no verification that your leading string of slashes and periods isn't something patently stupid. However, it won't strip leading dots off your path, like the obvious ([./])* pattern for the first group would; it finds the longest string of dots and slashes that ends with a slash, so it won't hurt your real path if it begins with a dot.
Be aware that the obvious "/." ltrim() strategy will strip leading dots from directory names, which is bad if your first directory has one. That is entirely plausible, since leading dots are used for hidden directories.
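The difference between the two strategies can be seen in a short sketch. The function name stripLeadingDotSegments and the group name rest are chosen here for illustration; the pattern is the one described above, anchored and with a non-capturing first group.

```php
<?php
// Hypothetical helper: strip the longest leading run of dots and
// slashes that ends with a slash, then prepend a single slash.
function stripLeadingDotSegments(string $path): string
{
    // The pattern can match the empty string, so it always matches.
    preg_match('~^(?:\.*/)*(?<rest>.*)$~s', $path, $m);
    return '/' . $m['rest'];
}

echo stripLeadingDotSegments('../../../somefile/here/'), "\n"; // /somefile/here/
echo stripLeadingDotSegments('../.hidden/file'), "\n";         // /.hidden/file
echo '/' . ltrim('../.hidden/file', '/.'), "\n";               // /hidden/file (dot lost)
```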

Best way to remove trailing slashes in URLs with PHP

I have some URLs, like www.amazon.com/, www.digg.com or www.microsoft.com/ and I want to remove the trailing slash, if it exists, so not just the last character. Is there a trim or rtrim for this?
You put rtrim in your question, why not just look it up?
$url = rtrim($url,"/");
As a side note, look up any PHP function by doing the following:
http://php.net/functionname
http://php.net/rtrim
http://php.net/trim
(rtrim stands for 'Right trim')
Simple and works across both Windows and Unix:
$url = rtrim($url, '/\\');
I came here looking for a way to remove trailing slash and redirect the browser, I have come up with an answer that I would like to share for anyone coming after me:
//remove trailing slash from uri
if( ($_SERVER['REQUEST_URI'] != "/") and preg_match('{/$}',$_SERVER['REQUEST_URI']) ) {
header ('Location: '.preg_replace('{/$}', '', $_SERVER['REQUEST_URI']));
exit();
}
The ($_SERVER['REQUEST_URI'] != "/") check skips the bare host URI (e.g. www.amazon.com/), because web browsers always send a trailing slash after a domain name, and preg_match('{/$}',$_SERVER['REQUEST_URI']) matches every other URI that has a trailing slash as its last character. Then preg_replace('{/$}', '', $_SERVER['REQUEST_URI']) removes the slash and hands the result to header() to redirect. The exit() call is important to stop any further code execution.
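The same decision can be pulled into a pure function, which makes it easier to try out without sending headers. This is a sketch only; trailingSlashRedirectTarget is a hypothetical name, not part of the answer above.

```php
<?php
// Hypothetical helper: return the URI to redirect to, or null when
// no redirect is needed (root URI, or no trailing slash).
function trailingSlashRedirectTarget(string $requestUri): ?string
{
    if ($requestUri !== '/' && preg_match('{/$}', $requestUri) === 1) {
        return preg_replace('{/$}', '', $requestUri);
    }
    return null;
}

var_dump(trailingSlashRedirectTarget('/'));         // NULL
var_dump(trailingSlashRedirectTarget('/products')); // NULL
var_dump(trailingSlashRedirectTarget('/products/'));// string(9) "/products"
```

A caller would then send header('Location: ' . $target) followed by exit() only when the returned value is not null.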
$urls="www.amazon.com/ www.digg.com/ www.microsoft.com/";
echo preg_replace("/\b\//","",$urls);
