removing dots and slashes regex - non relative - php

How can I remove the leading slashes and dots from a non-root-relative path?
For instance, ../../../somefile/here/ (regardless of how deep it is), so that I just get /somefile/here/

No regex needed; use ltrim() with "/." as the character list instead. Like this:
echo "/".ltrim("../../../somefile/here/", "/.");
This outputs:
/somefile/here/

You could use the realpath() function PHP provides. This requires the file to exist, however.

If I understood you correctly:
$path = "/".str_replace("../","","../../../somefile/here/");

This should work:
<?php
echo "/".preg_replace('/\.\.\/+/',"","../../../somefile/here/")
?>

You could try :
<?php
$str = '../../../somefile/here/';
$str = preg_replace('~(?:\.\./)+~', '/', $str);
echo $str,"\n";
?>

(\.*/)*(?<capturegroup>.*)
The first group matches some number of dots followed by a slash, an unlimited number of times; the second group is the one you're interested in. This will strip your leading slash, so prepend a slash.
Beware that this is doing absolutely no verification that your leading string of slashes and periods isn't something patently stupid. However, it won't strip leading dots off your path, like the obvious ([./])* pattern for the first group would; it finds the longest string of dots and slashes that ends with a slash, so it won't hurt your real path if it begins with a dot.
Be aware that the obvious "/." ltrim() strategy will strip leading dots from directory names, which is bad if your first directory has one. That is entirely plausible, since leading dots are used for hidden directories.
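To make the hidden-directory caveat concrete, here is a minimal sketch comparing the two strategies. The function name stripLeadingTraversal is made up for illustration, and the pattern is a non-capturing variant of the first group described above, anchored to the start of the string.

```php
<?php
// Hypothetical helper: strip leading "../", "./" and "/" segments, then root the path.
function stripLeadingTraversal(string $path): string
{
    // ^(?:\.*/)* consumes runs of dots that end in a slash, at the start only.
    return '/' . preg_replace('~^(?:\.*/)*~', '', $path);
}

echo stripLeadingTraversal('../../../somefile/here/'), "\n"; // /somefile/here/
echo stripLeadingTraversal('../../.hidden/dir/'), "\n";      // /.hidden/dir/
echo '/' . ltrim('../../.hidden/dir/', '/.'), "\n";          // /hidden/dir/ (dot lost)
```

Like the other answers, this does no validation of the path; it only shows why the regex keeps the leading dot of .hidden while ltrim() drops it.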

Related

how to use preg_replace to replace all occurrences of a given pattern?

I have a pattern (a slash followed by 1 or more dashes) inside strings that could occur many times like
/hi/--hello/-hi
I want to replace it with
/hi/hello/hi
I have tried
$str = preg_replace('/\/-+/', '/', $subject);
but this does not seem to be working properly. Am I missing something? I use http://www.debuggex.com/ to test my regex and \/-+ does not seem to match the string.
The reason this doesn't work on debuggex.com is that the site does not expect delimiters around the pattern.
Remove the slashes at the beginning and at the end of the pattern in the input box.
Write only \/-+ or /-+ since you don't need to escape the slashes there.
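For what it's worth, the original preg_replace() call should behave as intended when run in PHP itself. A quick sketch, using the sample string from the question:

```php
<?php
$subject = '/hi/--hello/-hi';

// With "/" as the delimiter, the slash inside the pattern must be escaped.
echo preg_replace('/\/-+/', '/', $subject), "\n"; // /hi/hello/hi

// With "~" as the delimiter, no escaping is needed.
echo preg_replace('~/-+~', '/', $subject), "\n";  // /hi/hello/hi
```

Picking a delimiter that does not appear in the pattern keeps it closer to what you would type into an online tester.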

regex to clean up url

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// instead of the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.
This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ then it would only match the double slashes that did have the : preceding them.
A lookahead and lookbehind tutorial can explain it all better than I can.
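The difference between the negative and positive lookbehind can be checked with a short sketch. The shortened URL is just an illustrative input, and ~ is used as the delimiter so the slashes need no escaping.

```php
<?php
$url = 'http://somesite.com/directory//sites';

// Negative lookbehind: collapse "//" only where it is NOT preceded by ":".
$cleaned = preg_replace('~(?<!:)//~', '/', $url);
echo $cleaned, "\n"; // http://somesite.com/directory/sites

// Positive lookbehind: count "//" only where it IS preceded by ":".
$afterColon = preg_match_all('~(?<=:)//~', $url);
echo $afterColon, "\n"; // 1
```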
Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just like in a Unix path, and most servers treat them the same as a single slash, so you may not need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0]; // array_shift() on a function result raises a by-reference notice
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.
Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
Off the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, what this does is look for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also matches the first pipe character and everything following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.
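As a rough sketch of how the single-regex approach could be wrapped and checked against the string from the question (the function name sanitizeUrl is an assumption, not something from the original code):

```php
<?php
// Hypothetical wrapper around the single-regex approach.
function sanitizeUrl(string $rawURL): string
{
    // Drop a "/" that is followed by another "/" unless a ":" precedes it,
    // and drop the first "|" together with everything after it.
    return preg_replace('~(?<!:)/(?=/)|\|.+~', '', $rawURL);
}

$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
echo sanitizeUrl($string), "\n";
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```

A URL that contains neither a doubled slash in the path nor a pipe should pass through unchanged.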

php regex vsprintf adding slash on file extension

I have a problem with the following piece of code: pastebin. For example:
/^\/index\.php\/index\/home\/(\w+)$/
It adds a backslash before the .php extension. Any ideas how to fix it?
Well, if you pass that example as the URI, I see that on line 10 you call preg_quote($uri). That should be the reason: since the dot (.) has a special meaning in regex, the function escapes it.
But I believe that is what you want: if you strip that backslash, your regex will match ANY character at that position (including the dot). So any of these would be valid:
indexBphp
index-php
indexmphp
index.php
etc...
A dot in regex means "match any character at this position", so I believe there is nothing wrong, right?
One way to fix this, if you still want an unescaped dot there, is to build the regex in two separate parts:
$urlDivided = explode('.php', $url);
$this->finalRegex = preg_quote($urlDivided[0]) . '.php' . preg_quote($urlDivided[1]);
Obviously, the method above assumes that you always have the '.php' extension in the url. You should do sanity checks.
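To see what preg_quote() does to the dot and why the escaped form is the stricter one, here is a small sketch. The variable names and test URIs are illustrative only; they are not taken from the pastebin code.

```php
<?php
$uri = '/index.php/index/home/';

// preg_quote() escapes regex metacharacters, plus the delimiter given as 2nd argument.
$quoted = preg_quote($uri, '/');
echo $quoted, "\n"; // \/index\.php\/index\/home\/

$escaped   = '/^' . $quoted . '(\w+)$/';
$unescaped = '/^\/index.php\/index\/home\/(\w+)$/'; // same pattern with a bare dot

var_dump(preg_match($escaped, '/index.php/index/home/abc'));   // int(1)
var_dump(preg_match($escaped, '/indexBphp/index/home/abc'));   // int(0)
var_dump(preg_match($unescaped, '/indexBphp/index/home/abc')); // int(1)
```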

Regex to Remove Everything After 4th Slash in URL

I'm working in PHP with friendly URL paths in the form of:
/2011/09/here-is-the-title
/2011/09/here-is-the-title/2
I need to standardize these URL paths to remove anything after the 4th slash, including the slash itself. The value after the 4th slash is sometimes a number, but can also be any parameter.
Any thoughts on how I could do this? I imagine regex could handle it, but I'm terrible with it. I also thought a combination of strpos and substr might be able to handle it, but cannot quite figure it out.
You can use the explode() function:
$parts = explode('/', '/2011/09/here-is-the-title/2');
$output = implode('/', array_slice($parts, 0, 4));
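Replace
%^((/[^/]*){3}).*%
with $1. (A g flag is only needed in testers like regexr; preg_replace() replaces every match by default and rejects g as an unknown modifier.)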
see http://regexr.com?2vlr8 for a live example
If your regex implementation supports arbitrary-length look-behind assertions you could replace
(?<=^[^/]*(/[^/]*){3})/.*$
with an empty string.
If it does not, you can replace
^([^/]*(?:/[^/]*){3})/.*$
with the contents of the first capturing group. A PHP example for the second one can be found at ideone.com.
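In PHP, the capture-group variant is the one to reach for: as far as I know, PCRE (which the preg_* functions use) does not accept an unbounded look-behind such as the first pattern. A minimal sketch, with a made-up helper name:

```php
<?php
// Hypothetical helper name; the pattern is the capture-group variant quoted above.
function trimAfterFourthSlash(string $path): string
{
    // Keep everything up to (not including) the 4th slash; "$1" is the first group.
    return preg_replace('~^([^/]*(?:/[^/]*){3})/.*$~', '$1', $path);
}

echo trimAfterFourthSlash('/2011/09/here-is-the-title/2'), "\n"; // /2011/09/here-is-the-title
echo trimAfterFourthSlash('/2011/09/here-is-the-title'), "\n";   // /2011/09/here-is-the-title
```

Paths with fewer than four slashes do not match the pattern, so they are returned unchanged.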
You could also use a loop:
$result = '';
$count = 0;
foreach (str_split($url) as $c) {
    if ($c === '/') $count++;  // count the slashes seen so far
    if ($count >= 4) break;    // stop at the 4th slash
    $result .= $c;
}

PHP Regular expression

I would like to capture the last folder in paths without the year. For this string path I would need just 'Millers Crossing' not 'Movies\Millers Crossing' which is what my current regex captures.
G:\Movies\Millers Crossing [1990]
preg_match('/\\\\(.*)\[\d{4}\]$/i', $this->parentDirPath, $title);
How about basename() and substr() instead of complicated expressions?
$title = rtrim(substr(basename($this->parentDirPath), 0, -6));
This assumes that there will always be a year in the format [xxxx] at the end of the string; rtrim() drops the space left in front of it.
(And it works on *nix too ;))
Update: You can still use basename to get the folder and then apply a regular expression:
$folder = basename($this->parentDirPath);
preg_match('#^(.*?)\s*(?:\[\d{4}\])?$#', $folder, $match);
$title = $match[1];
Try
preg_match('/\\\\([^\\\\]*)\[\d{4}\]$/i', $this->parentDirPath, $title);
Basically, instead of matching any character with ., you're matching any character but \.
It looks like you want something like this:
/([^\\]+)\s\[\d{4}\]$/
That's what I'd go with, at least. Should only include whatever comes after the last backslash in the string, and the movie title will be in the first capture group.
Simpler approach:
([^\\]*)\s?\[\d{4}\]$
Note that the four backslashes (\\\\) in your PHP string are not the problem by themselves: inside a single-quoted string that is what it takes to match one literal backslash. You can make life easier by using a negated character class, prefixed with a caret (^), to exclude the characters you don't want.
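Putting the negated-class idea into PHP quoting, a sketch might look like this. The lazy quantifier and \s* are my additions so that the space before the year stays out of the capture; the path is the one from the question.

```php
<?php
$path = 'G:\Movies\Millers Crossing [1990]';

// In a single-quoted PHP string, \\\\ yields the regex \\ (one literal backslash),
// so [^\\\\] is the class "anything except a backslash".
if (preg_match('/([^\\\\]+?)\s*\[\d{4}\]$/', $path, $m)) {
    echo $m[1], "\n"; // Millers Crossing
}
```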
