I want to create my personal realpath() function which uses regex and doesn't expect that file exists.
What I did so far
function my_realpath (string $path): string {
if ($path[0] != '/') {
$path = __DIR__.'/../../'.$path;
}
$path = preg_replace("~/\./~", '', $path);
$path = preg_replace("~\w+/\.\./~", '', $path); // removes ../ from path
return $path;
}
What is not correct
The problem is if I have this string:
"folders/folder1/folder5/../../folder2"
it removes only the first occurrence (folder5/../):
"folders/folder1/../folder2"
Question
How do I remove (with regex) all folders that are followed by the same number of "../" segments?
Examples
"folders/folder1/folder5/../../folder2" -> "folders/folder2"
"folders/folder1/../../../folder2" -> "../folder2"
"folders/folder1/folder5/../folder2" -> "folders/folder1/folder2"
Can we tell regex that: "~(\w+){n}/(../){n}~", n being greedy but same in both groups?
You can use a recursion-based pattern like
preg_replace('~(?<=/|^)(?!\.\.(?![^/]))[^/]+/(?R)?\.\.(?:/|$)~', '', $url)
See the regex demo. Details:
(?<=/|^) - immediately to the left, there must be / or start of string (if the strings are served as separate strings, this is equivalent to the more efficient (?<![^/]))
(?!\.\.(?![^/])) - immediately to the right, there should be no .. that are followed with / or end of string
[^/]+ - one or more chars other than /
/ - a / char
(?R)? - recurse the whole pattern, optionally
\.\.(?:/|$) - .. followed with a / char or end of string.
See the PHP demo:
$strings = ["folders/folder1/folder5/../../folder2", "folders/folder1/../../../folder2", "folders/folder1/folder5/../folder2"];
foreach ($strings as $url) {
echo preg_replace('~(?<=/|^)(?!\.\.(?![^/]))[^/\n]+/(?R)?\.\.(?:/|$)~', '', $url) . PHP_EOL;
}
// => folders/folder2, ../folder2, folders/folder1/folder2
Alternatively, you can use
(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)
See the regex demo. Details:
(?<![^/]) - immediately to the left, there must be start of string or a / char
(?!\.\.(?![^/])) - immediately to the right, there should be no .. that are followed with / or end of string
[^/]+ - one or more chars other than /
/\.\. - /.. substring followed with...
(?:/|$) - / or end of string.
See the PHP demo:
$strings = ["folders/folder1/folder5/../../folder2", "folders/folder1/../../../folder2", "folders/folder1/folder5/../folder2"];
foreach ($strings as $url) {
$count = 0;
do {
$url = preg_replace('~(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)~', '', $url, -1, $count);
} while ($count > 0);
echo "$url" . PHP_EOL;
}
The $count argument in preg_replace('~(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)~', '', $url, -1, $count) keeps the number of replacements, and the replacing goes on until no match is found.
Output:
folders/folder2
../folder2
folders/folder1/folder2
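If it helps, the loop can be wrapped in a small function. This is only a sketch: the name collapse_parent_segments is made up for illustration, and it only collapses "folder/.." pairs (it does not handle "/./" or turn a relative path into an absolute one).

```php
<?php
// Hypothetical helper (name chosen for this example only).
// Repeats the single-level replacement until nothing is left to collapse.
function collapse_parent_segments(string $path): string
{
    // a path segment that is not "..", followed by "/.." and then "/" or end of string
    $pattern = '~(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)~';
    do {
        // $count receives the number of replacements made in this pass
        $path = preg_replace($pattern, '', $path, -1, $count) ?? $path;
    } while ($count > 0);
    return $path;
}

echo collapse_parent_segments("folders/folder1/folder5/../../folder2"), PHP_EOL;
```

Each pass removes the innermost "folder/.." pairs, so nested cases need as many passes as they have levels.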
You could as well use a non-regex approach:
<?php
$strings = ["folders/folder1/folder5/../../folder2", "folders/folder1/../../../folder2", "folders/folder1/folder5/../folder2"];
function make_path($string) {
$parts = explode("/", $string);
$new_folder = [];
for ($i=0; $i<count($parts); $i++) {
// pop only a real folder; a leading ".." has to stay in the result
if ($parts[$i] == ".." and count($new_folder) >= 1 and end($new_folder) != "..") {
array_pop($new_folder);
} else {
$new_folder[] = $parts[$i];
}
}
return implode("/", $new_folder);
}
$new_folders = array_map('make_path', $strings);
print_r($new_folders);
?>
This yields
Array
(
[0] => folders/folder2
[1] => ../folder2
[2] => folders/folder1/folder2
)
See a demo on ideone.com.
So, I have a URL I am receiving via an API.
The pattern would be similar to https://www.example.co.uk/category/000000000k/0000
I want to replace the second k character with an x.
I tried using str_replace() but it also replaces the first k character.
So what I did was use preg_split(), so my solution is:
$url = preg_split( '/[\/]+/', 'https://www.example.co.uk/category/000000000k/0000' );
$url = $url[0] . '//' . $url[1] . '/' . $url[2] . '/' . str_replace( 'k', 'x', $url[3] ) . '/' . $url[4];
So, my solution works as long as the URL pattern does not change. However, I think it could be more elegant if my regex skills were up to par.
Could anyone here point me to a better approach?
If k/ is always preceded by a digit and no other part of the URL has the same pattern, this may work.
$url = "https://www.example.co.uk/category/000000000k/0000";
$url = preg_replace("/([0-9])k\//", "$1x/", $url);
Another approach: find the third /, take the substring after it, and replace there.
$url = "https://www.example.co.uk/category/000000000k/0000";
$pos = 0;
for ($i = 0; $i < 3; $i++) {
$pos = strpos($url, "/", $pos);
if (FALSE === $pos) {
// error
break;
}
$pos++;
}
if ($pos) {
$url_tmp = substr($url, $pos);
$url_tmp = str_replace("k/", "x/", $url_tmp);
$url = substr($url, 0, $pos).$url_tmp;
} else {
// $pos is 0 or FALSE
// error
}
If the URL source is not reliable, more checks may be needed.
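As a rough idea of what "more checks" could look like, here is a sketch built on parse_url(). The function name is hypothetical, and it assumes the URL has only a scheme, a host and a path (a port, query string or fragment would be dropped) and that the segment to change is the second one and consists of digits followed by k.

```php
<?php
// Hypothetical helper: returns the rewritten URL, or null if the URL
// does not look like scheme://host/category/<digits>k/...
function replace_segment_suffix(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
        return null; // not a usable absolute URL
    }
    // "/category/000000000k/0000" => ['', 'category', '000000000k', '0000']
    $segments = explode('/', $parts['path']);
    if (count($segments) < 3 || !preg_match('/^[0-9]+k$/', $segments[2])) {
        return null; // the expected segment is missing or has another shape
    }
    // swap the trailing "k" for "x" in that one segment only
    $segments[2] = substr($segments[2], 0, -1) . 'x';
    return $parts['scheme'] . '://' . $parts['host'] . implode('/', $segments);
}

var_dump(replace_segment_suffix("https://www.example.co.uk/category/000000000k/0000"));
```

Returning null instead of a half-modified string makes it easy for the caller to detect input that did not match the expected shape.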
$url = "https://www.example.co.uk/category/000000000k/0000";
echo preg_replace("/^[^k]+k[^k]+\Kk/", "x", $url),"\n";
Output:
https://www.example.co.uk/category/000000000x/0000
Explanation:
/ # regex delimiter
^ # beginning of line
[^k]+ # 1 or more any character that is not k
k # 1st letter k
[^k]+ # 1 or more any character that is not k
\K # forget all we have seen until this position
k # 2nd letter k
/ # regex delimiter
I want to extract the middle part of a string between a starting tag and an ending tag in PHP.
$str = 'Abc/hello#gmail.com/1267890(A-29)';
$agcodedup = substr($str, '(', -1);
$agcode = substr($agcodedup, 1);
Final expected value of $agcode:
$agcode = 'A-29';
You can use preg_match
$str = 'Abc/hello#gmail.com/1267890(A-29)';
if( preg_match('/\(([^)]+)\)/', $str, $match ) ) echo $match[1]."\n\n";
Outputs
A-29
You can check it out here
http://sandbox.onlinephpfunctions.com/code/5b6aa0bf9725b62b87b94edbccc2df1d73450ee4
Basically, the regular expression says:
\( - start of the match, a literal opening parenthesis
( ... ) - capture group
[^)]+ - one or more characters other than a closing parenthesis
\) - end of the match, a literal closing parenthesis
Oh and if you really have your heart set on substr here you go:
$str = 'Abc/hello#gmail.com/1267890(A-29)';
//this is the location/index of the ( OPEN_PAREN
//strpos returns the index of the ( itself, so we add 1 to start just after it
$start = strpos( $str,'(') +1;
//this is the location/index of the ) CLOSE_PAREN.
$end = strpos( $str,')');
//we need the length of the substring for the third argument, not its index
$len = ($end-$start);
echo substr($str, $start, $len );
Outputs
A-29
And you can test this here
http://sandbox.onlinephpfunctions.com/code/88723be11fc82d88316d32a522030b149a4788aa
If it was me, I would benchmark both methods, and see which is faster.
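A quick way to compare the two could look like the sketch below. The function names and the iteration count are made up for this example, hrtime() needs PHP 7.3 or later, and the timings will vary from machine to machine, so treat the numbers as a rough indication only.

```php
<?php
// Hypothetical wrappers around the two approaches shown above.
function extract_with_regex(string $str): ?string
{
    return preg_match('/\(([^)]+)\)/', $str, $m) ? $m[1] : null;
}

function extract_with_substr(string $str): ?string
{
    $start = strpos($str, '(');
    $end = ($start === false) ? false : strpos($str, ')', $start);
    if ($start === false || $end === false) {
        return null; // no "(...)" pair found
    }
    return substr($str, $start + 1, $end - $start - 1);
}

// Runs $fn repeatedly and returns the elapsed time in milliseconds.
function time_calls(callable $fn, string $input, int $iterations = 100000): float
{
    $t0 = hrtime(true); // nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return (hrtime(true) - $t0) / 1e6;
}

$str = 'Abc/hello#gmail.com/1267890(A-29)';
printf("regex:  %.2f ms\n", time_calls('extract_with_regex', $str));
printf("substr: %.2f ms\n", time_calls('extract_with_substr', $str));
```

For a single short string the difference is unlikely to matter, so readability is probably the better criterion unless this runs in a hot loop.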
This may help you.
function getStringBetween($str, $from, $to, $withFromAndTo = false)
{
    // everything after the first occurrence of $from
    $sub = substr($str, strpos($str, $from) + strlen($from));
    // cut at the last occurrence of $to
    if ($withFromAndTo) {
        return $from . substr($sub, 0, strrpos($sub, $to)) . $to;
    } else {
        return substr($sub, 0, strrpos($sub, $to));
    }
}
$inputString = "Abc/hello#gmail.com/1267890(A-29)";
$outputString = getStringBetween($inputString, '(', ')');
echo $outputString;
//output will be A-29
$outputString = getStringBetween($inputString, '(', ')', true);
echo $outputString;
//output will be (A-29)
How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone review whether this code is good enough? And also explain why there are so many backslashes in this part: $to_replace = ['/\\\\\*/']; (I found this exact replacement on the Internet.)
If you know the format beforehand you can use preg_match. For example, in the first pattern you know %node can only be a number. Matching several patterns is just as easy: store each regex in the array:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice how I added the dollar sign $ to the end of the first rule; this ensures that it does not also match URLs meant for the second rule.
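One caveat with the table above: the other rules are not anchored, so for example '|news|' also matches "node/11111/news". A possible variation, shown here only as a sketch with a reduced set of rules and a made-up function name, is to anchor every regex with ^ and $:

```php
<?php
// Hypothetical helper: returns every pattern whose anchored regex matches $url.
function matching_patterns(string $url): array
{
    $patterns = [
        'node/%node'      => '|^node/[0-9]+$|',
        'node/%node/news' => '|^node/[0-9]+/news$|',
        'news'            => '|^news$|',
    ];
    $hits = [];
    foreach ($patterns as $pattern => $regex) {
        if (preg_match($regex, $url) === 1) {
            $hits[] = $pattern;
        }
    }
    return $hits;
}

print_r(matching_patterns("node/11111/news"));
```

With both anchors in place each URL can match at most one of these three rules, so the order of the array no longer matters.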
Here is a more generic version of the solution above:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/', '/hello/', ''] (note the trailing empty string)
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can convert the code above into a function and call it for each pattern in the array. The first step uses a regex to replace every variable with *, and the pattern is then exploded on *. Finally, loop over the resulting pieces and compare each one against the URL with a simple string comparison to see whether the pattern matches.
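A different way to get the same result, offered only as a sketch, is to build one anchored regex from the pattern and let preg_match() do the comparison. The function name match_route is hypothetical, and the code assumes that tokens look like :name (lowercase letters only) and that no token name is used twice in one pattern.

```php
<?php
// Hypothetical helper: returns the captured tokens, or null if $url does not match.
function match_route(string $pattern, string $url): ?array
{
    // Split on ":name" tokens; PREG_SPLIT_DELIM_CAPTURE keeps the names in the result.
    // "/node/:id/hello/:test" => ['/node/', 'id', '/hello/', 'test', '']
    $pieces = preg_split('/:([a-z]+)/', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($pieces as $i => $piece) {
        // even indexes hold literal text, odd indexes hold token names
        $regex .= ($i % 2 === 0)
            ? preg_quote($piece, '~')
            : '(?P<' . $piece . '>[^/]+)';
    }
    if (preg_match('~^' . $regex . '$~', $url, $m) !== 1) {
        return null;
    }
    // keep only the named captures, drop the numeric duplicates
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(match_route("/node/:id/hello/:test", "/node/123/hello/strText"));
```

Because the regex is anchored at both ends, a URL with extra trailing segments is rejected instead of being treated as a match.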
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
I am trying to remove the # sign from a block of text. The problem is that in certain cases (when at the beginning of a line), the # sign needs to stay.
I have partly succeeded by using the regex pattern .\#; however, when the # sign does get removed, the character preceding it is removed as well.
Goal: remove all # signs UNLESS the # sign is the first character in the line.
<?php
function cleanFile($text)
{
$pattern = '/.\#/';
$replacement = '%40';
$val = preg_replace($pattern, $replacement, $text);
$text = $val;
return $text;
};
$text = ' Test: test#test.com'."\n";
$text .= '#Test: Leave the leading at sign alone'."\n";
$text .= '#Test: test#test.com'."\n";
$valResult = cleanFile($text);
echo $valResult;
?>
Output:
Test: tes%40test.com
#Test: Leave the leading at sign alone
#Test: tes%40test.com
You can do this with regex using a negative lookbehind: /(?<!^)#/m (an # sign not preceded by the start of a line (or the start of the string if you skip out the m modifier)).
In code:
<?php
$string = "Test: test#test.com\n#Test: Leave the leading at sign alone\n#Test: test#test.com;";
$string = preg_replace("/(?<!^)#/m", "%40", $string);
var_dump($string);
?>
which outputs the following:
string(84) "Test: test%40test.com
#Test: Leave the leading at sign alone
#Test: test%40test.com;"
There's no need for a regexp in such a simple case.
function clean($source) {
$prefix = '';
$offset = 0;
if( $source[0] == '#' ) {
$prefix = '#';
$offset = 1;
}
return $prefix . str_replace('#', '', substr( $source, $offset ));
}
and test case
$test = array( '#foo#bar', 'foo#bar' );
foreach( $test as $src ) {
echo $src . ' => ' . clean($src) . "\n";
}
would give:
#foo#bar => #foobar
foo#bar => foobar
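Note that clean() looks at the whole string as one line and removes the sign instead of replacing it. If the input has several lines and the sign should become %40 as in the question, a line-by-line variant might look like this (a sketch only; the name clean_lines and the "\n" line separator are assumptions):

```php
<?php
// Hypothetical helper: keeps the first character of every line untouched
// and replaces "#" with $replacement in the rest of the line.
function clean_lines(string $text, string $replacement = '%40'): string
{
    $lines = explode("\n", $text);
    foreach ($lines as &$line) {
        if ($line === '') {
            continue; // nothing to do for empty lines
        }
        $line = $line[0] . str_replace('#', $replacement, substr($line, 1));
    }
    unset($line); // break the reference left over from the loop
    return implode("\n", $lines);
}

echo clean_lines("Test: test#test.com\n#Test: test#test.com"), "\n";
```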
The syntax [^...] means a negated character class (match anything except the listed characters), but I don't think the following would work:
$pattern = '/[^]^#/';
I'm writing a quick preg_replace to strip comments from CSS. CSS comments usually have this syntax:
/* Development Classes*/
/* Un-comment me for easy testing
(will make it simpler to see errors) */
So I'm trying to kill everything between /* and */, like so:
$pattern = "#/\*[^(\*/)]*\*/#";
$replace = "";
$v = preg_replace($pattern, $replace, $v);
No dice! It seems to be choking on the forward slashes, because I can get it to remove the text of comments if I take the /s out of the pattern. I tried some simpler patterns to see if I could just lose the slashes, but they return the original string unchanged:
$pattern = "#/#";
$pattern = "/\//";
Any ideas on why I can't seem to match those slashes? Thanks!
Here's a solution:
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n",
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
Taken from the Script/Stylesheet Pre-Processor in Samstyle PHP Framework
See: http://code.google.com/p/samstyle-php-framework/source/browse/trunk/sp.php
csstest.php:
<?php
$buffer = file_get_contents('test.css');
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n",
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
echo $buffer;
?>
test.css:
/* testing to remove this */
.test{}
Output of csstest.php:
.test{}
I don't believe you can use grouping within a negated character class like you have there; inside [...] the parentheses are just literal characters. What you want are assertions, of which there are two types: look-ahead and look-behind.
The pattern you're looking for, in English, is basically: "forward slash, literal asterisk, then zero or more of: any character that isn't followed by a forward slash, or anything other than a literal asterisk that is followed by a forward slash, or a forward slash that isn't preceded by a literal asterisk; then literal asterisk, forward slash".
<?php
$str = '/* one */ onemore
/*
* a
* b
**/
stuff // single line
/**/';
preg_match_all('#/\*(?:.(?!/)|[^\*](?=/)|(?<!\*)/)*\*/#s', $str, $matches);
print_r($matches);
?>
I had the same issue.
To solve it, I first simplified the code by replacing "/*" and "*/" with different identifiers and then used those as the start and end markers.
$code = str_replace("/*","_COMSTART",$code);
$code = str_replace("*/","COMEND_",$code);
$code = preg_replace("/_COMSTART.*?COMEND_/s","",$code);
The s modifier makes . match newline characters as well, so a comment that spans several lines is matched too.
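For what it's worth, the same lazy match with the s modifier can also be written directly against /* and */ without the placeholder step, as long as the regex delimiter is not a slash. A minimal sketch (the function name is made up, and like the version above it does not try to recognise comment markers inside CSS strings):

```php
<?php
// Hypothetical helper: removes every /* ... */ block, including multi-line ones.
function strip_css_comments(string $css): string
{
    // "~" as delimiter so the slashes need no escaping;
    // .*? is lazy, and the s modifier lets . match newlines
    return preg_replace('~/\*.*?\*/~s', '', $css) ?? $css;
}

echo strip_css_comments("/* a */.test{}/* b\nc */"), "\n";
```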
There's a number of suggestions out there, but this one seems to work for me:
$v=preg_replace('!/\*[^*]*\*+([^/][^*]*\*+)*/!', '', $v);
so
"/* abc */.test { color:white; } /* XYZ */.test2 { padding:1px; /* DEF */} /* QWERTY */"
gives
.test { color:white; } .test2 { padding:1px; }
see https://onlinephp.io/c/2ae1c for working test
Just for fun (and for a small project, of course) I made a non-regex version of this code (I hope it's faster):
function removeCommentFromCss( $textContent )
{
$clearText = "";
$charsInCss = strlen( $textContent );
$searchForStart = true;
for( $index = 0; $index < $charsInCss; $index++ )
{
if ( $searchForStart )
{
if ( $textContent[ $index ] == "/" && (( $index + 1 ) < $charsInCss ) && $textContent[ $index + 1 ] == "*" )
{
$searchForStart = false;
$index++; // skip the "*" as well, so that "/*/" is not read as a complete comment
continue;
}
else
{
$clearText .= $textContent[ $index ];
}
}
else
{
if ( $textContent[ $index ] == "*" && (( $index + 1 ) < $charsInCss ) && $textContent[ $index + 1 ] == "/" )
{
$searchForStart = true;
$index++;
continue;
}
}
}
return $clearText;
}