Please help me write regex replace - php

I've not yet mastered regex, so would appreciate your help with the code.
I need to replace all lines that:
start with brackets or parentheses;
which may contain either a regular number of up to 3 digits or a combination of up to 3 letters;
which may be followed by a period;
which digis or the numbers may or may not be inside or tags.
Here's the example of what it needs to be replaced with:
(1)blahblah => %%(1)|blahblah
(<i>iv</i>.) blahblah => %%(<i>iv</i>.)|blahblah
[b] some stuff => %%[b]| some stuff
So the regex will need to recognize if it needs to be applied to the particular string, and if yes, put %% in the beginning of the line, then put the stuff inside the brackets, then put a pipe | (if there is a space between the brackets and the rest of the text, delete the space), and finally place the rest of the line.
So, let's assume I have an array that I'm trying to run through the function that will either process the string (if it matches the criteria), or return it unchanged.
I only need to know how to write the function.
Thanks

function my_replace ($str) {
$expr = '~
(
# opening bracket or paren
(?:\(|\[)
# optional opening tag
(?:<([a-z])>)?
# either up to 3 digits or up to 3 alphas
(?:[a-z]{1,3}|[0-9]{1,3})
# optional closing tag
(?:</\2>)?
# optional dot
\.?
# closing bracket or paren
(?:\)|])
)
# optional whitespace
\s*
# grab the rest of the string
(.+)
~ix';
return preg_replace($expr, '%%$1|$3', $str);
}
See it working

Here is my version.
It uses the fallback regular expression if the first one doesn't match (as agreed upon previously).
Demo
Code:
<?php
function do_replace($string) {
$regex = '/^(\((?:<([a-z])>)?(\d{0,3}|[a-z]{1,3})(?:<\/\2>)?(\.)?\)|\[(?:<([a-z])>)?(\d{0,3}|[a-z]{1,3})(?:<\/\2>)?(\.)?\])\s*(.*)/i';
$result = preg_match($regex, $string);
if($result) {
return preg_replace($regex, '%%$1|$8', $string);
} else {
$regex = '/^(\d{0,3}|[a-z]{1,3})\.\s*(.+)$/i';
$result = preg_match($regex, $string);
if($result) {
return preg_replace($regex, '%%$1.|$2', $string);
} else {
return $string;
}
}
}
$strings = array(
'(1)blahblah',
'(<i>iv</i>.) blahblah',
'[b] some stuff',
'25. blahblah',
'A. some other stuff. one',
'blah. some other stuff',
'text (1) text',
'2008. blah',
'[123) <-- mismatch'
);
foreach($strings as $string) echo do_replace($string) . PHP_EOL;
?>
First regular expression expanded:
$regex = '
/
^(
\(
(?:<([a-z])>)?
(
\d{0,3}
|
[a-z]{1,3}
)
(?:<\/\2>)?
(\.)?
\)
|
\[
(?:<([a-z])>)?
(
\d{0,3}
|
[a-z]{1,3}
)
(?:<\/\2>)?
(\.)?
\]
)
\s*
(.*)
/ix';

function replaceString($string){
return preg_replace('/^\s*([\{,\[,\(]+?)/', "%%$1", $string);
}

Related

PHP: Parse comma-delimited string outside single and double quotes and parentheses

I've found several partial answers to this question, but none that cover all my needs...
I am trying to parse a user generated string as if it were a series of php function arguments to determine the number of arguments:
This string:
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
will be inserted as the arguments of a function:
function my_function( [insert string here] ){ ... }
I need to parse the string on the commas, taking into account single- and double-quotes, parentheses, and escaped quotes and parentheses to create an array:
array(4) {
[0] => $arg1
[1] => $arg2='ABC,DEF'
[2] => $arg3="GHI\",JKL"
[3] => $arg4=array(1,'2)',"3\"),")
}
Any help with a regular expression or parser function to accomplish this is appreciated!
It isn't possible to solve this problem with a classical csv tool since there is more than one character able to protect parts of the string.
Using preg_split is possible but will result in a very complicated and inefficient pattern. So the best way is to use preg_match_all. There are however several problems to solve:
as needed, a comma enclosed in quotes or parenthesis must be ignored (seen as a character without special meaning, not as a delimiter)
you need to extract the params, but you need to check if the string has the good format too, otherwise the match results may be totally false!
For the first point, you can define subpatterns to describe each cases: the quoted parts, the parts enclosed between parenthesis, and a more general subpattern able to match a complete param and that uses the two previous subpatterns when needed.
Note that the parenthesis subpattern needs to refer to the general subpattern too, since it can contain anything (and commas too).
The second point can be solved using the \G anchor that ensures that all matchs are contiguous. But you need to be sure that the end of the string has been reached. To do that, you can add an optional empty capture group at the end of the main pattern that is created only if the anchor for the end of the string \z succeeds.
$subject = <<<'EOD'
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
EOD;
$pattern = <<<'EOD'
~
# named groups definitions
(?(DEFINE) # this definition group allows to define the subpatterns you want
# without matching anything
(?<quotes>
' [^'\\]*+ (?s:\\.[^'\\]*)*+ ' | " [^"\\]*+ (?s:\\.[^"\\]*)*+ "
)
(?<brackets> \( \g<content> (?: ,+ \g<content> )*+ \) )
(?<content> [^,'"()]*+ # ' # (<-- comment for SO syntax highlighting)
(?:
(?: \g<brackets> | \g<quotes> )
[^,'"()]* # ' #
)*+
)
)
# the main pattern
(?: # two possible beginings
\G(?!\A) , # a comma contiguous to a previous match
| # OR
\A # the start of the string
)
(?<param> \g<content> )
(?: \z (?<check>) )? # create an item "check" when the end is reached
~x
EOD;
$result = false;
if ( preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER) &&
isset(end($matches)['check']) )
$result = array_map(function ($i) { return $i['param']; }, $matches);
else
echo 'bad format' . PHP_EOL;
var_dump($result);
demo
You could split the argument string at ,$ and then append $ back the array values:
$args_array = explode(',$', $arg_str);
foreach($args_array as $key => $arg_raw) {
$args_array[$key] = '$'.ltrim($arg_raw, '$');
}
print_r($args_array);
Output:
(
[0] => $arg1
[1] => $arg2='ABC,DEF'
[2] => $arg3="GHI\",JKL"
[3] => $arg4=array(1,'2)',"3\"),")
)
If you want to use a regex, you can use something like this:
(.+?)(?:,(?=\$)|$)
Working demo
Php code:
$re = '/(.+?)(?:,(?=\$)|$)/';
$str = "\$arg1,\$arg2='ABC,DEF',\$arg3=\"GHI\",JKL\",\$arg4=array(1,'2)',\"3\"),\")\n";
preg_match_all($re, $str, $matches);
Match information:
MATCH 1
1. [0-5] `$arg1`
MATCH 2
1. [6-21] `$arg2='ABC,DEF'`
MATCH 3
1. [22-39] `$arg3="GHI\",JKL"`
MATCH 4
1. [40-67] `$arg4=array(1,'2)',"3\"),")`

How to manipulate a string so I can make implicit multiplication explicit in a math expression?

I want to manipulate a string like "...4+3(4-2)-...." to become "...4+3*(4-2)-....", but of course it should recognize any number, d, followed by a '(' and change it to 'd*('. And I also want to change ')(' to ')*(' at the same time if possible. Would nice if there is a possibility to add support for constants like pi or e too.
For now, I just do it this stupid way:
private function make_implicit_multiplication_explicit($string)
{
$i=1;
if(strlen($string)>1)
{
while(($i=strpos($string,"(",$i))!==false)
{
if(strpos("0123456789",substr($string,$i-1,1)))
{
$string=substr_replace($string,"*(",$i,1);
$i++;
}
$i++;
}
$string=str_replace(")(",")*(",$string);
}
return $string;
}
But I Believe this could be done much nicer with preg_replace or some other regex function? But those manuals are really cumbersome to grasp, I think.
Let's start by what you are looking for:
either of the following: ((a|b) will match either a or b)
any number, \d
the character ): \)
followed by (: \(
Which creates this pattern: (\d|\))\(. But since you want to modify the string and keep both parts, you can group the \( which results in (\() making it worse to read but better to handle.
Now everything left is to tell how to rearrange, which is simple: \\1*\\2, leaving you with code like this
$regex = "/(\d|\))(\()/";
$replace = "\\1*\\2";
$new = preg_replace($regex, $replace, $test);
To see that the pattern actually matches all cases, see this example.
To recognize any number followed by a ( OR a combination of a )( and place an asterisk in between them, you can use a combination of lookaround assertions.
echo preg_replace("/
(?<=[0-9)]) # look behind to see if there is: '0' to '9', ')'
(?=\() # look ahead to see if there is: '('
/x", '*', '(4+3(4-2)-3)(2+3)');
The Positive Lookbehind asserts that what precedes is either a number or right parentheses. While the Positive Lookahead asserts that the preceding characters are followed by a left parentheses.
Another option is to use the \K escape sequence in replace of the Lookbehind. \K resets the starting point of the reported match. Any previously consumed characters are no longer included ( throws away everything that it has matched up to that point. )
echo preg_replace("/
[0-9)] # any character of: '0' to '9', ')'
\K # resets the starting point of the reported match
(?=\() # look ahead to see if there is: '('
/x", '*', '(4+3(4-2)-3)(2+3)');
Your php code should be,
<?php
$mystring = "4+3(4-2)-(5)(3)";
$regex = '~\d+\K\(~';
$replacement = "*(";
$str = preg_replace($regex, $replacement, $mystring);
$regex1 = '~\)\K\(~';
$replacement1 = "*(";
echo preg_replace($regex1, $replacement1, $str);
?> //=> 4+3*(4-2)-(5)*(3)
Explanation:
~\d+\K\(~ this would match the one or more numbers followed by a (. Because of \K it excludes the \d+
Again it replaces the matched part with *( which in turn produces 3*( and the result was stored in another variable.
\)\K\( Matches )( and excludes the first ). This would be replaced by *( which in turn produces )*(
DEMO 1
DEMO 2
Silly method :^ )
$value = '4+3(4-2)(1+2)';
$search = ['1(', '2(', '3(', '4(', '5(', '6(', '7(', '8(', '9(', '0(', ')('];
$replace = ['1*(', '2*(', '3*(', '4*(', '5*(', '6*(', '7*(', '8*(', '9*(', '0*(', ')*('];
echo str_replace($search, $replace, $value);

preg_replace_callback function name with parameter in string

I am trying to use preg_replace_callback() to call any function with its parameter(s) embedded in a string.
$string = "some text ucfirst('asd')";
$pattern = "~ucfirst([a-z]+)\(\)~";
$string = preg_replace_callback($pattern, "ucasef", $string);
echo $string; // some text Asd
I need some help with the pattern but also with how to use it to accomplish the example output.
This is how you may use it, I've added some comments to clarify the code:
$input = "some text ucfirst('name') and strtoupper (\"shout\" ). Maybe also make it strtolower( 'LOWER') or do('nothing').";
$pattern = '~
(\w+) # Match the function name and put it in group 1
\s*\(\s* # Some optional whitespaces around (
("|\') # Match either a double or single quote and put it in group 2
(.*?) # Match anything, ungreedy until ...
\2 # Match what was matched in group 2
\s*\) # Some optional whitespaces before )
~xs'; # XS modifiers, x to make this fancy formatting/commenting and s to match newlines with the dot "."
$output = preg_replace_callback($pattern, function($v){
$allowed = array('strtolower', 'strtoupper', 'ucfirst', 'ucwords'); // Allowed functions
if(in_array(strtolower($v[1]), $allowed)){ // Check if the function used is allowed
return call_user_func($v[1], $v[3]); // Use it
}else{
return $v[0]; // return the original value, you might use something else
}
}, $input);
echo $output;
Output: some text Name and SHOUT. Maybe also make it lower or do('nothing').

Looping within a regular expression

can regex able to find a patter to this?
{{foo.bar1.bar2.bar3}}
where in the groups would be
$1 = foo $2 = bar1 $3 = bar2 $4 = bar3 and so on..
it would be like re-doing the expression over and over again until it fails to get a match.
the current expression i am working on is
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link from regexr.
http://regexr.com?3203h
--
ok I guess i didn't explain well what I'm trying to achieve here.
let's say I am trying to replace all
.barX inside a {{foo . . . }}
my expected results should be
$foo->bar1->bar2->bar3
This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.
I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace("#bar\d#, "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"
I don't know the correct syntax in PHP, for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
That would capture the first hit in the first capturing group and the rest in second capturing group. regexr.com is lacking the ability to show that as far as I can see though. Try out Expresso, and you'll see what I mean.

PHP: Get last Tag of a String with Regular Expressions

Quite simple problem (but difficult solution): I got a string in PHP like as follows:
['one']['two']['three']
And from this, i must extract the last tags, so i finally got three
it is also possible that there is a number, like
[1][2][3]
and then i must get 3
How can i solve this?
Thanks for your help!
Flo
Your tag is \[[^\]]+\].
3 Tags are: (\[[^\]]+\]){3}
3 Tags at end are: (\[[^\]]+\]){3}$
N Tags at end are: (\[[^\]]+\])*$ (N 0..n)
Example:
<?php
$string = "['one']['two']['three'][1][2][3]['last']";
preg_match("/((?:\[[^\]+]*\]){3})$/", $string, $match);
print_r($match); // Array ( [0] => [2][3]['last'] [1] => [2][3]['last'] )
This tested code may work for you:
function getLastTag($text) {
$re = '/
# Match contents of last [Tag].
\[ # Literal start of last tag.
(?: # Group tag contents alternatives.
\'([^\']+)\' # Either $1: single quoted,
| (\d+) # or $2: un-quoted digits.
) # End group of tag contents alts.
\] # Literal end of last tag.
\s* # Allow trailing whitespace.
$ # Anchor to end of string.
/x';
if (preg_match($re, $text, $matches)) {
if ($matches[1]) return $matches[1]; // Either single quoted,
if ($matches[2]) return $matches[2]; // or non quoted digit.
}
return null; // No match. Return NULL.
}
Here is a regex that may work for you. Try this:
[^\[\]']*(?='?\]$)

Categories