I have the following content
"aa_bb" : "foo"
"pp_Qq" : "bar"
"Xx_yY_zz" : "foobar"
And I want to convert the keys on the left side to camelCase:
"aaBb" : "foo"
"ppQq" : "bar"
"xxYyZz" : "foobar"
And the code:
// selects the left part
$newString = preg_replace_callback("/\"(.*?)\"(.*?):/", function($matches) {
// selects the characters following underscores
$matches[1] = preg_replace_callback("/_(.?)/", function($matches) {
//removes the underscore and uppercases the character
return strtoupper($matches[1]);
}, $matches[1]);
// lowercases the first character before returning
return "\"".lcfirst($matches[1])."\" : ".$matches[2];
}, $string);
Can this code be simplified?
Note: The content will always be a single string.
First, since you already have working code that you want to improve, consider posting your question on Code Review instead of Stack Overflow next time.
Let's start to improve your original approach:
$result = preg_replace_callback('~"[^"]*"\s*:~', function ($m) {
return preg_replace_callback('~_+(.?)~', function ($n) {
return strtoupper($n[1]);
}, strtolower($m[0]));
}, $str);
pro: the patterns are relatively simple and the idea is easy to understand.
cons: nested preg_replace_callback calls may hurt the eyes.
After this eye warm-up exercise, we can try a \G-based pattern:
$pattern = '~(?|\G(?!^)_([^_"]*)|("(?=[^"]*"\s*:)[^_"]*))~';
$result = preg_replace_callback($pattern, function ($m) {
return ucfirst(strtolower($m[1]));
}, $str);
pro: the code is shorter; there is no need for two preg_replace_callback calls.
cons: the pattern is far more complicated.
note: when you write a long pattern, nothing prevents you from using free-spacing mode (the x modifier) and adding comments:
$pattern = '~
(?| # branch reset group: capture groups in different branches share the same number
\G # contiguous to the last successful match
(?!^) # but not at the start of the string
_
( [^_"]* ) # capture group 1
|
( # capture group 1
"
(?=[^"]*"\s*:) # lookahead to check if it is the "key part"
[^_"]*
)
)
~x';
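For example, the commented pattern can be dropped into the same callback as the one-line version. A minimal sketch, assuming the question's sample content is held in $str:

```php
<?php
// Sample input from the question (keys on the left, values on the right).
$str = '"aa_bb" : "foo"
"pp_Qq" : "bar"
"Xx_yY_zz" : "foobar"';

// Same pattern as the one-line \G version, written in free-spacing (x) mode.
$pattern = '~
    (?|                      # branch reset: both branches fill group 1
        \G (?!^) _           # an underscore right after the previous match
        ( [^_"]* )           # group 1: the next chunk of the key
    |
        ( " (?=[^"]*"\s*:)   # group 1: a quote that opens a key...
          [^_"]* )           # ...plus the first chunk of the key
    )
~x';

$result = preg_replace_callback($pattern, function ($m) {
    // Only group 1 is returned, so the underscore (if any) is dropped.
    return ucfirst(strtolower($m[1]));
}, $str);

echo $result;
```

This should print the three lines with "aaBb", "ppQq" and "xxYyZz" as keys; whitespace and comments inside the pattern are ignored because of the x modifier.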
Are there compromises between these two extremes, and which one is best? Two suggestions:
$result = preg_replace_callback('~"[^"]+"\s*:~', function ($m) {
return array_reduce(explode('_', strtolower($m[0])), function ($c, $i) {
return $c . ucfirst($i);
});
}, $str);
pro: minimal use of regex.
cons: it still needs two callback functions, except that this time the second one is called by array_reduce rather than by preg_replace_callback.
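A variant of this first suggestion (not part of the original answer) avoids the second closure: array_map() with the built-in ucfirst passed as a string callable replaces array_reduce(). A sketch, assuming the question's sample content is in $str:

```php
<?php
$str = '"aa_bb" : "foo"
"pp_Qq" : "bar"
"Xx_yY_zz" : "foobar"';

$result = preg_replace_callback('~"[^"]+"\s*:~', function ($m) {
    // Lowercase the key, split on underscores, uppercase the first character
    // of each piece (a no-op on the leading quote), then glue the pieces back.
    return implode('', array_map('ucfirst', explode('_', strtolower($m[0]))));
}, $str);

echo $result;
```

The trade-off is the same as above: very little regex, at the cost of a few nested function calls.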
$result = preg_replace_callback('~["_][^"_]*(?=[^"]*"\s*:)~', function ($m) {
return ucfirst(strtolower(ltrim($m[0], '_')));
}, $str);
pro: the pattern is relatively simple and the callback function stays simple too. It looks like a good compromise.
cons: the pattern isn't very restrictive (but it should suffice for your use case).
pattern description: the pattern looks for a _ or a " and matches the following characters that aren't a _ or a ". A lookahead assertion then checks that these characters are inside the key part by looking for a closing quote followed by a colon. The match is always of the form _aBc or "aBc (the leading underscore is trimmed in the callback function, and the " is unaffected by ucfirst).
pattern details:
["_] # one " or _
[^"_]* # zero or more characters that aren't " or _
(?= # open a lookahead assertion (followed by)
[^"]* # all that isn't a "
" # a literal "
\s* # optional whitespace
: # a literal :
) # close the lookahead assertion
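To see the pieces described above, the following sketch lists what the compromise pattern matches before the callback runs, then applies the callback. The two-line input is an assumed excerpt of the question's sample:

```php
<?php
$str = '"aa_bb" : "foo"
"Xx_yY_zz" : "foobar"';

// Every piece starts with " or _ and stops before the next " or _.
preg_match_all('~["_][^"_]*(?=[^"]*"\s*:)~', $str, $matches);
print_r($matches[0]); // "aa, _bb, "Xx, _yY, _zz

// Trim the leading underscore, lowercase, then uppercase the first character.
$result = preg_replace_callback('~["_][^"_]*(?=[^"]*"\s*:)~', function ($m) {
    return ucfirst(strtolower(ltrim($m[0], '_')));
}, $str);

echo $result;
```

The values on the right are never matched because no closing quote followed by a colon comes after them.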
There's no single best answer: what looks simple or complicated really depends on the reader.
You could use preg_replace_callback in combination with the \G anchor and capturing groups.
(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)
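In parts:
(?: Non-capturing group
"\K([^_\r\n]+) Match " (then \K drops it from the reported match), capture group 1: 1+ times any char except _ or a newline
| Or
\G(?!^) Assert position at the end of the previous match, not at the start
) Close group
(?=[^":\r\n]*") Positive lookahead, assert that a " follows on the same line
(?=[^:\r\n]*:) Positive lookahead, assert that a : follows on the same line
_? Match an optional _
([a-zA-Z]) Capture group 2: match a char in a-zA-Z
([^"_\r\n]*) Capture group 3: match 0+ times any char except ", _ or a newline
In the callback, concatenate strtolower and strtoupper of the 3 capturing groups. Note that the pattern assumes every key contains at least one underscore: for a key such as "foo", the first branch backtracks until it can still match, and the last letter would be uppercased ("foO").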
For example
$re = '/(?:"\K([^_\r\n]+)|\G(?!^))(?=[^":\r\n]*")(?=[^:\r\n]*:)_?([a-zA-Z])([^"_\r\n]*)/';
$str = '"aa_bb" : "foo"
"pp_Qq" : "bar"
"Xx_yY_zz" : "foobar"
"Xx_yYyyyyyYyY_zz_a" : "foobar"';
$result = preg_replace_callback($re, function($matches) {
return strtolower($matches[1]) . strtoupper($matches[2]) . strtolower($matches[3]);
}, $str);
echo $result;
Output
"aaBb" : "foo"
"ppQq" : "bar"
"xxYyZz" : "foobar"
"xxYyyyyyyyyyZzA" : "foobar"
I'd like to replace the letters "KELLY" between the "#" runs with the same number of "#" characters (here, five #'s instead of 'KELLY').
$str = "####KELLY#####"; // any letters of the alphabet can appear here.
preg_replace('/(#{3,})[A-Z]+(#{3,})/', "$1$2", $str);
It returns ######### (four hashes then five hashes) without 'KELLY'.
How can I get ############## which is four original leading hashes, then replace each letter with a hash, then the five original trailing hashes?
The \G (continue) metacharacter makes for a messier pattern, but it lets you use preg_replace() instead of preg_replace_callback().
Effectively, it looks for the leading three-or-more hashes, then makes single-letter replacements until it reaches the finishing sequence of three-or-more hashes.
This technique also allows hash markers to be "shared" -- I don't actually know if this is something that is desired.
Code:
$str = "####KELLY##### and ###ANOTHER###### not ####foo#### but: ###SHARE###MIDDLE###HASHES### ?";
echo $str . "\n";
echo preg_replace('/(?:#{3}|\G(?!^))\K[A-Z](?=[A-Z]*#{3})/', '#', $str);
Output:
####KELLY##### and ###ANOTHER###### not ####foo#### but: ###SHARE###MIDDLE###HASHES### ?
############## and ################ not ####foo#### but: ############################# ?
Breakdown:
/ #starting pattern delimiter
(?: #start non-capturing group
#{3} #match three hash symbols
| # OR
\G(?!^) #continue matching, disallow matching from the start of string
) #close non-capturing group
\K #forget any characters matched up to this point
[A-Z] #match a single letter
(?= #lookahead (do not consume any characters) for...
[A-Z]* #zero or more letters then
#{3} #three or more hash symbols
) #close the lookahead
/ #ending pattern delimiter
Or you can achieve the same result with preg_replace_callback().
Code:
echo preg_replace_callback(
'/#{3}\K[A-Z]+(?=#{3})/',
function($m) {
return str_repeat('#', strlen($m[0]));
},
$str
);
I solved the problem with the preg_replace_callback() function in PHP.
Thanks CBroe for the tips.
$str = preg_replace_callback('/#{3,}([A-Z]+)#{3,}/i', 'replaceLetters', $str);
function replaceLetters($matches) {
    // replace the whole match (hashes and letters) with as many '#' characters
    return str_repeat('#', strlen($matches[0]));
}
I have to replace characters in URLs, but only from a certain point on, and I also have to handle duplicate characters.
The URLs look like this:
http://example.com/001-one-two.html#/param-what-ever
http://example.com/002-one-two-three.html#/param-what--ever-
http://example.com/003-one-two-four.html#/param2-what-ever-
http://example.com/004-one-two-five.html#/param33-what--ever---here-
and they should look like this:
http://example.com/001-one-two.html#/param-what_ever
http://example.com/002-one-two-three.html#/param-what_ever_
http://example.com/003-one-two-four.html#/param2-what_ever_
http://example.com/004-one-two-five.html#/param33-what_ever_here_
In words: replace runs of - characters (any number of them) with a single _ character, but skip the first - after #/.
The length of the string after #/ obviously varies, and I couldn't figure out a way to do this.
How can I do this?
Here is a way to go, using preg_replace_callback:
$in = array(
'http://example.com/001-one-two.html#/param-what-ever',
'http://example.com/002-one-two-three.html#/param-what--ever-',
'http://example.com/003-one-two-four.html#/param2-what-ever-',
'http://example.com/004-one-two-five.html#/param33-what--ever---here-'
);
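foreach($in as $str) {
    // \K keeps everything up to the first dash after #/ out of the match,
    // so only the rest of the URL is replaced by the callback result
    $res = preg_replace_callback('~^.*?#/[^-]+-\K(.+)$~', function ($m) {
        return preg_replace('/-+/', '_', $m[1]);
    }, $str);
    echo "$res\n";
}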
Explanation:
~ : regex delimiter
^ : start of string
.*? : 0 or more any character, not greedy
#/ : literally #/
[^-]+ : 1 or more any character that is not a dash
- : a dash
\K : forget all we have seen until here
(.+) : group 1, contains everything after the first dash after #/
$ : end of string
~ : regex delimiter
Output:
http://example.com/001-one-two.html#/param-what_ever
http://example.com/002-one-two-three.html#/param-what_ever_
http://example.com/003-one-two-four.html#/param2-what_ever_
http://example.com/004-one-two-five.html#/param33-what_ever_here_
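The same split can also be sketched without the outer regex, using strpos() to find the first dash after #/ and running preg_replace() only on the remainder. This is a swapped-in technique, not the answer's code; the function name underscoreAfterFirstDash is hypothetical, and unlike [^-]+ it also accepts a dash directly after #/:

```php
<?php
// Hypothetical helper: collapse dash runs to "_" after the first "-" that follows "#/".
// Assumption: a URL without "#/" or without a later "-" is returned unchanged.
function underscoreAfterFirstDash(string $url): string
{
    $hash = strpos($url, '#/');
    if ($hash === false) {
        return $url;
    }
    $dash = strpos($url, '-', $hash + 2);
    if ($dash === false) {
        return $url;
    }
    $head = substr($url, 0, $dash + 1);       // up to and including the first dash after #/
    $tail = (string) substr($url, $dash + 1); // the part whose dash runs get collapsed
    return $head . preg_replace('/-+/', '_', $tail);
}

foreach ([
    'http://example.com/001-one-two.html#/param-what-ever',
    'http://example.com/004-one-two-five.html#/param33-what--ever---here-',
] as $url) {
    echo underscoreAfterFirstDash($url), "\n";
}
```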
I want to manipulate a string like "...4+3(4-2)-...." so that it becomes "...4+3*(4-2)-....", but of course it should recognize any number d followed by a '(' and change it to 'd*('. I also want to change ')(' to ')*(' at the same time, if possible. It would be nice if there were a way to add support for constants like pi or e too.
For now, I just do it this stupid way:
private function make_implicit_multiplication_explicit($string)
{
$i=1;
if(strlen($string)>1)
{
while(($i=strpos($string,"(",$i))!==false)
{
if(strpos("0123456789",substr($string,$i-1,1)) !== false)
{
$string=substr_replace($string,"*(",$i,1);
$i++;
}
$i++;
}
$string=str_replace(")(",")*(",$string);
}
return $string;
}
But I believe this could be done much more nicely with preg_replace or some other regex function. Those manuals are really hard to grasp, though.
Let's start with what you are looking for:
either of the following: ((a|b) will match either a or b)
any number, \d
the character ): \)
followed by (: \(
This gives the pattern (\d|\))\(. But since you want to modify the string and keep both parts, you can group the \( as well, which results in (\(): harder to read but easier to handle.
Now all that's left is to specify how to rearrange the parts, which is simple: \\1*\\2. That leaves you with code like this:
$regex = "/(\d|\))(\()/";
$replace = "\\1*\\2";
$new = preg_replace($regex, $replace, $test);
To see that the pattern actually matches all cases, see this example.
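A self-contained run of that approach; the sample expression in $test is an assumption (the answer doesn't define it), and the single-quoted '$1*$2' is equivalent to "\\1*\\2":

```php
<?php
$test = '(4+3(4-2)-3)(2+3)';

// Group 1: the digit or ")" before the parenthesis; group 2: the "(" itself.
$regex = '/(\d|\))(\()/';
$new = preg_replace($regex, '$1*$2', $test);

echo $new; // (4+3*(4-2)-3)*(2+3)
```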
To recognize any number followed by a ( OR a combination of a )( and place an asterisk in between them, you can use a combination of lookaround assertions.
echo preg_replace("/
(?<=[0-9)]) # look behind to see if there is: '0' to '9', ')'
(?=\() # look ahead to see if there is: '('
/x", '*', '(4+3(4-2)-3)(2+3)');
The positive lookbehind asserts that what precedes is either a digit or a right parenthesis, while the positive lookahead asserts that what follows is a left parenthesis.
Another option is to use the \K escape sequence in place of the lookbehind. \K resets the starting point of the reported match: any previously consumed characters are no longer included (it throws away everything it has matched up to that point).
echo preg_replace("/
[0-9)] # any character of: '0' to '9', ')'
\K # resets the starting point of the reported match
(?=\() # look ahead to see if there is: '('
/x", '*', '(4+3(4-2)-3)(2+3)');
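The question also asks about constants such as pi or e. None of the answers covers that; as a hypothetical extension of the lookaround approach, the constants can be added as extra alternatives on both sides. This sketch assumes the input contains only numbers, operators, parentheses and the bare words pi and e (no function names such as sin), and it does not handle a number written after a constant; explicitMultiplication is a made-up name:

```php
<?php
// Hypothetical helper: insert "*" wherever a digit, ")" or a constant is
// directly followed by "(" or a constant (pi or e).
function explicitMultiplication(string $expr): string
{
    // Lookbehind: a digit, ")", "pi" or "e" precedes (each branch has a fixed length).
    // Lookahead: "(" or a standalone "pi"/"e" follows; e\b keeps 1e5 untouched.
    return preg_replace('/(?<=[0-9)]|pi|e)(?=\(|pi\b|e\b)/', '*', $expr);
}

echo explicitMultiplication('2pi(1+e)(3)'), "\n"; // 2*pi*(1+e)*(3)
echo explicitMultiplication('4+3(4-2)-3e'), "\n"; // 4+3*(4-2)-3*e
```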
Your PHP code could look like this:
<?php
$mystring = "4+3(4-2)-(5)(3)";
$regex = '~\d+\K\(~';
$replacement = "*(";
$str = preg_replace($regex, $replacement, $mystring);
$regex1 = '~\)\K\(~';
$replacement1 = "*(";
echo preg_replace($regex1, $replacement1, $str); //=> 4+3*(4-2)-(5)*(3)
?>
Explanation:
~\d+\K\(~ matches one or more digits followed by a (. Because of \K, the digits are excluded from the reported match, so only the ( is replaced.
The matched ( is replaced with *(, which turns 3( into 3*(; the result is stored in $str.
\)\K\( matches )( but excludes the first ) from the reported match. The ( is then replaced by *(, which produces )*(.
Silly method :^ )
$value = '4+3(4-2)(1+2)';
$search = ['1(', '2(', '3(', '4(', '5(', '6(', '7(', '8(', '9(', '0(', ')('];
$replace = ['1*(', '2*(', '3*(', '4*(', '5*(', '6*(', '7*(', '8*(', '9*(', '0*(', ')*('];
echo str_replace($search, $replace, $value);
Can a regex find a pattern like this?
{{foo.bar1.bar2.bar3}}
where the groups would be
$1 = foo $2 = bar1 $3 = bar2 $4 = bar3 and so on.
It would be like re-running the expression over and over again until it fails to match.
The current expression I am working on is
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link from regexr.
http://regexr.com?3203h
--
OK, I guess I didn't explain well what I'm trying to achieve here.
Let's say I am trying to replace every .barX inside a {{foo . . . }}.
My expected result is:
$foo->bar1->bar2->bar3
This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.
I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace("#bar\d#", "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"
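For the $foo->bar1->bar2->bar3 form from the question's edit, one option is to match the whole {{...}} placeholder once and rewrite its dotted path with str_replace(). A minimal sketch, assuming every segment consists of word characters; the sample $subject is made up:

```php
<?php
$subject = 'Hello {{foo.bar1.bar2.bar3}} and {{baz.qux}}';

$result = preg_replace_callback('~\{\{(\w+(?:\.\w+)*)\}\}~', function ($m) {
    // $m[1] is the dotted path, e.g. "foo.bar1.bar2.bar3"
    return '$' . str_replace('.', '->', $m[1]);
}, $subject);

echo $result; // Hello $foo->bar1->bar2->bar3 and $baz->qux
```

Matching the placeholder as a whole sidesteps the problem of a repeated capturing group keeping only its last iteration.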
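I don't know the correct syntax in PHP for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
That would capture the first segment in the first capturing group and the rest in the second capturing group. Note that PCRE (and therefore PHP) only keeps the last iteration of a repeated group, so there group 2 would hold just bar3; engines such as .NET, which Expresso uses, expose every iteration. regexr.com doesn't seem to be able to show that; try out Expresso, and you'll see what I mean.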
For example:
$s1 = "Test Test the rest of string";
$s2 = "Test the rest of string";
I would like to match $s1 but not $s2, because the first word in $s1 is the same as the second one. The word 'Test' is just an example; the regular expression should work with any word.
if(preg_match('/^(\w+)\s+\1\b/',$input)) {
// $input has same first two words.
}
Explanation:
^ : Start anchor
( : Start of capturing group
\w+ : A word
) : End of capturing group
\s+ : One or more whitespace
\1 : Back reference to the first word
\b : Word boundary
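A quick check of that pattern against the two strings from the question plus a near miss; firstTwoWordsRepeat is a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper: true when the first two words of $input are identical.
function firstTwoWordsRepeat(string $input): bool
{
    return preg_match('/^(\w+)\s+\1\b/', $input) === 1;
}

var_dump(firstTwoWordsRepeat("Test Test the rest of string")); // bool(true)
var_dump(firstTwoWordsRepeat("Test the rest of string"));      // bool(false)
var_dump(firstTwoWordsRepeat("Test Testx"));                   // bool(false)
```

The trailing \b is what rejects "Test Testx": there is no word boundary between "Test" and "x".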
~^(\w+)\s+\1(?:\W|$)~
~^(\pL+)\s+\1(?:\PL|$)~u // unicode variant
\1 is a back reference to the first capturing group.
This does not cause Test Testx to return true.
$string = "Test Test";
preg_match('/^(\w+)\s+\1(\b|$)/', $string);
Not working everywhere, see the comments...
^([^\b]+)\b\1\b
^(\B+)\b\1\b
Gets the first word, and matches if the same word is repeated again after a word boundary.