I am writing some unit tests for some methods I am using. I have found a weird bug and would like some regex advice.
When doing:
$needle = ' ';
$haystack = 'hello world. this is a unit test.';
$offset = 0;
$pattern = '/\b' . $needle . '\b/';
preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE, $offset);
I expect the positions found to be
[5, 12, 17, 20, 22, 27]
which is the same as what this loop returns (it finds every occurrence, not only whole-word matches):
$offset = 0;
$positions = array();
while (($pos = strpos($haystack, $needle, $offset)) !== false) {
$offset = $pos + 1;
$positions[] = $pos;
}
However, preg_match_all does not find the second occurrence (12), which is the space between "." and "this".
Is this caused by the \b word boundary? How can I make sure it picks up this occurrence too?
You have to change the $pattern you pass to preg_match_all(), as below:
<?php
$haystack = 'hello world. this is a unit test.';
$offset = 0;
$positions = array();
while (($pos = strpos($haystack, ' ', $offset)) !== false) {
$offset = $pos + 1;
$positions[] = $pos;
}
echo "<pre/>";print_r($positions);
preg_match_all('/\s/', $haystack, $matches,PREG_OFFSET_CAPTURE);
echo "<pre/>";print_r($matches);
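Output: https://eval.in/725574
Note: \b matches only where a word character meets a non-word character (or the edge of the string). The space at offset 12 follows ".", so there is no boundary in front of it and '/\b \b/' cannot match there. Use \s to match whitespace instead.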
You can use an if-else to choose $pattern based on $needle:
if (trim($needle) === '') {
$pattern = '/\s/';
} else {
$pattern = '/\b' . preg_quote($needle, '/') . '\b/';
}
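The same idea can be wrapped in a reusable helper. This is a minimal sketch that assumes you want the byte offsets back as a flat array. The name find_offsets is hypothetical. The helper quotes the needle with preg_quote() and drops \b when the needle is whitespace:

```php
<?php
// Sketch: return the byte offset of every match of $needle in $haystack.
// Whitespace needles are matched literally (no \b); any other needle is
// treated as a whole word. find_offsets is a hypothetical helper name.
function find_offsets(string $haystack, string $needle): array
{
    if (trim($needle) === '') {
        // \b cannot anchor a space that sits next to punctuation, so match it as-is
        $pattern = '/' . preg_quote($needle, '/') . '/';
    } else {
        $pattern = '/\b' . preg_quote($needle, '/') . '\b/';
    }
    preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
    // $matches[0] is a list of [matchedText, offset] pairs; keep only the offsets
    return array_column($matches[0], 1);
}

print_r(find_offsets('hello world. this is a unit test.', ' '));
```

For the whole-word branch, find_offsets('hello world. this is a unit test.', 'is') returns only 18, because the "is" inside "this" has no boundary in front of it.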
Related
I have:
$string = 'Hello x World hello';
And I want to remove the first instance of a term, matched case-insensitively.
For example, I have:
$term = 'hello';
And I want to match it with $string, so we end up with:
$string = ' x World hello';
What I've done so far
I am nearly there! I can replace the first instance of a term with:
function str_replace_once($str_pattern, $str_replacement, $string){
if (strpos($string, $str_pattern) !== false){
$occurrence = strpos($string, $str_pattern);
return substr_replace($string, $str_replacement, strpos($string, $str_pattern), strlen($str_pattern));
}
return $string;
}
But it is not case-insensitive (see ideone). How can I make it case-insensitive?
In your example code, just replace strpos with stripos (with a few minor tweaks):
function str_replace_once($str_pattern, $str_replacement, $string){
$pos = stripos($string, $str_pattern);
if ($pos !== false){
return substr_replace($string, $str_replacement, $pos, strlen($str_pattern));
}
return $string;
}
It would be nicer if str_ireplace had a limit parameter, but preg_replace does have one. Use the i modifier for case-insensitive matching, and preg_quote() so that any regex metacharacters in $term are matched literally:
$string = preg_replace('/' . preg_quote($term, '/') . '/i', '', $string, 1);
Also note that stripos, the case-insensitive version of strpos, can be used in your existing code.
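Here is a small sketch of the preg_replace approach as a function. It assumes the term should always be matched literally, and remove_first_ci is a hypothetical name. The second call shows why preg_quote() matters: an unquoted "c++" would be read as "c" with a possessive quantifier, not as the literal text:

```php
<?php
// Sketch: remove the first case-insensitive occurrence of $term from $string.
// preg_quote() escapes regex metacharacters so the term is matched literally.
// remove_first_ci is a hypothetical helper name.
function remove_first_ci(string $string, string $term): string
{
    $pattern = '/' . preg_quote($term, '/') . '/i';
    // The 4th argument (limit = 1) stops after the first replacement
    return preg_replace($pattern, '', $string, 1);
}

var_dump(remove_first_ci('Hello x World hello', 'hello')); // string(14) " x World hello"
var_dump(remove_first_ci('C++ and c++', 'c++'));           // string(8) " and c++"
```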
Use stripos (case-insensitive) and substr:
function str_replace_first($str, $search, $replace)
{
$index = stripos($str, $search);
return $index === false ? $str : substr($str, 0, $index) . $replace . substr($str, $index + strlen($search));
}
Usage:
$string = 'Hello x World hello';
echo str_replace_first($string,'Hello', '');
How can I preg_match a string, but tolerate a variable Levenshtein distance in the pattern?
$string = 'i eat apples and oranges all day long';
$find = 'and orangis';
$distance = 1;
$matches = pregMatch_withLevensthein($find, $distance, $string);
This would return 'and oranges'.
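Convert the search string into a regex that matches words of the same lengths (for 'and orangis', a 3-character word, whitespace, then a 7-character word). Search the input with that regex, then compare each candidate with the search string using levenshtein() and keep the ones within the allowed distance.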
$string = 'i eat apples and oranges all day long';
$find = 'and orangis';
$distance = 1;
$matches = preg_match_levensthein($find, $distance, $string);
var_dump($matches);
function preg_match_levensthein($find, $distance, $string)
{
$found = array();
// Convert $find into a regex: one alphanumeric word of the same length per part
$parts = explode(' ', $find);
$regexes = array();
foreach ($parts as $part) {
$regexes[] = '[a-z0-9]{' . strlen($part) . '}';
}
// \b stops the fixed-length words from matching inside longer words
$regexp = '#\b' . implode('\s', $regexes) . '\b#i';
// Find all matches; $matches[0] holds the full-pattern matches
preg_match_all($regexp, $string, $matches);
foreach ($matches[0] as $match) {
// Check the Levenshtein distance and keep the match if it is within bounds
if (levenshtein($match, $find) <= $distance) {
$found[] = $match;
}
}
// return found
return $found;
}
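The regex above only tries candidates whose words have exactly the same lengths as $find, so a match that needs an insertion or deletion (for example "and orngs") is never considered. A different technique slides a window of as many words as $find has over the string and tests every window with levenshtein(). This sketch assumes the words are separated by whitespace. fuzzy_find_words is a hypothetical name, and levenshtein() is case-sensitive, unlike the /i regex:

```php
<?php
// Sketch (alternative to the fixed-length regex): compare every run of N
// consecutive words in $string with $find, where N is the word count of $find.
// fuzzy_find_words is a hypothetical helper name.
function fuzzy_find_words(string $find, int $distance, string $string): array
{
    $words = preg_split('/\s+/', trim($string));
    $size = count(preg_split('/\s+/', trim($find)));
    $found = [];
    for ($i = 0; $i + $size <= count($words); $i++) {
        // Rebuild the window as a single space-separated candidate string
        $candidate = implode(' ', array_slice($words, $i, $size));
        if (levenshtein($candidate, $find) <= $distance) {
            $found[] = $candidate;
        }
    }
    return $found;
}

print_r(fuzzy_find_words('and orangis', 1, 'i eat apples and oranges all day long'));
print_r(fuzzy_find_words('and orngs', 2, 'i eat apples and oranges all day long'));
```

Both calls find 'and oranges'. The second needs two insertions, so the fixed-length regex would not find it.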
I am trying to replace "iwdnowfreedom[body_style][var]" with "iwdnowfreedom_body_style_var" in the name attributes contained in a variable. There could be several array keys, but in my situation stripping them out shouldn't cause any issues.
Here is the code I have so far:
$pattern = '/name\\s*=\\s*["\'](.*?)["\']/i';
$replacement = 'name="$2"';
$fixedOutput = preg_replace($pattern, $replacement, $input);
return $fixedOutput;
How can I fix this to work properly?
You could try using the built-in str_replace function to achieve what you are looking for (assuming there are no nested brackets like "test[test[key]]"):
$str = "iwdnowfreedom[body_style][var]";
echo trim( str_replace(array("][", "[", "]"), "_", $str), "_" );
Or, if you prefer a regex (nested brackets work fine with this method):
$input = "iwdnowfreedom[body_style][var]";
$pattern = '/(\[+\]+|\]+\[+|\[+|\]+)/i';
$replacement = '_';
$fixedOutput = trim( preg_replace($pattern, $replacement, $input), "_" );
echo $fixedOutput;
I think you also meant that you might have a string such as
<input id="blah" name="test[hello]" />
and to parse the name attribute you could just do:
function parseNameAttribute($str)
{
$pos = strpos($str, 'name="');
if ($pos !== false)
{
$pos += 6; // move 6 characters forward to remove the 'name="' part
$endPos = strpos($str, '"', $pos); // find the next quote after the name="
if ($endPos !== false)
{
$name = substr($str, $pos, $endPos - $pos); // cut between name=" and the following "
return trim(preg_replace('/(\[+\]+|\]+\[+|\[+|\]+)/i', '_', $name), '_');
}
}
return "";
}
OR
function parseNameAttribute($str)
{
if (preg_match('/name="(.+?)"/', $str, $matches))
{
return trim(preg_replace('/(\[+\]+|\]+\[+|\[+|\]+)/i', '_', $matches[1]), '_');
}
return "";
}
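To apply the same conversion to every name attribute inside a larger HTML string in one pass, preg_replace_callback() can find each attribute and reuse the bracket regex on its value. This is a sketch that assumes attribute values are quoted with " or ' and contain no escaped quotes. flatten_name_attributes is a hypothetical name:

```php
<?php
// Sketch: rewrite every name="a[b][c]" attribute in $input to name="a_b_c".
// Assumes quoted attribute values without escaped quotes inside them.
// flatten_name_attributes is a hypothetical helper name.
function flatten_name_attributes(string $input): string
{
    return preg_replace_callback(
        // Group 1: name=, group 2: the opening quote, group 3: the value
        '/\b(name\s*=\s*)(["\'])(.*?)\2/i',
        function (array $m): string {
            // Collapse each run of brackets into one underscore, then trim the ends
            $flat = trim(preg_replace('/(\[+\]+|\]+\[+|\[+|\]+)/', '_', $m[3]), '_');
            return $m[1] . $m[2] . $flat . $m[2];
        },
        $input
    );
}

echo flatten_name_attributes('<input id="blah" name="iwdnowfreedom[body_style][var]" />');
```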
str_repeat(A, B) repeats string A, B times:
$string = "This is a " . str_repeat("test", 2) .
"! " . str_repeat("hello", 3) . " and Bye!";
// Return "This is a testtest! hellohellohello and Bye!"
I need the reverse operation:
str_shrink($string, array("hello", "test"));
// Return "This is a test(x2)! hello(x3) and Bye!" or
// "This is a [test]x2! [hello]x3 and Bye!"
What is the best and most efficient way to create a str_shrink function?
Here are two versions that I could come up with.
The first uses a regular expression and replaces each run of repeated $needle strings with a single $needle followed by the repetition count. This is the most rigorously tested version and handles all the inputs I have tried successfully.
function str_shrink( $str, $needle)
{
if( is_array( $needle))
{
foreach( $needle as $n)
{
$str = str_shrink( $str, $n);
}
return $str;
}
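$quoted = preg_quote( $needle, '/');
$regex = '/(' . $quoted . ')(?:' . $quoted . ')+/i';
// Count by length: substr_count() is case-sensitive, but the regex uses /i
return preg_replace_callback( $regex, function( $matches) { return $matches[1] . '(x' . (strlen( $matches[0]) / strlen( $matches[1])) . ')'; }, $str);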
}
The second uses string manipulation to repeatedly replace occurrences of $needle concatenated with itself. Note that this one fails if $needle.$needle occurs more than once in the input string. It also miscounts runs of four or more repetitions, because str_replace collapses several pairs in a single pass. The first one does not have these problems.
function str_shrink2( $str, $needle)
{
if( is_array( $needle))
{
foreach( $needle as $n)
{
$str = str_shrink2( $str, $n);
}
return $str;
}
$count = 1; $previous = -1;
while( ($i = strpos( $str, $needle.$needle)) !== false)
{
$str = str_replace( $needle.$needle, $needle, $str);
$count++;
$previous = $i;
}
if( $count > 1)
{
$str = substr( $str, 0, $previous) . $needle .'(x' . $count . ')' . substr( $str, $previous + strlen( $needle));
}
return $str;
}
See them both in action
Edit: I didn't realize that the desired output wanted to include the number of repetitions. I've modified my examples accordingly.
You can play around with this one, although it has not been tested much:
function shrink($s, $parts, $mask = "%s(x%d)"){
foreach($parts as $part){
$removed = 0;
$regex = "/(" . preg_quote($part, '/') . ")+/";
preg_match_all($regex, $s, $matches, PREG_OFFSET_CAPTURE);
if(empty($matches[0]))
continue;
foreach($matches[0] as $m){
$offset = $m[1] - $removed;
$nb = substr_count($m[0], $part);
$counter = sprintf($mask, $part, $nb);
$s = substr($s, 0, $offset) . $counter . substr($s, $offset + strlen($m[0]));
// Shift later offsets by how much this replacement shortened (or lengthened) $s
$removed += strlen($m[0]) - strlen($counter);
}
}
return $s;
}
I think you can try with:
<?php
$string = "This is a testtest! hellohellohello and Bye!";
function str_shrink($string, $array){
$tr = array();
foreach($array as $el){
$n = substr_count($string, $el);
$tr[$el] = $el.'(x'.$n.')';
$pattern[] = '/('.$el.'\(x'.$n.'\))+/i';
}
return preg_replace($pattern, '${1}', strtr($string,$tr));
}
echo $string;
echo '<br/>';
echo str_shrink($string,array('test','hello')); //This is a test(x2)! hello(x3) and Bye!
?>
Here is a second version that also accepts a single string as the needle:
<?php
$string = "This is a testtest! hellohellohello and Bye!";
function str_shrink($string, $array){
$tr = array();
$array = is_array($array) ? $array : array($array);
foreach($array as $el){
$sN = 'x'.substr_count($string, $el);
$tr[$el] = $el.'('.$sN.')';
$pattern[] = '/('.$el.'\('.$sN.'\))+/i';
}
return preg_replace($pattern, '${1}', strtr($string,$tr));
}
echo $string;
echo '<br/>';
echo str_shrink($string,array('test','hello')); //This is a test(x2)! hello(x3) and Bye!
echo '<br/>';
echo str_shrink($string,'test'); //This is a test(x2)! hellohellohello and Bye!
?>
I kept it short:
function str_shrink($haystack, $needles, $match_case = true) {
if (!is_array($needles)) $needles = array($needles);
foreach ($needles as $k => $v) $needles[$k] = preg_quote($v, '/');
$regexp = '/(' . implode('|', $needles) . ')+/' . ($match_case ? '' : 'i');
return preg_replace_callback($regexp, function($matches) {
return $matches[1] . '(x' . (strlen($matches[0]) / strlen($matches[1])) . ')';
}, $haystack);
}
For cases like str_shrink("aaa", array("a", "a(x3)")), it returns "a(x3)", which I think is more likely the intended behavior when you pass an array. For the other behavior, which gives "a(x3)(x1)", call the function with each needle individually.
If you don't want single occurrences to get "(x1)", change:
return $matches[1] . '(x' . (strlen($matches[0]) / strlen($matches[1])) . ')';
to:
$multiple = strlen($matches[0]) / strlen($matches[1]);
return $matches[1] . (($multiple > 1) ? '(x' . $multiple . ')' : '');
Here's a very direct, single-regex technique and you don't need to collect the words in the string in advance.
There will be some fringe cases to mitigate which are not represented in the sample input, but as for the general purpose of this task, I reckon this is the way that I'd script this in my project.
Match (and capture) any full word that is repeated one or more times.
Match the contiguous repetitions of the word.
Replace the fullstring match (substring of multiple words) with the captured first instance of the word.
Before returning the replacement string for re-insertion, add the desired formatting and calculate the number of repetitions by dividing the fullstring length by the captured string's length.
Code: (Demo)
$string = "This is a " . str_repeat("test", 2) .
"!\n" . str_repeat("hello", 3) . " and Bye!\n" .
"When I sleep, the thought bubble says " . str_repeat("zz", 3) . ".";
echo preg_replace_callback(
'~\b(\w+?)\1+\b~',
function($m) {
return "[{$m[1]}](" . (strlen($m[0]) / strlen($m[1])) . ")";
},
$string
);
Output:
This is a [test](2)!
[hello](3) and Bye!
When I sleep, the thought bubble says [z](6).
For a whitelist of needles, this adaptation to my above code does virtually the same job.
Code: (Demo)
function str_shrink($string, $needles) {
// this escaping is unnecessary if only working with alphanumeric characters
$needles = array_map(function($needle) {
return preg_quote($needle, '~');
}, $needles);
return preg_replace_callback(
'~\b(' . implode('|', $needles) . ')\1+\b~',
function($m) {
return "[{$m[1]}](" . (strlen($m[0]) / strlen($m[1])) . ")";
},
$string
);
}
echo str_shrink($string, ['test', 'hello']);
Output:
This is a [test](2)!
[hello](3) and Bye!
When I sleep, the thought bubble says zzzzzz.
How do I find the positions of a character in a string or sentence in PHP?
$char = 'i';
$string = 'elvis williams';
$result = '3rd, 7th and 10th'; // desired result
I tried strpos, but it didn't work.
This will give you the position of the first occurrence of $char in $string:
$pos = strpos($string, $char);
If you want the positions of all occurrences of $char in $string:
$positions = array();
$pos = -1;
while (($pos = strpos($string, $char, $pos+1)) !== false) {
$positions[] = $pos;
}
$result = implode(', ', $positions);
print_r($result);
Test it here: http://codepad.viper-7.com/yssEK3
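The loop above prints "3, 7, 10", but the question asked for ordinal text. This sketch formats the same 0-based offsets as "3rd, 7th and 10th". The helper names ordinal, char_positions and describe_positions are hypothetical:

```php
<?php
// Sketch: find every 0-based offset of $char and describe them as ordinals.
// ordinal, char_positions and describe_positions are hypothetical names.
function ordinal(int $n): string
{
    $mod100 = $n % 100;
    if ($mod100 >= 11 && $mod100 <= 13) {
        return $n . 'th'; // 11th, 12th, 13th are irregular
    }
    $suffixes = [1 => 'st', 2 => 'nd', 3 => 'rd'];
    return $n . ($suffixes[$n % 10] ?? 'th');
}

function char_positions(string $string, string $char): array
{
    $positions = [];
    $pos = -1;
    while (($pos = strpos($string, $char, $pos + 1)) !== false) {
        $positions[] = $pos;
    }
    return $positions;
}

function describe_positions(array $positions): string
{
    $labels = array_map('ordinal', $positions);
    if (count($labels) < 2) {
        return implode('', $labels);
    }
    // Join all but the last with commas, then add "and" before the last one
    $last = array_pop($labels);
    return implode(', ', $labels) . ' and ' . $last;
}

echo describe_positions(char_positions('elvis williams', 'i')); // 3rd, 7th and 10th
```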