preg_replace custom ranges - php

The user input is stored in the variable $input.
I want to use preg_replace to swap the letters of the user input (which will range from a-z) with my own custom alphabet.
Here is the code I am trying:
preg_replace('/([a-z])/', "y,p,l,t,a,v,k,r,e,z,g,m,s,h,u,b,x,n,c,d,i,j,f,q,o,w", $input)
However, this code doesn't work.
If anyone has any suggestions on how I can get this working, that would be great. Thanks

Don't reach for preg_replace when str_replace is enough:
$regular = range('a', 'z');
$custom = explode(',', "y,p,l,t,a,v,k,r,e,z,g,m,s,h,u,b,x,n,c,d,i,j,f,q,o,w");
$output = str_replace($regular, $custom, $input);

Using str_replace makes a lot more sense in this case:
str_replace(
range("a", "z"), // Creates an array with all lowercase letters
explode(",", "y,p,l,t,a,v,k,r,e,z,g,m,s,h,u,b,x,n,c,d,i,j,f,q,o,w"),
$input
);

You could instead use strtr(), which avoids the problem of replacing values that have already been replaced.
echo strtr($input, 'abcdefghijklmnopqrstuvwxyz', 'ypltavkrezgmshubxncdijfqow');
With $input as yahoo the output is oyruu, as expected.
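Because the custom alphabet in the strtr() call above uses every letter exactly once, swapping the two tables undoes the substitution. A minimal sketch, assuming lowercase ASCII input; strtr() with string arguments works byte by byte, so it is not suitable for multibyte (e.g. accented UTF-8) letters:

```php
<?php
// Plain and custom alphabets from the answer; the custom one uses each letter once.
$plain  = 'abcdefghijklmnopqrstuvwxyz';
$cipher = 'ypltavkrezgmshubxncdijfqow';

// Each byte of the input is translated exactly once.
$encoded = strtr('yahoo', $plain, $cipher);
// Swapping the two tables maps every letter back.
$decoded = strtr($encoded, $cipher, $plain);

echo $encoded, ' ', $decoded, "\n"; // oyruu yahoo
```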

A potential problem with the solutions given is that a character can be replaced more than once, e.g. 'a' gets replaced by 'y', and later in the same call 'y' gets replaced by 'o'. So, in the examples given above, 'aaa' becomes 'ooo', not 'yyy' as might be expected. And 'yyy' becomes 'ooo' as well. The resulting string is essentially garbage: you would never be able to convert it back, if that was a requirement.
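The double replacement is easy to reproduce. str_replace() with arrays processes the search/replace pairs in order, and each pair operates on the result of the previous ones. A short sketch using the $regular and $custom arrays from the first answer:

```php
<?php
$regular = range('a', 'z');
$custom  = explode(',', 'y,p,l,t,a,v,k,r,e,z,g,m,s,h,u,b,x,n,c,d,i,j,f,q,o,w');

// 'a' -> 'y' happens first, then the later 'y' -> 'o' pair rewrites it again.
$fromA = str_replace($regular, $custom, 'aaa');
// 'y' -> 'o' directly.
$fromY = str_replace($regular, $custom, 'yyy');

// Both inputs end up as "ooo", so they can no longer be told apart.
echo $fromA, ' ', $fromY, "\n"; // ooo ooo
```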
You could get around this using two replacements.
On the first replacement, you replace the $regular chars with an intermediate set of character sequences that don't exist in $input, e.g. 'a' to '[[[a]]]', 'b' to '[[[b]]]', etc.
Then you replace the intermediate character sequences with your $custom set of chars, e.g. '[[[a]]]' to 'y', '[[[b]]]' to 'p', etc.
Like so...
$regular = range('a', 'z');
$custom = explode(',', 'y,p,l,t,a,v,k,r,e,z,g,m,s,h,u,b,x,n,c,d,i,j,f,q,o,w');
// Create an intermediate set of char (sequences) that don't exist anywhere else in the $input
// e.g. '[[[a]]]', '[[[b]]]', ...
$intermediate = $regular;
array_walk($intermediate, function (&$value) { $value = "[[[$value]]]"; });
// Replace the $regular chars with the $intermediate set
$output = str_replace($regular, $intermediate, $input);
// Replace the $intermediate chars with our custom set
$output = str_replace($intermediate, $custom, $output);
EDIT:
Leaving this solution for reference, but @salathe's solution to use strtr() is much better!
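For reference, strtr() also accepts a single array of pairs, which performs the same translation as the two str_replace() passes in one call: it never searches text it has already replaced, and unlike the three-argument form it also allows multi-character keys and values. A sketch reusing the $regular and $custom arrays:

```php
<?php
$regular = range('a', 'z');
$custom  = explode(',', 'y,p,l,t,a,v,k,r,e,z,g,m,s,h,u,b,x,n,c,d,i,j,f,q,o,w');

// Build ['a' => 'y', 'b' => 'p', ...]; array_combine() pairs the arrays by position.
$map = array_combine($regular, $custom);

// Each character is replaced once, so 'aaa' stays distinguishable from 'yyy'.
$output = strtr('aaa yyy', $map);
echo $output, "\n"; // yyy ooo
```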

Related

PHP: preg_replace only first matching string in array

I've started using preg_replace in PHP and I wonder how I can replace only the first matching array key with the specified array value. I set the preg_replace limit parameter to 1, but it still changes more than once. I also split my string into single words and examine them one by one:
<?php
$internal_message = 'Hey, this is awesome!';
$words = array(
'/wesome(\W|$)/' => 'wful',
'/wful(\W|$)/' => 'wesome',
'/^this(\W|$)/' => 'that',
'/^that(\W|$)/' => 'this'
);
$splitted_message = preg_split("/[\s]+/", $internal_message);
$words_num = count($splitted_message);
for($i=0; $i<$words_num; $i++) {
$splitted_message[$i] = preg_replace(array_keys($words), array_values($words), $splitted_message[$i], 1);
}
$message = implode(" ", $splitted_message);
echo $message;
?>
I want this output:
Hey, that is awful
(one suffix change, one word change, then stop)
Not this:
Hey, this is awesome
(two suffix changes, two word changes and back to original word & suffix...)
Maybe I can simplify this code? I also can't change the order of the array keys and values, because there will soon be more suffixes and single words to change. I'm kind of a newbie in PHP coding and I'll be thankful for any help ;>
You may use plain text as the associative array keys, build a dynamic regex pattern from them, and use preg_replace_callback to replace the found values with their replacements in one go.
$internal_message = 'Hey, this is awesome!';
$words = array(
'wesome' => 'wful',
'wful' => 'wesome',
'this' => 'that',
'that' => 'this'
);
$rx = '~(?:' . implode("|", array_keys($words)) . ')\b~';
echo "$rx\n";
$message = preg_replace_callback($rx, function($m) use ($words) {
return isset($words[$m[0]]) ? $words[$m[0]] : $m[0];
}, $internal_message);
echo $message;
// => Hey, that is awful!
The regex is
~(?:wesome|wful|this|that)\b~
The (?:wesome|wful|this|that) is a non-capturing group that matches any of the values inside, and \b is a word boundary, a non-consuming pattern that ensures there is no letter, digit or _ after the suffix.
preg_replace_callback parses the string once. When a match occurs, it is passed to the anonymous function (function($m)), which also has access to the $words array (use ($words)). If $words contains the found key (isset($words[$m[0]])), the corresponding value is returned ($words[$m[0]]); otherwise the match itself is returned ($m[0]).
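If the keys may ever contain characters with a special meaning in a regex (., +, ( and so on), it is safer to escape them with preg_quote() before joining them. A sketch of the same swap with that one change; since every alternative is a literal key and \b consumes nothing, $m[0] is always one of the keys here:

```php
<?php
$words = array(
    'wesome' => 'wful',
    'wful'   => 'wesome',
    'this'   => 'that',
    'that'   => 'this',
);

// Escape each key for use inside a ~-delimited pattern.
$alternatives = array_map(function ($key) {
    return preg_quote($key, '~');
}, array_keys($words));
$rx = '~(?:' . implode('|', $alternatives) . ')\b~';

// Every match is exactly one of the keys, so it can be looked up directly.
$swap = function ($m) use ($words) {
    return $words[$m[0]];
};

$there = preg_replace_callback($rx, $swap, 'Hey, this is awesome!');
$back  = preg_replace_callback($rx, $swap, $there);
echo $there, "\n", $back, "\n";
```

Running the swap twice returns the original sentence, which shows that each word is changed exactly once per pass.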

php regex match possible accented characters

I found a lot of questions about this, but none of them helped with my specific problem. The situation: I want to search for a string like "blablebli" and be able to match all possible accented variations of it ("blablebli", "blábleblí", "blâblèbli", etc.) in a text.
I have already made a workaround to do the opposite (find a word without the accents that I wrote), but I can't figure out a way to implement what I want.
Here is my working code (the relevant part; this was inside a foreach, so we only see a single word search):
$word="something";
$word = preg_quote(trim($word)); //Just in case
$word2 = $this->removeAccents($word); // Removed all accents
if(!empty($word)) {
$sentence = "/(".$word.")|(".$word2.")/ui"; // Now I'm checking with and without accents.
if (preg_match($sentence, $content)){
echo "found";
}
}
And my removeAccents() function (I'm not sure I covered all possible accents with that preg_replace(); so far it's working, but I would appreciate it if someone checked whether I'm missing anything):
function removeAccents($string)
{
return preg_replace('/[\`\~\']/', '', iconv('UTF-8', 'ASCII//TRANSLIT', $string));
}
What I'm trying to avoid:
I know I could take my $word and replace every a with [aàáãâä], and the same for the other letters, but I don't know... it seems a little overkill.
And sure, I could use my own removeAccents() function in my if statement to check $content without accents, something like:
if (preg_match($sentence, $content) || preg_match($sentence, removeAccents($content)))
But my problem with that second option is that I want to highlight the word found after the match, so I can't change my $content.
Is there any way to improve my preg_match() to include possible accented characters? Or should I use my first option above?
I would decompose the string; this makes it easier to remove the offending characters. Something along these lines:
<?php
// Convert unicode input to NFKD form.
$str = Normalizer::normalize("blábleblí", Normalizer::FORM_KD);
// Remove all combining characters (https://en.wikipedia.org/wiki/Combining_character).
var_dump(preg_replace('/[\x{0300}-\x{036f}]/u', "", $str));
Thanks for the help, everyone, but I ended up using the first suggestion I made in my question. And thanks again @CasimiretHippolyte for your patience, and for making me realize that it isn't as overkill as I thought.
Here is the final code I'm using (first the functions):
function removeAccents($string)
{
return preg_replace('/[\x{0300}-\x{036f}]/u', '', Normalizer::normalize($string, Normalizer::FORM_KD));
}
function addAccents($string)
{
$array1 = array('a', 'c', 'e', 'i' , 'n', 'o', 'u', 'y');
$array2 = array('[aàáâãäå]','[cçćĉċč]','[eèéêë]','[iìíîï]','[nñ]','[oòóôõö]','[uùúûü]','[yýÿ]');
return str_replace($array1, $array2, strtolower($string));
}
And:
$word="something";
$word = preg_quote(trim($word), '/'); // Just in case; '/' is the pattern delimiter
$word2 = $this->addAccents($this->removeAccents($word)); //check all possible accents
if(!empty($word)) {
$sentence = "/(".$word.")|(".$word2.")/ui"; // Now I'm checking my normal word and the possible variations of it.
if (preg_match($sentence, $content)){
echo "found";
}
}
By the way, I'm covering all the accents used in my country (and some others). You should check whether addAccents() needs extending before you use it.
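The question also asks how to highlight the match without modifying $content. A sketch of one way to do it with the addAccents() idea: build the accent-insensitive pattern, then let preg_replace() wrap each match in place with $0. This version uses strtr() with an array instead of str_replace(), so a replacement can never be rewritten by a later pair. It assumes $word is already lowercase, unaccented ASCII (for example after removeAccents()) and that everything is UTF-8; the helper name accentPattern() and the <mark> tag are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper: turns a lowercase, unaccented word into a pattern that
// also matches the accented variants listed in the question's addAccents().
function accentPattern($word)
{
    // Quote first, so the character classes added below are not escaped.
    $quoted = strtolower(preg_quote($word, '/'));
    // strtr() with an array replaces each letter once and never rescans the output.
    $classes = array(
        'a' => '[aàáâãäå]', 'c' => '[cçćĉċč]', 'e' => '[eèéêë]',
        'i' => '[iìíîï]',   'n' => '[nñ]',     'o' => '[oòóôõö]',
        'u' => '[uùúûü]',   'y' => '[yýÿ]',
    );
    // u: treat pattern and subject as UTF-8; i: case-insensitive.
    return '/' . strtr($quoted, $classes) . '/ui';
}

$content = 'Say blábleblí or BLABLEBLI, but not blubleblu.';
// $0 is the whole match, so the original text (accents included) is kept.
$highlighted = preg_replace(accentPattern('blablebli'), '<mark>$0</mark>', $content);
echo $highlighted, "\n";
```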

preg_replace with a word in an array

I am trying to replace certain words, stored in an array called keywords, with "as" in a string.
for($i = 0; $i<sizeof($this->keywords[$this->lang]); $i++)
{
$word = $this->keywords[$this->lang][$i];
$a = preg_replace("/\b$word\b/i", "as",$this->code);
}
It works if I replace the variable $word with a literal word, as in /\bhello\b/i, which then replaces every hello with "as".
Is the approach I am using even possible?
Before it is a pattern, it is a double-quoted string, so the variable is interpolated; that is not the problem.
The problem is that you use a loop to change several words and you store the result in $a:
on the first iteration, all the occurrences of the first word in $this->code are replaced and the new string is stored in $a.
but the next iteration doesn't reuse $a as the third parameter to replace the next word; it always uses the original string $this->code.
Result: after the for loop, $a contains the original string with only the occurrences of the last word replaced with as.
When you want to replace several words with the same string, one way is to build an alternation: word1|word2|word3... It can easily be done with implode:
$alternation = implode('|', $this->keywords[$this->lang]);
$pattern = '~\b(?:' . $alternation . ')\b~i';
$result = preg_replace($pattern, 'as', $this->code);
So, when you do that, the string is parsed only once and all the words are replaced in one shot.
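One caveat: the keywords are inserted into the pattern as-is, so a keyword containing a regex metacharacter (such as .) would match more than intended. A sketch that escapes each keyword with preg_quote() first, assuming every keyword starts and ends with a word character (otherwise \b behaves differently); $keywords and $code are hypothetical stand-ins for $this->keywords[$this->lang] and $this->code:

```php
<?php
// Hypothetical data standing in for $this->keywords[$this->lang] and $this->code.
$keywords = array('hello', 'world', 'c.d');
$code = 'Hello world, say hello to c.d and cxd.';

// preg_quote() turns 'c.d' into 'c\.d', so the dot only matches a literal dot.
$quoted = array_map(function ($word) {
    return preg_quote($word, '~');
}, $keywords);

$pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';
$result = preg_replace($pattern, 'as', $code);
echo $result, "\n"; // as as, say as to as and cxd.
```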
If you have a lot of words and a very long string:
Testing a long alternation has a significant cost. Even though the leading \b greatly reduces the positions where a match can start, the pattern will have a hard time succeeding and an even harder time failing.
In this particular case only, you can use another approach:
First, define a placeholder (a character or a short string that can't occur in your string, let's say §) and insert it at every word-boundary position:
$temp = preg_replace('~\b~', '§', $this->code);
Then turn every keyword into §word1§, §word2§, ... and build an associative array whose values are all the replacement string:
$trans = [];
foreach ($this->keywords[$this->lang] as $word) {
$trans['§' . $word . '§'] = 'as';
}
Once you have done that, add an entry with the placeholder as its key and an empty string as its value. You can now use the fast strtr function to perform the replacement:
$trans['§'] = '';
$result = strtr($temp, $trans);
The only limitation of this technique is that it is case-sensitive.
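A runnable sketch of the placeholder steps above, with hypothetical $keywords and $code standing in for $this->keywords[$this->lang] and $this->code. It assumes § never appears in the input and that the keywords are plain ASCII words; the capitalised Hello at the end shows the case-sensitivity limitation:

```php
<?php
$keywords = array('hello', 'world');   // hypothetical keyword list
$code = 'hello world, helloworld says Hello';

// 1. Put a placeholder at every word boundary.
$temp = preg_replace('~\b~', '§', $code);

// 2. Map "§keyword§" to the replacement, and a lone placeholder to ''.
$trans = array();
foreach ($keywords as $word) {
    $trans['§' . $word . '§'] = 'as';
}
$trans['§'] = '';

// 3. strtr() tries the longest keys first and never rescans its output.
$result = strtr($temp, $trans);
echo $temp, "\n", $result, "\n";
```

Here $temp is "§hello§ §world§, §helloworld§ §says§ §Hello§" and $result is "as as, helloworld says Hello": helloworld is left alone because "§hello§" does not occur inside it.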
It will work if you write it like below:
$a = preg_replace("/\b".$word."\b/i", "as",$this->code);

str_replace does not work for all the keys

I need to remove some chars from my string, so I used the str_replace function as I always do, but this doesn't work.
I need to find all the chars of $array1 in the string and replace them with the chars in $array2, in order to remove all the special chars that can cause trouble in my URL.
Here is my code:
$array1 = array('è','é','ò','à','ù','"','\'','ì',' ','.',',','<','>','&');
$array2 = array('e','e','o','a','u','','','i','-','','','','','');
$link = base_url().'esperto/tag/'.str_replace($array1,$array2,$val);
The problem is that this code just ignores some chars, such as à.
If I set:
$val = "hello world";
the result is correct, I get:
hello-world
but if I use an Italian word like:
$val = "aggressività";
in $link I get the same value, with no replacement for the à.
The code should be correct; I cannot figure out what I'm missing here.
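One cause worth ruling out, offered as an assumption rather than a diagnosis, is an encoding mismatch: str_replace() compares bytes, so 'à' in $array1 (written in a UTF-8 source file) only matches if $val is also UTF-8. A sketch comparing a UTF-8 value with the same word encoded as ISO-8859-1:

```php
<?php
// Assumes this file is saved as UTF-8, so each accented search string is multibyte.
$array1 = array('è','é','ò','à','ù','"','\'','ì',' ','.',',','<','>','&');
$array2 = array('e','e','o','a','u','','','i','-','','','','','');

$utf8   = 'aggressività';       // 'à' is the two bytes C3 A0
$latin1 = "aggressivit\xE0";    // same word in ISO-8859-1: 'à' is the single byte E0

$fromUtf8   = str_replace($array1, $array2, $utf8);
$fromLatin1 = str_replace($array1, $array2, $latin1);

echo $fromUtf8, "\n";             // aggressivita
echo bin2hex($fromLatin1), "\n";  // still ends in e0: nothing was replaced
```

If the second case matches what you see, converting $val to UTF-8 first (for example with mb_convert_encoding()) should let the existing arrays work.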

preg_replace: how to consider whole array of patterns before replacing?

I'm using preg_replace to match and replace improperly encoded UTF-8 characters with their proper characters. I've created a "old" array containing the wrong characters, and a corresponding "new" array with the replacements. Here is a snippet of each array:
$old = array(
'/â€/',
'/’/',
);
$new = array(
'†',
'’',
);
A sample string that may contain the wrong data could be:
The programmer’s becoming very frustrated
Which should become:
The programmer’s becoming very frustrated
I'm using this function:
$result = preg_replace($old, $new, $str);
But the subject is actually becoming:
The programmer†™s becoming very frustrated
It's clear that PHP is doing what I'd call a non-greedy match on the subject (not the correct term, I know): preg_replace applies the first pair in the old/new arrays to the whole subject without considering whether a different pattern in the array would be more appropriate. If I reverse the order of the replacement pairs, it works as expected.
My question is: Is there an approach that will allow preg_replace to consider all elements of the pattern array before executing a replacement, or is my only option to re-order the arrays?
I don't think there is any option like that. However, you could store your replacements in an associative array and sort it with uasort and strlen, so longer patterns come first and you don't need to manage the array order manually.
Then you can use array_keys and array_values to act just like your separated $old and $new arrays.
$replacements = array(
'†' => '/â€/',
'’' => '/’/',
);
// sorts the replacements array by value string length keeping the indexes intact
uasort($replacements, function($a, $b) {
return strlen($b) - strlen($a);
});
$str = 'The programmer’s becoming very frustrated';
$result = preg_replace(array_values($replacements), array_keys($replacements), $str);
EDIT: As @CasimiretHippolyte pointed out, using array_values is not necessary on the first parameter of preg_replace in this case. It would only return a copy of $replacements with numerical indexes, but the order would be the same. Unless you need an array with the same structure as $old from your question, you do not need it.
Order the arrays $old and $new so that the longest regex comes first:
$old = array(
'/’/',
'/â€/',
);
$new = array(
'’',
'†',
);
$str = 'The programmer’s becoming very frustrated';
$result = preg_replace($old, $new, $str);
echo $result,"\n";
output:
The programmer’s becoming very frustrated
I don't believe there is a way to do this using only preg_replace. However, you can easily do it by sorting your array beforehand:
$replacements = array_combine($old, $new);
krsort($replacements);
$result = preg_replace(array_keys($replacements), array_values($replacements), $str);
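Since the entries in $old are plain strings with no regex features, another option is to skip preg_replace() altogether. strtr() with an array tries the longest matching key first at every position and never rescans replaced text, so the order of the pairs no longer matters. A sketch, assuming the patterns stay literal (the delimiters are dropped):

```php
<?php
// Literal versions of the question's patterns; the shorter key is listed first
// on purpose, to show that strtr() does not depend on array order.
$fixes = array(
    'â€'  => '†',
    '’' => '’',
);
$str = 'The programmer’s becoming very frustrated';
$result = strtr($str, $fixes);
echo $result, "\n"; // The programmer’s becoming very frustrated
```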
