Replace foreign characters - php

I need to be able to replace some common foreign characters with English equivalents before I store values into my db.
For example: æ replace with ae and ñ with n.
Do I use preg_replace?
Thanks

For single character of accents
$str = strtr($str,
"ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÝßàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ",
"AAAAAACEEEEIIIINOOOOOOYSaaaaaaceeeeiiiinoooooouuuuyy");
For double character of accents (such as Æ, æ)
$match = array('æ', 'Æ');
$replace = array('ae', 'AE');
$str = str_replace($replace, $replace, $str);

You can define your convertable characters in an array, and use str_replace():
$conversions = array(
"æ" => "ae",
"ñ" => "n",
);
$text = str_replace(array_keys($conversions), $conversions, $text);

You can try iconv() with ASCII//TRANSLIT:
$text = iconv("UTF-8", "ASCII//TRANSLIT", $text);

Excuse me second-guessing why you're doing this, but..
If this is for search matching: The point of character set collation in a MySQL (for example), is that you can search for "n" and still match "ñ"
IF this is for display purposes: I'd recommend if you have to do this, you do it when you display the text to a user. You can never get your original data back otherwise.

Related

preg_replace, where is the mistake?

I want to replace, in a dictionary, the letters that start with a vowel with an acute accent, by the letter a, but when I put 'á' it doesn't read it to me and therefore it doesn't replace it. How can I make the function read it to me that special character?
$letter_in_progress = 'área';
$pattern = '/á/i';
echo preg_replace($pattern, 'a', $letter_in_progress);
If you want to remove any accents, you can transliterate them to ASCII.
$letter_in_progress = 'área';
echo iconv('UTF-8', 'ASCII//TRANSLIT', $letter_in_progress);
area

regex to also match accented characters

I have the following PHP code:
$search = "foo bar que";
$search_string = str_replace(" ", "|", $search);
$text = "This is my foo text with qué and other accented characters.";
$text = preg_replace("/$search_string/i", "<b>$0</b>", $text);
echo $text;
Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents?
The characters that have to match (Spanish):
á,Á,é,É,í,Í,ó,Ó,ú,Ú,ñ,Ñ
I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same:
"This is my foo text with qué and other accented characters."
and not
"This is my foo text with que and other accented characters."
The solution I finally used:
$search_for_preg = str_ireplace(["e","a","o","i","u","n"],
["[eé]","[aá]","[oó]","[ií]","[uú]","[nñ]"],
$search_string);
$text = preg_replace("/$search_for_preg/iu", "<b>$0</b>", $text)."\n";
$search = str_replace(
['a','e','i','o','u','ñ'],
['[aá]','[eé]','[ií]','[oó]','[uú]','[nñ]'],
$search)
This and the same for upper case will complain your request. A side note: ñ replacemet sounds invalid to me, as 'niño' is totaly diferent from 'nino'
If you want to use the captured text in the replacement string, you have to use character classes in your $search variable (anyway, you set it manually):
$search = "foo bar qu[eé]"
And so on.
You could try defining an array like this:
$vowel_replacements = array(
"e" => "eé",
// Other letters mapped to their other versions
);
Then, before your preg_match call, do something like this:
foreach ($vowel_replacements as $vowel => $replacements) {
str_replace($search_string, "$vowel", "[$replacements]");
}
If I'm remembering my PHP right, that should replace your vowels with a character class of their accented forms -- which will keep it in place. It also lets you change the search string far more easily; you don't have to remember to replaced the vowels with their character classes. All you have to remember is to use the non-accented form in your search string.
(If there's some special syntax I'm forgetting that does this without a foreach, please comment and let me know.)

Remove accents - Replace accented letters by letters without accents with str_replace

str_replace does not replace accented letters by letters without accent. What's wrong with that?
This returns the expected result:
<?php
$string = get_post_custom_values ("text");
// Say get_post_custom_values ​​("text") equals "José José"
$string = str_replace(" ", "-", $string);
echo $string [0];
// Output "José-José"
?>
This does not work:
<?php
$string = get_post_custom_values ("text");
// Say get_post_custom_values ​​("text") equals "Joseph Joseph"
$string = str_replace("é", "e", $string);
echo $string [0];
// Output "José José". Nothing has changed
?>
Note: Translated from the Portuguese language with GoogleTranslate.
The easy, safe way to remove every accented letters is by using iconv :
setlocale(LC_ALL, "fr_CA.utf8"); // for instance
$output = iconv("utf-8", "ascii//TRANSLIT", $input);
Your current problem is most likely caused by a different encoding.
The character é as saved in your source code is not in the same encoding as the data you get back from get_post_custom_values. Encoding doesn't match → not recognized as the same character → not replaced.

Replace selected characters in PHP string

I know this question has been asked several times for sure, but I have my problems with regular expressions... So here is the (simple) thing I want to do in PHP:
I want to make a function which replaces unwanted characters of strings. Accepted characters should be:
a-z A-Z 0-9 _ - + ( ) { } # äöü ÄÖÜ space
I want all other characters to change to a "_". Here is some sample code, but I don't know what to fill in for the ?????:
<?php
// sample strings
$string1 = 'abd92 s_öse';
$string2 = 'ab! sd$ls_o';
// Replace unwanted chars in string by _
$string1 = preg_replace(?????, '_', $string1);
$string2 = preg_replace(?????, '_', $string2);
?>
Output should be:
$string1: abd92 s_öse (the same)
$string2: ab_ sd_ls_o
I was able to make it work for a-z, 0-9 but it would be nice to allow those additional characters, especially äöü. Thanks for your input!
To allow only the exact characters you described:
$str = preg_replace("/[^a-zA-Z0-9_+(){}#äöüÄÖÜ -]/", "_", $str);
To allow all whitespace, not just the (space) character:
$str = preg_replace("/[^a-zA-Z0-9_+(){}#äöüÄÖÜ\s-]/", "_", $str);
To allow letters from different alphabets -- not just the specific ones you mentioned, but also things like Russian and Greek, or other types of accent marks:
$str = preg_replace("/[^\w+(){}#\s-]/", "_", $str);
If I were you, I'd go with the last one. Not only is it shorter and easier to read, but it's less restrictive, and there's no particular advantage to blocking stuff like и if äöüÄÖÜ are all fine.
Replace [^a-zA-Z0-9_\-+(){}#äöüÄÖÜ ] with _.
$string1 = preg_replace('/[^a-zA-Z0-9_\-+(){}#äöüÄÖÜ ]/', '_', $string1);
This replaces any characters except the ones after ^ in the [character set]
Edit: escaped the - dash.

PHP preg_replace oddity with £ pound sign and ã

I am applying the following function
<?php
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâäåìíîïùúûüýÿ]/", "", $string);
return $new_string;
}
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿ";
echo replaceChar($string);
?>
which works fine but if I add ã to the preg_replace like
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâãäåìíîïùúûüýÿ]/", "", $string);
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿã";
It conflicts with the pound sign £ and replaces the pound sign with the unidentified question mark in black square.
This is not critical but does anyone know why this is?
Thank you,
Barry
UPDATE: Thank you all. Changed functions adding the u modifier: pt2.php.net/manual/en/… – as suggested by Artefacto and works a treat
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõøöàáâãäåìíîïùúûüýÿ]/u", "", $string);
return $new_string;
}
If your string is in UTF-8, you must add the u modifier to the regex. Like this:
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâäåìíîïùúûüýÿ]/u", "", $string);
return $new_string;
}
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿ";
echo replaceChar($string);
Chances are that your string is UTF-8, but preg_replace() is working on bytes
that code is valid ...
maybe you should try Central-European character encoding
<?php
header ('Content-type: text/html; charset=ISO-8859-2');
?>
You might want to take a look at mb_ereg_replace(). As Mark mentioned preg_replace only works on byte level and does not work well with multibyte character encodings.
Cheers,
Fabian

Categories