Remove accents - Replace accented letters with unaccented letters using str_replace - php

str_replace does not replace accented letters with letters without accents. What's wrong?
This returns the expected result:
<?php
$string = get_post_custom_values("text");
// Say get_post_custom_values("text") equals "José José"
$string = str_replace(" ", "-", $string);
echo $string[0];
// Output "José-José"
?>
This does not work:
<?php
$string = get_post_custom_values("text");
// Say get_post_custom_values("text") equals "José José"
$string = str_replace("é", "e", $string);
echo $string[0];
// Output "José José". Nothing has changed
?>
Note: Translated from Portuguese with Google Translate.

The easy way to remove all accented letters is to use iconv with ASCII//TRANSLIT. The setlocale call matters because the transliteration depends on the current locale:
setlocale(LC_ALL, "fr_CA.utf8"); // for instance
$output = iconv("utf-8", "ascii//TRANSLIT", $input);
Your current problem is most likely caused by a different encoding.

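The character é as saved in your source code is not in the same encoding as the data you get back from get_post_custom_values. For example, é is the two bytes C3 A9 in UTF-8 but the single byte E9 in ISO-8859-1. str_replace compares raw bytes, so when the encodings don't match, the two are not recognized as the same character and nothing is replaced.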
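A small sketch of the byte comparison described above, assuming the PHP file itself is saved as UTF-8. The three sample strings are made up: one uses the same UTF-8 bytes as the needle, one is ISO-8859-1, and one uses a decomposed "e" plus combining accent, which looks identical on screen but is another byte sequence. The final conversion step assumes the data is known to be ISO-8859-1.

```php
<?php
// str_replace() compares bytes, so "é" is only found when the bytes are identical.
$needle = "\u{00E9}"; // precomposed é in UTF-8: bytes C3 A9

$samples = [
    'utf8'   => "Jos\u{00E9} Jos\u{00E9}",   // same bytes as the needle
    'latin1' => "Jos\xE9 Jos\xE9",           // ISO-8859-1: é is the single byte E9
    'nfd'    => "Jose\u{0301} Jose\u{0301}", // "e" followed by a combining acute accent
];

$results = [];
foreach ($samples as $label => $text) {
    $results[$label] = str_replace($needle, 'e', $text);
    echo $label, ': ', bin2hex($text), ' -> ', $results[$label] === $text ? 'unchanged' : 'replaced', "\n";
}

// Assuming the data is ISO-8859-1, converting it to UTF-8 first makes the bytes match
// (mb_convert_encoding() needs the mbstring extension).
$fixed = str_replace($needle, 'e', mb_convert_encoding($samples['latin1'], 'UTF-8', 'ISO-8859-1'));
echo $fixed, "\n"; // Jose Jose
```

Only the UTF-8 sample is changed; the other two come back untouched, which matches the "Nothing has changed" symptom in the question.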

Related

Replace illegal characters in a text with underscores in PHP

I need to replace illegal characters with an underscore (_).
For example:
If the user-supplied text is "imageЙ ййé.png", I need to replace the characters Й and йй with _ and __, so the overall output must be image_ __é.png. This replacement must not happen for French characters. I have tried the code below; please check it and help me get this output.
<?php
$allowed_char_array=array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ñ","ò","ó","ô","õ","ö","ð","ø","œ","š","Þ","ù","ú","û","ü","ý","ÿ","ž","0","1","2","3","4","5","6","7","8","9"," ","(",")","-","_",".","#","#","$","%","*","¢","ß","¥","£","™","©","®","ª","×","÷","±","+","-","²","³","¼","½","¾","µ","¿","¶","·","¸","º","°","¯","§","…","¤","¦","≠","¬","ˆ","¨","‰");
$word = 'imageЙ ййé.png';
$file_name = url_rewrite(trim($word));
$file_name2 = strtolower($file_name);
$split = str_split($file_name2);
if(is_array($split) && is_array($allowed_char_array)){
$result=array_diff($split,$allowed_char_array);
echo '<pre>';
print_r($split);
echo '<pre>';
print_r($allowed_char_array);
echo '<pre>';
print_r($result);
}
function url_rewrite($chaine) {
// Format the character string
// Replace characters so there are no more accents
$accents = array('é','à','è','À','É','È');
$sans = array('é','à','è','À','É','È');
$chaine = str_replace($accents, $sans, $chaine);
return $chaine;
}
?>
I would build a regex (a character class, to be exact) from your whitelisted characters, and then replace any character that matches the negation of that class with an underscore.
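$allowed_char_array = array("a", "b", "c", "d", "e"); // and the others
$chars = implode("", $allowed_char_array);
// preg_quote() escapes "-", "]", "^" and other characters that are special inside a class
$regex = "/[^" . preg_quote($chars, "/") . "]/u";
$input = "imageЙ ййé.png";
$output = preg_replace($regex, "_", $input);
echo $input . "\n" . $output;
Output (with the full whitelist from the question):
imageЙ ййé.png
image_ __é.png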
If the above is not clear, here is what the actual call to preg_replace would look like:
preg_replace("/[^abcdefghijklmnopqrstuv]/u", "_", $input);
That is, any non-whitelisted character is replaced with a single underscore. I did not bother to list the entire character class, because you already have it in your source code.
Note that the /u flag in the regex is critical here, because your input string is a UTF-8 string. UTF-8 characters may consist of more than one byte, and without /u preg_replace works byte by byte, so a two-byte character such as Й would be replaced with two underscores.
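To see the difference the /u flag makes, here is a small sketch. The trimmed whitelist is a hypothetical stand-in for the question's full array; the technique (implode, preg_quote, negated class) is the same.

```php
<?php
// Hypothetical, trimmed whitelist; the question's full list behaves the same way.
$allowed = array_merge(str_split('abcdefghijklmnopqrstuvwxyz0123456789 .-_'), ['é', 'è', 'à']);
$class   = preg_quote(implode('', $allowed), '/');

$input = 'imageЙ ййé.png';

// Without /u, pattern and subject are plain bytes: each two-byte Cyrillic letter
// becomes two underscores.
$bytewise = preg_replace('/[^' . $class . ']/', '_', $input);

// With /u, both are decoded as UTF-8: one underscore per character.
$utf8 = preg_replace('/[^' . $class . ']/u', '_', $input);

echo $bytewise, "\n"; // image__ ____é.png
echo $utf8, "\n";     // image_ __é.png
```

Note that é survives in both cases: its two bytes each happen to appear in the byte-level class, which is why the bug is easy to miss with French text alone.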
You will want to use mb_strtolower() to convert multibyte characters to lowercase safely.
My solution uses strtr() to convert your French accented letters to your preferred form.
Since all characters are lowercased from the outset, you can halve your whitelist of French characters.
Using pathinfo() lets you split the filename from its extension.
Code:
$word = 'imageЙ ййé.png';
$parts = pathinfo($word);
$filename = strtr(mb_strtolower($parts['filename']), ['é' =>'é', 'à' => 'à','è' => 'è']);
echo preg_replace('~[^ a-zéàè]~u', '_', $filename) , "." , $parts['extension'];
Output:
image_ __é.png
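One gap in the code above: pathinfo() leaves out the 'extension' key when the name has no dot, so $parts['extension'] triggers a warning for such names. A sketch that wraps the same steps in a hypothetical sanitize_filename() helper and checks for the key first. The whitelist is the one from the answer, the strtr() step is omitted, and mbstring is assumed to be available.

```php
<?php
// Hypothetical helper wrapping the answer's steps (strtr() step omitted).
function sanitize_filename(string $name): string
{
    $parts = pathinfo($name);
    $base  = preg_replace('~[^ a-zéàè]~u', '_', mb_strtolower($parts['filename'], 'UTF-8'));

    // pathinfo() has no 'extension' key for names without a dot.
    return isset($parts['extension']) ? $base . '.' . $parts['extension'] : $base;
}

echo sanitize_filename('imageЙ ййé.png'), "\n"; // image_ __é.png
echo sanitize_filename('reportЙ'), "\n";        // report_
```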

ucwords not capitalizing accented letters

I have a string with all letters capitalized. I'm using ucwords() and mb_strtolower() to capitalize only the first letter of each word. But I'm having problems when the first letter of a word has an accent. For example:
ucwords(mb_strtolower('GRANDE ÁRVORE')); //outputs 'Grande árvore'
Why is the first letter of the second word not capitalized? What can I do to solve this?
ucwords is one of the core PHP functions which is blissfully oblivious to non-ASCII or non-Latin-1 encodings.* For handling multibyte strings and/or non-ASCII strings, you should use the multibyte aware mb_convert_case:
mb_convert_case($str, MB_CASE_TITLE, 'UTF-8')
// your string encoding here --------^^^^^^^
* I'm not entirely sure whether it works only with ASCII or at least with Latin-1, but I wouldn't even bother to find out.
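A sketch comparing both approaches on the string from the question, assuming a UTF-8 source file and the mbstring extension:

```php
<?php
$input = 'GRANDE ÁRVORE';

// ucwords() looks at single bytes; the first byte of "á" (C3 A1) is not
// upper-cased by it, so the second word is left as is.
$byteBased = ucwords(mb_strtolower($input, 'UTF-8'));

// mb_convert_case() decodes UTF-8 and title-cases every word.
$titleCase = mb_convert_case($input, MB_CASE_TITLE, 'UTF-8');

echo $byteBased, "\n"; // Grande árvore
echo $titleCase, "\n"; // Grande Árvore
```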
If you only want to capitalize the first letter of the string, here's one way to do it:
$s = "économie collégiale";
echo mb_strtoupper(mb_substr($s, 0, 1)) . mb_substr($s, 1);
// output: Économie collégiale
ucwords doesn't recognize the accented character. Try using mb_convert_case.
$str = 'GRANDE ÁRVORE';
function ucwords_accent($string)
{
    // utf8_encode() is deprecated as of PHP 8.2, so convert from ISO-8859-1 explicitly
    if (!mb_check_encoding($string, 'UTF-8')) {
        $string = mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
    }
    return mb_convert_case($string, MB_CASE_TITLE, 'UTF-8');
}
echo ucwords_accent($str); // Grande Árvore

Replace foreign characters

I need to be able to replace some common foreign characters with English equivalents before I store values into my db.
For example: æ replace with ae and ñ with n.
Do I use preg_replace?
Thanks
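For accented letters that map to a single letter (note that the three-argument form of strtr() works byte by byte, so this only works when both your source file and your data use a single-byte encoding such as ISO-8859-1, not UTF-8):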
$str = strtr($str,
"ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÝßàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ",
"AAAAAACEEEEIIIINOOOOOOYSaaaaaaceeeeiiiinoooooouuuuyy");
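Because the three-argument strtr() maps single bytes, a UTF-8 version needs the array form, where each key can be a multi-byte string. A sketch that reuses the same two strings, assuming UTF-8 data and PHP 7.4+ for mb_str_split(); the sample sentence is made up.

```php
<?php
$from = "ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÝßàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ";
$to   = "AAAAAACEEEEIIIINOOOOOOYSaaaaaaceeeeiiiinoooooouuuuyy";

// mb_str_split() splits into characters, not bytes, so each accented letter
// becomes one key of the map.
$map = array_combine(mb_str_split($from), str_split($to));

$str = 'Ça a été très facile à Málaga';
echo strtr($str, $map), "\n"; // Ca a ete tres facile a Malaga
```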
For accented characters that map to two letters (such as Æ, æ):
$match = array('æ', 'Æ');
$replace = array('ae', 'AE');
$str = str_replace($match, $replace, $str);
You can define your convertible characters in an array and use str_replace():
$conversions = array(
"æ" => "ae",
"ñ" => "n",
);
$text = str_replace(array_keys($conversions), $conversions, $text);
You can try iconv() with ASCII//TRANSLIT:
$text = iconv("UTF-8", "ASCII//TRANSLIT", $text);
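The result of ASCII//TRANSLIT depends on the platform's iconv implementation and on LC_CTYPE, and iconv() returns false when it cannot convert. A hedged sketch with an explicit failure check; the helper name to_ascii(), the locale names tried, and the strip-non-ASCII fallback are all assumptions for illustration.

```php
<?php
// Hypothetical helper: iconv() transliteration with an explicit failure check.
// The locale names are examples and may not be installed (setlocale() then returns false).
function to_ascii(string $text): string
{
    $previous = setlocale(LC_CTYPE, 0);                 // remember the current locale
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');      // first available UTF-8 locale wins
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text); // false if the conversion fails
    setlocale(LC_CTYPE, $previous);                     // restore the previous locale

    if ($ascii === false) {
        // Assumed fallback: drop every non-ASCII byte rather than return false.
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $text);
    }
    return $ascii;
}

echo to_ascii('Ñandú æther'), "\n";
```

The exact output differs between systems (for example, untranslatable characters may become "?"), which is why no expected output is shown here.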
Excuse me for second-guessing why you're doing this, but...
If this is for search matching: the point of collations in MySQL (for example) is that, with an accent-insensitive collation, you can search for "n" and still match "ñ".
If this is for display purposes: if you have to do this, I'd recommend doing it when you display the text to a user. Otherwise, you can never get your original data back.

Remove all non-matching characters in PHP string?

I've got text from which I want to remove all characters that ARE NOT the following.
desired_characters =
0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n
The last is a \n (newline) that I do want to keep.
To match all characters except the listed ones, use an inverted character set [^…]:
$chars = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
$pattern = "/[^".preg_quote($chars, "/")."]/";
Here preg_quote is used to escape certain special characters so that they are interpreted as literal characters.
You could also use character ranges to express the listed characters:
$pattern = "/[^0-9!&',-.\\/a-z\n]/";
In this case it doesn't matter whether the literal - in ,-. is escaped or not, because ,-. is interpreted as the character range from , (0x2C) to . (0x2E), which already contains - (0x2D).
Then you can remove those characters that are matched with preg_replace:
$output = preg_replace($pattern, "", $str);
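Putting the preg_quote() pattern and the preg_replace() call together in one runnable sketch; the sample string is made up to show that the newline survives while capitals, spaces and other symbols are removed.

```php
<?php
$chars   = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
$pattern = "/[^" . preg_quote($chars, "/") . "]/"; // escape "!", "-", ".", "/" for the class

$str    = "Hello, World!\nPrice: 42 & more :)";
$output = preg_replace($pattern, "", $str);

echo $output; // "ello,orld!" then a newline, then "rice42&more"
```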
$string = 'This is anexample $tring! :)';
$string = preg_replace('/[^0-9!&\',\-.\/a-z\n]/', '', $string);
echo $string; // hisisanexampletring!
Note: this is case-sensitive, hence the capital T is removed from the string. To allow capital letters as well, use $string = preg_replace('/[^0-9!&\',\-.\/A-Za-z\n]/', '', $string);

PHP preg_replace oddity with £ pound sign and ã

I am applying the following function
<?php
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâäåìíîïùúûüýÿ]/", "", $string);
return $new_string;
}
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿ";
echo replaceChar($string);
?>
which works fine, but if I add ã to the preg_replace pattern like this:
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâãäåìíîïùúûüýÿ]/", "", $string);
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿã";
It conflicts with the pound sign £: the pound sign is replaced with the unknown-character symbol (a question mark in a black diamond).
This is not critical, but does anyone know why this happens?
Thank you,
Barry
UPDATE: Thank you all. I changed the function to add the u modifier (pt2.php.net/manual/en/…), as suggested by Artefacto, and it works a treat:
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõøöàáâãäåìíîïùúûüýÿ]/u", "", $string);
return $new_string;
}
If your string is in UTF-8, you must add the u modifier to the regex. Like this:
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâäåìíîïùúûüýÿ]/u", "", $string);
return $new_string;
}
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿ";
echo replaceChar($string);
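Chances are that your string is UTF-8, but preg_replace() is working on bytes. In UTF-8, £ is the two bytes C2 A3 and ã is C3 A3. Adding ã to the class therefore whitelists the byte A3, so the C2 of £ is removed while its A3 survives on its own. A lone A3 is invalid UTF-8, and the browser displays it as the replacement character.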
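A sketch reproducing this with the question's character class (ã included), assuming a UTF-8 source file and the mbstring extension for the validity check; the input string is made up.

```php
<?php
$class = 'a-zA-Z0-9\sçéèêëñòóôõöàáâãäåìíîïùúûüýÿ';
$text  = 'cost:£5';

$bytewise = preg_replace('/[^' . $class . ']/', '', $text);  // no u: the class is a set of bytes
$utf8     = preg_replace('/[^' . $class . ']/u', '', $text); // u: the class is a set of characters

echo bin2hex('£'), ' ', bin2hex('ã'), "\n";      // c2a3 c3a3
echo bin2hex($bytewise), "\n";                   // 636f7374a335 (a lone a3 byte is left)
var_dump(mb_check_encoding($bytewise, 'UTF-8')); // bool(false)
echo $utf8, "\n";                                // cost5
```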
That code is valid...
Maybe you should try a Central European character encoding:
<?php
header ('Content-type: text/html; charset=ISO-8859-2');
?>
You might want to take a look at mb_ereg_replace(). As Mark mentioned, preg_replace works at the byte level (unless you use the u modifier) and does not work well with multibyte character encodings.
Cheers,
Fabian
