PHP preg_replace oddity with £ pound sign and ã - php

I am applying the following function
<?php
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâäåìíîïùúûüýÿ]/", "", $string);
return $new_string;
}
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿ";
echo replaceChar($string);
?>
which works fine but if I add ã to the preg_replace like
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâãäåìíîïùúûüýÿ]/", "", $string);
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿã";
It conflicts with the pound sign £ and replaces the pound sign with the unidentified question mark in black square.
This is not critical but does anyone know why this is?
Thank you,
Barry
UPDATE: Thank you all. Changed functions adding the u modifier: pt2.php.net/manual/en/… – as suggested by Artefacto and works a treat
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõøöàáâãäåìíîïùúûüýÿ]/u", "", $string);
return $new_string;
}

If your string is in UTF-8, you must add the u modifier to the regex. Like this:
function replaceChar($string){
$new_string = preg_replace("/[^a-zA-Z0-9\sçéèêëñòóôõöàáâäåìíîïùúûüýÿ]/u", "", $string);
return $new_string;
}
$string = "This is some text and numbers 12345 and symbols !£%^#&$ and foreign letters éèêëñòóôõöàáâäåìíîïùúûüýÿ";
echo replaceChar($string);

Chances are that your string is UTF-8, but preg_replace() is working on bytes

that code is valid ...
maybe you should try Central-European character encoding
<?php
header ('Content-type: text/html; charset=ISO-8859-2');
?>

You might want to take a look at mb_ereg_replace(). As Mark mentioned preg_replace only works on byte level and does not work well with multibyte character encodings.
Cheers,
Fabian

Related

Remove � Special Character from String

I have been trying to remove junk character from a stream of html strings using PHP but haven't been successfull yet. Is there any special syntax or logics to remove special character from the string?
I had tried this so far, but ain't working
$new_string = preg_replace("�", "", $HtmlText);
echo '<pre>'.$new_string.'</pre>';
\p{S}
You can use this.\p{S} matches math symbols, currency signs, dingbats, box-drawing characters, etc
See demo.
https://www.regex101.com/r/rK5lU1/30
$re = "/\\p{S}/i";
$str = "asdas�sadsad";
$subst = "";
$result = preg_replace($re, $subst, $str);
This is due to mismatch in Charset between database and front-end. Correcting this will fix the issue.
function clean($string) {
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}

Extract digit from unicode string - PHP RegExpression

I am Parsing a web page for getting the web page prize. the prize include a Rupee symbol (₹).
So i used preg_replace to extract digits.
For example:
$str='₹ 1,195 ';
echo preg_replace("/[^0-9]/", '', $str);
Output is :
2091195
I tried same code to execute on http://writecodeonline.com/php/.
There i m getting correct output 1195.
I'm not getting what is the problem.
Thanks in Advance
If the unicode string is UTF-8, you can use the u (PCRE_UTF8) modifierDocs to tell preg_replace that it should use UTF-8 mode. If not, re-encode it to UTF-8 first and then use the modifier.
Example (Demo):
$subject = '₹ 1,195 ';
$pattern = "/[^0-9]/u";
$result = preg_replace($pattern, '', $subject);
echo $result;

Remove accents - Replace accented letters by letters without accents with str_replace

str_replace does not replace accented letters by letters without accent. What's wrong with that?
This returns the expected result:
<?php
$string = get_post_custom_values ("text");
// Say get_post_custom_values ​​("text") equals "José José"
$string = str_replace(" ", "-", $string);
echo $string [0];
// Output "José-José"
?>
This does not work:
<?php
$string = get_post_custom_values ("text");
// Say get_post_custom_values ​​("text") equals "Joseph Joseph"
$string = str_replace("é", "e", $string);
echo $string [0];
// Output "José José". Nothing has changed
?>
Note: Translated from the Portuguese language with GoogleTranslate.
The easy, safe way to remove every accented letters is by using iconv :
setlocale(LC_ALL, "fr_CA.utf8"); // for instance
$output = iconv("utf-8", "ascii//TRANSLIT", $input);
Your current problem is most likely caused by a different encoding.
The character é as saved in your source code is not in the same encoding as the data you get back from get_post_custom_values. Encoding doesn't match → not recognized as the same character → not replaced.

Why would str_replace not replace question marks?

I have a string with question marks ('?') in it and I want to replace it with something parsable.
However str_replace will not replace any ? characters in my string...
$str = str_replace('?', 'replacement', $str);
Any ideas?
That code does replace question marks with the word replacement, which means that's not the code you're using, or what's in your string is not a question mark.
PHP's string functions only operate correctly on latin1 (iso-8859-1) encoded strings. In many encodings there may be many codepoints that correspond to a glyph that looks like a question mark visually, but is not the same as ASCII ?.
$str = "Hello? Anyone home?";
$str = str_replace('?', 'replacement', $str);
echo $str;
Output:
Helloreplacement Anyone homereplacement
This will replace ? in to space. This may help you.
<?php
$str = "this ? does ? indeed ? work";
$char='';
$str1 = str_replace('?',$char,$str);
echo $str1;
?>

Remove garbage characters in arabic

I needed to remove all non Arabic characters from a string and eventually with the help of people from stack-overflow was able to come up with the following regex to get rid of all characters which are not Arabic.
preg_replace('/[^\x{0600}-\x{06FF}]/u','',$string);
The problem is the above removes white spaces too. And now I discovered I would need character from A-Z,a-z,0-9, !##$%^&*() also. So how do I need to modify the regex?
Thanking you
Add the ones you want to keep to your character class:
preg_replace('/[^\x{0600}-\x{06FF}A-Za-z !##$%^&*()]/u','', $string);
assume you have this string:
$str = "Arabic Text نص عربي test 123 و,.m,............ ~~~ ٍ،]ٍْ}~ِ]ٍ}";
this will keep arabic chars with spaces only.
echo preg_replace('/[^أ-ي ]/ui', '', $str);
this will keep Arabic and English chars with Numbers Only
echo preg_replace('/[^أ-يA-Za-z0-9 ]/ui', '', $str);
this will answer your question latterly.
echo preg_replace('/[^أ-يA-Za-z !##$%^&*()]/ui', '', $str);
In a more detailed manner from Above example, Considering below is your string:
$string = '<div>This..</div> <a>is<a/> <strong>hello</strong> <i>world</i> ! هذا هو مرحبا العالم! !##$%^&&**(*)<>?:";p[]"/.,\|`~1##$%^&^&*(()908978867564564534423412313`1`` "Arabic Text نص عربي test 123 و,.m,............ ~~~ ٍ،]ٍْ}~ِ]ٍ}"; ';
Code:
echo preg_replace('/[^\x{0600}-\x{06FF}A-Za-z0-9 !##$%^&*().]/u','', strip_tags($string));
Allows: English letters, Arabic letters, 0 to 9 and characters !##$%^&*().
Removes: All html tags, and special characters other than above

Categories