Detect latin1 characters in utf8 string - php

I get data from a database that should be UTF-8 encoded, but some old data contains Latin-1 characters.
So this check always reports the string as valid UTF-8:
$encod = mb_detect_encoding($string, 'UTF-8', true);
Is it safe to always use utf8_decode() to check for Latin-1 characters like 'äöüß'?
$string = utf8_decode($string);
$search = Array(" ", "ä", "ö", "ü", "ß", "."); //,"/Ä/","/Ö/","/Ü/");
$replace = Array("-", "ae", "oe", "ue", "ss", "-"); //,"Ae","Oe","Ue");
$string = str_replace($search, $replace, strtolower($string));

It seems to work without utf8_decode():
<?php
$string = "äöüß";
$search = Array(" ", "ä", "ö", "ü", "ß", "."); //,"/Ä/","/Ö/","/Ü/");
$replace = Array("-", "ae", "oe", "ue", "ss", "-"); //,"Ae","Oe","Ue");
$string = str_replace($search, $replace, strtolower($string));
echo $string;
?>
DEMO: http://codepad.org/HGTyHkBU
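One way to handle mixed data is to validate each string as UTF-8 first and convert only the strings that fail. The sketch below assumes the mbstring extension is available and that any string that is not valid UTF-8 is ISO-8859-1; the function names to_utf8() and slugify() are hypothetical. The check is a heuristic, since some Latin-1 byte sequences also happen to be valid UTF-8. Note that utf8_decode() is deprecated as of PHP 8.2.

```php
<?php
// Assumption: this source file itself is saved as UTF-8.

// Hypothetical helper: return $s unchanged if it is valid UTF-8,
// otherwise treat it as ISO-8859-1 and convert it.
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: same replacements as in the question,
// applied after normalizing to UTF-8 and lowercasing.
function slugify(string $s): string
{
    $s = mb_strtolower(to_utf8($s), 'UTF-8');
    $search  = [' ', 'ä', 'ö', 'ü', 'ß', '.'];
    $replace = ['-', 'ae', 'oe', 'ue', 'ss', '-'];
    return str_replace($search, $replace, $s);
}
```

With this, slugify("Gr\xFC\xDFe Welt.") (Latin-1 bytes) and slugify("Grüße Welt.") (UTF-8) should give the same result.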

Use htmlspecialchars(); it is safer.
More info:
http://php.net/manual/en/function.htmlspecialchars.php

Related

Adjust code by changing function eregi_replace [duplicate]

This question already has answers here:
How can I convert ereg expressions to preg in PHP?
(4 answers)
Deprecated: Function eregi_replace() [duplicate]
(2 answers)
Closed 4 years ago.
How can I adjust my code by replacing the "eregi_replace" function with another one that does the same thing?
I know the alternatives, but I do not know how to update the code. The error I get is:
PHP Deprecated: Function eregi_replace
function url($String){
$Separador = "-";
$String = trim($String);
$String = strtolower($String);
$String = strip_tags($String);
$String = eregi_replace("[[:space:]]", $Separador, $String);
$String = eregi_replace("[çÇ]", "c", $String);
$String = eregi_replace("[áÁäÄàÀãÃâÂ]", "a", $String);
$String = eregi_replace("[éÉëËèÈêÊ]", "e", $String);
$String = eregi_replace("[íÍïÏìÌîÎ]", "i", $String);
$String = eregi_replace("[óÓöÖòÒõÕôÔ]", "o", $String);
$String = eregi_replace("[úÚüÜùÙûÛ]", "u", $String);
$String = eregi_replace("(\()|(\))", $Separador, $String);
$String = eregi_replace("(\/)|(\\\)", $Separador, $String);
$String = eregi_replace("(\[)|(\])", $Separador, $String);
$String = eregi_replace("[#®#\$%&\*\+=\|º]", $Separador, $String);
$String = eregi_replace("[;:'\"<>,\.?!_]", $Separador, $String);
$String = eregi_replace("[“”]", $Separador, $String);
$String = eregi_replace("(ª)+", $Separador, $String);
$String = eregi_replace("[´~^°]", $Separador, $String);
$String = eregi_replace("($Separador)+", $Separador, $String);
$String = substr($String, 0, 100);
$String = eregi_replace("(^($Separador)+)|(($Separador)+$)", "", $String);
$String = str_replace("-", $Separador, $String);
return $String;
}
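A possible conversion to preg_replace() is sketched below. It is not a line-by-line port: it assumes UTF-8 input and a UTF-8 source file, uses the /u modifier, and collapses every remaining run of characters other than a-z and 0-9 into the separator instead of listing each symbol. The name url_slug() is hypothetical.

```php
<?php
// Hypothetical replacement for url(), using preg_replace() instead of eregi_replace().
function url_slug(string $string, string $separator = '-'): string
{
    $string = strip_tags(trim($string));

    // Map accented letters to ASCII. Both cases are listed explicitly,
    // so no multibyte lowercasing is needed afterwards.
    $map = [
        '/[çÇ]/u'          => 'c',
        '/[áÁäÄàÀãÃâÂ]/u' => 'a',
        '/[éÉëËèÈêÊ]/u'   => 'e',
        '/[íÍïÏìÌîÎ]/u'   => 'i',
        '/[óÓöÖòÒõÕôÔ]/u' => 'o',
        '/[úÚüÜùÙûÛ]/u'   => 'u',
    ];
    $string = preg_replace(array_keys($map), array_values($map), $string);
    $string = strtolower($string);

    // Assumption: anything that is not a-z or 0-9 becomes one separator.
    $string = preg_replace('/[^a-z0-9]+/u', $separator, $string);

    // Truncate, then strip separators left at either end.
    $string = substr($string, 0, 100);
    return trim($string, $separator);
}
```

For example, url_slug("  Olá, Mundo! Ação & Reação  ") should give "ola-mundo-acao-reacao".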

Error: Nothing to repeat at offset error during a preg_match_all in PHP

I need to find if a file name contains some special characters I don't want.
I'm using this code actually:
$files = array("logo.png", "légo.png");
$badChars = array(" ", "é", "É", "è", "È", "à", "À", "ç", "Ç", "¨", "^", "=", "/", "*", "-", "+", "'", "<", ">", ":", ";", ",", "`", "~", "/", "", "|", "!", "#", "#", "$", "%", "?", "&", "(", ")", "¬", "{", "}", "[", "]", "ù", "Ù", '"', "«", "»");
$matches = array();
foreach($files as $file) {
$matchFound = preg_match_all("#\b(" . implode("|", $badChars) . ")\b#i", $file, $matches);
}
if ($matchFound) {
$words = array_unique($matches[0]);
foreach($words as $word) {
$results[] = array('Error' => "Forbided chars found : ". $word);
}
}
else {
$results[] = array('Success' => "OK.");
}
But I have an error saying:
Warning: preg_match_all(): Compilation failed: nothing to repeat at offset 38 in /home/public_html/upload.php on line 138
Which is:
$matchFound = preg_match_all("#\b(" . implode("|", $badChars) . ")\b#i", $file, $matches);
Any help or clue?
This happens because ?, * and + are quantifiers. Since they are not escaped, the pattern contains sequences such as |?, where there is nothing to repeat.
For your task you don't need to use an alternation, a character class should suffice:
if (preg_match_all('~[] éèàç¨^=/*-+\'<>:;,`\~/|!##$%?&()¬{}[ù"«»]~ui', $file, $m)) {
$m = array_unique($m[0]);
$m = array_map(function ($i) use ($file) { return array('Error' => 'Forbidden character found : ' . $i . ' in ' . $file); }, $m);
$results = array_merge($results, $m);
}
or perhaps this pattern: ~[^[:alnum:]]~
It's because your character list contains *, which tries to repeat the previous token; in your case that is |, which is invalid. Your regex turns into:
..... |/|*|-| .....
Map preg_quote() to your character array before your loop and you'll be fine:
$badChars = array_map( 'preg_quote', $badChars);
Note that because the delimiter # is not passed to preg_quote() in this call, you may have to escape it manually in your $badChars array.
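The preg_quote() approach can be sketched as follows, passing the delimiter as the second argument so it is escaped as well. The character list is shortened for illustration, the function name findBadChars() is hypothetical, and the \b word boundaries from the question are dropped on the assumption that punctuation should match anywhere in the name. The /u modifier assumes UTF-8 file names.

```php
<?php
// Shortened, illustrative list of forbidden characters.
$badChars = [' ', 'é', '*', '+', '?', '#', '(', ')'];

// Escape every entry, including the # delimiter.
$quoted = array_map(function ($c) {
    return preg_quote($c, '#');
}, $badChars);

$pattern = '#(' . implode('|', $quoted) . ')#u';

// Hypothetical helper: return the distinct forbidden characters found in $file.
function findBadChars(string $file, string $pattern): array
{
    preg_match_all($pattern, $file, $matches);
    return array_values(array_unique($matches[0]));
}
```

Calling findBadChars('logo.png', $pattern) should return an empty array, while 'légo (1).png' should yield the characters é, space, ( and ).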

PHP speed up replace?

How can I increase the performance of the following code:
$text = str_replace("A", "B", $text);
$text = str_replace("f", "F", $text);
$text = str_replace("c", "S", $text);
$text = str_replace("4", "G", $text);
//more str_replace here
Do it as one function call:
$text = str_replace(["A","f","c","4"], ["B","F","S","G"], $text);
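The array form applies the pairs in order, one after another, so it should behave like the chained calls. As a sketch for comparison, strtr() with an array does all replacements in a single pass and never re-replaces text it has already substituted. The sample input is an assumption, and no timing claim is made here.

```php
<?php
$text = "Abc4f";

// Chained calls, as in the question.
$chained = str_replace("A", "B", $text);
$chained = str_replace("f", "F", $chained);
$chained = str_replace("c", "S", $chained);
$chained = str_replace("4", "G", $chained);

// One call with arrays: pairs are applied sequentially, in array order.
$batched = str_replace(["A", "f", "c", "4"], ["B", "F", "S", "G"], $text);

// strtr(): single pass over the input, replaced text is not scanned again.
$translated = strtr($text, ["A" => "B", "f" => "F", "c" => "S", "4" => "G"]);
```

The results can differ when one replacement produces a character that a later pair searches for: str_replace() would replace it again, strtr() would not.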

special character replacement not working

I have written this method to replace special characters:
function replace_sonder($string)
{
$string2 = str_replace("ä", "&auml;", $string);
$string2 = str_replace("%E4", "&auml;", $string2);
$string2 = str_replace("ö", "&ouml;", $string2);
$string2 = str_replace("%F6", "&ouml;", $string2);
$string2 = str_replace("ü", "&uuml;", $string2);
$string2 = str_replace("%FC", "&uuml;", $string2);
$string2 = str_replace("Ä", "&Auml;", $string2);
$string2 = str_replace("%C4", "&Auml;", $string2);
$string2 = str_replace("Ö", "&Ouml;", $string2);
$string2 = str_replace("%D6", "&Ouml;", $string2);
$string2 = str_replace("Ü", "&Uuml;", $string2);
$string2 = str_replace("%DC", "&Uuml;", $string2);
$string2 = str_replace("ß", "&szlig;", $string2);
$string2 = str_replace("%DF", "&szlig;", $string2);
return $string2;
}
It always returns the same string that I pass in. What am I missing, or is there an alternative way to do this?
$string = preg_replace("/ä/", "&auml;", $string);
...
But a better way is:
$string = htmlentities($string, ENT_QUOTES);
Check that you are not viewing the output in an HTML page, since the browser will convert the entities back into characters.
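A minimal sketch of the htmlentities() approach, with the charset passed explicitly. It assumes plain text arrives as UTF-8 and that percent-encoded values such as %E4 are Latin-1 bytes, as the values in the question suggest; both function names are hypothetical.

```php
<?php
// Hypothetical helper: convert UTF-8 text to HTML entities.
function to_entities(string $utf8): string
{
    return htmlentities($utf8, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: decode %XX sequences to raw bytes, then convert
// those bytes to entities, treating them as ISO-8859-1.
function percent_to_entities(string $encoded): string
{
    return htmlentities(rawurldecode($encoded), ENT_QUOTES, 'ISO-8859-1');
}
```

To inspect the result in a browser, view the page source or wrap the output in htmlspecialchars(), otherwise the entities are rendered as the original characters.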

is it possible to shrink preg_replace and str_replace

Is it possible to make this cleaner, with fewer lines of code? I have to repeat it for every new box I insert it into.
$fil_namn = str_replace("5FSE_", "", $fil_url);
$fil_namn = str_replace(".pdf", "", $fil_namn);
$fil_namn = str_replace(".docx", "", $fil_namn);
$fil_namn = str_replace(".doc", "", $fil_namn);
$fil_namn = preg_replace("[_]",". ",$fil_namn);
$fil_namn = preg_replace('/^[0-9]+\. +/','', $fil_namn);
$fil_namn = preg_replace ("[AaA]","å",$fil_namn);
$fil_namn = preg_replace ("[AeA]","ä",$fil_namn);
$fil_namn = preg_replace ("[OoO]","ö",$fil_namn);
$fil_namn = preg_replace ("[aAa]","Å",$fil_namn);
$fil_namn = preg_replace ("[aEa]","Ä",$fil_namn);
$fil_namn = preg_replace ("[oOo]","ö",$fil_namn);
$fil_namn= str_replace("."," ", $fil_namn);
You could use this:
$fil_namn = str_replace(array('5FSE_', '.pdf', '.docx', '.doc'), '', $fil_url);
str_replace allows for arrays.
You can also do this:
$string = "Hello";
echo str_replace(array("H", "e", "l", "o"), array("A", "l", "e", "x"), $string);
This will print out Aeeex.
Another method would be to use the strtr() function:
$string = "[AaA][AeA][OoO][aAa][aEa][oOo]";
$find = array("[AaA]", "[AeA]", "[OoO]", "[aAa]", "[aEa]", "[oOo]");
$replace = array("å", "ä", "ö", "Å", "Ä", "ö");
echo strtr($string, array_combine($find, $replace));
This echoes out:
åäöÅÄö
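To avoid repeating the steps for every box, the whole sequence could be wrapped in one function. This is a sketch that follows the order of operations in the question; the function name fil_namn() and the sample file name are hypothetical, and the oOo mapping is kept as ö exactly as posted.

```php
<?php
// Hypothetical helper combining the steps from the question.
function fil_namn(string $fil_url): string
{
    // Strip the prefix and known extensions (.docx is listed before .doc).
    $namn = str_replace(['5FSE_', '.pdf', '.docx', '.doc'], '', $fil_url);

    // Underscores become ". ", then a leading number such as "12. " is dropped.
    $namn = str_replace('_', '. ', $namn);
    $namn = preg_replace('/^[0-9]+\. +/', '', $namn);

    // Letter sequences used as placeholders for Swedish characters.
    $namn = strtr($namn, [
        'AaA' => 'å', 'AeA' => 'ä', 'OoO' => 'ö',
        'aAa' => 'Å', 'aEa' => 'Ä', 'oOo' => 'ö',
    ]);

    // Remaining dots become spaces, as in the original last step.
    return str_replace('.', ' ', $namn);
}
```

Because each underscore first becomes ". " and the dot then becomes a space, words end up separated by two spaces, which matches what the original sequence produces.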
