### Result of sort
sort($keys_arranged, SORT_NUMERIC);
var_dump($keys_arranged);
{ ...
[51]=> float(11.903327742296)
[52]=> int(5)
[53]=> float(13.165002546636)
[54]=> float(14.478273306964)
[55]=> float(4.6264742674547)
[56]=> float(13.290508819344)
[57]=> float(15.686809055276) }
### Result or rsort
rsort($keys_arranged, SORT_NUMERIC);
var_dump($keys_arranged);
{
[0]=> float(15.686809055276)
[1]=> float(14.478273306964)
[2]=> float(13.290508819344)
[3]=> float(13.165002546636)
[4]=> float(11.903327742296)
[5]=> int(5)
[6]=> float(4.6264742674547)
... }
echo var_export($keys_arranged);
array ( 0 => 3.142678516658294, 1 => 1.0, 2 => 1.0, 3 => 14.478273306963985, 4 => 13.165002546635966, 5 => 1.0, 6 => 1.0005037081114851, 7 => 1.0, 8 => 4.6264742674547001, 9 => 15.686809055275578, 10 => 1.0, 11 => 11.903327742295504, 12 => 13.29050881934397, 13 => 1.0, 14 => 1.0, 15 => 3.5421134937189365, 16 => 1.0, 17 => 0.010999999999999999, 18 => 3.2999566681750605, 19 => 5, 20 => 1.2282984802843129, 21 => 1.0, 22 => 2.9748253120971184, 23 => 0.44855992975075798, 24 => 0.99999999999999989, 25 => 3.8350475954623371, 26 => 1.0625975061426283, 27 => 1.0000072792091179, 28 => 0.99999987785487132, 29 => 1, 30 => 0.0, 31 => 1.0, 32 => 1.0, 33 => 1.0, 34 => 0.0, 35 => 1.0972568578553616, 36 => 1.0, 37 => 1.4077661823957415, 38 => 1.0, 39 => 0.0, 40 => 3.6038030347555705, 41 => 1.0, 42 => 1.0, 43 => 1.0636876768842174, 44 => 1.0, 45 => NAN, 46 => 1.0, 47 => NAN, 48 => NAN, 49 => NAN, 50 => NAN, 51 => 1.0, 52 => 1.0, 53 => NAN, 54 => 0.99958680716631509, 55 => 1.0, 56 => NAN, 57 => 1.0, )
I got some incorrect result from array_multisort. It has 8 different keys and all the keys but this key(say A) works fine.
To figure out why that happened, I played with A in the array. And I could see that not only in the function of array_multisort, but also in 'sort' function, an incorrect result prompted as well.
As you can see the result of rsort seems okay, the result of sort works strangely when it comes to
13.16 < 14.47 < 4.62 < 13.29 < 15.68
Does anyone know why this occurred?
// Data in readible format
array (
0 => 3.142678516658294,
1 => 1.0,
2 => 1.0,
3 => 14.478273306963985,
4 => 13.165002546635966,
5 => 1.0,
6 => 1.0005037081114851,
7 => 1.0,
8 => 4.6264742674547001,
9 => 15.686809055275578,
10 => 1.0,
11 => 11.903327742295504,
12 => 13.29050881934397,
13 => 1.0,
14 => 1.0,
15 => 3.5421134937189365,
16 => 1.0,
17 => 0.010999999999999999,
18 => 3.2999566681750605,
19 => 5,
20 => 1.2282984802843129,
21 => 1.0,
22 => 2.9748253120971184,
23 => 0.44855992975075798,
24 => 0.99999999999999989,
25 => 3.8350475954623371,
26 => 1.0625975061426283,
27 => 1.0000072792091179,
28 => 0.99999987785487132,
29 => 1,
30 => 0.0,
31 => 1.0,
32 => 1.0,
33 => 1.0,
34 => 0.0,
35 => 1.0972568578553616,
36 => 1.0,
37 => 1.4077661823957415,
38 => 1.0,
39 => 0.0,
40 => 3.6038030347555705,
41 => 1.0,
42 => 1.0,
43 => 1.0636876768842174,
44 => 1.0,
45 => NAN,
46 => 1.0,
47 => NAN,
48 => NAN,
49 => NAN,
50 => NAN,
51 => 1.0,
52 => 1.0,
53 => NAN,
54 => 0.99958680716631509,
55 => 1.0,
56 => NAN,
57 => 1.0, )
It seems as you actually do have discovered a bug. On large arrays containing NaN values, represented by the NAN constant in PHP, the built-in function sort fails.
The comparision with NaN should always result as non-ordered as mentioned by kuh-chan in the question comments. However, as soon as there is at least one operand NaN, in recent (March 2019) versions of PHP the spaceship operator <=> returns 1 (first operand greater than second one) and -1 (first operand less than second one) in some other versions.
echo var_dump( NAN <=> 1.23 );
echo var_dump( 1.23 <=> NAN );
echo var_dump( NAN <=> -1.23 );
echo var_dump( -1.23 <=> NAN );
echo var_dump( NAN <=> 0 );
echo var_dump( NAN <=> NAN );
I'm not very famimilar with the internals of the Zend engine. However, I guess the sort algorithm terminates too soon on large arrays with several NaN values, likely in order to the order comparision algorithm alway returns "greater (or less) than" which probably leads to multiple exchanges of the same elements.
I experienced that calling sort multiple times seems to continue the sorting. However, that would not be a proper workaround.
You can use a custom comparision algorithm with usort instead. If you want to order NaN at the start of the array, return -1 when the first operand is NaN, 1 when the second operand is NaN and 0 (equal) if both are. Otherwise return the comparision result of the spaceship operator.
usort($keys_arranged,
function(&$v1, &$v2)
{
switch( (is_nan($v1) ? 1 : 0) | (is_nan($v2) ? 2 : 0) )
{
case 0: return $v1 <=> $v2;
case 1: return -1;
case 2: return 1;
case 3: return 0;
}
}
);
Similar bitwise logic as above, but in a more compressive and direct form:
usort( $keys_arranged,
function(&$v1, &$v2){ return -2 === ($b = (is_nan($v1) ? -1 : -2) ^ (is_nan($v2) ? -1 : 0)) ? $v1 <=> $v2 : $b; }
);
I need to remove a wrong schema from an URL submitted as a string.
I start with this consideration:
The only two allowed schemas are http:// and https://
Starting from this consideration, I need to remove from the string all wrongly formatted schemas like:
htp://example.com
htps://example.com
http:/example.com
https:/example.com
htpexample.com
htps://example.com
http:/example.com
https:/example.com
... many many more ...
My questions are:
Is there a way to calculate all possible variation of wrong schemas?
Which is the best approach to remove them from the given string?
ABOUT QUESTION 2
My first approach would be to create an array with all the wrong schemas and then use something like this:
$wrongSchemas = [/* here all the possible wrong schemas calculated from question 1 */];
$url = str_replace($wrongSchemas, '', $url);
But this approach relies on a correct order, because instead I risk to remove partial schemas and make it wrong anyway.
And anyway I need to find a way to create the array $wrongSchemas!
Any suggestion or any further consideration on the topic that I'm missing, is well appreciated.
use strpos check in your $url
if(strpos($url,"http://")!==0 && strpos($url,"https://")!==0){
//wrong schemas
$exp_string = "";
// $exp_string = strpos($url,"//")?"//":"/";
if(strpos($url,"://"))
$exp_string = "://";
elseif(strpos($url,":/"))
$exp_string = ":/";
elseif(strpos($url,":"))
$exp_string = ":";
elseif(strpos($url,"//"))
$exp_string = "//";
$temp = explode($exp_string,$url);
if(count($temp)>1){
$url = '';
for($i=1;$i<count($temp);$i++){
$url = $url.$temp[$i];
}
//if needed you can add http:// here: $url = 'http://'.$url;
}
}
echo $url;
//process for right schemas
If you need more validations like url starting with ht,htp,etc you can use checkdnsrr or gethostbyname functions to validate domains
After some more thoughts, thanks also to some of your comments (thank you #Frédéric Clausset), I decided that the best approach would be to build the array of wrong schemas on the fly:
$schemas = ['http', 'htp', 'htt', 'ht', 'hp', 'ttp', 'tp'];
$delimiters = ['://', ':/', ':', '//', '/'];
$wrongCombinations = [];
foreach ($schemas as $wrongSchema) {
foreach ($delimiters as $wrongDelimiter) {
$combination = $wrongSchema . $wrongDelimiter;
$combinationSecure = $wrongSchema . 's' . $wrongDelimiter;
if ('http://' !== $combination) {
$wrongCombinations[] = $combination;
}
if ('https://' !== $combinationSecure) {
$wrongCombinations[] = $combinationSecure;
}
}
}
This generates a long array like this:
array:68 [▼
0 => "http:/"
1 => "https:/"
2 => "http:"
3 => "https:"
4 => "http//"
5 => "https//"
6 => "http/"
7 => "https/"
8 => "htp://"
9 => "htps://"
10 => "htp:/"
11 => "htps:/"
12 => "htp:"
13 => "htps:"
14 => "htp//"
15 => "htps//"
16 => "htp/"
17 => "htps/"
18 => "htt://"
19 => "htts://"
20 => "htt:/"
21 => "htts:/"
22 => "htt:"
23 => "htts:"
24 => "htt//"
25 => "htts//"
26 => "htt/"
27 => "htts/"
28 => "ht://"
29 => "hts://"
30 => "ht:/"
31 => "hts:/"
32 => "ht:"
33 => "hts:"
34 => "ht//"
35 => "hts//"
36 => "ht/"
37 => "hts/"
38 => "hp://"
39 => "hps://"
40 => "hp:/"
41 => "hps:/"
42 => "hp:"
43 => "hps:"
44 => "hp//"
45 => "hps//"
46 => "hp/"
47 => "hps/"
48 => "ttp://"
49 => "ttps://"
50 => "ttp:/"
51 => "ttps:/"
52 => "ttp:"
53 => "ttps:"
54 => "ttp//"
55 => "ttps//"
56 => "ttp/"
57 => "ttps/"
58 => "tp://"
59 => "tps://"
60 => "tp:/"
61 => "tps:/"
62 => "tp:"
63 => "tps:"
64 => "tp//"
65 => "tps//"
66 => "tp/"
67 => "tps/"
]
Those 68 combinations are all the possible wrong ones.
Now, a simple str_replace removes the schema if it is one of the wrong ones:
str_replace($combinations, '', $domain);
This intercept wrong URLs like:
htp://example.com > example.com
http:/example.com > example.com
htps:/example.com > example.com
hps:/example.com > example.com
htp:example.com > example.com
many many more
all possible variations
Thank you to all of you for your answers that helped me to better clarify the problem and find my solution! :)
PS
If you find bugs or drawbacks in this code, please, let me know!
I would search for :// if exists in string, after a then a simple in_array with the substring
$url="https://www.example.com";
$validschemas=array("http","https");
$pos=strpos($url,"://");
if($pos)
{
$valid=in_array(substr($url,0,$pos),$validschemas)?true:false;
//If not valid schema, remove schema from url, if it os ok dont touch it.
if(!$valid)
$url=substr($url,$pos+3);
}
else
{
//No schema, then url continues being the same
}
I have my data in this format: U+597D or like this U+6211. I want to convert them to UTF-8 (original characters are 好 and 我). How can I do it?
$utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $string), ENT_NOQUOTES, 'UTF-8');
is probably the simplest solution.
function utf8($num)
{
if($num<=0x7F) return chr($num);
if($num<=0x7FF) return chr(($num>>6)+192).chr(($num&63)+128);
if($num<=0xFFFF) return chr(($num>>12)+224).chr((($num>>6)&63)+128).chr(($num&63)+128);
if($num<=0x1FFFFF) return chr(($num>>18)+240).chr((($num>>12)&63)+128).chr((($num>>6)&63)+128).chr(($num&63)+128);
return '';
}
function uniord($c)
{
$ord0 = ord($c{0}); if ($ord0>=0 && $ord0<=127) return $ord0;
$ord1 = ord($c{1}); if ($ord0>=192 && $ord0<=223) return ($ord0-192)*64 + ($ord1-128);
$ord2 = ord($c{2}); if ($ord0>=224 && $ord0<=239) return ($ord0-224)*4096 + ($ord1-128)*64 + ($ord2-128);
$ord3 = ord($c{3}); if ($ord0>=240 && $ord0<=247) return ($ord0-240)*262144 + ($ord1-128)*4096 + ($ord2-128)*64 + ($ord3-128);
return false;
}
utf8() and uniord() try to mirror the chr() and ord() functions on php:
echo utf8(0x6211)."\n";
echo uniord(utf8(0x6211))."\n";
echo "U+".dechex(uniord(utf8(0x6211)))."\n";
//In your case:
$wo='U+6211';
$hao='U+597D';
echo utf8(hexdec(str_replace("U+","", $wo)))."\n";
echo utf8(hexdec(str_replace("U+","", $hao)))."\n";
output:
我
25105
U+6211
我
好
PHP 7+
As of PHP 7, you can use the Unicode codepoint escape syntax to do this.
echo "\u{597D}"; outputs 好.
I just wrote a polyfill for missing multibyte versions of ord and chr with the following in mind:
It defines functions mb_ord and mb_chr only if they don't already exist. If they do exist in your framework or some future version of PHP, the polyfill will be ignored.
It uses the widely used mbstring extension to do the conversion. If the mbstring extension is not loaded, it will use the iconv extension instead.
I also added functions for HTMLentities encoding / decoding and encoding / decoding to JSON format as well as some demo code for how to use these functions
Code
if (!function_exists('codepoint_encode')) {
function codepoint_encode($str) {
return substr(json_encode($str), 1, -1);
}
}
if (!function_exists('codepoint_decode')) {
function codepoint_decode($str) {
return json_decode(sprintf('"%s"', $str));
}
}
if (!function_exists('mb_internal_encoding')) {
function mb_internal_encoding($encoding = NULL) {
return ($from_encoding === NULL) ? iconv_get_encoding() : iconv_set_encoding($encoding);
}
}
if (!function_exists('mb_convert_encoding')) {
function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
}
}
if (!function_exists('mb_chr')) {
function mb_chr($ord, $encoding = 'UTF-8') {
if ($encoding === 'UCS-4BE') {
return pack("N", $ord);
} else {
return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
}
}
}
if (!function_exists('mb_ord')) {
function mb_ord($char, $encoding = 'UTF-8') {
if ($encoding === 'UCS-4BE') {
list(, $ord) = (strlen($char) === 4) ? #unpack('N', $char) : #unpack('n', $char);
return $ord;
} else {
return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
}
}
}
if (!function_exists('mb_htmlentities')) {
function mb_htmlentities($string, $hex = true, $encoding = 'UTF-8') {
return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) use ($hex) {
return sprintf($hex ? '&#x%X;' : '&#%d;', mb_ord($match[0]));
}, $string);
}
}
if (!function_exists('mb_html_entity_decode')) {
function mb_html_entity_decode($string, $flags = null, $encoding = 'UTF-8') {
return html_entity_decode($string, ($flags === NULL) ? ENT_COMPAT | ENT_HTML401 : $flags, $encoding);
}
}
How to use
echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));
echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));
echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));
echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));
echo "\nEncode / decode to DEC based HTML entities\n";
var_dump(mb_htmlentities('我好', false));
var_dump(mb_html_entity_decode('我好'));
echo "\nEncode / decode to HEX based HTML entities\n";
var_dump(mb_htmlentities('我好'));
var_dump(mb_html_entity_decode('我好'));
echo "\nUse JSON encoding / decoding\n";
var_dump(codepoint_encode("我好"));
var_dump(codepoint_decode('\u6211\u597d'));
Output
Get string from numeric DEC value
string(3) "我"
string(3) "好"
Get string from numeric HEX value
string(3) "我"
string(3) "好"
Get numeric value of character as DEC string
int(25105)
int(22909)
Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"
Encode / decode to DEC based HTML entities
string(16) "我好"
string(6) "我好"
Encode / decode to HEX based HTML entities
string(16) "我好"
string(6) "我好"
Use JSON encoding / decoding
string(12) "\u6211\u597d"
string(6) "我好"
mb_convert_encoding(
preg_replace("/U\+([0-9A-F]*)/"
,"&#x\\1;"
,'U+597DU+6211'
)
,"UTF-8"
,"HTML-ENTITIES"
);
works fine, too.
<?php
function chr_utf8($n,$f='C*'){
return $n<(1<<7)?chr($n):($n<1<<11?pack($f,192|$n>>6,1<<7|191&$n):
($n<(1<<16)?pack($f,224|$n>>12,1<<7|63&$n>>6,1<<7|63&$n):
($n<(1<<20|1<<16)?pack($f,240|$n>>18,1<<7|63&$n>>12,1<<7|63&$n>>6,1<<7|63&$n):'')));
}
$your_input='U+597D';
echo (chr_utf8(hexdec(ltrim($your_input,'U+'))));
// Output 好
If you want to use a callback function you can try it :
<?php
// Note: function chr_utf8 shown above is required
$your_input='U+597DU+6211';
$result=preg_replace_callback('#U\+([a-f0-9]+)#i',function($a){return chr_utf8(hexdec($a[1]));},$your_input);
echo $result;
// Output 好我
Check it in https://eval.in/748187
I was in the position I needed to filter specific characters without affecting the html because I was using a wysiwig editor, but people copy pasting from word would add some nice unrenderable characters to the content.
My solution boils down to simple replacement lists.
class ReplaceIllegal {
public static $find = array ( 0 => '\x0', 1 => '\x1', 2 => '\x2', 3 => '\x3', 4 => '\x4', 5 => '\x5', 6 => '\x6', 7 => '\x7', 8 => '\x8', 9 => '\x9', 10 => '\xA', 11 => '\xB', 12 => '\xC', 13 => '\xD', 14 => '\xE', 15 => '\xF', 16 => '\x10', 17 => '\x11', 18 => '\x12', 19 => '\x13', 20 => '\x14', 21 => '\x15', 22 => '\x16', 23 => '\x17', 24 => '\x18', 25 => '\x19', 26 => '\x1A', 27 => '\x1B', 28 => '\x1C', 29 => '\x1D', 30 => '\x1E', 31 => '\x80', 32 => '\x81', 33 => '\x82', 34 => '\x83', 35 => '\x84', 36 => '\x85', 37 => '\x86', 38 => '\x87', 39 => '\x88', 40 => '\x89', 41 => '\x8A', 42 => '\x8B', 43 => '\x8C', 44 => '\x8D', 45 => '\x8E', 46 => '\x8F', 47 => '\x90', 48 => '\x91', 49 => '\x92', 50 => '\x93', 51 => '\x94', 52 => '\x95', 53 => '\x96', 54 => '\x97', 55 => '\x98', 56 => '\x99', 57 => '\x9A', 58 => '\x9B', 59 => '\x9C', 60 => '\x9D', 61 => '\x9E', 62 => '\x9F', 63 => '\xA0', 64 => '\xA1', 65 => '\xA2', 66 => '\xA3', 67 => '\xA4', 68 => '\xA5', 69 => '\xA6', 70 => '\xA7', 71 => '\xA8', 72 => '\xA9', 73 => '\xAA', 74 => '\xAB', 75 => '\xAC', 76 => '\xAD', 77 => '\xAE', 78 => '\xAF', 79 => '\xB0', 80 => '\xB1', 81 => '\xB2', 82 => '\xB3', 83 => '\xB4', 84 => '\xB5', 85 => '\xB6', 86 => '\xB7', 87 => '\xB8', 88 => '\xB9', 89 => '\xBA', 90 => '\xBB', 91 => '\xBC', 92 => '\xBD', 93 => '\xBE', 94 => '\xBF', 95 => '\xC0', 96 => '\xC1', 97 => '\xC2', 98 => '\xC3', 99 => '\xC4', 100 => '\xC5', 101 => '\xC6', 102 => '\xC7', 103 => '\xC8', 104 => '\xC9', 105 => '\xCA', 106 => '\xCB', 107 => '\xCC', 108 => '\xCD', 109 => '\xCE', 110 => '\xCF', 111 => '\xD0', 112 => '\xD1', 113 => '\xD2', 114 => '\xD3', 115 => '\xD4', 116 => '\xD5', 117 => '\xD6', 118 => '\xD7', 119 => '\xD8', 120 => '\xD9', 121 => '\xDA', 122 => '\xDB', 123 => '\xDC', 124 => '\xDD', 125 => '\xDE', 126 => '\xDF', 127 => '\xE0', 128 => '\xE1', 129 => '\xE2', 130 => '\xE3', 131 => '\xE4', 132 => '\xE5', 133 => '\xE6', 134 => '\xE7', 135 => '\xE8', 136 => '\xE9', 137 => '\xEA', 138 => '\xEB', 139 => '\xEC', 140 => '\xED', 141 => '\xEE', 142 => '\xEF', 143 => '\xF0', 144 => '\xF1', 145 => '\xF2', 146 => '\xF3', 147 => '\xF4', 148 => '\xF5', 149 => '\xF6', 150 => '\xF7', 151 => '\xF8', 152 => '\xF9', 153 => '\xFA', 154 => '\xFB', 155 => '\xFC', 156 => '\xFD', 157 => '\xFE', );
private static $replace = array ( 0 => '', 1 => '', 2 => '', 3 => '', 4 => '', 5 => '', 6 => '', 7 => '', 8 => '', 9 => ' ', 10 => '
', 11 => '', 12 => '', 13 => '
', 14 => '', 15 => '', 16 => '', 17 => '', 18 => '', 19 => '', 20 => '', 21 => '', 22 => '', 23 => '', 24 => '', 25 => '', 26 => '', 27 => '', 28 => '', 29 => '', 30 => '', 31 => '', 32 => '', 33 => '', 34 => '', 35 => '', 36 => '
', 37 => '', 38 => '', 39 => '', 40 => '', 41 => '', 42 => '', 43 => '', 44 => '', 45 => '', 46 => '', 47 => '', 48 => '', 49 => '', 50 => '', 51 => '', 52 => '', 53 => '', 54 => '', 55 => '', 56 => '', 57 => '', 58 => '', 59 => '', 60 => '', 61 => '', 62 => '', 63 => ' ', 64 => '¡', 65 => '¢', 66 => '£', 67 => '¤', 68 => '¥', 69 => '¦', 70 => '§', 71 => '¨', 72 => '©', 73 => 'ª', 74 => '«', 75 => '¬', 76 => '', 77 => '®', 78 => '¯', 79 => '°', 80 => '±', 81 => '²', 82 => '³', 83 => '´', 84 => 'µ', 85 => '¶', 86 => '·', 87 => '¸', 88 => '¹', 89 => 'º', 90 => '»', 91 => '¼', 92 => '½', 93 => '¾', 94 => '¿', 95 => 'À', 96 => 'Á', 97 => 'Â', 98 => 'Ã', 99 => 'Ä', 100 => 'Å', 101 => 'Æ', 102 => 'Ç', 103 => 'È', 104 => 'É', 105 => 'Ê', 106 => 'Ë', 107 => 'Ì', 108 => 'Í', 109 => 'Î', 110 => 'Ï', 111 => 'Ð', 112 => 'Ñ', 113 => 'Ò', 114 => 'Ó', 115 => 'Ô', 116 => 'Õ', 117 => 'Ö', 118 => '×', 119 => 'Ø', 120 => 'Ù', 121 => 'Ú', 122 => 'Û', 123 => 'Ü', 124 => 'Ý', 125 => 'Þ', 126 => 'ß', 127 => 'à', 128 => 'á', 129 => 'â', 130 => 'ã', 131 => 'ä', 132 => 'å', 133 => 'æ', 134 => 'ç', 135 => 'è', 136 => 'é', 137 => 'ê', 138 => 'ë', 139 => 'ì', 140 => 'í', 141 => 'î', 142 => 'ï', 143 => 'ð', 144 => 'ñ', 145 => 'ò', 146 => 'ó', 147 => 'ô', 148 => 'õ', 149 => 'ö', 150 => '÷', 151 => 'ø', 152 => 'ù', 153 => 'ú', 154 => 'û', 155 => 'ü', 156 => 'ý', 157 => 'þ', );
/*
* replace illegal characters for escaped html character but don't touch anything else.
*/
public static function getSaveValue($value) {
return str_replace(self::$find, self::$replace, $value);
}
public static function makeIllegal($find,$replace) {
self::$find[] = $find;
self::$replace[] = $replace;
}
}
This worked fine for me. If you have a string "Letters u00e1 u00e9 etc." replace by "Letters á é".
function unicode2html($str){
// Set the locale to something that's UTF-8 capable
setlocale(LC_ALL, 'en_US.UTF-8');
// Convert the codepoints to entities
$str = preg_replace("/u([0-9a-fA-F]{4})/", "&#x\\1;", $str);
// Convert the entities to a UTF-8 string
return iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
}
With the aid of the following table:
http://en.wikipedia.org/wiki/UTF-8#Description
can't be simpler :)
Simply mask the unicode numbers according to which range they fit in.