Converting JavaScript code to PHP: charCodeAt() issue

I have a JavaScript function that I want to port to PHP. Here is the JavaScript function:
<script type="text/javascript">
str = '242357de5b105346ea2059795682443';
str_overral = str;
str_overral = str_overral.replace(/[^a-z0-9]/gi, '').toLowerCase();
str_res='';
for (i=0; i<str_overral.length; i++) {
l=str_overral.substr(i,1);
d=l.charCodeAt(0);
if ( Math.floor(d/2) == d/2 ) {
str_res+=l;
} else {
str_res=l+str_res;
}
}
document.write('<in');
document.write('put type="hidden" name="myInput" value="'+str_res+'" />');
</script>
The JavaScript function above generates this string for myInput: 359795ae3515e753242db0462068244
This is what I tried in PHP:
$str = '242357de5b105346ea2059795682443';
$str_overral = preg_replace('/[^a-z0-9]/i', '',$str);
$str_overral = strtolower($str_overral);
$str_res='';
for ($i=0; $i<strlen($str_overral); $i++) {
$l= substr($str_overral,$i,1);
// PHP does not have charCodeAt() function so i used uniord()
$d = uniord($l);
if((floor($d)/2) == ($d/2))
$str_res.=$l;
else
$str_res.= $l.$str_res;
}
echo $str_res;
function uniord($c) {
$h = ord($c{0});
if ($h <= 0x7F) {
return $h;
} else if ($h < 0xC2) {
return false;
} else if ($h <= 0xDF) {
return ($h & 0x1F) << 6 | (ord($c{1}) & 0x3F);
} else if ($h <= 0xEF) {
return ($h & 0x0F) << 12 | (ord($c{1}) & 0x3F) << 6
| (ord($c{2}) & 0x3F);
} else if ($h <= 0xF4) {
return ($h & 0x0F) << 18 | (ord($c{1}) & 0x3F) << 12
| (ord($c{2}) & 0x3F) << 6
| (ord($c{3}) & 0x3F);
} else {
return false;
}
}
The PHP code above generates this string: 242357de5b105346ea2059795682443
So basically PHP just returns $str as is.
Since PHP has no charCodeAt() function, I used a solution I found in the thread "UTF-8 Safe Equivelant of ord or charCodeAt() in PHP", but it does not work for me. I even tried the solution posted by 'hakre' in the same thread.
Thank you for any kind of help.
UPDATE (SOLUTION):
Here is the fix:
if($d%2 == 0)
$str_res.=$l;
else
$str_res = $l.$str_res;

if((floor($d)/2) == ($d/2))
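You have a ) in the wrong place. It should be after the first /2, not before it: if(floor($d/2) == ($d/2)). As written, floor($d)/2 always equals $d/2 for an integer $d, so every character takes the first branch.
It could be made simpler and more efficient with if($d%2 == 0)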
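Putting the parity fix and the corrected prepend together, the whole port can be sketched as follows. This is a minimal sketch, not the asker's final code: the function name reorderByParity is made up for illustration, the scalar type hints assume PHP 7 or later, and plain ord() is used because the input is reduced to ASCII letters and digits before the loop runs.

```php
<?php
// Hypothetical helper name; mirrors the JavaScript loop from the question.
function reorderByParity(string $input): string
{
    // Keep only letters and digits, then lowercase (same as the JS replace/toLowerCase).
    $clean = strtolower(preg_replace('/[^a-z0-9]/i', '', $input));
    $result = '';
    for ($i = 0, $n = strlen($clean); $i < $n; $i++) {
        $ch = $clean[$i];
        if (ord($ch) % 2 === 0) {
            $result .= $ch;            // even character code: append
        } else {
            $result = $ch . $result;   // odd character code: prepend
        }
    }
    return $result;
}

echo reorderByParity('242357de5b105346ea2059795682443'), "\n";
// prints 359795ae3515e753242db0462068244
```

Because only single-byte characters survive the preg_replace, no UTF-8 aware replacement for charCodeAt() is needed here.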

Related

converting emoticon unicode which comes from mobile application for web application

I was using emojione to convert emoticons, but there is a problem.
When someone uploads an emoticon from a mobile device, it arrives as escaped Unicode like \ud83d\ude0c\ud83d\ude0c\ud83d\ude0c.
emojione doesn't convert this kind of code.
Can anybody help me convert it, or suggest another package to use?
I finally solved it like this:
$str = '\ud83d\ude0c\ud83d\ude0c\ud83d\ude0c';
$regex = '/\\\u([dD][89abAB][\da-fA-F]{2})\\\u([dD][c-fC-F][\da-fA-F]{2})
|\\\u([\da-fA-F]{4})/sx';
echo preg_replace_callback($regex, function($matches) {
if (isset($matches[3])) {
$cp = hexdec($matches[3]);
} else {
$lead = hexdec($matches[1]);
$trail = hexdec($matches[2]);
// http://unicode.org/faq/utf_bom.html#utf16-4
$cp = ($lead << 10) + $trail + 0x10000 - (0xD800 << 10) - 0xDC00;
}
// https://tools.ietf.org/html/rfc3629#section-3
// Characters between U+D800 and U+DFFF are not allowed in UTF-8
if ($cp > 0xD7FF && 0xE000 > $cp) {
$cp = 0xFFFD;
}
// https://github.com/php/php-src/blob/php-5.6.4/ext/standard/html.c#L471
// php_utf32_utf8(unsigned char *buf, unsigned k)
if ($cp < 0x80) {
return chr($cp);
} else if ($cp < 0xA0) {
return chr(0xC0 | $cp >> 6) . chr(0x80 | $cp & 0x3F);
}
return html_entity_decode('&#' . $cp . ';');
}, $str);
The output will be:
😌😌😌
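A shorter route to the same result is to let json_decode() do the work, since it understands \uXXXX escapes, including UTF-16 surrogate pairs. The sketch below is an alternative to the regex callback above, not a replacement the original poster used. The function name decodeUnicodeEscapes is hypothetical, the nullable return type assumes PHP 7.1 or later, and the input is assumed to contain no raw double quotes or stray backslashes, because it is wrapped in quotes and parsed as a JSON string.

```php
<?php
// Hypothetical helper: treats the text as the body of a JSON string literal.
function decodeUnicodeEscapes(string $escaped): ?string
{
    // Assumption: $escaped has no raw double quotes, control characters, or
    // backslashes other than those that start a valid JSON escape.
    $decoded = json_decode('"' . $escaped . '"');
    // json_decode() returns null when the text is not a valid JSON string.
    return is_string($decoded) ? $decoded : null;
}

$str = '\ud83d\ude0c\ud83d\ude0c\ud83d\ude0c';
echo decodeUnicodeEscapes($str), "\n";
```

If the input can contain arbitrary user text, the regex approach is the safer choice, because it only touches the escapes and leaves everything else alone.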

javascript translation, encode in php decode in javascript

Hi, I'm trying to encode a string in PHP and then output it in JavaScript, because otherwise JavaScript gives me an error.
I tried this http://pastebin.com/7RmjDcJY:
<?php
echo "<script type=\"text/javascript\">
function lang(translate){
var translations = new Array();";
foreach($languageTranslations AS $key=>$value):
echo "translations[\"".$key."\"] = \"". base64_encode($value)."\";";
endforeach;
echo "return (typeof translations[translate]!='undefined') ? $.base64.decode(translations[translate]) : translate;
}
</script>";?>
But förnamn -> FÃ¶rnamn
Any idea? My document is UTF-8, but I guess base64 can't convert UTF-8 properly?
I can't just output the value, because some values contain characters that break my JavaScript.
Or should I do a replace first? Like replacing " with \" and so on?
EDIT:
Now I got:
//urlencode in PHP and:
decodeURIComponent(translations[translate]).replace('+',' ')
But that doesn't feel like the right thing. What if I had a + sign in my text?
To send data from PHP to JavaScript you can Base64-encode it. If you have more than one value to send, pass them through json_encode, then decode the JSON in JavaScript and decode each Base64 value with this code:
var Base64 = {
// private property
_keyStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",
// public method for encoding
encode : function (input) {
var output = "";
var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
var i = 0;
input = Base64._utf8_encode(input);
while (i < input.length) {
chr1 = input.charCodeAt(i++);
chr2 = input.charCodeAt(i++);
chr3 = input.charCodeAt(i++);
enc1 = chr1 >> 2;
enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
enc4 = chr3 & 63;
if (isNaN(chr2)) {
enc3 = enc4 = 64;
} else if (isNaN(chr3)) {
enc4 = 64;
}
output = output +
this._keyStr.charAt(enc1) + this._keyStr.charAt(enc2) +
this._keyStr.charAt(enc3) + this._keyStr.charAt(enc4);
}
return output;
},
// public method for decoding
decode : function (input) {
var output = "";
var chr1, chr2, chr3;
var enc1, enc2, enc3, enc4;
var i = 0;
input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
while (i < input.length) {
enc1 = this._keyStr.indexOf(input.charAt(i++));
enc2 = this._keyStr.indexOf(input.charAt(i++));
enc3 = this._keyStr.indexOf(input.charAt(i++));
enc4 = this._keyStr.indexOf(input.charAt(i++));
chr1 = (enc1 << 2) | (enc2 >> 4);
chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
chr3 = ((enc3 & 3) << 6) | enc4;
output = output + String.fromCharCode(chr1);
if (enc3 != 64) {
output = output + String.fromCharCode(chr2);
}
if (enc4 != 64) {
output = output + String.fromCharCode(chr3);
}
}
output = Base64._utf8_decode(output);
return output;
},
// private method for UTF-8 encoding
_utf8_encode : function (string) {
string = string.replace(/\r\n/g,"\n");
var utftext = "";
for (var n = 0; n < string.length; n++) {
var c = string.charCodeAt(n);
if (c < 128) {
utftext += String.fromCharCode(c);
}
else if((c > 127) && (c < 2048)) {
utftext += String.fromCharCode((c >> 6) | 192);
utftext += String.fromCharCode((c & 63) | 128);
}
else {
utftext += String.fromCharCode((c >> 12) | 224);
utftext += String.fromCharCode(((c >> 6) & 63) | 128);
utftext += String.fromCharCode((c & 63) | 128);
}
}
return utftext;
},
// private method for UTF-8 decoding
_utf8_decode : function (utftext) {
var string = "";
var i = 0;
var c = 0, c2 = 0, c3 = 0;
while ( i < utftext.length ) {
c = utftext.charCodeAt(i);
if (c < 128) {
string += String.fromCharCode(c);
i++;
}
else if((c > 191) && (c < 224)) {
c2 = utftext.charCodeAt(i+1);
string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
i += 2;
}
else {
c2 = utftext.charCodeAt(i+1);
c3 = utftext.charCodeAt(i+2);
string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}
}
return string;
}
}
Sample:
alert(Base64.decode("2LTYs9uM2YTYqNi02LPbjNio2YLYq9in24zYqNmE2LDYs9uM2KjYsQ=="));
This code is compatible with UTF-8.
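If the only goal is to get the translations into JavaScript without breaking the script, json_encode() on its own may be enough, with no Base64 step at all. The following is a minimal sketch under that assumption. The sample array stands in for $languageTranslations from the question, the rewritten lang() function is illustrative rather than the asker's code, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample data; in the question this array is $languageTranslations.
$languageTranslations = array(
    'firstname' => 'Förnamn',
    'quote'     => 'He said "hi" + more </script>',
);

// json_encode() escapes quotes, backslashes and newlines. The JSON_HEX_* flags
// also escape <, >, & and ' so the value cannot close the <script> element.
$json = json_encode(
    $languageTranslations,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo "<script type=\"text/javascript\">\n";
echo "var translations = " . $json . ";\n";
echo "function lang(key) {\n";
echo "    return Object.prototype.hasOwnProperty.call(translations, key) ? translations[key] : key;\n";
echo "}\n";
echo "</script>\n";
```

Non-ASCII characters are emitted as \uXXXX escapes by default, so the ö in Förnamn survives regardless of how the page encoding is declared, and a literal + needs no special handling.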

PHP Explode and adding data in mysql database

My problem is: I want to add multiple rows to the database (each website with its PageRank in its own row, not all in one row as in the first picture)!
I don't know how to do that. I tried explode, but it works only when printing the table in the browser, not for the links that go into the database. Please help me!
In my database:
In my browser:
My entire code:
<?php
error_reporting(E_ALL & ~E_NOTICE);
function StrToNum($Str, $Check, $Magic)
{
$Int32Unit = 4294967296; // 2^32
$length = strlen($Str);
for ($i = 0; $i < $length; $i++) {
$Check *= $Magic;
if ($Check >= $Int32Unit) {
$Check = ($Check - $Int32Unit * (int) ($Check / $Int32Unit));
$Check = ($Check < -2147483648) ? ($Check + $Int32Unit) : $Check;
}
$Check += ord($Str{$i});
}
return $Check;
}
function CheckHash($Hashnum)
{
$CheckByte = 0;
$Flag = 0;
$HashStr = sprintf('%u', $Hashnum) ;
$length = strlen($HashStr);
for ($i = $length - 1; $i >= 0; $i --) {
$Re = $HashStr{$i};
if (1 === ($Flag % 2)) {
$Re += $Re;
$Re = (int)($Re / 10) + ($Re % 10);
}
$CheckByte += $Re;
$Flag ++;
}
$CheckByte %= 10;
if (0 !== $CheckByte) {
$CheckByte = 10 - $CheckByte;
if (1 === ($Flag % 2) ) {
if (1 === ($CheckByte % 2)) {
$CheckByte += 9;
}
$CheckByte >>= 1;
}
}
return '7'.$CheckByte.$HashStr;
}
function HashURL($String)
{
$Check1 = StrToNum($String, 0x1505, 0x21);
$Check2 = StrToNum($String, 0, 0x1003F);
$Check1 >>= 2;
$Check1 = (($Check1 >> 4) & 0x3FFFFC0 ) | ($Check1 & 0x3F);
$Check1 = (($Check1 >> 4) & 0x3FFC00 ) | ($Check1 & 0x3FF);
$Check1 = (($Check1 >> 4) & 0x3C000 ) | ($Check1 & 0x3FFF);
$T1 = (((($Check1 & 0x3C0) << 4) | ($Check1 & 0x3C)) <<2 ) | ($Check2 & 0xF0F );
$T2 = (((($Check1 & 0xFFFFC000) << 4) | ($Check1 & 0x3C00)) << 0xA) | ($Check2 & 0xF0F0000 );
return ($T1 | $T2);
}
function getpagerank($url) {
$query="http://toolbarqueries.google.com/tbr?client=navclient-auto&ch=".CheckHash(HashURL($url)). "&features=Rank&q=info:".$url;
set_time_limit(0);
$data=file_get_contents($query);
$pos = strpos($data, "Rank_");
if($pos === false){} else{
$pagerank = substr($data, $pos + 9);
return $pagerank;
}
}
if($_POST['urls'])
{
?><table border="1">
<th>URL</th>
<th>Pagerank</th>
<?
$urls=trim($_POST['urls']);
$url=explode("\n",$urls);
foreach($url as $url)
{
if($url)
{
$url=trim($url);
$pagerank=getpagerank($url);
?>
<tr><td><?php echo $url; ?></td><td><?php echo $pagerank; ?></td></tr>
<?
//mysql_query("INSERT INTO projects2 (googlePR, Link)
//VALUES ('".$pagerank."','".$urls."') ") or die(mysql_error());
flush();
}
}
?></table><?
}
else
{
?><form action="" method="post">
URLS:<br /><textarea name="urls" cols="50" rows="10">Enter the list of links here</textarea><br /><input type="submit" value="Check PR & insert values"/>
</form>
<?
}
?>
<?php
$urls=trim($_POST['urls']);
$url=explode("\n",$urls);
foreach($url as $url) {
if($url)
{
$url=trim($url);
$pagerank=getpagerank($url);
mysql_query("INSERT INTO projects2 (googlePR, Link)
VALUES ('".$pagerank."','".$urls."') ") or die(mysql_error());
}
}
?>
The problem is in your foreach statement. You are writing:
foreach($url as $url) {
That overwrites your $url array with the first value of the array, since the variable names are the same. Then, at the end of the loop body, when it tries to move on to the next row, it is iterating over a non-array value. Rename the array created by explode to $url_array or something similar and do:
foreach($url_array as $url) {
Your answer was half right. I also replaced $urls with $url in:
mysql_query("INSERT INTO projects2 (googlePR, Link)
VALUES ('".$pagerank."','".$url."') ")
Now everything works fine.
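Both fixes (a distinct name for the array, and inserting the single URL rather than the whole textarea) can be sketched together as below. This is an illustration under assumptions, not the code from the thread: the function names are made up, PDO with a prepared statement is swapped in for the mysql_* calls (which were removed in PHP 7), and the rank lookup is passed in as a callable so the sketch does not depend on getpagerank().

```php
<?php
// Hypothetical helper: split the textarea contents into clean, non-empty URLs.
function splitUrlList(string $raw): array
{
    $urlList = array();
    foreach (explode("\n", trim($raw)) as $line) {
        $line = trim($line);          // also strips the \r left by Windows line endings
        if ($line !== '') {
            $urlList[] = $line;
        }
    }
    return $urlList;
}

// Hypothetical helper: one INSERT per URL. Table and column names are taken
// from the question; $rankOf stands in for getpagerank().
function insertRanks(PDO $db, array $urlList, callable $rankOf): int
{
    $stmt = $db->prepare('INSERT INTO projects2 (googlePR, Link) VALUES (?, ?)');
    $inserted = 0;
    foreach ($urlList as $singleUrl) { // loop variable differs from the array name
        $stmt->execute(array($rankOf($singleUrl), $singleUrl));
        $inserted++;
    }
    return $inserted;
}

print_r(splitUrlList("example.com\r\n\r\nexample.org\n"));
```

The prepared statement also avoids building SQL from raw form input, which the string concatenation in the original query does.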

PHP: how to get unicode character code [duplicate]

I want to get the UCS-2 code points for a given UTF-8 string. For example the word "hello" should become something like "0068 0065 006C 006C 006F". Please note that the characters could be from any language including complex scripts like the east asian languages.
So, the problem comes down to "convert a given character to its UCS-2 code point"
But how? Please, any kind of help will be very very much appreciated since I am in a great hurry.
Transcription of questioner's response posted as an answer
Thanks for your reply, but it needs to be done in PHP v 4 or 5 but not 6.
The string will be a user input, from a form field.
I want to implement a PHP version of utf8to16 or utf8decode like
function get_ucs2_codepoint($char)
{
// calculation of ucs2 codepoint value and assign it to $hex_codepoint
return $hex_codepoint;
}
Can you help me with PHP or can it be done with PHP with version mentioned above?
Use an existing utility such as iconv, or whatever libraries come with the language you're using.
If you insist on rolling your own solution, read up on the UTF-8 format. Basically, each code point is stored as 1-4 bytes, depending on the value of the code point. The ranges are as follows:
U+0000 - U+007F: 1 byte: 0xxxxxxx
U+0080 - U+07FF: 2 bytes: 110xxxxx 10xxxxxx
U+0800 - U+FFFF: 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
U+10000 - U+10FFFF: 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Where each x is a data bit. Thus, you can tell how many bytes compose each code point by looking at the first byte: if it begins with a 0, it's a 1-byte character. If it begins with 110, it's a 2-byte character. If it begins with 1110, it's a 3-byte character. If it begins with 11110, it's a 4-byte character. If it begins with 10, it's a non-initial byte of a multibyte character. If it begins with 11111, it's an invalid character.
Once you figure out how many bytes are in the character, it's just a matter of bit twiddling. Also note that UCS-2 cannot represent characters above U+FFFF.
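Since the question asks for PHP, the lead-byte rules just described can be sketched like this. The function name utf8SequenceLength is hypothetical and the int type hint assumes PHP 7 or later; the function only classifies the first byte and does not validate the continuation bytes that follow.

```php
<?php
// Hypothetical helper: how many bytes does the sequence starting with this byte use?
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte (11111xxx).
function utf8SequenceLength(int $byte): int
{
    if (($byte & 0x80) === 0x00) { return 1; } // 0xxxxxxx
    if (($byte & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($byte & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($byte & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0;
}

// "\xE2\x82\xAC" is the UTF-8 encoding of U+20AC (euro sign).
echo utf8SequenceLength(ord("\xE2\x82\xAC"[0])), "\n"; // prints 3
```

A result of 4 means the code point lies above U+FFFF and therefore has no UCS-2 representation.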
Since you didn't specify a language, here's some sample C code (error checking omitted):
wchar_t utf8_char_to_ucs2(const unsigned char *utf8)
{
if(!(utf8[0] & 0x80)) // 0xxxxxxx
return (wchar_t)utf8[0];
else if((utf8[0] & 0xE0) == 0xC0) // 110xxxxx
return (wchar_t)(((utf8[0] & 0x1F) << 6) | (utf8[1] & 0x3F));
else if((utf8[0] & 0xF0) == 0xE0) // 1110xxxx
return (wchar_t)(((utf8[0] & 0x0F) << 12) | ((utf8[1] & 0x3F) << 6) | (utf8[2] & 0x3F));
else
return ERROR; // uh-oh, UCS-2 can't handle code points this high
}
Scott Reynen wrote a function to convert UTF-8 into Unicode. I found it looking at the PHP documentation.
function utf8_to_unicode( $str ) {
$unicode = array();
$values = array();
$lookingFor = 1;
for ($i = 0; $i < strlen( $str ); $i++ ) {
$thisValue = ord( $str[ $i ] );
if ( $thisValue < ord('A') ) {
// exclude 0-9
if ($thisValue >= ord('0') && $thisValue <= ord('9')) {
// number
$unicode[] = chr($thisValue);
}
else {
$unicode[] = '%'.dechex($thisValue);
}
} else {
if ( $thisValue < 128)
$unicode[] = $str[ $i ];
else {
if ( count( $values ) == 0 ) $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
$values[] = $thisValue;
if ( count( $values ) == $lookingFor ) {
$number = ( $lookingFor == 3 ) ?
( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 ):
( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
$number = dechex($number);
$unicode[] = (strlen($number)==3)?"%u0".$number:"%u".$number;
$values = array();
$lookingFor = 1;
} // if
} // if
}
} // for
return implode("",$unicode);
} // utf8_to_unicode
PHP code (which assumes valid utf-8, no check for non-valid utf-8):
function ord_utf8($c) {
$b0 = ord($c[0]);
if ( $b0 < 0x80 ) {
return $b0;
}
$b1 = ord($c[1]);
if ( $b0 < 0xE0 ) {
return (($b0 & 0x1F) << 6) + ($b1 & 0x3F);
}
return (($b0 & 0x0F) << 12) + (($b1 & 0x3F) << 6) + (ord($c[2]) & 0x3F);
}
I'm amused because I just gave this problem to students on a final exam. Here's a sketch of UTF-8:
hex        binary               UTF-8 binary
0000-007F  00000000 0abcdefg => 0abcdefg
0080-07FF  00000abc defghijk => 110abcde 10fghijk
0800-FFFF  abcdefgh ijklmnop => 1110abcd 10efghij 10klmnop
And here's some C99 code:
static void check(char c) {
if ((c & 0xc0) != 0x80) RAISE(Bad_UTF8); // continuation bytes look like 10xxxxxx
}
uint16_t Utf8_decode(char **p) { // return code point and advance *p
char *s = *p;
if ((s[0] & 0x80) == 0) {
(*p)++;
return s[0];
} else if ((s[0] & 0x40) == 0) {
RAISE (Bad_UTF8);
return ~0; // prevent compiler warning
} else if ((s[0] & 0x20) != 0) { // 111xxxxx: three-byte sequence
if ((s[0] & 0xf0) != 0xe0) RAISE (Bad_UTF8);
check(s[1]); check(s[2]);
(*p) += 3;
return ((s[0] & 0x0f) << 12)
+ ((s[1] & 0x3f) << 6)
+ ((s[2] & 0x3f));
} else {
check(s[1]);
(*p) += 2;
return ((s[0] & 0x1f) << 6)
+ ((s[1] & 0x3f));
}
}
Use mb_ord() in php >= 7.2.
Or this function:
function ord_utf8($c) {
$len = strlen($c);
$code = ord($c);
if($len > 1) {
$code &= 0x7F >> $len;
for($i = 1; $i < $len; $i++) {
$code <<= 6;
$code += ord($c[$i]) & 0x3F;
}
}
return $code;
}
$c is a single character.
If you need to convert a string to an array of characters, you can use this:
$string = 'abcde';
$string = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
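Combining the two pieces above gives the output format the question asks for. The sketch below is one possible arrangement, assuming PHP 7 or later and valid UTF-8 input; the names codePointOf and toHexCodePoints are made up for illustration, and codePointOf is a lightly restated copy of the ord_utf8() function above so that the block runs on its own.

```php
<?php
// Code point of one UTF-8 encoded character (assumes valid input).
function codePointOf(string $char): int
{
    $len  = strlen($char);
    $code = ord($char[0]);
    if ($len > 1) {
        $code &= 0x7F >> $len;                       // drop the length marker bits
        for ($i = 1; $i < $len; $i++) {
            $code = ($code << 6) | (ord($char[$i]) & 0x3F);
        }
    }
    return $code;
}

// Hypothetical helper: "hello" becomes "0068 0065 006C 006C 006F".
function toHexCodePoints(string $utf8): string
{
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';                                   // preg_split fails on invalid UTF-8
    }
    $hex = array();
    foreach ($chars as $char) {
        $hex[] = sprintf('%04X', codePointOf($char));
    }
    return implode(' ', $hex);
}

echo toHexCodePoints('hello'), "\n"; // prints 0068 0065 006C 006C 006F
```

Characters above U+FFFF come out with more than four hex digits, which is a reminder that they cannot be represented in UCS-2.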
