I'm reverse engineering a JavaScript script in order to port it to PHP, and it contains many functions that convert
an array of 6 bytes such as [224, 221, 199, 147, 195, 47]
into 1632933900000 (a timestamp in milliseconds, held as an Int64).
How can I pack/unpack bytes like these to and from the final integer using PHP?
Samples:
[224, 221, 199, 147, 195, 47] gets to 1632933900000
[224, 143, 228, 137, 198, 47] gets to 1633718700000
ADDITIONAL INFO:
The JavaScript code is very long (it was obfuscated). All I know for certain is that the timestamp is stored in those 6 bytes.
NEXT UPDATE...
The bytes are read by this function (this.buf is the byte array):
function u() {
var e = new a(0, 0),
t = 0;
if (!(this.len - this.pos > 4)) {
for (; t < 3; ++t) {
if (this.pos >= this.len) throw s(this);
if (e.lo = (e.lo | (127 & this.buf[this.pos]) << 7 * t) >>> 0, this.buf[this.pos++] < 128) return e
}
return e.lo = (e.lo | (127 & this.buf[this.pos++]) << 7 * t) >>> 0, e
}
for (; t < 4; ++t)
if (e.lo = (e.lo | (127 & this.buf[this.pos]) << 7 * t) >>> 0, this.buf[this.pos++] < 128) return e;
if (e.lo = (e.lo | (127 & this.buf[this.pos]) << 28) >>> 0, e.hi = (e.hi | (127 & this.buf[this.pos]) >> 4) >>> 0, this.buf[this.pos++] < 128) return e;
if (t = 0, this.len - this.pos > 4) {
for (; t < 5; ++t)
if (e.hi = (e.hi | (127 & this.buf[this.pos]) << 7 * t + 3) >>> 0, this.buf[this.pos++] < 128) return e
} else
for (; t < 5; ++t) {
if (this.pos >= this.len) throw s(this);
if (e.hi = (e.hi | (127 & this.buf[this.pos]) << 7 * t + 3) >>> 0, this.buf[this.pos++] < 128) return e
}
throw Error("invalid varint encoding")
}
The result holds the number as two halves (hi and lo), and this method turns them into the timestamp:
o.prototype.toNumber = function(e) {
if (!e && this.hi >>> 31) {
var t = 1 + ~this.lo >>> 0,
n = ~this.hi >>> 0;
return t || (n = n + 1 >>> 0), -(t + 4294967296 * n)
}
return this.lo + 4294967296 * this.hi
}
224 221 199 147 195 47 is the decimal representation of the bytes E0 DD C7 93 C3 2F, which read as a single integer gives 247243140743983. Maybe not all of those bytes are the timestamp. For comparison, 1632933900000 is 01 7C 32 71 EE E0 in hex, or 61 54 98 0C when expressed in seconds. The very first byte already shows how far off reading the bytes directly is.
That number in JS might be of type BigInt:
BigInt("0x017C3271EEE0")
But BigInt("0xE0DDC793C32F") still gives 247243140743983.
Just to complete this answer: I used the following code, but I have absolutely no clue what it does :D
<?php
function int64_helper($obj)
{
$e = (object) ['lo' => 0, 'hi' => 0];
if (!($obj->len - $obj->pos > 4)) {
for ($i = 0; $i < 3; $i++) {
if ($obj->pos >= $obj->len) throw new Exception('ERROR RANGE');
$e->lo = rrr($e->lo | ((127 & $obj->buf[$obj->pos]) << (7 * $i)), 0);
if ($obj->buf[$obj->pos++] < 128) return $e;
}
$e->lo = rrr($e->lo | ((127 & $obj->buf[$obj->pos++]) << (7 * $i)), 0);
return $e;
}
for ($i = 0; $i < 4; $i++) {
$e->lo = rrr($e->lo | ((127 & $obj->buf[$obj->pos]) << (7 * $i)), 0);
if ($obj->buf[$obj->pos++] < 128) return $e;
}
$e->lo = rrr(($e->lo | ((127 & $obj->buf[$obj->pos]) << 28)), 0);
$e->hi = rrr(($e->hi | rr((127 & $obj->buf[$obj->pos]), 4)), 0);
if ($obj->buf[$obj->pos++] < 128) {
return $e;
}
if ($obj->len - $obj->pos > 4) {
for ($i = 0; $i < 5; $i++) {
$e->hi = rrr($e->hi | ((127 & $obj->buf[$obj->pos]) << (7 * $i) + 3), 0);
if ($obj->buf[$obj->pos++] < 128) return $e;
}
}
else {
for ($i = 0; $i < 5; $i++) {
if ($obj->pos >= $obj->len) throw new Exception('ERROR RANGE');
$e->hi = rrr($e->hi | ((127 & $obj->buf[$obj->pos]) << (7 * $i) + 3), 0);
if ($obj->buf[$obj->pos++] < 128) return $e;
}
}
throw new Exception("invalid timestamp encoding");
}
/**
* Date time
*/
function int64($obj)
{
$e = int64_helper($obj);
$mst = $e->lo + 4294967296 * $e->hi;
$t = intdiv($mst, 1000); // drop the last three digits (milliseconds to seconds)
$s = date("Y-m-d H:i:s", $t); // prague timezone
return $s;
}
/**
* The >>> javascript operator in php x86_64
* Usage: -1149025787 >>> 0 ---> rrr(-1149025787, 0) === 3145941509
* @return int
*/
function rrr($v, $n)
{
return ($v & 0xFFFFFFFF) >> ($n & 0x1F);
}
/**
* The >> javascript operator in php x86_64
* @return int
*/
function rr($v, $n)
{
return ($v & 0x80000000 ? $v | ~0xFFFFFFFF : $v & 0xFFFFFFFF) >> ($n & 0x1F); // ~0xFFFFFFFF sets the upper 32 bits without overflowing to float
}
/**
* The << javascript operator in php x86_64
* @return int
*/
function ll($v, $n)
{
return ($t = ($v & 0xFFFFFFFF) << ($n & 0x1F)) & 0x80000000 ? $t | ~0xFFFFFFFF : $t & 0xFFFFFFFF; // ~0xFFFFFFFF sets the upper 32 bits without overflowing to float
}
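For anyone else wondering what the ported function does: the "invalid varint encoding" message in the original suggests the bytes are a base-128 varint, the format Protocol Buffers uses. Each byte carries 7 payload bits, least significant group first, and a set high bit means another byte follows. Here is a minimal sketch in JavaScript under that assumption. decodeVarint is a name made up for this example, and BigInt is used only to stay clear of the 32-bit limits of the bitwise operators.

```javascript
// Hypothetical helper: decode one base-128 varint from an array of byte values.
function decodeVarint(bytes) {
  let result = 0n;
  let shift = 0n;
  for (const byte of bytes) {
    // The low 7 bits are payload, least significant group first.
    result |= BigInt(byte & 0x7f) << shift;
    // A clear high bit marks the last byte of the value.
    if (byte < 128) return result;
    shift += 7n;
  }
  // Ran out of bytes while the continuation bit was still set.
  throw new Error("invalid varint encoding");
}

console.log(decodeVarint([224, 221, 199, 147, 195, 47]).toString()); // 1632933900000
console.log(decodeVarint([224, 143, 228, 137, 198, 47]).toString()); // 1633718700000
```

Both samples from the question come out as the expected timestamps, which supports the varint reading.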
Related
This is a C++ to PHP riddle involving log functions.
I need to translate the following function into PHP ... It's a long story, but we are sure we have a lot of it figured out ... but I could use another pair of eyes.
I have a possible PHP implementation below ...
We have examples of data on the C++ side that produce checksums that we cannot reproduce on the PHP side ... frustrating.
============================================================
#define SYNC_CHECKSUM_MULT (0x01000193)
/**
* Calculate a reasonable checksum for floating point numbers.
*/
static void oi_AccumulateFloat64Checksum(oiInt& checksum, oiFloat64 f)
{
// j = round(f * 2^k), where j is a 12-bit number
if ( oi_IsNaN(f) || oi_IsInf(f) )
{
checksum ^= 1;
checksum *= SYNC_CHECKSUM_MULT;
return;
}
oiInt exponent = (oiInt)logb(f);
oiInt k = 12 - exponent;
oiFloat64 jUnrounded = scalb(f, k);
oiFloat64 jRounded = floor(jUnrounded + 0.5);
oiInt kChange = (oiInt)logb(jRounded) - 12;
k += kChange;
oiFloat64 jFloat = scalb(jRounded, kChange);
oiInt j = (oiInt)jFloat;
checksum ^= j;
checksum *= SYNC_CHECKSUM_MULT;
checksum ^= k;
checksum *= SYNC_CHECKSUM_MULT;
}
============= PHP Implementation =========================
This is executed before we call this function.. $intChecksum = 0;
public function AccumulateFloatChecksum(&$intChecksum, $floatValue) {
// j = round(f * 2^k), where j is a 12-bit number
$TRIVIAL_CHECKSUM_MULT = 0x01000193;
if ( !is_numeric($floatValue) || $floatValue == 0 || is_nan($floatValue) || is_infinite($floatValue) )
{
$intChecksum ^= 1;
$intChecksum *= $TRIVIAL_CHECKSUM_MULT;
$intChecksum &= 0xFFFFFFFF;
return;
}
//var_dump('$floatValue: '.$floatValue);
// oiInt exponent = (oiInt)logb(f);
$exponent = intval(log($floatValue, 2));
//var_dump('$exponent: '.$exponent);
// oiInt k = 12 - exponent;
$k = 12 - $exponent;
//var_dump('$k 1: '.$k);
// oiFloat64 jUnrounded = scalb(f, k);
$jUnrounded = $floatValue * pow(2, $k);
//var_dump('$jUnrounded: '.$jUnrounded);
// oiFloat64 jRounded = floor(jUnrounded + 0.5);
$jRounded = floor($jUnrounded + 0.5);
//var_dump('$jRounded: '.$jRounded);
// oiInt kChange = (oiInt)logb(jRounded) - 12;
$kChange = intval(log($jRounded, 2)) - 12;
//var_dump('$kChange: '.$kChange);
// k += kChange;
$k += $kChange;
//var_dump('$k 2: '.$k);
// oiFloat64 jFloat = scalb(jRounded, kChange);
$jFloat = $jRounded * pow(2, $kChange);
//var_dump('$jFloat: '.$jFloat);
$j = intval($jFloat);
//var_dump('$j: '.$j);
//var_dump('===============================');
$intChecksum ^= $j;
$intChecksum *= $TRIVIAL_CHECKSUM_MULT;
$intChecksum &= 0xFFFFFFFF;
$intChecksum ^= $k;
$intChecksum *= $TRIVIAL_CHECKSUM_MULT;
$intChecksum &= 0xFFFFFFFF;
}
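One place where the two versions may diverge, offered as a guess rather than a confirmed diagnosis: C's logb returns the floor of log2 of the absolute value, whereas intval(log($floatValue, 2)) truncates toward zero, and log() returns NAN for negative input. The two agree for values of 1 and above but can differ for magnitudes below 1 and for negative numbers. A small JavaScript sketch of the difference follows; both function names are invented for illustration.

```javascript
// Mimics the truncation toward zero that intval(log($f, 2)) performs.
function truncatedLog2(f) {
  return Math.trunc(Math.log2(f));
}

// Approximates C's logb for finite, non-zero input: floor of log2(|f|).
// Floating-point log2 may be slightly off very close to exact powers of two,
// so a careful port would read the exponent bits instead (assumption).
function logbLike(f) {
  return Math.floor(Math.log2(Math.abs(f)));
}

for (const f of [0.3, 5.0, -5.0]) {
  console.log(f, truncatedLog2(f), logbLike(f));
}
```

For 0.3 the truncated version gives -1 while the logb-style version gives -2, so k and j would both change for such inputs.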
I need to convert an unsigned integer into a 4 byte string to send on a socket.
I have the following code and it works, but it feels... disgusting.
/**
* @param $int
* @return string
*/
function intToFourByteString( $int ) {
$four = floor($int / pow(2, 24));
$int = $int - ($four * pow(2, 24));
$three = floor($int / pow(2, 16));
$int = $int - ($three * pow(2, 16));
$two = floor($int / pow(2, 8));
$int = $int - ($two * pow(2, 8));
$one = $int;
return chr($four) . chr($three) . chr($two) . chr($one);
}
My friend who uses C says I should be able to do this with bitshifts but I don't know how and he isn't familiar enough with PHP to be helpful. Any help would be appreciated.
To do the reverse I already have the following code
/**
* @param $string
* @return int
*/
function fourByteStringToInt( $string ) {
if( strlen($string) != 4 ) {
throw new \InvalidArgumentException('String to parse must be 4 bytes exactly');
}
return (ord($string[0]) << 24) + (ord($string[1]) << 16) + (ord($string[2]) << 8) + ord($string[3]);
}
This is actually as simple as
$str = pack('N', $int);
See the documentation for pack(). And the reverse:
$int = unpack('N', $str)[1];
If you're curious how to do packing using bit shifts, it goes like this:
function intToFourByteString( $int ) {
return
chr($int >> 24 & 0xFF).
chr($int >> 16 & 0xFF).
chr($int >> 8 & 0xFF).
chr($int >> 0 & 0xFF);
}
Basically, shift eight bits each time and mask with 0xFF (=255) to remove high-order bits.
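To make the shift-and-mask arithmetic concrete, here it is traced on a single value. The sketch is in JavaScript, where the shifts and masks work the same way for values that fit in 32 bits. The function names are invented for the example, and the trailing `>>> 0` only converts the result back to an unsigned value.

```javascript
// Split an unsigned 32-bit integer into four bytes, most significant first.
function intToFourBytes(n) {
  return [
    (n >>> 24) & 0xff,
    (n >>> 16) & 0xff,
    (n >>> 8) & 0xff,
    n & 0xff,
  ];
}

// Reassemble the four bytes; >>> 0 keeps the result unsigned.
function fourBytesToInt(b) {
  return ((b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]) >>> 0;
}

const bytes = intToFourBytes(0x12345678);
console.log(bytes.map((x) => x.toString(16)).join(" ")); // 12 34 56 78
console.log(fourBytesToInt(bytes) === 0x12345678);       // true
```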
I need to get a base64-encoded string in JavaScript (the input string may contain non-ASCII characters).
Is there a good solution?
If you work with Unicode in the string that you finally encode with Base64, I'd suggest using the following script from WebToolkit.info. The script is fully compatible with UTF-8 encoding.
/**
*
* Base64 encode / decode
* http://www.webtoolkit.info/
*
**/
var Base64 = {
// private property
_keyStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",
// public method for encoding
encode : function (input) {
var output = "";
var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
var i = 0;
input = Base64._utf8_encode(input);
while (i < input.length) {
chr1 = input.charCodeAt(i++);
chr2 = input.charCodeAt(i++);
chr3 = input.charCodeAt(i++);
enc1 = chr1 >> 2;
enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
enc4 = chr3 & 63;
if (isNaN(chr2)) {
enc3 = enc4 = 64;
} else if (isNaN(chr3)) {
enc4 = 64;
}
output = output +
this._keyStr.charAt(enc1) + this._keyStr.charAt(enc2) +
this._keyStr.charAt(enc3) + this._keyStr.charAt(enc4);
}
return output;
},
// public method for decoding
decode : function (input) {
var output = "";
var chr1, chr2, chr3;
var enc1, enc2, enc3, enc4;
var i = 0;
input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
while (i < input.length) {
enc1 = this._keyStr.indexOf(input.charAt(i++));
enc2 = this._keyStr.indexOf(input.charAt(i++));
enc3 = this._keyStr.indexOf(input.charAt(i++));
enc4 = this._keyStr.indexOf(input.charAt(i++));
chr1 = (enc1 << 2) | (enc2 >> 4);
chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
chr3 = ((enc3 & 3) << 6) | enc4;
output = output + String.fromCharCode(chr1);
if (enc3 != 64) {
output = output + String.fromCharCode(chr2);
}
if (enc4 != 64) {
output = output + String.fromCharCode(chr3);
}
}
output = Base64._utf8_decode(output);
return output;
},
// private method for UTF-8 encoding
_utf8_encode : function (string) {
string = string.replace(/\r\n/g,"\n");
var utftext = "";
for (var n = 0; n < string.length; n++) {
var c = string.charCodeAt(n);
if (c < 128) {
utftext += String.fromCharCode(c);
}
else if((c > 127) && (c < 2048)) {
utftext += String.fromCharCode((c >> 6) | 192);
utftext += String.fromCharCode((c & 63) | 128);
}
else {
utftext += String.fromCharCode((c >> 12) | 224);
utftext += String.fromCharCode(((c >> 6) & 63) | 128);
utftext += String.fromCharCode((c & 63) | 128);
}
}
return utftext;
},
// private method for UTF-8 decoding
_utf8_decode : function (utftext) {
var string = "";
var i = 0;
var c = 0, c2 = 0, c3 = 0; // declare all three locally instead of leaking globals
while ( i < utftext.length ) {
c = utftext.charCodeAt(i);
if (c < 128) {
string += String.fromCharCode(c);
i++;
}
else if((c > 191) && (c < 224)) {
c2 = utftext.charCodeAt(i+1);
string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
i += 2;
}
else {
c2 = utftext.charCodeAt(i+1);
c3 = utftext.charCodeAt(i+2);
string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}
}
return string;
}
}
DEMO: http://www.webtoolkit.info/demo/javascript-base64
JavaScript has no built-in function that base64-encodes an arbitrary Unicode string (btoa, where available, only accepts characters in the range 0-255). You'll have to write your own function, or steal this one:
http://ntt.cc/2008/01/19/base64-encoder-decoder-with-javascript.html
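In environments that provide TextEncoder and btoa (current browsers, and recent Node.js versions as far as I know), a shorter route is to convert the string to UTF-8 bytes first and base64-encode those. This is a sketch under that assumption, and utf8ToBase64 is a made-up name.

```javascript
// Hypothetical helper: base64-encode the UTF-8 bytes of a string.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const b of bytes) {
    // btoa expects one character per byte, each in the range 0-255.
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// "\u00e5" is "å", whose UTF-8 bytes are C3 A5.
console.log(utf8ToBase64("\u00e5")); // w6U=
```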
The question
How can I decode a string with JavaScript that's encoded in php and maintain the "åäö" letters?
Overview of the problem
As the title states I'm trying to decode a base64 encoded string that I generate from my php code. It all works fine except for the letters "åäö" that the Swedish alphabet ends with.
Output example:
å ä ö Å Ä Ö => Ã¥ ä ö à à Ã
Code
The base64 JavaScript I'm using
/*
* Copyright (c) 2010 Nick Galbreath
* http://code.google.com/p/stringencoders/source/browse/#svn/trunk/javascript
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/* base64 encode/decode compatible with window.btoa/atob
*
* window.atob/btoa is a Firefox extension to convert binary data (the "b")
* to base64 (ascii, the "a").
*
* It is also found in Safari and Chrome. It is not available in IE.
*
* if (!window.btoa) window.btoa = base64.encode
* if (!window.atob) window.atob = base64.decode
*
* The original spec's for atob/btoa are a bit lacking
* https://developer.mozilla.org/en/DOM/window.atob
* https://developer.mozilla.org/en/DOM/window.btoa
*
* window.btoa and base64.encode takes a string where charCodeAt is [0,255]
* If any character is not [0,255], then an exception is thrown.
*
* window.atob and base64.decode take a base64-encoded string
* If the input length is not a multiple of 4, or contains invalid characters
* then an exception is thrown.
*/
base64 = {};
base64.PADCHAR = '=';
base64.ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
base64.getbyte64 = function(s,i) {
// This is oddly fast, except on Chrome/V8.
// Minimal or no improvement in performance by using a
// object with properties mapping chars to value (eg. 'A': 0)
var idx = base64.ALPHA.indexOf(s.charAt(i));
if (idx == -1) {
throw "Cannot decode base64";
}
return idx;
}
base64.decode = function(s) {
// convert to string
s = "" + s;
var getbyte64 = base64.getbyte64;
var pads, i, b10;
var imax = s.length
if (imax == 0) {
return s;
}
if (imax % 4 != 0) {
throw "Cannot decode base64";
}
pads = 0
if (s.charAt(imax -1) == base64.PADCHAR) {
pads = 1;
if (s.charAt(imax -2) == base64.PADCHAR) {
pads = 2;
}
// either way, we want to ignore this last block
imax -= 4;
}
var x = [];
for (i = 0; i < imax; i += 4) {
b10 = (getbyte64(s,i) << 18) | (getbyte64(s,i+1) << 12) |
(getbyte64(s,i+2) << 6) | getbyte64(s,i+3);
x.push(String.fromCharCode(b10 >> 16, (b10 >> 8) & 0xff, b10 & 0xff));
}
switch (pads) {
case 1:
b10 = (getbyte64(s,i) << 18) | (getbyte64(s,i+1) << 12) | (getbyte64(s,i+2) << 6)
x.push(String.fromCharCode(b10 >> 16, (b10 >> 8) & 0xff));
break;
case 2:
b10 = (getbyte64(s,i) << 18) | (getbyte64(s,i+1) << 12);
x.push(String.fromCharCode(b10 >> 16));
break;
}
return x.join('');
}
base64.getbyte = function(s,i) {
var x = s.charCodeAt(i);
if (x > 255) {
throw "INVALID_CHARACTER_ERR: DOM Exception 5";
}
return x;
}
base64.encode = function(s) {
if (arguments.length != 1) {
throw "SyntaxError: Not enough arguments";
}
var padchar = base64.PADCHAR;
var alpha = base64.ALPHA;
var getbyte = base64.getbyte;
var i, b10;
var x = [];
// convert to string
s = "" + s;
var imax = s.length - s.length % 3;
if (s.length == 0) {
return s;
}
for (i = 0; i < imax; i += 3) {
b10 = (getbyte(s,i) << 16) | (getbyte(s,i+1) << 8) | getbyte(s,i+2);
x.push(alpha.charAt(b10 >> 18));
x.push(alpha.charAt((b10 >> 12) & 0x3F));
x.push(alpha.charAt((b10 >> 6) & 0x3f));
x.push(alpha.charAt(b10 & 0x3f));
}
switch (s.length - imax) {
case 1:
b10 = getbyte(s,i) << 16;
x.push(alpha.charAt(b10 >> 18) + alpha.charAt((b10 >> 12) & 0x3F) +
padchar + padchar);
break;
case 2:
b10 = (getbyte(s,i) << 16) | (getbyte(s,i+1) << 8);
x.push(alpha.charAt(b10 >> 18) + alpha.charAt((b10 >> 12) & 0x3F) +
alpha.charAt((b10 >> 6) & 0x3f) + padchar);
break;
}
return x.join('');
}
The implementation
<script type="text/javascript">
document.write(
base64.decode( '<?php echo base64_encode( "å ä ö Å Ä Ö" ); ?>' ) );
</script>
Edit
The script I found that worked:
(someone asked me for this, so here it is)
var Base64 =
{
// private property
_keyStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",
// public method for encoding
encode : function (input)
{
var output = "";
var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
var i = 0;
input = Base64._utf8_encode(input);
while (i < input.length) {
chr1 = input.charCodeAt(i++);
chr2 = input.charCodeAt(i++);
chr3 = input.charCodeAt(i++);
enc1 = chr1 >> 2;
enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
enc4 = chr3 & 63;
if (isNaN(chr2)) {
enc3 = enc4 = 64;
} else if (isNaN(chr3)) {
enc4 = 64;
}
output = output +
this._keyStr.charAt(enc1) + this._keyStr.charAt(enc2) +
this._keyStr.charAt(enc3) + this._keyStr.charAt(enc4);
}
return output;
},
// public method for decoding
decode : function (input)
{
var output = "";
var chr1, chr2, chr3;
var enc1, enc2, enc3, enc4;
var i = 0;
input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
while (i < input.length) {
enc1 = this._keyStr.indexOf(input.charAt(i++));
enc2 = this._keyStr.indexOf(input.charAt(i++));
enc3 = this._keyStr.indexOf(input.charAt(i++));
enc4 = this._keyStr.indexOf(input.charAt(i++));
chr1 = (enc1 << 2) | (enc2 >> 4);
chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
chr3 = ((enc3 & 3) << 6) | enc4;
output = output + String.fromCharCode(chr1);
if (enc3 != 64) {
output = output + String.fromCharCode(chr2);
}
if (enc4 != 64) {
output = output + String.fromCharCode(chr3);
}
}
output = Base64._utf8_decode(output);
return output;
},
// private method for UTF-8 encoding
_utf8_encode : function (string)
{
string = string.replace(/\r\n/g,"\n");
var utftext = "";
for (var n = 0; n < string.length; n++) {
var c = string.charCodeAt(n);
if (c < 128) {
utftext += String.fromCharCode(c);
}
else if((c > 127) && (c < 2048)) {
utftext += String.fromCharCode((c >> 6) | 192);
utftext += String.fromCharCode((c & 63) | 128);
}
else {
utftext += String.fromCharCode((c >> 12) | 224);
utftext += String.fromCharCode(((c >> 6) & 63) | 128);
utftext += String.fromCharCode((c & 63) | 128);
}
}
return utftext;
},
// private method for UTF-8 decoding
_utf8_decode : function (utftext)
{
var string = "";
var i = 0;
var c = 0, c2 = 0, c3 = 0; // declare all three locally instead of leaking globals
while ( i < utftext.length ) {
c = utftext.charCodeAt(i);
if (c < 128) {
string += String.fromCharCode(c);
i++;
}
else if((c > 191) && (c < 224)) {
c2 = utftext.charCodeAt(i+1);
string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
i += 2;
}
else {
c2 = utftext.charCodeAt(i+1);
c3 = utftext.charCodeAt(i+2);
string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}
}
return string;
}
}
This looks like a character encoding problem. Make sure all your files use the same encoding (UTF-8?), including your JavaScript files.
If that doesn't help, try searching to see whether others have hit the same problem, most likely with those special characters. (I'm from Norway, so I know how it is with those damn characters ;)
If this doesn't solve your problem, try another JavaScript base64 decoder.
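If the PHP side emits UTF-8 (an assumption, but it would explain output such as Ã¥ for å), then the decoder in the question is returning one character per byte and the UTF-8 decoding step is what is missing. Here is a sketch of that extra step, assuming atob and TextDecoder are available. base64ToUtf8 is an invented name.

```javascript
// Hypothetical helper: base64 -> bytes -> string, treating the bytes as UTF-8.
function base64ToUtf8(b64) {
  const binary = atob(b64); // one character per decoded byte
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// "w6U=" is the base64 form of the two UTF-8 bytes C3 A5, i.e. "å".
console.log(base64ToUtf8("w6U=") === "\u00e5"); // true
```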
You might give this a shot to see if it solves your problem:
http://phpjs.org/
I'm looking for a way to create valid UTF-16 JavaScript escape sequence characters (including surrogate pairs) from within PHP.
I'm using the code below to get the UTF-32 code points (from a UTF-8 encoded character). This works as JavaScript escape characters (eg. '\u00E1' for 'á') - until you get into the upper ranges where you get surrogate pairs (eg '𝜕' comes out as '\u1D715' but should be '\uD835\uDF15')...
function toOrdinal($chr)
{
if (ord($chr[0]) >= 0 && ord($chr[0]) <= 127)
{
return ord($chr[0]);
}
elseif (ord($chr[0]) >= 192 && ord($chr[0]) <= 223)
{
return (ord($chr[0]) - 192) * 64 + (ord($chr[1]) - 128);
}
elseif (ord($chr[0]) >= 224 && ord($chr[0]) <= 239)
{
return (ord($chr[0]) - 224) * 4096 + (ord($chr[1]) - 128) * 64 + (ord($chr[2]) - 128);
}
elseif (ord($chr[0]) >= 240 && ord($chr[0]) <= 247)
{
return (ord($chr[0]) - 240) * 262144 + (ord($chr[1]) - 128) * 4096 + (ord($chr[2]) - 128) * 64 + (ord($chr[3]) - 128);
}
elseif (ord($chr[0]) >= 248 && ord($chr[0]) <= 251)
{
return (ord($chr[0]) - 248) * 16777216 + (ord($chr[1]) - 128) * 262144 + (ord($chr[2]) - 128) * 4096 + (ord($chr[3]) - 128) * 64 + (ord($chr[4]) - 128);
}
elseif (ord($chr[0]) >= 252 && ord($chr[0]) <= 253)
{
return (ord($chr[0]) - 252) * 1073741824 + (ord($chr[1]) - 128) * 16777216 + (ord($chr[2]) - 128) * 262144 + (ord($chr[3]) - 128) * 4096 + (ord($chr[4]) - 128) * 64 + (ord($chr[5]) - 128);
}
}
How do I adapt this code to give me proper UTF-16 code points? Thanks!
How about using iconv (or similarly mb_convert_encoding)?
eg. something like:
$utf16= iconv('UTF-8', 'UTF-16LE', $text);
$codeunits= array();
for ($i= 0; $i<strlen($utf16); $i+= 2) {
$codeunits[]= ord($utf16[$i]) + (ord($utf16[$i+1]) << 8); // parentheses needed: + binds tighter than <<
}
Here's the final code used (based on the answer from bobince):
$output = '';
$utf16 = iconv(fxCHARSET, 'UTF-16BE', $text);
for ($i= 0; $i < strlen($utf16); $i+= 2)
{
$output.= '\\u'.str_pad(dechex((ord($utf16[$i]) << 8) + ord($utf16[$i+1])), 4, '0', STR_PAD_LEFT);
}
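For reference, the surrogate-pair arithmetic that the UTF-16 conversion performs can also be done by hand on a code point. Subtract 0x10000, then put the top 10 bits into a high surrogate starting at 0xD800 and the bottom 10 bits into a low surrogate starting at 0xDC00. Below is a JavaScript sketch of that calculation, which could be ported to PHP on top of toOrdinal(). hex4 and codePointToEscape are hypothetical names.

```javascript
// Format one 16-bit code unit as a \uXXXX escape.
function hex4(unit) {
  return "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: turn a Unicode code point into JavaScript escape(s).
function codePointToEscape(cp) {
  if (cp < 0x10000) return hex4(cp); // fits in one UTF-16 code unit
  const offset = cp - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return hex4(high) + hex4(low);
}

console.log(codePointToEscape(0xe1));    // \u00E1
console.log(codePointToEscape(0x1d715)); // \uD835\uDF15
```

The second line reproduces the pair the question expects for U+1D715.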