How to decode numeric HTML entities in PHP

How to decode numeric HTML entities in PHP - php

I'm trying to decode encoded long dash from numeric entity to string, but it seems that I can't find a function which can do this properly.
The best that I found is mb_decode_numericentity(), however, for some reason it fails to decode long dash and some other special characters.
$str = '–';
$str = mb_decode_numericentity($str, array(0xFF, 0x2FFFF, 0, 0xFFFF), 'ISO-8859-1');
This will return "?".
Anyone knows how to solve this problem?

The following code snippet (mostly stolen from here and improved) will work for literal, numeric decimal, and numeric hexa-decimal entities:
header("content-type: text/html; charset=utf-8");
/**
* Decodes all HTML entities, including numeric and hexadecimal ones.
*
* #param mixed $string
* #return string decoded HTML
*/
function html_entity_decode_numeric($string, $quote_style = ENT_COMPAT, $charset = "utf-8")
{
$string = html_entity_decode($string, $quote_style, $charset);
$string = preg_replace_callback('~&#x([0-9a-fA-F]+);~i', "chr_utf8_callback", $string);
$string = preg_replace('~&#([0-9]+);~e', 'chr_utf8("\\1")', $string);
return $string;
}
/**
* Callback helper
*/
function chr_utf8_callback($matches)
{
return chr_utf8(hexdec($matches[1]));
}
/**
* Multi-byte chr(): Will turn a numeric argument into a UTF-8 string.
*
* #param mixed $num
* #return string
*/
function chr_utf8($num)
{
if ($num < 128) return chr($num);
if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
return '';
}
$string ="”";
echo html_entity_decode_numeric($string);
Improvement suggestions are welcome.

mb_decode_numericentity does not handle hexadecimal, only decimal. Do you get the expected result with:
$str = '–';
$str = mb_decode_numericentity ( $str , Array(255, 3145727, 0, 65535) , 'ISO-8859-1');
You can use hexdec to convert your hexadecimal to decimal.
Also, out of curiosity, does the following work:
$str = '–';
$str = html_entity_decode($str);

Related

PHP socket - Can someone explain me these functions?

I'm trying to make a php socket server and I found two functions that mask and unmask a text message (frame).
I think I don't understand clearly how it works.
This is the functions :
//encode message for transfer to client
function mask($text)
{
$b1 = 0x80 | (0x1 & 0x0f);
$length = strlen($text);
if ($length <= 125)
$header = pack('CC', $b1, $length);
elseif ($length > 125 && $length < 65536)
$header = pack('CCn', $b1, 126, $length);
elseif ($length >= 65536)
$header = pack('CCNN', $b1, 127, $length);
return $header . $text;
}
//unmask incoming framed message
function unmask($text)
{
$length = ord($text[1]) & 127;
if ($length == 126) {
$masks = substr($text, 4, 4);
$data = substr($text, 8);
} elseif ($length == 127) {
$masks = substr($text, 10, 4);
$data = substr($text, 14);
} else {
$masks = substr($text, 2, 4);
$data = substr($text, 6);
}
$text = "";
for ($i = 0; $i < strlen($data); ++$i) {
$text .= $data[$i] ^ $masks[$i % 4];
}
return $text;
}
What I think I've understood :
mask convert the binary representation of the message and create a frame (by concatenating the header) of the right size according to the message length. (by adding bytes with pack() right?)
unmask -> reverse process.
What I don't understand :
What is the purpose of this variable $b1 used in mask? The syntax of this code is not clear for me.
$b1 = 0x80 | (0x1 & 0x0f);

That line is a bit strange to write that way, but here is what is going on. & is a binary AND operator, which takes the two values and returns only the bits that match. 0x1 is 00000001 and 0x0f is 00001111 in binary.
00000001
&00001111
=00000001
so (0x1 & 0x0f) is just 0x1 or 1.
The | operator is like &, but is a binary OR. If either side has a 1, the result will be a 1. 0x80 is 01000000, so
01000000
|00000001
=01000001
So the total result is 0x81. Why not just write $b1 = 0x81? I'm guessing that the author of this code copied it from some C code where the 0x1 part was a variable:
byte b1 = 0x80 | (someVariable & 0x0f);
In this case, the binary & with 0x0f ensures that only the last 4 bits of someVariable will be used, and the first 4 bits of b1 will always be 0x8 (which is probably necessary according to the frame specification).

Encoding/Unmasking Websocket binary data in PHP

I have set up a PHP WebSocket server that is able to read string data from clients. My question is on how to handle binary data types. Below is the code on the client side that records microphone input as a Float32Array object and sends the data over a WebSocket connection in binary.
websocket = new WebSocket("ws://...");
websocket.binaryType = "arraybuffer";
recorder.onaudioprocess = function(stream) {
var inputData = stream.inputBuffer.getChannelData(0);
websocket.send(inputData);
}
As for the server side, I am using the following functions which I've found online for encoding/unmasking.
function mask($text)
{
$b1 = 0x80 | (0x1 & 0x0f);
$length = strlen($text);
if( $length <= 125)
$header = pack('CC', $b1, $length);
elseif ($length > 125 && $length < 65536)
$header = pack('CCS', $b1, 126, $length);
elseif ($length >= 65536)
$header = pack('CCN', $b1, 127, $length);
return $header.$text;
}
function unmask($payload)
{
$length = ord($payload[1]) & 127;
if($length == 126) {
$masks = substr($payload, 4, 4);
$data = substr($payload, 8);
$len = (ord($payload[2]) << 8) + ord($payload[3]);
}
elseif($length == 127) {
$masks = substr($payload, 10, 4);
$data = substr($payload, 14);
$len = (ord($payload[2]) << 56) + (ord($payload[3]) << 48) +
(ord($payload[4]) << 40) + (ord($payload[5]) << 32) +
(ord($payload[6]) << 24) +(ord($payload[7]) << 16) +
(ord($payload[8]) << 8) + ord($payload[9]);
}
else {
$masks = substr($payload, 2, 4);
$data = substr($payload, 6);
$len = $length;
}
$text = '';
for ($i = 0; $i < $len; ++$i) {
$text .= $data[$i] ^ $masks[$i%4];
}
return $text;
}
The code works great except they only work on string data and not for binary type. My question is how do I handle binary type and push them to clients?

I don't really get what you are trying to do. WebRTC can be used to create Peer2Peer connections. Thus you would use your PHP Server as a signaling server. The actual data is not passed to your server unless it is needed for forwarding because a p2p connection could not be established. Is that what you want to do?
Otherwise create a RtcPeerConnection and send data from peer to peer as it is intended by webrtc.

Convert Int into 4 Byte String in PHP

I need to convert an unsigned integer into a 4 byte string to send on a socket.
I have the following code and it works, but it feels... disgusting.
/**
* #param $int
* #return string
*/
function intToFourByteString( $int ) {
$four = floor($int / pow(2, 24));
$int = $int - ($four * pow(2, 24));
$three = floor($int / pow(2, 16));
$int = $int - ($three * pow(2, 16));
$two = floor($int / pow(2, 8));
$int = $int - ($two * pow(2, 8));
$one = $int;
return chr($four) . chr($three) . chr($two) . chr($one);
}
My friend who uses C says I should be able to do this with bitshifts but I don't know how and he isn't familiar enough with PHP to be helpful. Any help would be appreciated.
To do the reverse I already have the following code
/**
* #param $string
* #return int
*/
function fourByteStringToInt( $string ) {
if( strlen($string) != 4 ) {
throw new \InvalidArgumentException('String to parse must be 4 bytes exactly');
}
return (ord($string[0]) << 24) + (ord($string[1]) << 16) + (ord($string[2]) << 8) + ord($string[3]);
}

This is actually as simple as
$str = pack('N', $int);
see pack. And the reverse:
$int = unpack('N', $str)[1];
If you're curious how to do packing using bit shifts, it goes like this:
function intToFourByteString( $int ) {
return
chr($int >> 24 & 0xFF).
chr($int >> 16 & 0xFF).
chr($int >> 8 & 0xFF).
chr($int >> 0 & 0xFF);
}
Basically, shift eight bits each time and mask with 0xFF (=255) to remove high-order bits.

Decode an base64 encoded string through php in javascript

The question
How can I decode a string with JavaScript that's encoded in php and maintain the "åäö" letters?
Overview of the problem
As the title states I'm trying to decode a base64 encoded string that I generate from my php code. It all works fine except for the letters "åäö" that the Swedish alphabet ends with.
Output exemple:
å ä ö Å Ä Ö => Ã¥ Ã¤ Ã¶ Ã Ã Ã
Code
The base64 JavaScript I'm using
/*
* Copyright (c) 2010 Nick Galbreath
* http://code.google.com/p/stringencoders/source/browse/#svn/trunk/javascript
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
/* base64 encode/decode compatible with window.btoa/atob
*
* window.atob/btoa is a Firefox extension to convert binary data (the "b")
* to base64 (ascii, the "a").
*
* It is also found in Safari and Chrome. It is not available in IE.
*
* if (!window.btoa) window.btoa = base64.encode
* if (!window.atob) window.atob = base64.decode
*
* The original spec's for atob/btoa are a bit lacking
* https://developer.mozilla.org/en/DOM/window.atob
* https://developer.mozilla.org/en/DOM/window.btoa
*
* window.btoa and base64.encode takes a string where charCodeAt is [0,255]
* If any character is not [0,255], then an exception is thrown.
*
* window.atob and base64.decode take a base64-encoded string
* If the input length is not a multiple of 4, or contains invalid characters
* then an exception is thrown.
*/
base64 = {};
base64.PADCHAR = '=';
base64.ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
base64.getbyte64 = function(s,i) {
// This is oddly fast, except on Chrome/V8.
// Minimal or no improvement in performance by using a
// object with properties mapping chars to value (eg. 'A': 0)
var idx = base64.ALPHA.indexOf(s.charAt(i));
if (idx == -1) {
throw "Cannot decode base64";
}
return idx;
}
base64.decode = function(s) {
// convert to string
s = "" + s;
var getbyte64 = base64.getbyte64;
var pads, i, b10;
var imax = s.length
if (imax == 0) {
return s;
}
if (imax % 4 != 0) {
throw "Cannot decode base64";
}
pads = 0
if (s.charAt(imax -1) == base64.PADCHAR) {
pads = 1;
if (s.charAt(imax -2) == base64.PADCHAR) {
pads = 2;
}
// either way, we want to ignore this last block
imax -= 4;
}
var x = [];
for (i = 0; i < imax; i += 4) {
b10 = (getbyte64(s,i) << 18) | (getbyte64(s,i+1) << 12) |
(getbyte64(s,i+2) << 6) | getbyte64(s,i+3);
x.push(String.fromCharCode(b10 >> 16, (b10 >> 8) & 0xff, b10 & 0xff));
}
switch (pads) {
case 1:
b10 = (getbyte64(s,i) << 18) | (getbyte64(s,i+1) << 12) | (getbyte64(s,i+2) << 6)
x.push(String.fromCharCode(b10 >> 16, (b10 >> 8) & 0xff));
break;
case 2:
b10 = (getbyte64(s,i) << 18) | (getbyte64(s,i+1) << 12);
x.push(String.fromCharCode(b10 >> 16));
break;
}
return x.join('');
}
base64.getbyte = function(s,i) {
var x = s.charCodeAt(i);
if (x > 255) {
throw "INVALID_CHARACTER_ERR: DOM Exception 5";
}
return x;
}
base64.encode = function(s) {
if (arguments.length != 1) {
throw "SyntaxError: Not enough arguments";
}
var padchar = base64.PADCHAR;
var alpha = base64.ALPHA;
var getbyte = base64.getbyte;
var i, b10;
var x = [];
// convert to string
s = "" + s;
var imax = s.length - s.length % 3;
if (s.length == 0) {
return s;
}
for (i = 0; i < imax; i += 3) {
b10 = (getbyte(s,i) << 16) | (getbyte(s,i+1) << 8) | getbyte(s,i+2);
x.push(alpha.charAt(b10 >> 18));
x.push(alpha.charAt((b10 >> 12) & 0x3F));
x.push(alpha.charAt((b10 >> 6) & 0x3f));
x.push(alpha.charAt(b10 & 0x3f));
}
switch (s.length - imax) {
case 1:
b10 = getbyte(s,i) << 16;
x.push(alpha.charAt(b10 >> 18) + alpha.charAt((b10 >> 12) & 0x3F) +
padchar + padchar);
break;
case 2:
b10 = (getbyte(s,i) << 16) | (getbyte(s,i+1) << 8);
x.push(alpha.charAt(b10 >> 18) + alpha.charAt((b10 >> 12) & 0x3F) +
alpha.charAt((b10 >> 6) & 0x3f) + padchar);
break;
}
return x.join('');
}
The implementation
<script type="text/javascript">
document.write(
base64.decode( '<?php echo base64_encode( "å ä ö Å Ä Ö" ); ?>' ) );
</script>
Edit
The script I found that worked:
(someone asked me for this, so here it is)
var Base64 =
{
// private property
_keyStr : "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",
// public method for encoding
encode : function (input)
{
var output = "";
var chr1, chr2, chr3, enc1, enc2, enc3, enc4;
var i = 0;
input = Base64._utf8_encode(input);
while (i < input.length) {
chr1 = input.charCodeAt(i++);
chr2 = input.charCodeAt(i++);
chr3 = input.charCodeAt(i++);
enc1 = chr1 >> 2;
enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
enc3 = ((chr2 & 15) << 2) | (chr3 >> 6);
enc4 = chr3 & 63;
if (isNaN(chr2)) {
enc3 = enc4 = 64;
} else if (isNaN(chr3)) {
enc4 = 64;
}
output = output +
this._keyStr.charAt(enc1) + this._keyStr.charAt(enc2) +
this._keyStr.charAt(enc3) + this._keyStr.charAt(enc4);
}
return output;
},
// public method for decoding
decode : function (input)
{
var output = "";
var chr1, chr2, chr3;
var enc1, enc2, enc3, enc4;
var i = 0;
input = input.replace(/[^A-Za-z0-9\+\/\=]/g, "");
while (i < input.length) {
enc1 = this._keyStr.indexOf(input.charAt(i++));
enc2 = this._keyStr.indexOf(input.charAt(i++));
enc3 = this._keyStr.indexOf(input.charAt(i++));
enc4 = this._keyStr.indexOf(input.charAt(i++));
chr1 = (enc1 << 2) | (enc2 >> 4);
chr2 = ((enc2 & 15) << 4) | (enc3 >> 2);
chr3 = ((enc3 & 3) << 6) | enc4;
output = output + String.fromCharCode(chr1);
if (enc3 != 64) {
output = output + String.fromCharCode(chr2);
}
if (enc4 != 64) {
output = output + String.fromCharCode(chr3);
}
}
output = Base64._utf8_decode(output);
return output;
},
// private method for UTF-8 encoding
_utf8_encode : function (string)
{
string = string.replace(/\r\n/g,"\n");
var utftext = "";
for (var n = 0; n < string.length; n++) {
var c = string.charCodeAt(n);
if (c < 128) {
utftext += String.fromCharCode(c);
}
else if((c > 127) && (c < 2048)) {
utftext += String.fromCharCode((c >> 6) | 192);
utftext += String.fromCharCode((c & 63) | 128);
}
else {
utftext += String.fromCharCode((c >> 12) | 224);
utftext += String.fromCharCode(((c >> 6) & 63) | 128);
utftext += String.fromCharCode((c & 63) | 128);
}
}
return utftext;
},
// private method for UTF-8 decoding
_utf8_decode : function (utftext)
{
var string = "";
var i = 0;
var c = c1 = c2 = 0;
while ( i < utftext.length ) {
c = utftext.charCodeAt(i);
if (c < 128) {
string += String.fromCharCode(c);
i++;
}
else if((c > 191) && (c < 224)) {
c2 = utftext.charCodeAt(i+1);
string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
i += 2;
}
else {
c2 = utftext.charCodeAt(i+1);
c3 = utftext.charCodeAt(i+2);
string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
i += 3;
}
}
return string;
}
}

Looks like a character encoding problem, make sure all you files are using the same encoding (UTF-8?) even you JavaScript files.
If not try searching to see if others have experienced the same problem, most likely with those special characters. (I'm from Norway, so I know how it is with those damn characters ;)
If this don't solve your problem, try another JavaScript base64 decoder.

You might give this a shot to see if it solves your problem:
http://phpjs.org/

Calculate colordifference

Is it possible to check if a certain hexadecimal color is closer to FFF or 000 based on a defined 'center-value'?
I want to check if a color lies closer to #FFF or #000 based on #888.
So if I check for #EFEFEF it should return #FFF and if I try #878787 it should return #000.
How can this be achieved? I'm not sure what to search for on Google...
Thanks in advance

You could convert the colours to numbers:
$color_num = hexdec(substr($color, 1)); // skip the initial #
Then compare them to either 0x0 or 0xffffff.
You could also break them down into R, G and B and make three comparisons; then average them? Not sure how precise you want this thing :)

The easiest way to solve your problem is to calculate the distance between colors using their greyscale values (there are other ways, but this is simple). So something like:
// returns a distance between two colors by comparing each component
// using average of the RGB components, eg. a grayscale value
function color_distance($a, $b)
{
$decA = hexdec(substr($a, 1));
$decB = hexdec(substr($a, 1));
$avgA = (($decA & 0xFF) + (($decA >> 8) & 0xFF) + (($decA >> 16) & 0xFF)) / 3;
$avgB = (($decB & 0xFF) + (($decB >> 8) & 0xFF) + (($decB >> 16) & 0xFF)) / 3;
return abs($avgA - $avgB);
}
// I am going to leave the naming of the function to you ;)
// How this works is that it'll return $minColor if $color is closer to $refColorMin
// and $maxColor if $color is closer to $refColorMax
// all colors should be passed in format #RRGGBB
function foo($color, $refColorMin, $refColorMax, $minColor, $maxColor)
{
$distMin = color_distance($color, $refColorMin);
$distMax = color_distance($color, $refColorMax);
return ($distMin < $distMax) ? $minColor : $maxColor;
}
// Example usage to answer your original question:
$colorA = foo('#EFEFEF', '#888888', '#FFFFFF', '#000000', '#FFFFFF');
$colorA = foo('#898989', '#888888', '#FFFFFF', '#000000', '#FFFFFF');
// Check the values
var_dump($colorA, $colorB);
The output is:
string(7) "#FFFFFF"
string(7) "#000000"

You could do something like the following:
function hex2rgb($hex) {
$hex = str_replace("#", "", $hex);
if(strlen($hex) == 3) {
$r = hexdec(substr($hex,0,1).substr($hex,0,1));
$g = hexdec(substr($hex,1,1).substr($hex,1,1));
$b = hexdec(substr($hex,2,1).substr($hex,2,1));
} else {
$r = hexdec(substr($hex,0,2));
$g = hexdec(substr($hex,2,2));
$b = hexdec(substr($hex,4,2));
}
$rgb = array($r, $g, $b);
//return implode(",", $rgb); // returns the rgb values separated by commas
return $rgb; // returns an array with the rgb values
}
$rgb = hex2rgb("#cc0");
From that you could take the values of $rgb and see if their values, on average, area greater than or less than 122.5. If its greater than 122.5 you'd be closer to #FFFFFF, lower than 122.5 you'd be closer to #000000.

Thanks everybody for the help!
In my startpost I made a little typo as mentioned by oezi. For me the solution was a small modification to the accepted answer (by reko_t):
function color_distance($sColor1, $sColor2)
{
$decA = hexdec(substr($sColor1, 1));
$decB = hexdec(substr($sColor2, 1));
$avgA = (($decA & 0xFF) + (($decA >> 8) & 0xFF) + (($decA >> 16) & 0xFF)) / 3;
$avgB = (($decB & 0xFF) + (($decB >> 8) & 0xFF) + (($decB >> 16) & 0xFF)) / 3;
return abs($avgA - $avgB);
}
function determine_color($sInputColor, $sRefColor, $sMinColor, $sMaxColor)
{
$distRef = color_distance($sInputColor, $sRefColor);
$distMin = color_distance($sInputColor, $sMinColor);
$distMax = color_distance($sInputColor, $sMaxColor);
return $distMax - $distRef < $distMin - $distRef ? $sMinColor : $sMaxColor;
}
I needed this function to determine text-color on a background-color which can be set by some setting. Thanks! :) Credits to reko_t!

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How to decode numeric HTML entities in PHP - php

Related

PHP socket - Can someone explain me these functions?

Encoding/Unmasking Websocket binary data in PHP

Convert Int into 4 Byte String in PHP

Decode an base64 encoded string through php in javascript

Calculate colordifference

Categories

Resources