PHP escape unicode characters only


The Facebook validation documentation says:
Please note that we generate the signature using an escaped unicode
version of the payload, with lowercase hex digits. If you just
calculate against the decoded bytes, you will end up with a different
signature. For example, the string äöå should be escaped to
\u00e4\u00f6\u00e5.
I'm trying to write a unit test for the validation that I have, but I can't produce the signature because I can't escape the payload. I've tried
mb_convert_encoding($payload, 'unicode')
But this encodes all the payload, and not just the needed string, as Facebook does.
My full code:
// on the unittest
$content = file_get_contents(__DIR__.'/../Responses/whatsapp_webhook.json');
// trim whitespace at the end of the file
$content = trim($content);
$secret = config('externals.meta.config.app_secret');
$signature = hash_hmac(
'sha256',
mb_convert_encoding($content, 'unicode'),
$secret
);
$response = $this->postJson(
route('whatsapp.webhook.message'),
json_decode($content, true),
[
'CONTENT_TYPE' => 'text/plain',
'X-Hub-Signature-256' => $signature,
]
);
$response->assertOk();
// on the request validation
/**
* @var string $signature
*/
$signature = $request->header('X-Hub-Signature-256');
if (!$signature) {
abort(Response::HTTP_FORBIDDEN);
}
$signature = Str::after($signature, '=');
$secret = config('externals.meta.config.app_secret');
/**
* @var string $content
*/
$content = $request->getContent();
$payloadSignature = hash_hmac(
'sha256',
$content,
$secret
);
if ($payloadSignature !== $signature) {
abort(Response::HTTP_FORBIDDEN);
}

For one, mb_convert_encoding($payload, 'unicode') converts the input to UTF-16BE, not UTF-8. You would want mb_convert_encoding($payload, 'UTF-8').
For two, using mb_convert_encoding() without specifying the source encoding causes the function to assume that the input is using the system's default encoding, which is frequently incorrect and will cause your data to be mangled. You would want mb_convert_encoding($payload, 'UTF-8', $source_encoding). [Also, you cannot reliably detect string encoding, you need to know what it is.]
For three, mb_convert_encoding() is entirely the wrong function to use to apply the desired escape sequences to the data. [and good lord are the google results for "php escape UTF-8" awful]
Unfortunately, PHP doesn't have a UTF-8 escape function that isn't baked into another function, but it's not terribly difficult to write in userland.
function utf8_escape($input) {
$output = '';
for( $i=0,$l=mb_strlen($input); $i<$l; ++$i ) {
$cur = mb_substr($input, $i, 1);
if( strlen($cur) === 1 ) {
$output .= $cur;
} else {
$output .= sprintf('\\u%04x', mb_ord($cur));
}
}
return $output;
}
$in = "asdf äöå";
var_dump(
utf8_escape($in),
);
Output:
string(23) "asdf \u00e4\u00f6\u00e5"
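One caveat with the function above: for characters beyond U+FFFF (emoji, for example) mb_ord() returns a code point that '%04x' prints with five hex digits, whereas JSON-style escaping writes such characters as a UTF-16 surrogate pair. Below is a minimal sketch of that variant. The name utf8_escape_json is made up for illustration, it assumes PHP 7.4+ for mb_str_split(), and whether the sender really escapes such characters this way is an assumption to check against a real payload.

```php
<?php
// Hypothetical variant of utf8_escape(): escapes every non-ASCII character
// as \uXXXX with lowercase hex digits, using a surrogate pair above U+FFFF.
function utf8_escape_json(string $input): string
{
    $output = '';
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $cp = mb_ord($char, 'UTF-8');
        if ($cp < 0x80) {
            // Plain ASCII is copied through unchanged.
            $output .= $char;
        } elseif ($cp < 0x10000) {
            // Basic Multilingual Plane: a single \uXXXX escape.
            $output .= sprintf('\\u%04x', $cp);
        } else {
            // Above U+FFFF: split into a high and a low surrogate.
            $cp -= 0x10000;
            $output .= sprintf(
                '\\u%04x\\u%04x',
                0xD800 | ($cp >> 10),
                0xDC00 | ($cp & 0x3FF)
            );
        }
    }
    return $output;
}

echo utf8_escape_json("asdf äöå"), PHP_EOL; // asdf \u00e4\u00f6\u00e5
```

Note that this only covers the non-ASCII part; it does not escape quotes, backslashes or slashes the way a JSON encoder would.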

Instead of trying to re-assemble the payload from the already decoded JSON, you should take the data directly as you received it.
Facebook sends Content-Type: application/json, which means PHP will not populate $_POST to begin with - but you can read the entire request body using file_get_contents('php://input').
Try and calculate the signature based on that, that should work without having to deal with any hassles of encoding & escaping.
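A minimal sketch of that approach, assuming the header has the form sha256=<hex digest> as in the question's code. The function name, the secret and the sample body are placeholders. In a plain script the raw body would come from file_get_contents('php://input'); in the question's validation code, $request->getContent() already provides it.

```php
<?php
// Hypothetical helper: checks a "sha256=<hex>" signature header against the
// raw, untouched request body.
function isValidSignature(string $rawBody, ?string $header, string $secret): bool
{
    $prefix = 'sha256=';
    if ($header === null || strncmp($header, $prefix, strlen($prefix)) !== 0) {
        return false;
    }
    $expected = hash_hmac('sha256', $rawBody, $secret);
    // hash_equals() compares in constant time.
    return hash_equals($expected, (string) substr($header, strlen($prefix)));
}

$secret = 'placeholder-secret';
$body   = '{"text":"\u00e4\u00f6\u00e5"}'; // bytes exactly as received
$header = 'sha256=' . hash_hmac('sha256', $body, $secret);

var_dump(isValidSignature($body, $header, $secret)); // bool(true)

// Decoding and re-encoding changes the bytes, so the signature no longer matches.
$reencoded = json_encode(json_decode($body, true), JSON_UNESCAPED_UNICODE);
var_dump(isValidSignature($reencoded, $header, $secret)); // bool(false)
```

For the unit test this means the signature has to be computed over exactly the bytes that end up in the request body. If the test helper re-serializes the decoded array, sign that re-serialized string rather than the file contents.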

Related

Convert String into ASCII Byte Array then base64_encode

I'm trying to convert a combined string into an ASCII byte array to pass to a server as an HTTP header. I've tried numerous approaches, such as unpack, splitting the string, and converting each character in a loop, but the server I pass the converted string to still ignores it. There isn't much support for the API I'm using, so maybe someone here can tell me whether I'm doing anything wrong.
$billerId = '9999986379225246';
$authToken = '16dfe8d7-889b-4380-925f-9c2c6ea4d930';
$auth = $billerId . ':' . $authToken;
//this results in error
$auth_key_byte_array = unpack("H*",$auth);
//this also results in error
$auth_key_byte_array = hash_hmac("sha256", $auth, false);
//even tried a loop function
function create_byte_array($string){
$array = array();
foreach(str_split($string) as $char){
array_push($array, sprintf("%02X", ord($char)));
}
return implode('', $array);
}
$auth_key_byte_array = create_byte_array($auth);

How to decode UTF-8 only if the string has not been decoded? [duplicate]

I'm reading out lots of texts from various RSS feeds and inserting them into my database.
Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO 8859-1.
Unfortunately, there are sometimes problems with the encodings of the texts. Example:
The "ß" in "Fußball" should look like this in my database: "ÃŸ". If it is a "ÃŸ", it is displayed correctly.
Sometimes, the "ß" in "Fußball" looks like this in my database: "ÃƒÂŸ". Then it is displayed wrongly, of course.
In other cases, the "ß" is saved as a "ß" - so without any change. Then it is also displayed wrongly.
What can I do to avoid the cases 2 and 3?
How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is but when must I use the functions?) and when must I do nothing with the input?
How do I make everything the same encoding? Perhaps with the function mb_detect_encoding()? Can I write a function for this? So my problems are:
How do I find out what encoding the text uses?
How do I convert it to UTF-8 - whatever the old encoding is?
Would a function like this work?
function correct_encoding($text) {
$current_encoding = mb_detect_encoding($text, 'auto');
$text = iconv($current_encoding, 'UTF-8', $text);
return $text;
}
I've tested it, but it doesn't work. What's wrong with it?

If you apply utf8_encode() to an already UTF-8 string, it will return garbled UTF-8 output.
I made a function that addresses all these issues. It's called Encoding::toUTF8().
You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF-8, or the string can have a mix of them. Encoding::toUTF8() will convert everything to UTF-8.
I did it because a service was giving me a feed of data all messed up, mixing UTF-8 and Latin1 in the same string.
Usage:
require_once('Encoding.php');
use \ForceUTF8\Encoding; // It's namespaced now.
$utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);
$latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);
Download:
https://github.com/neitanod/forceutf8
I've included another function, Encoding::fixUTF8(), which will fix every UTF-8 string that looks garbled.
Usage:
require_once('Encoding.php');
use \ForceUTF8\Encoding; // It's namespaced now.
$utf8_string = Encoding::fixUTF8($garbled_utf8_string);
Examples:
echo Encoding::fixUTF8("FÃ©dÃ©ration Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂ©dÃÂ©ration Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂÃÂ©dÃÂÃÂ©ration Camerounaise de Football");
echo Encoding::fixUTF8("FÃÂ©dération Camerounaise de Football");
will output:
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
I've transformed the function (forceUTF8) into a family of static functions on a class called Encoding. The new function is Encoding::toUTF8().

You first have to detect what encoding has been used. As you’re parsing RSS feeds (probably via HTTP), you should read the encoding from the charset parameter of the Content-Type HTTP header field. If it is not present, read the encoding from the encoding attribute of the XML processing instruction. If that’s missing too, use UTF-8 as defined in the specification.
Here is what I probably would do:
I’d use cURL to send and fetch the response. That allows you to set specific header fields and fetch the response header as well. After fetching the response, you have to parse the HTTP response and split it into header and body. The header should then contain the Content-Type header field that contains the MIME type and (hopefully) the charset parameter with the encoding/charset too. If not, we’ll analyse the XML PI for the presence of the encoding attribute and get the encoding from there. If that’s also missing, the XML specs define to use UTF-8 as encoding.
$url = 'http://www.lr-online.de/storage/rss/rss/sport.xml';
$accept = array(
'type' => array('application/rss+xml', 'application/xml', 'application/rdf+xml', 'text/xml'),
'charset' => array_diff(mb_list_encodings(), array('pass', 'auto', 'wchar', 'byte2be', 'byte2le', 'byte4be', 'byte4le', 'BASE64', 'UUENCODE', 'HTML-ENTITIES', 'Quoted-Printable', '7bit', '8bit'))
);
$header = array(
'Accept: '.implode(', ', $accept['type']),
'Accept-Charset: '.implode(', ', $accept['charset']),
);
$encoding = null;
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
$response = curl_exec($curl);
if (!$response) {
// error fetching the response
} else {
$offset = strpos($response, "\r\n\r\n");
$header = substr($response, 0, $offset);
if (!$header || !preg_match('/^Content-Type:\s+([^;]+)(?:;\s*charset=(.*))?/im', $header, $match)) {
// error parsing the response
} else {
if (!in_array(strtolower($match[1]), array_map('strtolower', $accept['type']))) {
// type not accepted
}
$encoding = trim($match[2], '"\'');
}
if (!$encoding) {
$body = substr($response, $offset + 4);
if (preg_match('/^<\?xml\s+version=(?:"[^"]*"|\'[^\']*\')\s+encoding=("[^"]*"|\'[^\']*\')/s', $body, $match)) {
$encoding = trim($match[1], '"\'');
}
}
if (!$encoding) {
$encoding = 'utf-8';
} else {
if (!in_array($encoding, array_map('strtolower', $accept['charset']))) {
// encoding not accepted
}
if ($encoding != 'utf-8') {
$body = mb_convert_encoding($body, 'utf-8', $encoding);
}
}
$simpleXML = simplexml_load_string($body, null, LIBXML_NOERROR);
if (!$simpleXML) {
// parse error
} else {
echo $simpleXML->asXML();
}
}

Detecting the encoding is hard.
mb_detect_encoding works by guessing, based on a number of candidates that you pass it. In some encodings, certain byte sequences are invalid, and therefore it can distinguish between various candidates. Unfortunately, there are a lot of encodings where the same bytes are valid (but mean something different). In these cases, there is no way to determine the encoding; you can implement your own logic to make guesses. For example, data coming from a Japanese site might be more likely to have a Japanese encoding.
As long as you only deal with Western European languages, the three major encodings to consider are utf-8, iso-8859-1 and cp-1252. Since these are the defaults on many platforms, they are also the most likely to be reported incorrectly. If people use a different encoding, they are likely to be frank about it, since otherwise their software would break very often. Therefore, a good strategy is to trust the provider, unless the encoding is reported as one of those three. You should still double-check that it is indeed valid, using mb_check_encoding (note that valid is not the same as correct: the same input may be valid in many encodings). If it is one of those three, you can then use mb_detect_encoding to distinguish between them. Luckily that is fairly deterministic; you just need to use the proper detect sequence, which is UTF-8,ISO-8859-1,WINDOWS-1252.
Once you've detected the encoding, you need to convert it to your internal representation (UTF-8 is the only sane choice). The function utf8_encode transforms ISO-8859-1 to UTF-8, so it can only be used for that particular input type. For other encodings, use mb_convert_encoding.
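The strategy above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: normalizeToUtf8 and its $declared parameter are made-up names, and falling back to Windows-1252 for anything that is not valid UTF-8 is a simplification that only makes sense for Western European input.

```php
<?php
// Hypothetical helper. $declared is the encoding reported by the provider
// (for example in the Content-Type header), or null if none was given.
function normalizeToUtf8(string $text, ?string $declared = null): string
{
    $ambiguous = ['utf-8', 'iso-8859-1', 'windows-1252'];
    $known     = array_map('strtolower', mb_list_encodings());

    // Trust a declared encoding outside the "usual three", but only if
    // mbstring knows it and the bytes are actually valid in it.
    if ($declared !== null
        && !in_array(strtolower($declared), $ambiguous, true)
        && in_array(strtolower($declared), $known, true)
        && mb_check_encoding($text, $declared)
    ) {
        return mb_convert_encoding($text, 'UTF-8', $declared);
    }

    // Valid UTF-8 is left untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Assumption: anything else is treated as Windows-1252.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

var_dump(normalizeToUtf8("Fu\xDFball") === "Fußball"); // bool(true)
```

Checking for valid UTF-8 first works because text in a single-byte encoding that contains non-ASCII characters is rarely also valid UTF-8 by accident.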

This cheatsheet lists some common caveats related to UTF-8 handling in PHP:
http://developer.loftdigital.com/blog/php-utf-8-cheatsheet
This function detecting multibyte characters in a string might also prove helpful (source):
function detectUTF8($string)
{
return preg_match('%(?:
[\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
|\xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
|\xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
|\xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
|[\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
|\xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)+%xs',
$string);
}

A little heads up: you said that the "ß" should be displayed as "ÃŸ" in your database.
This is probably because you're using a database with Latin-1 character encoding, or possibly your PHP-MySQL connection is set up wrong. That is, PHP believes your MySQL is set to use UTF-8, so it sends data as UTF-8, but your MySQL believes PHP is sending data encoded as ISO 8859-1, so it may once again try to encode your sent data as UTF-8, causing this kind of trouble.
Take a look at mysql_set_charset. It may help you.

Your encoding looks like you encoded into UTF-8 twice; that is, from some other encoding, into UTF-8, and again into UTF-8. As if you had ISO 8859-1, converted from ISO 8859-1 to UTF-8, and treated the new string as ISO 8859-1 for another conversion into UTF-8.
Here's some pseudocode of what you did:
$inputstring = getFromUser();
$utf8string = iconv($current_encoding, 'utf-8', $inputstring);
$flawedstring = iconv($current_encoding, 'utf-8', $utf8string);
You should try:
detect encoding using mb_detect_encoding() or whatever you like to use
if it's UTF-8, convert into ISO 8859-1, and repeat step 1
finally, convert back into UTF-8
That is presuming that in the "middle" conversion you used ISO 8859-1. If you used Windows-1252, then convert into Windows-1252 (latin1). The original source encoding is not important; the one you used in flawed, second conversion is.
This is my guess at what happened; there's very little else you could have done to get four bytes in place of one extended ASCII byte.
The German language also uses ISO 8859-2 and Windows-1250 (Latin-2).
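A minimal sketch of undoing one round of double encoding, assuming the flawed second conversion treated the text as ISO 8859-1. The function name undoDoubleUtf8 is made up, and the checks are heuristics: a string that legitimately contains a sequence such as "Ã©" would be "repaired" as well.

```php
<?php
// Hypothetical helper: reverses one UTF-8 -> "Latin-1" -> UTF-8 round,
// but only when the result is lossless and still valid UTF-8.
function undoDoubleUtf8(string $text): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return $text; // not UTF-8 at all, nothing to undo
    }
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Lossless means converting back gives the input again
    // (fails when the text contains characters outside Latin-1).
    $lossless = mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') === $text;

    if ($lossless && $candidate !== $text && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }
    return $text;
}

// Simulate the mistake: UTF-8 bytes are treated as ISO 8859-1 and encoded again.
$double = mb_convert_encoding("Fußball", 'UTF-8', 'ISO-8859-1');

var_dump(undoDoubleUtf8($double) === "Fußball");   // bool(true)
var_dump(undoDoubleUtf8("Fußball") === "Fußball"); // bool(true), left as is
```

If the second conversion used Windows-1252 instead, the same idea applies with 'Windows-1252' in place of 'ISO-8859-1'.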

A really nice way to implement an isUTF8-function can be found on php.net:
function isUTF8($string) {
return (utf8_encode(utf8_decode($string)) == $string);
}
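Two things to keep in mind with the round trip above: utf8_decode() replaces every character that does not exist in ISO-8859-1 with "?", so valid UTF-8 text containing, say, "€" is reported as not UTF-8, and both functions are deprecated as of PHP 8.2. A small sketch of an alternative, assuming the mbstring extension is available (the function name is made up):

```php
<?php
// Hypothetical replacement for isUTF8(): asks mbstring whether the byte
// sequence is well-formed UTF-8 instead of doing a Latin-1 round trip.
function isValidUtf8(string $string): bool
{
    return mb_check_encoding($string, 'UTF-8');
}

var_dump(isValidUtf8("Fußball"));    // bool(true)
var_dump(isValidUtf8("Fu\xDFball")); // bool(false), ISO-8859-1 bytes
var_dump(isValidUtf8("€"));          // bool(true)
```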

The interesting thing about mb_detect_encoding and mb_convert_encoding is that the order of the encodings you suggest does matter:
// $input is actually UTF-8
mb_detect_encoding($input, "ISO-8859-9, UTF-8");
// ISO-8859-9 (WRONG!)
mb_detect_encoding($input, "UTF-8, ISO-8859-9");
// UTF-8 (OK)
So you might want to use a specific order when specifying expected encodings. Still, keep in mind that this is not foolproof.

Working out the character encoding of RSS feeds seems to be complicated. Even normal web pages often omit, or lie about, their encoding.
So you could try to use the correct way to detect the encoding and then fall back to some form of auto-detection (guessing).

You need to test the character set on input since responses can come coded with different encodings.
I force all incoming content into UTF-8 by doing detection and conversion with the following function:
function fixRequestCharset()
{
$ref = array(&$_GET, &$_POST, &$_REQUEST);
foreach ($ref as &$var)
{
foreach ($var as $key => $val)
{
$encoding = mb_detect_encoding($var[$key], mb_detect_order(), true);
if (!$encoding)
continue;
if (strcasecmp($encoding, 'UTF-8') != 0)
{
$encoding = iconv($encoding, 'UTF-8', $var[$key]);
if ($encoding === false)
continue;
$var[$key] = $encoding;
}
}
}
}
That routine will turn all PHP variables that come from the remote host into UTF-8.
Or ignore the value if the encoding could not be detected or converted.
You can customize it to your needs.
Just invoke it before using the variables.

mb_detect_encoding:
echo mb_detect_encoding($str, "auto");
Or
echo mb_detect_encoding($str, "UTF-8, ASCII, ISO-8859-1");
I really don't know what the results are, but I'd suggest you just take some of your feeds with different encodings and try if mb_detect_encoding works or not.
auto is short for "ASCII,JIS,UTF-8,EUC-JP,SJIS". It returns the detected charset, which you can use to convert the string to UTF-8 with iconv.
<?php
function convertToUTF8($str) {
$enc = mb_detect_encoding($str);
if ($enc && $enc != 'UTF-8') {
return iconv($enc, 'UTF-8', $str);
} else {
return $str;
}
}
?>
I haven't tested it, so no guarantee. And maybe there's a simpler way.

I know this is an older question, but I figure a useful answer never hurts. I was having issues with my encoding between a desktop application, SQLite, and GET/POST variables. Some would be in UTF-8, some would be in ASCII, and basically everything would get screwed up when foreign characters got involved.
Here is my solution. It scrubs your GET/POST/REQUEST (I omitted cookies, but you could add them if desired) on each page load before processing. It works well in a header. PHP will throw warnings if it can't detect the source encoding automatically, so these warnings are suppressed with @'s.
//Convert everything in our vars to UTF-8 for playing nice with the database...
//Use some auto detection here to help us not double-encode...
//Suppress possible warnings with @'s for when encoding cannot be detected
try
{
$process = array(&$_GET, &$_POST, &$_REQUEST);
while (list($key, $val) = each($process)) {
foreach ($val as $k => $v) {
unset($process[$key][$k]);
if (is_array($v)) {
$process[$key][@mb_convert_encoding($k,'UTF-8','auto')] = $v;
$process[] = &$process[$key][@mb_convert_encoding($k,'UTF-8','auto')];
} else {
$process[$key][@mb_convert_encoding($k,'UTF-8','auto')] = @mb_convert_encoding($v,'UTF-8','auto');
}
}
}
unset($process);
}
catch(Exception $ex){}

It's simple: when you get something that's not UTF-8, you must encode that into UTF-8.
So, when you're fetching a certain feed that's ISO 8859-1 parse it through utf8_encode.
However, if you're fetching an UTF-8 feed, you don't need to do anything.

harpax' answer worked for me. In my case, this is good enough:
if (isUTF8($str)) {
echo $str;
}
else
{
echo iconv("ISO-8859-1", "UTF-8//TRANSLIT", $str);
}

I was checking for solutions to encoding since ages, and this page is probably the conclusion of years of search! I tested some of the suggestions you mentioned and here are my notes:
This is my test string:
this is a "wròng wrìtten" string bùt I nèed to pù 'sòme' special
chàrs to see thèm, convertèd by fùnctìon!! & that's it!
I do an INSERT to save this string on a database in a field that is set as utf8_general_ci
The character set of my page is UTF-8.
If I do an INSERT just like that, in my database, I have some characters probably coming from Mars...
So I need to convert them into some "sane" UTF-8. I tried utf8_encode(), but still aliens chars were invading my database...
So I tried to use the function forceUTF8 posted on number 8, but in the database the string saved looks like this:
this is a "wrÃ²ng wrÃ¬tten" string bÃ¹t I nÃ¨ed to pÃ¹ 'sÃ²me' special
chÃ rs to see thÃ¨m, convertÃ¨d by fÃ¹nctÃ¬on!! & that's it!
So collecting some more information on this page and merging them with other information on other pages I solved my problem with this solution:
$finallyIDidIt = mb_convert_encoding(
$string,
mysql_client_encoding($resourceID),
mb_detect_encoding($string)
);
Now in my database I have my string with correct encoding.
NOTE:
Only note to take care of is in function mysql_client_encoding!
You need to be connected to the database, because this function wants a resource ID as a parameter.
But well, I just do that re-encoding before my INSERT so for me it is not a problem.

After sorting out your PHP scripts, don't forget to tell MySQL what charset you are passing and would like to receive.
Example: set the character set to UTF-8
Passing UTF-8 data to a Latin-1 table in a Latin-1 I/O session produces those nasty bird's feet. I see this every other day in osCommerce shops. After a round trip back and forth it might look right, but phpMyAdmin will show the truth. By telling MySQL what charset you are passing, you let it handle the conversion of MySQL data for you.
How to recover existing scrambled MySQL data is another question. :)
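As an illustration of telling MySQL the connection charset, here is a minimal sketch using a PDO DSN. The host and database names are placeholders, mysqlDsn is a made-up helper, and choosing utf8mb4 is an assumption about what your tables use. The connection itself is commented out because it needs a running server.

```php
<?php
// Hypothetical helper: builds a PDO DSN that declares the connection charset.
function mysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = mysqlDsn('localhost', 'feeds'); // placeholder host and database
echo $dsn, PHP_EOL; // mysql:host=localhost;dbname=feeds;charset=utf8mb4

// With credentials and a running server this would be used as:
// $pdo = new PDO($dsn, $user, $password);
// The mysqli equivalent is to call $mysqli->set_charset('utf8mb4') right after connecting.
```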

Get the encoding from headers and convert it to UTF-8.
$post_url = 'http://website.domain';
/// Get headers ///////////////////////////////////////////////
function get_headers_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
$r = curl_exec($ch);
return $r;
}
$the_header = get_headers_curl($post_url);
/// Check for redirect ////////////////////////////////////////
if (preg_match("/Location:/i", $the_header)) {
$arr = explode('Location:', $the_header);
$location = $arr[1];
$location = explode(chr(10), $location);
$location = $location[0];
$the_header = get_headers_curl(trim($location));
}
/// Get charset ///////////////////////////////////////////////
if (preg_match("/charset=/i", $the_header)) {
$arr = explode('charset=', $the_header);
$charset = $arr[1];
$charset = explode(chr(10), $charset);
$charset = $charset[0];
}
///////////////////////////////////////////////////////////////////
// echo $charset;
if($charset && $charset != 'UTF-8') {
$html = iconv($charset, "UTF-8", $html);
}

ÃŸ is Mojibake for ß. In your database, you may have one of the following hex values (use SELECT HEX(col) ... to find out):
DF if the column is "latin1",
C39F if the column is utf8 -- OR -- it is latin1, but "double-encoded"
C383C5B8 if double-encoded into a utf8 column
You should not use any encoding/decoding functions in PHP; instead, you should set up the database and the connection to it correctly.
If MySQL is involved, see: Trouble with UTF-8 characters; what I see is not what I stored

if(!mb_check_encoding($str)){
$str = iconv("windows-1251", "UTF-8", $str);
}
This worked for me.

Try without 'auto'
That is:
mb_detect_encoding($text)
instead of:
mb_detect_encoding($text, 'auto')
More information can be found here: mb_detect_encoding

Try to use this... every text that is not UTF-8 will be translated.
function is_utf8($str) {
return (bool) preg_match('//u', $str);
}
$myString = "Fußball";
if(!is_utf8($myString)){
$myString = utf8_encode($myString);
}
// or 1 line version ;)
$myString = !is_utf8($myString) ? utf8_encode($myString) : trim($myString);

I found a solution at http://deer.org.ua/2009/10/06/1/:
class Encoding
{
/**
* http://deer.org.ua/2009/10/06/1/
* @param $string
* @return null
*/
public static function detect_encoding($string)
{
static $list = ['utf-8', 'windows-1251'];
foreach ($list as $item) {
try {
$sample = iconv($item, $item, $string);
} catch (\Exception $e) {
continue;
}
if (md5($sample) == md5($string)) {
return $item;
}
}
return null;
}
}
$content = file_get_contents($file['tmp_name']);
$encoding = Encoding::detect_encoding($content);
if ($encoding != 'utf-8') {
$result = iconv($encoding, 'utf-8', $content);
} else {
$result = $content;
}
I think that @ is a bad decision, so I made some changes to the solution from deer.org.ua.

When you try to handle multi languages, like Japanese and Korean, you might get in trouble.
mb_convert_encoding with the 'auto' parameter doesn't work well. Setting mb_detect_order('ASCII,UTF-8,JIS,EUC-JP,SJIS,EUC-KR,UHC') doesn't help since it will detect EUC-* wrongly.
I concluded that as long as the input strings come from HTML, the 'charset' in a meta element should be used. I use Simple HTML DOM Parser because it supports invalid HTML.
The below snippet extracts the title element from a web page. If you would like to convert the entire page, then you may want to remove some lines.
<?php
require_once 'simple_html_dom.php';
echo convert_title_to_utf8(file_get_contents($argv[1])), PHP_EOL;
function convert_title_to_utf8($contents)
{
$dom = str_get_html($contents);
$title = $dom->find('title', 0);
if (empty($title)) {
return null;
}
$title = $title->plaintext;
$metas = $dom->find('meta');
$charset = 'auto';
foreach ($metas as $meta) {
if (!empty($meta->charset)) { // HTML5
$charset = $meta->charset;
} else if (preg_match('#charset=(.+)#', $meta->content, $match)) {
$charset = $match[1];
}
}
if (!in_array(strtolower($charset), array_map('strtolower', mb_list_encodings()))) {
$charset = 'auto';
}
return mb_convert_encoding($title, 'UTF-8', $charset);
}

This version is for the German language, but you can modify the $CHARSETS and the $TESTCHARS.
class CharsetDetector
{
private static $CHARSETS = array(
"ISO_8859-1",
"ISO_8859-15",
"CP850"
);
private static $TESTCHARS = array(
"€",
"ä",
"Ä",
"ö",
"Ö",
"ü",
"Ü",
"ß"
);
public static function convert($string)
{
return self::__iconv($string, self::getCharset($string));
}
public static function getCharset($string)
{
$normalized = self::__normalize($string);
if(!strlen($normalized))
return "UTF-8";
$best = "UTF-8";
$charcountbest = 0;
foreach (self::$CHARSETS as $charset)
{
$str = self::__iconv($normalized, $charset);
$charcount = 0;
$stop = mb_strlen($str, "UTF-8");
for($idx = 0; $idx < $stop; $idx++)
{
$char = mb_substr($str, $idx, 1, "UTF-8");
foreach (self::$TESTCHARS as $testchar)
{
if($char == $testchar)
{
$charcount++;
break;
}
}
}
if($charcount > $charcountbest)
{
$charcountbest = $charcount;
$best = $charset;
}
//echo $text . "<br />";
}
return $best;
}
private static function __normalize($str)
{
$len = strlen($str);
$ret = "";
for($i = 0; $i < $len; $i++)
{
$c = ord($str[$i]);
if ($c > 128) {
if (($c > 247))
$ret .= $str[$i];
elseif
($c > 239) $bytes = 4;
elseif
($c > 223) $bytes = 3;
elseif
($c > 191) $bytes = 2;
else
$ret .= $str[$i];
if (($i + $bytes) > $len)
$ret .= $str[$i];
$ret2 = $str[$i];
while ($bytes > 1)
{
$i++;
$b = ord($str[$i]);
if ($b < 128 || $b > 191)
{
$ret .= $ret2;
$ret2 = "";
$i += $bytes-1;
$bytes = 1;
break;
}
else
$ret2 .= $str[$i];
$bytes--;
}
}
}
return $ret;
}
private static function __iconv($string, $charset)
{
return iconv ($charset, "UTF-8", $string);
}
}

I had the same issue with phpQuery (ISO-8859-1 instead of UTF-8) and this hack helped me:
$html = '<?xml version="1.0" encoding="UTF-8" ?>' . $html;
mb_internal_encoding('UTF-8'), phpQuery::newDocumentHTML($html, 'utf-8'), mbstring.internal_encoding and other manipulations didn't take any effect.

For Chinese characters, it is common to be encoded in the GBK encoding. In addition, when tested, the most voted answer doesn't work. Here is a simple fix that makes it work as well:
function toUTF8($raw) {
try{
return mb_convert_encoding($raw, "UTF-8", "auto");
}catch(\Exception $e){
return mb_convert_encoding($raw, "UTF-8", "GBK");
}
}
Remark: This solution was written in 2017 and should fix problems for PHP in those days. I have not tested whether latest PHP already understands auto correctly.

HMAC-SHA-256 in PHP

I have to build an authorization hash from this string:
kki98hkl-u5d0-w96i-62dp-xpmr6xlvfnjz:20151110171858:b2c13532-3416-47d9-8592-a541c208f755:hKSeRD98BHngrNa51Q2IgAXtoZ8oYebgY4vQHEYjlmzN9KSbAVTRvQkUPsjOGu4F
This secret is used for a HMAC hash function:
LRH9CAkNs-zoU3hxHbrtY0CUUcmqzibPeN7x6-vwNWQ=
The authorization hash I have to generate is this:
P-WgZ8CqV51aI-3TncZj5CpSZh98PjZTYxrvxkmQYmI=
There are some things to take care of:
The signature has to be built with HMAC-SHA-256 as specified in RFC 2104.
The signature has to be encoded with URL-compatible Base64 as specified in RFC 4648 Section 5 (Safe alphabet).
There is also some pseudo-code given for the generation:
Signatur(Request) = new String(encodeBase64URLCompatible(HMAC-SHA-256(getBytes(Z, "UTF-8"), decodeBase64URLCompatible(getBytes(S, "UTF-8")))), "UTF-8")
I tried various things in PHP but have not found the correct algorithm yet. This is the code I have now:
if(!function_exists('base64url_encode')){
function base64url_encode($data) {
$data = str_replace(array('+', '/'), array('-', '_'), base64_encode($data));
return $data;
}
}
$str = "kki98hkl-u5d0-w96i-62dp-xpmr6xlvfnjz:20151110171858:b2c13532-3416-47d9-8592-a541c208f755:hKSeRD98BHngrNa51Q2IgAXtoZ8oYebgY4vQHEYjlmzN9KSbAVTRvQkUPsjOGu4F";
$sec = "LRH9CAkNs-zoU3hxHbrtY0CUUcmqzibPeN7x6-vwNWQ=";
$signature = mhash(MHASH_SHA256, $str, $sec);
$signature = base64url_encode($signature);
if($signature != "P-WgZ8CqV51aI-3TncZj5CpSZh98PjZTYxrvxkmQYmI=")
echo "wrong: $signature";
else
echo "correct";
It gives this signature:
K9lw3V-k5gOedmVwmO5vC7cOn82JSEXsNguozCAOU2c=
As you can see, the length of 44 characters is correct. Please help me find the mistake; this simple problem has already cost me hours and I still have no solution.

There's a couple of things to notice:
Your key is base64-encoded. You have to decode it before you could use it with php functions. That's the most important thing you have missed.
Mhash is obsoleted by Hash extension.
You want output to be encoded in a custom fashion, so it follows that you need raw output from hmac function (php, by default, will hex-encode it).
So, using hash extension this becomes:
$key = "LRH9CAkNs-zoU3hxHbrtY0CUUcmqzibPeN7x6-vwNWQ=";
$str = "kki98hkl-u5d0-w96i-62dp-xpmr6xlvfnjz:20151110171858:b2c13532-3416-47d9-8592-a541c208f755:hKSeRD98BHngrNa51Q2IgAXtoZ8oYebgY4vQHEYjlmzN9KSbAVTRvQkUPsjOGu4F";
function encode($data) {
return str_replace(['+', '/'], ['-', '_'], base64_encode($data));
}
function decode($data) {
return base64_decode(str_replace(['-', '_'], ['+', '/'], $data));
}
$binaryKey = decode($key);
var_dump(encode(hash_hmac("sha256", $str, $binaryKey, true)));
Outputs:
string(44) "P-WgZ8CqV51aI-3TncZj5CpSZh98PjZTYxrvxkmQYmI="
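When the signature has to be checked on the receiving side rather than generated, the same pieces can be combined with hash_equals() for a constant-time comparison. The sketch below uses a made-up key and message, and all function names are hypothetical.

```php
<?php
// URL-safe Base64 (RFC 4648 section 5 alphabet), padding kept.
function b64url_encode(string $binary): string
{
    return strtr(base64_encode($binary), '+/', '-_');
}

function b64url_decode(string $text): string
{
    return (string) base64_decode(strtr($text, '-_', '+/'));
}

// Signs $message with a key that is itself URL-safe Base64 encoded.
function signMessage(string $message, string $encodedKey): string
{
    $rawHmac = hash_hmac('sha256', $message, b64url_decode($encodedKey), true);
    return b64url_encode($rawHmac);
}

// Recomputes the signature and compares it in constant time.
function verifyMessage(string $message, string $encodedKey, string $signature): bool
{
    return hash_equals(signMessage($message, $encodedKey), $signature);
}

$key = b64url_encode('0123456789abcdef0123456789abcdef'); // made-up 32-byte key
$sig = signMessage('example:message', $key);

var_dump(strlen($sig));                                   // int(44)
var_dump(verifyMessage('example:message', $key, $sig));   // bool(true)
var_dump(verifyMessage('tampered:message', $key, $sig));  // bool(false)
```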

Simply use hash_hmac() function available in PHP.
Example :
hash_hmac('sha256', $string, $secret);
Doc here : http://php.net/manual/fr/function.hash-hmac.php

Bigcommerce - Fail to verify the load callbacks

hello fellow developers,
I’m facing an issue with the load callback (and the uninstall callback by extension).
I’m trying to verify the requests authenticity following the algorithm described in the documentation. https://developer.bigcommerce.com/apps/load#signed-payload
I am able to decode the json string and the data is correct, but the signatures never match. I made sure to use the right client secret and tried out different encoding/decoding scenarios with no luck.
Another concern is the PHP snippet they provide as an example (and in their sample app). It seems to return null when the signatures match and the decoded data when they don't (see secureCompare()).
Meaning that the security test would pass every time, since in all my attempts the signatures didn’t match.
Am I missing something here ?
Edit: Here is the example in the doc. I can't really give you sample data as the client secret is to remain secret...
function verify($signedRequest, $clientSecret)
{
list($payload, $encodedSignature) = explode('.', $signedRequest, 2);
// decode the data
$signature = base64_decode($encodedSignature);
$data = json_decode(base64_decode($payload), true);
// confirm the signature
$expectedSignature = hash_hmac('sha256', $payload, $clientSecret, $raw = true);
if (secureCompare($signature, $expectedSignature)) {
error_log('Bad Signed JSON signature!');
return null;
}
return $data;
}
function secureCompare($str1, $str2)
{
$res = $str1 ^ $str2;
$ret = strlen($str1) ^ strlen($str2); //not the same length, then fail ($ret != 0)
for($i = strlen($res) - 1; $i >= 0; $i--) {
$ret += ord($res[$i]);
}
return !$ret;
}

You're not missing anything, and it's not a clock sync issue: the 28 lines of sample code provided in both the documentation and the sample app have some pretty critical flaws:
The sample code does a hash_hmac of the raw base64-encoded JSON, instead of the base64-decoded JSON. (The hash provided to you by the BigCommerce API is really a hash of the base64-decoded JSON).
Since hash_hmac is called with $raw=true, this means the two strings will always be vastly different: one is raw binary, and the other is hexits.
Bad check of secureCompare logic. The if (secureCompare... part of the verify function expects opposite behavior from the secureCompare function. If the secureCompare function returns true when the strings match, why are we calling error_log?
Put all three of these issues together, and you end up with code that appears to work, but is actually silently failing. If you use the sample code, you're likely allowing any and all "signed" requests to be processed by your application!
Here's my corrected implementation of the verify function:
<?php
function verifySignedRequest($signedRequest, $clientSecret)
{
list($encodedData, $encodedSignature) = explode('.', $signedRequest, 2);
// decode the data
$signature = base64_decode($encodedSignature);
$jsonStr = base64_decode($encodedData);
$data = json_decode($jsonStr, true);
// confirm the signature
$expectedSignature = hash_hmac('sha256', $jsonStr, $clientSecret, $raw = false);
if (!hash_equals($expectedSignature, $signature)) {
error_log('Bad signed request from BigCommerce!');
return null;
}
return $data;
}
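If you want to unit test this without real data, one option is to build a signed request yourself. The sketch below assumes the format described in this answer (base64 of the JSON, a dot, then base64 of the hex HMAC-SHA256 of the decoded JSON). buildSignedRequest is a hypothetical helper of mine and 'test-secret' is a placeholder; the verify function is a slightly more defensive variant of the one above, not BigCommerce's own code.

```php
<?php
// Hypothetical test helper: produces a signed request in the assumed format.
function buildSignedRequest(array $data, string $clientSecret): string
{
    $json = json_encode($data);
    $hexSignature = hash_hmac('sha256', $json, $clientSecret); // hex output
    return base64_encode($json) . '.' . base64_encode($hexSignature);
}

// Same logic as the corrected function above, with extra input checks.
function verifySignedRequest(string $signedRequest, string $clientSecret): ?array
{
    $parts = explode('.', $signedRequest, 2);
    if (count($parts) !== 2) {
        return null; // no "payload.signature" structure
    }
    $jsonStr   = base64_decode($parts[0], true);
    $signature = base64_decode($parts[1], true);
    if ($jsonStr === false || $signature === false) {
        return null; // not valid base64
    }
    // Hash the decoded JSON and compare hex to hex in constant time.
    $expected = hash_hmac('sha256', $jsonStr, $clientSecret);
    if (!hash_equals($expected, $signature)) {
        return null;
    }
    return json_decode($jsonStr, true);
}

$secret = 'test-secret'; // placeholder
$signed = buildSignedRequest(['user' => ['id' => 42]], $secret);

// Swap the payload but keep the old signature.
$tampered = base64_encode('{"user":{"id":1}}') . strstr($signed, '.');

var_dump(verifySignedRequest($signed, $secret));         // the decoded array
var_dump(verifySignedRequest($signed, 'wrong-secret'));  // NULL
var_dump(verifySignedRequest($tampered, $secret));       // NULL
```

The two failing cases are the important part: with the original sample code, requests like these would have been accepted.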

PHP: json_decode not working

This does not work:
$jsonDecode = json_decode($jsonData, TRUE);
However if I copy the string from $jsonData and put it inside the decode function manually it does work.
This works:
$jsonDecode = json_decode('{"id":"0","bid":"918","url":"http:\/\/www.google.com","md5":"6361fbfbee69f444c394f3d2fa062f79","time":"2014-06-02 14:20:21"}', TRUE);
I printed $jsonData, copied the output, and pasted it into the decode function as shown above. Then it worked. However, if I pass $jsonData directly to the decode function, it does not.
var_dump($jsonData) shows:
string(144) "{"id":"0","bid":"918","url":"http:\/\/www.google.com","md5":"6361fbfbee69f444c394f3d2fa062f79","time":"2014-06-02 14:20:21"}"
The $jsonData comes from an encrypted $_GET variable. To encrypt it I use this:
$key = "SOME KEY";
$iv_size = mcrypt_get_iv_size(MCRYPT_BLOWFISH, MCRYPT_MODE_ECB);
$iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
$enc = mcrypt_encrypt(MCRYPT_BLOWFISH, $key, $data, MCRYPT_MODE_ECB, $iv);
$iv = rawurlencode(base64_encode($iv));
$enc = rawurlencode(base64_encode($enc));
//To Decrypt
$iv = base64_decode(rawurldecode($_GET['i']));
$enc = base64_decode(rawurldecode($_GET['e']));
$data = mcrypt_decrypt(MCRYPT_BLOWFISH, $key, $enc, MCRYPT_MODE_ECB, $iv);

Sometimes the problem is HTML entities: for example, \" may arrive as \&quot;. In that case you need to convert the entities back to real text, which you can do with PHP's html_entity_decode() function:
$jsonData = stripslashes(html_entity_decode($jsonData));
$k=json_decode($jsonData,true);
print_r($k);

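You can use preg_replace to strip control characters and avoid null results from json_decode. Note that this pattern also removes every byte in the range \x80-\xFF, so any non-ASCII (multibyte UTF-8) text is lost.
Here is the example code: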
$json_string = stripslashes(html_entity_decode($json_string));
$bookingdata = json_decode( preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $json_string), true );

Most likely you need to strip off the padding from your decrypted data. There are 124 visible characters in your string but var_dump reports 144, which means 20 characters of padding need to be removed (a series of "\0" bytes at the end of your string).
Probably that's 4 "\0" bytes at the end of a block plus an empty 16-byte block (to mark the end of the data).
How are you currently decrypting/encrypting your string?
Edit:
You need to add this to trim the zero bytes at the end of the string:
$jsonData = rtrim($jsonData, "\0");
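A quick way to see the effect without mcrypt is to simulate the padding. This is only a sketch: the JSON and the number of "\0" bytes are made up for the demonstration.

```php
<?php
$json   = '{"id":"0","bid":"918"}';
$padded = $json . str_repeat("\0", 4); // simulate zero padding left after decryption

// Looks identical when echoed, but the lengths differ and decoding fails.
var_dump(strlen($json), strlen($padded));
var_dump(json_decode($padded, true)); // NULL
echo json_last_error_msg(), PHP_EOL;  // explains why decoding failed

$trimmed = rtrim($padded, "\0");
$decoded = json_decode($trimmed, true);
var_dump($decoded);
```

Comparing strlen() with the number of visible characters, as done above with the var_dump output, is usually the fastest way to spot this.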

Judging from the other comments, you could use,
$jsonDecode = json_decode(trim($jsonData), TRUE);

While moving to PHP 7.1, I ran into json_decode error number 4 (JSON syntax error). None of the other solutions on this page worked for me.
After some more searching, I found a solution at https://stackoverflow.com/a/15423899/1545384 and it works for me.
//Remove UTF8 Bom
function remove_utf8_bom($text)
{
$bom = pack('H*','EFBBBF');
$text = preg_replace("/^$bom/", '', $text);
return $text;
}
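For reference, here is a small demonstration of the BOM problem. stripUtf8Bom is a hypothetical variant of the function above that avoids the regex (it just checks the first three bytes); the input string is made up.

```php
<?php
// Hypothetical regex-free variant: drop a leading UTF-8 BOM (EF BB BF) if present.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        return (string) substr($text, 3);
    }
    return $text;
}

$withBom = "\xEF\xBB\xBF" . '{"ok":true}'; // e.g. a file saved as "UTF-8 with BOM"

var_dump(json_decode($withBom, true));               // NULL
var_dump(json_last_error() === JSON_ERROR_SYNTAX);   // the "error number 4" case
var_dump(json_decode(stripUtf8Bom($withBom), true)); // the decoded array
```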

Be sure to set header to JSON
header('Content-type: application/json;');

str_replace("\t", " ", str_replace("\n", " ", $string))
because json_decode rejects raw control characters (such as literal tabs and newlines) inside strings. It just returns null and no error is displayed, so make sure you remove tabs and newlines.
Depending on the source you get your data, you might need also:
stripslashes(html_entity_decode($string))
Works for me:
<?php
$sql = <<<EOT
SELECT *
FROM `students`;
EOT;
$string = '{ "query" : "' . str_replace("\t", " ", str_replace("\n", " ", $sql)).'" }';
print_r(json_decode($string));
?>
output:
stdClass Object
(
[query] => SELECT * FROM `students`;
)
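If you build the JSON string yourself, as in this example, an alternative is to let json_encode do the escaping instead of replacing tabs and newlines by hand. A minimal sketch, using a made-up query:

```php
<?php
$sql = "SELECT *\n\tFROM `students`;"; // contains a real newline and a real tab

// Hand-built JSON: the raw control characters make it invalid.
$handBuilt = '{ "query" : "' . $sql . '" }';
var_dump(json_decode($handBuilt)); // NULL

// json_encode escapes the newline and tab as \n and \t inside the JSON text.
$string  = json_encode(['query' => $sql]);
$decoded = json_decode($string, true);

var_dump($decoded['query'] === $sql); // bool(true): the original text survives intact
```

This way the newline and tab survive the round trip instead of being flattened to spaces.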

I had a problem where json_decode did not work; the solution was to convert the string encoding to UTF-8. This is important if you have non-Latin characters.

Interestingly, mcrypt_decrypt seems to add control characters other than \0 at the end of the resulting text because of its padding algorithm. Therefore, instead of rtrim($jsonData, "\0")
it is recommended to use
preg_replace( "/\p{Cc}*$/u", "", $data)
on the result $data of mcrypt_decrypt. json_decode will work once all trailing control characters are removed. Please refer to the comment by Peter Bailey at http://php.net/manual/en/function.mdecrypt-generic.php .

USE THIS CODE
<?php
$json = preg_replace('/[[:cntrl:]]/', '', $json_data);
$json_array = json_decode($json, true);
echo json_last_error();
echo json_last_error_msg();
print_r($json_array);
?>

Make sure your JSON is actually valid. For some reason I was convinced that this was valid JSON:
{ type: "block" }
But it is not. The point is: validate your string with a linter if you find json_decode not to be working.

Try the JSON validator.
The problem in my case was that it used ' instead of ", so I had to replace them to make it work.

In Notepad++ I changed the encoding of the JSON file to "UTF-8 without BOM".
After that, the JSON started to work.

TL;DR Be sure that your JSON does not contain comments :)
I took a JSON structure from an API reference and tested the request using Postman. I just copy-pasted the JSON and didn't notice that there was a comment inside it:
...
"payMethod": {
"type": "PBL" //or "CARD_TOKEN", "INSTALLMENTS"
},
...
Of course, after deleting the comment, json_decode() started working like a charm :)
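Several of the answers here (unquoted keys, single quotes, comments) come down to invalid JSON that json_decode silently turns into null. A small wrapper that reports json_last_error_msg() makes this visible. decodeOrFail is a hypothetical helper name, and the sample strings are taken from or modeled on the answers above.

```php
<?php
// Hypothetical helper: decode JSON or throw with the parser's error message.
function decodeOrFail(string $json)
{
    $value = json_decode($json, true);
    if ($value === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException('Invalid JSON: ' . json_last_error_msg());
    }
    return $value;
}

$samples = [
    'unquoted key'  => '{ type: "block" }',
    'single quotes' => "{'type': 'block'}",
    'comment'       => "{\"type\": \"PBL\" // or \"CARD_TOKEN\"\n}",
    'valid'         => '{"type": "block"}',
];

$failed = [];
foreach ($samples as $label => $json) {
    try {
        decodeOrFail($json);
        echo "$label: ok", PHP_EOL;
    } catch (InvalidArgumentException $e) {
        $failed[] = $label;
        echo "$label: ", $e->getMessage(), PHP_EOL;
    }
}
```

On PHP 7.3 and later, passing the JSON_THROW_ON_ERROR flag to json_decode gives similar behavior without a wrapper.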

Use the following function if JSON_ERROR_UTF8 occurs:
$encoded = json_encode( utf_convert( $responseForJS ) );
The function below converts array data recursively:
/* Use it when json_encode fails on corrupt UTF-8 characters
* (error: malformed UTF-8 characters, possibly incorrectly encoded)
*/
function utf_convert( $mixed ) {
if (is_array($mixed)) {
foreach ($mixed as $key => $value) {
// recurse so that nested arrays and strings are converted too
$mixed[$key] = utf_convert($value);
}
} elseif (is_string($mixed)) {
// re-encoding UTF-8 to UTF-8 replaces invalid byte sequences with a substitute character
return mb_convert_encoding($mixed, "UTF-8", "UTF-8");
}
return $mixed;
}
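Here is a short sketch of what JSON_ERROR_UTF8 looks like and two other ways around it, assuming PHP 7.2+ (for the JSON_INVALID_UTF8_* flags) and the mbstring extension. The sample string is made up: it is "café" encoded as ISO-8859-1.

```php
<?php
$bad = "caf\xE9"; // ISO-8859-1 bytes: the lone 0xE9 is not valid UTF-8

$result = json_encode($bad);
$error  = json_last_error();
var_dump($result);                   // bool(false)
var_dump($error === JSON_ERROR_UTF8); // bool(true)

// Option 1: let json_encode substitute or drop the invalid bytes (PHP 7.2+).
$substituted = json_encode($bad, JSON_INVALID_UTF8_SUBSTITUTE);
$ignored     = json_encode($bad, JSON_INVALID_UTF8_IGNORE);
var_dump($substituted, $ignored);

// Option 2: if you know the real source encoding, convert it properly first.
$fixed = mb_convert_encoding($bad, 'UTF-8', 'ISO-8859-1');
echo json_encode($fixed), PHP_EOL; // "caf\u00e9"
```

Option 2 is the only one that keeps the actual character, but it requires knowing the source encoding, which cannot be reliably guessed.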

Maybe this helps someone: check whether your JSON string contains any uppercase NULL values. json_decode will not work if NULL is present as a value, because only lowercase null is valid JSON.
This super basic function may help you. I put the NULL in an array in case I need to add more replacements in the future.
function jsonValueFix($json){
$json = str_replace( array('NULL'),'""',$json );
return $json;
}

I just used json_decode twice and it worked for me (presumably the response had been JSON-encoded twice):
$response = json_decode($apiResponse, true);
$response = json_decode($response, true);
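To illustrate when decoding twice is needed, here is a sketch with a made-up, double-encoded payload. If the first json_decode returns a string rather than an array, the data was encoded twice.

```php
<?php
// Simulate an API that JSON-encodes an already encoded JSON string.
$apiResponse = json_encode(json_encode(['status' => 'ok']));

$first = json_decode($apiResponse, true);
var_dump(is_string($first)); // bool(true): still a JSON string, not an array

// Only decode again when the first pass produced a string.
$response = is_string($first) ? json_decode($first, true) : $first;
var_dump($response);
```

Checking is_string() first avoids running the second decode on data that was only encoded once.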
