json_encode JSON_UNESCAPED_SLASHES not working and still escaping slashes - php

My autocomplete search feature is broken because of how accented characters are stored in MySQL.
For example, in the MySQL column, É is stored as \u00c9.
In the PHP code, which receives the user's input and queries MySQL, É is \xc3\x89.
json_encode() almost works perfectly to take "\xc3\x89" and convert it to "\u00c9"
$clean = json_encode($criteria, JSON_UNESCAPED_SLASHES);
Except it converts it to "\\u00c9" and so the characters don't match even though they are both É.
The option JSON_UNESCAPED_SLASHES isn't working. Why does it not keep another backslash from being added in front of the backslash?
How do I get this to work?
Edit: I just added the actual code and error log output below. code:
error_log("criteria vvvvvvvvvvvvv");
error_log($criteria);
$clean = json_encode($criteria, JSON_UNESCAPED_SLASHES);
error_log("json_encode(criteria) vvvvvvvvvvvvvv");
error_log($clean);
The error log:
[Fri Aug 23] criteria vvvvvvvvvvvvvvv,
[Fri Aug 23] \xc3\x89
[Fri Aug 23] json_encode(criteria) vvvvvvvvvvvvvvv,
[Fri Aug 23] "\\u00c9"

First, JSON_UNESCAPED_SLASHES prevents escaping of forward slashes (/), as the name implies. Don't expect it to prevent escaping of backslashes (\).
echo json_encode('/'); // prints "\/"
echo json_encode('/', JSON_UNESCAPED_SLASHES); // prints "/"
echo json_encode("\\", JSON_UNESCAPED_SLASHES); // prints "\\"
// Note on line 3: the input is a single backslash.
As you can see, it prevents escaping of slashes only, not backslashes.
Regarding your problem: if json_encode() gave you something like \\u00c9, then you must have given it the string \u00c9 as input. json_encode() did nothing wrong; you fed it the six-character string "\u00c9", not the Unicode character U+00C9, and it escaped the backslash at the beginning of the string.
Your $criteria variable is probably holding a JSON-encoded string like "\u00c9" that was encoded without the JSON_UNESCAPED_UNICODE option. In other words, don't use json_encode() twice.
Check these examples; they may clear things up:
echo json_encode("É", JSON_UNESCAPED_SLASHES) . "\n";
echo json_encode("\u00c9", JSON_UNESCAPED_SLASHES) . "\n";
echo json_encode("\xc3\x89", JSON_UNESCAPED_SLASHES) . "\n";
echo json_encode("/") . "\n";
echo json_encode("/", JSON_UNESCAPED_SLASHES) . "\n";
echo json_encode("\\", JSON_UNESCAPED_SLASHES) . "\n";
This outputs
"\u00c9"
"\\u00c9"
"\u00c9"
"\/"
"/"
"\\"
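To make the "don't encode twice" point concrete, here is a minimal sketch. It assumes $criteria really holds the UTF-8 bytes for É; the variable names $once, $twice and $raw are only illustrative and do not come from the question.

```php
<?php
// "\xc3\x89" is the UTF-8 byte sequence for É.
$criteria = "\xc3\x89";

// Encoding once turns the character into the escape sequence \u00c9.
$once = json_encode($criteria);

// Encoding the already-encoded string escapes its backslash and its quotes.
$twice = json_encode($once);

// JSON_UNESCAPED_UNICODE keeps the raw UTF-8 bytes instead of a \u escape.
$raw = json_encode($criteria, JSON_UNESCAPED_UNICODE);

echo $once, "\n";
echo $twice, "\n";
echo $raw, "\n";

// Decoding undoes exactly one level of encoding.
var_dump(json_decode($once) === $criteria);
```

If the stored value already contains \u00c9 as text, decoding it with json_decode() before comparing may be simpler than trying to reproduce the escaped form.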

Related

How to encode data in hexadecimal?

The code:
#0c0f56415445532d413636373231343939
Is: VATES-A66721499 but encoded in hex.
I have made the following attempt:
$hex = bin2hex('VATES-A66721499');
echo $hex;
output:
56415445532d413636373231343939
But I need to get this other part:
#0c0f
I have tried the following but no result: #0c0f56415445532d413636373231343939
0c and 0f are unprintable control characters, and # is not part of hexadecimal encoding at all.
You can use either:
'#' . bin2hex("\x0c\x0f" . 'VATES-A66721499')
Or:
'#0c0f' . bin2hex('VATES-A66721499')
Both will give the desired output.
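A short sketch comparing the two expressions and decoding the result back. The variable names are illustrative, and stripping the leading # plus two control bytes on the way back is an assumption about how the value would be consumed.

```php
<?php
$payload = 'VATES-A66721499';

// Variant 1: prepend the two control bytes, then hex-encode everything.
$viaBytes = '#' . bin2hex("\x0c\x0f" . $payload);

// Variant 2: hex-encode the payload and prepend the literal prefix.
$viaPrefix = '#0c0f' . bin2hex($payload);

echo $viaBytes, "\n";
var_dump($viaBytes === $viaPrefix);

// Decoding: drop the '#', convert hex to bytes, drop the two control bytes.
$original = substr(hex2bin(substr($viaBytes, 1)), 2);
echo $original, "\n";
```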

Remove hidden midpoint character from json string

When I send an API request, I get a JSON string in response that seems to include a hidden character, a midpoint [·]. In my Atom editor the character is not visible, but deleting at that position produces no visible change, which indicates that the hidden character was what got removed.
The consequence is that converting the JSON string to a PHP array with json_decode() returns NULL.
Question:
What is the most straightforward way to remove the hidden character?
Should I search for the character and simply cut that character out of the string?
I understand that potentially the best would be to find the root-cause of why the midpoint got there, but I cannot find the root-cause.
Investigation and outcomes:
Comparing [$body1] and [$body2] in https://www.diffchecker.com/ shows:
[$body1] ·'{"columns":"test"}'
[$body2] '{"columns":"test"}'
This test shows that I do in fact have a hidden character.
It might not work in your environment to test since the hidden character probably is removed by copy/paste.
$body1 = '{"columns":"test"}'; // Hidden character.
$body2 = '{"columns":"test"}'; // Removed hidden character.
$body3 = '{"columns":"test"}'; // Same as body2.
var_dump(json_decode($body2, true));
if ($body1 == $body2) {
    echo 'Content the same';
} else {
    echo 'Content differs';
}
Result:
Content differs
Checking string length of the body strings.
echo strlen($body1) . "\n";
echo strlen($body2) . "\n";
echo strlen($body3) . "\n";
Result:
21
18
18
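The three-byte difference in strlen() (21 versus 18) is consistent with a UTF-8 byte order mark (bytes EF BB BF) at the start of the response, although another three-byte character such as a zero-width space would give the same count. Below is a minimal sketch for inspecting the leading bytes and stripping them. It assumes the hidden character is a BOM, and $body1 is rebuilt here with explicit bytes because the real one does not survive copy/paste.

```php
<?php
// Assumption: the response starts with a UTF-8 BOM (3 bytes).
$body1 = "\xEF\xBB\xBF" . '{"columns":"test"}';

// Dump the first bytes as hex to see what the hidden character really is.
$leadingHex = bin2hex(substr($body1, 0, 3));
echo $leadingHex, "\n";

// Strip the BOM only when it is actually present at the start.
$cleaned = $body1;
if (strncmp($body1, "\xEF\xBB\xBF", 3) === 0) {
    $cleaned = substr($body1, 3);
}

var_dump(json_decode($body1, true));   // decoding fails with the BOM in place
var_dump(json_decode($cleaned, true)); // decodes once the BOM is gone
```

If bin2hex() shows different bytes on the real response, the same check can be adapted to whatever sequence it reports.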

PHP UTF-8 mb_convert_encode and Internet-Explorer

I have been reading about character encoding for a few days, and I want to serve all my pages as UTF-8 for compatibility. But I get stuck when I try to convert user input to UTF-8: this works in all browsers except Internet Explorer (as always).
I don't know what's wrong with my code; it seems fine to me.
I set the header with the character encoding.
I saved the file as UTF-8 (no BOM).
This happens only if you access the page in Internet Explorer with a $_GET parameter: myscript.php?c=äüöß
When I write special characters directly in my page, they are displayed correctly.
This is my Code:
// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');
$_GET = userToUtf8($_GET);
function userToUtf8($string) {
    if (is_array($string)) {
        $tmp = array();
        foreach ($string as $key => $value) {
            $tmp[$key] = userToUtf8($value);
        }
        return $tmp;
    }
    return userDataUtf8($string);
}
function userDataUtf8($string) {
    print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
    $string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
    print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
    $string = preg_replace('/[\xF0-\xF7].../s', '', $string);
    print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII
    return $string;
}
echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"
The most confusing part is that it tells me the string was converted from UTF-8 to ASCII. Can someone tell me why the special characters are not shown correctly and what is wrong here? Or is this a bug in Internet Explorer?
Edit:
If I disable the conversion, it says everything is UTF-8, but the characters still are not shown; they are displayed like "????".
Note: This happens ONLY in Internet Explorer!
Although I prefer using URL-encoded strings in the address bar, in your case you can try encoding $_GET['c'] to UTF-8, e.g.:
$_GET['c'] = utf8_encode($_GET['c']);
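Note that utf8_encode() always treats its input as ISO-8859-1 and is deprecated as of PHP 8.2, so applying it to a value that is already UTF-8 would garble it. A minimal sketch of a more defensive variant follows; it assumes the browser sent the un-encoded query string as ISO-8859-1 bytes, which is a guess about the failing case, and the variable names are illustrative.

```php
<?php
// Hypothetical raw value: "äüöß" as ISO-8859-1 bytes (an assumption).
$raw = "\xE4\xFC\xF6\xDF";

// Convert only when the bytes are not already valid UTF-8.
$utf8 = mb_check_encoding($raw, 'UTF-8')
    ? $raw
    : mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Show the resulting bytes; each character now takes two bytes.
echo bin2hex($utf8), "\n";
```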
An approach to display the characters using IE 11.0.18 which worked:
Retrieve the Unicode code point of your character: for example, 'ü' is 'U+00FC'.
According to this post, convert it to an HTML entity and decode that to UTF-8.
Decode it using utf8_decode before dumping.
The line of code illustrating the example with the 'ü' character is:
var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));
To summarize: For displaying purposes, go from Unicode to UTF8 then decode it before displaying it.
Other resources:
a post to retrieve characters' unicode

utf (chinese char) convert to Hexadecimal format in php

I am passing my message to an SMS API.
This is the documentation:
Normally Unicode Messages are Arabic and Chinese Message, which are
defined by GSM Standards. Unicode messages are nothing but normal text
type messages but it has to be submitted in HEX form. To submit
Unicode messages following Url to be used.
I tried bin2hex(), but it does not produce the expected output.
$str = '人';
//$str = 'a';
$output = bin2hex($str);
echo $output;
//output
//人 = e4baba ; I would expect '4EBA'
I found a similar solution, but it is in VB.NET. Can anyone convert it?
http://www.supportchain.com/index.php?/Knowledgebase/Article/View/28/7/unable-to-send-sms-with-chinese-character-using-api
This is the sample I tried, and it works:
Example of conversion: a converted to hexadecimal is 0061, 人 converted to hexadecimal is 4EBA.
The issue you are facing has to do with encoding. bin2hex() returns the hex of the string's raw bytes, and for a UTF-8 string those are the UTF-8 bytes (e4baba for 人), not the code point. Convert the string to a two-byte Unicode encoding first, then convert it to hex.
Each of these outputs exactly what you were looking for when I run them:
echo bin2hex(iconv('UTF-8', 'ISO-10646-UCS-2', '人')) . PHP_EOL;
//Outputs 4eba
echo bin2hex(iconv('UTF-8', 'UNICODE-1-1', '人')) . PHP_EOL;
//Outputs 4eba
echo bin2hex(iconv('UTF-8', 'UTF-16BE', '人')) . PHP_EOL;
//Outputs 4eba
Pick whichever one you fancy.
If you want to convert back:
echo iconv('UTF-16BE', 'UTF-8', hex2bin('4eba')) . PHP_EOL;
//outputs 人
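The same idea can be wrapped in a small helper. This is only a sketch: it assumes the SMS API wants big-endian UTF-16 code units as uppercase hex, which is what the two examples in the question (0061 and 4EBA) suggest, and it uses mb_convert_encoding() from the mbstring extension instead of iconv(). The function name toUtf16Hex is hypothetical.

```php
<?php
// Hypothetical helper: UTF-8 text to uppercase hex of its UTF-16BE bytes.
function toUtf16Hex(string $text): string
{
    $utf16 = mb_convert_encoding($text, 'UTF-16BE', 'UTF-8');
    return strtoupper(bin2hex($utf16));
}

// "\xE4\xBA\xBA" is the UTF-8 byte sequence for 人.
echo toUtf16Hex('a'), "\n";
echo toUtf16Hex("\xE4\xBA\xBA"), "\n";
```

Whether the API expects uppercase or lowercase hex is not stated in the quoted documentation, so the strtoupper() call may be unnecessary.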

printing a php variable as it is : with all the special characters

OK, I need to find out what is contained inside a PHP variable, and I have to do it visually. Is there a function that displays whatever is contained in a string exactly as it is?
For example :
$TEST = '&nbsp' . "\n" . '&nbsp';
If I use echo, the rendered output is just blank space, while I want it to be:
&nbsp\n&nbsp
is it possible? (I hope I was clear enough)
ty
You can use json_encode with htmlspecialchars:
$TEST = '&nbsp' . "\n" . '&nbsp';
echo json_encode(htmlspecialchars($TEST));
Note that json_encode has a third argument ($depth) as of PHP 5.5.
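A small sketch of what this combination produces, assuming the variable holds '&nbsp', a newline, and '&nbsp' as in the desired output from the question. The variable name $visible is illustrative.

```php
<?php
$TEST = '&nbsp' . "\n" . '&nbsp';

// htmlspecialchars() turns & into &amp; so the browser shows the entity text;
// json_encode() then turns the real newline into the two characters \n.
$visible = json_encode(htmlspecialchars($TEST));

// The raw string sent to the browser, including the surrounding quotes.
echo $visible, "\n";
```

Viewed in a browser, the &amp; renders as &, so the page shows the text with a visible \n between the two entities. The surrounding double quotes come from json_encode() and can be removed with trim($visible, '"') if they are unwanted.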
var_dump() should do the job for you.
Example:
echo "<pre>";
var_dump($variable);
echo "</pre>";
Use <pre> to preserve the formatting; it makes the output a lot easier to read.
Resources:
http://php.net/manual/en/function.var-dump.php
http://www.w3schools.com/tags/tag_pre.asp
Try print_r, var_dump or var_export functions, you'll find them very handy for this kind of needs!
http://www.php.net/manual/en/function.htmlspecialchars.php
or
http://www.php.net/manual/en/function.htmlentities.php
$TEST = '&nbsp' . "\n" . '&nbsp';
// "\n" must be double-quoted to match a real newline; '\\n' is a literal backslash followed by n.
echo htmlspecialchars(str_replace("\n", '\\n', $TEST), ENT_QUOTES);
or
$TEST = '&nbsp' . "\n" . '&nbsp';
echo htmlentities(str_replace("\n", '\\n', $TEST), ENT_QUOTES);
You may have to encode the newlines manually. If you want to render them as actual line breaks, you can use nl2br. Otherwise, string-replace these characters with a representation of your choice. Update: I have added this to the code, as requested. String-replace whichever special characters you wish to see, such as newlines and tabs.
Assuming you want it for debugging purposes, let me suggest urlencode(). I use it to make sure I don't miss any invisible characters.
The output is not that readable, but it works for me.
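A quick sketch of what urlencode() makes visible. The sample string is made up for illustration: an ampersand, a newline, a tab, and a trailing space.

```php
<?php
// Hypothetical sample with several characters that are easy to miss.
$value = '&nbsp' . "\n" . "\t" . ' ';

// urlencode() shows control characters as %XX and a space as +.
$encoded = urlencode($value);
echo $encoded, "\n";

// rawurlencode() is similar but shows a space as %20.
$rawEncoded = rawurlencode($value);
echo $rawEncoded, "\n";
```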
