Unescaping a string in PHP that was escaped in AS3

I have a problem. I have UTF-8 strings that were escaped in AS3 using the escape() function, and now I want to unescape them in PHP. With rawurldecode() or urldecode(), only common characters such as ./+[] are unescaped; special Latin characters (in my case ĄČĘĖĮŠŲŪŽ) are left encoded. How do I decode these strings correctly in PHP?
EDIT
This is also applicable to JavaScript.
Thanks!

You shouldn't have to escape your strings when you send them to PHP. Flash will do that for you.
So, if they are already escaped and you are able to change that, just unescape them before sending them with URLLoader.
You should then have clean values on the PHP side.

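OK, I got a solution from my colleague, and it solves this issue:
// ENT_NOQUOTES (value 0) replaces the original null flag, which PHP 8.1+ reports as deprecated
html_entity_decode(preg_replace("/%u([0-9a-f]{3,4})/i", "&#x\\1;", urldecode($str)), ENT_NOQUOTES, 'UTF-8');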
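For anyone who prefers a single pass without the detour through HTML entities, here is a minimal sketch. The helper name unescape_legacy is hypothetical, and the sketch assumes the mbstring extension is loaded and that the input really came from escape(), meaning %uXXXX holds UTF-16 code units and %XX holds code points 0 to 255.

```php
<?php
// Hypothetical helper: decode output of AS3/JavaScript escape() into UTF-8.
// Assumes the mbstring extension is available.
function unescape_legacy(string $str): string
{
    $decoded = preg_replace_callback(
        // Either a run of %uXXXX units (so surrogate pairs stay together)
        // or a single %XX byte-sized code point.
        '/(?:%u[0-9a-f]{4})+|%[0-9a-f]{2}/i',
        static function (array $m): string {
            $token = $m[0];
            if ($token[1] === 'u' || $token[1] === 'U') {
                // Strip the %u markers and treat the hex digits as UTF-16BE bytes.
                $utf16 = hex2bin(str_ireplace('%u', '', $token));
                return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
            }
            // %XX is a code point in the range 0..255, not a raw UTF-8 byte.
            return mb_chr(hexdec(substr($token, 1)), 'UTF-8');
        },
        $str
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $decoded ?? $str;
}

echo unescape_legacy('%u0160%u017Dak%2B'), "\n"; // ŠŽak+
```

Unlike urldecode(), this sketch leaves a literal + alone, which matters because JavaScript's escape() does not use + to mean a space.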

Related

PHP and Unicode or UTF-8?

My PHP application outputs JSON where special characters are encoded; for example, the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I get back from "Br\u00f8ndum" to "Brøndum"?
I have tried utf8_encode/utf8_decode, but they don't work as expected.
Thanks!
That's standard JSON unicode escaping.
You get back to the actual character by using a JSON parser. json_decode in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
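A short round trip may make this concrete. This is only a sketch; the variable names are made up for illustration.

```php
<?php
// The JSON text as it arrives: the \u00f8 escape is part of the JSON, not of PHP
// (single quotes keep the backslash literal).
$json = '"Br\u00f8ndum"';

// A JSON parser turns the escape back into the real character.
$decoded = json_decode($json);
echo $decoded, "\n";                                      // Brøndum

// Default encoding escapes non-ASCII characters again...
echo json_encode($decoded), "\n";                         // "Br\u00f8ndum"

// ...unless JSON_UNESCAPED_UNICODE is passed.
echo json_encode($decoded, JSON_UNESCAPED_UNICODE), "\n"; // "Brøndum"
```

Both encoded forms are valid JSON for the same string, so a conforming client decodes them identically.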
mb_detect_encoding is your function. You just pass it the string and it detects the encoding. You can also pass it a list of candidate encodings (a regular string like "hello" could be valid in several different encodings).
echo mb_detect_encoding("Br\u00f8ndum");

decoding ISO characters

I have Chinese characters stored as entities in ISO-8859-1, for example &#20860; = 兼
Those characters are taken from the database using AJAX and sent as JSON using json_encode.
I then use the template Handlebars to set the data on the page.
When I look at the ajax page the characters are displayed correctly, the source is still encoded.
But the final result displays the encoded characters.
I tried to decode on the javascript part with unescape but there is no foreach with the template that gives me the possibility to decode the specific variable, so it crashes.
I tried to decode on the PHP side with htmlspecialchars_decode but without success.
Both pages are encoded in ISO-8859-1. I can change them to UTF-8 if necessary, but the data in the database will remain encoded in ISO-8859-1.
Thank you for your help.
You're simply representing your characters in HTML entities. If you want them as "actual characters", you'll need to use an encoding that can represent those characters, ISO-8859 won't do. htmlspecialchars_decode doesn't work because it only decodes a handful of characters that are special in HTML and leaves other characters alone. You'll need html_entity_decode to decode all entities, and you'll need to provide it with a character set to decode to which can handle Chinese characters, UTF-8 being the obvious best choice:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
You'll then need to make sure the browser knows that you're sending it UTF-8. If you want to store the text in the database in UTF-8 as well (which you really should), best follow the guide How to handle UTF-8 in a web app which explains all the pitfalls.
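To illustrate the difference between the two functions, here is a small sketch. The sample string and variable names are assumptions made up for the example, not data from the question.

```php
<?php
// Hypothetical value as it might come out of the database: a numeric
// entity for U+517C plus an ordinary &amp;.
$stored = '&#20860; &amp; co';

// Only touches the few entities that are special in HTML (&amp; &lt; &gt; and quotes).
$partial = htmlspecialchars_decode($stored);
echo $partial, "\n"; // &#20860; & co

// Decodes every entity; the target charset must be able to hold the result.
$full = html_entity_decode($stored, ENT_COMPAT, 'UTF-8');
echo $full, "\n";    // 兼 & co

// The decoded UTF-8 string can then be sent as JSON without entities.
echo json_encode(['name' => $full], JSON_UNESCAPED_UNICODE), "\n";
```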
Are you including your text with the "double-stache" Handlebars syntax?
{{your expression}}
As the Handlebars documentation mentions, that syntax HTML-escapes its output, which would cause the results you're describing, where you see the entity &#20860; instead of 兼.
Using three braces instead ("triple-stache") won't escape the output and will let the browser correctly interpret those numeric entities:
{{{your expression}}}

Does sometime fputs() or fwrite() encode html special characters?

I am writing a string of HTML content to an HTML file, but in the file the HTML special characters come out escaped (for example, " becomes \"). I even called htmlspecialchars_decode before using the write functions. The weird part is that on my computer the characters are not escaped, while on the server I uploaded to they are. How can I deal with this problem?
Thanks in advance!
You are probably suffering from Magic Quotes
Check your phpinfo();
To clear Magic Quotes look into the discussion at php.net:
http://www.php.net/manual/en/function.stripslashes.php
Example (c) jeremysawesome:
// create_function() was removed in PHP 8.0; an anonymous function does the same job
array_walk_recursive($_POST, function (&$val) { $val = stripslashes($val); });

Weird Word Character breaks AJAX

I seem to be having a problem whenever I try to send something by AJAX that has the Word '-' (hyphen) character in it. It seems to turn the whole string into 'null' in PHP when I convert to JSON.
Has anyone else seen/solved this?
The "Word hyphen" you're talking about is probably an em-dash. This is not a standard ASCII character, which means that your issue is likely to be around character encoding.
Either encode all the extended characters in your string as HTML entities using the PHP htmlentities() function, or else ensure that all your content is served as UTF-8.
What are you using? json_decode? Try seeing what you get out of json_last_error
http://www.php.net/manual/en/function.json-last-error.php
The json_decode example in the manual has a dash in it, so that's probably not the issue.
http://php.net/manual/en/function.json-decode.php
Check the section on there that says 'common errors'.
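A minimal sketch of how this failure can show up and how json_last_error helps, assuming the dash arrives as the single Windows-1252 byte 0x97 and that the mbstring extension is loaded. The variable names are hypothetical.

```php
<?php
// Assumed input: "a", a Windows-1252 em dash (byte 0x97), then "b".
// That lone byte is not valid UTF-8.
$fromWord = "a\x97b";

// json_encode() refuses invalid UTF-8 and reports why.
$failed = json_encode($fromWord);
var_dump($failed);                                   // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);     // bool(true)

// Converting to UTF-8 first makes the string encodable.
$utf8 = mb_convert_encoding($fromWord, 'UTF-8', 'Windows-1252');
$ok = json_encode($utf8);
echo $ok, "\n";                                      // "a\u2014b"
```

If the source encoding is not actually Windows-1252, the conversion step needs the real one; guessing it wrong produces the wrong characters rather than an error.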

getting json_encode to not escape html entities

I send json_encoded data from my PHP server to an iPhone app. Strings containing HTML entities, like '&', are escaped by json_encode and sent as &amp;.
I am looking to do one of two things:
make json_encode not escape html entities. Doc says 'normal' mode shouldn't escape it but it doesn't work for me. Any ideas?
make the iPhone app un-escape html entities cheaply. The only way I can think of doing it now involves spinning up a XML/HTML parser which is very expensive. Any cheaper suggestions?
Thanks!
Neither PHP 5.3 nor PHP 5.2 touches the HTML entities.
You can test this with the following code:
<?php
header("Content-type: text/plain"); //makes sure entities are not interpreted
$s = 'A string with & &#x6F8 entities';
echo json_encode($s);
You'll see the only thing PHP does is to add double quotes around the string.
json_encode does not do that. You have another component that is doing the HTML encoding.
If you use the JSON_HEX_ options, you can keep any < or & characters from appearing in the output (they get converted to \u003C or similar JS string literal escapes), thus possibly avoiding the problem:
json_encode($s, JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_QUOT)
though this would depend on knowing exactly which characters are being HTML-encoded further downstream. Maybe non-ASCII characters too?
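A quick comparison of the output with and without the flags, as a sketch; the sample string is made up for illustration.

```php
<?php
$s = 'Fish & <chips>';

// Default: json_encode leaves & and < > untouched.
$plain = json_encode($s);
echo $plain, "\n"; // "Fish & <chips>"

// With the JSON_HEX_ flags the same characters become \uXXXX escapes,
// so a later HTML-encoding step finds nothing to rewrite.
$hex = json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_QUOT);
echo $hex, "\n";   // "Fish \u0026 \u003Cchips\u003E"

// Any JSON parser restores the original string from either form.
var_dump(json_decode($hex) === $s); // bool(true)
```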
Based on the manual it appears that json_encode shouldn't be escaping your entities, unless you explicitly tell it to, in PHP 5.3. Are you perhaps running an older version of PHP?
Going off of Artefacto's answer, I would recommend using this header, it's specifically designed for JSON data instead of just using plain text.
<?php
header('Content-Type: application/json'); //Also makes sure entities are not interpreted
$s = 'A string with & &#x6F8 entities';
echo json_encode($s);
For more specific reasons to use this content type, see the post "What is the correct JSON content type?"
