I'm using CKEditor and typing in some text with special characters: "Bâtisseurs passionnés" (note the French special characters). I then use JavaScript's escape() to get the input and send it via AJAX/JSON to the PHP server script.
On the PHP side, the log output before and after urldecode() looks like the following. It appears to convert the tag parts, but the special characters show up only as '?' and are stored that way in the database. Is there another call I should be using? Or does urldecode() not handle special characters?
$json = json_decode($data);
error_log("URLDecode: before: " . $data);
error_log("URLDecode: after: " . urldecode($data));
and the output looks like
URLDecode: before: %3Cp%3E%0A%09B%E2tisseurs%20passionn%E9s%3C/p%3E%0A
URLDecode: after: <p>
B?tisseurs passionn?s</p>
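JavaScript's escape() isn't a matching pair for PHP's urldecode(). escape() encodes â as the single Latin-1 byte %E2, while urldecode() only turns percent sequences back into raw bytes, so PHP ends up with bytes that are not valid UTF-8.
Use encodeURIComponent() in JavaScript instead; it percent-encodes the UTF-8 bytes (â becomes %C3%A2).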
escape and unescape Functions
The escape and unescape functions do not work properly for non-ASCII characters and have been deprecated. In JavaScript 1.5 and later, use encodeURI, decodeURI, encodeURIComponent, and decodeURIComponent.
MDC:functions
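To make the difference concrete, here is a minimal PHP sketch that decodes both forms. The first string is the logged value from the question; the second is what I would expect encodeURIComponent() to send for the same text (an assumption, since it was not logged in the question).

```php
<?php
// Logged in the question: output of escape(), one Latin-1 byte per character.
$fromEscape = '%3Cp%3E%0A%09B%E2tisseurs%20passionn%E9s%3C/p%3E%0A';
// Assumed output of encodeURIComponent() for the same text: UTF-8 bytes.
$fromEncodeUriComponent = '%3Cp%3E%0A%09B%C3%A2tisseurs%20passionn%C3%A9s%3C%2Fp%3E%0A';

$a = urldecode($fromEscape);
$b = urldecode($fromEncodeUriComponent);

// An empty pattern with the /u modifier only matches valid UTF-8 subjects.
var_dump(preg_match('//u', $a) === 1); // bool(false)
var_dump(preg_match('//u', $b) === 1); // bool(true)
```

Only the second string survives urldecode() as valid UTF-8, which is what a UTF-8 database column expects.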
Related
OK, so I have some JSON that I decode and then print out. Before decoding the JSON, I use stripslashes() to remove extra slashes. The JSON contains website links, such as https://www.w3schools.com/php/default.asp, and descriptions like Hello World, I have \u00249999999 dollars
When I print out the JSON, I would like it to print out
Hello World, I have $9999999 dollars, but it prints out Hello World, I have u00249999999 dollars.
I assume that the u0024 is not getting parsed because it has no backslash. The forward slashes in the website links aren't removed by stripslashes(), which is good, but I think the backslashes for the Unicode symbols are.
How do I get PHP to automatically detect and parse the Unicode dollar sign? I would also like to apply this rule to every Unicode symbol.
Thanks In Advance!
According to the PHP documentation on stripslashes(), it
un-quotes a quoted string.
This means that it removes the backslashes that are used for escaping characters (or Unicode sequences). Once they are removed, you cannot be sure that a sequence such as "u0024" was meant to be a Unicode escape; your user could simply have typed it.
Besides that, you will run into trouble when using stripslashes() on a JSON value that contains escaped quotes. Consider this example:
{
"key": "\"value\""
}
This will become invalid when using stripslashes() because it will then look like this:
{
"key": ""value""
}
This is not parseable because it is no longer valid JSON. If you don't use stripslashes(), the JSON parser converts all escape sequences, including Unicode sequences such as \u0024, when it decodes the data.
Conclusion: I'd suggest not using stripslashes() when dealing with JSON, as it may break things (as seen in the previous example, but also in your problem).
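A minimal sketch of both cases, assuming the incoming JSON looks roughly like the sample below (the key names "text" and "quoted" are made up for illustration):

```php
<?php
// Single quotes keep the backslashes literal, as they would be in raw input.
$raw = '{"text": "Hello World, I have \u00249999999 dollars", "quoted": "\"value\""}';

// Decoding the untouched JSON converts \u0024 into a dollar sign.
$ok = json_decode($raw, true);
echo $ok['text'], "\n"; // Hello World, I have $9999999 dollars

// After stripslashes() the escaped quotes are bare, so the JSON is invalid.
$broken = json_decode(stripslashes($raw), true);
var_dump($broken); // NULL
```

If json_decode() returns NULL like this, json_last_error_msg() reports why parsing failed.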
Your assumption is correct: u0024 is not getting parsed because it has no backslash. You can use a regex to add the backslash back after the conversion.
It looks like you have UTF-8 encoded strings internally and PHP outputs them properly, but your browser fails to auto-detect the encoding (it settles on ISO 8859-1 or some other encoding).
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header:
header("content-type: text/html; charset=UTF-8");
Then you can leave the rest of your code as-is and don't have to HTML-encode entities or add other workarounds.
If you want, you can additionally declare the encoding in the generated HTML by using the <meta> tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
for HTML <= 4.01
<meta charset="UTF-8">
for HTML5
The HTTP header takes priority over the <meta> tag, but the latter may be useful if the HTML is saved to disk and then read locally.
The main question you have to answer is: why do you need to strip slashes?
And, if it really is necessary to strip slashes, how do you manage the encoding? It is probably a good idea to convert Unicode symbols before stripping slashes, not after, using html_entity_decode().
Anyway, you can try to fix the problem with this workaround:
$string = "Hello World, I have u00249999999 dollars";
$string = preg_replace( "/u([0-9A-F]{4})/i", "&#x$1;", $string ); // turn "u" + exactly 4 hex digits into a numeric entity
$string = html_entity_decode( $string, ENT_COMPAT, 'UTF-8' ); // convert to utf-8
My PHP application outputs JSON where special characters are encoded, e.g. the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I get back from "Br\u00f8ndum" to "Brøndum"?
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON Unicode escaping.
You get back to the actual character by using a JSON parser: json_decode() in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
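A short round trip showing both options. The "\u{F8}" code point escape needs PHP 7 or later and is used here only so the example does not depend on the encoding of the source file:

```php
<?php
$name = "Br\u{F8}ndum"; // "Brøndum" as a UTF-8 string

$escaped   = json_encode($name);                         // default: non-ASCII is escaped
$unescaped = json_encode($name, JSON_UNESCAPED_UNICODE); // keeps the UTF-8 character
$roundTrip = json_decode($escaped);                      // parser restores the character

echo $escaped, "\n";   // "Br\u00f8ndum"
echo $unescaped, "\n"; // "Brøndum"
echo $roundTrip, "\n"; // Brøndum
```

Both encoded forms are valid JSON and decode to the same string, so the flag only changes how the output looks on the wire.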
mb_detect_encoding is your function. You just pass it the string and it detects the encoding. You can also pass it a list of candidate encodings, since a plain string like "hello" could be valid in several encodings.
echo mb_detect_encoding("Br\u00f8ndum");
I have a problem. I have UTF-8 strings that were escaped in AS3 using the escape() function. Now I want to unescape them in PHP. The problem is that if I use rawurldecode() or urldecode(), only common characters like ./+[] get unescaped, but not special Latin characters (in my case ĄČĘĖĮŠŲŪŽ); they are left encoded. So how do I correctly decode the strings in PHP?
EDIT
This is also applicable to JavaScript.
Thanks!
You shouldn't have to escape your strings when you send them to PHP. Flash will do that for you.
So, if they are already escaped and you can do something about it, just unescape them before sending them using URLLoader.
You should have clean values on the PHP side.
OK, I got a solution from my colleague, and it solves this issue:
html_entity_decode(preg_replace("/%u([0-9a-f]{3,4})/i","&#x\\1;", urldecode($str)), ENT_NOQUOTES, 'UTF-8');
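For anyone wanting to try it, here is the same one-liner applied to a sample value. The input string is my assumption of what escape() produces for "ĄČĘ abc", and the flags argument is spelled out as ENT_NOQUOTES rather than null:

```php
<?php
// Assumed escape() output for "ĄČĘ abc": %uXXXX for non-Latin-1 characters.
$str = '%u0104%u010C%u0118%20abc';

// urldecode() handles %20 but leaves %uXXXX alone, because "u" is not a hex digit.
$partly = urldecode($str);

// Rewrite each %uXXXX as a numeric entity, then let PHP convert it to UTF-8.
$entities = preg_replace('/%u([0-9a-f]{3,4})/i', '&#x\\1;', $partly);
$decoded  = html_entity_decode($entities, ENT_NOQUOTES, 'UTF-8');

echo $decoded, "\n"; // ĄČĘ abc
```

One side effect to keep in mind: html_entity_decode() also decodes any genuine entity that was already in the text, such as &amp;.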
I am writing a string of HTML content to an HTML file, but in the file the HTML special characters are escaped (for example " becomes \" ). I've even used htmlspecialchars_decode() before calling the write functions. The weird part is that on my computer the characters are not escaped, while on some servers they are. How can I deal with this problem?
Thanks in advance!
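You are probably suffering from Magic Quotes.
Check your phpinfo() output. Magic Quotes were removed in PHP 5.4, so this only applies to servers running older versions.
To undo Magic Quotes, look into the discussion at php.net: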
http://www.php.net/manual/en/function.stripslashes.php
Example (c) jeremysawesome:
array_walk_recursive($_POST, function (&$val) { $val = stripslashes($val); }); // closure instead of create_function(), which was removed in PHP 8
I sent some text via the GET method to decode HTML entities ( &#119; = w )
> ?text=&#119;&type=htmldecode&format=text
I got errors in the $text variable, so I tried putting it at the end of the link
?format=text&type=htmldecode&text=&#119;
and I got the same errors.
How can I fix that?
There are two types of encoding pertinent to your problem: HTML escape characters and URL escape characters.
When you have a character in an HTML page, you use the HTML escape characters, e.g.
&#119; = w
But you cannot use those characters in a URL - & and # have special meanings in URLs. So you have to encode again - this time using URL escape characters.
# = %23
& = %26
; = %3B
So your string ('&#119;'), made fit to be put into a URL, would be:
%26%23119%3B
and your entire query string:
?text=%26%23119%3B&type=htmldecode&format=text
PHP's urlencode() does this.
The snippet:
<?php echo urlencode("&#119;"); ?>
outputs
%26%23119%3B
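As a sketch of the full round trip, the query string above can also be built with http_build_query(), which URL-encodes each value for you. parse_str() stands in here for what PHP does when it fills $_GET:

```php
<?php
// Build the query string; every value is URL-encoded automatically.
$query = http_build_query([
    'text'   => '&#119;',
    'type'   => 'htmldecode',
    'format' => 'text',
]);
echo $query, "\n"; // text=%26%23119%3B&type=htmldecode&format=text

// Receiving side: URL-decoding restores the entity intact.
parse_str($query, $params);
echo $params['text'], "\n"; // &#119;

// Only now decode the HTML entity itself.
echo html_entity_decode($params['text'], ENT_QUOTES, 'UTF-8'), "\n"; // w
```

Without the URL encoding, the & ends the text parameter and the # starts the fragment, so the server never sees the entity.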
I think you need to decode it and then re-encode it for the URL using urlencode().