PHP - $_GET - decode utf-8 - php

The documentation on this page http://ru2.php.net/manual/en/function.urldecode.php says that "The superglobals $_GET and $_REQUEST are already decoded".
But on my server this code
var_dump($_GET['str'])
returns
string(21) "&#1092;&#1092;&#1092;"
How can I make PHP decode the strings in $_GET?

You should set the correct Content-Type header on the pages that contain the form:
header('Content-Type: text/html; charset="UTF-8"');
And you should get correct data from $_GET without any decoding operations.

As @deceze states, that string is already URL-decoded. But if you want to transform the HTML entities into readable characters, use html_entity_decode().
$string = '&#1092;&#1092;&#1092;';
echo html_entity_decode($string);
returns
ффф
Example: http://3v4l.org/eqDf3

That is decoded. The value is already decoded from its URL percent encoded form. The original was likely:
%26%231092%3B%26%231092%3B%26%231092%3B
It has now been decoded to:
&#1092;&#1092;&#1092;
The content of the string is escaped HTML. If you're sending escaped HTML, you'll get escaped HTML. If you don't like escaped HTML, don't send escaped HTML. PHP is not going to try every possible encoding format recursively on URL values until nothing more can be decoded.
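The two layers can be seen by undoing them one at a time. This is a minimal sketch, assuming the request carried the percent-encoded value shown above; the variable names are illustrative only.

```php
<?php
// Assumed raw value, as it would appear in the URL.
$raw = '%26%231092%3B%26%231092%3B%26%231092%3B';

// Layer 1: percent-decoding. PHP already does this when it fills $_GET.
$afterUrlDecoding = urldecode($raw);
echo $afterUrlDecoding, "\n";          // &#1092;&#1092;&#1092;
echo strlen($afterUrlDecoding), "\n";  // 21

// Layer 2: HTML entity decoding. This is a separate, explicit step.
$text = html_entity_decode($afterUrlDecoding, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $text, "\n";                      // ффф
```

The length of 21 matches the var_dump output in the question: three entities of seven characters each.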

The number after &# is a decimal Unicode code point. It is not the UTF-8 byte sequence for the character.
According to http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024&number=1024&unicodeinhtml=dec, your character is:
U+0444 ф d1 84 &#1092; ф CYRILLIC SMALL LETTER EF
Here, d1 84 is the UTF-8 representation for it.
As mentioned earlier, html_entity_decode("&#1092;&#1092;&#1092;", ENT_QUOTES, 'UTF-8') should do the trick.
It returns the following string:
'ÐäÐäÐä'
Its hexadecimal representation can be checked like this:
>> bin2hex($s)
'd184d184d184'
It is indeed correct according to the table quoted previously.
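The difference between the code point and its UTF-8 bytes can be checked for a single character. A small sketch, using only built-in functions; the variable name is illustrative.

```php
<?php
// Decode one decimal entity; 1092 decimal is 0x444, i.e. U+0444.
$char = html_entity_decode('&#1092;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo dechex(1092), "\n";    // 444  (the code point, in hex)
echo bin2hex($char), "\n";  // d184 (the UTF-8 bytes)
echo strlen($char), "\n";   // 2    (strlen counts bytes, not characters)
```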

Related

Output PHP string to show escaped characters

In PHP, is it at all possible to output the contents of a string to show any escaped characters that may be contained within the string? I get that the whole point of escaping characters is so that they aren't treated in the usual way. But I would still like to be able to view the raw contents of a string so I can see for myself exactly how characters like \n and \r, etc. are represented. Does PHP have a method for doing this?
Use json_encode() to encode the string as JSON. JSON string literals (which come from JavaScript) use largely the same backslash escapes as PHP's double-quoted strings, because both languages borrowed the notation from C. Control characters such as a newline or a carriage return are therefore printed as \n and \r.
If you use single quotation marks, PHP does not interpret the escape sequence in the first place.
For example, echo 'this\n'; outputs this\n, whereas echo "this\n"; outputs this followed by a newline.
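Both suggestions can be compared side by side. A minimal sketch; the variable names are illustrative.

```php
<?php
// Double quotes: PHP turns \r and \n into real CR and LF bytes.
$s = "this\r\n";
echo json_encode($s), "\n";        // "this\r\n"

// Single quotes: the backslash and the letter n stay as two characters.
$literal = 'this\n';
echo strlen($literal), "\n";       // 6
echo json_encode($literal), "\n";  // "this\\n"
```

json_encode() makes the control characters visible in a string that already contains them, while single quotes only affect how the literal is written in the source code.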

PHP and Unicode or UTF-8?

My PHP application outputs JSON where special characters are encoded; for example, the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, and how I get back from "Br\u00f8ndum" to "Brøndum"?
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON unicode escaping.
You get back to the actual character by using a JSON parser. json_decode in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
mb_detect_encoding is the function for this. You pass it the string and it guesses the encoding. You can also pass it a list of candidate encodings, since a plain string like "hello" is valid in several different encodings.
echo mb_detect_encoding("Br\u00f8ndum");
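The round trip through json_decode and the JSON_UNESCAPED_UNICODE flag can be sketched as follows. The JSON text is written in single quotes so that PHP passes the \u escape through untouched; the variable names are illustrative.

```php
<?php
// JSON text containing a \u escape (single quotes keep the backslash literal).
$json = '"Br\u00f8ndum"';

// A JSON parser turns the escape back into the actual character.
$name = json_decode($json);
echo $name, "\n";                                       // Brøndum

// Default encoding escapes non-ASCII characters again...
echo json_encode($name), "\n";                          // "Br\u00f8ndum"

// ...unless the flag tells PHP to leave them as UTF-8.
echo json_encode($name, JSON_UNESCAPED_UNICODE), "\n";  // "Brøndum"
```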

$_SERVER['QUERY_STRING'] does not print unicode values as it is

http://localhost/fw/api/fw_api.php?rule=unicode&action=create&phrase=යුනිකෝඩ්
I am accessing the above URL. In fw_api.php, when I echo $_SERVER['QUERY_STRING'] it does not give the actual value of my Unicode phrase "යුනිකෝඩ්" as in the URL. Is there any fix for this, or am I doing or expecting something wrong here? Need help.
header ('Content-type: text/html; charset=utf-8');
echo $_GET['phrase'];
echo $_SERVER['QUERY_STRING'];
die;
Actual Result:
යුනිකෝඩ්
rule=unicode&action=create&phrase=%E0%B6%BA%E0%B7%94%E0%B6%B1%E0%B7%92%E0%B6%9A%E0%B7%9D%E0%B6%A9%E0%B7%8A
What I expected
යුනිකෝඩ්
rule=unicode&action=create&phrase=යුනිකෝඩ්
The actual value is actually "%E0%B6%BA%E0%B7%94%E0..."!
URLs must consist of a subset of ASCII; they cannot contain other "Unicode characters". Your browser may be so nice as to let you input arbitrary Unicode characters and actually display them as characters, but behind the scenes the URL value is percent-encoded. You'll have to decode it with rawurldecode.
The query string is automatically being parsed and decoded by PHP and placed in the $_GET array (and $_POST for the request body). But the raw query string you'll have to parse and decode yourself.
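Decoding the raw query string by hand might look like this. A minimal sketch, with the query string from the question hard-coded in place of $_SERVER['QUERY_STRING'] so it runs on its own; $query and $params are illustrative names.

```php
<?php
// Stand-in for $_SERVER['QUERY_STRING'].
$query = 'rule=unicode&action=create&phrase='
       . '%E0%B6%BA%E0%B7%94%E0%B6%B1%E0%B7%92%E0%B6%9A%E0%B7%9D%E0%B6%A9%E0%B7%8A';

// Option 1: percent-decode the whole string, e.g. for display or logging.
$readable = rawurldecode($query);
echo $readable, "\n";

// Option 2: parse it into an array, the same way PHP fills $_GET.
parse_str($query, $params);
echo $params['phrase'], "\n";
```

Option 1 is only suitable for display: once decoded, a literal & or = inside a value can no longer be told apart from a separator, so parsing should happen on the encoded string.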
Encode a value with special characters.
$token = "a{l#3a3s9a";
rawurlencode($token); // The encoded value is "a%7Bl%233a3s9a"
Send the encoded value to the database
Receive the parameter value by URL
$body = file_get_contents("php://input");
if ($body == null && isset($_SERVER['QUERY_STRING'])) {
parse_str($_SERVER['QUERY_STRING'], $this->parameters);
return;
}
parse_str() decodes the parameter values automatically, so there is no need to call rawurldecode().
Use the value obtained by URL ("a{l#3a3s9a")
This encoding would be used to obtain special characters through a URL segment.
GL
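The encode and decode steps of this answer can be tried in isolation. A sketch under the assumption that the token travels as a query parameter named token (a hypothetical name, not from the answer).

```php
<?php
$token = 'a{l#3a3s9a';

// "{" becomes %7B and "#" becomes %23; letters and digits are left alone.
$encoded = rawurlencode($token);
echo $encoded, "\n";          // a%7Bl%233a3s9a

// Hypothetical query string carrying the encoded token.
parse_str('token=' . $encoded, $params);
echo $params['token'], "\n";  // a{l#3a3s9a
```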

write unicode characters into a file in php

I have a JSON array which holds the correct string regardless of language, but when the JSON is encoded and written to a file it does not have the correct values. It contains other, seemingly random English letters instead, e.g. (uuadb). I want to write a string to a file where the string could be in any language. For now I am testing with Tamil. But I found that PHP doesn't support Unicode. Please help me write Unicode characters to a file using PHP.
I tried using the pack function, but how do I use pack for arbitrary languages? Or is there another way of doing this?
My guess is that you're seeing \uXXXX escapes instead of the non-ASCII characters you asked for. By default, json_encode escapes every non-ASCII character this way:
<?php
$arr = array("♫");
$json = json_encode($arr);
echo "$json\n";
# Prints ["\u266b"]
$str = '["♫"]';
$array = json_decode($str);
echo "{$array[0]}\n";
# Prints ♫
?>
If this is what you're getting, it's not wrong. You just have to ensure it's being decoded properly on the receiving end.
Another possibility is that the string you're passing is not in UTF-8. According to the documentation for json_encode and json_decode, these functions only work with UTF-8 data. Call mb_detect_encoding on your input string, and make sure it outputs either UTF-8 or ASCII.
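If the file itself should contain readable characters rather than \uXXXX escapes, the JSON_UNESCAPED_UNICODE flag can be passed when encoding. A minimal sketch; the sample data and the temporary file path are assumptions for illustration.

```php
<?php
// Sample data; the source file is assumed to be saved as UTF-8.
$data = ['greeting' => 'வணக்கம்', 'note' => '♫'];

// Keep non-ASCII characters as UTF-8 instead of \uXXXX escapes.
$json = json_encode($data, JSON_UNESCAPED_UNICODE);
if ($json === false) {
    // json_encode fails on input that is not valid UTF-8.
    throw new RuntimeException(json_last_error_msg());
}

// Hypothetical target file; a temporary file keeps the example self-contained.
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, $json);

// Reading the file back yields the original array.
$roundTrip = json_decode(file_get_contents($path), true);
echo $json, "\n";  // {"greeting":"வணக்கம்","note":"♫"}
unlink($path);
```

Both forms are valid JSON and decode to the same data, so the flag only matters when people or tools read the file directly.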

Translate URLENCODED data into UTF-8 in PHP

I've got a string that is in my database like 中华武魂 when I post my request to retrieve the data via my website I'm getting the data to the server in the format %E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82
What decoding steps do I have to take in order to get it back to a usable form?
While also cleaning the user input to ensure they're not going to try an SQL injection attack?
(escape string before or after encoding?)
EDIT:
rawurldecode(); // returns "ä¸­åŽæ­¦é­‚"
urldecode(); // returns "ä¸­åŽæ­¦é­‚"
public function utf8_urldecode($str) {
$str = preg_replace("/%u([0-9a-f]{3,4})/i","&#x\\1;",urldecode($str));
return html_entity_decode($str,null,'UTF-8');
}
// returns "ä¸­åŽæ­¦é­‚"
... which actually works when I try and use it in an SQL statement.
I think because I was doing an echo and die(); without specifying a header of UTF-8 (thus I guess that was reading to me as latin)
Thanks for the help!
When your data is actually that percent-encoded form, you just have to call rawurldecode:
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);
This suffices as the data already is encoded in UTF-8: 中 (U+4E2D) is encoded with the byte sequence 0xE4B8AD in UTF-8 and that is encoded with %E4%B8%AD when using the percent-encoding.
That your output does not look as expected is probably because it is being interpreted with the wrong character encoding, probably Windows-1252 instead of UTF-8. In Windows-1252, 0xE4 represents ä, 0xB8 represents ¸, 0xAD represents a soft hyphen (usually invisible), 0xE5 represents å, and so on. So make sure to specify the output character encoding properly.
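One way to confirm that the decoded bytes are correct, independent of how a browser or terminal renders them, is to inspect them with bin2hex. A minimal sketch; the header call is skipped on the command line, where it has no effect.

```php
<?php
// Declare the output encoding before any output is sent.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);

echo bin2hex($str), "\n";  // e4b8ade58d8ee6ada6e9ad82
echo strlen($str), "\n";   // 12 (four characters, three bytes each)
echo $str, "\n";           // 中华武魂 when the viewer interprets the bytes as UTF-8
```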
Use PHP's urldecode:
http://php.net/manual/en/function.urldecode.php
You have choices here: urldecode or rawurldecode.
If you encoded your string with urlencode, you must decode it with urldecode because of the way spaces are handled: urlencode converts spaces to +, whereas rawurlencode converts them to %20.
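The difference only shows up for spaces and plus signs, as this short comparison illustrates (the sample string is arbitrary):

```php
<?php
$encodedForm = urlencode('a b+c');
$encodedRaw  = rawurlencode('a b+c');
echo $encodedForm, "\n";                // a+b%2Bc
echo $encodedRaw, "\n";                 // a%20b%2Bc

// urldecode turns "+" back into a space; rawurldecode leaves it alone.
echo urldecode($encodedForm), "\n";     // a b+c
echo rawurldecode($encodedForm), "\n";  // a+b+c
```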
