Having a very bizarre issue with the conversion of a $_GET request into a string.
(PHP 5.2.17)
Here is a small snippet of the problem area of the array from print_r():
_GET (array)
...
[address_country_code] => GB
[address_name] => Super Mario
[notify_version] => 3.7
...
There are two cases the _GET data is used:
Case 1): Saved then used later:
// Script1.php
$data = json_encode($_GET);
# > Save to MySQL Database ($data)
// Script2.php (For Viewing & Testing URL later)
# > Load from Database ($result)
echo http_build_query(json_decode($result, true));
Result of above array snippet: (CORRECT OUTPUT)
address_country_code=GB&address_name=Super+Mario&notify_version=3.7
Case 2): Used in same script as Case 1) just before its saved in Case 1):
// Script1.php
echo http_build_query($_GET);
Results in: (INCORRECT OUTPUT)
address_country_code=GB&address_name=Super+Mario¬ify_version=3.7
How is it possible that a few chars (&not) are output as a ¬ in Case 2, yet Case 1 is fine?
It is driving me insane :(
Instead of http_build_query I have also tried a custom function that builds the URL by calling urlencode() on the key and value inside a foreach loop. This just resulted in the ¬ being changed to %C2%AC in one of my test cases!
Everything is ok with your data. You can verify it if you do:
$query = http_build_query($_GET);
parse_str($query, $data);
print_r($data);
you will get the correct uncorrupted data.
The reason you see the ¬ symbol is how the browser interprets HTML entities. ¬ is written as &not; but the browser will render it even without the semicolon at the end.
You're most likely displaying this data in a web browser and that is interpreting
&not
as a special HTML entity.
Pls see this: https://code.google.com/p/doctype-mirror/wiki/NotCharacterEntity
Try doing
var_dump(http_build_query($_GET))
instead of:
echo http_build_query($_GET)
and view the HTML source to verify the actual string.
So both cases output to a web browser and both convert from an array using http_build_query(), yet only Case 2 showed the problem.
I fixed the problem in Case 2 by wrapping http_build_query() in htmlspecialchars() (Case 1 still uses it as is):
echo htmlspecialchars(http_build_query($_GET));
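As a minimal sketch of why that fix works: escaping the query string for HTML turns each & into &amp;, so the browser no longer sees &not as the start of an entity. The array below just mirrors the snippet from the question, and passing '&' as an explicit separator is an assumption made here to keep the output independent of the arg_separator.output ini setting.

```php
<?php
// Sample data mirroring the print_r() snippet from the question.
$params = array(
    'address_country_code' => 'GB',
    'address_name'         => 'Super Mario',
    'notify_version'       => '3.7',
);

// Raw query string: correct for use as a URL or for storage.
$query = http_build_query($params, '', '&');

// HTML-safe version: only needed when printing the string inside an HTML page.
$html = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

echo $html, "\n";
```

The raw string is what should go into the database or a redirect; the escaped one is only for display.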
Related
I'm retrieving a list of partners from Odoo 9 (on Ubuntu 14.04 ENG) via XML-RPC using PHP and ripcord.
Some names contain one or more diacritics:
Pièr
Frère Pièr
All those names have been entered from a single computer running Windows 8.1 using one version of Chrome.
The strange fact is that I get a list where some diacritics are correct and others have encoding problems, like:
Pi�r
Fr�re Pièr
The same diacritic in the same string is correctly encoded or not.
In subsequent calls the result is always the same.
If I edit the string, then it could change the results, giving
Frère Pi�r
Frère Pièr
Fr�re Pi�r...
I need to output JSON, so I need to encode this in UTF-8, but that is currently impossible since I don't have a clue what encoding the original text is in (and it seems not to have any encoding at all!)
Any idea?
I found out that the incoming array was in charset "Latin1".
I solved it by normalizing the array generated from the XML-RPC output, recursively applying a multibyte conversion function:
// given an XML-RPC output named $arr_output...
function descramble_diacritics(&$entry, $key) {
$entry = mb_convert_encoding($entry, 'UTF-8', 'Latin1');
}
array_walk_recursive($arr_output, 'descramble_diacritics');
header('Access-Control-Allow-Origin: *');
header('Content-Type: application/json');
echo json_encode($arr_output);
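A slightly more defensive variant, as a sketch only: since some names in the question already arrived intact, it may be safer to convert only the strings that are not valid UTF-8 yet, so that correct values are not double-encoded. The function name to_utf8_if_needed and the sample array are made up for illustration, and the source encoding ISO-8859-1 (Latin1) is taken from the finding above.

```php
<?php
// Hypothetical helper: convert a value to UTF-8 only if it is a string
// that is not already valid UTF-8 (requires the mbstring extension).
function to_utf8_if_needed(&$entry, $key)
{
    if (is_string($entry) && !mb_check_encoding($entry, 'UTF-8')) {
        $entry = mb_convert_encoding($entry, 'UTF-8', 'ISO-8859-1');
    }
}

// Made-up sample: one Latin1 string and one string that is already UTF-8.
$arr_output = array(
    array('name' => "Pi\xE8r"),       // Latin1 bytes
    array('name' => "Fr\xC3\xA8re"),  // already UTF-8
);

array_walk_recursive($arr_output, 'to_utf8_if_needed');
echo json_encode($arr_output), "\n";
```

This is a heuristic: a Latin1 string whose bytes happen to form valid UTF-8 would be left untouched.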
I am using PHP 5.2.9-2 with WAMP on a Windows machine.
I am having a problem trying to decode a JSON string that contains a copyright symbol in one of the elements. The function always returns NULL. My first thought was to attempt to escape the character, but the htmlentities() function just returns the same string. I tried to pass the arguments like so:
htmlentities($json, ENT_NOQUOTES, 'utf-8');
But that only returns an empty string. I thought about trying ENT_IGNORE, but it is only available in PHP 5.3.0+. How can I get this JSON string correctly encoded into a JSON object when it has this copyright symbol in it?
I do not have control over the source of the JSON and yes, it is properly formatted. I am getting the information from a 3rd party API and the string has a file size of a little more than 20MB. I use ajax to get the JSON then save it to a file and later read it in to the PHP.
EDIT: Here's a link to the JSON I'm working with.
DROPBOX LINK
The specific line is this
...{"Ranking":1115,"Name":"©lutchGod-","Rank":55,"TotalExp":8571865,"KDR":1.14,"Kill":66459,"HeadShot":11785,"clan":" pG "}...
EDIT2:
To clarify, I am looking to convert this JSON string into a JSON object so that I can use a foreach loop to extract each part and process it. If I end up with a string at the end, I get nowhere. I have been using the decode function like so to get associative arrays:
json_decode($json, true);
EDIT 3:
I've put together a barebones version of the problem. All I do is read in the JSON from a txt file and attempt to run it through the json_decode() function. With the copyright symbol, it fails. Without it, it works fine. Here it is:
***Contents of SOExampleJSON.txt***
{"Ranking":1115,"Name":"©lutchGod-","Rank":55,"TotalExp":8571865,"KDR":1.14,"Kill":66459,"HeadShot":11785,"clan":" pG "}
***PHP Code***
<?php
echo '<pre>';
$rawJson = file_get_contents('SOExampleJSON.txt');
var_dump($rawJson);
$json = json_decode($rawJson);
var_dump($json);
echo '</pre>';
?>
***Output***
string(120) "{"Ranking":1115,"Name":"©lutchGod-","Rank":55,"TotalExp":8571865,"KDR":1.14,"Kill":66459,"HeadShot":11785,"clan":" pG "}"
NULL
***Output when copyright is removed***
string(119) "{"Ranking":1115,"Name":"lutchGod-","Rank":55,"TotalExp":8571865,"KDR":1.14,"Kill":66459,"HeadShot":11785,"clan":" pG "}"
object(stdClass)#1 (8) {
["Ranking"]=>
int(1115)
["Name"]=>
string(9) "lutchGod-"
["Rank"]=>
int(55)
["TotalExp"]=>
int(8571865)
["KDR"]=>
float(1.14)
["Kill"]=>
int(66459)
["HeadShot"]=>
int(11785)
["clan"]=>
string(4) " pG "
}
I need the object as it is above, but I need to preserve the "Name" in such a way that I can compare it later. I don't care what format it is in, as long as it is usable. As far as I know, this is the only name with such a symbol. The symbol isn't even allowed as part of a name, but the developers have obviously goofed somewhere in their checking and now I'm having to find a way around it until they fix it. I reported it 2 months ago and there still hasn't been anything done about it, so I don't expect it to be any time soon.
yuck, PHP 5.2.X...
i think this hack should work
<?php
$json=base64_decode('eyJSYW5raW5nIjoxMTE1LCJOYW1lIjoiwqlsdXRjaEdvZC0iLCJSYW5rIjo1NSwiVG90YWxFeHAiOjg1NzE4NjUsIktEUiI6MS4xNCwiS2lsbCI6NjY0NTksIkhlYWRTaG90IjoxMTc4NSwiY2xhbiI6IiBwRyAifQ==');
assert(strlen($json)>1);
$parsedJson=json_decode($json,true);
assert($parsedJson!=null);
?>
<script type="text/javascript">
var str=atob("<?php echo base64_encode(var_export($parsedJson,true));?>");
alert(str);
</script>
though, when you upgrade to 5.4.0 or newer, this would be a much better approach
<?php
$json=base64_decode('eyJSYW5raW5nIjoxMTE1LCJOYW1lIjoiwqlsdXRjaEdvZC0iLCJSYW5rIjo1NSwiVG90YWxFeHAiOjg1NzE4NjUsIktEUiI6MS4xNCwiS2lsbCI6NjY0NTksIkhlYWRTaG90IjoxMTc4NSwiY2xhbiI6IiBwRyAifQ==');
assert(strlen($json)>1);
$parsedJson=json_decode($json,true,1337);
assert($parsedJson!=null);
echo '<pre>';
ob_start();
var_dump($parsedJson);
echo htmlentities(ob_get_clean(),ENT_SUBSTITUTE);
echo '</pre>';
Also, your problem is probably not the JSON decoding but the HTML encoding. You need PHP 5.3.0 or higher for ENT_IGNORE and PHP 5.4.0 for ENT_SUBSTITUTE, and without at least ENT_IGNORE: "If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set." ( http://php.net/manual/en/function.htmlentities.php ) ..I guess the # counts, but use assert just to be sure about that. Also use error_reporting(E_ALL);, preferably even exception_error_handler (see here http://php.net/manual/en/class.errorexception.php )
This code works for me using PHP 5.6.6 on Windows. To preserve the © I'm using urlencode before json_encode and then I use urldecode after using json_decode to get it back.
<?php
$test = urlencode('{"Ranking":1115,"Name":"©lutchGod-","Rank":55,"TotalExp":8571865,"KDR":1.14,"Kill":66459,"HeadShot":11785,"clan":" pG "}');
$test2= json_encode($test);
var_dump(urldecode(json_decode($test2)));
Edit: update to address latest comment
This time on Ubuntu (PHP 5.5.9-1ubuntu4.6 (cli) (built: Feb 13 2015 19:17:11))
header('Content-Type: text/html; charset=utf-8');
$test = '{"Ranking":1115,"Name":"©lutchGod-","Rank":55,"TotalExp":8571865,"KDR":1.14,"Kill":66459,"HeadShot":11785,"clan":" pG "}';
$test2 = json_decode($test);
print_r($test2);
Output:
stdClass Object
(
[Ranking] => 1115
[Name] => ©lutchGod-
[Rank] => 55
[TotalExp] => 8571865
[KDR] => 1.14
[Kill] => 66459
[HeadShot] => 11785
[clan] => pG
)
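One detail in the question's var_dump() output may explain the NULL: the string is 120 bytes with the © and 119 without it, so the symbol occupies a single byte, whereas © in UTF-8 takes two. That suggests the file is in a single-byte encoding, and json_decode() only accepts UTF-8. Below is a sketch of converting first, assuming the source encoding is ISO-8859-1 (Windows-1252 encodes © as the same byte) and using a shortened sample string instead of the real file.

```php
<?php
// Shortened sample with © as the single byte 0xA9, as in a Latin1 file.
$rawJson = "{\"Ranking\":1115,\"Name\":\"\xA9lutchGod-\",\"Rank\":55}";

// Invalid UTF-8 input makes json_decode() return NULL.
$before = json_decode($rawJson, true);

// Convert only when the input is not already valid UTF-8 (needs mbstring).
if (!mb_check_encoding($rawJson, 'UTF-8')) {
    $rawJson = mb_convert_encoding($rawJson, 'UTF-8', 'ISO-8859-1');
}
$after = json_decode($rawJson, true);

var_dump($before);         // NULL
var_dump($after['Name']);  // the name, now with © as two UTF-8 bytes
```

With the real data, the same check and conversion would go between file_get_contents() and json_decode().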
Searching through my application's uncaught exception logs ( js -> php -> vb6 dll ) i noticed a weird error:
file: /displaywords_GET.php?GreekWord=%E1%ED%E8%F1%F9%F0%EF%EC%DE%ED%E1%F2&selectedRes=1 # <b>Source:</b> mydll<br/><b>Description:</b> Invalid procedure call or argument # Variables:
# Array
(
[GreekWord] => ανθρωπομήνας
[selectedRes] => 1
)
so the exception in the .dll occurs for the given parameters. I tested it myself in the app by entering the specific word and the error did not occur. Then I tested to see by entering the encoded URL directly in the address bar and the error was reproduced. So in order to see if there is something wrong with the encoding, i did in javascript
encodeURIComponent("ανθρωπομήνας")
and the result is :
%CE%B1%CE%BD%CE%B8%CF%81%CF%89%CF%80%CE%BF%CE%BC%CE%AE%CE%BD%CE%B1%CF%82
which is very different from the GET parameter above in the php log. Then i tried to decode the url get parameter as seen in the php file with :
decodeURIComponent("%E1%ED%E8%F1%F9%F0%EF%EC%DE%ED%E1%F2")
and javascript says : malformed URI sequence. Why is this happening ? Obviously the application crashes because the particular URL parameter is malformed, not a proper one.
Now, my problem is, how can I see if the encoded string is a proper one or a corrupted one ? ( Though I'm not sure why php seems to decode it kind of correctly in the logs, when javascript says it's malformed ).
thanks in advance!
%E1%ED... is the URL-encoding of the string as represented in the ISO-8859-7 character set. You will need to convert to the UTF-8 encoding before URL-encoding the bytes, since JavaScript will only work with UTF-8 strings.
$word = 'ανθρωπομήνας';
var_dump(urlencode($word)); // %E1%ED%E8%F1%F9...
$utf8word = iconv('ISO-8859-7', 'UTF-8', $word);
var_dump(urlencode($utf8word)); // %CE%B1%CE%BD...
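To answer the "proper or corrupted" part, a rough sketch: percent-decode the parameter and check whether the resulting bytes are valid UTF-8. The function name is hypothetical, and the check only tells you that the bytes are not UTF-8; it cannot tell you which legacy charset they came from, so ISO-8859-7 is assumed here based on the explanation above.

```php
<?php
// Hypothetical helper: true if the percent-encoded value decodes to valid UTF-8.
function is_utf8_percent_encoded($encoded)
{
    return mb_check_encoding(rawurldecode($encoded), 'UTF-8');
}

$fromLog     = '%E1%ED%E8%F1%F9%F0%EF%EC%DE%ED%E1%F2';
$fromBrowser = '%CE%B1%CE%BD%CE%B8%CF%81%CF%89%CF%80%CE%BF%CE%BC%CE%AE%CE%BD%CE%B1%CF%82';

var_dump(is_utf8_percent_encoded($fromLog));      // bool(false)
var_dump(is_utf8_percent_encoded($fromBrowser));  // bool(true)

// If the value is not UTF-8 and the charset is known, convert it.
$word = rawurldecode($fromLog);
if (!mb_check_encoding($word, 'UTF-8')) {
    $word = iconv('ISO-8859-7', 'UTF-8', $word);
}
echo rawurlencode($word), "\n";
```

After the conversion, the re-encoded value matches what encodeURIComponent() produced in the question.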
I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use mysql or any other standard DB (School project, so structure stays as is).
I built two functions:
function saveDB($db){
$json_db = json_encode($db);
file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
$json_db = file_get_contents("wordsDB.json");
return json_decode($json_db, true);
} // loadDB
When echoing the string I get after encoding or after loading from file, I get valid JSON (tested in a JSON viewer). Whenever I try to decode the string using json_decode(), I get null (tested with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double-quotes with double/single-quotes (Without any backslashes, with one backslash and three backslashes. And any combination I could think of with a different number of backslashes in the original and replaced string), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work. I get no error number (Get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file change null to "null" and it will solve the problem.
Check if your file is UTF8 encoded. json_decode works with UTF8 encoded data only.
EDIT:
After I saw the uploaded JSON data, I did some digging and found that there is a null key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to be valid property name and json_decode will do the job.
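A small sketch for narrowing down failures like this one: treat a NULL result as a failure unless the input really is the JSON literal null, and report the error code when json_last_error() is available (it exists only from PHP 5.3 on). The helper name and the shortened sample input are made up for illustration.

```php
<?php
// Hypothetical helper: returns array(decoded value, error code).
// Error code is 0 on success and -1 when json_last_error() is unavailable.
function decode_with_error($json)
{
    $data = json_decode($json, true);
    if ($data === null && strtolower(trim($json)) !== 'null') {
        $code = function_exists('json_last_error') ? json_last_error() : -1;
        return array(null, $code);
    }
    return array($data, 0);
}

// Shortened samples: an unquoted null used as a property name, then the fix.
$bad  = '{"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":{"1":1}}}';
$good = '{"exceeding":{"S01E01.html":{"2217":1}},"null":{"S01E01.html":{"1":1}}}';

list($badData, $badCode) = decode_with_error($bad);
var_dump($badData === null, $badCode === JSON_ERROR_SYNTAX);

list($goodData, $goodCode) = decode_with_error($good);
var_dump(isset($goodData['null']), $goodCode);
```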
I had a similar problem last week. My JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dumping and using the find function to look for these kinds of characters.
Just on the off chance.. and more for anyone hitting this thread rather than the OP's issue...I missed the following, someone had htmlentities($json) way above me in the call stack. Just ensure you haven't been bitten by the same and check the html source.
Kickself #124
Sorry, I can't log in; ClaimID is having server issues (I'm normally Arthur Gibbs).
Data from my database currently outputs this when there are strange characters...
This is just a example
What I get: De&#8730;ilscrat™
What I want: De√ilscrat™
It seems that some characters are being translated into character codes by the other guy's system.
So what I want to know is:
Is there a function that will expand character codes within a string?
Turning FUNCTION(De&#8730;ilscrat™) >>> De√ilscrat™.
This &#8730; stuff looks like an HTML entity, so let's try de-entitying it...
This can be done using the html_entity_decode function, that's provided by PHP.
For instance, with the string you provided, here's a sample of code :
// So the browser interprets the correct charset
header('Content-type: text/html; charset=UTF-8');
$input = 'De&#8730;ilscrat™';
$output = html_entity_decode($input, ENT_NOQUOTES, 'UTF-8');
var_dump($input, $output);
And the output I'm getting is this one :
string 'De&#8730;ilscrat™' (length=19)
string 'De√ilscrat™' (length=15)
(First one is the original version, and second one is the "decoded" version)
So, it seems to do the trick ;-)