I have a weirdly formatted JSON string that is invalid JSON but executes as valid JavaScript. This means PHP's json_decode will not work.
{
"Devices":{
"Device1":"{ \"Name\"=\>\"AutoTap LDVDS\",\"ID\"=\>\"LDVDSDevice\"}"
}
}
The backslashes are not valid. Is there some way I can escape this string so it can be re-encoded exactly the same as it came in?
Edit: I don't care about parsing the messy string at all. It's preventing me from accessing other data. I was using a simple regex to strip the ugly strings out of the JSON before parsing it. But now I need to re-encode the resulting array back into JSON, and I want to avoid losing this data. The ugly string should remain exactly the same, as it may be important to some other application that uses this data.
The => comes from Ruby object notation, in case you are wondering.
Well, it's those weird escaped > that are killing it: \>
I see no reason why you can't str_replace them out of existence safely with a simple:
<?php
$code='{
"Devices":{
"Device1":"{ \"Name\"=\>\"AutoTap LDVDS\",\"ID\"=\>\"LDVDSDevice\"}"
}
}';
$code=str_replace('\\>','>',$code);
var_export(json_decode($code));
But then, you know the domain of your data.
And you should apply a grain of salt before applying that blindly to all your inputs.
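If the ugly string has to survive a decode/re-encode round trip unchanged, one possible approach is to mask the invalid \> as \\> before decoding and undo the mask after encoding. This is only a sketch: the two helper names are made up for illustration, and it assumes the input never contains a legitimate \\> sequence and that json_encode's default compact output is acceptable.

```php
<?php
// Hypothetical helpers, not part of the original posts.

// Mask the invalid escape \> as \\> (an escaped backslash followed by >),
// which is valid JSON, then decode as usual.
function decodeWithInvalidEscape(string $json)
{
    $masked = str_replace('\\>', '\\\\>', $json);
    return json_decode($masked, true);
}

// Encode, then turn \\> back into \> so the output carries the same
// (invalid) escape the input had.
function encodeWithInvalidEscape($data): string
{
    return str_replace('\\\\>', '\\>', json_encode($data));
}

$code = '{"Devices":{"Device1":"{ \"Name\"=\>\"AutoTap LDVDS\",\"ID\"=\>\"LDVDSDevice\"}"}}';

$data = decodeWithInvalidEscape($code);
$data['Devices']['Device2'] = 'added';   // work with the other data

$out = encodeWithInvalidEscape($data);
echo $out, PHP_EOL;
```

Whitespace and key formatting follow json_encode, so only compact input comes back byte-for-byte identical.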
You could run stripslashes on it, and then pass that string into json_decode.
I'm using a 3rd party API that seems to return its data with the entity codes already in there, such as The Lion&#8217;s Pride.
If I print the string as-is from the API, it renders just fine in the browser (in the example above it would put in an apostrophe). However, I can't trust that the API will always use the entities in the future, so I want to use something like htmlentities or htmlspecialchars myself before I print it. The problem with this is that it will encode the ampersand in the entity code again, and the end result will be The Lion&amp;#8217;s Pride in the HTML source, which doesn't render anything user friendly.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
No one seems to be answering your actual question, so I will
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
It's impossible. What if I'm making an educational post about HTML entities and I want to actually print this on the screen:
The Lion&#8217;s Pride
... it would need to be encoded as...
The Lion&amp;#8217;s Pride
But what if that was the actual string we wanted to print on the screen? ... and so on.
Bottom line is, you have to know what you've been given and work from there – which is where the advice from the other answers comes in – which is still just a workaround.
What if they give you double-encoded strings? What if they start wrapping the html-encoded strings in XML? And then wrap that in JSON? ... And then the JSON is converted to binary strings? the possibilities are endless.
It's not impossible for the API you depend on to suddenly switch the output type, but it's also a pretty big violation of the original contract with your users. To some extent, you have to put some trust in the API to do what it says it's going to do. Unit/Integration tests make up the rest of the trust.
And because you could never write a program that works for any possible change they could make, it's senseless to try to anticipate any change at all.
Decode the string, then re-encode the entities. (Using html_entity_decode())
$string = htmlspecialchars(html_entity_decode($string));
https://eval.in/662095
There is NO WAY to do what you ask for!
You must know what kind of data is the service giving back.
Anything else would be guessing.
Example:
What if the service is giving back &amp; but is not escaping?
You would guess it IS escaping, so you would wrongly interpret it as & while the correct value is &amp;
I think the best solution is first to decode all HTML entities/special chars from the original string, and then HTML-encode the string again.
That way you will end up with a correctly encoded string, no matter if the original string was encoded or not.
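A minimal sketch of that decode-then-encode idea follows. The function name normalizeEntities is hypothetical, and the flags and charset are assumptions chosen for the example. As the other answers point out, it cannot tell whether an entity in the input was meant literally, so it is a workaround rather than a detection mechanism.

```php
<?php
// Hypothetical helper: decode whatever entities are present, then encode once.
function normalizeEntities(string $s): string
{
    $decoded = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo normalizeEntities('Fish & Chips'), PHP_EOL;      // Fish &amp; Chips
echo normalizeEntities('Fish &amp; Chips'), PHP_EOL;  // Fish &amp; Chips

// The numeric entity becomes the real character; htmlspecialchars leaves it alone.
$pride = normalizeEntities('The Lion&#8217;s Pride');
```

Both the raw and the pre-encoded input end up in the same form, but text that was meant to show a literal &amp; on screen is flattened to a plain ampersand.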
You also have the option of using htmlspecialchars_decode();
$string = htmlspecialchars_decode($string);
It's already in htmlentities:
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), false);
Hi&amp;mom
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), true);
Hi&amp;amp;mom
Just use the optional fourth argument (double_encode) to NOT double-encode.
So I have a JSON string encoded by a system, which for a reason I cannot touch.
See below.
[{"item0":"sometext","item1":"sometext too but i have "quoted string" inside of me"}]
So now my problem is that json_decode($json_array_above); gives me NULL, as it cannot parse the quoted string.
I try some preg_replace code, but I am too much of a noob to find out how to replace the quoted string and introduce escape characters so that it looks like \"quoted string\". Seriously, I cannot work out preg_replace for this condition, where you have to find the occurrences of double quotes inside the JSON-encoded string. Please enlighten me.
I have tried other questions available here, but my understanding was not enough.
Also, please note that I cannot touch the one encoding the JSON object, as it is provided by a 3rd party system.
TIA.
EDIT:
Thanks to those who enlightened me...
So this was not possible, and I have to drop this one and try to contact the system developer to correct the JSON-encoded string they provided, as that was the best option.
Which for a reason I cannot touch
Then you're stuffed. It's broken, and you can't reliably fix it.
I try some preg_replace code
You can't. There's no way for you to know whether a " is actually meant to terminate the string, or meant to be a character in the string.
Stop all attempts to fix this at your end. The end sending you the invalid JSON is the problem. If you "can't touch" it, contact and berate someone who can until they fix it. You might take it as an opportunity to teach them that this is why you don't hand-create JSON, or create it with string concatenation, etc. Instead, you build a structure, then use a proper JSON serializer to create the JSON, which will (in this case) put in the necessary escapes (backslashes) so that it looks like this:
[{"item0":"sometext","item1":"sometext too but i have \"quoted string\" inside of me"}]
I have this URL parameter:
KKe%7bZoE_%24g)tjm%40
When I put it into a variable and echo it, the result is:
KKe{ZoE_$g)tjm@
How to avoid that?
Data in $_GET is already URL-decoded. If you require the original string, get it from $_SERVER['QUERY_STRING']. Note that you will have to process the query string yourself though, including breaking down the individual components.
Alternatively, use rawurlencode($_GET[..]) to re-encode the value; which may or may not produce slightly differently encoded values than you originally got.
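Both options can be sketched roughly as follows. The parameter name token and the helper rawQueryValue are assumptions made for the example; in a real request the query string would come from $_SERVER['QUERY_STRING'] rather than a literal.

```php
<?php
// Hypothetical helper: return the still-encoded value of one query parameter.
function rawQueryValue(string $queryString, string $name): ?string
{
    foreach (explode('&', $queryString) as $pair) {
        $parts = explode('=', $pair, 2);
        if (urldecode($parts[0]) === $name) {
            return $parts[1] ?? '';
        }
    }
    return null;
}

// Stand-in for $_SERVER['QUERY_STRING'].
$queryString = 'token=KKe%7bZoE_%24g)tjm%40&page=2';

$raw       = rawQueryValue($queryString, 'token'); // KKe%7bZoE_%24g)tjm%40
$decoded   = urldecode($raw);                      // what $_GET['token'] would hold
$reencoded = rawurlencode($decoded);               // KKe%7BZoE_%24g%29tjm%40

echo $raw, PHP_EOL, $decoded, PHP_EOL, $reencoded, PHP_EOL;
```

The re-encoded value differs from the original in the case of the hex digits and in the encoded closing parenthesis, which is the "slightly different" encoding mentioned above.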
Test it with html_entity_decode - it helped me a lot with my inputs.
If the string is not shown as it is, you have urlencode() or htmlentities() somewhere in your code. Check that, you shouldn't encode html entities before echoing if you want the string to be intact.
I have problems when users input " or \ on a html form
The inputted text will be shown again to the user in html content and html attribute contexts
I have the following data flow:
1. jQuery form line input
2. $_POST
3. escape for HTML attribute: the function escapes " and \ with either HTML entities or hex entities
4. json_encode in PHP
5. some unknown JavaScript interference which blows the fuses
6. json_parse in a jQuery ajax callback
The goal is to show the user the exact same text as they inputted, but to escape properly to avoid xss attacks.
Now first thing I got was that $_POST had slashes added for some reason. So I now use stripslashes first. That solved everything for single quotes, but if the user inputs " or \ it still breaks.
The problem seems to be that JavaScript does some decoding before json_parse gets the data. It turns the hex escapes back into \ and ", thus killing json_parse.
So then I thought that if, between steps 4 and 5, I use htmlspecialchars( $data, ENT_NOQUOTES, 'utf-8' ), I encode the ampersands to &amp;, which should neutralise the JavaScript decoding, but no. It doesn't decode &amp; for some reason, while it does decode &quot; and the hex encodings...
Where am I going wrong?
Is there a way to know exactly what the JavaScript decodes and neutralize it from PHP?
What I'm doing now, after wasting half a day:
I think it's probably some jQuery thing to interfere with the data before the onsuccess handler gets it. I have no time to dig it up and kill it right now, so I'm just sneaking past it with a hack that means 3 string transformations just to keep a string untransformed, but hey, developer time is a rare commodity here.
in php:
// due to a problem with the jQuery callback code which seems to decode html entities and hex entities except for &amp;
// we need to do something to keep our data intact, otherwise parse_json chokes on unescaped backslashes
// and quotes. So we mask the entity by transforming the & into &amp; here and back in js.
// TODO: unit test this to prevent regression
// TODO: debug the jQuery to avoid this workaround
//
// echo json_encode( $response );
echo preg_replace( '/&/u', '&amp;', json_encode( $response ) );
in js before parse_json:
// due to a problem with the jQuery callback code which seems to decode html entities and hex entities except for &amp;
// we need to do something to keep our data intact, otherwise parse_json chokes on unescaped backslashes
// and quotes. So we mask the entity by transforming the & into &amp; in PHP and back here in js.
// See function xxxxxx() in file xxxxx.php for the corresponding transformation
//
responseText = responseText.replace( /&amp;/g, '&' );
I couldn't be bothered at the moment to write the unit tests for it, but I don't seem to be able to break it.
The true question remains how can I knock out the unwanted transformation while getting the same result?
Try turning off "Magic Quotes" in php. That way the data comes in through $_POST just like the user typed it. See: http://www.php.net/manual/en/security.magicquotes.disabling.php
Then you can escape it according to your needs.
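As a rough illustration of "escape it according to your needs", the sketch below takes an unmodified input value, escapes it for an HTML attribute, and sends it through json_encode. The literal $input stands in for a $_POST value and the key name text is made up; this only shows that " and \ survive the PHP side intact, not what the JavaScript side does with the response.

```php
<?php
// Stand-in for an unmodified $_POST value (assumption for the example).
$input = 'He said "hi" and typed a \\ backslash';

// Escape for an HTML attribute context.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// json_encode adds the JSON-level escaping for the backslash.
$json = json_encode(['text' => $escaped]);
echo $json, PHP_EOL;
// {"text":"He said &quot;hi&quot; and typed a \\ backslash"}

// Decoding the JSON and then the entities gives back the user's text.
$back     = json_decode($json, true);
$original = htmlspecialchars_decode($back['text'], ENT_QUOTES);
```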
I had a problem like your problem and used utf8_encode() function. Now it works well. Can you try it ?
I am having a problem passing a json string back to a php script to process.
I have a json string that's been created by using dojo.toJson() that contains a / and looks like this:
[{"id":"2","company":"My Company / Corporation","jobrole":"Consultant","jobtitle":"System Integration Engineer"}]
When I pass the string back to the PHP script, it gets chopped at the / and creates a malformed JSON string, which then means I can't convert it into a PHP array.
What is the best way of escaping the / in this string? I was looking at regular expressions and doing a string.replace() however my regex isn't that strong, and I'm not sure if there are better ways of doing this?
Many thanks
You shouldn't need to do anything special to represent a / in JSON - a string can contain any character except a " or (when not used to start an escape sequence) \.
The problem is possibly therefore in:
the way you parse the JSON server side
the way you parse the HTTP data to get the JSON string
the way you encode the string before making the HTTP request
(I'd bet on it being the last of those options).
I would start by using a tool such as LiveHttpHeaders or Charles Proxy to see exactly what data is sent to the server.
(I'd also expand the question with the code you use to make the request, and the code you use to parse it at the other end).
\/. Take a look here. The documentation is really easy to read, concise and clear. But unescaped / should still be valid in JSON's string so maybe your bug is somewhere else?
Ok. Anyway.
When passing variables to PHP, don't use JSON - it's good for passing variables the other way.
Instead, you'd better use the http://api.dojotoolkit.org/jsdoc/1.3/dojo.objectToQuery method and parse the standard PHP $_GET variables on the PHP side.
EDIT: Ok, I'm 'lost in the woods' here also, but here's a tip - check if you don't have some mod_rewrite rules in action here. Kind of seems like that.
Also, if you can send me the URL which gave you 404 (you can cut out domain part, i'm interested in script filename and all afterwards) maybe I can give you more detailed answer.
To be clear, whether you choose to send JSON to PHP or use regular form values is a matter of preference. It /should/ work either way. It sounds like you aren't URL-encoding the JSON on the client side, so the server side is treating / as a path delimiter. In which case it's borked before json_decode gets to it.
so, try encodeURIComponent( dojo.toJson(stuff) )
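The effect can be simulated entirely in PHP. In this sketch rawurlencode stands in for the client-side encodeURIComponent call (an assumption: the two differ on a few punctuation characters, but both encode the slash), and rawurldecode stands in for what the server does before your script sees the value.

```php
<?php
$json = '[{"id":"2","company":"My Company / Corporation","jobrole":"Consultant","jobtitle":"System Integration Engineer"}]';

// Client side (simulated): percent-encode the JSON before putting it in a URL.
$encoded = rawurlencode($json);
echo $encoded, PHP_EOL;   // the slash is now %2F, so it cannot act as a path delimiter

// Server side (simulated): decode the parameter, then parse the JSON.
$received = rawurldecode($encoded);
$rows = json_decode($received, true);
echo $rows[0]['company'], PHP_EOL;   // My Company / Corporation
```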
json_encode() escapes forward slashes by default, like this:
prompt> json_encode(json_decode('"A/B"'));
string(6) ""A\/B""
JSON_UNESCAPED_SLASHES was added in PHP 5.4 to suppress this behavior.
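A short sketch of the difference, for anyone who wants to see both outputs side by side (the variable names are illustrative):

```php
<?php
$value = 'A/B';

$default   = json_encode($value);                         // "A\/B"
$unescaped = json_encode($value, JSON_UNESCAPED_SLASHES); // "A/B"

echo $default, PHP_EOL, $unescaped, PHP_EOL;

// Both forms are valid JSON and decode to the same string.
var_dump(json_decode($default) === json_decode($unescaped)); // bool(true)
```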