Why does the JSON encoder add escape characters when encoding URLs? - php

I am using json_encode in PHP to encode a URL
$json_string = array ('myUrl'=> 'http://example.com');
echo json_encode ($json_string);
The above code generates the following JSON string:
{"myUrl":"http:\/\/example.com"}
Rather than
{"myUrl":"http://example.com"}
I am just a newbie; which output is correct? Is a JSON parser able to evaluate the second output correctly?

According to https://www.json.org/, the string grammar lists \/ as a valid escape for that character, although escaping it is not strictly necessary in JavaScript.
Also read this related bug report on php.net for a brief discussion.
See section 2.5 of the RFC (RFC 4627):
All Unicode characters may be placed
within the quotation marks except for
the characters that must be escaped:
quotation mark, reverse solidus, and
the control characters (U+0000 through
U+001F).
Any character may be escaped.
So it doesn't sound like it needs to be escaped, but it can be, and the website (and a text diagram in the RFC) illustrates it as being escaped.

My guess is that the writers of that function added that unnecessary encoding through nothing more than plain ignorance. Escaping forward slashes is not required.
A surprisingly large number of programmers I've known are just as bad with keeping their slashes straight as the rest of the world. And an even greater number are really poor with doing encoding and decoding properly.
Update:
After doing some searching, I came across this discussion. It brings up a good point: escaping a / is sometimes necessary for bad HTML parsers. I once came across a problem where IE 6 incorrectly handled content like this:
<script>
var json = { scriptString: "<script> /* JavaScript here */ </script>" };
</script>
IE 6 would see the </script> inside of the string and close out the script tag too early. Thus, this is more IE 6 safe (though the opening script tag in string might also break things... I can't remember):
<script>
var json = { scriptString: "<script> \/* JavaScript here *\/ <\/script>" };
</script>
And they also say that some bad parsers would see the // in http:// and treat the rest of the line like a JavaScript comment.
So it looks like this is yet another case of Internet technologies being hijacked by Browser Fail.
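As a rough illustration of the point above, here is a minimal sketch (assuming PHP 5.4 or later for JSON_UNESCAPED_SLASHES; the $payload array is a made-up example) showing that the default escaping keeps the literal sequence </script> out of the encoded output:

```php
<?php
// Hypothetical value that is going to be embedded in an inline script block.
$payload = array('scriptString' => '<script> /* JavaScript here */ </script>');

// Default behaviour: every / is written as \/.
$default = json_encode($payload);

// With JSON_UNESCAPED_SLASHES the slashes are emitted as-is.
$unescaped = json_encode($payload, JSON_UNESCAPED_SLASHES);

echo $default, PHP_EOL;
echo $unescaped, PHP_EOL;

// The default output never contains the literal closing tag...
var_dump(strpos($default, '</script>') === false);   // bool(true)

// ...while the unescaped output does.
var_dump(strpos($unescaped, '</script>') !== false); // bool(true)
```

So if the JSON ends up inside an inline script block, leaving the default escaping on is the safer choice.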

If you are using PHP 5.4 or later, you can use the json_encode options; see the manual.
Several options were added in PHP 5.3, but JSON_UNESCAPED_SLASHES was added in 5.4.

I think this solves your problem
json_encode ($json_string, JSON_UNESCAPED_SLASHES );
You can see the documentation:
https://www.php.net/manual/en/function.json-encode.php https://www.php.net/manual/en/json.constants.php
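To make the difference concrete, here is a small sketch using the array from the question (assuming PHP 5.4 or later). It also checks that both forms decode to the same value, which is why either output is fine for a JSON parser:

```php
<?php
// The array from the question.
$data = array('myUrl' => 'http://example.com');

// Default behaviour: forward slashes are escaped as \/.
$escaped = json_encode($data);

// PHP 5.4+: JSON_UNESCAPED_SLASHES leaves them alone.
$unescaped = json_encode($data, JSON_UNESCAPED_SLASHES);

echo $escaped, PHP_EOL;   // {"myUrl":"http:\/\/example.com"}
echo $unescaped, PHP_EOL; // {"myUrl":"http://example.com"}

// Both strings decode to the same PHP array.
$same = json_decode($escaped, true) === json_decode($unescaped, true);
var_dump($same); // bool(true)
```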

I see another problem here. The string result {"myUrl":"http://example.com"} should not have the member name myUrl quoted. In JavaScript and JSON, I think all object literal member ids are unquoted strings. So, I would expect the result to be {myUrl:"http://example.com"}.
This seems too big a bug in PHP, so I must be wrong.
Edit, 2/11/11: Yes, I'm wrong. JSON syntax requires even the field names to be in double quotation marks.

Related

PHP Escape a string if it hasn't already been escaped with entities

I'm using a 3rd party API that seems to return its data with the entity codes already in there. Such as The Lion&#8217;s Pride.
If I print the string as-is from the API it renders just fine in the browser (in the example above it would put in an apostrophe). However, I can't trust that the API will always use the entities in the future so I want to use something like htmlentities or htmlspecialchars myself before I print it. The problem with this is that it will encode the ampersand in the entity code again and the end result will be The Lion&amp;#8217;s Pride in the HTML source which doesn't render anything user friendly.
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
No one seems to be answering your actual question, so I will:
How can I use htmlentities or htmlspecialchars only if it hasn't already been used on the string? Is there a built-in way to detect if entities are already present in the string?
It's impossible. What if I'm making an educational post about HTML entities and I want to actually print this on the screen:
The Lion&#8217;s Pride
... it would need to be encoded as...
The Lion&amp;#8217;s Pride
But what if that was the actual string we wanted to print on the screen? ... and so on.
Bottom line is, you have to know what you've been given and work from there – which is where the advice from the other answers comes in – which is still just a workaround.
What if they give you double-encoded strings? What if they start wrapping the html-encoded strings in XML? And then wrap that in JSON? ... And then the JSON is converted to binary strings? the possibilities are endless.
It's not impossible for the API you depend on to suddenly switch the output type, but it's also a pretty big violation of the original contract with your users. To some extent, you have to put some trust in the API to do what it says it's going to do. Unit/Integration tests make up the rest of the trust.
And because you could never write a program that works for any possible change they could make, it's senseless to try to anticipate any change at all.
Decode the string, then re-encode the entities. (Using html_entity_decode())
$string = htmlspecialchars(html_entity_decode($string));
https://eval.in/662095
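A slightly fuller sketch of that decode-then-encode idea follows. The helper name normalize_entities and the sample strings are made up for illustration, and UTF-8 input is assumed:

```php
<?php
// Hypothetical helper: normalise a string that may or may not already contain entities.
function normalize_entities($text)
{
    // Decode any entities that are present, then encode exactly once.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}

$alreadyEncoded = 'Tom &amp; Jerry &lt;b&gt;';
$raw = 'Tom & Jerry <b>';

$fromEncoded = normalize_entities($alreadyEncoded);
$fromRaw = normalize_entities($raw);

echo $fromEncoded, PHP_EOL; // Tom &amp; Jerry &lt;b&gt;
echo $fromRaw, PHP_EOL;     // Tom &amp; Jerry &lt;b&gt;
```

Both inputs end up identical, which is the appeal. The catch is the one raised earlier: unencoded text that is meant to show a literal &amp; gets decoded too, so this still relies on a guess about what the API sends.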
There is NO WAY to do what you ask for!
You must know what kind of data is the service giving back.
Anything else would be guessing.
Example:
what if the service is giving back &amp; but is not escaping?
you would guess it IS escaping, so you would wrongly interpret it as & while the correct value is &amp;
I think the best solution, is first to decode all html entities/special chars from the original string, and then html encode the string again.
That way you will end up with a correctly encoded string, no matter if the original string was encoded or not.
You also have the option of using htmlspecialchars_decode();
$string = htmlspecialchars_decode($string);
It's already in htmlentities:
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), false);
Hi&amp;mom
php > echo htmlentities('Hi&amp;mom', ENT_HTML5, ini_get('default_charset'), true);
Hi&amp;amp&semi;mom
Just use the (optional) 4th argument to NOT double-encode.
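The same fourth argument (double_encode) exists on htmlspecialchars. A minimal sketch with a made-up string that mixes an existing entity with raw markup, assuming UTF-8:

```php
<?php
// Hypothetical input: one entity that is already encoded, plus raw angle brackets.
$mixed = 'Tom &amp; Jerry <b>';

// double_encode = false: existing entities are left alone, everything else is encoded.
$once = htmlspecialchars($mixed, ENT_QUOTES, 'UTF-8', false);

// double_encode = true (the default): the & of the existing entity is encoded again.
$twice = htmlspecialchars($mixed, ENT_QUOTES, 'UTF-8', true);

echo $once, PHP_EOL;  // Tom &amp; Jerry &lt;b&gt;
echo $twice, PHP_EOL; // Tom &amp;amp; Jerry &lt;b&gt;
```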

how to replace '\\\' to '\'?

My code is not working, and I don't want to use str_replace because there may be more than three slashes to replace. How can I do the job using preg_replace?
My code looks like this:
<?php
$str='<li>
<span class=\"highlight\">Color</span>
Can\\\'t find the exact color shown on the model pictures? Just leave a message (eg: color as shown in the first picture...) when you place order.
Please note that colors on your computer monitor may differ slightly from actual product colors depending on your monitor settings.
</li>';
$str=preg_replace("#\\+#","\\",$str);
echo $str;
There is merit in the other answers, but to me it looks like what you're actually trying to accomplish is something very different. In the PHP code, \\\' is not three slashes followed by an apostrophe; it's one escaped slash followed by an escaped apostrophe. In the rendered output, that's exactly what you see: a slash followed by an apostrophe (with no need to escape them in the rendered HTML). It's important to realize that the escape character is not actually part of the string; it's merely a way to help you represent a character that normally has a very different meaning within PHP. In this case, an apostrophe normally terminates a string literal. What looks like 4 characters in PHP is actually only 2 characters in the string.
If this is the extent of your code, there's no need for string manipulation or regular expressions. What you actually need is just this:
<?php
$str='<li>
<span class="highlight">Color</span>
Can\'t find the exact color shown on the model pictures? Just leave a message (eg: color as shown in the first picture...) when you place order.
Please note that colors on your computer monitor may differ slightly from actual product colors depending on your monitor settings.
</li>';
echo $str;
?>
Only one escape character is needed here for the apostrophe, and in the rendered HTML you will see no slashes at all.
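If it helps, the character counts can be checked directly. A quick sketch (the variable names are just for illustration):

```php
<?php
// Four characters in the source, but only two in the resulting string:
// \\ becomes one backslash and \' becomes one apostrophe.
$literal = '\\\'';
echo strlen($literal), PHP_EOL; // 2
echo $literal, PHP_EOL;         // \'

// A single escape is enough to get a plain apostrophe into a single-quoted string.
$text = 'Can\'t';
echo strlen($text), PHP_EOL;    // 5
echo $text, PHP_EOL;            // Can't
```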
Further Reading:
Escape sequences
The root of this problem is actually in how the data was written into your database, which is likely caused by magic_quotes_gpc; this setting was used in older PHP versions (deprecated in 5.3, removed in 5.4) and was a really bad idea.
The best fix
This requires a few steps:
Fix the script that puts the HTML inside your database by disabling magic_quotes_gpc.
Write a script that reads all existing database entries, applies stripslashes() and saves the changes.
Fix the presentation part (though that may need no changes at all).
Alternative patch
Use stripslashes() before you present the HTML.
use this pattern
preg_replace('#\\\\+#', '\\\\', $text);
This replaces two or more \ symbols preceding an ' symbol with \'
$theConvertedString = preg_replace("/\\\\{2,}'/", "\\\\'", $theSourceString);
Ideally, you shouldn't have code causing this issue in the first place, so I would have a look at why you have \\' in your code to begin with. If you've manually put it in your variables, take it out. Often, this also happens with multiple calls to addslashes() or mysql_real_escape_string(), or with a cheap hosting provider's automatic transformation of all POST request variables to escape slashes, combined with your server-side PHP code doing the same.
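For reference, here is a small sketch of collapsing a run of backslashes with preg_replace. The input is built with str_repeat so the number of backslashes is unambiguous; the variable names are made up:

```php
<?php
// Hypothetical input: three literal backslashes before the apostrophe.
$source = 'Can' . str_repeat('\\', 3) . "'t find it";

// In a single-quoted PHP string, '\\\\' is two backslashes.
// The regex engine reads those two as one literal backslash, so the
// pattern matches a run of one or more backslashes. The replacement
// string is also two backslashes, which preg_replace emits as one.
$collapsed = preg_replace('/\\\\+/', '\\\\', $source);

echo $source, PHP_EOL;    // Can\\\'t find it
echo $collapsed, PHP_EOL; // Can\'t find it
```

The doubling is needed twice: once for the PHP string literal and once for the regex engine.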

magento escape string for JavaScript part 2

This is a follow up on
magento escape string for javascript
where I accepted @AlanStorm's suggestion to use json_encode to escape string literals.
But I now have a new problem with this solution.
When trying to escape a URL that has /'s in it, to be rendered as a string literal for JavaScript, json_encode seems to add redundant \'s in front of the /'s.
Any new suggestions here?
solutions should take a string variable and return a string that would properly be evaluated to a string literal in JavaScript. (I don't care if its surrounded with single or double quotes - although I prefer single quotes. And it must also support newlines in the string.)
Thanks
some more info: how come
<?php $v = array('a' => '/'); echo json_encode($v); ?>
results in
{"a":"\/"} ?
Details can be found here http://bugs.php.net/bug.php?id=49366
work around for this issue:
str_replace('\\/', '/', $jsonEncoded);
for your issue you can do something like
$jsonDecoded = str_replace(array("\\/", "/'s"), array("/", "/\'s"), $jsonEncoded);
Hope this helps
When I check the JSON format I see that solidi are allowed to be escaped so json_encode is in fact working correctly.
(source: json.org)
The bug link posted by satrun77 even says "It's not incorrect to escape slashes."
If you're adamant about doing without the escaping and (in this case) are certain to be working with a string, you can use a hack like this:
echo '["', addslashes($string), '"]';
Obviously that doesn't help for more complicated structures but, as luck has it, you are using Magento, which is highly modifiable. Copy lib/Zend/Json/Encoder.php to app/code/local/Zend/Json/Encoder.php (which forms an override) and fix its _encodeString method.

Showing plain PHP code in a HTML page

And I'm talking (especially) forums here - [PHP]code here[/PHP] - style. Some forums escape double quotes or other "dangerous characters" and others don't.
What is the best method? What are you guys using?
Can it be done without the fear of code injection?
Edit: Who said anything about reinventing the wheel?
When PHP echoes or prints text, it never executes it. That only happens with eval. This means that if you did this:
echo '<?php ... ?>';
it would carry through to the page output and not be parsed or executed.
This means that all you need to do is escape the usual characters (<, >, &, etc.) and you should generally be safe.
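As a minimal sketch of that escaping step (the $code value is a made-up forum post, and UTF-8 is assumed), htmlspecialchars turns the PHP tags into harmless text:

```php
<?php
// Hypothetical forum post body containing PHP source.
$code = '<?php echo "Hello"; ?>';

// Escape <, >, & and quotes so the browser shows the code instead of parsing it.
$safe = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');

echo '<pre><code>', $safe, '</code></pre>', PHP_EOL;
// Inside the wrapper this prints: &lt;?php echo &quot;Hello&quot;; ?&gt;
```

The string is only echoed, never passed to eval, so the PHP inside it is not executed.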
Don't reinvent the wheel. I see BBCode in your question. Grab a markdown library and use it instead. SO uses this: http://daringfireball.net/projects/markdown/
There is no fear of PHP code injection (unless you are doing something unusual like eval'ing HTML templates), but there is always a fear of JS code injection, often called XSS. All the danger comes from possible JS code.
Thus, there is no special treatment for PHP code shown on an HTML page. Just treat it like any other data. The < and > brackets are usually escaped, for obvious reasons.
Don't reinvent the wheel. PHP has its highlight_string function for this.
If you see escaped quotes on some page, that's most likely because their script escaped them twice (for example magic_quotes did it once, then mysql_query() again). When data sanitisation is done properly, you should not see escape characters in output.

escaping json string with a forward slash?

I am having a problem passing a json string back to a php script to process.
I have a json string that's been created by using dojo.toJson() that contains a / and looks like this:
[{"id":"2","company":"My Company / Corporation","jobrole":"Consultant","jobtitle":"System Integration Engineer"}]
When I pass the string back to the php script it gets chopped at the / and creates a malformed json string, which then means I can't convert it into a php array.
What is the best way of escaping the / in this string? I was looking at regular expressions and doing a string.replace() however my regex isn't that strong, and I'm not sure if there are better ways of doing this?
Many thanks
You shouldn't need to do anything special to represent a / in JSON - a string can contain any character except a " or (when not used to start an escape sequence) \.
The problem is possibly therefore in:
the way you parse the JSON server side
the way you parse the HTTP data to get the JSON string
the way you encode the string before making the HTTP request
(I'd bet on it being the last of those options).
I would start by using a tool such as LiveHttpHeaders or Charles Proxy to see exactly what data is sent to the server.
(I'd also expand the question with the code you use to make the request, and the code you use to parse it at the other end).
\/. Take a look here. The documentation is really easy to read, concise and clear. But unescaped / should still be valid in JSON's string so maybe your bug is somewhere else?
Ok. Anyway.
When passing variables to PHP, don't use JSON; it's good for passing variables the other way.
Instead, you'd better use the http://api.dojotoolkit.org/jsdoc/1.3/dojo.objectToQuery method and parse the standard PHP $_GET variables on the PHP side.
EDIT: Ok, I'm 'lost in the woods' here also, but here's a tip - check if you don't have some mod_rewrite rules in action here. Kind of seems like that.
Also, if you can send me the URL which gave you 404 (you can cut out domain part, i'm interested in script filename and all afterwards) maybe I can give you more detailed answer.
To be clear, whether you choose to send JSON to PHP or use regular form values is a matter of preference. It /should/ work either way. It sounds like you aren't url-encoding the JSON at the client side, so the server side is treating / as a path delimiter. In which case it's borked before json_decode gets to it.
so, try encodeURIComponent( dojo.toJson(stuff) )
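To see why percent-encoding helps, here is a PHP-side sketch of the round trip. It uses rawurlencode as a rough stand-in for encodeURIComponent (the two are not identical for a few punctuation characters), and a shortened version of the JSON from the question:

```php
<?php
// Shortened version of the JSON string from the question.
$json = '[{"id":"2","company":"My Company / Corporation"}]';

// Percent-encode the whole string before it goes into a URL.
$encoded = rawurlencode($json);
echo $encoded, PHP_EOL;

// No raw / is left for the server to mistake for a path delimiter.
var_dump(strpos($encoded, '/') === false); // bool(true)

// On the receiving side, decode the URL encoding first, then the JSON.
$decoded = json_decode(rawurldecode($encoded), true);
echo $decoded[0]['company'], PHP_EOL; // My Company / Corporation
```

When the value arrives through $_GET or $_POST, PHP has already done the URL-decoding step, so only json_decode is needed there.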
json_encode() escapes forward slashes by default, like this:
prompt> json_encode(json_decode('"A/B"'));
string(6) ""A\/B""
JSON_UNESCAPED_SLASHES was added in PHP 5.4 to suppress this behavior.
