Extra linebreak when producing JSON with PHP

Extra linebreak when producing JSON with PHP - php

I have some chunks of HTML in my database that I put in a JSON this way, upon first page render:
foreach ($data as $key => $value) {
$output .= '
"'.$key.'" : "'.addSlashes($value).'",
';
}
usually it works fine, except in a few cases where I get a leading line feed like this below:
"html" : "
<div id=\"object_01\" class=\"object_textbox\" namn=\"object_01\" style=\"z-index:1;font-family:Arial;left:750.9776000976563px;top:30.979339599609375px;position:absolute;width:709.6444444656372px;height:327.6444444656372px;\"><div class=\"object_inner\" style=\"border:0px solid rgb(0,0,0);background-color:rgb(255,255,255);color:rgb(0,0,0);opacity:.8;font-size:28px;line-height:42px;padding:20px;\">“Behold my Beloved Son, in whom I am well pleased, in whom I have glorified my name—hear ye him.” And it came to pass, as they understood they cast their eyes up again towards heaven; and behold, they saw a Man descending out of heaven; and he was clothed in a white robe; and he came down and stood in the midst of them...</div></div><div id=\"object_02\" class=\"object_textbox\" namn=\"object_02\" style=\"z-index:2;font-family:Arial;position:absolute;top:309.9988098144531px;left:1269.9826049804688px;width:187.6444444656372px;height:48.64444446563721px;\"><div class=\"object_inner\" style=\"border:0px solid rgb(0,0,0);font-style:italic;font-weight:bold;\">3 Nephi 11:7-8</div></div>",
Which obviously breaks the entiry page since javascript doesn't support linefeeds inside strings.
Any idéa what might cause it, or a way around it?
All the data from the database is picked up from a page at form save, so the user never gets to manually insert that linefeed... I just use jQuery element.html() to pick it up so there shouldn't be a linefeed there...

Never build JSON "manually". Use the built in function:
<?php
$output = json_encode($data);
One of the reasons JSON is so popular (other than it being so compatible with JavaScript) is that it is a widely supported serialization scheme. Almost every language and platform now have libraries for working with it so that you can reliably serialize and deserialize your data in any environment for any other.
Once you're using json_encode, whatever ends up in that data is what was there before serialization. So if the line break is still there, it's not the serialization causing it--you'll need to find it in your other code/data.

Related

Do you bother with markup formatting?

I am a front end guy who is getting more and more into scripting and that being the case, I like my regurgitated markup to kind of look nice.
I ran a loop over some database values for a list and while most sites would just show a big old concatenated slew of <LI> tags back to back, I kind of like them \r\n distanced with proper \t tabbing. Weird thing is, the first list member renders like LI> rather than <LI> about 1 out of 5 page serves.
Anyone seen this? Should I not bother? Am I formatting the loops badly? Here's an example:
while ($whatever = mysql_fetch_array($blah_query)){
echo "\t\t\t\t\t\t";
echo "<li>\n";
echo "\t\t\t\t\t\t";
echo '<a href="#'.$whatever['name'].'" id="category_id_'.$whatever['id'].'">';
echo ucfirst($whatever['name']);
echo "</a>\n\t\t\t\t\t\t</li>\n";
}

this seems as if the goal is to output a page source that types out the proper indentions for you?
at least for right now to debug and be easier read?
while ($whatever = mysql_fetch_array($blah_query)){
echo "\t\t\t\t\t\t";
echo "<li>\n";
echo "\t\t\t\t\t\t";
echo '<a href="#'.$whatever['name'].'" id="category_id_'.$whatever['id'].'">';
echo ucfirst($whatever['name']);
echo "</a>\n\t\t\t\t\t\t</li>\n";
}
since you're using PHP to echo out those HTML codes, just type them as you would see them on the page source
while($whatever = mysql_fetch_array($blah_query)){
//When you want a new line, just hit enter. PHP will echo the carriage returns too
echo'
<li>
ucfirst($whatever['name'])
</li>
';
}
this is how I would do it so that it would line break every time including the first time incase I have a left over "</div>" or some other closing tag without a line break after it.
it will output a nicer clean list item that tabbed in with the breaks

Removing spaces between code can significantly decrease the sizes of files especially if your code is of significant length. By removing any indenting and minimising spaces within files, you can maximise connection speeds to your site by delivering the requested pages considerably faster than if you were indenting. This adds up if your website is receiving any reasonable amount of traffic, as each page served may be made more efficient by removing 5-10kb of spacing. In the long run, if you're serving users pages regularly, the added network strain can be minimised by ensuring your code uses as little of the space as possible.
Although, if you happen to be developing in a private environment, it's good practice to use indenting for debugging purposes. The style of the code allows you to follow it's logic and flow in comparison to minified code that lacks legibility.

Typically, removing the spaces between elements, is a way to 'save bandwidth' for high traffic sites. It is something akin to minifying JavaScript or CSS. If you are still in 'testing/development' mode, then sure, indent it, so you can see if you are making mistakes. However in any production environment, with any appreciable traffic, you should 'minify' you html too.
This is not just to cut back on the monetary cost of bandwidth. This is to cut down on the system resources cost as well. It takes a little longer to send a 45k file (with spaces) and it does a 29k file (without spaces). Therefore, your server can push it out faster, which in turn means it can free up a open connection faster, which means it can accept a new incoming connection now. There are lots of talks dedicated to this idea, of minification. Minification, coupled with compression, is the leading reason in why webpages are held to high standards for loading quickly. The less you send out, the faster you can do so, the more people you can get it too.

I am like you. Everything must be clean and tidy.
I would recommend using XSL as a templating engine. This autmatically make all your HTML properly formatted if you set it to formatOutput = true.
I use those setting for my local copy, but for the live copy I set XSL to use no fromatting and white space. This returns all the HTML on one line. This saves about 20-30% or whatever of the HTML file size. So you save bandwidth and get quicker load times. Probably slightly quicker for browsers to render too.
See:
http://www.php.net/xsl
$xsl->preserveWhiteSpace = false;
$xsl->formatOutput = TRUE OR FALSE;
Just looking at my code the above is what I use to either set to indent nicely, or output all on one line.

PHP, convert $_GET elements to HTML entities

Edit: I think %27 is actually the wrong kind of quote. I am still stuck though, I cannot find a PHP function that does the conversion I want.
Edit (again): I found a solution where I stick %26rsquo%3bs into the URL and it turns into ’. It works so I posted it as an answer below but I'd still be interested in knowing how it'd be done with PHP functions.
I'm working on a website that uses a PHP tree as if it were a directory. For example, if someone types index.php?foo=visual programming (or index.php?foo=visual%20programming) then the website opens the item "Visual Programming" (I'm using strtolower()).
Another working example would be index.php?foo=visual programming&bar=animated path finder which opens "Animated Path Finder", a child of "Visual Programming".
The problem is that some of the items are named things like "Conway’s Game of Life" which uses a HTML entity. My guess of what someone should type to open this would be index.php?foo=visual%20programming&bar=conway%27s%20game%20of%20life. The problem is that ' is not === to ’.
What do I need to do to make this work? Here is my code that selects an item based on $_GET (the PHP is inside of <script type="text/javascript">):
<?php
function echoActiveDirectory($inTree) {
// Compare $_GET with PhpTree
$itemId = 0;
foreach ($_GET as $name) {
if ($inTree->children !== null) {
foreach ($inTree->children as $child) {
if (strtolower($child->title) === strtolower($name)) {
$itemId = $child->id;
$inTree = $child;
break;
}
}
}
}
// Set jsItems[$itemId].selected(), it will be 0 if nothing was found
echo "\t\tjsItems[".$itemId."].selected();\n";
}
echo "// Results of PHP echoActiveDirectory(\$root)\n";
echoActiveDirectory($root);
?>
The website is a work in progress, but it can be tested here to see $_GET working: http://alexsimes.com/index.php

The hex code %27 (39 decimal) will never translate to ’, since it is a completely different entity (Wikipedia). It could be translated to &apos;, but PHP doesn't do that (although I don't know the reason for that).
Edit
While there is no standard for URL-encoding multibyte character sets, PHP will treat a string as just a set of bytes, and if those match an UTF-8 sequence, it will work:
php -r 'echo htmlentities(urldecode("%E2%80%99"), ENT_QUOTES|ENT_HTML401);'
should output
’

You can use html_entity_encode() and html_entity_decode() PHP functions to convert those characters to html entities or decode them back to desired characters before comparison.

You can try the htmlentities function to convert special characters to corresponding html entity. But in your case if data is already stored in db as html entity form, the data from $_GET parameter must be first passed through htmlentities before using it in your query.

Not able to parse this json

I am trying to parse the json output from
http://www.nyc.gov/portal/apps/311_contentapi/services/all.json
And my php json_decode returns a NULL
I am not sure where the issue is, I tried running a small subset of the data through JSONLint and it validated the json.
Any Ideas?

The error is in this section:
{
"id":"2002-12-05-22-24-56_000010083df0188b4001eb56",
"service_name":"Outdoor Electric System Complaint",
"expiration":"2099-12-31T00:00:00Z",
"brief_description":"Report faulty Con Edison equipment, including dangling or corroded power lines or "hot spots.""
}
See where it says "hot spots." in an already quoted string. Those "'s should've been escaped. Since you don't have access to edit the JSON perhaps you could do a search for "hot spots."" and replace it with \"hot spots.\"" like str_replace('"hot spots.""', '\\"hot spots.\\""\, $str); for as long as that's in there. Of course that only helps if this is a one time thing. If the site continues to make errors in their JSON output you'll have to come up with something more complex.

What I did to identify the errors in the JSON ...
Since faulty quoting is the first thing to look for, I downloaded the JSON to a text file, opened in a text editor (I used vim but any full featured editor would do), ran a search and replace that removed all characters except double-quote and looked at the result. It was clear that correct lines should have 4 double-quotes so I simply searched for 5 double-quotes together and found the first bad line. I noted the line number and then undid the search and replace to get the original file back and looked at that line. This gives you what you need to get the developers of the API to fix the JSON.
Writing code to automatically fix the bad JSON before giving it to json_decode() would be quite a bit harder but doable using techniques like those in another answer.

According to the PHP manual:
In the event of a failure to decode, json_last_error() can be used to determine the exact nature of the error.
Try calling it to see where the error is.

Why doesn't jQuery.parseJSON() work on all servers?

Hey there, I have an Arabic contact script that uses Ajax to retrieve a response from the server after filling the form.
On some apache servers, jQuery.parseJSON() throws an invalid json excepion for the same json it parses perfectly on other servers. This exception is thrown only on chrome and IE.
The json content gets encoded using php's json_encode() function. I tried sending the correct header with the json data and setting the unicode to utf-8, but that didn't help.
This is one of the json responses I try to parse (removed the second part of if because it's long):
{"pageTitle":"\u062e\u0637\u0623 \u0639\u0646\u062f \u0627\u0644\u0625\u0631\u0633\u0627\u0644 !"}
Note: This language of this data is Arabic, that's why it looks like this after being parsed with php's json_encode().
You can try to make a request in the examples given down and look at the full response data using firebug or webkit developer tools. The response passes jsonlint!
Finally, I have two urls using the same version of the script, try to browse them using chrome or IE to see the error in the broken example.
The working example : http://namodg.com/n/
The broken example: http://www.mt-is.co.cc/my/call-me/
Updated: To clarify more, I would like to note that I manged to fix this by using the old eval() to parse the content, I released another version with this fix, it was like this:
// Parse the JSON data
try
{
// Use jquery's default parser
data = $.parseJSON(data);
}
catch(e)
{
/*
* Fix a bug where strange unicode chars in the json data makes the jQuery
* parseJSON() throw an error (only on some servers), by using the old eval() - slower though!
*/
data = eval( "(" + data + ")" );
}
I still want to know if this is a bug in jquery's parseJSON() method, so that I can report it to them.

Found the problem! It was very hard to notice, but I saw something funny about that opening brace... there seemed to be a couple of little dots near it. I used this JavaScript bookmarklet to find out what it was:
javascript:window.location='http://www.google.com/search?q=u+'+('000'+prompt('String?').charCodeAt(prompt('Index?')).toString(16)).slice(-4)
I got the results page. Guess what the problem is! There is an invisible character, repeated twice actually, at the beginning of your output. The zero width non-breaking space is also called the Unicode byte order mark (BOM). It is the reason why jQuery is rejecting your otherwise valid JSON and why pasting the JSON into JSONLint mysteriously works (depending on how you do it).
One way to get this unwanted character into your output is to save your PHP files using Windows Notepad in UTF-8 mode! If this is what you are doing, get another text editor such as Notepad++. Resave all your PHP files without the BOM to fix your problem.
Step 1: Set up Notepad++ to encode files in UTF-8 without BOM by default.
Step 2: Open each existing PHP file, change the Encoding setting, and resave it.

You should try using json2.js (it's on https://github.com/douglascrockford/JSON-js)
Even John Resig (creator of jQuery) says you should:
This version of JSON.js is highly recommended. If you're still using the old version, please please upgrade (this one, undoubtedly, cause less issues than the previous one).
http://ejohn.org/blog/the-state-of-json/

I don't see anything related to parseJSON()
The only difference I see is that in the working example a session-cookie is set(guess it is needed for the "captcha", the mathematical calculation), in the other example no session-cookie is set. So maybe the comparision of the calculation-result fails without the session-cookie.

Images not uploading when htmlentities has 'UTF-8' set

I have a form that, among other things, accepts an image for upload and sticks it in the database. Previously I had a function filtering the POSTed data that was basically:
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES);
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
When, in an effort to fix some weird entities that weren't getting converted properly I changed the function to (all that has changed is I added that 'UTF-8' bit in htmlentities):
function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES, 'UTF-8'); //added UTF-8
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
And now images will not upload.
What would be causing this? Simply removing the 'UTF-8' bit allows images to upload properly but then some of the MS Word entities that users put into the system show up as gibberish. What is going on?
**EDIT: Since I cannot do much to change the code on this beast I was able to slap a bandaid on by using htmlspecialchars() rather than htmlentities() and that seems to at least leave the image data untouched while converting things like quotes, angle brackets, etc.
bobince's advice is excellent but in this case I cannot now spend the time needed to fix the messy legacy code in this project. Most stuff I deal with is object oriented and framework based but now I see first hand what people mean when they talk about "spaghetti code" in PHP.

function processInput($stuff) {
$formdata = $stuff;
$formdata = htmlentities($formdata, ENT_QUOTES);
return "'" . mysql_real_escape_string(stripslashes($formdata)) . "'";
}
This function represents a basic misunderstanding of string processing, one common to PHP programmers.
SQL-escaping, HTML-escaping and input validation are three separate functions, to be used at different stages of your script. It makes no sense to try to do them all in one go; it will only result in characters that are ‘special’ to any one of the processes getting mangled when used in the other parts of the script. You can try to tinker with this function to try to fix mangling in one part of the app, but you'll break something else.
Why are images being mangled? Well, it's not immediately clear via what path image data is going from a $_FILES temporary upload file to the database. If this function is involved at any point though, it's going to completely ruin the binary content of an image file. Backslashes removed and HTML-escaped... no image could survive that.
mysql_real_escape_string is for escaping some text for inclusion in a MySQL string literal. It should be used always-and-only when making an SQL string literal with inserted text, and not globally applied to input. Because some things that come in in the input aren't going immediately or solely to the database. For example, if you echo one of the input values to the HTML page, you'll find you get a bunch of unwanted backslashes in it when it contains characters like '. This is how you end up with pages full of runaway backslashes.
(Even then, parameterised queries are generally preferable to manual string hacking and mysql_real_escape_string. They hide the details of string escaping from you so you don't get confused by them.)
htmlentities is for escaping text for inclusion in an HTML page. It should be used always-and-only in the output templating bit of your PHP. It is inappropriate to run it globally over all your input because not everything is going to end up in an HTML page or solely in an HTML page, and most probably it's going to go to the database first where you absolutely don't want a load of < and & rubbish making your text fail to search or substring reliably.
(Even then, htmlspecialchars is generally preferable to htmlentities as it only encodes the characters that really need it. htmlentities will add needless escaping, and unless you tell it the right encoding it'll also totally mess up all your non-ASCII characters. htmlentities should almost never be used.)
As for stripslashes... well, you sometimes need to apply that to input, but only when the idiotic magic_quotes_gpc option is turned on. You certainly shouldn't apply it all the time, only when you detect magic_quotes_gpc is on. It is long deprecated and thankfully dying out, so it's probably just as good to bomb out with an error message if you detect it being turned on. Then you could chuck the whole processInput thing away.
To summarise:
At start time, do no global input processing. You can do application-specific validation here if you want, like checking a phone number is just numbers, or removing control characters from text or something, but there should be no escaping happening here.
When making an SQL query with a string literal in it, use SQL-escaping on the value as it goes into the string: $query= "SELECT * FROM t WHERE name='".mysql_real_escape_string($name)."'";. You can define a function with a shorter name to do the escaping to save some typing. Or, more readably, parameterisation.
When making HTML output with strings from the input or the database or elsewhere, use HTML-escaping, eg.: <p>Hello, <?php echo htmlspecialchars($name); ?>!</p>. Again, you can define a function with a short name to do echo htmlspecialchars to save on typing.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Extra linebreak when producing JSON with PHP - php

Related

Do you bother with markup formatting?

PHP, convert $_GET elements to HTML entities

Not able to parse this json

Why doesn't jQuery.parseJSON() work on all servers?

Images not uploading when htmlentities has 'UTF-8' set

Categories

Resources