hidden field becomes visible with large amount of serialized data - php

I have hidden fields which contain a very large amount of serialized data (I'm talking around 1300 records from a database). With all of this data, the hidden fields become visible as text boxes containing the serialized data. When, on the other hand, I limit this data to say, 200 records, the fields remain hidden as they should. I went ahead and inspected this issue in Chrome's "inspect element" and I noticed that many of the HTML characters throughout the page are all out of place.
For example:
class="cont_buffer" turned into class="”cont_buffer”"
The extra quotation marks are messing up the input type="hidden" and thus showing the fields.
What could I do to get around this issue?

Two things:
First, it sounds like you should be html encoding the value of the hidden input with htmlentities($value, ENT_QUOTES) to make sure quotes are properly encoded for use as an HTML attribute value.
However, there is probably a better way to do what you are doing rather than to pass thousands of records back to the browser so they can be returned by the browser to the server. Instead of passing to a hidden input, you are probably better off storing the rows temporarily in $_SESSION. In some cases, it may even be faster to just requery that many rows when the form post is received, rather than have the browser pass them back.
If you are serializing the rows in PHP with serialize(), I assume you intend to unserialize() them in PHP. Use extreme caution when doing so, because this opens the possibility that the browser could send malicious data back that would unserialize into something harmful to your application. If you must send serialized data from the browser to unserialize(), be certain to validate the object or array you receive -- make sure it contains all the keys or properties you expect, and only those you expect so that you don't have problems when iterating over it.
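A minimal sketch of both points above, assuming the rows are plain arrays of scalar values. The function names (encode_rows_for_hidden_field, decode_posted_rows) and the field name "rows" are hypothetical, and htmlspecialchars() is used in place of htmlentities() since it encodes the same quote characters. The allowed_classes option of unserialize() needs PHP 7.0 or later.

```php
<?php
/**
 * Build a hidden input whose value is safe to embed in an HTML attribute.
 * The field name "rows" is only an illustration.
 */
function encode_rows_for_hidden_field(array $rows): string
{
    $value = htmlspecialchars(serialize($rows), ENT_QUOTES, 'UTF-8');
    return '<input type="hidden" name="rows" value="' . $value . '">';
}

/**
 * Decode a payload posted back by the browser.
 * Returns the rows only if every row is an array with exactly the expected
 * keys and scalar (or null) values; otherwise returns null.
 */
function decode_posted_rows(string $payload, array $expectedKeys): ?array
{
    // allowed_classes => false prevents unserialize() from creating objects.
    $rows = @unserialize($payload, ['allowed_classes' => false]);
    if (!is_array($rows)) {
        return null;
    }
    sort($expectedKeys);
    foreach ($rows as $row) {
        if (!is_array($row)) {
            return null;
        }
        $keys = array_keys($row);
        sort($keys);
        if ($keys !== $expectedKeys) {
            return null;
        }
        foreach ($row as $value) {
            if ($value !== null && !is_scalar($value)) {
                return null;
            }
        }
    }
    return $rows;
}
```

The browser decodes the entities before posting the form, so the server receives the original serialized string and no extra decoding step is needed on the PHP side.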


When don't you sanitize data?

So I have been working on an API for a little side project and all the user input won't be directly used for SQL or displayed to the users so do I need to sanitize it? If I do what should I do exactly?
I am currently checking to make sure they are integers, strings, arrays, etc., but other than that is there anything else I need to do?
The question is always for what purpose? If you just take values you get from a user and you don't do anything with them (you just store and display them), then there's nothing to sanitize. If you let those values "actively do" something, you may want to sanitize them to avoid them doing something you don't like.
For instance, you accept HTML input from a user and want HTML formatted content, but you want to avoid XSS problems; in this case you will want to selectively remove HTML elements, i.e. you want to sanitize the input.
// some HTML is allowed, but not everything
echo remove_unwanted_html_elements($_POST['content']);
If OTOH you do not allow HTML input to be interpreted anyway, i.e. whatever the user posts is just displayed literally back to him without any of it being interpreted as HTML, then you do not need to sanitize anything. You may just need to escape the content according to its target format.
// don't care what the user enters, just display it right back as is
echo htmlspecialchars($_POST['content']);
Sanitization is only relevant if you evaluate the value in some not entirely predictable way. Sanitization means to take a value and change it into something else, typically removing something from it. This must be very targeted and purposeful since it can be a very error prone operation; you don't just sanitize data somehow just because. The other alternative is simple validation, i.e. checking that a value conforms to expected norms and otherwise rejecting it outright.
Even taking a supposed number entered by the user and casting it to an int is a very simple form of sanitization; it's effective since it means you are guaranteed to get a harmless number, but that number may or may not have anything to do with the value the user submitted. Validation may be the better option here.
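To make the difference concrete, here is a small sketch contrasting a cast with validation. The function name parse_quantity and the 1 to 1000 range are made up for illustration; filter_var() with FILTER_VALIDATE_INT is a standard PHP call.

```php
<?php
/**
 * Validate a user-supplied quantity instead of casting it.
 * Returns the integer, or null when the input is not an integer in range.
 */
function parse_quantity(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 1000],
    ]);
    return $value === false ? null : $value;
}

// Casting always yields a number, but not necessarily the one that was meant.
$cast = (int) '42abc';               // 42, the trailing "abc" is silently dropped
$checked = parse_quantity('42abc');  // null, so the input can be rejected
```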

How to encode large HTML lists for jQuery?

I have a PHP script that fetches a relatively large amount of data, and formats it as HTML unordered lists for use in an Ajax application.
Considering the data is in the order of tens to possibly more than a hundred KB, and that I want to be able to differentiate between the different lists with Javascript, what would be the best way to go about doing this?
I thought about json_encode, but that results in [null] when more than a certain amount of rows are requested (maybe PHP memory limit?).
Thanks a lot,
Fela
Certain illegal characters in the string could be breaking the json_encode() function in PHP which you will need to sanitize before this will work correctly. You could do this using regular expressions if this becomes a problem.
However, if you are sending requests with that amount of data it may be unwise to send this using AJAX as your application will seem very unresponsive. It may be better to get this data directly from the database as this would be a far faster method although you will have to obviously compromise.
I cannot click up, but I agree with Daniel West.
Ensure your strings are UTF-8 encoded or use mysql_set_charset('utf8') when you connect. The default charset for mysql is unfortunately Latin/Windows. Null is the result of a failed encoding because of this; if it were out of memory, the script itself would fail.
I would pass the data around via JSON, and do it in small batches that you can present incrementally. This will make it appear faster and may get around the memory issues you are having. If the data above the scroll (first 40 lines or so) loads quickly, it is ok if the rest of the page takes several seconds to load. If you want to get tricky, you can even load the first page and then wait for a scroll event to load the rest, so you don't have to hit the server too much if the user never scrolls to look at the data below the scrollbar. If PHP is returning null from json_encode, it is because of invalid characters. If you can't control the data, you could just send HTML from the server to the client and avoid all the encoding/decoding, but this means more data to transfer.
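A rough sketch of the two suggestions above: fail loudly when json_encode() cannot encode the data, and send the rows in small batches. The function names and the batch size of 40 are assumptions for illustration; json_last_error_msg() and array_chunk() are standard PHP functions. On current PHP versions json_encode() returns false on failure.

```php
<?php
/**
 * Encode rows as JSON, raising an error instead of silently returning
 * an unusable value when the data contains invalid UTF-8.
 */
function encode_rows_as_json(array $rows): string
{
    $json = json_encode($rows);
    if ($json === false) {
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    return $json;
}

/**
 * Split rows into batches so the client can request and render them
 * incrementally. Returns one JSON string per batch.
 */
function encode_in_batches(array $rows, int $batchSize = 40): array
{
    return array_map('encode_rows_as_json', array_chunk($rows, $batchSize));
}
```

With the error surfaced this way, a bad character shows up as an explicit message on the server rather than as an unexplained null on the client.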
With jquery you can transform your unordered list into a javascript array in your ajax application.
$.map( $('li'), function (element) { return $(element).text() });
Also, underscorejs has some very neat functions for javascript arrays and collections.
http://documentcloud.github.com/underscore/
I would prefer JSON, and I think the 'null' you get from PHP's json_encode results from JSON's double-escaping mechanism with JavaScript. I have explained it in another post.
You need to double-escape special characters (one likely cause of the 'null' in your case is '\n').
json parse error with double quotes

Filtering user input - clarification needed

I would like to clarify what is the proper way to filter user input with php. For example I have a web form that a user enters information into. When submitted the data from the form will be entered into a database.
My understanding is you don't want to sanitize the data going into the database, except for escaping it such as mysql_escape_string, you want to sanitize it when displaying it on the front end with something like htmlentities or htmlspecialchars. However if you want you can validate/filter the user input when they submit the form to make sure the data is in the proper format such as if a field is for an email address you want to validate that it has the proper email format. Is that correct?
My next question is what do you do with the data when you re-display it in a web form? Let's say the user is allowed to edit the information in that form after they filled it out and the information was added to the database. They then go back in and see the data in the fields they originally entered; do you have to sanitize the data for it to show correctly in the form fields? For example there is a field called My Title, and the person enters My title is "Manager". You see the quotations around Manager; when you display it as is in the form field, it breaks because of the quotations:
<input type="text" name="title" value="My title is "Manager"">
So don't you have to do something like htmlentities to turn the quotations into its html entities? Otherwise the value of the field would look like My title is
Hope this makes sense.
Nothing says you can't sanitize data before database insertion. After all, if your script/site/company has a certain policy regarding what's acceptable in a form field, it's best to strip out anything that's not allowed before saving it. That way you only sanitize once, before data insertion/update, rather than EVERY TIME you retrieve the data.
If you allow HTML entities for (say) accented characters, but not HTML tags, then you have to both check for invalid entities (&foobar;?) and HTML tags as well. Since you don't allow them, don't bother storing them. If you require a valid email address, then check if it's RFC 5322 compliant and only store it once the user's entered proper data. (Whether that email address actually exists is another matter).
Now, let's get one thing straight. There's a difference between sanitization and escaping. Sanitization means literally to clean up - you're removing anything you don't want from the data. You can either silently drop it, or present an error to the user and tell them to fix it. On the other hand, escaping is just a means of encoding data so it's displayed properly.
With your My title is "Manager" string, you don't need to sanitize it, as there's nothing really wrong or offensive about it. What you do need to do is escape it, with at least htmlspecialchars(), so that the embedded double quotes don't "break" your form. If you embed it verbatim, most browsers will see it as having value="My title is" and some bogus attribute/garbage Manager"". So, you run it through htmlspecialchars and end up with My title is &quot;Manager&quot;, which embeds into the value="" perfectly with no trouble. No sanitization, just proper encoding.
Now, when that form is submitted, then you do have to sanitize/validate again, as the data's been in the hands of a potentially malicious user, and the data could have been changed to My title is <script>document.location='http://attacksite.com';</script>pwn me.
Basically, the workflow should be:
1. present form to user
2. get data submitted
3. sanitize data
4. if form is not correctly filled out, display errors and go to 1)
5. escape data for sql query
6. insert into database
then later
1. retrieve data from database
2. escape/encode as appropriate for however it will be displayed
3. display data. if data's going into a form, do 1-6 as before.
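The sanitize/validate, escape-for-SQL, and escape-for-display steps of this workflow could be sketched as follows. The users table, its columns, the 100-byte limit, and all function names are hypothetical, and a PDO prepared statement is swapped in for manual SQL escaping.

```php
<?php
/**
 * Validate the submitted title.
 * Returns a list of error messages; an empty list means the input is fine.
 */
function validate_title(string $title): array
{
    $errors = [];
    $title = trim($title);
    if ($title === '') {
        $errors[] = 'Title is required.';
    }
    if (strlen($title) > 100) {
        $errors[] = 'Title must be at most 100 bytes.';
    }
    if ($title !== strip_tags($title)) {
        $errors[] = 'Title must not contain HTML tags.';
    }
    return $errors;
}

/**
 * Store the title. The prepared statement takes care of SQL escaping,
 * so the value is saved exactly as the user typed it.
 */
function save_title(PDO $pdo, int $userId, string $title): void
{
    $stmt = $pdo->prepare('UPDATE users SET title = :title WHERE id = :id');
    $stmt->execute([':title' => $title, ':id' => $userId]);
}

/**
 * Encode the stored value for an HTML attribute at output time.
 */
function render_title_field(string $title): string
{
    return '<input type="text" name="title" value="'
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '">';
}
```

Here the policy is to reject tags with an error rather than silently strip them; either choice fits the workflow, as long as it is applied before the insert.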

working with user input data in php. What's better?

I am trying to figure out what is the best way to manage the data a user inputs concerning non desirable tags he might insert:
strip_tags() - the tags are removed and they are not inserted in the database
the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()
Which is better, and is there any disadvantage to either of these?
Regards
This depends on what your priority is:
if it's important to display special characters from user input (like on StackOverflow, for example), then you'll need to store this information in the database and sanitize it on display - in this case, you'll want to at least use htmlspecialchars() to display the output (if not something more sophisticated)
if you just want plain text comments, use strip_tags() before you stick it in the database - this way you'll reduce the amount of data that you need to store, and reduce processing time when displaying the data on the screen
the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()
This. You usually want people to be able to type less-than signs and ampersands and have them displayed as such on the page. htmlspecialchars on every text-to-HTML output step (whether that text came directly from user input, or from the database, or from somewhere else entirely) is the right way to achieve this. Messing about with the input is a not-at-all-appropriate tactic for dealing with an output-encoding issue.
Of course, you will need a different escape — or parameterisation — for putting text in an SQL string.
The measures taken to secure user input depend entirely on the context in which the data is being used. For instance:
If you're inserting it into a SQL database, you should use parameterized statements. PHP's mysql_real_escape_string() works decently, as well.
If you're going to display it on an HTML page, then you need to strip or escape HTML tags.
In general, any time you're mixing user input with another form of mark-up or another language, that language's elements need to be escaped or stripped from the input before put into that context.
The last point above segues into the next point: Many feel that the original input should always be maintained. This makes a lot of sense when, later, you decide to use the data in a different way and, for instance, HTML tags aren't a big deal in the new context. Also, if your site is in some way compromised, you have a record of the exact input given.
Specifically related to HTML tags in user input intended for display on an HTML page: If there is any conceivable reason for a user to input HTML tags, then simply escape them. If not, strip them before display.
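The two display options, side by side, as a minimal sketch (the function names are made up). Note that stripped text still needs escaping, since characters such as & remain after the tags are gone.

```php
<?php
/**
 * Option 1: drop the tags, then escape what is left for HTML output.
 */
function display_stripped(string $input): string
{
    return htmlspecialchars(strip_tags($input), ENT_QUOTES, 'UTF-8');
}

/**
 * Option 2: keep the input intact and show the tags as literal text.
 */
function display_escaped(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

$input = '<b>Hello</b> & welcome';
echo display_stripped($input), "\n"; // Hello &amp; welcome
echo display_escaped($input), "\n";  // &lt;b&gt;Hello&lt;/b&gt; &amp; welcome
```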

PHP & mySQL: When exactly to use htmlentities?

PLATFORM:
PHP & mySQL
For my experimentation purposes, I have tried out few of the XSS injections myself on my own website. Consider this situation where I have my form textarea input. As this is a textarea, I am able to enter text and all sorts of (English) characters. Here are my observations:
A). If I apply only strip_tags and mysql_real_escape_string and do not use htmlentities on my input just before inserting the data into the database, the query is breaking and I am hit with an error that shows my table structure, due to the abnormal termination.
B). If I apply strip_tags, mysql_real_escape_string and htmlentities on my input just before inserting the data into the database, the query is NOT breaking and I am able to successfully insert data from the textarea into my database.
So I do understand that htmlentities must be used at all costs, but I am unsure when exactly it should be used. With the above in mind, I would like to know:
1. When exactly should htmlentities be used? Should it be used just before inserting the data into the DB, or should I somehow get the data into the DB and then apply htmlentities when I am trying to show the data from the DB?
2. If I follow the method described in point B) above (which I believe is the most obvious and efficient solution in my case), do I still need to apply htmlentities when I am trying to show the data from the DB? If so, why? If not, why not? I ask this because it's really confusing for me after I have gone through the post at: http://shiflett.org/blog/2005/dec/google-xss-example
3. Then there is one more PHP function called html_entity_decode. Can I use that to show my data from the DB (after following my procedure as indicated in point B), since htmlentities was applied on my input? Which one should I prefer, html_entity_decode or htmlentities, and when?
PREVIEW PAGE:
I thought it might help to add some more specific details of a specific situation here. Consider that there is a 'Preview' page. When I submit the input from a textarea, the Preview page receives the input and shows it as HTML, and at the same time a hidden input collects this input. When the submit button on the Preview page is hit, the data from the hidden input is POSTed to a new page, and that page inserts the data contained in the hidden input into the DB. If I do not apply htmlentities when the form is initially submitted (but apply only strip_tags and mysql_real_escape_string) and there's malicious input in the textarea, the hidden input is broken and the last few characters of the hidden input are visible as " /> on the page, which is undesirable. So keeping this in mind, I need to do something to preserve the integrity of the hidden input on the Preview page and yet collect the data in the hidden input so that it does not break it. How do I go about this? Apologies for the delay in posting this info.
Thank you in advance.
Here's the general rule of thumb.
Escape variables at the last possible moment.
You want your variables to be clean representations of the data. That is, if you are trying to store the last name of someone named "O'Brien", then you definitely don't want these:
O&#39;Brien
O\'Brien
.. because, well, that's not his name: there's no ampersands or slashes in it. When you take that variable and output it in a particular context (eg: insert into an SQL query, or print to a HTML page), that is when you modify it.
$name = "O'Brien";
$sql = "SELECT * FROM people "
. "WHERE lastname = '" . mysql_real_escape_string($name) . "'";
$html = "<div>Last Name: " . htmlentities($name, ENT_QUOTES) . "</div>";
You never want to have htmlentities-encoded strings stored in your database. What happens when you want to generate a CSV or PDF, or anything which isn't HTML?
Keep the data clean, and only escape for the specific context of the moment.
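As a small illustration of escaping per context, the same clean value can be encoded differently for each output. The function names are hypothetical; for SQL, a prepared statement would play the same role. JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
/**
 * HTML context: encode quotes, ampersands and angle brackets.
 */
function lastname_as_html(string $name): string
{
    return '<div>Last Name: ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</div>';
}

/**
 * JSON context: json_encode() applies JSON's own escaping rules,
 * which leave an apostrophe untouched.
 */
function lastname_as_json(string $name): string
{
    return json_encode(['lastname' => $name], JSON_THROW_ON_ERROR);
}

$name = "O'Brien"; // stored and passed around unmodified

echo lastname_as_html($name), "\n"; // <div>Last Name: O&#039;Brien</div>
echo lastname_as_json($name), "\n"; // {"lastname":"O'Brien"}
```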
1. Only before you are printing a value (no matter whether it comes from the DB or from $_GET/$_POST) into HTML. htmlentities has nothing to do with the database.
2. B is overkill. You should call mysql_real_escape_string before inserting into the DB, and htmlentities before printing to HTML. You don't need to strip tags; after htmlentities, tags will be displayed on screen as literal text such as <br />, etc.
Theoretically you may apply htmlentities before inserting into the DB, but this might make further data processing harder if you need the original text.
3. See above.
In essence, you should use mysql_real_escape_string prior to database insertion (to prevent SQL injection) and then htmlentities, etc. at the point of output.
You'll also want to apply sanity checking to all user input to ensure (for example) that numerical values are really numeric, etc. Functions such as is_int, is_float, etc. are useful at this point. (See the variable handling functions section of the PHP manual for more information on these functions and other similar ones.)
I've been through this before and learned two important things:
If you're getting values from $_POST/$_GET/$_REQUEST and plan to add to DB, use mysql_real_escape_string function to sanitize the values. Do not encode them with htmlentities.
Why not just encode them with htmlentities and put them in the database? Well, here's the thing: the goal is to keep the data as meaningful and clean as possible, and when you encode the data with htmlentities (with ENT_QUOTES), Jeff's Dog becomes Jeff&#039;s Dog, which causes the data to lose its meaning. And if you decide to implement REST services and you fetch that string from the DB and put it in JSON, it'll come out as Jeff&#039;s Dog, which isn't pretty. You'd have to add another function to decode it as well.
Suppose you want to search for "Jeff's Dog" using SQL "select * from table where field='Jeff\'s Dog'": you won't find it, since "Jeff's Dog" does not match "Jeff&#039;s Dog". Bad, eh?
To output alphanumeric strings (from CHAR type) to a webpage, use htmlentities - ALWAYS!
