Is it safe to decode HTML entities all the time?


Here's the scenario: I'm importing data from multiple sources; some encode special characters and some don't.
For example, some will send 6.67&quot; and others will send the same data as 6.67".
Is there any possible downside (I don't care about a potential performance hit) if I simply run all strings through html_entity_decode?
If there is a downside, what would be the best way to ensure that I ultimately end up with uniform values?
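To make the scenario concrete, here is a minimal sketch of what html_entity_decode does with both forms. The sample strings are hypothetical and the flags are one reasonable choice, not the only one. It also shows the kind of case where decoding everything changes text that was never encoded.

```php
<?php
// Hypothetical sample values; the real feeds may differ.
$encoded = '6.67&quot;';  // source that encodes special characters
$plain   = '6.67"';       // source that sends the raw character

// Both forms end up identical after decoding.
$a = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$b = html_entity_decode($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($a === $b);

// The catch: plain text that merely looks like an entity is changed too.
$literal = 'Type &lt;br&gt; to insert a line break';
$changed = html_entity_decode($literal, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $changed, "\n";
```

So the uniform result only holds if no source ever sends entity-looking text that is meant literally.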

Related

Is this a good method of ensuring $_REQUEST data is safe?

Basically, my idea is that if I convert a string whose contents I cannot predict into a hex string, then in the script that receives the data, convert it back to a readable string, it will prevent any potential XSS vulnerabilities, as well as ensure that any special characters such as spaces, ampersands, question marks, etc. don't mess up execution of Script2.php. Is this correct, or is there more I need to do?
In Script1.php:
echo('<TD>Proceed</TD>');
In Script2.php:
echo('<input type="text" name="reason">' . strlen($_REQUEST['reason'])?pack('H*', $_REQUEST['reason']):'<I>No reason specified</I>' . '</input>');

Basically, there is exactly one thing that you need to look out for: when you issue a command to an external system, you have to make sure that the command means exactly what you think it means.
If you are programming in PHP, you frequently deal with two external systems:
1. the web browser to which you send your HTML,
2. the database where you store data.
For point 1, filter data that comes from the database through htmlspecialchars(). There are cases when you don't want to do this, but in those cases you have to know exactly why this does not compromise the security of your users.
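As a rough illustration of point 1, a minimal sketch of escaping at output time. The $comment value is made up and stands in for anything read from the database.

```php
<?php
// Hypothetical value as it might come back from the database.
$comment = '<script>alert("hi")</script> & "quotes"';

// Escape at the moment of output, for the HTML context.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo '<p>', $safe, "</p>\n";
```

The browser then displays the text exactly as it was stored instead of interpreting it as markup.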
For point 2, use prepared statements to insert and update database records. For new code, there are no exceptions, regardless of where the data is coming from. For old code that uses interfaces that do not support prepared statements, use something like mysql_real_escape_string() to prepare values for inserting into or updating the database; again, regardless of where the data is coming from.
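A minimal sketch of point 2 using PDO. It assumes the pdo_sqlite extension is available so that an in-memory database can stand in for the real one; the table and column names are invented for the example.

```php
<?php
// Assumption: pdo_sqlite is installed; an in-memory database replaces the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)');

// Hypothetical hostile value.
$name = "Robert'); DROP TABLE products;--";

// The value is bound as data, never spliced into the SQL text.
$insert = $pdo->prepare('INSERT INTO products (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$select = $pdo->prepare('SELECT name FROM products WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn();

var_dump($stored === $name);
```

The string is stored and read back unchanged, and the table is still there afterwards.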
These two points are technical requirements (i.e. they are imposed by the technology that you are using). Additionally, there might be business requirements (like a credit card number being valid, a birthdate being before Aug 30th, 1995, a venue that can only be booked for up to 7 days, whatever). Technical requirements and business requirements change at different rates, so you should handle them in different components. Don't mix preparing data to be technically fit for insertion into the database with validating whether the data meets your business needs.
Applying this to your special scenario, it seems that in Script1.php, you want to use some data in the query string of a URL in an HTML document. That's what urlencode() is for. In Script2.php, the browser has sent you data that you want to send back to the browser. This is usually not critical for your or your users' security. Still, the data must be passed through htmlspecialchars, because if the user sends </input> as $_REQUEST['reason'] it will confuse the user. It is not clear what you intend with strlen and pack; don't do that, it serves no purpose other than to confuse fellow developers (which is bad), users (which is also bad) and potential attackers (who regard it as a challenge rather than a hindrance).
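A minimal sketch of how the two scripts could look with urlencode() and htmlspecialchars(). It assumes Script1.php links to Script2.php with the reason in the query string; the $reason value is made up, and parse_str() only simulates what PHP does when it fills $_REQUEST.

```php
<?php
// Hypothetical value that would break naive output.
$reason = 'Late & "broken" </input>';

// Script1.php: the value goes into a query string, inside an HTML attribute.
$url  = 'Script2.php?reason=' . urlencode($reason);
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Proceed</a>';

// Script2.php: PHP has already URL-decoded $_REQUEST['reason'];
// that step is simulated here with parse_str().
parse_str(parse_url($url, PHP_URL_QUERY), $request);
$received = $request['reason'];

$field = '<input type="text" name="reason" value="'
       . htmlspecialchars($received, ENT_QUOTES, 'UTF-8') . '">';

echo $link, "\n", $field, "\n";
```

No hex conversion is needed: each value is encoded once, for the place it is being written into.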

OWASP has some very detailed information on how to best prevent XSS: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#A_Positive_XSS_Prevention_Model

What is the best way to sanitize user inputs?

I need to prevent XSS attacks as much as possible and in a centralized way so that I don't have to explicitly sanitize each input.
My question: is it better to sanitize all inputs at the URL/request processing level, to encode/sanitize inputs before serving, or to sanitize at the presentation level (output sanitization)?
Which one is better and why?

There are two areas where you need to be aware:
Anywhere where you use input as part of a script in any language, most notably including SQL. In the particular case of SQL, the only recommended way of dealing with things is the use of parameterized queries (which will result in unescaped content being in the database, but just as strings: that's ideal). Anything involving the magic quoting of characters before substituting them directly into the SQL string is inferior (because it's so easy to get wrong). Anything that can't be done with a parameterized query is something that a service secured against SQL-injection should never allow a user to specify.
Anywhere where you present something that was input as output. The source of the input could be direct (including via a cookie) or indirect (via the database or a file). In this case, your default approach should be to make the text that the user sees be the text that was input. That's very easy to implement correctly since the only characters you actually have to quote are < and &, and you can wrap it all in <pre> for display.
But that's often not enough. For example, you might want to allow users to do some sort of formatting. This is where it is ever so easy to go wrong. The simplest approach in this case is to parse the input and detect all the formatting instructions; everything else needs to be quoted properly. You should store the formatted version additionally in the database as an extra column so that you don't need to do much work when returning it to the user, but you should also store the original version that the user input so you can search over it. Do not mix them up! Really! Audit your application to make totally sure that you get this right (or, better yet, get someone else to do the audit).
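A minimal sketch of that idea, assuming a made-up mini-markup where *word* means bold. This variant quotes everything first and then converts the one recognised instruction, which is a simplification of a real parser. The function name and column names are hypothetical.

```php
<?php
// Hypothetical mini-markup: *word* means bold. Everything else is plain text.
function render_formatted(string $raw): string
{
    // 1. Quote everything first, so no user-supplied markup survives.
    $escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
    // 2. Then turn the recognised formatting instruction into a known-safe tag.
    return preg_replace('/\*([^*]+)\*/', '<b>$1</b>', $escaped);
}

$raw = 'I *really* like <script>alert(1)</script>';
$row = [
    'body_raw'  => $raw,                    // original input, for searching and editing
    'body_html' => render_formatted($raw),  // formatted version, for display
];

echo $row['body_html'], "\n";
```

Both versions are kept side by side, and only body_html is ever written into a page.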
But everything about being careful with SQL still applies, and there are many HTML tags (e.g., <script>, <object>) and attributes (e.g., onclick) that are never ever safe.
You were looking for advice about specific packages to do the work? You really need to pick a language then. The above advice is all totally language-independent. Add-on packages/libraries can make many of the steps above really easy in practice, but you still absolutely need to be careful.

Safe and effective approach to transfer data between server and client with AJAX/PHP/MySQL?

Well, I'm new to PHP, and thinking about this stuff has really confused me these days.
Here is my question in detail:
First, the situation is that I need to transfer data between clients and the server. Ajax should be used to present the data on the client side. On the server side, PHP and MySQL will be used to parse the request, grab the data and then send it back.
Then the problem comes. User data sent from the browser can contain &, %, ', " which may damage the formatted POST request or enable an attack on the database. Although we can use JS to detect them, the user is still able to send data bypassing the validation, and we cannot simply remove these characters because they may be useful to the user.
So then I checked my weapon library.
At server side, it turns out that I have too many functions to deal with them:
Apart from regular expressions I have
htmlspecialchars, htmlspecialchars_decode, addslashes, stripslashes
urlencode, urldecode, mysql_escape_string, mysql_real_escape_string
At client side with javascript, I don't have that many but still have:
escape, unescape
OK. Learning that I have so many weapons is good but...how to choose, combine and use them is really a headache for me.
For example suppose I want to have a product name called:
'%Hooray%'+&ABC
Well maybe no product will be named like this in real life but let's use it as an example.
The & mark will break the POST message.
The + may impact ajax parsing.
Single quotes may allow SQL injection.
The % mark may cause problems but I'm not sure if it will.
But I still want that name exactly the same after sending it to the database and fetching it back,
which means the name in the database can be different, but its presentation in the browser should be the same.
Well this question may be a little bit too long but hope somebody could share some good experience: how to deal with user input string using those functions?

Before sending your data via ajax, encode it.
Assume the data is a javascript array.
Why an array?
Because if you already have a query string (e.g. 'name=me&foo=b&ar'), how can you tell that 'b&ar' is a single value to YOU when the javascript engine sees 'b' and 'ar'?
for (var i in arr)
{ arr[i] = encodeURIComponent(arr[i]); }
This function replaces all harmful URL characters (, / ? : @ & = + $ #) and some more.
Then you can send your query by building it up with something like arr.join('&');
But I'd use jQuery and send the array right away.
On the PHP side, the basic rule you must obey is that you NEVER save your data without mysql_real_escape_string();

Server side, what you want are prepared statements, aka parameterized queries, in order to keep your DB safe (avoid SQL injection).
Client side, I would recommend JSON with defined objects and DOM manipulation, not innerText or innerHTML. The browser will escape everything it needs to in order to do a post.
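To illustrate with the product name from the question, here is a minimal sketch of the server-side round trip. http_build_query() and parse_str() only simulate a form-encoded POST here, and the array key 'name' is an assumption.

```php
<?php
$name = "'%Hooray%'+&ABC";

// Form-encoded POST body: http_build_query() percent-encodes the value,
// parse_str() decodes it the way PHP does when it fills $_POST.
$body = http_build_query(['name' => $name]);
parse_str($body, $post);

// JSON body: json_encode()/json_decode() round-trip the same string.
$json    = json_encode(['name' => $name]);
$decoded = json_decode($json, true);

// Output: escape only when writing into HTML.
$html = htmlspecialchars($post['name'], ENT_QUOTES, 'UTF-8');

var_dump($post['name'] === $name, $decoded['name'] === $name);
echo $body, "\n", $html, "\n";
```

The name arrives unchanged either way; storing it with a prepared statement keeps it unchanged in the database, and htmlspecialchars() is applied only when it is printed.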

YAML or serialize() to store data in MySQL

I am trying to store temporary data (such as cart products and session data) in the DB. I chose YAML for this instead of the serialize() function, because YAML data is easily readable by humans and portable between programming languages.
Will I run into trouble if I store my temporary data in the database as YAML?

Personally I would use serialize for two reasons:
1. It's included in PHP by default.
2. What you put in is what you get out.
Regarding the second point: serialize doesn't just convert to a string, it records the type as well, and PHP calls functions on objects so you can choose what to serialise and what to do with the data when you unserialise it.
See: __sleep and __wakeup
It may not be easy to read directly from the database but it wouldn't take two minutes to write a script that could pull it out, unserialise it and do a print_r on the data to view what's stored.
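A minimal sketch of the points above, using a hypothetical Cart class. It shows types surviving the round trip and the two magic methods being used to choose what is stored.

```php
<?php
// Hypothetical cart class, only for illustration.
class Cart
{
    public $items = [];
    public $loadedAt;          // runtime-only value, not worth storing

    public function __sleep()
    {
        return ['items'];      // choose which properties get serialized
    }

    public function __wakeup()
    {
        $this->loadedAt = time();  // rebuild runtime state on unserialize
    }
}

$cart = new Cart();
$cart->items = ['sku-1' => 2, 'sku-2' => 1.5];

$stored   = serialize($cart);      // string to put in a TEXT column
$restored = unserialize($stored);

print_r($restored->items);
var_dump(is_int($restored->items['sku-1']), is_float($restored->items['sku-2']));
```

Only unserialize strings that your own application wrote; unserialize() on untrusted input is a known risk.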

Personally, I wouldn't use YAML. It's too format-dependent (requiring new lines, whitespace, etc.) and there's no native parser in PHP. Instead, I'd use JSON for this. It's trivial to handle natively, and is quite human readable (not as much as YAML, but much more so than serialized data). It's the best of both worlds.
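For comparison, a minimal sketch of the JSON route with made-up cart data:

```php
<?php
// Hypothetical cart data.
$cart = ['items' => [['sku' => 'A-1', 'qty' => 2]], 'coupon' => null];

$stored   = json_encode($cart);          // store this string in the DB
$restored = json_decode($stored, true);  // true => associative arrays

echo $stored, "\n";
var_dump($restored === $cart);
```

Unlike serialize(), JSON does not remember PHP classes, so objects come back as plain arrays (or stdClass without the second argument).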
But, with that said, you really should ask yourself the question as to why you want to store a serialized representation of a complex data structure in a field in the DB... For most cases, it might be better to store a normalized representation of the data (so it's searchable easily, etc). It's not "bad" to store serialized data, but it might not be optimal or the right choice depending on what you're trying to do. It's generally far better than using an Entity-Attribute-Value store, but you need to really think about what you're doing to decide if it's the right thing.

Just make sure you are escaping everything potentially dangerous (i.e. user input) and you are fine.

I'm learning PHP on my own and I've become aware of the strip_tags() function. Is this the only way to increase security?

I'm new to PHP and I'm following a tutorial here:
Link
It's pretty scary that a user can write php code in an input and basically screw your site, right?
Well, now I'm a bit paranoid and I'd rather learn security best practices right off the bat than try to cram them in once I have some habits in me.
Since I'm brand new to PHP (literally picked it up two days ago), I can learn pretty much anything easily without getting confused.
What other way can I prevent shenanigans on my site? :D

There are several things to keep in mind when developing a PHP application; strip_tags() only helps with one of those. Actually strip_tags(), while effective, might even do more than needed: converting possibly dangerous characters with htmlspecialchars() may even be preferable, depending on the situation.
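A small sketch of the difference, with a made-up input string:

```php
<?php
$input = 'Use <b>bold</b> for emphasis';

$stripped = strip_tags($input);                            // tags are removed
$escaped  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); // tags are shown as text

echo $stripped, "\n", $escaped, "\n";
```

strip_tags() throws information away, while htmlspecialchars() keeps the text the user typed and only makes it harmless for display.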
Generally it all comes down to two simple rules: filter all input, escape all output. Now you need to understand what exactly constitutes input and output.
Output is easy, everything your application sends to the browser is output, so use htmlspecialchars() or any other escaping function every time you output data you didn't write yourself.
Input is any data not hardcoded in your PHP code: things coming from a form via POST, from a query string via GET, from cookies, all those must be filtered in the most appropriate way depending on your needs. Even data coming from a database should be considered potentially dangerous; especially on shared server you never know if the database was compromised elsewhere in a way that could affect your app too.
There are different ways to filter data: white lists to allow only selected values, validation based on the expected input format, and so on. One thing I never suggest is trying to fix the data you get from users: have them play by your rules, and if you don't get what you expect, reject the request instead of trying to clean it up.
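A minimal sketch of "reject, don't repair". The field names and rules are hypothetical; the point is only that the function reports what is wrong and never alters the input.

```php
<?php
// Hypothetical fields and rules.
function validate_order(array $input): array
{
    $errors = [];
    // White list: only these values are acceptable.
    if (!in_array($input['size'] ?? null, ['small', 'medium', 'large'], true)) {
        $errors[] = 'size';
    }
    // Format check: reject anything that is not a valid e-mail address.
    if (filter_var($input['email'] ?? '', FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'email';
    }
    // Format check: the date must look like YYYY-MM-DD.
    if (!preg_match('/\A\d{4}-\d{2}-\d{2}\z/', $input['date'] ?? '')) {
        $errors[] = 'date';
    }
    return $errors;   // an empty array means the request is accepted
}

$good = validate_order(['size' => 'medium', 'email' => 'a@example.com', 'date' => '2024-01-31']);
$bad  = validate_order(['size' => 'huge', 'email' => 'nope', 'date' => 'tomorrow']);

var_dump($good, $bad);
```

In a real script the array would come from $_POST, and a non-empty result would end the request with an error message.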
If you deal with a database, special attention must be paid to SQL injection: that kind of attack relies on you not properly constructing the query strings you send to the database, so that the attacker can forge them to execute malicious instructions. You should always use an escaping function such as mysql_real_escape_string() or, better, use prepared statements with the mysqli extension or with PDO.
There's more to say on this topic, but these points should get you started.
HTH
EDIT: to clarify, by "filtering input" I mean decide what's good and what's bad, not modify input data in any way. As I said I'd never modify user data unless it's output to the browser.

strip_tags is not really the best thing to use; it doesn't protect in all cases.
HTML Purifier (http://htmlpurifier.org/) is a really good option for processing incoming data. It still will not cater for all use cases by itself, but it's definitely a good starting point.

I have to say that the tutorial you mentioned is a little misleading about security:
"It is important to note that you never want to directly work with the $_GET & $_POST values. Always send their value to a local variable, & work with it there. There are several security implications involved with the values when you directly access (or output) $_GET & $_POST."
This is nonsense. Copying a value to a local variable is no more safe than using the $_GET or $_POST variables directly.
In fact, there's nothing inherently unsafe about any data. What matters is what you do with it. There are perfectly legitimate reasons why you might have a $_POST variable that contains ; rm -rf /. This is fine for outputting on an HTML page or storing in a database, for example.
The only time it's unsafe is when you're using a command like system or exec. And that's the time you need to worry about what variables you're using. In this case, you'd probably want to use something like a whitelist, or at least run your values through escapeshellarg.
Similarly with sending queries to databases, sending HTML to browsers, and so on. Escape the data right before you send it somewhere else, using the appropriate escaping method for the destination.
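For the shell case, a minimal sketch with a made-up scenario (the user picks a log file to inspect). The command is only built and printed here, never executed.

```php
<?php
// Hypothetical scenario: the user picks a log file to inspect.
$allowed = ['access.log', 'error.log'];
$file    = '; rm -rf /';   // hostile input

// Preferred: a whitelist of known-good values.
$isAllowed = in_array($file, $allowed, true);

// At the very least: quote the value so the shell sees one literal argument.
$command = 'wc -l ' . escapeshellarg($file);

var_dump($isAllowed);
echo $command, "\n";   // printed only, not passed to system() or exec()
```

The same string would be perfectly fine to store in a database or print in a page, as long as it is escaped for that destination instead.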

strip_tags removes every piece of HTML. More sophisticated solutions are based on whitelisting (i.e. allowing specific HTML tags). A good whitelisting library is HTML Purifier: http://htmlpurifier.org/
And of course, on the database side of things, use functions like mysql_real_escape_string or pg_escape_string.

Well, probably I'm wrong, but... in all the literature I've read, people say it's much better to use htmlspecialchars.
Also, it's rather necessary to cast input data (to int, for example, if you are sure it's a user id).
And, looking ahead, when you start using a database, use mysql_real_escape_string instead of mysql_escape_string to prevent SQL injection (some old books still show mysql_escape_string).
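A tiny sketch of the casting idea, with a made-up request value standing in for something like $_GET['id']:

```php
<?php
// Hypothetical request value; in a real script this would come from $_GET['id'].
$raw = '42; DROP TABLE users';

$id     = (int) $raw;         // cast: everything after the leading digits is dropped
$strict = ctype_digit($raw);  // strict check: false, the string is not all digits

var_dump($id, $strict);
```

The cast always yields an integer, while the strict check lets you reject the request outright, which fits the "reject instead of clean up" advice given earlier in this thread.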
