Basically, my idea is that if I convert a string whose contents I cannot predict into a hex string, then in the script that receives the data, convert it back to a readable string, it will prevent any potential XSS vulnerabilities, as well as ensure that any special characters such as spaces, ampersands, question marks, etc. don't mess up execution of Script2.php. Is this correct, or is there more I need to do?
In Script1.php:
echo('<TD>Proceed</TD>');
In Script2.php:
echo('<input type="text" name="reason">' . strlen($_REQUEST['reason'])?pack('H*', $_REQUEST['reason']):'<I>No reason specified</I>' . '</input>');
Basically, there is exactly one thing that you need to look out for: when you issue a command to an external system, you have to make sure that the command means exactly what you think it means.
If you are programming in PHP, you frequently deal with two external systems:
1. the web browser to which you send your HTML,
2. the database where you store data.
For point 1, filter data that comes from the database through htmlspecialchars(). There are cases when you don't want to do this, but in those cases you have to know exactly why this does not compromise the security of your users.
For point 2, use prepared statements to insert and update database records. For new code, there are no exceptions, regardless of where the data is coming from. For old code that uses interfaces which do not support prepared statements, use something like mysql_real_escape_string() to prepare values for inserting into or updating the database; again, regardless of where the data is coming from.
These two points are technical requirements (i.e. they are imposed by the technology that you are using). Additionally, there might be business requirements (like a credit card number being valid, a birthdate being before Aug 30th, 1995, a venue only being bookable for up to 7 days, whatever). Technical requirements and business requirements change at different rates, so you should handle them in different components. Don't mix preparing data to be technically fit for insertion into the database with validating whether the data meets your business needs.
Applying this to your specific scenario: in Script1.php, you want to put some data into the query string of a URL in an HTML document. That's what urlencode() is for. In Script2.php, the browser has sent you data that you want to send back to the browser. This is usually not critical for your or your users' security. Still, the data must be passed through htmlspecialchars(), because if the user sends </input> as $_REQUEST['reason'], the resulting markup will be broken and will confuse the user. It is not clear what you intend with strlen and pack; don't do that. It serves no purpose other than to confuse fellow developers (which is bad), users (which is also bad) and potential attackers (who regard it as a challenge rather than a hindrance).
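A minimal sketch of the two steps described above, assuming the value travels as a `reason` query parameter on a link to Script2.php. The sample value, the `<a href>` markup and the use of the `value` attribute are illustrative assumptions, not taken from the original scripts.

```php
<?php
// Hypothetical value that Script1.php wants to pass along (not from the original post).
$reason = 'Out of stock & "discontinued"? </input><script>alert(1)</script>';

// Script1.php: urlencode() makes the value safe inside the query string;
// htmlspecialchars() makes the finished URL safe inside the href attribute.
$url  = 'Script2.php?reason=' . urlencode($reason);
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Proceed</a>';

// Script2.php: PHP URL-decodes request parameters itself, so only HTML escaping remains.
// The query string is parsed by hand here to simulate what $_GET would contain.
parse_str(parse_url($url, PHP_URL_QUERY), $query);
$received = $query['reason'] ?? '';
$field = '<input type="text" name="reason" value="'
       . htmlspecialchars($received, ENT_QUOTES, 'UTF-8') . '">';

echo $link, "\n", $field, "\n";
```

The value arrives in Script2.php unchanged, and nothing in it can close the attribute or open a new tag, so no hex round trip is needed.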
OWASP has some very detailed information on how to best prevent XSS: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#A_Positive_XSS_Prevention_Model
The past couple of days, I’ve read through a lot of resources on the sanitization of input and output data with PHP to prevent (most prominently) XSS and SQL injection, i.a. a bunch of questions on SO. At this point, however, I feel like I am more confused and insecure about what I am supposed to do and what I am not supposed to do, due in part to some contradictory information, e.g. I've read many times that I don't need to use mysqli_real_escape_string or any other forms of sanitization of input whatsoever if I use prepared statements, other sources say I should just use it anyway or even that I should sanitize it like so; this page by Apple rather roughly(?) goes over the topic; etc. Therefore, I would really appreciate some clarification on what I am supposed to do - preferably but not necessarily, by someone who has got some experience in the field (server-side security) due to e.g. working in this field, having done a lot of research in it or maybe even being on the attacker’s side(?).
To understand my situation better, I am going to go over it as concisely as possible:
I am currently programming an app using Swift (iOS) and need to send some data to my server where it is saved in a table using SQL and can be retrieved from by other users (e.g. for a blog).
To do this I send the data via POST, encoded as JSON, to my server (“myphp.php”; with Alamofire, which shouldn’t be very important, though) and decode it there. And this is the first spot where I am not sure if I should already sanitize my data in some way (with reference to the question I linked above). Anyway, then I go on to e.g. insert it in a table using prepared statements (MySQL, so nothing’s emulated). Moreover, I would also like the data I output to be usable in html or rather the entire PHP be usable for AJAX, too.
Here is an example of what I mean:
// SWIFT
// set parameters for request
let parameters: Parameters = [
    "key": "value",
    ...
]
// request with json encoded parameters
Alamofire.request("myphp.php", method: .post, parameters: parameters, encoding: JSONEncoding.default)
    .validate().responseJSON(completionHandler: { (response) in
        // do things with data (e.g. show blog post)
    })
// PHP
header('Content-Type: application/json');
$decodedPost = json_decode(file_get_contents('php://input'), true);
// what to do with input...?
// PREPARED STATEMENTS: insert, select, etc.
// what to do with output...?
// echo response - json-encoded so that
// json completion handler in swift can work with it
echo json_encode($output, JSON_NUMERIC_CHECK);
I've asked a friend for some advice on this and he told me he always does the following (xss_clean() is a function he sent me, too) - whether the data is in- or outputted:
$key = xss_clean(mysqli_real_escape_string($db, trim(htmlspecialchars($data))));
// e.g. $data = $decodedPost["key"]
However, not only my research tells me that this probably isn't necessary, but he also told me this has its limitations, most obviously when data is supposed to be retrieved again from the server and displayed again to e.g. another user - as close to the original input as possible.
As you can see, I am really confused. I want to protect the data of users, which is sent to the server, as well as I can so this is a very important topic for me. I hope this question isn't too broad but many other questions were, like I said, either, at least partly, contradictory or very old and e.g. still using simple mysql extensions and no prepared statements.
If you need more information, feel free to ask. References to official documents (to support answers) are very much appreciated. Thank you!
Input sanitization is a misleading term that indicates that you can wave a magic wand at all data and make it "safe data". The problem is that the definition of "safe" changes when the data is interpreted by different pieces of software as do the encoding requirements. Similarly the concept of "valid" data varies depending on context - your data may very well require special characters (',",&,<) - note that SO allows all of these as data.
Output that may be safe to be embedded in an SQL query may not be safe for embedding in HTML. Or Swift. Or JSON. Or shell commands. Or CSV. And stripping (or outright rejecting) values so that they are safe for embedding in all those contexts (and many others) is too restrictive.
So what should we do? Make sure the data is never in a position to do harm. The best way to achieve this is to avoid interpretation of the data in the first place. Parameterized SQL queries are an excellent example of this; the parameters are never interpreted as SQL, they're simply put in the database as, well, data.
That same data may be used for other formats, such as HTML. In that case, the data should be encoded / escaped for that particular language at the moment it's embedded. So, to prevent XSS, data should be HTML-escaped (or JavaScript- or URL-escaped) at the time it's being put into the output. Not at input time. The same applies to other embedding situations.
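The two ideas above (store the data uninterpreted, encode it only when embedding) can be sketched as follows. This is an illustration under assumptions: it uses an in-memory SQLite database through PDO so that it runs without a server (which requires the pdo_sqlite extension), and the `posts` table and the sample text are made up.

```php
<?php
// In-memory SQLite database (assumption: pdo_sqlite is enabled); a MySQL DSN would work the same way.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

// Hypothetical user input containing characters that matter to SQL and to HTML.
$body = "Robert'); DROP TABLE posts;-- <script>alert('x')</script> & more";

// Input side: the value travels as a bound parameter and is never parsed as SQL.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $body]);

// The stored value is exactly what the user sent.
$stored = $pdo->query('SELECT body FROM posts WHERE id = 1')->fetchColumn();

// Output side: encode for whichever format the value is being embedded in.
$asHtml = '<p>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</p>';
$asJson = json_encode(['body' => $stored]);

echo $asHtml, "\n", $asJson, "\n";
```

Because nothing is escaped on the way in, the same row can later be served to an HTML page, a JSON API or a Swift client, each with its own encoding step.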
So, should we just pass anything we get straight to the database?
No - there are definitely things you can check about user input, but this is highly context-dependent. Let's call this what it is - validation. Make sure this is done on the server. Some examples:
If a field is supposed to be an integer, you can certainly validate this field to ensure it contains an integer (or maybe NULL).
You can often check that a particular value is one of a set of known values (white list validation)
You can require most fields to have a minimum and maximum length.
You should usually verify that any string contains only valid characters for its encoding (e.g., no invalid UTF-8 sequences)
As you can see, these checks are very context-dependent. And all of them are to help increase the odds you end up with data that makes sense. They should not be the only defense to protect your application from malicious input (SQL injection, XSS, command injection, etc), because this is not the place to do that.
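The checks listed above could look roughly like this in PHP. The function name, field names and limits are hypothetical, and the string checks assume the mbstring extension is available.

```php
<?php
// Hypothetical validator for a blog-post payload; field names and limits are illustrative only.
function validatePost(array $in): array
{
    $errors = [];

    // Integer check: filter_var() returns false when the value is not an integer in range.
    $rating = filter_var($in['rating'] ?? null, FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 5]]);
    if ($rating === false) {
        $errors[] = 'rating must be an integer from 1 to 5';
    }

    // Whitelist check: the value must be one of a known set.
    if (!in_array($in['category'] ?? '', ['news', 'howto', 'opinion'], true)) {
        $errors[] = 'unknown category';
    }

    // Encoding check first, then length (mb_strlen counts characters, not bytes).
    $title = $in['title'] ?? '';
    if (!is_string($title) || !mb_check_encoding($title, 'UTF-8')) {
        $errors[] = 'title is not valid UTF-8';
    } elseif (mb_strlen($title, 'UTF-8') < 3 || mb_strlen($title, 'UTF-8') > 100) {
        $errors[] = 'title must be 3 to 100 characters';
    }

    return $errors;
}
```

Note that a title such as `It's <b>fine</b>` passes: quotes and angle brackets are legitimate data here, and dealing with them is left to the SQL and HTML layers.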
The reason I ask this question is that I was checking Stack Overflow for an answer, and since 2012/13 it no longer seems to be a hot topic and the documentation referenced in the answers is deprecated. Could you please tell me if we should still be doing this and, if so, what's a secure way to do so? I'm specifically talking about user-defined POST data...
Update: the string will be html inputted from user and posted into my dB.
The short answer is yes. Even in 2017 you should be escaping strings in PHP. PHP does not do it by itself because not every developer will want to develop a product / functionality that needs to escape user input (for whatever that reason may be).
If you are echoing user inputted data to a webpage, you should use the function htmlspecialchars() to stop potential malicious coding from executing upon being read by your browser.
When you are receiving data from a client, you can also use the filter_input() family of functions to check that the client's data is actually the data you want (e.g. checking that no one has bypassed your client-side validation and entered illegal characters).
From my experience these are two great tools: 1) escaping output sent to a client and 2) reducing the chance of malicious code being stored or processed on your server.
It depends entirely on what you are going to do with the string.
If you are going to treat it as code (whether that code is HTML, JavaScript, PHP, SQL or something else) then it will need escaping.
PHP is not able to tell if you trust the source of the data to write safe code.
In 2017 this is what is usually done in the scenario you describe:
The user inputs text in a form and the text is sent to the server; before that, the text is URL-encoded (this is one form of escaping). This is typically done by the browser/JavaScript, so there is no need to do it manually (but it does happen).
The server receives the text, decodes it and then creates a MySQL insert/update statement to store it in the database. While some people still run mysqli_real_escape_string on it, the recommended way is to use prepared statements instead. In this respect you do not need to do the escaping yourself; prepared statements hand the problem to the database driver, which either sends the values separately from the SQL text or escapes them itself when it emulates prepared statements.
If the user-inputted text is to be presented back on a page, then it is encoded via htmlentities or similar (which is itself another form of escaping). This is mostly run manually, although most newer view template frameworks (e.g. Twig or Blade) take care of that for us.
So that's how it is today as far as I know. Escaping is very much required, but the programmer actually doing it is not so much a requirement if modern frameworks and practices are used.
Yes, escaping strings that come from the request (and which the user can therefore control) is a practical requirement. PHP hands you the request payload exactly as it was sent, without any modification that could corrupt the data (not all data needs escaping), so any subsequent processing of that data must be done by, and under the control of, the developer.
Variables are escaped in database operations to prevent SQL injection.
Past versions of PHP had the "magic_quotes" feature, which escaped every variable in GET and POST. It was deprecated in PHP 5.3, removed in PHP 5.4, and was never a best practice.
The state of the art for querying a database is predominantly PDO with prepared statements. At the time the variable is bound, it is kept out of the SQL text (or escaped automatically when the driver emulates prepared statements).
$stmt = $conn->prepare('SELECT * FROM users WHERE name = :name');
$stmt->bindParam(':name', $_GET['username']); // binding also takes care of the escaping
$stmt->execute();
Alternatively, mysqli::real_escape_string (mysqli_real_escape_string() in procedural style) does the escaping manually. The older mysql_real_escape_string() did the same, but the mysql extension was removed in PHP 7.
I've been working with PHP for some time and I began asking myself if I'm developing good habits.
One of these is what I believe to be overuse of PHP sanitizing methods. For example, a user registers through a form, and I get the following POST variables:
$_POST['name'], $_POST['email'] and $_POST['captcha']. Now, what I usually do is obviously sanitize the data I am going to place into MySQL, but when comparing the captcha, I also sanitize it.
Therefore I believe I misunderstood PHP sanitizing. I'm curious: are there any other cases when you need to sanitize data except when using it to place something in MySQL (note I know sanitizing is also needed to prevent XSS attacks)? And moreover, is my habit of sanitizing almost every variable coming from user input a bad one?
Whenever you store your data someplace, and if that data will be read/available to (unsuspecting) users, then you have to sanitize it. So something that could possibly change the user experience (not necessarily only the database) should be taken care of. Generally, all user input is considered unsafe, but you'll see in the next paragraph that some things might still be ignored, although I don't recommend it whatsoever.
Stuff that happens on the client only is sanitized just for a better UX (user experience, think about JS validation of the form - from the security standpoint it's useless because it's easily avoidable, but it helps non-malicious users to have a better interaction with the website) but basically, it can't do any harm because that data (good or bad) is lost as soon as the session is closed. You can always destroy a webpage for yourself (on your machine), but the problem is when someone can do it for others.
To answer your question more directly - never worry about overdoing it. It's always better to be safe than sorry, and the cost is usually not more than a couple of milliseconds.
The term you need to search for is FIEO. Filter Input, Escape Output.
You can easily confound yourself if you do not understand this basic principle.
Imagine PHP is the man in the middle, it receives with the left hand and doles out with the right.
A user uses your form and fills in a date field, so it should only accept digits and maybe dashes, e.g. nnnn-nn-nn. If you get something which does not match that, then reject it.
That is an example of filtering.
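As a rough sketch of such a filter, assuming the expected format is YYYY-MM-DD (the function name `filterDate` is made up for this example):

```php
<?php
// Returns the date unchanged when it is a real calendar date in YYYY-MM-DD form, otherwise null.
function filterDate(string $input): ?string
{
    // Shape check: digits and dashes only, in the expected positions.
    if (!preg_match('/\A(\d{4})-(\d{2})-(\d{2})\z/', $input, $m)) {
        return null;
    }
    // Calendar check: rejects values such as 2023-02-30. checkdate() takes month, day, year.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]) ? $input : null;
}
```

Anything that comes back as null is rejected outright rather than repaired.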
Next, PHP does something with it, let's say storing it in a MySQL database.
What MySQL needs is to be protected from SQL injection, so you use PDO's or mysqli's prepared statements to make sure that EVEN IF your filter failed, you cannot permit an attack on your database. This is an example of escaping, in this case escaping for SQL storage.
Later, PHP gets the data from your database and displays it on an HTML page. So you need to escape the data for the next medium, HTML (this is where XSS attacks get through if you don't).
In your head you have to divide each of the PHP 'protective' functions into one or other of these two families, Filtering or Escaping.
Freetext fields are of course more complex than filtering for a date, but never mind, stick to the principles and you will be OK.
Hoping this helps http://phpsec.org/projects/guide/
I need to prevent XSS attacks as much as possible and in a centralized way so that I don't have to explicitly sanitize each input.
My question is: is it better to sanitize all inputs at the URL/request processing level, encode/sanitize inputs before serving, or sanitize at the presentation level (output sanitization)?
Which one is better and why?
There are two areas where you need to be aware:
Anywhere where you use input as part of a script in any language, most notably including SQL. In the particular case of SQL, the only recommended way of dealing with things is the use of parameterized queries (which will result in unescaped content being in the database, but just as strings: that's ideal). Anything involving the magic quoting of characters before substituting them directly into the SQL string is inferior (because it's so easy to get wrong). Anything that can't be done with a parameterized query is something that a service secured against SQL-injection should never allow a user to specify.
Anywhere where you present something that was input as output. The source of the input could be direct (including via a cookie) or indirect (via the database or a file). In this case, your default approach should be to make the text that the user sees be the text that was input. That's very easy to implement correctly since the only characters you actually have to quote are < and &, and you can wrap it all in <pre> for display.
But that's often not enough. For example, you might want to allow users to do some sort of formatting. This is where it is ever so easy to go wrong. The simplest approach in this case is to parse the input and detect all the formatting instructions; everything else needs to be quoted properly. You should store the formatted version additionally in the database as an extra column so that you don't need to do much work when returning it to the user, but you should also store the original version that the user input so you can search over it. Do not mix them up! Really! Audit your application to make totally sure that you get this right (or, better yet, get someone else to do the audit).
But everything about being careful with SQL still applies, and there are many HTML tags (e.g., <script>, <object>) and attributes (e.g., onclick) that are never ever safe.
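One way to sketch the "detect the formatting instructions, quote everything else" approach is to escape the whole input first and only then generate tags yourself. The mini-markup below (`**bold**` and `*italic*`) and the function name are invented for illustration; a real application would more likely use an established markup library.

```php
<?php
// Hypothetical mini-markup: **bold** and *italic* only. Everything else is shown literally.
function renderFormatted(string $raw): string
{
    // Quote everything first, so no user-supplied tag or attribute survives.
    $html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // Then turn the recognised formatting instructions into tags we generate ourselves.
    $html = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $html);
    $html = preg_replace('/\*(.+?)\*/s', '<em>$1</em>', $html);

    return $html;
}
```

The return value is what would go into the extra "formatted" column, while the untouched `$raw` string is stored alongside it as the original.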
You were looking for advice about specific packages to do the work? You really need to pick a language then. The above advice is all totally language-independent. Add-on packages/libraries can make many of the steps above really easy in practice, but you still absolutely need to be careful.
What do you all think is the correct (read: most flexible, loosely coupled, most robust, etc.) way to make user input from the web safe for use in various parts of a web application? Obviously we can just use the respective sanitization functions for each context (database, display on screen, save on disk, etc.), but is there some general "pattern" for handling unsafe data and making it safe? Is there an established way to enforce treating it as unsafe unless it is properly made safe?
As has already been said, there are several things to take into account when you are concerned about web security. Here are some basic principles to keep in mind:
Avoid direct input from users being integrated into queries and variables.
So this means don't have something like $variable = $_POST['user_input']. For any situation like this, you are handing over too much control to the user. If the input affects some database query, always have whitelists to validate user input against. If the query is for a user name, validate against a list of good user names. Do NOT simply make a query with the user input dropped right in.
One (possible) exception is for a search string. In this case, you need to sanitize, simple as that.
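A small sketch of both cases, under assumptions: the sort option is checked against a whitelist because it ends up in the SQL text, while the free-text search string is bound as a parameter. The function names, the `users` table and its columns are hypothetical.

```php
<?php
// Hypothetical helper: maps a user-supplied sort key to a column name we wrote ourselves.
function orderByClause(string $requested): string
{
    $allowed = [
        'name'   => 'name',
        'newest' => 'created_at DESC',
    ];
    // Unknown keys fall back to a default instead of reaching the SQL string.
    return 'ORDER BY ' . ($allowed[$requested] ?? 'id');
}

// Hypothetical search handler; $pdo is assumed to be an existing PDO connection.
function searchUsers(PDO $pdo, string $term, string $sort): array
{
    $sql  = 'SELECT id, name FROM users WHERE name LIKE :term ' . orderByClause($sort);
    $stmt = $pdo->prepare($sql);
    // The free-text search string is bound, not concatenated.
    $stmt->execute([':term' => '%' . $term . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Only strings written by the developer ever become part of the query text; whatever the user typed is either looked up in the whitelist or passed as data.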
Avoid storing user input without sanitation.
If the user is creating a profile or uploading info for other users, you have to either have a white-list of what kind of data is acceptable, or strip out anything that could be malicious. This not only for your system's security, but for your other users (See next point.)
NEVER output anything from a user to the browser without stripping it.
This is probably the most important thing that security consultants have emphasized to me. You cannot simply rely on sanitizing input when it is received from the user. If you did not write the output yourself, always ensure that the output is innocuous by encoding any HTML characters or wrapping it in a <plaintext> tag (encoding is the safer choice, since <plaintext> is obsolete in HTML5). It is simple negligence on the part of the developer if user A uploads a bit of JavaScript that harms any other users that view that page. You will sleep better at night knowing that any and all user output can do nothing but appear as text on all browsers.
Never allow anyone but the user to control the form.
XSS is easier than it should be and a real pain to cover in one paragraph. Simply put, whenever you create a form, you are giving users access to a script that will handle form data. If I steal someone's session or someone's cookie, I can now talk to the script as though I was on the form page. I know the type of data it expects and the variable names it will look for. I can simply pass it those variables as though I were the user and the script can't tell the difference.
The above is not a matter of sanitation but of user validation. My last point is directly related to this idea.
Avoid using cookies for user validation or role validation.
If I can steal a user's cookie, I may be able to do more than make that one user have a bad day. If I notice the cookie has a value called "member", I can very easily change that value to "admin". Maybe it won't work, but for many scripts, I would have instant access to any admin-level info.
Simply put, there is no one easy way to secure a web form, but there are basic principles that simplify what you should be doing and thus ease the stress of securing your scripts.
Once more for good measure:
Sanitize all input
Encode all output
Validate any input used for execution against a strict whitelist
Make sure the input is coming from the actual user
Never make any user or role-based validation browser-side/user-modifiable
And never assume that any one person's list is exhaustive or perfect.
I'm more than a little sceptical that such a general purpose framework could both exist and be less complex than a programming language.
The definition of "safe" is so different between different layers
Input field validation, numbers, dates, lists, postcodes, vehicle registrations
Cross field validation
Domain validation - is that a valid meter reading? Miss Jones used £300,000,000 of electricity this month?
Inter-request validation - are you really booking two transatlantic flights for yourself on the same day?
Database consistency, foreign key validation
SQL injection
Also consider the actions when violations are discovered.
At the UI layer we almost certainly do not just quietly strip out non-digit chars from numeric fields; we raise a UI error
In the UI we probably want to validate all fields and flag each individual error
In other layers we might throw an exception or initiate a business process
Perhaps I'm missing your vision? Have you seen anything that gets close to what you have in mind?
You cannot use a single method to sanitize data for all uses, but a good start is:
Use filter_var() to Validate/Sanitize
filter_var() takes a number of different types of data and strips out bad characters (like non-digits for things you expect to be numbers), and makes sure it is of a valid format (IP addresses).
Note: email addresses are far more complicated than filter_var()'s implementation handles, so search around for a proper function.
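A few illustrative calls, using sample values made up for this example. The validate filters return the value on success and false on failure, while the sanitize filter strips characters instead of rejecting the input.

```php
<?php
// Validation: the value comes back unchanged when it passes, false otherwise.
$ip    = filter_var('192.168.0.1', FILTER_VALIDATE_IP);
$badIp = filter_var('999.1.1.1', FILTER_VALIDATE_IP);
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);

// Sanitisation: everything except digits, plus and minus signs is removed.
$digits = filter_var('12abc3', FILTER_SANITIZE_NUMBER_INT);

var_dump($ip, $badIp, $email, $digits);
```

Whether to reject (validate) or strip (sanitize) depends on the field; silently turning `12abc3` into `123` is not always what the user meant.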
Use mysql_real_escape_string before inputting anything into a MySQL Database
I wouldn't suggest using this until you are about to put data into a database, and it is probably better to just use prepared mysqli statements anyway.