Sorry for my bad English!
I'm learning PHP and have some questions about inserting data into a database and outputting it again.
I am using PHP PDO.
To insert data into the database I'm using the following function:
public static function validate( $string ){
    $string = trim($string);
    $string = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
    return $string;
}
So when I insert the data O'Really <script>alert(is it safe?)</script>, I see that it is (maybe properly) escaped when saved in the database, like this: O&#039;Really &lt;script&gt;alert(is it safe?)&lt;/script&gt;
Now, when I output this data, should I use any PHP function?
If not, is it safe?
If I do use a PHP function like htmlentities, the data is displayed like this: O&#039;Really &lt;script&gt;alert(is it safe?)&lt;/script&gt;
Of course, that is not what I want.
And when I edit this data, I see that it is saved to the database like this:
O&amp;#039;Really &amp;lt;script&amp;gt;alert(is it safe?)&amp;lt;/script&amp;gt;
Can you tell me the proper way to safely insert data into and output data from the database?
There are (at least) two different risks you want to handle while storing user-given data from a web page in a database:
Cross-site scripting (XSS) attacks, as AXAI mentioned above. In this scenario the problem isn't actually the database layer, but the dynamic text fields that are inserted into the HTML code. In your code snippet, you handled this problem by turning the tag marks (< and >) into entities before storing them in the database. I recommend doing the opposite (as tadman says): store the plain text untouched (but see the next point), and apply htmlspecialchars() when writing the fields into the HTML output.
SQL injection attacks. Basically, you want to escape any special characters correctly, e.g. ' must be turned into \' in a SQL command. If this escaping is done correctly, it does not distort what is saved in the database; it ensures that exactly the characters the user entered (whether normal or special) end up in the database. The article http://php.net/manual/en/security.database.sql-injection.php describes this in more detail, and also gives even better methods (i.e. variable binding).
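A minimal sketch of the output side of this advice, assuming the raw text has already been read back from the database. The helper name escape_html is hypothetical and not part of PHP; the only real API used is htmlspecialchars().

```php
<?php
// Hypothetical helper: escape a raw string for HTML text or a quoted attribute.
function escape_html(string $raw): string
{
    // ENT_QUOTES also converts single quotes; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The raw value, exactly as it would be stored in (and read back from) the database.
$stored = "O'Really <script>alert(is it safe?)</script>";

// Escape only at the moment the value is written into the page.
$safe = escape_html($stored);
echo '<p>' . $safe . '</p>', PHP_EOL;
```

Because the stored value stays unencoded, editing and re-saving it does not pile up extra layers of entities.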
1) I have a textarea in my HTML. Inside the textarea I wrote: <i>ABC Enterprise</i>. When saved into the SQL database, it was stored as &lt;i&gt;ABC Enterprise&lt;/i&gt;
2) Does anyone know how to retain <> and </> when saving into the database, without conversion? If this is not possible, does anyone know how to convert &lt;i&gt;ABC Enterprise&lt;/i&gt; back to <i>ABC Enterprise</i> in PHP? I need the string to keep the form <i>ABC Enterprise</i> in PHP, not in HTML.
I have tried preg_replace("/&([a-z])[a-z]+;/i", "$1", htmlentities($company)), iconv('utf-8', 'ascii//TRANSLIT', $company), htmlspecialchars($compnay), and many other approaches I stumbled upon on Stack Overflow, but nothing seemed to work. Any help?
To specifically answer your question:
How to retain <> and </> when inserting into the DB? [paraphrased, emphasis added]
Simple: don't modify your data. As discussed below, however, be smart about it and insert the data using a prepared statement.
Why is your data being changed? Most likely because your code is doing some form of modification of the data before putting it in the database. In PHP, this generally means one of:
htmlentities
htmlspecialchars
The general advice for years was simply "escape all your data or suffer the XSS/CSRF/SQL injection/other attack consequences!" The problem is that there are nuances of when and how to escape, and in their zeal for security many websites overdo it. As you've described your situation, I would consider:
When inserting into the DB: use prepared statements, rather than manual escaping.
When pulling from the DB: be judicious when you apply escaping techniques.
A prepared statement is where you tell the database the format of what you're going to send, then send the data in a separate communication. If there's anything awry, the DB knows best how to find it. For example:
$pstmt = $dbh->prepare('INSERT INTO tab (html) VALUES (?)');
$pstmt->execute(array($_POST['my_textarea']));
Note the lack of any sanitization, using the $_POST variable directly. What the user sent to you is what you put in the DB, with zero modification. Because the DB server was sent a format first, it will not allow any ulterior SQL injection shenanigans.
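The round trip can be sketched as follows, under the assumption that the pdo_sqlite driver is available so an in-memory SQLite database can stand in for the real server. The table name tab and column html follow the snippet above; the sample input is made up for illustration.

```php
<?php
// Assumption: an in-memory SQLite database replaces the real DB connection.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE tab (id INTEGER PRIMARY KEY, html TEXT)');

// Input containing markup, a quote, and an injection attempt.
$input = "<i>ABC Enterprise</i> O'Really'); DROP TABLE tab; --";

// The statement text is fixed; the value travels separately as a parameter.
$pstmt = $dbh->prepare('INSERT INTO tab (html) VALUES (?)');
$pstmt->execute([$input]);

// What comes back is byte-for-byte what went in.
$stored = $dbh->query('SELECT html FROM tab')->fetchColumn();
var_dump($stored === $input);
```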
However, when pulling data out of the DB, you need to be careful of exactly what data goes where. For example, to allow < and > characters inside of the content might be foolhardy, depending on your context. I'll leave it to you to decide whether you want to escape the output inside of your <textarea>:
echo "<textarea>$textarea_content_as_retrieved_from_db</textarea>";
or
echo '<textarea>' . htmlentities( $textarea_content_as_retrieved_from_db ) . '</textarea>';
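To make the difference between the two variants concrete, here is a small sketch with a hypothetical stored value that tries to break out of the textarea. It uses htmlspecialchars() rather than htmlentities(); for UTF-8 pages either one neutralises the markup.

```php
<?php
// Hypothetical value as retrieved from the database, containing a breakout attempt.
$content = 'Hello</textarea><script>alert(1)</script>';

// Unescaped: the stored "</textarea>" ends the element early and the script runs.
$unsafe = "<textarea>$content</textarea>";

// Escaped: the browser shows the original characters inside the textarea.
$safe = '<textarea>' . htmlspecialchars($content, ENT_QUOTES, 'UTF-8') . '</textarea>';

echo $safe, PHP_EOL;
```

When the form is submitted again, the browser sends back the decoded characters, so the value saved to the database stays unencoded.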
I have always been told to "sanitize" input to a database and one of the ways to do this (as well as using prepared statements) is using htmlspecialchars() and htmlentities().
This stores quotes as &quot; so printing the output of the database to a page "naked" has never been a problem for XSS attacks etc.
However, I have been asked to have part of my application export certain values as pure data in .csv format and now it's full of said HTML entities.
It seems that I have two options:
Decode all values before exporting the data and leave everything else the way it is.
Exclude "sanitation" before input to the database and make sure to sanitize on the output instead (except for data exports).
As much information as there is out there, I can't find the generally accepted way to do this. Is it best to do this processing on the way into or on the way out of the database? Obviously, doing both gives me silly values like &amp;quot;
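Both effects can be reproduced in a few lines. This is only a sketch with a made-up stored value; html_entity_decode() is the standard counterpart of htmlentities() and is what the first option would rely on.

```php
<?php
// A value that was encoded with htmlspecialchars(..., ENT_QUOTES) before storage.
$storedEncoded = 'Jeff&#039;s &quot;Dog&quot; &amp; Co';

// Option 1: decode on the way out for non-HTML targets such as a CSV export.
$forCsv = html_entity_decode($storedEncoded, ENT_QUOTES, 'UTF-8');

// Encoding an already encoded value a second time encodes the ampersand again.
$doubleEncoded = htmlspecialchars('&quot;', ENT_QUOTES, 'UTF-8');

echo $forCsv, PHP_EOL;
echo $doubleEncoded, PHP_EOL;
```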
I was wondering if converting POST input from an HTML form into html entities, (via the PHP function htmlentities() or using the FILTER_SANITIZE_SPECIAL_CHARS constant in tandem with the filter_input() PHP function ), will help defend against any attacks where a user attempts to insert any JavaScript code inside the form field or if there's any other PHP based function or tactic I should employ to create a safe HTML form experience?
Sorry for the loaded run-on sentence question but that's the best I could word it in a hurry.
Any responses would be greatly appreciated and thanks to all in advance.
racl101
It would turn the following:
<script>alert("Muhahaha");</script>
into
&lt;script&gt;alert(&quot;Muhahaha&quot;);&lt;/script&gt;
So if you're printing out this data into HTML later, you would be protected. It wouldn't protect you from:
"; alert("Muhahaha");
just in case you were echoing into a script like so:
var t = "Hello there <?php echo $str;?>";
For this purpose, you should use addslashes() and a database string escaping method like mysql_real_escape_string().
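As an alternative to addslashes() for the JavaScript case, a commonly used approach is to let json_encode() produce the whole string literal. This is a swapped-in technique, not what the answer above proposes; the JSON_HEX_* flags are standard PHP constants that turn <, >, &, ' and " into \uXXXX escapes.

```php
<?php
// The payload from the example above.
$str = '"; alert("Muhahaha");';

// json_encode() emits a complete, quoted JavaScript string literal.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$jsLiteral = json_encode($str, $flags);

// Concatenate in JavaScript instead of echoing into an open string.
echo 'var t = "Hello there " + ' . $jsLiteral . ';', PHP_EOL;
```

The quotes in the payload arrive in the script as \u0022 escapes, so they cannot terminate the string early.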
Yes, that is one way to sanitise. It has the benefit that you can always display the database contents without fear of XSS attacks. However, a 'purer' approach is to store the raw data in the database and sanitise in the view, so every time you want to show the text, use htmlentities() on it.
However, your approach does not take into account SQL injection attacks. You might want to look at http://php.net/manual/en/function.mysql-real-escape-string.php to guard against that.
Yes, do this when you want to display data on a web page, but I recommend you don't store the HTML in the database encoded. This may seem fine for large text fields, but consider shorter fields such as titles: in a 32-character column, a normal 30-character string that contains an & would have it turned into &amp;, growing to 34 characters, and this would either cause a SQL error or cut the data off.
So the rule of thumb is: store everything raw (obviously while preventing SQL injection) and treat EVERYTHING as tainted, no matter where it comes from: the database, user forms, RSS feeds, flat files, XML, etc. This is how you build good security without worrying about the data overflowing, or about the day you need to export the data to a non-web consumer for whom the HTML encoding is a problem.
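The length problem is easy to check. A small sketch with a made-up 30-character title and an assumed 32-character column limit:

```php
<?php
// Hypothetical column limit, e.g. VARCHAR(32).
$columnLimit = 32;

// A made-up title that is exactly 30 characters long and contains one ampersand.
$title = 'Tom & Jerry: The Complete Show';

// Encoding before storage turns "&" into "&amp;", adding four characters.
$encoded = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

printf("raw: %d chars, encoded: %d chars\n", strlen($title), strlen($encoded));
var_dump(strlen($title) <= $columnLimit);
var_dump(strlen($encoded) <= $columnLimit);
```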
PLATFORM:
PHP & mySQL
For my experimentation purposes, I have tried out few of the XSS injections myself on my own website. Consider this situation where I have my form textarea input. As this is a textarea, I am able to enter text and all sorts of (English) characters. Here are my observations:
A). If I apply only strip_tags and mysql_real_escape_string, and do not use htmlentities on my input just before inserting the data into the database, the query breaks and I am hit with an error that shows my table structure, due to the abnormal termination.
B). If I apply strip_tags, mysql_real_escape_string and htmlentities on my input just before inserting the data into the database, the query does NOT break and I am able to successfully insert the data from the textarea into my database.
So I do understand that htmlentities must be used at all costs, but I am unsure when exactly it should be used. With the above in mind, I would like to know:
1. When exactly should htmlentities be used? Should it be used just before inserting the data into the DB, or should I somehow get the data into the DB and then apply htmlentities when I show the data from the DB?
2. If I follow the method described in point B) above (which I believe is the most obvious and efficient solution in my case), do I still need to apply htmlentities when I show the data from the DB? If so, why? If not, why not? I ask because I am really confused after going through the post at: http://shiflett.org/blog/2005/dec/google-xss-example
3. Then there is one more PHP function called html_entity_decode. Can I use that to show my data from the DB (after following my procedure from point B), given that htmlentities was applied on my input? Which one should I prefer, html_entity_decode or htmlentities, and when?
PREVIEW PAGE:
I thought it might help to add more details about a specific situation. Consider a 'Preview' page. When I submit the input from a textarea, the Preview page receives the input and shows it as HTML, and at the same time a hidden input holds a copy of it. When the submit button on the Preview page is hit, the data from the hidden input is POSTed to a new page, which inserts it into the DB. If I do not apply htmlentities when the form is initially submitted (but apply only strip_tags and mysql_real_escape_string) and there is malicious input in the textarea, the hidden input is broken and its last few characters show up on the page as " />, which is undesirable. So I need to preserve the integrity of the hidden input on the Preview page and still carry the data in it without breaking it. How do I go about this? Apologies for the delay in posting this info.
Thank you in advance.
Here's the general rule of thumb.
Escape variables at the last possible moment.
You want your variables to be clean representations of the data. That is, if you are trying to store the last name of someone named "O'Brien", then you definitely don't want these:
O&#039;Brien
O\'Brien
... because, well, that's not his name: there are no ampersands or slashes in it. When you take that variable and output it in a particular context (e.g. insert it into an SQL query, or print it to an HTML page), that is when you modify it.
$name = "O'Brien";
$sql = "SELECT * FROM people "
. "WHERE lastname = '" . mysql_real_escape_string($name) . "'";
$html = "<div>Last Name: " . htmlentities($name, ENT_QUOTES) . "</div>";
You never want to have htmlentities-encoded strings stored in your database. What happens when you want to generate a CSV or PDF, or anything which isn't HTML?
Keep the data clean, and only escape for the specific context of the moment.
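Applied to the hidden input on the Preview page from the question, this could look like the sketch below. The field name comment and the sample payload are made up for illustration; the point is that the value is escaped only for the attribute context, and the browser posts the original characters back.

```php
<?php
// Hypothetical raw textarea input that would otherwise break the hidden field.
$comment = '"/><script>alert(1)</script>';

// Escape for the HTML attribute context at the moment of output.
$hidden = '<input type="hidden" name="comment" value="'
    . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '">';

echo $hidden, PHP_EOL;

// On submit the browser decodes the attribute, so the next page receives the
// raw text again; html_entity_decode() stands in for the browser here.
$roundTrip = html_entity_decode(
    htmlspecialchars($comment, ENT_QUOTES, 'UTF-8'),
    ENT_QUOTES,
    'UTF-8'
);
var_dump($roundTrip === $comment);
```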
1. Only right before you print a value (no matter whether it comes from the DB or from $_GET/$_POST) into HTML. htmlentities has nothing to do with the database.
2. B is overkill. You should apply mysql_real_escape_string before inserting into the DB, and htmlentities before printing to HTML. You don't need to strip tags: after htmlentities, tags will be displayed on screen as literal text such as <br />, etc.
Theoretically you may apply htmlentities before inserting into the DB, but this might make further data processing harder if you need the original text.
3. See above
In essence, you should use mysql_real_escape_string prior to database insertion (to prevent SQL injection) and then htmlentities, etc. at the point of output.
You'll also want to apply sanity checking to all user input to ensure (for example) that numerical values are really numeric, etc. Functions such as is_int, is_float, etc. are useful at this point. (See the variable handling functions section of the PHP manual for more information on these functions and other similar ones.)
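One caveat when applying this to form data: values in $_POST and $_GET arrive as strings, so is_int('42') is false. A minimal sketch of one way to check such input uses filter_var() with FILTER_VALIDATE_INT; the helper name parse_positive_int is hypothetical.

```php
<?php
// Hypothetical helper: return the value as a positive int, or null if it is not one.
function parse_positive_int($value): ?int
{
    // FILTER_VALIDATE_INT returns the integer on success and false on failure.
    $result = filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $result === false ? null : $result;
}

var_dump(is_int('42'));                 // request data is a string, not an int
var_dump(parse_positive_int('42'));
var_dump(parse_positive_int('42abc'));
var_dump(parse_positive_int('-3'));
```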
I've been through this before and learned two important things:
If you're getting values from $_POST/$_GET/$_REQUEST and plan to add to DB, use mysql_real_escape_string function to sanitize the values. Do not encode them with htmlentities.
Why not just encode them with htmlentities and put them in the database? Well, here's the thing: the goal is to keep the data as meaningful and clean as possible. When you encode the data with htmlentities (with ENT_QUOTES), Jeff's Dog becomes Jeff&#039;s Dog, and the data loses its meaning outside an HTML context. If you decide to implement REST services and you fetch that string from the DB and put it in JSON, it will come out as Jeff&#039;s Dog, which isn't pretty. You'd have to add another function to decode it as well.
Suppose you want to search for "Jeff's Dog" using the SQL "select * from table where field='Jeff\'s Dog'": you won't find it, since "Jeff's Dog" does not match "Jeff&#039;s Dog". Bad, eh?
To output alphanumeric strings (from CHAR type) to a webpage, use htmlentities - ALWAYS!
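The JSON and search problems described above can be seen in a short sketch; the array key name is made up for illustration.

```php
<?php
$raw = "Jeff's Dog";

// What would end up in the database if the value were encoded before storage.
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// A JSON consumer receives the entity verbatim when the stored value is encoded.
$jsonRaw = json_encode(['name' => $raw]);
$jsonEncoded = json_encode(['name' => $encoded]);
echo $jsonRaw, PHP_EOL;
echo $jsonEncoded, PHP_EOL;

// An exact-match search for the raw text no longer finds the encoded value.
var_dump($raw === $encoded);
```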
I have a textbox on my website and I need to store whatever the user enters into my database, and retrieve it at a later time. I need to store it exactly as the user entered, including special characters, carriage returns, etc.
What process should I use in PHP to store this in my database field (which is a 'text' field)? Should I use PHP's html_encode or anything like that?
Thank you.
Edit: I also need to store the formatting correctly, i.e. tabs and multiple spaces.
Use mysql_real_escape_string():
$safetext = mysql_real_escape_string($_POST['text']);
$query = "INSERT INTO my_table (`my_field`) VALUES ('$safetext')";
mysql_query($query);
That should work.
You shouldn't html-encode the data when writing it to the datastorage - that way you could use your data also for something else (e.g. emails, PDF documents and so on). As Assaf already said: it's mandatory to avoid SQL injections by escaping the input or using parameterized insert-queries.
You should, no, let's say, must however html-encode your data when showing it on an HTML page! That will render dangerous HTML or Javascript code useless as the HTML-tags present in the data will not be recognized as HTML-tags by the browser any more.
The process is a little more complicated when you allow users to post data with HTML tags inside. You then have to skip the output encoding in favor of input sanitizing, which can be arbitrarily complex depending on your needs (e.g. which tags are allowed).
You don't have to encode it in order to store it in MySQL.
Be sure you use a parameterized insert command, to avoid SQL injection.
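A sketch of such a parameterized insert with PDO and a named placeholder, checking that tabs, newlines and repeated spaces survive. It assumes the pdo_sqlite driver so an in-memory SQLite database can stand in for MySQL; the table and column names follow the mysql_query example earlier in this thread. Note that the mysql_* functions used in that example were removed in PHP 7.0.

```php
<?php
// Assumption: in-memory SQLite replaces the MySQL connection for this sketch.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_field TEXT)');

// Text with a newline, a tab, repeated spaces, quotes and markup.
$text = "Line one\n\tindented with a tab\nmultiple   spaces & 'quotes' <b>kept</b>";

// The value is bound as a parameter, so no escaping or encoding is needed.
$stmt = $dbh->prepare('INSERT INTO my_table (my_field) VALUES (:text)');
$stmt->bindValue(':text', $text, PDO::PARAM_STR);
$stmt->execute();

$readBack = $dbh->query('SELECT my_field FROM my_table')->fetchColumn();
var_dump($readBack === $text);
```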
The following should work:
if (get_magic_quotes_gpc()) {
    $content = stripslashes($content);
}
$content = mysql_real_escape_string($content);
If your column is utf8, you shouldn't have problems with special characters. Once you've formatted the content correctly, you can feed it to mysql using your standard insert methods.
To correctly store the user text in addition to the formatting, all you have to do is convert all the newlines to breaks using nl2br($inputtext). Do this after filtering the input.
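An alternative that fits the earlier advice in this thread to keep the stored text unmodified is to apply nl2br() at display time instead, after HTML-escaping. This is a sketch of that variant, not the approach described in the answer above; the sample text is made up.

```php
<?php
// Raw text as stored in the database, with its original newline intact.
$stored = "<b>hi</b>\nthere";

// Escape first, then add line breaks, so the inserted <br /> tags are the
// only markup that reaches the browser.
$display = nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'));

echo $display, PHP_EOL;
```

Tabs and runs of spaces are collapsed by the browser either way; the CSS rule white-space: pre-wrap on the containing element is one way to show them as typed.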