I have a textbox on my website and I need to store whatever the user enters into my database, and retrieve it at a later time. I need to store it exactly as the user entered, including special characters, carriage returns, etc.
What process should I use in PHP to store this in my database field (which is a 'text' field)? Should I use PHP's html_encode or anything like that?
Thank you.
Edit: I also need to store correct formatting i.e tabs and multiple spaces.
Use mysql_real_escape_string():
$safetext = mysql_real_escape_string($_POST['text']);
$query = "INSERT INTO my_table (`my_field`) VALUES ('$safetext')";
mysql_query($query);
That should work.
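The mysql_* functions used above were removed in PHP 7.0, so on a current PHP version the same insert would go through PDO or mysqli. The sketch below is one way to do that with a bound parameter. It assumes the pdo_sqlite extension and uses an in-memory SQLite database as a stand-in for MySQL so it can run without a server; the table and column names are taken from the snippet above, and the sample text is made up.

```php
<?php
// Sketch only: SQLite in memory stands in for MySQL (assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_field TEXT)');

// Stands in for $_POST['text']: quotes, a tab, CRLF, repeated spaces, markup.
$text = "O'Brien said:\t\"hi\"\r\n  two  spaces <b>&</b>";

// The placeholder keeps the value out of the SQL string entirely,
// so no manual escaping is needed and nothing in the text is altered.
$insert = $pdo->prepare('INSERT INTO my_table (my_field) VALUES (:text)');
$insert->execute([':text' => $text]);

// Read the single row back and compare it byte for byte with the input.
$stored = $pdo->query('SELECT my_field FROM my_table')->fetchColumn();
var_dump($stored === $text);
```

With MySQL the only change should be the DSN passed to the PDO constructor; the prepare/execute calls stay the same.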
You shouldn't HTML-encode the data when writing it to the data storage; that way you can also use your data for something else (e.g. emails, PDF documents and so on). As Assaf already said: it's mandatory to avoid SQL injections by escaping the input or using parameterized insert queries.
You should, no, let's say, must however html-encode your data when showing it on an HTML page! That will render dangerous HTML or Javascript code useless as the HTML-tags present in the data will not be recognized as HTML-tags by the browser any more.
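As a rough illustration of encoding at output time, the sketch below wraps htmlspecialchars() in a small helper. The helper name e() and the sample comment are made up for this example.

```php
<?php
// Hypothetical helper: encodes text for an HTML body or a quoted attribute.
function e(string $text): string
{
    // ENT_QUOTES also encodes single quotes; UTF-8 is assumed as the page charset.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Raw text as it would come back from the database.
$comment = '<script>alert("x")</script> & more';

// The browser now shows the tags as text instead of running them.
echo '<p>' . e($comment) . '</p>';
```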
The process is a little more complicated when you allow users to post data with HTML tags inside. You then have to skip the output encoding in favor of input sanitizing, which can be arbitrarily complex depending on your needs (e.g. which tags are allowed).
You don't have to encode it in order to store it in MySQL.
Be sure you use a parameterized insert command, to avoid SQL injection.
The following should work:
if (get_magic_quotes_gpc()) {
    $content = stripslashes($content);
}
$content = mysql_real_escape_string($content);
If your column is utf8, you shouldn't have problems with special characters. Once you've formatted the content correctly, you can feed it to mysql using your standard insert methods.
To correctly store the user text in addition to the formatting, all you have to do is convert all the newlines to breaks using nl2br($inputtext). Do this after filtering the input.
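One way to follow this advice without changing what is stored is to do the conversion when the text is displayed. The sketch below shows two options under that assumption: nl2br() on already-escaped text, and a CSS white-space: pre-wrap container, which also keeps tabs and runs of spaces that nl2br() does not handle. Both function names are hypothetical.

```php
<?php
// Option 1: escape first, then add <br /> before each newline.
function render_with_breaks(string $raw): string
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// Option 2: escape and let CSS preserve newlines, tabs and repeated spaces.
function render_preformatted(string $raw): string
{
    $escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
    return '<div style="white-space: pre-wrap">' . $escaped . '</div>';
}

$raw = "col1\tcol2\nwide    gap <b>";
echo render_with_breaks($raw), "\n";
echo render_preformatted($raw), "\n";
```

Escaping before nl2br() matters: done the other way round, the inserted <br /> tags would themselves be encoded and shown as text.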
Sorry for my bad English!
I'm learning PHP and have some questions about inserting data into and outputting data from the database.
I am using PHP PDO.
To insert data into the database I'm using the following function:
public static function validate( $string ){
    $string = trim($string);
    $string = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
    return $string;
}
So when I insert this data, O'Really <script>alert(is it safe?)</script>, I see the data is properly (maybe) escaped and saved in the database, like this: &lt;script&gt;alert(1)&lt;/script&gt;
Now, when I output this data, should I use any PHP function?
If not, is it safe?
If I do use a PHP function like htmlentities, the data is shown like this: O&#039;Really &lt;script&gt;alert(is it safe?)&lt;/script&gt;
Of course, that is not what I want.
Now, when I edit this data I see the data is saved to the database like this way:
O&amp;#039;Really &amp;lt;script&amp;gt;alert(is it safe?)&amp;lt;/script&amp;gt;
Can you guys tell me the proper way / guide to safely insert/output data to/from the database?
There are (at least) two different risks you want to handle while storing user-given data from a web page in a database:
Cross Site Scripting (XSS) attacks, as AXAI mentioned above. In this scenario the problem isn't actually the database layer, but the dynamic text fields that are inserted into the HTML code. In your code snippet, you handled this problem by turning the tag marks (< and >) into entities before you stored them in the database. I recommend doing the opposite (as tadman says): storing the plain text untouched (but see next section), and use the htmlspecialchars() when outputting the fields in the HTML output.
SQL injection attacks. Basically, you want to escape any special characters correctly, e.g. ' must be turned into \' in a SQL command. If this escaping is done correctly, it does not distort what is saved in the database, but ensures that exactly the characters input by the user (whether normal or special) are put in the database. The article http://php.net/manual/en/security.database.sql-injection.php describes this in more detail, and also gives even better methods (i.e. variable binding).
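The difference between encoding before storage and encoding at output can be seen with plain strings. In the sketch below, encode_for_html() is a hypothetical stand-in for the validate() helper from the question, and the variables only simulate what would be stored and shown.

```php
<?php
// Hypothetical stand-in for the question's validate() helper.
function encode_for_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

$input = "O'Really <script>alert(1)</script>";

// Approach from the question: encode on insert, then encode again on output.
$storedEncoded = encode_for_html($input);
$shownTwice    = encode_for_html($storedEncoded);

// Recommended approach: store $input untouched, encode once on output.
$shownOnce = encode_for_html($input);

echo $shownOnce, "\n";  // browser displays the original characters
echo $shownTwice, "\n"; // browser displays the entities themselves
```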
What's the best route for storing data in MySQL? Should I just use TEXT as my field type?
Also, when I use mysql_real_escape_string(), the returned values contain \r\n.
Should I be running htmlentities() on it after that?
And then, when I return the data to the screen, should I use nl2br()?
Just trying to figure out the best route here for storing this information.
Thank you for your help!
TEXT or TINYTEXT or anything similar should be fine for storing ASCII data from the user. If you don't need a lot of space, you may think about VARCHAR.
I think that mysql_real_escape_string() escapes characters that may compromise the security of an SQL query (single quote, double quote, etc.) but doesn't do much more than that.
htmlentities() converts reserved HTML characters like < and > into their HTML-encoded equivalents, &lt; and &gt; respectively. These characters are not dangerous for SQL queries, so you probably do not need to escape them unless you want to display the HTML tag entered by the user as text and not let it be interpreted as HTML.
NL2BR() is probably not necessary either.
Most importantly, your decision on when to use each of these functions will depend on your end application. You may need / want some but not others ( though you should definitely use mysql_real_escape_string() )
It really depends on what you are trying to store. For things such as usernames, passwords, etc., you can use VARCHAR. But if you're storing long text such as news posts or HTML data, you can use TEXT or LONGTEXT (depending on how long it is).
You should ALWAYS use mysql_real_escape_string() when inserting into the DB. If you're outputting HTML from the DB, you may want to run htmlentities or htmlspecialchars to ensure that you aren't outputting user-injected JavaScript that could redirect your users to hacker websites and such.
One other idea is that you could escape your data using htmlentities before inserting into the DB, but it's your choice.
NL2BR is great for inserting a <br /> tag before every \r\n when you output the text.
So, it seems like you're on the right track...
I was wondering if converting POST input from an HTML form into html entities, (via the PHP function htmlentities() or using the FILTER_SANITIZE_SPECIAL_CHARS constant in tandem with the filter_input() PHP function ), will help defend against any attacks where a user attempts to insert any JavaScript code inside the form field or if there's any other PHP based function or tactic I should employ to create a safe HTML form experience?
Sorry for the loaded run-on sentence question but that's the best I could word it in a hurry.
Any responses would be greatly appreciated and thanks to all in advance.
racl101
It would turn the following:
<script>alert("Muhahaha");</script>
into
&lt;script&gt;alert(&quot;Muhahaha&quot;);&lt;/script&gt;
So if you're printing out this data into HTML later, you would be protected. It wouldn't protect you from:
"; alert("Muhahaha");
just in case you were echoing into a script like so:
var t = "Hello there <?php echo $str;?>";
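For that JavaScript context you need JavaScript string escaping: addslashes() is a rough approximation, and json_encode() is more robust. For the database you need a string escaping method like mysql_real_escape_string().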
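A minimal sketch of the script-context case, using json_encode() to produce the JavaScript string literal rather than addslashes(). This is a substitute technique, not what the answer above prescribes; the JSON_HEX_* flags are added so the result is also safe when the script sits inside an HTML page.

```php
<?php
// The payload from the example above.
$str = '"; alert("Muhahaha");';

// JSON_HEX_* turns <, >, &, ' and " into \uXXXX escapes, so the value can
// neither close the string literal nor close the surrounding <script> tag.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$js = json_encode('Hello there ' . $str, $flags);

// json_encode() supplies the surrounding double quotes itself.
echo 'var t = ' . $js . ";\n";
```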
Yes, that is one way to sanitise. It has the benefit that you can always display the database contents without fear of XSS attacks. However, a 'purer' approach is to store the raw data in the database and sanitise in the view, so every time you want to show the text, use htmlentities() on it.
However, your approach does not take SQL injection attacks into account. You might want to look at http://php.net/manual/en/function.mysql-real-escape-string.php to guard against that.
Yes, do this when you want to display data on a webpage, but I recommend you don't store the HTML in the database encoded. This may seem fine for large text fields, but with shorter fields, say a 32-character title column, a normal 30-character string that contains an & would have it become &amp;, and this would either cause a SQL error or cause the data to be cut off.
So the rule of thumb is: store everything raw (obviously prevent SQL injection) and treat EVERYTHING as tainted, no matter where it comes from: the database, user forms, RSS feeds, flat files, XML, etc. This is how you build good security without worrying about the data overflowing, or about the fact that you might one day need to export the data to a non-web consumer where the HTML encoding is a problem.
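The length growth is easy to check. The title below is a made-up 30-character string, and the 32-character column width is an assumption for illustration.

```php
<?php
$title   = 'Tom & Jerry: The Movie Special';          // 30 bytes, one ampersand
$encoded = htmlentities($title, ENT_QUOTES, 'UTF-8'); // & becomes &amp;

$columnWidth = 32; // assumed VARCHAR(32) column

// Each & grows by four bytes, so one ampersand is enough to overflow here.
printf("raw: %d bytes, encoded: %d bytes, fits: %s\n",
    strlen($title),
    strlen($encoded),
    strlen($encoded) <= $columnWidth ? 'yes' : 'no');
```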
I am trying to figure out what is the best way to manage the data a user inputs concerning non desirable tags he might insert:
strip_tags() - the tags are removed and they are not inserted in the database
the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()
Which is better, and are there any disadvantages to either of these?
Regards
This depends on what your priority is:
if it's important to display special characters from user input (like on StackOverflow, for example), then you'll need to store this information in the database and sanitize it on display - in this case, you'll want to at least use htmlspecialchars() to display the output (if not something more sophisticated)
if you just want plain text comments, use strip_tags() before you stick it in the database - this way you'll reduce the amount of data that you need to store, and reduce processing time when displaying the data on the screen
the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()
This. You usually want people to be able to type less-than signs and ampersands and have them displayed as such on the page. htmlspecialchars on every text-to-HTML output step (whether that text came directly from user input, or from the database, or from somewhere else entirely) is the right way to achieve this. Messing about with the input is a not-at-all-appropriate tactic for dealing with an output-encoding issue.
Of course, you will need a different escape — or parameterisation — for putting text in an SQL string.
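To make the two options from the question concrete, the sketch below runs the same made-up input through strip_tags() and htmlspecialchars().

```php
<?php
$input = 'Hello <b>world</b>';

// Option 1: tags are removed; the markup the user typed is lost.
$stripped = strip_tags($input);

// Option 2: tags are kept as text; the browser shows them literally.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";
echo $escaped, "\n";
```

Only the second option lets the original input be recovered later, which is the point the answer above makes about output encoding.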
The measures taken to secure user input depend entirely on the context in which the data is being used. For instance:
If you're inserting it into a SQL database, you should use parameterized statements. PHP's mysql_real_escape_string() works decently, as well.
If you're going to display it on an HTML page, then you need to strip or escape HTML tags.
In general, any time you're mixing user input with another form of mark-up or another language, that language's elements need to be escaped or stripped from the input before put into that context.
The last point above segues into the next point: Many feel that the original input should always be maintained. This makes a lot of sense when, later, you decide to use the data in a different way and, for instance, HTML tags aren't a big deal in the new context. Also, if your site is in some way compromised, you have a record of the exact input given.
Specifically related to HTML tags in user input intended for display on an HTML page: If there is any conceivable reason for a user to input HTML tags, then simply escape them. If not, strip them before display.
PLATFORM:
PHP & mySQL
For my experimentation purposes, I have tried out few of the XSS injections myself on my own website. Consider this situation where I have my form textarea input. As this is a textarea, I am able to enter text and all sorts of (English) characters. Here are my observations:
A). If I apply only strip_tags and mysql_real_escape_string and do not use htmlentities on my input just before inserting the data into the database, the query is breaking and I am hit with an error that shows my table structure, due to the abnormal termination.
B). If I apply strip_tags, mysql_real_escape_string and htmlentities on my input just before inserting the data into the database, the query is NOT breaking and I am able to successfully insert data from the textarea into my database.
So I do understand that htmlentities must be used at all costs, but I am unsure when exactly it should be used. With the above in mind, I would like to know:
1. When exactly should htmlentities be used? Should it be used just before inserting the data into the DB, or should I somehow get the data into the DB and then apply htmlentities when I am trying to show the data from the DB?
2. If I follow the method described in point B) above (which I believe is the most obvious and efficient solution in my case), do I still need to apply htmlentities when I am trying to show the data from the DB? If so, why? If not, why not? I ask this because it's really confusing for me after I have gone through the post at: http://shiflett.org/blog/2005/dec/google-xss-example
3. Then there is one more PHP function called html_entity_decode. Can I use that to show my data from the DB (after following my procedure as indicated in point B), as htmlentities was applied on my input? Which one should I prefer, html_entity_decode or htmlentities, and when?
PREVIEW PAGE:
I thought it might help to add some more specific details here. Consider a 'Preview' page. When I submit the input from a textarea, the Preview page receives the input and shows it as HTML, and at the same time a hidden input collects this input. When the submit button on the Preview page is hit, the data from the hidden input is POSTed to a new page, and that page inserts the data contained in the hidden input into the DB. If I do not apply htmlentities when the form is initially submitted (but apply only strip_tags and mysql_real_escape_string) and there's malicious input in the textarea, the hidden input is broken and the last few characters of the hidden input are visible as " /> on the page, which is undesirable. So I need to preserve the integrity of the hidden input on the Preview page and still collect the data in it without breaking it. How do I go about this? Apologies for the delay in posting this info.
Thank you in advance.
Here's the general rule of thumb.
Escape variables at the last possible moment.
You want your variables to be clean representations of the data. That is, if you are trying to store the last name of someone named "O'Brien", then you definitely don't want these:
O&#039;Brien
O\'Brien
.. because, well, that's not his name: there's no ampersands or slashes in it. When you take that variable and output it in a particular context (eg: insert into an SQL query, or print to a HTML page), that is when you modify it.
$name = "O'Brien";
$sql = "SELECT * FROM people "
. "WHERE lastname = '" . mysql_real_escape_string($name) . "'";
$html = "<div>Last Name: " . htmlentities($name, ENT_QUOTES) . "</div>";
You never want to have htmlentities-encoded strings stored in your database. What happens when you want to generate a CSV or PDF, or anything which isn't HTML?
Keep the data clean, and only escape for the specific context of the moment.
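The hidden input on the preview page from the question is one more output context, so the same rule applies there: keep the variable raw and encode it only while writing the attribute. The sketch below assumes a hypothetical hidden_input() helper and a made-up field name.

```php
<?php
// Hypothetical helper: builds a hidden field whose value cannot break out
// of the attribute, because quotes and angle brackets are encoded.
function hidden_input(string $name, string $value): string
{
    return '<input type="hidden" name="'
        . htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
        . '" value="'
        . htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
        . '" />';
}

// Input that would otherwise end the attribute and leave " /> on the page.
$raw = 'He said "hi" /> <script>';
echo hidden_input('body', $raw), "\n";
```

The browser decodes the entities when the form is submitted, so the next page receives the raw text again and can pass it to the database layer unchanged.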
1. Only right before you print a value (no matter whether it comes from the DB or from $_GET/$_POST) into HTML. htmlentities has nothing to do with the database.
2. B is overkill. You should apply mysql_real_escape_string before inserting into the DB, and htmlentities before printing to HTML. You don't need to strip tags; after htmlentities, tags will be displayed on screen as literal text, e.g. <br />.
Theoretically you may apply htmlentities before inserting into the DB, but this might make further data processing harder if you need the original text.
3. See above.
In essence, you should use mysql_real_escape_string prior to database insertion (to prevent SQL injection) and then htmlentities, etc. at the point of output.
You'll also want to apply sanity checking to all user input to ensure (for example) that numerical values are really numeric, etc. Functions such as is_int, is_float, etc. are useful at this point. (See the variable handling functions section of the PHP manual for more information on these functions and other similar ones.)
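One caveat with is_int() and is_float() is that form values arrive as strings, so those checks return false even for "42". A sketch of an alternative using filter_var() follows; the function name parse_int_input() is made up for this example.

```php
<?php
// Hypothetical helper: returns the integer, or null when the input is not one.
function parse_int_input($value): ?int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    return $result === false ? null : $result;
}

// Values as they would arrive in $_POST: always strings.
var_dump(is_int('42'));             // the type check fails on a numeric string
var_dump(parse_int_input('42'));    // validated and converted
var_dump(parse_int_input('42abc')); // rejected
```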
I've been through this before and learned two important things:
If you're getting values from $_POST/$_GET/$_REQUEST and plan to add to DB, use mysql_real_escape_string function to sanitize the values. Do not encode them with htmlentities.
Why not just encode them with htmlentities and put them in the database? Well, here's the thing: the goal is to keep data as meaningful and clean as possible, and when you encode the data with htmlentities, Jeff's Dog becomes Jeff&#039;s Dog, which causes the data to lose its meaning. And if you decide to implement REST services and you fetch that string from the DB and put it in JSON, it'll come out as Jeff&#039;s Dog, which isn't pretty. You'd have to add another function to decode it as well.
Suppose you want to search for "Jeff's Dog" using SQL "select * from table where field='Jeff\'s Dog'": you won't find it, since "Jeff's Dog" does not match "Jeff&#039;s Dog". Bad, eh?
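The JSON case can be reproduced without a database. In the sketch below the two variables simulate what each storage strategy would return from the DB; the array key name is an arbitrary choice.

```php
<?php
$raw = "Jeff's Dog";

// What the DB would hold under each strategy.
$storedRaw     = $raw;
$storedEncoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// A REST response built from each stored value.
$jsonRaw     = json_encode(['name' => $storedRaw]);
$jsonEncoded = json_encode(['name' => $storedEncoded]);

echo $jsonRaw, "\n";     // clean text for any client
echo $jsonEncoded, "\n"; // HTML entity leaks into a non-HTML format

// An exact-match search for the raw text also misses the encoded row.
var_dump($storedEncoded === $raw);
```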
To output alphanumeric strings (from CHAR type) to a webpage, use htmlentities - ALWAYS!