I have a simple textbox in a form and I want to safely store special characters in the database after POST or GET and I use the code below.
$text=mysql_real_escape_string(htmlspecialchars_decode(stripslashes(trim($_GET["text"])),ENT_QUOTES));
When I read the text from the database and put it in the text value I use the code above.
$text=htmlspecialchars($text_from_DB,ENT_QUOTES,'UTF-8',false);
<input type="text" value="<?=$text?>" />
I am trying to save in the database with no special characters (meaning I don't want to write in database field " or ')
Actually when writing to the database do htmlspecialchars_decode to the text.
When writing to the form text box do htmlspecialchars to the text.
Is this the best approach for safe writing special chars to the database?
You have the right idea of keeping the text in the database as raw. Not sure what all the HTML entity stuff is for; you shouldn't need to be doing that for a database insertion.
[The only reason I can think of why you might try to entity-decode incoming input for the database would be if you find you are getting character references like Š in your form submission input. If that's happening, it's because the user is inputting characters that don't exist in the encoding used by the page with the form. This form of encoding is totally bogus because you then can't distinguish between the user typing Š and literally typing Š! You should avoid this by using the UTF-8 encoding for all your pages and content, as every possible character fits in this encoding.]
Strings in your script should always be raw text with no escaping. That means you don't do anything to them until the time you output them into a context that isn't plain-text. So for putting them into an SQL string:
$category= trim($_POST['category']);
mysql_query("SELECT * FROM things WHERE category='".mysql_real_escape_string($category)."'");
(or use parameterised queries to avoid having to manually escape it.) When putting content into HTML:
<input type="text" name="category" value="<?php echo htmlspecialchars($category); ?>" />
(you can define a helper function with a shorter name like function h($s) { echo htmlspecialchars($s, ENT_QUOTES); } if you want to cut down on the amount of typing you have to do in templates.)
And... that's pretty much it. You don't need to process strings that come out of the database, as they're already raw strings. You don't need to process input strings(*), other than any application-specific field validation you want to do.
*: well, except if magic_quotes_gpc is turned on, in which case you do either need to stripslashes() everything that comes in from get/post/cookie, or, my favoured option, just immediately fail:
if (get_magic_quotes_gpc())
die(
'Magic quotes are turned on. They are utterly bogus and no-one should use them. '.
'Turn them off, you idiot, or I refuse to run. So there!'
);
When you write to db, use htmlentities but when you read back, use html_entity_decode function.
As a sidenote, if you are looking for some security, then for strings use mysql_real_escape_string and for numbers use intval.
I'd like to point out a couple of things:
there is nothing wrong in saving characters like ' and " in a database, SQL injections are just a matter of string manipulation, they actually have nothing to do with SQL or databases -- the problem only relies in how the query string is built. If you want to write your own queries (not recommended) you don't have to encode every apostrophe or double quote: just escape them once to build a safe string, and save them in the database. A better approach is using PDO as mentioned, or using the mysqli extension which allows queries with prepared statements
htmlentities() and similar functions should be used when sending data as output to the browser, not for encoding data to be stored in a database for at least two reasons: first of all it's useless, the DB doesn't care about html entities, it just contains data; secondly you should always treat data coming from the database as potentially insecure, so you should save it in "raw" format and encode it when using it.
The best approach to safe write to a DB is to use the PDO abstraction layer and make use of prepared statements.
http://www.php.net/manual/en/intro.pdo.php
A good tutorial (I learned from this one) is
http://www.phpro.org/tutorials/Introduction-to-PHP-PDO.html
However, you might have to rewrite alot of your site just to implement this. But this is no doubt the most elegant method than having to make use of all those functions. Plus, prepared statements are becoming the de facto now. Another benefit of this is that you do not have to rewrite your queries if you switch to a different database (such as from MySQL to PostgreSQL). But I would say consider this if you plan to scale your site.
Related
1) I have a textarea in my html. Inside the textarea I wrote: <i>ABC Enterprise</i>. When saving into the sql database it saved as <i>XYZ Enterprise</i>
2) Does anyone know how to retain < and </> when saving into the database without converting? If this is not possible, does anyone know how to convert <i>XYZ Enterprise</i> to <i>ABC Enterprise</i> in php? I need the string to maintain this form <i>ABC Enterprise</i> in php not html.
I have tried preg_replace("/&([a-z])[a-z]+;/i", "$1", htmlentities($company)), iconv('utf-8', 'ascii//TRANSLIT', $company), htmlspecialchars($compnay), many other ways I happened to stumble upon on stackoverflow but nothing seemed to work. Any help?
To specifically answer your question:
How to retain <> and </> when inserting into the DB? [paraphrased, emphasis added]
Simple: don't modify your data. As discussed below, however, be smart about it and insert the data using a prepared statement.
Why is your data being changed? Most likely because your code is doing some form of modification of the data before putting it in the database. In PHP, this generally means one of:
htmlentities
htmlspecialchars
The general advice for years was simply "escape all your data or suffer the XSS/CSRF/Sql Injection/other attack consequences!" The problem is that there are nuances of when and how to escape and in the zeal for security, many websites over do it. As you've described your situation, I would consider:
When inserting into the DB: use prepared statements, rather than manual escaping.
When pulling from the DB: be judicious when you apply escaping techniques.
A prepared statement is where you tell the database the format of what you're going to send, then send the data in a separate communication. If there's anything awry, the DB knows best how to find it. For example:
$pstmt = $dbh->prepare('INSERT INTO tab (html) VALUES (?)');
$pstmt->execute(array($_POST['my_textarea']));
Note the lack of any sanitization, using the $_POST variable directly. What the user sent to you is what you put in the DB, with zero modification. Because the DB server was sent a format first, it will not allow any ulterior SQL injection shenanigans.
However, when pulling data out of the DB, you need to be careful of exactly what data goes where. For example, to allow < and > characters inside of the content might be foolhardy, depending on your context. I'll leave it to you to decide whether you want to escape the output inside of your <textarea>:
echo "<textarea>$textarea_content_as_retrieved_from_db</textarea>";
or
echo '<textarea>' . htmlentities( $textarea_content_as_retrieved_from_db ) . '</textarea>';
I'm adding some xss protection to the website I'm working on, the platform is zendFrameWork 2 and therefor I'm using Zend\escaper. from zend documentation i knew that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier.
but what are the riskes if i escaped the data before inserting it into the database, am i so wrong to do that? please explane to me as im somehow new to this topic.
thanks
When encoding data before storing it you will have to decode it before you can do anything sensible with it before outputting it. That's why I'd not do it.
Let's say you have an international application and you want to store the escaped value of a form field which might contain any NON-ASCII characters those might become escaped into HTML-Entities. So what if you have to quantify the content of that field? Like counting the characters? You will always have to de-escape the content before counting it. and then you have to re-escape it again. Much work done but nothing gained.
The same applies to search-operations in your database. You will have to escape the search-phrase the same way then your input for the database to understand what you are looking for.
I'd use one character-set throughout the application and database (I prefer UTF-8, beware of the MySQL-Connection....) and only escape content on output. Thant way I can then do whatever I like with the data and are on the safe side on output. And escaping is done in my view-layer automaticaly so I don't even have to think about it every time I handle data as it works automaticaly. That way you can't forget it.
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
But that's just my opinion, others might think otherwise!
"Output" here refers to the web page. A form field ( HTML tag) is an INPUT (from the webpage), any text is an OUTPUT (to the webpage). You need to ensure any output (to the webpage) does not contain dangerous characters that could be used to forge XSS attack vectors.
This said, if you have DANGEROUS_INPUT_X given by the user and then
$NOT_DANGEROUS_ANYMORE = ZED.HtmlPurifier(DANGEROUS_INPUT_X)
DBSave($NOT_DANGEROUS_ANYMORE)
and somewhere else
$OUTPUT = DBLoad($NOT_DANGEROUS_ANYMORE)
echo $OUTPUT
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed in the way it is saved, that was safe.
I would suggest to look at output encoding more than validation: HtmlPurifier cleans the HTML, while you could accept any kind of bad characters if you ensure your output is encoded in the page.
Here https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet some general rules, here the PHP example
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.
Hi i'm planning to make users able to submit some pieces of code (php,java,javascript c++, etc... whatever they want i mean).
so does anyone can suggest me the best practice to make it safety for my site? :))
i mean which tags/chars/strings to escape in php once is submitted code string?
If your intent is to display the code on screen, you do not need to escape or replace anything before storing it in your database (if you intend to store it) . This doesn't apply, of course, to escaping for database insertion via something like mysql_real_escape_string(), for example (or your RDBMS' equivalent sanitization routine). That step is still absolutely necessary.
When displaying the code, just be sure that:
You DO NOT evaluate any submitted code via an eval() or system call.
When displaying code back to the browser, escape it with htmlspecialchars(). Never display it unescaped, or you will introduce cross site scripting vulnerabilities.
Use placeholders in your queries and you don't even have to escape the input.
Placeholders, binding, and prepared statements are definitely the preferred method.
It's faster for anything over 1 query as you can reuse the handles and just change the input.
It's safer. The string is not interpreted with the query... ever. What you store is what you get.
I'd need to know a bit more about your target sql to give pertinent examples, but here's some links:
PDO style binding: http://docs.php.net/pdo.prepared-statements
MySqli style binding: http://docs.php.net/manual/en/mysqli-stmt.bind-param.php
When you read it back, display with
htmlspecialchars($string, ENT_QUOTES);
ENT_QUOTES option ensures that both single and double quotes get escaped.
You don't need to escape anything (other then the usual mysql sanitation), if you don't intend to automatically run it.
I am no expert ( I only got told about this yesterday), but at least for HTML, you could try and use htmlentities (look at this ).
Once something has been converted using htmlentities, it becomes plain text, so if opened in a browser, you will see the tag and everything, (e.g. it will write <a href="blah blah">), if it's written to a log or something else, and then opened in a text based editor, you will some symbols and shnaz that represent the html entities.
If you need to convert back, you can use the html_entity_decode function, I think, but I am going to wager a guess and presume that you don't need to convert back.
For other languages, I have no idea what you should do.
I'm developing an application using Wordpress as a CMS.
I have a form with a lot of input fields which needs to be sanitized before stored in the database.
I want to prevent SQL injection, having javascript and PHP code injected and other harmful code.
Currently I'm using my own methods to sanitize data, but I feel that it might be better to use the functions which WP uses.
I have looked at Data Validation in Wordpress, but I'm unsure on how much of these functions I should use, and in what order. Can anyone tell what WP functions are best to use?
Currently I'm "sanitizing" my input by doing the following:
Because characters with accents (é, ô, æ, ø, å) got stored in a funny way in the Database (even though my tables are set to ENGINE=InnoDB, DEFAULT CHARSET=utf8 and COLLATE=utf8_danish_ci), I'm now converting input fields that can have accents, using htmlentities().
When creating the SQL string to input the data, I use mysql_real_escape_string().
I don't think this is enough to prevent attacks though. So suggestions to improvement is greatly appreciated.
Input “sanitisation” is bogus.
You shouldn't attempt to protect yourself from injection woes by filtering(*) or escaping input, you should work with raw strings until the time you put them into another context. At that point you need the correct escaping function for that context, which is mysql_real_escape_string for MySQL queries and htmlspecialchars for HTML output.
(WordPress adds its own escaping functions like esc_html, which are in principle no different.)
(*: well, except for application-specific requirements, like checking an e-mail address is really an e-mail address, ensuring a password is reasonable, and so on. There's also a reasonable argument for filtering out control characters at the input stage, though this is rarely actually done.)
I'm now converting input fields that can have accents, using htmlentities().
I strongly advise not doing that. Your database should contain raw text; you make it much harder to do database operations on the columns if you've encoded it as HTML. You're escaping characters such as < and " at the same time as non-ASCII characters too. When you get data from the database and use it for some other reason than copying it into the page, you've now got spurious HTML-escapes in the data. Don't HTML-escape until the final moment you're writing text to the page.
If you are having trouble getting non-ASCII characters into the database, that's a different problem which you should solve first instead of going for unsustainable workarounds like storing HTML-encoded data. There are a number of posts here all about getting PHP and databases to talk proper UTF-8, but the main thing is to make sure your HTML output pages themselves are correctly served as UTF-8 using the Content-Type header/meta. Then check your MySQL connection is set to UTF-8, eg using mysql_set_charset().
When creating the SQL string to input the data, I use mysql_real_escape_string().
Yes, that's correct. As long as you do this you are not vulnerable to SQL injection. You might be vulnerabile to HTML-injection (causing XSS) if you are HTML-escaping at the database end instead of the template output end. Because any string that hasn't gone through the database (eg. fetched directly from $_GET) won't have been HTML-escaped.
I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!
When generating HTLM output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the Database engine ; it is better/safer to use a function that fits the engine (MySQL, PostGreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it will let attributes of the allowed tags)
You might want to take a look at a tool called HTMLPUrifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time is has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !
mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be done with htmlspecialchars() to remove HTML from the input and urlencode() to put things in a format for URL's.
There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
The answer to this post is a good answer
Basically, using the pdo interface to parameterize your queries is much safer and less error prone than escaping your inputs manually.
I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", ":"), "-", "-"), "|", "|"), _
"`", "`"), "(", "("), ")", ")"), _
"%", "%"), "^", "^"), """", """), _
"/", "/"), "*", "*"), "\", "\"), _
"'", "'")
First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.
I second Joeri, do not roll your own, go here to see some of the the many possible XSS attacks
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input as a adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given obve re avoiding HTML input unless it is essential and opting for BBCode or similar (make your own up if needs be) is very sound indeed.