To avoid SQL injection attacks, I'm looking to cleanse all of the text (and most other data) entered by the user of my website before sending it into the database for storage.
I was under the impression that the function inserted backslashes ( \ ) before all characters capable of being malicious ( \n , ' , " , etc ), and expected that the returned string would contain the newly added backslashes.
I performed a simple test on a made-up string containing such potentially malicious characters and echoed it to the document, seeing exactly what I expected: the string with backslashes escaping these characters.
So, I proceeded to add the cleansing function to the data before storing into the database. I inserted it (mysqli_real_escape_string( $link , $string)) into the query I build for data storage. Testing the script, I was surprised (a bit to my chagrin) to notice that the data stored in the database did not seem to contain the backslashes. I tested and tested and tested, but all to no avail, and I'm at a loss...
Any suggestions? Am I missing something? I was expecting to then have to remove the backslashes with the stripslashes($string) function, but there doesn't seem to be anything to strip...
When you view your data in the database after a successful insert, having escaped it with mysql_real_escape_string(), you will not see the backslashes in the database. This is because the escaping backslashes are only needed in the SQL query statement. mysql_real_escape_string() sanitizes it for insert (or update, or other query input) but doesn't result in a permanently modified version of the data when it is stored.
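A minimal sketch of that round trip. It assumes PHP's PDO with an in-memory SQLite database as a stand-in for MySQL, so it runs without a server. SQLite's `PDO::quote()` escapes a single quote by doubling it rather than by adding a backslash, but the parser consumes the escape characters in the same way, and nothing extra ends up in the stored value.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL so this runs anywhere.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (body TEXT)');

$input = "It's a \"test\"";

// Escape for the SQL text only; SQLite doubles the single quote.
$literal = $pdo->quote($input);
$pdo->exec("INSERT INTO notes (body) VALUES ($literal)");

// The parser consumed the escape characters; the stored value equals the input.
$stored = $pdo->query('SELECT body FROM notes')->fetchColumn();

echo $literal, PHP_EOL; // 'It''s a "test"'
echo $stored, PHP_EOL;  // It's a "test"
```

There is nothing to strip after the SELECT: the escaped form only ever existed inside the query string.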
In general, you do not want to store modified or sanitized data in your database, but instead should be storing the data in its original version. For example, it is best practice to store complete HTML strings, rather than to store HTML that has been encoded with PHP's htmlspecialchars().
When you retrieve it back out from the database, there is no need for stripslashes() or other similar unescaping. There are some legacy (mis-)features of PHP like magic_quotes_gpc that had been designed to protect programmers from themselves by automatically adding backslashes to quoted strings, requiring stripslashes() to be used on output, but those features were deprecated in PHP 5.3 and removed in PHP 5.4.
MySQL stores the data without the slashes (although it is passed to the RDBMS with the slashes). So you don't need to use stripslashes() later on.
You can be sure that the string was escaped, because otherwise a value containing a quote would have produced a syntax error and the query would have failed.
I'm looking to cleanse all of the text (and most other data) entered by the user of my website
This is what you are doing wrong.
mysqli_real_escape_string does not "cleanse" anything. There is no word "cleanse" in its name.
You should format, not "cleanse", your data. And different data require different formatting.
You should format ALL the data, not only data entered by the users of your website.
In the current form you are leaving your site highly vulnerable to attacks and errors.
I was under the impression that the function inserted backslashes ( \ ) before all characters capable of being malicious ( \n , ' , " , etc ),
To be clear, there is nothing malicious about any character. There are some service characters that can be misinterpreted in certain circumstances.
But adding backslashes doesn't make your data automatically "safe". Some injections don't require any special characters at all. So you need to format your data properly for the place it goes, not rely on some sort of magic that will make you safe.
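Here is one case where escaping adds nothing: a value placed in a numeric context without quotes. This is a minimal sketch. It again assumes SQLite through PDO in place of MySQL, and the table and rows are made up. `addslashes()` appears only to show that the hostile value has no characters to escape. The integer cast illustrates "different data require different formatting".

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL; table and data are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER, name TEXT)');
$pdo->exec("INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol')");

$id = '1 OR 1=1';           // a hostile "id" that contains no special characters
$escaped = addslashes($id); // unchanged: there is nothing for it to escape

// Escaped but unquoted in a numeric context: the condition becomes "id = 1 OR 1=1".
$leaky = $pdo->query("SELECT name FROM users WHERE id = $escaped ORDER BY id")
    ->fetchAll(PDO::FETCH_COLUMN);

// Formatted for its context: an integer cast keeps only the leading number, 1.
$safeId = (int) $id;
$formatted = $pdo->query("SELECT name FROM users WHERE id = $safeId ORDER BY id")
    ->fetchAll(PDO::FETCH_COLUMN);

print_r($leaky);     // alice, bob, carol
print_r($formatted); // alice
```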
I'm trying to sanitize a string to be saved in a db.
First step I took was to use addslashes(), but then I realized it didn't solve many security issues, so I added htmlspecialchars(), and now I have this line of code:
$val=htmlspecialchars(addslashes(trim($val)));
But then I was wondering if it makes any sense at all to use addslashes() on a string that will be processed by htmlspecialchars(), since the latter will "remove" any element that would cause problems, if I'm not mistaken.
In particular, I was wondering if that makes the server work twice without any real need.
You are wrong altogether. addslashes() is not a database escaping function; use the one that comes with your database access extension, like mysqli_real_escape_string().
htmlspecialchars() makes no sense here at all. Only use it when you place a string within HTML, which should happen when you output stuff, not when you store it in the database.
I wouldn't use either of those when saving the string to the database.
addslashes() escapes only quote characters, the backslash character (\) and the NUL byte. It's not adequate for avoiding SQL injection, because the DBMS may use other special characters which would have to be escaped as well. The best way to avoid SQL injection is to use PHP Data Objects (PDO) and its support for bind parameters, which let you keep the parameter values out of the SQL string entirely. If PDO isn't an option for some reason, you should at least use a database-specific escaping function, e.g. mysqli_real_escape_string if you're using MySQL, to ensure that all the necessary characters are escaped.
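Here is a minimal sketch of bind parameters with PDO. It assumes an in-memory SQLite database in place of MySQL; with MySQL only the DSN would differ. The `people` table and the sample value are hypothetical. The SQL text is a constant, and the value travels separately, so it is stored exactly as given, quotes, backslash and all.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL; the table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (name TEXT)');

$name = "O'Brien \\ \"Bobby\"; DROP TABLE people; --";

// The placeholder is part of the SQL text; the value never is.
$insert = $pdo->prepare('INSERT INTO people (name) VALUES (:name)');
$insert->execute([':name' => $name]);

// The same value can be used to look the row up again, unchanged.
$select = $pdo->prepare('SELECT name FROM people WHERE name = :name');
$select->execute([':name' => $name]);
$found = $select->fetchColumn();

echo $found, PHP_EOL; // the original string, stored verbatim
```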
htmlspecialchars() is for use when incorporating a non-HTML string into an HTML page; it escapes characters that are significant to a web browser, such as angle brackets, and has nothing to do with databases. Assuming that you're not generating and storing complete HTML documents in your database, you shouldn't be calling this function on values before putting them into the database. Store what the user actually entered, and call htmlspecialchars() when you retrieve the value from the database and you're about to actually put it into some HTML output.
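A minimal sketch of "store raw, escape on output". It assumes SQLite through PDO instead of MySQL, and the `comments` table is hypothetical. The value is stored as typed (after `trim()`), and htmlspecialchars() is applied only when it is placed into HTML.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL; the table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (body TEXT)');

// Storage: keep exactly what the user entered (trimmed), no HTML or slash escaping.
$val = trim("  <b>Tom & Jerry</b> said \"hi\"  ");
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (?)');
$stmt->execute([$val]);

// Output: escape for HTML at the moment the value goes into the page.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
$html = '<p>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</p>';

echo $html, PHP_EOL; // <p>&lt;b&gt;Tom &amp; Jerry&lt;/b&gt; said &quot;hi&quot;</p>
```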
I want to allow users to enter data into a text field. That text will be stored in the database and, later on, displayed on some pages, exactly as the user entered it. Consider Stack Overflow as an example: I'm allowed to enter any code or text, and the server simply treats it as text. So how does this work?
My problem is that I can't trust users; a user can enter anything (SQL code or plain text). So I planned to use mysql_real_escape_string(), but this function adds slashes to malicious code. That's good, but I want to store the string the user entered so that I can use it later (not the sanitized string). How can I do that?
I am developing a CMS that uses a database class (this). I have read about PDO, but adopting it may force me to change everything. I'd like an approach other than PDO; a parameterized approach is preferable.
mysql_real_escape_string() does not sanitize or mess up your input in any way, it just prepares your text to be a valid part of a SQL insert statement.
If you get duplicate backslashes before an apostrophe, check if you maybe have "magic quotes" enabled.
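Here is how the duplicate backslashes arise, as a minimal sketch. `addslashes()` stands in for magic quotes, since the real feature was removed in PHP 5.4. SQLite's `PDO::quote()` stands in for the proper escaping call. Escaping twice means one layer of escaping survives into the stored data.

```php
<?php
// Assumption: SQLite via PDO stands in for MySQL; addslashes() simulates magic quotes.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE t (v TEXT)');

$typed = "It's";
$magicQuoted = addslashes($typed); // what magic quotes would have handed you: It\'s

// Escaped once (correct) vs. escaped again on top of the magic-quoted value.
$pdo->exec('INSERT INTO t (v) VALUES (' . $pdo->quote($typed) . ')');
$pdo->exec('INSERT INTO t (v) VALUES (' . $pdo->quote($magicQuoted) . ')');

$rows = $pdo->query('SELECT v FROM t ORDER BY rowid')->fetchAll(PDO::FETCH_COLUMN);
print_r($rows); // It's, then It\'s (the extra backslash is now real data)
```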
An option for you would also be to start using the mysqli driver; then you can use prepared statements. They send the values separately from the SQL text, so a value cannot change the structure of the query. See responses on this SO post: Does mysqli class in PHP protect 100% against sql injections?
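The question asks for a parameterized approach without PDO, so here is a minimal sketch using mysqli prepared statements. It assumes a hypothetical `comments` table with an auto-increment `id` and a text column `body`, plus an already open mysqli connection. It is not runnable without a MySQL server, and the function names are illustrative only.

```php
<?php
// Hypothetical table: CREATE TABLE comments (id INT AUTO_INCREMENT PRIMARY KEY, body TEXT)
// Assumes mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT) is in effect
// (the default since PHP 8.1), so failures throw instead of returning false.

function save_comment(mysqli $db, string $body): int
{
    $stmt = $db->prepare('INSERT INTO comments (body) VALUES (?)');
    $stmt->bind_param('s', $body); // 's' = string; sent separately from the SQL text
    $stmt->execute();
    $stmt->close();
    return (int) $db->insert_id;
}

function load_comment(mysqli $db, int $id): ?string
{
    $stmt = $db->prepare('SELECT body FROM comments WHERE id = ?');
    $stmt->bind_param('i', $id); // 'i' = integer
    $stmt->execute();
    $stmt->bind_result($body);
    $found = $stmt->fetch(); // true if a row was fetched
    $stmt->close();
    return $found === true ? $body : null; // the text exactly as the user entered it
}
```

A caller would open the connection once (for example `new mysqli(...)` followed by `$db->set_charset('utf8mb4')`), then call `save_comment($db, $_POST['comment'])`. Here `comment` is a hypothetical form field name. The value comes back from `load_comment()` unmodified, ready for htmlspecialchars() at output time.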
When inserting user-provided content into the database, use query parameters or at least escaping to prevent SQL injection. See also my answer to What is SQL injection?
Even if you get strings of code inserted safely into the database, you have a second possible vulnerability:
When displaying content, be aware of the risks of Cross-Site Scripting (XSS). When you display the content from the database in HTML output, it could contain HTML tags or JavaScript code that is executed as part of the web page instead of being displayed as code.
To help prevent XSS, you must convert tag-open characters to their HTML entity; for instance, < should be output as &lt;. This makes sure it is shown as a literal '<' and not interpreted by the user's browser as the start of another tag.
How about encoding the entire string and then inserting it? I use base64_encode() to encode, and do the reverse when retrieving from the database. The encoded output consists only of letters, digits, +, / and = padding, and none of these can end a quoted SQL string.
You can push the entire encoded string to the client side and decode it with JavaScript.
Here is an example:
if (isset($_POST['userdata'])) {
$safestring= base64_encode($_POST['userdata']);
mysql_query("UPDATE table_name SET value_name = '$safestring'
WHERE some_username = 'username'");
}
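Here is a quick check of the claim about the encoded alphabet. `base64_encode()` output uses only letters, digits, `+`, `/` and `=` padding. `base64_decode()` with its strict flag restores the original bytes. The sample input is arbitrary.

```php
<?php
$original = "O'Reilly said \"hi\" \\ <script>";
$encoded = base64_encode($original);

// Only A-Z, a-z, 0-9, '+', '/' and up to two '=' padding characters appear.
$alphabetOk = preg_match('~^[A-Za-z0-9+/]*={0,2}$~', $encoded) === 1;

// Strict mode makes base64_decode() return false on characters outside the alphabet.
$decoded = base64_decode($encoded, true);

echo $encoded, PHP_EOL;
var_dump($alphabetOk, $decoded === $original);
```

The approach has a trade-off. The encoded value is about 4/3 the size of the input, because every 3 bytes become 4 characters. The database can also no longer search, sort or compare the original text.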
I usually escape user input by doing the following:
htmlspecialchars($str,ENT_QUOTES,"UTF-8");
as well as mysql_real_escape_string($str) whenever a mysql connection is available.
How can this be improved? I have not had any problems with this so far, but I am unsure about it.
Thank you.
Data should be escaped (sanitized) for storage and encoded for display. Data should never be encoded for storage. You want to store only the raw data. Note that escaping does not alter raw data at all as escape characters are not stored; they are only used to properly signal the difference between raw data and command syntax.
In short, you want to do the following:
$data = $_POST['raw data'];
//Shorthand used; you all know what a query looks like.
mysql_query("INSERT ... VALUES ('" . mysql_real_escape_string($data) . "')");
$result = mysql_query("SELECT ...");
$row = mysql_fetch_row($result);
echo htmlentities($row[0]);
// Note that htmlentities() is usually overzealous.
// htmlspecialchars() is enough the majority of the time.
// You also don't have to use ENT_QUOTES unless you are using single
// quotes to delimit HTML attribute values.
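To make the ENT_QUOTES remark concrete, the flag matters when the escaped value lands inside a single-quoted HTML attribute. This is a minimal sketch with a made-up payload. The flags are passed explicitly because the default for htmlspecialchars() changed to include ENT_QUOTES in PHP 8.1.

```php
<?php
$name = "x' onmouseover='alert(1)";

// ENT_COMPAT converts double quotes only, so the single quote closes the attribute.
$unsafe = "<input value='" . htmlspecialchars($name, ENT_COMPAT, 'UTF-8') . "'>";

// ENT_QUOTES also converts single quotes (to &#039;), keeping the value inside it.
$safe = "<input value='" . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "'>";

echo $unsafe, PHP_EOL; // <input value='x' onmouseover='alert(1)'>
echo $safe, PHP_EOL;   // <input value='x&#039; onmouseover=&#039;alert(1)'>
```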
You may also need to strip slashes from user input if magic quotes is enabled. stripslashes() is enough.
As for why you should not encode for storage, take the following example:
Say that you have a DB field that is char(5). The HTML input also has maxlength="5". If a user enters "&&&&&", which may be perfectly valid, encoding it gives "&amp;&amp;&amp;&amp;&amp;" (25 characters). A char(5) column truncates that to "&amp;" (MySQL does this silently when strict mode is off; in strict mode the insert fails instead). When it's retrieved and displayed back to the user, if you do not encode, they will see "&", which is incorrect. If you do encode, they will see "&amp;", which is also incorrect. You are not storing the data that the user intended to store. You need to store the raw data.
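The arithmetic of that example, as a minimal sketch. `substr()` simulates a char(5) column that truncates, and `html_entity_decode()` approximates what the browser renders.

```php
<?php
$input = '&&&&&'; // 5 characters: fits char(5) and maxlength="5"

// Encoding for storage grows the value to 25 characters.
$encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); // &amp;&amp;&amp;&amp;&amp;

// Assumption: simulate a char(5) column that silently truncates.
$stored = substr($encoded, 0, 5); // &amp;

// Output without encoding: the browser renders the entity as a single '&'.
$seenRaw = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Output with encoding: the source becomes &amp;amp;, which renders as "&amp;".
$seenEncoded = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
```

Either way, the user never gets back the five ampersands they typed.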
This also becomes an issue in a case where a user wants to store special characters. How do you handle the storage of these? You don't. Store it raw.
To defend against sql injection, at the very least escape input with mysql_real_escape_string, but it is recommended to use prepared statements with a DB wrapper like PDO. Figure out which one works best, or write your own (and test it thoroughly).
To defend against XSS (cross-site-scripting), encode user input before it is displayed back to them.
If you only use mysql_real_escape_string($str) to avoid sql injection, make sure you always add single quotes around it in your query.
htmlspecialchars() is fine when writing untrusted output to the screen.
For the database switch to PDO.
It's much easier and does the escaping for you.
http://php.net/pdo
I read this tutorial about storing images in DB. In the tutorial, the author escapes special characters in the binary data before inserting: http://www.phpriot.com/articles/images-in-mysql/7 ( using addslashes although mysql_real_escape_string is preferable - but that is another issue ).
The point is, when displaying, he just displays the data as it is stored: http://www.phpriot.com/articles/images-in-mysql/8
My questions:
1) Do we need to escape special characters even for binary field type (blob)?
2) If so, then, do we not need to "unescape" the characters again in order to display the image correctly? (If so, what is the best way to do it. Any comments about efficiency? For large images: escaping and unescaping can be a big overhead?).
Or is it that my understanding about escaping is totally wrong (and escaping only affects the query and not the final data inserted/stored?).
thanks
JP
Your understanding of escaping is wrong. The data being inserted into the database is escaped, so that the query parser sees the information as intended.
Take the string "Jean-Luc 'Earl Grey' Picard".
Escaping results in: 'Jean-Luc \'Earl Grey\' Picard'
When MySQL receives this, it understands that the escaped quotes need to be taken literally, that is what escaping means, and will store them in the database. It will not store the escape-characters in the database. The \ indicates to MySQL that it should take the character following it literally.
When retrieving, the data is presented to your application without the escaping characters, as they are removed when parsing the query.
1) Do we need to escape special characters even for binary field type (blob)?
Yes, because mysql_real_escape_string() (which is indeed the one to use) provides protection against SQL injection. Binary data such as an image file can contain any byte, including quotes and backslashes, that would otherwise break or alter the query. Any arbitrary data you put into a query string must be escaped first.
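With bound parameters, there is nothing to escape or unescape for binary data either, which also removes the overhead asked about in question 2. This is a minimal sketch that assumes SQLite through PDO in place of MySQL. The `images` table and the sample bytes (a PNG signature followed by a quote, a backslash, a NUL and a double quote) are made up for illustration.

```php
<?php
// Assumption: in-memory SQLite via PDO stands in for MySQL; the table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB)');

// Binary data may contain any byte, including quotes, backslashes and NUL.
$bytes = "\x89PNG\r\n\x1a\n" . "'\\\0\"";

$stmt = $pdo->prepare('INSERT INTO images (data) VALUES (?)');
$stmt->bindValue(1, $bytes, PDO::PARAM_LOB); // sent as a blob; no escaping step
$stmt->execute();

// The bytes come back exactly as inserted; there is nothing to unescape.
$stored = $pdo->query('SELECT data FROM images WHERE id = 1')->fetchColumn();
```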
I'm developing an application using Wordpress as a CMS.
I have a form with a lot of input fields which needs to be sanitized before stored in the database.
I want to prevent SQL injection, having javascript and PHP code injected and other harmful code.
Currently I'm using my own methods to sanitize data, but I feel that it might be better to use the functions which WP uses.
I have looked at Data Validation in Wordpress, but I'm unsure on how much of these functions I should use, and in what order. Can anyone tell what WP functions are best to use?
Currently I'm "sanitizing" my input by doing the following:
Because characters with accents (é, ô, æ, ø, å) got stored in a funny way in the Database (even though my tables are set to ENGINE=InnoDB, DEFAULT CHARSET=utf8 and COLLATE=utf8_danish_ci), I'm now converting input fields that can have accents, using htmlentities().
When creating the SQL string to input the data, I use mysql_real_escape_string().
I don't think this is enough to prevent attacks, though, so suggestions for improvement are greatly appreciated.
Input “sanitisation” is bogus.
You shouldn't attempt to protect yourself from injection woes by filtering(*) or escaping input, you should work with raw strings until the time you put them into another context. At that point you need the correct escaping function for that context, which is mysql_real_escape_string for MySQL queries and htmlspecialchars for HTML output.
(WordPress adds its own escaping functions like esc_html, which are in principle no different.)
(*: well, except for application-specific requirements, like checking an e-mail address is really an e-mail address, ensuring a password is reasonable, and so on. There's also a reasonable argument for filtering out control characters at the input stage, though this is rarely actually done.)
I'm now converting input fields that can have accents, using htmlentities().
I strongly advise not doing that. Your database should contain raw text; you make it much harder to do database operations on the columns if you've encoded it as HTML. You're escaping characters such as < and " at the same time as non-ASCII characters too. When you get data from the database and use it for some other reason than copying it into the page, you've now got spurious HTML-escapes in the data. Don't HTML-escape until the final moment you're writing text to the page.
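To illustrate, here is a minimal sketch with an arbitrary Danish sample word. `htmlentities()` rewrites non-ASCII letters as entities, so the stored text no longer matches what a user types into a search. `htmlspecialchars()` at output time leaves those letters alone.

```php
<?php
$typed = 'Smørrebrød';

// What ends up in the column if input is passed through htmlentities():
$stored = htmlentities($typed, ENT_QUOTES, 'UTF-8'); // Sm&oslash;rrebr&oslash;d

// A search for part of the word, as typed, no longer matches the stored text.
$found = strpos($stored, 'brød') !== false; // false

// Escaping at output time instead: only &, <, >, " and ' are touched.
$outputEscaped = htmlspecialchars($typed, ENT_QUOTES, 'UTF-8'); // Smørrebrød
```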
If you are having trouble getting non-ASCII characters into the database, that's a different problem, and you should solve it first instead of going for unsustainable workarounds like storing HTML-encoded data. There are a number of posts here about getting PHP and databases to speak UTF-8 properly. The main thing is to make sure your HTML output pages are themselves correctly served as UTF-8, using the Content-Type header/meta. Then check that your MySQL connection is set to UTF-8, e.g. using mysql_set_charset().
When creating the SQL string to input the data, I use mysql_real_escape_string().
Yes, that's correct. As long as you do this (and put quotes around every escaped value in the query), you are not vulnerable to SQL injection. You might be vulnerable to HTML injection (causing XSS) if you are HTML-escaping at the database end instead of at the template output end. That is because any string that hasn't gone through the database (e.g. one fetched directly from $_GET) won't have been HTML-escaped.