In some fields of my database the data shows up as in the following screenshots:
http://i31.tinypic.com/2637l9f.jpg
http://i27.tinypic.com/1ihh6d.jpg
http://i26.tinypic.com/2yklzb4.jpg
http://i31.tinypic.com/2vbshtf.jpg
I used mysql_real_escape_string when inserting the data into the database and htmlspecialchars when displaying it.
Can anyone tell me why the data looks like this, and what the solution is?
That's Mojibake. Your PHP and MySQL code are not ready for World Domination.
To fix it properly, go through this cheatsheet and ensure that every layer is using UTF-8.
The mysql_real_escape_string() function only protects you from SQL injection attacks, and htmlspecialchars() only protects you from XSS attacks. Neither encodes or decodes characters in any way; the character set in use is responsible for that. Your problem is that you are not using one charset consistently across all layers, and/or that the charset you chose does not support the characters the client entered or that you want to display.
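To make the point concrete, here is a minimal sketch of how this kind of garbling arises when one layer reads UTF-8 bytes as ISO-8859-1. It assumes the mbstring extension is loaded; the variable names are made up for illustration.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$original = "\xC3\xA9";

// Misread those UTF-8 bytes as ISO-8859-1 and re-encode them as UTF-8.
// This mimics a layer that assumes the wrong charset.
$mojibake = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// An escaping function leaves the bytes of non-special characters alone,
// so it neither causes nor repairs the problem.
$escaped = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');

echo bin2hex($original), "\n"; // c3a9
echo bin2hex($mojibake), "\n"; // c383c2a9, which displays as "Ã©"
echo bin2hex($escaped), "\n";  // c3a9
```

One character has turned into two, which is the typical Mojibake pattern.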
I was wondering what would be the best way to allow users whose names contain special characters to register on a website without 'pre-converting' the names into non-special characters before input, while still keeping my website secure (for example, preventing or avoiding registration with a name like "-.lčćo+'90'žž++'-..")?
Thanks a lot.
Assuming you're storing the user data in a database, simply make sure you're storing the data with a Unicode character encoding (as opposed to ASCII, which doesn't support special characters, or at least not as many as Unicode), secure against SQL injection (look up PDO and prepared statements - here's a good tutorial), and you should be good.
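A rough sketch of what that could look like with PDO. An in-memory SQLite database stands in for the real one (this assumes the pdo_sqlite extension), and the users table is hypothetical.

```php
<?php
// In-memory SQLite is only a stand-in so the sketch runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

// The awkward name from the question, taken as-is.
$name = "-.lčćo+'90'žž++'-..";

// The value is bound as data; it is never spliced into the SQL text,
// so the quotes in it cannot change the query.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

// What comes back is the same raw string that went in.
$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
var_dump($stored === $name);
```

With MySQL the DSN would look something like `mysql:host=localhost;dbname=app;charset=utf8mb4` (placeholder values), so that the connection itself uses UTF-8 too.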
I have a textarea that will be available to users as a comment box, so any sort of input is acceptable, but it should be accepted only as text and not as code. Basically I want to protect my database. I don't want to strip tags or anything like that; I just want any code a user enters to be stored in the database as text without causing any harm to the database. I came across these two PHP functions, and I am not sure which one of these I should use, as I don't understand the difference between them.
According to official PHP docs, htmlspecialchars() and FILTER_SANITIZE_FULL_SPECIAL_CHARS should be equivalent:
Equivalent to calling htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter is aware of the default_charset and if a sequence of bytes is detected that makes up an invalid character in the current character set then the entire string is rejected resulting in a 0-length string. When using this filter as a default filter, see the warning below about setting the default flags to 0.
Taken from here - https://www.php.net/manual/en/filter.filters.sanitize.php
Going from here, I think it would be a matter of personal preference as to which function you prefer more.
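If you want to check this yourself, a quick comparison could look like the sketch below. It only uses a plain ASCII input, so it says nothing about how the two behave on non-ASCII text or on strings that already contain entities.

```php
<?php
// Plain ASCII input containing the characters both are documented to encode.
$input = '<a href=\'x\'>"hi"</a>';

$viaFunction = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$viaFilter   = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

echo $viaFunction, "\n";
echo $viaFilter, "\n";

// Going by the manual text quoted above, these should match for this input.
var_dump($viaFunction === $viaFilter);
```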
From this : http://forums.phpfreaks.com/topic/275315-htmlspecialchars-vs-filter-sanitize-special-chars/
They are quite similar yes, but as the PHP manual states
htmlspecialchars escapes a bit more than just
FILTER_SANITIZE_SPECIAL_CHARS.
That brings us to the next point, SQL injection prevention. As stated
htmlspecialchars is for escaping output to a HTML-parser, not a
database engine. The DB engine doesn't understand HTML, and doesn't
care about it either. What it does understand, is SQL queries. SQL
queries and HTML use quite different meta-characters, with only a few
in common: Quotes being the most obvious, and even that is somewhat
conditional for HTML. However, due to the other meta-characters (which
HTML does not share) using HTML escaping methods for SQL queries will
not protect you. Those meta-characters will go through
htmlspecialchars unscathed, and thus be able to cause SQL injections.
Same the other way around, if you use SQL escaping methods to escape
output going to a browser. It will not escape the < and > signs,
meaning an attacker can easily perform HTML injection attacks (XSS
etc). Not only that, but you'll suddenly have a lot of slashes in
places where there shouldn't be any. Which is quite annoying, at best.
This is why it's so important to know, and use, the proper method for
the third party system you're sending the data to. If you don't, you
are still vulnerable.
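A small sketch of both failure modes described in that quote. PDO::quote() on an in-memory SQLite connection stands in for "SQL escaping" here (this assumes pdo_sqlite is available), and the things table is hypothetical.

```php
<?php
$pdo = new PDO('sqlite::memory:');

// 1. HTML escaping does nothing to an SQL payload that contains
//    no HTML metacharacters.
$id = '1 OR 1=1';
$htmlEscapedId = htmlspecialchars($id, ENT_QUOTES, 'UTF-8');
$unsafeSql = "SELECT * FROM things WHERE id = $htmlEscapedId";

// 2. SQL quoting does nothing to an HTML payload.
$comment = '<script>alert(1)</script>';
$sqlQuoted = $pdo->quote($comment);

echo $unsafeSql, "\n"; // the OR 1=1 is still part of the query
echo $sqlQuoted, "\n"; // the <script> tag is still intact
```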
I'm adding some XSS protection to the website I'm working on. The platform is Zend Framework 2, and therefore I'm using Zend\Escaper. From the Zend documentation I learned that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier.
But what are the risks if I escape the data before inserting it into the database? Am I so wrong to do that? Please explain it to me, as I'm fairly new to this topic.
thanks
If you encode data before storing it, you have to decode it again before you can do anything sensible with it other than outputting it. That's why I wouldn't do it.
Say you have an international application and you store the escaped value of a form field that may contain non-ASCII characters, which get escaped into HTML entities. What if you then need to measure the content of that field, for example count its characters? You always have to un-escape the content before counting it and re-escape it afterwards. Much work done but nothing gained.
The same applies to search operations in your database. You have to escape the search phrase the same way as your input for the database to understand what you are looking for.
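A tiny sketch of the counting and searching problem, assuming the mbstring extension is loaded (the variable names are just for illustration):

```php
<?php
$raw = "Zo\xC3\xAB"; // "Zoë" encoded as UTF-8

// What would end up in the database if you escaped before storing.
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8'); // Zo&euml;

echo mb_strlen($raw, 'UTF-8'), "\n"; // 3 characters
echo strlen($encoded), "\n";         // 8 bytes, the entity counts too

// You get the real length back only after decoding again.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
echo mb_strlen($decoded, 'UTF-8'), "\n"; // 3

// A raw search phrase no longer matches the stored value.
var_dump(strpos($encoded, $raw)); // bool(false)
```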
I'd use one character set throughout the application and database (I prefer UTF-8; beware of the MySQL connection...) and only escape content on output. That way I can do whatever I like with the data and am on the safe side on output. Escaping is done automatically in my view layer, so I don't have to think about it every time I handle data. That way you can't forget it.
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
But that's just my opinion, others might think otherwise!
"Output" here refers to the web page. A form field (an <input> HTML tag) is an INPUT (from the web page); any text is an OUTPUT (to the web page). You need to ensure that any output (to the web page) does not contain dangerous characters that could be used to forge XSS attack vectors.
This said, if you have DANGEROUS_INPUT_X given by the user and then
$NOT_DANGEROUS_ANYMORE = ZED.HtmlPurifier(DANGEROUS_INPUT_X)
DBSave($NOT_DANGEROUS_ANYMORE)
and somewhere else
$OUTPUT = DBLoad($NOT_DANGEROUS_ANYMORE)
echo $OUTPUT
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed the way it was saved, which was safe.
I would suggest looking at output encoding more than validation: HtmlPurifier cleans the HTML, whereas you could accept any kind of bad characters if you ensure your output is encoded in the page.
Some general rules are at https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet and here is a PHP example:
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.
I'm developing an application using Wordpress as a CMS.
I have a form with a lot of input fields that need to be sanitized before being stored in the database.
I want to prevent SQL injection, injected JavaScript and PHP code, and other harmful code.
Currently I'm using my own methods to sanitize data, but I feel that it might be better to use the functions which WP uses.
I have looked at Data Validation in Wordpress, but I'm unsure how many of these functions I should use, and in what order. Can anyone tell me which WP functions are best to use?
Currently I'm "sanitizing" my input by doing the following:
Because characters with accents (é, ô, æ, ø, å) got stored in a funny way in the Database (even though my tables are set to ENGINE=InnoDB, DEFAULT CHARSET=utf8 and COLLATE=utf8_danish_ci), I'm now converting input fields that can have accents, using htmlentities().
When creating the SQL string to input the data, I use mysql_real_escape_string().
I don't think this is enough to prevent attacks though, so suggestions for improvement are greatly appreciated.
Input “sanitisation” is bogus.
You shouldn't attempt to protect yourself from injection woes by filtering(*) or escaping input, you should work with raw strings until the time you put them into another context. At that point you need the correct escaping function for that context, which is mysql_real_escape_string for MySQL queries and htmlspecialchars for HTML output.
(WordPress adds its own escaping functions like esc_html, which are in principle no different.)
(*: well, except for application-specific requirements, like checking an e-mail address is really an e-mail address, ensuring a password is reasonable, and so on. There's also a reasonable argument for filtering out control characters at the input stage, though this is rarely actually done.)
I'm now converting input fields that can have accents, using htmlentities().
I strongly advise not doing that. Your database should contain raw text; you make it much harder to do database operations on the columns if you've encoded it as HTML. You're escaping characters such as < and " at the same time as non-ASCII characters too. When you get data from the database and use it for some other reason than copying it into the page, you've now got spurious HTML-escapes in the data. Don't HTML-escape until the final moment you're writing text to the page.
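Here is a short sketch of the spurious-escapes problem. The input string is made up, and the second call stands for any template that (rightly) escapes everything on output.

```php
<?php
$userInput = 'Tom & Jerry <3';

// Escaped once on the way into storage (the workaround being discouraged)...
$stored = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// ...and escaped again by a template that escapes on output.
$rendered = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $stored, "\n";   // Tom &amp; Jerry &lt;3
echo $rendered, "\n"; // Tom &amp;amp; Jerry &amp;lt;3
```

The visitor ends up seeing a literal "&amp;" on the page, and the stored value no longer matches a search for the text the user typed.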
If you are having trouble getting non-ASCII characters into the database, that's a different problem which you should solve first instead of going for unsustainable workarounds like storing HTML-encoded data. There are a number of posts here all about getting PHP and databases to talk proper UTF-8, but the main thing is to make sure your HTML output pages themselves are correctly served as UTF-8 using the Content-Type header/meta. Then check your MySQL connection is set to UTF-8, eg using mysql_set_charset().
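As a rough sketch of those two steps, using mysqli in place of the old mysql_* functions: both helper names are hypothetical, the connection details are placeholders, and choosing utf8mb4 (MySQL's full UTF-8 charset) is my assumption rather than something stated above.

```php
<?php
// Hypothetical helper. Calling it needs the mysqli extension and a
// reachable MySQL server; all four arguments are placeholders.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    // Tell MySQL which charset the client sends and expects back.
    $link->set_charset('utf8mb4');
    return $link;
}

// Hypothetical helper: declare the page itself as UTF-8.
// Must be called before any output is sent.
function send_utf8_header(): void
{
    header('Content-Type: text/html; charset=utf-8');
}
```

Call send_utf8_header() at the top of the page and use connect_utf8() wherever the connection is opened, so the browser, PHP and MySQL all agree on the encoding.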
When creating the SQL string to input the data, I use mysql_real_escape_string().
Yes, that's correct. As long as you do this you are not vulnerable to SQL injection. You might be vulnerable to HTML injection (causing XSS) if you are HTML-escaping at the database end instead of the template output end, because any string that hasn't gone through the database (e.g. fetched directly from $_GET) won't have been HTML-escaped.
I have a simple textbox in a form and I want to safely store special characters in the database after POST or GET and I use the code below.
$text=mysql_real_escape_string(htmlspecialchars_decode(stripslashes(trim($_GET["text"])),ENT_QUOTES));
When I read the text from the database and put it in the text value, I use the code below.
$text=htmlspecialchars($text_from_DB,ENT_QUOTES,'UTF-8',false);
<input type="text" value="<?=$text?>" />
I am trying to save the text in the database without escaped special characters (meaning I don't want to write &quot; or &#039; into the database field).
In other words, when writing to the database I apply htmlspecialchars_decode to the text.
When writing to the form text box I apply htmlspecialchars to the text.
Is this the best approach for safely writing special characters to the database?
You have the right idea of keeping the text in the database as raw. Not sure what all the HTML entity stuff is for; you shouldn't need to be doing that for a database insertion.
[The only reason I can think of why you might try to entity-decode incoming input for the database would be if you find you are getting character references like &#352; in your form submission input. If that's happening, it's because the user is inputting characters that don't exist in the encoding used by the page with the form. This form of encoding is totally bogus because you then can't distinguish between the user typing Š and literally typing &#352;! You should avoid this by using the UTF-8 encoding for all your pages and content, as every possible character fits in this encoding.]
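A quick sketch of why that is ambiguous. Both variables are hypothetical stand-ins for what arrives in the form submission in the two cases:

```php
<?php
// Case 1: the user typed Š, but the page encoding cannot represent it,
// so the browser submitted a numeric character reference instead.
$typedCharacter = '&#352;';

// Case 2: the user literally typed the six characters & # 3 5 2 ;
$typedLiterally = '&#352;';

// The server receives identical bytes, so it cannot tell the cases apart.
var_dump($typedCharacter === $typedLiterally);

// Entity-decoding turns both into Š, which is wrong for case 2.
$decoded = html_entity_decode($typedLiterally, ENT_QUOTES, 'UTF-8');
echo bin2hex($decoded), "\n"; // c5a0, the UTF-8 bytes of Š
```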
Strings in your script should always be raw text with no escaping. That means you don't do anything to them until the time you output them into a context that isn't plain-text. So for putting them into an SQL string:
$category= trim($_POST['category']);
mysql_query("SELECT * FROM things WHERE category='".mysql_real_escape_string($category)."'");
(or use parameterised queries to avoid having to manually escape it.) When putting content into HTML:
<input type="text" name="category" value="<?php echo htmlspecialchars($category); ?>" />
(you can define a helper function with a shorter name like function h($s) { echo htmlspecialchars($s, ENT_QUOTES); } if you want to cut down on the amount of typing you have to do in templates.)
And... that's pretty much it. You don't need to process strings that come out of the database, as they're already raw strings. You don't need to process input strings(*), other than any application-specific field validation you want to do.
*: well, except if magic_quotes_gpc is turned on, in which case you do either need to stripslashes() everything that comes in from get/post/cookie, or, my favoured option, just immediately fail:
if (get_magic_quotes_gpc())
die(
'Magic quotes are turned on. They are utterly bogus and no-one should use them. '.
'Turn them off, you idiot, or I refuse to run. So there!'
);
When you write to db, use htmlentities but when you read back, use html_entity_decode function.
As a sidenote, if you are looking for some security, then for strings use mysql_real_escape_string and for numbers use intval.
I'd like to point out a couple of things:
There is nothing wrong with saving characters like ' and " in a database. SQL injections are just a matter of string manipulation; they actually have nothing to do with SQL or databases -- the problem lies only in how the query string is built. If you want to write your own queries (not recommended) you don't have to encode every apostrophe or double quote: just escape them once to build a safe string, and save them in the database. A better approach is using PDO as mentioned, or using the mysqli extension, which allows queries with prepared statements.
htmlentities() and similar functions should be used when sending data as output to the browser, not for encoding data to be stored in a database for at least two reasons: first of all it's useless, the DB doesn't care about html entities, it just contains data; secondly you should always treat data coming from the database as potentially insecure, so you should save it in "raw" format and encode it when using it.
The best approach to writing safely to a DB is to use the PDO abstraction layer and make use of prepared statements.
http://www.php.net/manual/en/intro.pdo.php
A good tutorial (I learned from this one) is
http://www.phpro.org/tutorials/Introduction-to-PHP-PDO.html
However, you might have to rewrite a lot of your site just to implement this. But this is no doubt a more elegant method than having to make use of all those functions. Plus, prepared statements are becoming the de facto standard now. Another benefit is that you may not have to rewrite your queries if you switch to a different database (such as from MySQL to PostgreSQL). But I would say consider this if you plan to scale your site.