I was wondering what the best way would be to let users whose names contain special characters register on my website without 'pre-converting' the names into non-special characters before input, while still keeping the website secure (for example, by preventing registration with a name like "-.lčćo+'90'žž++'-..")?
Thanks a lot.
Assuming you're storing the user data in a database, make sure you store it with a Unicode character encoding such as UTF-8 (as opposed to ASCII, which covers only 128 characters and none of the accented ones), protect against SQL injection (look up PDO and prepared statements), and you should be good.
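To make the prepared-statement advice concrete, here is a minimal sketch. It assumes an in-memory SQLite database through PDO purely so it runs on its own; the table layout and the helper names register_user() and find_username() are made up for illustration, and a real MySQL setup would use a different DSN.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE)');

// Hypothetical helper: stores a username through a prepared statement,
// so the value is never spliced into the SQL text.
function register_user(PDO $pdo, string $username): int
{
    $stmt = $pdo->prepare('INSERT INTO users (username) VALUES (:username)');
    $stmt->execute([':username' => $username]);
    return (int) $pdo->lastInsertId();
}

// Hypothetical helper: reads the username back by id.
function find_username(PDO $pdo, int $id): ?string
{
    $stmt = $pdo->prepare('SELECT username FROM users WHERE id = :id');
    $stmt->execute([':id' => $id]);
    $name = $stmt->fetchColumn();
    return $name === false ? null : $name;
}

// The awkward name from the question is stored and returned unchanged.
$awkward = "-.l\u{010D}\u{0107}o+'90'\u{017E}\u{017E}++'-..";
$id = register_user($pdo, $awkward);

// A classic injection attempt is stored as plain text, nothing more.
$evilId = register_user($pdo, "x'); DROP TABLE users; --");
```

Both names end up as ordinary rows; the quotes and dashes never reach the SQL parser as syntax.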
I'm adding some XSS protection to the website I'm working on. The platform is Zend Framework 2, and therefore I'm using Zend\Escaper. From the Zend documentation I learned that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier or PHP's Filter component should be used.
But what are the risks if I escape the data before inserting it into the database? Am I so wrong to do that? Please explain, as I'm somewhat new to this topic.
thanks
If you encode data before storing it, you will have to decode it again before you can do anything sensible with it other than outputting it. That's why I wouldn't do it.
Let's say you have an international application and you store the escaped value of a form field. Any non-ASCII characters in it may have been turned into HTML entities. So what if you have to measure the content of that field, like counting the characters? You will always have to unescape the content before counting it, and then re-escape it again. Much work done but nothing gained.
The same applies to search operations in your database. You will have to escape the search phrase the same way as your stored input for the database to understand what you are looking for.
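A small sketch of the counting and searching problem described above. The sample word is my own choice, and it assumes the mbstring extension is available for mb_strlen().

```php
<?php
$raw = "caf\u{00E9}";                               // "café", 4 characters
$stored = htmlentities($raw, ENT_QUOTES, 'UTF-8');  // what an "escape before storing" design saves

// Counting characters on the stored value gives the wrong answer...
$rawLength = mb_strlen($raw, 'UTF-8');
$storedLength = mb_strlen($stored, 'UTF-8');

// ...unless it is decoded first, which is the extra work mentioned above.
$restoredLength = mb_strlen(html_entity_decode($stored, ENT_QUOTES, 'UTF-8'), 'UTF-8');

// A search for the raw character no longer matches the stored value either.
$foundRaw = strpos($stored, "\u{00E9}") !== false;

echo $stored, ' ', $rawLength, ' ', $storedLength, ' ', $restoredLength, PHP_EOL;
```

The stored value is "caf&eacute;", so it counts as 11 characters instead of 4, and a search for "é" finds nothing.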
I'd use one character set throughout the application and database (I prefer UTF-8; beware of the MySQL connection, which has its own character set setting) and only escape content on output. That way I can do whatever I like with the data and am on the safe side on output. Escaping is done automatically in my view layer, so I don't have to think about it every time I handle data. That way you can't forget it.
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
But that's just my opinion, others might think otherwise!
"Output" here refers to the web page. A form field (the <input> HTML tag) is an INPUT (from the webpage), any text is an OUTPUT (to the webpage). You need to ensure any output (to the webpage) does not contain dangerous characters that could be used to forge XSS attack vectors.
This said, if you have $DANGEROUS_INPUT_X given by the user and then
$purifier = new HTMLPurifier();
$NOT_DANGEROUS_ANYMORE = $purifier->purify($DANGEROUS_INPUT_X);
DBSave($NOT_DANGEROUS_ANYMORE); // DBSave() and DBLoad() stand for your own storage code
and somewhere else
$OUTPUT = DBLoad();
echo $OUTPUT;
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed exactly as it was saved, and what was saved was safe.
I would suggest looking at output encoding more than validation: HtmlPurifier cleans the HTML, whereas you could accept any kind of bad characters as long as you ensure your output is encoded in the page.
The OWASP cheat sheet at https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet gives some general rules; here is the PHP example:
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.
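As a rough sketch of "store raw, escape on output", the htmlspecialchars() call above can be wrapped in a tiny helper. The name e() and the sample value are my own, not something from Zend or OWASP.

```php
<?php
// Hypothetical helper: escape a value at the moment it is written into HTML.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Raw value as it might come back from the database, stored unescaped.
$stored = "<script>alert('x')</script> \u{017D}i\u{017E}ek";

// Escaping happens here, in the view, and nowhere earlier.
$html = '<p>Hello, ' . e($stored) . '</p>';
echo $html, PHP_EOL;
```

The angle brackets and quotes come out as entities, while the non-ASCII letters pass through untouched because the page and the function agree on UTF-8.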
After reviewing this I realised I still have a few questions left regarding the topic.
Are there any characters that should be 'left out' for legitimate security purposes? This includes all characters, such as brackets, commas, apostrophes, and parentheses.
While on this subject, I admittedly don't understand why admins seem to enjoy enforcing the "you can only use the alphabet, numbers, and spaces" rule. Does anything else have the potential to be a security flaw or break something I'm not aware of (even in ASCII)? As far as I've seen during my coding days there is absolutely no reason that any character should be barred from being in a username.
There's no security reason to not use certain characters. If you're properly handling all input, it doesn't make any difference whether you're only handling alphanumeric characters or Chinese.
It is easier to handle only alphanumeric usernames. You don't need to think about ambiguity with collations in your database, encoding usernames in URLs and things like that. But again, if you're properly handling it, there's no technical reason against it.
For practical reasons passwords are often only alphanumeric. Most password inputs don't accept IME input for example, so it's almost impossible to have a Japanese password. There's no security reason for disallowing non-alphanum characters though. On the contrary, the larger the usable alphabet, the better.
If your application handles Unicode input properly throughout, I'd certainly allow non-ASCII characters in usernames and passwords, with a few caveats:
If you use HTTP Basic Authentication, you can't properly support non-ASCII characters in usernames and passwords, because the process of passing those details involves an encode-to-bytes-in-base64 step that, currently, browsers don't agree on:
Safari uses ISO-8859-1, and breaks if there are any non-8859-1 characters present;
Mozilla uses the low byte of each character encoded to UTF-16 code units (same as ISO-8859-1 for those characters);
Opera and Chrome use UTF-8;
IE uses the ANSI code page on the system it's installed on, which could be anything, but never ISO-8859-1 or UTF-8. Characters that don't fit the encoding are arbitrarily mangled.
If you use cookies, you must ensure any Unicode characters are encoded in some way (eg URL-encoding), as once again trying to send non-ASCII characters gives vastly different results in different browsers.
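To illustrate both caveats, here is a small sketch. The helper names and sample values are invented for the example; it shows only the byte-level effect, not what any particular browser does.

```php
<?php
// Builds the value of an "Authorization: Basic ..." header from raw bytes.
function basic_auth_value(string $userBytes, string $passwordBytes): string
{
    return 'Basic ' . base64_encode($userBytes . ':' . $passwordBytes);
}

// The same visible username "josé", encoded to bytes two different ways.
$utf8Header = basic_auth_value("jos\xC3\xA9", 'secret');  // é as UTF-8
$latin1Header = basic_auth_value("jos\xE9", 'secret');    // é as ISO-8859-1

// Cookie values: URL-encode so that only ASCII travels over the wire.
function encode_cookie_value(string $value): string
{
    return rawurlencode($value);
}

function decode_cookie_value(string $value): string
{
    return rawurldecode($value);
}

$nickname = "J\u{00F8}rgen \u{5F20}";
$wire = encode_cookie_value($nickname);

echo $utf8Header, PHP_EOL, $latin1Header, PHP_EOL, $wire, PHP_EOL;
```

The two headers differ, so the server receives different credentials depending on which encoding the client picked. The cookie value survives the round trip because it is plain ASCII in transit. Note that PHP's own setcookie() already URL-encodes the value (setrawcookie() does not).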
"you can only use the alphabet, numbers, and spaces"
You get spaces? Luxury!
It is often exactly those characters that can be used to inject malicious code into your program. For example SQL injection (quotes, dashes, etc), XSS (quotes, angle brackets, etc) or even programming language injection when eval() is used elsewhere in your code.
Those characters usually do no harm when you, as the developer, properly handle all user-controlled input and output, i.e. everything that comes in with the HTTP request: the headers, parameters and body. For example, use parameterized queries (or mysql_real_escape_string() when inlining values in a SQL query) to prevent SQL injection, and htmlspecialchars() when inlining values in HTML to prevent XSS. But I can imagine that admins don't trust all developers, so they add those restrictions.
See also:
OWASP on PHP top 5 vulnerabilities
I don't think there is a reason not to allow Unicode in usernames. Passwords are a different story: since you don't usually see the password when you type it into a form, allowing only ASCII makes sense to prevent possible confusion.
I think it makes sense to use the email address as the login credential rather than requiring users to create a new username. Then the user can select any nickname, using any Unicode characters, and have that nick displayed next to their posts and comments.
Isn't this how it's done on Facebook?
I think that most of the time when things (usernames or passwords) are being forced down to ASCII, it's because someone is afraid that more complex character sets will cause breakage in some unknown component. Whether this fear is justified or not is case dependent, but trying to verify that your entire stack really does Unicode correctly in all cases might be difficult. It's getting better every day, but you can still find problems with Unicode in some places.
I personally keep my usernames and passwords all ASCII, and I even try not to use too much punctuation. One reason is that some input devices (like some mobile phones) make it kind of difficult to get to some of the more esoteric characters. Another reason is that I've more than once encountered a system that had no restrictions on the password contents, but then screwed up if you actually used something other than a letter or number.
There is a risk involved if some parts of your program assume strings with different bytes are different, but other parts of the program would compare strings according to unicode semantics and think they're the same.
For example filesystems on Mac OS X enforce uniform representation of Unicode characters, so two different filenames Ą ('A with ogonek') and A+̨ (latin A followed by 'combining ogonek') will refer to the same file.
Similarly, one can produce invalid UTF-8 byte sequences in which code points that fit in one byte are encoded using multiple bytes (called overlong sequences). If you normalize or reject such UTF-8 input before processing it, you'll be safe, but if you use, say, a Unicode-ignorant programming language with a Unicode-aware database, the two will see different inputs.
So to avoid that:
You should filter UTF-8 input as early as possible. Reject invalid/overlong sequences.
When comparing Unicode strings, always convert both sides of the comparison to the same Unicode Normal Form. For usernames you might want NFKD to reduce the number of possible homograph attacks.
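The two rules above can be sketched roughly as follows. This assumes the mbstring extension for mb_check_encoding() and the intl extension for the Normalizer class; the function names are made up for the example.

```php
<?php
// Rule 1: reject anything that is not well-formed UTF-8 (overlong forms included).
function is_valid_utf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

// Rule 2: compare usernames only after bringing both sides to NFKD.
// Requires the intl extension, which provides Normalizer.
function same_username(string $a, string $b): bool
{
    if (!is_valid_utf8($a) || !is_valid_utf8($b)) {
        return false;
    }
    return Normalizer::normalize($a, Normalizer::FORM_KD)
        === Normalizer::normalize($b, Normalizer::FORM_KD);
}

$precomposed = "\u{0104}";   // 'A with ogonek' as one code point
$combining = "A\u{0328}";    // latin A followed by 'combining ogonek'
$overlong = "\xC0\xAF";      // overlong encoding of "/"
```

With these definitions, is_valid_utf8($overlong) is false, and same_username($precomposed, $combining) is true even though the two strings differ byte for byte.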
In some fields of my database the data shows up as in the following screenshots:
http://i31.tinypic.com/2637l9f.jpg
http://i27.tinypic.com/1ihh6d.jpg
http://i26.tinypic.com/2yklzb4.jpg
http://i31.tinypic.com/2vbshtf.jpg
I used mysql_real_escape_string while inserting my data into database and htmlspecialchars while displaying.
Can anyone tell me why they look like this, and what the solution is?
That's Mojibake. Your PHP and MySQL code are not ready for World Domination.
To fix it properly, go through this cheatsheet and ensure that every layer is using UTF-8.
The mysql_real_escape_string() basically only prevents you from SQL injection attacks and the htmlspecialchars() basically only prevents you from XSS attacks. They do not assist in encoding or decoding the characters in any way. The character set used is responsible for that. Your problem is that you're not consistent in using the charset and/or that the charset you've chosen/used does not support the characters which the client entered and/or you'd like to use.
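For what it's worth, this kind of garbage is easy to reproduce. The sketch below assumes the mbstring extension and simulates one layer misreading UTF-8 bytes as ISO-8859-1; whether that is the exact mismatch in the screenshots is a guess.

```php
<?php
// "é" as UTF-8 is the two bytes C3 A9.
$utf8Bytes = "\u{00E9}";

// A layer that believes the bytes are ISO-8859-1 turns each byte into
// its own character when converting "to UTF-8".
$mojibake = mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');

// Escaping for HTML does not repair (or cause) any of this.
$escaped = htmlspecialchars($mojibake, ENT_QUOTES, 'UTF-8');

echo $mojibake, ' ', bin2hex($mojibake), PHP_EOL;
```

One character has become two ("Ã©"), and htmlspecialchars() passes the damage through unchanged, which is why the fix has to happen at the charset level.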
I'm developing an application using Wordpress as a CMS.
I have a form with a lot of input fields which needs to be sanitized before stored in the database.
I want to prevent SQL injection, having javascript and PHP code injected and other harmful code.
Currently I'm using my own methods to sanitize data, but I feel that it might be better to use the functions which WP uses.
I have looked at Data Validation in WordPress, but I'm unsure how many of these functions I should use, and in what order. Can anyone tell me which WP functions are best to use?
Currently I'm "sanitizing" my input by doing the following:
Because characters with accents (é, ô, æ, ø, å) got stored in a funny way in the Database (even though my tables are set to ENGINE=InnoDB, DEFAULT CHARSET=utf8 and COLLATE=utf8_danish_ci), I'm now converting input fields that can have accents, using htmlentities().
When creating the SQL string to input the data, I use mysql_real_escape_string().
I don't think this is enough to prevent attacks though, so suggestions for improvement are greatly appreciated.
Input “sanitisation” is bogus.
You shouldn't attempt to protect yourself from injection woes by filtering(*) or escaping input, you should work with raw strings until the time you put them into another context. At that point you need the correct escaping function for that context, which is mysql_real_escape_string for MySQL queries and htmlspecialchars for HTML output.
(WordPress adds its own escaping functions like esc_html, which are in principle no different.)
(*: well, except for application-specific requirements, like checking an e-mail address is really an e-mail address, ensuring a password is reasonable, and so on. There's also a reasonable argument for filtering out control characters at the input stage, though this is rarely actually done.)
"I'm now converting input fields that can have accents, using htmlentities()."
I strongly advise not doing that. Your database should contain raw text; you make it much harder to do database operations on the columns if you've encoded it as HTML. You're escaping characters such as < and " at the same time as non-ASCII characters too. When you get data from the database and use it for some other reason than copying it into the page, you've now got spurious HTML-escapes in the data. Don't HTML-escape until the final moment you're writing text to the page.
If you are having trouble getting non-ASCII characters into the database, that's a different problem which you should solve first instead of going for unsustainable workarounds like storing HTML-encoded data. There are a number of posts here all about getting PHP and databases to talk proper UTF-8, but the main thing is to make sure your HTML output pages themselves are correctly served as UTF-8 using the Content-Type header/meta. Then check your MySQL connection is set to UTF-8, eg using mysql_set_charset().
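As a sketch of the connection-charset step, here is the PDO equivalent rather than mysql_set_charset(); I'm swapping in PDO, whose MySQL DSN accepts a charset option. The helper name, host and database name are placeholders, and the connection itself is left commented out because it needs a running MySQL server.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that asks for a UTF-8 connection.
// utf8mb4 is MySQL's name for full UTF-8 (all code points, up to 4 bytes each).
function utf8_mysql_dsn(string $host, string $database): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $database);
}

$dsn = utf8_mysql_dsn('localhost', 'mydb');

// Usage, assuming credentials in $user and $password:
// header('Content-Type: text/html; charset=utf-8'); // serve the page as UTF-8
// $pdo = new PDO($dsn, $user, $password);           // connection now talks UTF-8
```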
"When creating the SQL string to input the data, I use mysql_real_escape_string()."
Yes, that's correct. As long as you do this you are not vulnerable to SQL injection. You might be vulnerable to HTML injection (causing XSS) if you are HTML-escaping at the database end instead of the template output end, because any string that hasn't gone through the database (eg. fetched directly from $_GET) won't have been HTML-escaped.
I'm programming a user registration. However, I found a character restriction on 'username' in sample code. For example, only '.', ''' and '-' are accepted as punctuation, and no spaces or other whitespace.
Are those restrictions necessary?
I'm using MySQL+PHP. Suppose I take the following measures:
change the collation of the column to 'utf8_general_ci';
use the function 'mysql_escape_string' or 'mysql_real_escape_string' in PHP;
create a mapping table username <-> userID (the 'username' is what the client enters, userID is an INT), so that only 'userID' is used inside the database and 'username' is only displayed in HTML.
Then do I really need a regular expression?
Thank you for your help.
PS: I'm Chinese, so Chinese characters are required.
Since this is a web app, apart from using mysql_real_escape_string, I would also recommend stripping or disallowing anything that can construct HTML. Generally forbidding "<" and ">" is enough.
You really wouldn't want some user to enter their name as:
<script src=http://malicious/script.js></script>
The alternative solution is to use htmlspecialchars when outputting data to your page.
Those restrictions are not necessary; however, you must ensure that the username is a valid and unique string (with or without a regex).
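If you do want a regex for the "valid" part, one possible sketch uses Unicode property classes so that Chinese characters pass. The exact rules here (2 to 30 characters, letters and digits from any script, with space, '.', ''' and '-' allowed only in the middle) are my own assumption, not a standard; mb_strlen() needs the mbstring extension. Uniqueness is better left to a UNIQUE index in the database.

```php
<?php
// Hypothetical validation rule: letters/digits from any script, with a few
// punctuation marks allowed between them. Adjust to your own requirements.
function is_valid_username(string $name): bool
{
    $length = mb_strlen($name, 'UTF-8');
    if ($length < 2 || $length > 30) {
        return false;
    }
    // \p{L} = any letter, \p{N} = any number; the /u flag makes PCRE treat the
    // subject as UTF-8 (and fail on malformed input).
    return preg_match('/^[\p{L}\p{N}](?:[\p{L}\p{N} .\'-]*[\p{L}\p{N}])?$/u', $name) === 1;
}

$chinese = "\u{5F20}\u{4F1F}";   // a two-character Chinese name
$checks = [
    is_valid_username($chinese),
    is_valid_username("O'Brien"),
    is_valid_username('<script>'),
];
```

The Chinese name and "O'Brien" pass, while "<script>" is rejected because it does not start with a letter or digit.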
When it comes to stripping vulnerable characters from strings and MySQL injection, I would advise using the MySQLi extension for prepared statements, which takes care of escaping so you don't have to escape every string manually.
It's up to you, but using mysql_real_escape_string should be fine in my opinion (and I'm pretty paranoid).