How can I sanitize a string that receives a hash+random salt?
I can remove the whitespace, check the length and use mysqli_real_escape_string(), but is that sufficient? filter_var() is really useful, but it can't help in this case, right?
If you are going to put the variable in an SQL query, then you either need to call mysqli_real_escape_string() or (even better!) use prepared statements.
There's no other sanitization you need to do. However, if the value will be coming from freeform user input (e.g. a text box instead of a drop down menu) then you may also want to trim whitespace and lowercase it as a courtesy to the user (to correct accidental mistakes they might make). It really depends on the application.
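A minimal sketch of the prepared-statement route. So that it runs without a MySQL server, it assumes PDO with an in-memory SQLite database (the pdo_sqlite extension); with mysqli the same idea is mysqli::prepare() plus mysqli_stmt::bind_param(). The table and the helper names storeHash()/fetchHash() are hypothetical.

```php
<?php
// Stand-in for MySQL so the sketch runs anywhere: an in-memory SQLite database via PDO.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT, pw_hash TEXT)');

// Hypothetical helper: the value travels separately from the SQL text,
// so quotes or other special characters in it cannot change the query.
function storeHash(PDO $pdo, string $name, string $hash): void
{
    $stmt = $pdo->prepare('INSERT INTO users (name, pw_hash) VALUES (?, ?)');
    $stmt->execute([$name, $hash]);
}

// Hypothetical helper: returns the stored value, or null if the user is unknown.
function fetchHash(PDO $pdo, string $name): ?string
{
    $stmt = $pdo->prepare('SELECT pw_hash FROM users WHERE name = ?');
    $stmt->execute([$name]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : $value;
}

// Only courtesy trimming; no manual escaping needed.
storeHash($pdo, 'alice', trim("  ab'cd\"ef  "));
```

The quote characters in the stored value are harmless here because they are never spliced into the SQL string.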
Just to be clear, you're receiving from an un-trusted source a hash (effectively random data) + salt (actually random data), and you want to 'sanitize' it? There is probably a definition of sanity that applies (a data format like base64 encoding, a maximum / expected length), but I strongly suspect there is a functional security mistake in there somewhere.
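As an illustration of what such a "definition of sanity" could look like, here is a sketch that assumes (hypothetically) the client sends a 64-character hex digest followed by a 32-character hex salt. Anything that does not match that exact shape is refused; the function name is made up for the example.

```php
<?php
// Hypothetical format: 64 hex chars (e.g. a SHA-256 digest) + 32 hex chars of salt.
function normalizeHashAndSalt(string $value): ?string
{
    // Courtesy normalization: drop surrounding whitespace, unify hex case.
    $value = strtolower(trim($value));
    // /D makes $ match only at the very end, so a trailing "\n" is never accepted.
    return preg_match('/^[0-9a-f]{96}$/D', $value) === 1 ? $value : null;
}
```

A value that passes this check can only contain 0-9 and a-f, which also rules out quotes and other SQL-significant characters.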
Most notably, why are you accepting a hash+salt from an un-trusted source, rather than accepting a password and doing the transformation within your trusted environment? Accepting a hash+salt from an un-trusted source probably turns them into plain-text equivalents (you lose the benefit you got from hashing and salting the original password).
First validate that the password matches your validation rules. You can use a regular expression for this. Passwords often consist of a-z, 0-9, perhaps some punctuation, and must be within a certain length, say 6-12 characters. Use preg_match() to validate the string's contents and length. Something like preg_match('/^[a-z0-9]{6,12}$/i', $pass) might be a start.
Next you can hash the password. You may use the function crypt() to do so. This will create a one-way hash that you can compare against later when the user attempts to authenticate.
Finally, to store the password hash, yes, using mysqli_real_escape_string() will do the trick to prepare it for use in your SQL INSERT or UPDATE statement.
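The validate-then-hash steps can be sketched as follows. This swaps in password_hash() for a direct crypt() call: it produces crypt()-compatible hashes but picks the algorithm and a random salt itself. The 6-12 alphanumeric rule is just the example rule from above, and the function names are hypothetical.

```php
<?php
// Example rule from above: 6-12 letters or digits, case-insensitive.
// /D stops $ from also matching before a trailing newline.
function isAcceptablePassword(string $pass): bool
{
    return preg_match('/^[a-z0-9]{6,12}$/iD', $pass) === 1;
}

// Returns the hash to store, or null if the password breaks the rule.
function hashPassword(string $pass): ?string
{
    if (!isAcceptablePassword($pass)) {
        return null;
    }
    return password_hash($pass, PASSWORD_DEFAULT);
}
```

At login, password_verify($input, $storedHash) does the comparison.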
Related
I have a dynamic PHP web app which gets input parameters in the URL (no surprise here). However, bingbot sometimes requests extremely long URLs from the site, e.g. URLs over 10,000 characters long. One of the inputs is a UTF-8 name, and bingbot somehow submits sketchy names thousands of characters long, like this: \xc2\x83\xc3\x86... (goes on for thousands of characters).
Obviously, it gets a 404, because there is no such name in the database (and therefore no such page), but it occurred to me to wonder whether it's worth checking the input length before querying the DB (e.g. a name cannot be more than 100 characters long) and returning a 404 instantly if it's too long. Is that standard practice? Or is it not worth the trouble, because the DB handles it?
I'm thinking of not putting extra load on the DB unnecessarily. Is this long input submitted as-is by the DB client interface (two calls: first a prepare for sanitizing the input and then the actual query), or does the PHP DB client know the column size and truncate the input string before sending it down the wire?
Not only is what you're asking more than legitimate, I'd say it's something you should be doing as part of input filtering/validation. If you expect your input to always be shorter than 100 characters, everything longer should be filtered out.
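A minimal sketch of that check, assuming the 100-character limit from the question and the mbstring extension; isPlausibleName() is a hypothetical name. It counts characters rather than bytes, since a UTF-8 character can take several bytes.

```php
<?php
// Assumed maximum from the question: a name is never longer than 100 characters.
const MAX_NAME_CHARS = 100;

// Cheap checks to run before any query; false means "answer 404 right away".
function isPlausibleName(string $name): bool
{
    // Reject byte sequences that are not valid UTF-8 (the names are UTF-8).
    if (!mb_check_encoding($name, 'UTF-8')) {
        return false;
    }
    // Count characters, not bytes: "é" is one character but two bytes.
    $length = mb_strlen($name, 'UTF-8');
    return $length > 0 && $length <= MAX_NAME_CHARS;
}
```

In the request handler, a failed check would call http_response_code(404) and stop before the database is touched.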
Also, it appears that you're getting UTF-8 strings: if you're not expecting them, you could simply filter out all characters that are not part of the standard ASCII set (or an even smaller set, e.g. with all control characters filtered away). For example: $string = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW);
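The filter_var() call above is mainly about HTML-encoding. If the goal is literally "printable ASCII only", a regular expression can state that explicitly. A minimal sketch (the function name is hypothetical); note that it only removes characters and does not HTML-encode anything, so output still needs encoding separately.

```php
<?php
// Keeps only printable ASCII (0x20-0x7E): control characters and every byte >= 0x80 are dropped.
// No /u modifier, so the pattern works byte by byte, even on invalid UTF-8.
function asciiPrintableOnly(string $input): string
{
    return preg_replace('/[^\x20-\x7E]/', '', $input);
}
```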
This is not just a matter of DB performance, but also security!
PS: I strongly doubt that bot is actually Bing. It looks like a bot trying to hack your website.
Addendum: some suggestions about input validation
As I wrote above in some comments (and as others have written too), you should always validate every input, no matter what it is or where it comes from: if it comes from outside, it has to be validated.
The general idea is to validate your input according to what you're expecting. In what follows, $input is any input variable: anything coming from $_GET, $_POST, $_COOKIE, from external APIs and from many $_SERVER variables as well, plus anything else that could be altered by a user (use your judgement, and when in doubt be overly cautious).
If you're expecting an integer or a float, then it's easy: just cast the input to (int) or (float):
$filtered = (int)$input;
$filtered = (float)$input;
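A cast never fails, so "12abc" quietly becomes 12 and "abc" becomes 0. If you would rather refuse such input (the "accept or refuse" approach mentioned further down), filter_var() with FILTER_VALIDATE_INT can do that. A sketch; the 1-10000 range and readPageNumber() are hypothetical.

```php
<?php
// Returns the integer, or null if $raw is not a whole number inside the allowed range.
function readPageNumber(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 10000],
    ]);
    return $value === false ? null : $value;
}
```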
If you're expecting a string, then it's more complicated. You should think about what kind of string you are expecting, and filter it accordingly. For example:
If you're expecting a string like a hexadecimal id (like some databases use), then filter all characters outside the 0-9A-Fa-f range: $filtered = preg_replace('/[^0-9A-Fa-f]/', '', $input);
If you're expecting an alphanumeric ID, filter it, removing all characters that are not part of that ASCII range. You can use the code posted above: $string = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW);. This one removes all control characters too.
If you're expecting your input to be Unicode UTF-8, validate it. For example, see this function: https://stackoverflow.com/a/1523574/192024
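The three cases above can be sketched as small helpers. The hex filter reuses the regex above; the alphanumeric one uses a regex that keeps only A-Z, a-z and 0-9; and for UTF-8 this swaps in mbstring's mb_check_encoding() instead of the linked function. All names are hypothetical.

```php
<?php
// Hex id: strip everything outside 0-9A-Fa-f.
function filterHexId(string $input): string
{
    return preg_replace('/[^0-9A-Fa-f]/', '', $input);
}

// Alphanumeric id: strip everything outside A-Z, a-z, 0-9.
function filterAlnumId(string $input): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $input);
}

// UTF-8 text: refuse (rather than repair) byte sequences that are not valid UTF-8.
function validUtf8OrNull(string $input): ?string
{
    return mb_check_encoding($input, 'UTF-8') ? $input : null;
}
```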
In addition to this:
Always encode HTML special characters (<, >, &, quotes). filter_var() with FILTER_SANITIZE_FULL_SPECIAL_CHARS will do that as well. If you don't, you risk XSS (Cross-Site Scripting) attacks.
If you want to remove control characters and encode HTML entities but without removing the newline characters (\n and \r), then you can use: $filtered = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/u', '', htmlspecialchars($input, ENT_COMPAT, 'UTF-8'));
And much more. Use your judgement always.
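The newline-preserving one-liner above, wrapped as a reusable function so its behaviour is easier to see. The function name is hypothetical; the flags and character class are the ones from the answer.

```php
<?php
// Encode HTML special characters, then drop control characters except \n (0x0A) and \r (0x0D).
// \t (0x09) is removed too, because it is inside the character class.
// With invalid UTF-8 input, htmlspecialchars() returns "" unless ENT_SUBSTITUTE is added.
function escapeKeepNewlines(string $input): string
{
    $encoded = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
    return preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/u', '', $encoded);
}
```

ENT_COMPAT encodes double quotes but leaves single quotes alone; use ENT_QUOTES if the value can end up inside a single-quoted HTML attribute.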
PS: My approach to input filtering is to prefer sanitization. That is, remove everything "dangerous" and accept the sanitized input as if that was what the user wrote. Others will instead argue that input should only be accepted or refused.
Personally, I prefer the "sanitize and use" approach for web applications, as your users still may want to see something more than an error web page; on desktop/mobile apps I go with the "accept or refuse" method instead.
However, that's just a matter of personal preference, backed only by what my gut tells me about UX. You're free to follow the approach you prefer.
There should be some sort of validation done on any data before it is used in a query. If you have a limit on the length of the name, you can use it as part of the validation when checking the input. If the name is over the limit, it can't be in the database, so handle it accordingly, whether with a 404 or with a page that displays an error message.
The load will go down if you bypass queries because a name is too long. How much it goes down depends on how you are querying the database (LIKE or MATCH ... AGAINST) and on how your indexes are set up.
I don't know; maybe this seems like a crazy or totally unprofessional newbie question. However, is it a good choice to convert four spaces to a tab in a password field?
Here is what I want to do: whenever the user enters a password in the password field, I want to trim the left and right whitespace (if any), and if the user typed four spaces in the middle of the string, convert them to a Tab character (or vice versa?), and then hash the value.
I want to mention that the password field will accept whitespace and is not restricted to the English character set.
Is it good practice?
Trimming the start and end is definitely good practice.
However converting whitespace characters to a tab would be a very bad idea. How would the user be able to log in? When they press the Tab button in the password box the browser will move the focus out of the password box to the next control on the page. There is no way for them to be able to type a Tab into the password!
Leave any spaces in the middle of the password as they are.
One can debate trimming the password; I myself think it is a good idea.
Altering the password, however, won't give you any benefit and can even be harmful. Assuming that you are properly hashing passwords before storing them, you can see the alteration as just an additional part of the hashing algorithm. Whatever changes you make, an algorithm cannot increase the entropy of the password, so the password cannot become any stronger. On the other hand, it can decrease the entropy. An easy example:
Two passwords that differ only in having four spaces in one place versus a single tab will result in the same hash value.
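The collision is easy to reproduce. This sketch implements the transformation proposed in the question as a hypothetical helper and shows that a hash computed from one password also accepts the other.

```php
<?php
// Hypothetical helper reproducing the proposed transformation: trim, then four spaces -> tab.
function alteredPassword(string $pass): string
{
    return str_replace('    ', "\t", trim($pass));
}

$a = "correct    horse"; // registered with four spaces
$b = "correct\thorse";   // a different password containing one tab

// The stored hash is computed from the altered string...
$storedHash = password_hash(alteredPassword($a), PASSWORD_DEFAULT);
// ...so the other password is accepted too: two inputs collapsed into one.
$loginWithB = password_verify(alteredPassword($b), $storedHash);
```

A transformation like this can only merge distinct passwords into one, never split one into several, which is why it can lower entropy but never raise it.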
So go with trimmed passwords for convenience if you like, but leave the content of the password unaltered.
Great that you plan on removing the leading/trailing spaces; however, I don't see a reason to change those spaces to tabs, since it's just an extra step before hashing them.
If there's no good reason to put something in, it's generally better to... not put it in.
(edit: I'm assuming the same check would be in place on login)
(ps: this type of question isn't really a good fit for Stack Overflow, though, since it involves personal opinions)
I'm trying to use mcrypt_create_iv to generate random salts. When I test whether the salt is generated by echoing it out, it checks out, but it isn't the required length that I pass to it as a parameter (32); instead it's shorter than that.
When I store it in my database table however, it shows up as something like this K??5P?M???4?o???"?0??
I'm sure it's something to do with the database, but I tried to change the collation of it to correspond with the config settings of CI, which is utf8_general_ci, but it doesn't solve the problem, instead it generates a much smaller salt.
Does anyone know of what may be wrong? Thanks for any feedback/help
The function mcrypt_create_iv() returns a binary string, which can contain \0 and other unprintable characters. Depending on how you want to use the salts, you first have to encode those byte strings into an accepted alphabet. It is also possible to store binary strings in the database, but of course you will have trouble displaying them.
Since salts are normally used for password storage, I would recommend having a look at PHP's password_hash() function: it generates a salt automatically and includes it in the resulting hash value, so you don't need a separate database field for the salt.
I'm having an issue validating Chinese characters against other Chinese characters. For example, I'm creating a simple password script which gets data from a database and gets the user input through GET.
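A minimal sketch of the encoding step. It assumes PHP 7+ and swaps in random_bytes() for mcrypt_create_iv(), since the mcrypt extension was deprecated in PHP 7.1 and removed from core in 7.2. Hex or base64 both give a fixed-length, printable string that a utf8_general_ci column stores without mangling.

```php
<?php
$raw = random_bytes(16);                      // 16 raw bytes; may contain "\0" and unprintable bytes
$hexSalt = bin2hex($raw);                     // 32 characters from 0-9a-f (2 per byte)
$b64Salt = base64_encode(random_bytes(24));   // 32 characters from A-Z a-z 0-9 + / (4 per 3 bytes)
```

Note that the encoded length differs from the byte count: 32 raw bytes, as requested in the question, would become 64 hex characters.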
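A short sketch of what "the salt is included in the hash" means in practice: hashing the same password twice gives two different strings, yet password_verify() accepts both, because it reads the algorithm, cost and salt back out of the stored value.

```php
<?php
$hash1 = password_hash('s3cret', PASSWORD_DEFAULT);
$hash2 = password_hash('s3cret', PASSWORD_DEFAULT);

// Each call picks a fresh random salt and embeds it in the returned string.
$differ = $hash1 !== $hash2;
// password_verify() extracts that salt again, so no separate salt column is needed.
$ok = password_verify('s3cret', $hash1) && password_verify('s3cret', $hash2);
$bad = password_verify('S3cret', $hash1);
```

The PHP manual suggests a column of 255 characters for these hashes, since PASSWORD_DEFAULT may change to a longer format over time.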
The issue I'm having is for some reason, even though the characters look exactly the same when you echo them out, my if statement still thinks they are different.
I have tried using the htmlentities() function to encode the characters; the password from the database encodes nicely, giving me a working '& #35441;' (I've put a space in it to stop it from converting to a Chinese character!).
The other value, the user input, gives me a load of funny characters. The only thing I believe could be breaking it is that it's encoded in a different way, so PHP thinks they are two completely different strings.
Does anybody have any ideas?
Thanks in advance,
Will
Edit:
Thanks for the quick responses, guys. I'm going to look into setting the database encoding to UTF-8; however, at the moment the results from the database are not the problem, they encode correctly using htmlentities(). It's the results I get from $_GET that are causing the problems.
Cheers,
Will
For passwords, my advice is not to do a direct comparison, because that means you're storing passwords in the clear. At least run them through a hash like MD5 or SHA (preferably with a salt value as well) before storing them. Then you just have to compare the hash values, which are typically hex strings, so they shouldn't cause any encoding problems.
For non-password values, it sounds like your database and PHP are not using the same encoding, so the values are not matching properly. If MySQL is storing them the way you want, have it do the comparison (instead of having it return the values first); that should avoid one of the passes through an encoding change, which seems likely to be the problem.
If you want to store passwords, read this: What you need to know about secure password schemes.
With that covered, your root problem seems to be a character encoding mismatch between what you receive from the user and what you get from your database.
If you are using MySQL with UTF-8, do you first run the SET NAMES 'utf8' query (or, with mysqli, call mysqli_set_charset())? Note that MySQL spells the character set utf8 (or utf8mb4), not "utf-8".
Saving the values hashed with SHA1 or MD5 may solve your problem, as others have stated. It is also more secure than storing the plain password. Here's a code snippet to help out.
public function getHashedPassword()
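To illustrate the kind of mismatch suspected here: the same character stored as different bytes never compares equal with ===. This sketch uses UTF-16BE purely as a stand-in for "whatever encoding the request actually arrived in" (in practice it could be something like GBK or Big5, which you would have to identify), and the character U+8A71 from the question's &#35441; entity (0x8A71 is 35441). It assumes the mbstring extension.

```php
<?php
$fromDb  = "\u{8A71}";                                          // UTF-8 bytes E8 A9 B1
$fromGet = mb_convert_encoding($fromDb, 'UTF-16BE', 'UTF-8');   // stand-in for a differently encoded request: bytes 8A 71

// Same character, different bytes, so a plain comparison fails.
$sameBytes = ($fromDb === $fromGet);

// Convert the request value to UTF-8 first, then compare.
$normalized = mb_convert_encoding($fromGet, 'UTF-8', 'UTF-16BE');
$matches = ($normalized === $fromDb);
```

Making the form, the page, PHP and the MySQL connection all agree on UTF-8 removes the need for the conversion step.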
{
    $salt = 'mysalt';
    // %s, not %d: with %d the string salt is formatted as 0 and effectively lost
    return sprintf("%s%s", $salt, sha1(sprintf("%s%s", $salt, $this->_rawPassword)));
}
When comparing, rehash the password input and compare it to the saved hashed password in your database. Doing so may remove the encoding issue.
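The rehash-and-compare step as a sketch, using the salt-then-SHA-1 scheme the snippet above aims for. The helper names are hypothetical. hash_equals() is used instead of === because it compares in constant time. Two caveats: a fast hash such as SHA-1 is weak for passwords (password_hash()/password_verify() are the safer choice), and hashing only sidesteps encoding problems in the comparison itself; the input bytes must still arrive in the same encoding as when the hash was created.

```php
<?php
// Fixed salt prepended to the SHA-1 hex digest of salt + password.
function saltedSha1(string $salt, string $password): string
{
    return $salt . sha1($salt . $password);
}

// Rehash the submitted password and compare it with the stored value.
function checkPassword(string $stored, string $input, string $salt = 'mysalt'): bool
{
    // hash_equals() takes the same time whether the strings differ early or late.
    return hash_equals($stored, saltedSha1($salt, $input));
}
```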
Since you anyway ought to store hashes of passwords rather than the passwords themselves, this might be a part of the solution. You store the hash rather than the password and thus have no problems with the database.
That said, there may be differences in how different browsers encode the strings they submit. It's not something I'm very familiar with, but you had better make sure you find a solution that produces the exact same string in all browsers. Setting the form's accept-charset to utf-8 is a no-brainer; you might also want to look at the enctype.
The problem is you can't tell the user how many characters are allowed in the field because the escaped value has more characters than the unescaped one.
I see a few solutions, but none looks very good:
One whitelist for each field (too much work and doesn't quite solve the problem)
One blacklist for each field (same as above)
Use a field length that could hold the data even if all characters are escaped (bad)
Uncap the size for the database field (worse)
Save the data hex-unescaped and pass the responsibility entirely to output filtering (not very good)
Let the user guess the maximum size (worst)
Are there other options? Is there a "best practice" for this case?
Sample code:
$string = 'javascript:alert("hello!");';
echo strlen($string);
// outputs 27
$escaped_string = filter_var('javascript:alert("hello!");', FILTER_SANITIZE_ENCODED);
echo strlen($escaped_string);
// outputs 41
If the length of the database field is, say, 40, the escaped data will not fit.
Don't build your application around the database - build the database for the application!
Design how you want the interface to work for the user first, work out the longest acceptable field length, and use that.
In general, don't escape before storing in the database - store raw data in the database and format it for display.
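A minimal sketch of "store raw, format for display": the length limit is checked on the raw value (so it is the number you can show the user), and escaping happens only when HTML is produced. MAX_TITLE_CHARS, validTitle() and renderTitle() are hypothetical, and the sketch assumes mbstring.

```php
<?php
const MAX_TITLE_CHARS = 40; // hypothetical column size, counted in characters

// Validate the raw value the user typed; this is the limit you can tell them.
function validTitle(string $raw): bool
{
    return mb_check_encoding($raw, 'UTF-8') && mb_strlen($raw, 'UTF-8') <= MAX_TITLE_CHARS;
}

// Escape only when producing HTML, never before storing.
function renderTitle(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}
```

With this split, the 27-character sample from the question fits a 40-character field, and its quotes are encoded only on the way out.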
If something is going to be output many times, then store the processed version.
Remember disk space is relatively cheap - don't waste effort trying to make your database compact.
making some wild assumptions about the context here:
if the field can hold 32 characters, that is 32 unescaped characters
let the user enter 32 characters
escape/unescape is not the user's problem
why is this an issue?
if this is form data-entry it won't matter, and
if you are for some reason escaping the data and passing it back then unescape it before storage
without further context, it looks like you are fighting a problem that doesn't really exist, or that doesn't need to exist
This is an interesting problem.
I think any solution will be a problem if it assigns responsibility to the users because of the sanitization. If they are responsible for guessing the maximum length, they may well give up and pick something else (and not understand why their input was invalid).
Here's my idea: make the database field 150% of the size of the input. The extra size serves as "padding" for the space taken by the hex-sanitization, while the maximum size shown to the user and used by the validator is the actual desired size. Thus if you check the input length before sanitization and it is within that limit (about 67% of the field), your sanitized data should be good to go. If the sanitized data overflows the extra ~33% of the field kept as a buffer, the input probably should not be accepted.
The only trouble is that your database tables will be larger. If you want to avoid this, well, you could always escape only the SQL sensitive characters and handle everything else on output.
Edit: Given your example, I think you're escaping far too much. Either use a smaller range of sanitization with htmlspecialchars() on output, or make your database fields as much as 200% of their present size. That's just bloated if you ask me.
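To put numbers on the 150% and 200% figures, assuming the escaping is FILTER_SANITIZE_ENCODED as in the question: every byte that gets encoded becomes "%XX", i.e. three characters, so n bytes can grow to at most 3n characters. Neither padding size covers the worst case; it only covers inputs where few characters need escaping. worstCaseEncodedLength() is a hypothetical helper.

```php
<?php
// Worst case for percent-encoding: every byte becomes "%XX" (3 characters).
function worstCaseEncodedLength(int $rawBytes): int
{
    return 3 * $rawBytes;
}

$sample = 'javascript:alert("hello!");';
$encoded = filter_var($sample, FILTER_SANITIZE_ENCODED);                    // 7 of 27 bytes encoded
$allEncoded = filter_var(str_repeat('"', 10), FILTER_SANITIZE_ENCODED);    // every byte encoded
```

The sample grows from 27 to 41 characters (about 152%), while ten double quotes grow from 10 to 30 (300%).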
Why are you allowing users to type in escaped characters?
If you do need to allow explicitly escaped characters, then interpolate the escaped character before sanity-checking it
You should pretty much never do any significant work on any string if it is somehow still encoded. Decode it first, then do your work.
I find some people have a tendency to use escaping functions like addslashes() too early, or to decode stuff (like HTML entities) too late. Decode first, do your stuff, then apply any encoding you need to store/output/etc.
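The decode, validate, encode order as a sketch, assuming (hypothetically) that the value arrives percent-encoded and that the limit is 32 characters. Checking the encoded string instead would wrongly reject a 20-character title of double quotes, whose encoded form is 60 characters long. processTitle() is a made-up name and the sketch assumes mbstring.

```php
<?php
function processTitle(string $received, int $maxChars = 32): ?string
{
    // 1. Decode first, so the checks see the value the user actually typed.
    $decoded = rawurldecode($received);

    // 2. Validate the real value.
    if (!mb_check_encoding($decoded, 'UTF-8') || mb_strlen($decoded, 'UTF-8') > $maxChars) {
        return null;
    }

    // 3. Encode only for the output context (HTML here).
    return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}
```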