I'm having an issue with validating Chinese characters against other Chinese characters. For example, I'm creating a simple password script which gets data from a database and gets the user input through GET.
The issue I'm having is for some reason, even though the characters look exactly the same when you echo them out, my if statement still thinks they are different.
I have tried using the htmlentities() function to encode the characters. The password from the database encodes nicely, giving me a working '& #35441;' (I've put a space in it to stop it from converting to a Chinese character!).
The user input value, however, gives me a load of funny characters. The only thing I believe could be breaking it is that it is encoded in a different way, so PHP thinks they are two completely different strings.
Does anybody have any ideas?
Thanks in advance,
Will
Edit:
Thanks for the quick responses, guys. I'm going to look into setting the database encoding to UTF-8; however, at the moment the results from the database are not the problem. They encode correctly using htmlentities; it's the results I get from $_GET that are causing the problems.
Cheers,
Will
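A minimal sketch of one way two "identical" strings can differ, assuming (the question does not confirm this) that the database hands back the character as the numeric entity `&#35441;` while $_GET delivers it as raw UTF-8 bytes. Comparing hex dumps makes such a mismatch visible, and decoding entities on both sides before comparing removes it. The helper names are hypothetical. Separately, before PHP 5.4 htmlentities() assumed ISO-8859-1 unless 'UTF-8' was passed as its third argument, so each byte of a UTF-8 character became its own Latin-1 entity, which would be consistent with the "funny characters" described.

```php
<?php
// Hypothetical helper: decode HTML entities (e.g. "&#35441;") into UTF-8 text,
// so both sides are compared as the same kind of byte string.
function toUtf8Text(string $s): string
{
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

function sameText(string $a, string $b): bool
{
    // Bail out if either side is not well-formed UTF-8 (PCRE's /u flag checks this).
    if (preg_match('//u', $a) !== 1 || preg_match('//u', $b) !== 1) {
        return false;
    }
    return toUtf8Text($a) === toUtf8Text($b);
}

$fromDb  = '&#35441;';  // character stored as a numeric entity (assumption)
$fromGet = "\u{8A71}";  // the same code point (35441 = 0x8A71) as raw UTF-8 bytes

echo bin2hex($fromDb), PHP_EOL;        // 262333353434313b
echo bin2hex($fromGet), PHP_EOL;       // e8a9b1
var_dump($fromDb === $fromGet);        // bool(false)
var_dump(sameText($fromDb, $fromGet)); // bool(true)
```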
For passwords my advice is don't do a direct comparison, because that means you're storing passwords in the clear. At least run them through a hash like MD5 or SHA (preferably with a salt value as well) before storing them. Then you just have to compare the hash values, which are typically Hex values, so shouldn't cause any encoding problems.
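A sketch of the hash-then-compare idea, with PHP's password_hash() and password_verify() (PHP 5.5+) swapped in for MD5/SHA, which are fast general-purpose hashes rather than password hashes. Note that hashing alone does not cure an encoding mismatch: if the submitted password arrives as different bytes than it did at registration, its hash differs too, so this sketch rejects input that is not valid UTF-8. The sample password and the function name are made up.

```php
<?php
// At registration: password_hash() picks a random salt and stores it, together
// with the algorithm and cost, inside the returned string.
$storedHash = password_hash("\u{8A71}secret", PASSWORD_DEFAULT);

// At login: only well-formed UTF-8 is accepted, because the same-looking text
// in another encoding is a different byte string and would never verify.
function checkPassword(string $submitted, string $storedHash): bool
{
    if (preg_match('//u', $submitted) !== 1) {
        return false; // not valid UTF-8: reject instead of guessing
    }
    return password_verify($submitted, $storedHash);
}

var_dump(checkPassword("\u{8A71}secret", $storedHash)); // bool(true)
var_dump(checkPassword("secret", $storedHash));         // bool(false)
```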
For non-password values, it sounds like your database and PHP are not using the same encoding, so the values don't match. If MySQL is storing them the way you want, have it do the comparison (instead of having it return the values first); that should avoid one of the passes through an encoding change, which seems likely to be the problem.
If you want to store passwords, read this: What you need to know about secure password schemes.
Having read your question, I think your root problem is a character encoding mismatch between what you receive from the user and what you get from your database.
If you are using MySQL with UTF-8, do you first run the SET NAMES 'utf8' query?
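A sketch of setting the connection character set once, assuming PDO with the pdo_mysql driver; the host, database name and credentials are placeholders. Putting charset=utf8mb4 in the DSN is the documented way to set the connection character set and has essentially the same effect as running SET NAMES right after connecting. MySQL spells the character set utf8mb4 (or the older 3-byte utf8), never "utf-8".

```php
<?php
// Build a MySQL DSN that pins the connection character set to utf8mb4.
function mysqlUtf8Dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// Hypothetical helper: open the connection and make PDO throw on errors.
function connectUtf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(mysqlUtf8Dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Usage (needs a running MySQL server):
// $db = connectUtf8('localhost', 'app', 'user', 'secret');
echo mysqlUtf8Dsn('localhost', 'app'), PHP_EOL;
```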
Saving the values as SHA1 or MD5 hashes may solve your problem, as the others stated. It is also a more secure process. Here's a code snippet to help out:
```php
public function getHashedPassword()
{
    $salt = 'mysalt';
    // Format the salt with %s: %d would turn the string 'mysalt' into 0.
    return sprintf('%s%s', $salt, sha1(sprintf('%s%s', $salt, $this->_rawPassword)));
}
```
When comparing, rehash the password input and compare it to the saved hashed password in your database. Doing so may remove the encoding issue.
Since you ought to store hashes of passwords rather than the passwords themselves anyway, this might be part of the solution: you store the hash rather than the password and thus have no problems with the database.
That said, different browsers may encode the strings they submit differently. It's not something I'm very familiar with, but you had better make sure you find a solution that produces exactly the same string in all browsers. Setting accept-charset to UTF-8 is a no-brainer; you might also want to experiment with the enctype.
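A minimal sketch of serving the page as UTF-8 and asking the browser to submit the form as UTF-8 as well (browsers generally submit forms in the page's own encoding, and accept-charset states it explicitly). The markup, the field name and the login.php target are placeholders; POST is used because a password in a GET query string ends up in server logs and browser history.

```php
<?php
// Hypothetical login form that declares UTF-8 for submitted data.
function renderLoginForm(): string
{
    return <<<HTML
<form method="post" action="login.php" accept-charset="UTF-8">
  <input type="password" name="password">
  <button type="submit">Log in</button>
</form>
HTML;
}

// Declare the page encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo renderLoginForm(), PHP_EOL;
```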
This question is different from "UTF-8 all the way through", as it asks how safe it is, and whether it is good practice, to use the mb_convert_encoding function.
Let's say that a user can upload files using the PHP API. Each filename and path gets stored in a PostgreSQL database table whose default encoding is UTF-8.
Sometimes users upload files whose names aren't UTF-8 encoded, and these get imported into the database. The problem is that the characters that are not UTF-8 encoded are scrambled and do not display as they should in the table columns.
I was thinking of adding the following to the PHP code before import:
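```php
$output = $content;
if (!mb_check_encoding($content, 'UTF-8')) {
    // No $from_encoding given, so mbstring assumes its internal encoding.
    $output = mb_convert_encoding($content, 'UTF-8');
}
```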
Does this look like good practice, and will the result be displayed and converted correctly by the user's client if I return UTF-8 as the output? Is there a potential loss of bytes when using mb_convert_encoding?
Thanks
If you're going to convert an encoding, you need to know what you're converting from. You can check whether the encoding is or isn't valid UTF-8, but if it tells you it's not valid UTF-8 then you still have no clue what it is. Omitting the $from_encoding parameter from mb_convert_encoding just makes it assume some preset encoding for that parameter, but that doesn't mean that $content actually is in that encoding.
In other words: if you don't know what encoding a string is in, you cannot meaningfully convert it to anything else either, and just trying to convert it from ¯\_(ツ)_/¯ is a crapshoot, with the result as likely to be utter garbage as something useful.
If you encounter unknown encodings, you only have a few choices:
Reject the input value.
Test whether it's one of a handful of other expected encodings and then explicitly convert from your best guess; but that is pretty much a crapshoot as well.
Just use bin2hex or something similar on the value, essentially giving up on trying to interpret it correctly, but still leaving some semblance to the original value.
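A sketch of the three options above, assuming the mbstring extension is loaded. The function names, the "hex:" prefix and the choice of ISO-8859-1 as the guessed encoding are illustrative assumptions. ISO-8859-1 maps every possible byte to a character, so the conversion in option 2 can never fail, which is exactly why it can silently produce the wrong text.

```php
<?php
// Option 1: reject anything that is not valid UTF-8.
function requireUtf8(string $s): string
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }
    return $s;
}

// Option 2: convert from an explicit best guess. This never fails for
// ISO-8859-1, but it may be the wrong guess.
function fromGuessedEncoding(string $s, string $guess = 'ISO-8859-1'): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($s, 'UTF-8', $guess);
}

// Option 3: stop interpreting the bytes and keep a hex form of them; the
// "hex:" prefix is a made-up marker so such values can be told apart.
function hexFallback(string $s): string
{
    return mb_check_encoding($s, 'UTF-8') ? $s : 'hex:' . bin2hex($s);
}

$latin1Name = "caf\xE9.txt"; // "café.txt" as ISO-8859-1 bytes
echo fromGuessedEncoding($latin1Name), PHP_EOL; // café.txt
echo hexFallback($latin1Name), PHP_EOL;         // hex:636166e92e747874
```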
I use UTF-8 across all my websites, so when I count string lengths I use mb_strlen. It occurred to me, though, that someone could easily create a page in another encoding, like ISO-8859-1, and submit it to my script, which expects UTF-8.
Are there any scenarios where this could cause serious (i.e. security) issues with a website? I'm guessing the worst that can happen is you get some characters that won't display correctly but thought I'd check if anything more sinister can happen.
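One way to contain this is to check the encoding once, where the request enters the script, and reject anything that is not valid UTF-8 before counting or storing it. A sketch, assuming mbstring is loaded; the function name and the length limit are made up.

```php
<?php
// Hypothetical input gate: validate the encoding first, then count characters.
function checkedLength(string $input, int $maxChars): int
{
    if (!mb_check_encoding($input, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $length = mb_strlen($input, 'UTF-8');
    if ($length > $maxChars) {
        throw new LengthException("Input longer than {$maxChars} characters");
    }
    return $length;
}

echo checkedLength("\u{8A71}ab", 10), PHP_EOL; // 3 (characters, from 5 bytes)
```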
I'm trying to use mcrypt_create_iv to generate random salts. When I echo a salt out to check that it is generated, something does come out, but it isn't the length I pass as a parameter (32); instead, it's shorter.
When I store it in my database table, however, it shows up as something like this: K??5P?M???4?o???"?0??
I'm sure it has something to do with the database. I tried changing its collation to match CI's config setting, utf8_general_ci, but that doesn't solve the problem; instead, it stores a much shorter salt.
Does anyone know what may be wrong? Thanks for any feedback/help.
The function mcrypt_create_iv() returns a binary string, containing \0 and other unprintable characters. Depending on how you want to use the salts, you first have to encode those byte strings into an accepted alphabet. It is also possible to store binary strings in the database, but of course you will have trouble displaying them.
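A sketch of encoding the raw bytes into a plain-ASCII alphabet before storing them as text. It swaps in random_bytes() (PHP 7+) because the mcrypt extension, and with it mcrypt_create_iv(), was deprecated in PHP 7.1 and removed from core in 7.2. Hex and Base64 are both reversible, so the exact original bytes can be recovered.

```php
<?php
// random_bytes() returns raw bytes, just as mcrypt_create_iv() did: echoing them
// shows gibberish, and a utf8 text column may mangle or truncate them.
$raw = random_bytes(32);

// Encode the bytes into a plain-ASCII alphabet before storing them as text.
$hexSalt    = bin2hex($raw);       // 64 characters from [0-9a-f]
$base64Salt = base64_encode($raw); // 44 characters from [A-Za-z0-9+/=]

echo strlen($raw), ' ', strlen($hexSalt), ' ', strlen($base64Salt), PHP_EOL; // 32 64 44

// Decoding gives back the exact original bytes.
var_dump(hex2bin($hexSalt) === $raw);          // bool(true)
var_dump(base64_decode($base64Salt) === $raw); // bool(true)
```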
Since salts are normally used for password storage, I would recommend having a look at PHP's function password_hash(). It generates a salt automatically and includes it in the resulting hash value, so you don't need a separate database field for the salt.
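A short sketch showing that password_hash() embeds the salt in its output. With PASSWORD_BCRYPT the result is a 60-character string starting with "$2y$" that also records the cost; PHP's documentation suggests a 255-character column when using PASSWORD_DEFAULT, since the default algorithm may change. The sample password is made up.

```php
<?php
// Hash the same password twice: each call picks its own random salt.
$first  = password_hash('correct horse', PASSWORD_BCRYPT);
$second = password_hash('correct horse', PASSWORD_BCRYPT);

echo strlen($first), PHP_EOL;       // 60
echo substr($first, 0, 4), PHP_EOL; // $2y$
var_dump($first === $second);       // bool(false): different salts, different hashes

// password_verify() reads the salt back out of the stored string.
var_dump(password_verify('correct horse', $first));  // bool(true)
var_dump(password_verify('correct horse', $second)); // bool(true)
```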
This is a little out of the blue, and it's mostly curiosity. I hope it's not a waste of time and space.
I was writing a little script to validate accounts with a link, so I decided to send an email with a link to the PHP script, and in the link I would put two variables to read from the $_GET array: a key and the email. Then I would just search the database for that email and key and change its activated status to true... No problem. Easy enough, even though it may not be very elegant.
I used a script for the generation of the key that I used elsewhere in the site for generating a new password (to reset it for instance) but sometimes it didn't work and after a lot of tries I noticed (and I felt stupid then) that the array my password generation function drew from was this:
'0123456789_!##$%&*()-=+abcdfghjkmnpqrstvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
So naturally I deleted the & character, which is used for separating variables in the URL... Then, in another try, I noticed that the link in the email was not recognized as a whole and stopped after the '#' character as well, which I then remembered is used for anchors in HTML, so I deleted that too. In the end I decided to leave only alphanumeric characters to be sure, but I am curious: are there any more characters that are not 'valid' in URLs read through $_GET, and is there any way to use those characters anyway (maybe URL-encode them or something)?
There are plenty of characters that are invalid. Use urlencode to convert them to URL safe encodings. (Always run that function over any data you are inserting into a URL).
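A sketch of building the activation link so that any characters are safe. http_build_query() URL-encodes every value it is given, and PHP decodes query-string values before putting them into $_GET, so the receiving script reads them as-is without calling urldecode(). The activationLink() helper and the example.com URL are hypothetical; a hex key from random_bytes() is one way to avoid special characters altogether.

```php
<?php
// Hypothetical helper: every value passes through URL encoding.
function activationLink(string $baseUrl, string $email, string $key): string
{
    return $baseUrl . '?' . http_build_query(['email' => $email, 'key' => $key]);
}

// A key made only of hex digits needs no escaping at all.
$key = bin2hex(random_bytes(16));

echo urlencode('a&b#c?d'), PHP_EOL; // a%26b%23c%3Fd
echo activationLink('https://example.com/activate.php', 'will@example.com', 'x&y#z'), PHP_EOL;
// https://example.com/activate.php?email=will%40example.com&key=x%26y%23z
```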
You have to run the values through urlencode() before putting them into the URL; $_GET then gives you the decoded values.
You could use urlencode and urldecode, but I would stay away from &, # and ?, as these are normal URL characters.
Also, when it comes to passwords: don't stress about an algorithm; use sha1, crypt or something along those lines with a salt. These algorithms will be much stronger than your homemade ones.
When I encode a textarea value with PHP's rawurlencode function before storing it in MySQL, each newline is encoded as %0D%0A.
For example:
Textarea text entered by the user:
a
b
Encoding with rawurlencode and storing it in the database stores the value as a%0D%0Ab.
When I retrieve it from the database and decode it with rawurldecode, it does not work and the code gives an error. How can I overcome this, and what is the best way to store, retrieve and display textarea values?
If the above does not work for you, you can encode the textarea string with base64_encode before storing it and run base64_decode on it after reading it back.
If the textarea does not contain URLs, you should rather use base64_encode than rawurlencode, and then store the result as normal.
You simply should not use rawurlencode for escaping data for your database.
Each target format has its own escaping method, which in general terms makes sure the data is stored/displayed/transferred safely from one place to another, and which doesn't need decoding at the other end.
For instance:
displaying text in HTML, use htmlentities or htmlspecialchars
storing in database, use mysqli_real_escape_string, pg_escape_string, etc...
transferring a variable name, use urlencode
transferring variable content, use rawurlencode
etc...
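To make the list above concrete, here are the encoders it names applied to made-up inputs; each has a matching decoder. The last line shows that rawurldecode() on its own turns the question's stored value a%0D%0Ab back into the original two lines, which suggests (without confirming) that the reported error comes from somewhere else, such as how the value is written to or read from the database.

```php
<?php
// HTML output: turn markup characters into entities.
echo htmlspecialchars('<b>Tom & Jerry</b>', ENT_QUOTES, 'UTF-8'), PHP_EOL;
// &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

// URLs: urlencode() writes a space as "+", rawurlencode() as "%20" (RFC 3986).
echo urlencode('a b&c'), PHP_EOL;    // a+b%26c
echo rawurlencode('a b&c'), PHP_EOL; // a%20b%26c

// The question's value decodes back to "a", CR LF, "b".
var_dump(rawurldecode('a%0D%0Ab') === "a\r\nb"); // bool(true)
```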
Note that the decoding is usually done by the browser or the database, so no data is actually stored escaped, and your code doesn't need to do any decoding.
The problem is probably that you escaped the string with rawurlencode, while your database expected the escape format of its own specific brand. It de-escaped the string under that wrong assumption, which messed up your string.
Conclusion: find out what brand database you are using, look up the specific escape function for that database, and use the proper escaping function on all your content "transferral".
P.S.: some definitions may not be correct; please comment on them. I wanted to get the idea across but am probably not using all the right terms.
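A minimal sketch of storing the textarea text unmodified: a prepared statement passes the value separately from the SQL, so no manual escaping or rawurlencode is needed, and the text is escaped only when it is printed as HTML. An in-memory SQLite database (PDO's pdo_sqlite driver) stands in for MySQL so the example runs on its own; with MySQL you would mainly change the DSN. The table and column names are made up.

```php
<?php
// In-memory SQLite stands in for MySQL here (assumption for a self-contained demo).
$db = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$db->exec('CREATE TABLE notes (body TEXT)');

// Browsers submit textarea line breaks as CR LF, as in the question's %0D%0A.
$textarea = "a\r\nb";

// Store the raw text; the placeholder keeps the value out of the SQL string.
$insert = $db->prepare('INSERT INTO notes (body) VALUES (?)');
$insert->execute([$textarea]);

// Read it back unchanged, then escape for HTML and keep the line breaks visible.
$stored = $db->query('SELECT body FROM notes')->fetchColumn();
echo nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8')), PHP_EOL;
```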
First of all, it is very uncommon to run textarea content through urlencode().
urlencode was not designed for this purpose.
Second, if you still want to do this, then maybe the problem comes from the database. First you need to tell us what database you are using and what column TYPE you use for storing this data: do you store it as TEXT or as BINARY data? Have you set up the correct charset in the database?