Goal: Find the most cryptographically secure random string generator, using alphabetic, numeric and, if possible, special characters in the string.
I have been reading on here and other places, but I still hear so many different answers/opinions. Can people who are up to date and knowledgeable about security and cryptography chime in here?
The following functions will be used to generate an 8-character random password and also a 128-character random token.
Function 1:
/**
* Used for generating a random string.
*
* @param int $_Length The length of the random string.
* @return string The random string.
*/
function gfRandomString($_Length) {
$alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
$pass = array(); //remember to declare $pass as an array
$alphaLength = strlen($alphabet) - 1; //put the length -1 in cache
for ($i = 0; $i < $_Length; $i++) {
$n = rand(0, $alphaLength);
$pass[] = $alphabet[$n];
}
return implode($pass); //turn the array into a string
}
Function 2:
The PHP.net docs say:
crypto_strong:
If passed into the function, this will hold a boolean value that determines if the algorithm used was "cryptographically strong", e.g., safe for usage with GPG, passwords, etc. TRUE if it did, otherwise FALSE.
So is that based on the server? If I test it once and it is able to generate a crypto_strong string, will it always be able to? Or would I need to check each time and loop until it generates a crypto_strong string?
/**
* Used for generating a random string.
*
* @param int $_Length The number of random bytes to generate; the returned hex string is twice as long.
* @return string The random string.
*/
function gfSecureString($_Length) {
$Str = bin2hex(openssl_random_pseudo_bytes($_Length));
return $Str;
}
I welcome any suggestions to improve the cryptographic strength.
So you want to securely generate random strings in PHP. Neither of the two functions in the question will give you what you want, but the rand() solution is the worst of the two. rand() is not secure, while bin2hex(openssl_random_pseudo_bytes()) limits your output character set.
Also, openssl_random_pseudo_bytes() might not be reliable under extreme conditions or exotic setups.
From what I understand, crypto_strong will only be set to false if RAND_pseudo_bytes() fails to return any data. If OpenSSL is not seeded when it's invoked, it will silently return weak (and possibly predictable) pseudorandom bytes. You have no way, from PHP, to determine if it's random either.
How to generate secure random strings today
If you want a solution that has received substantial review for PHP 5.x, use RandomLib.
$factory = new RandomLib\Factory;
$generator = $factory->getMediumStrengthGenerator();
$randomPassword = $generator->generateString(20, $alphabet);
Alternative solutions
If you'd rather not use RandomLib (even if only because you want alternative options available), you can also use random_int() when PHP 7 comes out. If you can't wait until then, take a look at our random_compat project.
If you happen to be using the cryptography library, libsodium, you can generate random numbers like so:
/**
* Depends on the PECL extension libsodium
*
* @link https://stackoverflow.com/a/31498051/2224584
*
* @param int $length How long should the string be?
* @param string $alphabet Contains all of the allowed characters
*
* @return string
*/
function sodium_random_str($length, $alphabet = 'abcdefghijklmnopqrstuvwxyz')
{
$buf = '';
$alphabetSize = strlen($alphabet);
for ($i = 0; $i < $length; ++$i) {
$buf .= $alphabet[\Sodium\randombytes_uniform($alphabetSize)];
}
return $buf;
}
See this answer for example code that uses random_int(). I'd rather not duplicate the effort of updating the code in the future, should the need ever arise.
openssl_random_pseudo_bytes is very likely to be a cryptographically secure generator, while rand certainly isn't. However, it only returns binary data, which you then convert to hexadecimal. Hexadecimal digits alone are not enough to build a password string. Neither function includes special characters, as you seem to require.
So neither one of the code snippets fits your purpose.
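For reference, here is a minimal sketch of something that would fit the stated goal. It assumes PHP 7+ (or the random_compat polyfill) so that random_int() is available. The function name random_string_from and the particular set of special characters are illustrative choices, not something taken from the question.

```php
<?php
// Hypothetical helper: picks each character with random_int(), which is backed
// by a CSPRNG, so every position is drawn uniformly from $alphabet.
function random_string_from($alphabet, $length)
{
    $max = strlen($alphabet) - 1;
    if ($max < 1) {
        throw new InvalidArgumentException('Alphabet needs at least two characters');
    }
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

// Letters, digits and an assumed set of special characters (single-byte only).
$alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#%&*+-=?@^_';

$password = random_string_from($alphabet, 8);   // 8-character password
$token    = random_string_from($alphabet, 128); // 128-character token
```

Because the alphabet is a plain string, adding or removing special characters only means editing that one value.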
Related
Use case: the "I forgot my password" button. We can't find the user's original password because it's stored in hashed form, so the only thing to do is generate a new random password and e-mail it to him. This requires cryptographically unpredictable random numbers, for which mt_rand is not good enough, and in general we can't assume a hosting service will provide access to the operating system to install a cryptographic random number module etc. so I'm looking for a way to generate secure random numbers in PHP itself.
The solution I've come up with so far involves storing an initial seed, then for each call,
result = seed
seed = sha512(seed . mt_rand())
This is based on the security of the sha512 hash function (the mt_rand call is just to make life a little more difficult for an adversary who obtains a copy of the database).
Am I missing something, or are there better known solutions?
I strongly recommend targeting /dev/urandom on unix systems or the crypto-api on the windows platform as an entropy source for passwords.
I can't stress enough the importance of realizing hashes are NOT magical entropy increasing devices. Misusing them in this manner is no more secure than using the seed and rand() data before it had been hashed and I'm sure you recognize that is not a good idea. The seed cancels out (deterministic mt_rand()) and so there is no point at all in even including it.
People think they are being smart and clever, and the results of their labor are fragile systems and devices that put the security of their own systems, and of other systems (via poor advice), in unnecessary jeopardy.
Two wrongs don't make a right. A system is only as strong as its weakest part. This is not a license or excuse to accept making even more of it insecure.
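A small sketch of why hashing does not help here: if the inputs are reproducible, so is the hash. The helper name next_seed, the seed string and the mt_srand() value are made up for the demonstration. They stand in for a stored seed and a generator state that an attacker has recovered.

```php
<?php
// Mirrors the scheme from the question: seed = sha512(seed . mt_rand()).
function next_seed($seed)
{
    return hash('sha512', $seed . mt_rand());
}

mt_srand(12345);                    // known generator state
$first = next_seed('stored-seed');

mt_srand(12345);                    // attacker replays the same state...
$second = next_seed('stored-seed'); // ...and gets the same "random" value

var_dump($first === $second); // bool(true)
```

The SHA-512 output looks random, but it carries no more unpredictability than the seed and the mt_rand() state that went into it.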
Here is some PHP code to obtain a secure random 128-bit string, from this comment at php.net by Mark Seecof:
"If you need some pseudorandom bits for security or cryptographic purposes (e.g., random IV for block cipher, random salt for password hash) mt_rand() is a poor source. On most Unix/Linux and/or MS-Windows platforms you can get a better grade of pseudorandom bits from the OS or system library, like this:
<?php
// get 128 pseudorandom bits in a string of 16 bytes
$pr_bits = '';
// Unix/Linux platform?
$fp = @fopen('/dev/urandom','rb');
if ($fp !== FALSE) {
$pr_bits .= @fread($fp,16);
@fclose($fp);
}
// MS-Windows platform?
if (@class_exists('COM')) {
// http://msdn.microsoft.com/en-us/library/aa388176(VS.85).aspx
try {
$CAPI_Util = new COM('CAPICOM.Utilities.1');
$pr_bits .= $CAPI_Util->GetRandom(16,0);
// if we ask for binary data PHP munges it, so we
// request base64 return value. We squeeze out the
// redundancy and useless ==CRLF by hashing...
if ($pr_bits) { $pr_bits = md5($pr_bits,TRUE); }
} catch (Exception $ex) {
// echo 'Exception: ' . $ex->getMessage();
}
}
if (strlen($pr_bits) < 16) {
// do something to warn system owner that
// pseudorandom generator is missing
}
?>
NB: it is generally safe to leave both the attempt to read /dev/urandom and the attempt to access CAPICOM in your code, though each will fail silently on the other's platform. Leave them both there so your code will be more portable."
You can also consider using OpenSSL openssl_random_pseudo_bytes, it's available since PHP 5.3.
string openssl_random_pseudo_bytes ( int $length [, bool &$crypto_strong ] )
Generates a string of pseudo-random bytes, with the number of bytes determined by the length parameter.
It also indicates if a cryptographically strong algorithm was used to produce the pseudo-random bytes, and does this via the optional crypto_strong parameter. It's rare for this to be FALSE, but some systems may be broken or old.
http://www.php.net/manual/en/function.openssl-random-pseudo-bytes.php
Since PHP 7, the random_bytes function is also available:
string random_bytes ( int $length )
http://php.net/manual/en/function.random-bytes.php
PHP ships with a new set of CSPRNG functions (random_bytes() and random_int()). It's trivial to turn the latter function into a string generator function:
<?php
/**
* Generate a random string, using a cryptographically secure
* pseudorandom number generator (random_int)
*
* For PHP 7, random_int is a PHP core function
* For PHP 5.x, depends on https://github.com/paragonie/random_compat
*
* @param int $length How many characters do we want?
* @param string $keyspace A string of all possible characters
* to select from
* @return string
*/
function random_str(
$length,
$keyspace = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
) {
$str = '';
$max = mb_strlen($keyspace, '8bit') - 1;
if ($max < 1) {
throw new Exception('$keyspace must be at least two characters long');
}
for ($i = 0; $i < $length; ++$i) {
$str .= $keyspace[random_int(0, $max)];
}
return $str;
}
If you need to use this in a PHP 5 project, feel free to grab a copy of random_compat, which is a polyfill for these functions.
How about something like
<?php
$length = 100;
$random = substr(hash('sha512', openssl_random_pseudo_bytes(128)), 0, $length);
I just noticed it's about numbers, so here's the solution for numbers:
<?php
$max = 1000;
$random = (unpack('n', openssl_random_pseudo_bytes(2))[1] * time()) % $max;
I wrote a function in PHP to generate a random password (only 0-9a-zA-Z) for my app. The resulting password must be cryptographically secure and as random as possible, i.e. the passwords are sensitive.
The big trick I do is shuffle the $possible characters every time, so even if mt_rand() is not truly random, it should not be predictable.
Any recommended changes or security issues in my function? Is using openssl_random_pseudo_bytes() instead of mt_rand() really going to make the algorithm stronger and more secure?
public function generate_random($length = 15) {
$random = "";
$possible = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
$possible = str_shuffle($possible);
$maxlength = strlen($possible);
if ($length > $maxlength) {
$length = $maxlength;
}
$i = 0;
while ($i < $length) {
$random .= substr($possible, mt_rand(0, $maxlength-1), 1);
$i++;
}
return $random;
}
Thanks.
To generate something really random, you have to use the random source of the operating system. After reading from this source, you need to encode the bytes to an alphabet of your choice.
An easy conversion is base64 encoding, but this will include '+' and '/' characters. To only get characters from the alphabet and digits, you need a base62 encoding, or you can simply replace those characters with other characters.
/**
* Generates a random string of a given length, using the random source of
* the operating system. The string contains only characters of this
* alphabet: +/0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
* @param int $length Number of characters the string should have.
* @return string A random base64 encoded string.
*/
function generateRandomBase64String($length)
{
if (!defined('MCRYPT_DEV_URANDOM')) die('The MCRYPT_DEV_URANDOM source is required (PHP 5.3).');
// Generate random bytes, using the operating system's random source.
// Since PHP 5.3 this also uses the random source on a Windows server.
// Unlike /dev/random, the /dev/urandom does not block the server, if
// there is not enough entropy available.
$binaryLength = (int)($length * 3 / 4 + 1);
$randomBinaryString = mcrypt_create_iv($binaryLength, MCRYPT_DEV_URANDOM);
$randomBase64String = base64_encode($randomBinaryString);
return substr($randomBase64String, 0, $length);
}
The code is part of this class, have a look at the function generateRandomBase62String() for a complete example.
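The "replace those characters" idea can be sketched as follows. This is a simplified variant, not the class's generateRandomBase62String(). It assumes PHP 7+ for random_bytes() and swaps '+' and '/' for '-' and '_' (the URL-safe base64 alphabet), so the result is not strictly alphanumeric. The function name is hypothetical.

```php
<?php
// Hypothetical helper: random bytes from the OS, base64 encoded, with the two
// non-alphanumeric base64 characters replaced and the '=' padding removed.
function random_base64url_string($length)
{
    if ($length < 1) {
        throw new InvalidArgumentException('Length must be at least 1');
    }
    // Every 3 bytes become 4 base64 characters, so this yields enough output.
    $bytes   = random_bytes((int) ceil($length * 3 / 4));
    $encoded = rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
    return substr($encoded, 0, $length);
}

echo random_base64url_string(20), "\n";
```

A true base62 encoding avoids the two extra symbols entirely, at the cost of a more involved conversion.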
Adding pseudo-randomness to a pseudo-random string won't increase the entropy at all. The only way is to use a better random number generator.
Possible duplicate: Secure random number generation in PHP
If by cryptographically secure, you mean you intend to use the password as a key somewhere, it is important to realize that your space isn't nearly large enough. 15 characters with 62 possibilities each is less than 90 bits, which is about as strong as RSA-1024, and is considered unsafe today.
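The figure can be checked with a short calculation: a uniformly random string carries length × log2(alphabet size) bits. The function name entropy_bits is only for illustration.

```php
<?php
// Entropy of a uniformly random string: length * log2(alphabet size).
function entropy_bits($length, $alphabetSize)
{
    return $length * log($alphabetSize, 2);
}

printf("%.1f bits\n", entropy_bits(15, 62)); // 89.3 bits

// Characters needed from a 62-symbol alphabet to reach 128 bits.
$needed = (int) ceil(128 / log(62, 2));
echo $needed, "\n"; // 22
```

This only holds when every character is picked independently and uniformly. A predictable generator lowers the real figure.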
You should, however, not be doing such a thing in the first place. If you do require a human-readable string that maps to something that can be used as a cryptographic key, use something like PBKDF2.
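As a rough sketch of the PBKDF2 route, PHP 5.5+ ships hash_pbkdf2(). The passphrase, iteration count and key length below are assumptions chosen for the example, and random_bytes() requires PHP 7+ or a polyfill.

```php
<?php
// Sketch: stretch a human-readable passphrase into a 256-bit key with PBKDF2.
$passphrase = 'correct horse battery staple'; // example input only
$salt       = random_bytes(16);               // must be stored to re-derive the key
$iterations = 100000;                         // assumed work factor; tune for your hardware

// Final argument true => raw bytes, so the length argument (32) is in bytes.
$key = hash_pbkdf2('sha256', $passphrase, $salt, $iterations, 32, true);

echo bin2hex($key), "\n";
```

The same passphrase, salt and iteration count always yield the same key, which is what allows the key to be re-derived later.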
Lastly, shuffling the string does not increase effective randomness. As long as you do not use it directly as a key, your function is fine. Remember to first check the output against a dictionary of common passwords (like a password list from a password cracker) and reject those.
This is not an answer to your question, but it seems feasible to use this code to generate a random password containing only 0-9, a-z and A-Z:
$password = base64_encode(openssl_random_pseudo_bytes(20, $strong));
$newstr = preg_replace('/[^a-zA-Z0-9\']/', '', $password);
echo $newstr;
I need a big (like, say, 128-bit big) random number generator in PHP. I was thinking in storing this number in a string as hexadecimal.
Note that this is meant for a login system that mentioned the need for a "random" number, so I'm guessing I really need it to be "random-enough" (because I know pseudo-random is never truly random).
The algorithm I was thinking was generating the number one hex digit at a time, then concatenating it all. Like this:
$random = '';
for ($i = 0; $i < 32; ++$i) {
$digit = rand(0, 15);
$random .= ($digit < 10 ? $digit : ($digit - 10 + 'a'));
}
return $random;
Can I trust this function to return good pseudo-random numbers or am I messing with something I really shouldn't?
Try:
for ($str = '', $i = 0; $i < $len; $i++) {
$str .= dechex(mt_rand(0, 15));
}
I asked this question several years ago and, since then, my knowledge of this topic has improved.
First of all, I mentioned I wanted random numbers for a login system. Login systems are security mechanisms.
This means that any random number generators that the login system relies on should be cryptographically secure.
PHP's rand and mt_rand are not cryptographically secure.
In these cases, it's better to be safe than sorry. There are random number generators designed specifically to be secure, notably openssl_random_pseudo_bytes (which is unfortunately not always available -- you must enable the OpenSSL extension for it to work). On *NIX systems (such as Linux), bytes read from /dev/urandom can be used as well.
Unfortunately (for the purposes of this question), both of these approaches return binary data instead of hexadecimal. Fortunately, PHP already has a function to fix this for us, bin2hex, which works for strings of any length.
So here's what the code would look like:
function generate_secure_random_hex_string($length) {
// $length should be an even, non-negative number.
// Because each byte is represented as two hex digits, we'll need the binary
// string to be half as long as the hex string.
$binary_length = $length / 2;
// First, we'll generate the random binary string.
$random_result = openssl_random_pseudo_bytes($binary_length, $cstrong);
if (!$cstrong) {
// The result is not cryptographically secure. Abort.
// die() is just a placeholder.
// There might be better ways to handle this error.
die();
}
//Convert the result to hexadecimal
return bin2hex($random_result);
}
// Example:
echo generate_secure_random_hex_string(32);
I've often seen this handled in login systems by just doing something like:
$salt = "big string of random stuff"; // you can generate this once like above
$token = md5( $salt . time()); // this will be your "unique" number
MD5 hashes can have collisions, but this is pretty effective and very simple.
As of PHP 5.3:
function getRandomHex($num_bytes=4) {
return bin2hex(openssl_random_pseudo_bytes($num_bytes));
}
For your example of 128 bits:
$rand128 = getRandomHex(16);
I want to create a token generator that generates tokens that cannot be guessed by the user and that are still unique (to be used for password resets and confirmation codes).
I often see this code; does it make sense?
md5(uniqid(rand(), true));
According to a comment, uniqid($prefix, $more_entropy = true) yields
first 8 hex chars = Unix time, last 5 hex chars = microseconds.
I don't know how the $prefix parameter is handled.
So if you don't set the $more_entropy flag to true, it gives a predictable outcome.
QUESTION: But if we use uniqid with $more_entropy, what does hashing it with md5 buy us? Is it better than:
md5(mt_rand())
edit1: I will store this token in a database column with a unique index, so I will detect collisions. Might be of interest.
rand() is a security hazard and should never be used to generate a security token: rand() vs mt_rand() (Look at the "static" like images). But neither of these methods of generating random numbers is cryptographically secure. To generate secure secrets, an application needs to access a CSPRNG provided by the platform, operating system or hardware module.
In a web application a good source for secure secrets is non-blocking access to an entropy pool such as /dev/urandom. As of PHP 5.3, PHP applications can use openssl_random_pseudo_bytes(), and the Openssl library will choose the best entropy source based on your operating system, under Linux this means the application will use /dev/urandom. This code snip from Scott is pretty good:
function crypto_rand_secure($min, $max) {
$range = $max - $min;
if ($range < 1) return $min; // not so random... (also avoids log(0) when $min == $max)
$log = log($range, 2);
$bytes = (int) ($log / 8) + 1; // length in bytes
$bits = (int) $log + 1; // length in bits
$filter = (int) (1 << $bits) - 1; // set all lower bits to 1
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
$rnd = $rnd & $filter; // discard irrelevant bits
} while ($rnd >= $range);
return $min + $rnd;
}
function getToken($length=32){
$token = "";
$codeAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$codeAlphabet.= "abcdefghijklmnopqrstuvwxyz";
$codeAlphabet.= "0123456789";
for($i=0;$i<$length;$i++){
$token .= $codeAlphabet[crypto_rand_secure(0,strlen($codeAlphabet))];
}
return $token;
}
This is a copy of another question I found that was asked a few months before this one. Here is a link to the question and my answer: https://stackoverflow.com/a/13733588/1698153.
I do not agree with the accepted answer. According to PHP's own website "[uniqid] does not generate cryptographically secure tokens, in fact without being passed any additional parameters the return value is little different from microtime(). If you need to generate cryptographically secure tokens use openssl_random_pseudo_bytes()."
I do not think the answer could be clearer than this, uniqid is not secure.
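To illustrate the quoted statement, a plain uniqid() value can be decoded straight back into the time at which it was generated. This is a small demonstration, assuming the usual 13-character format (8 hex characters of seconds followed by 5 of microseconds).

```php
<?php
// Without extra parameters, uniqid() is 13 hex characters:
// 8 for the Unix timestamp in seconds, 5 for the microseconds.
$id = uniqid();

$seconds      = hexdec(substr($id, 0, 8));
$microseconds = hexdec(substr($id, 8, 5));

echo $id, "\n";
echo date('Y-m-d H:i:s', $seconds), "\n"; // the moment the ID was generated
```

Anyone who knows roughly when a token was issued therefore only has to try a narrow range of values.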
I know the question is old, but it shows up in Google, so...
As others said, rand(), mt_rand() or uniqid() will not guarantee you uniqueness... even openssl_random_pseudo_bytes() should not be used, since it uses deprecated features of OpenSSL.
What you should use to generate a random token (the same length as an MD5 hash) is random_bytes() (introduced in PHP 7). To generate a value with the same length as an MD5 hash:
bin2hex(random_bytes(16));
If you are using PHP 5.x, you can get this function by including the random_compat library.
Define "unique". If you mean that two tokens cannot have the same value, then hashing isn't enough - it should be backed with a uniqueness test. The fact that you supply the hash algorithm with unique inputs does not guarantee unique outputs.
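One way to back the generator with a uniqueness test is to retry until an unused value comes up. The sketch below is illustrative: generate_unique_token and the $isTaken callback are hypothetical names, the in-memory array stands in for a database lookup, and random_bytes() assumes PHP 7+ or a polyfill.

```php
<?php
// Hypothetical helper: $isTaken is a caller-supplied callback that reports
// whether a token already exists (for example, a lookup on a unique column).
function generate_unique_token(callable $isTaken, $bytes = 16, $maxAttempts = 5)
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $token = bin2hex(random_bytes($bytes)); // 2 hex characters per byte
        if (!$isTaken($token)) {
            return $token;
        }
    }
    throw new RuntimeException('Could not generate an unused token');
}

// Usage with an in-memory list standing in for the database.
$existing = array();
$token = generate_unique_token(function ($candidate) use (&$existing) {
    return in_array($candidate, $existing, true);
});
$existing[] = $token;
```

With 128 random bits a collision is extremely unlikely, so the retry loop is a safety net rather than the common path. A unique index in the database remains the final check.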
To answer your question, the problem is you can't have a generator that is guaranteed random and unique as random by itself, i.e., md5(mt_rand()) can lead to duplicates. What you want is "random appearing" unique values. uniqid gives the unique id, rand() affixes a random number making it even harder to guess, md5 masks the result to make it yet even harder to guess. Nothing is unguessable. We just need to make it so hard that they wouldn't even want to try.
I ran into an interesting idea a couple of years ago.
Storing two hash values in the database, one generated with md5($a) and the other with sha($a). Then check if both the values are correct. The point is, if the attacker broke your md5(), he cannot break your md5 AND sha in the near future.
Problem is: how can that concept be used with the token generating needed for your problem?
First, the scope of this kind of procedure is to create a key/hash/code, that will be unique for one given database. It is impossible to create something unique for the whole world at a given moment.
That being said, you should create a plain, visible string, using a custom alphabet, and checking the created code against your database (table).
If that string is unique, then you apply a md5() to it and that can't be guessed by anyone or any script.
I know that if you dig deep into the theory of cryptographic generation you can find a lot of explanation about this kind of code generation, but when you put it to real usage it's really not that complicated.
Here's the code I use to generate a simple 10-character unique code.
$alphabet = "aA1!bB2@cC3#dD5%eE6^fF7&gG8*hH9(iI0)jJ4-kK=+lL[mM]nN{oO}pP\qQ/rR,sS.tT?uUvV>xX~yY|zZ`wW$";
$code = '';
$alphaLength = strlen($alphabet) - 1;
for ($i = 1; $i <= 10; $i++) {
$n = rand(0, $alphaLength); // start at index 0 so the first character can be picked too
$code .= $alphabet[$n];
}
And here are some generated codes, although you can run it yourself to see it work:
SpQ0T0tyO%
Uwn[MU][.
D|[ROt+Cd#
O6I|w38TRe
Of course, there can be a lot of "improvements" that can be applied to it, to make it more "complicated", but if you apply a md5() to this, it'll become, let's say "unguessable" . :)
MD5 is a decent algorithm for producing data dependent IDs. But in case you have more than one item which has the same bitstream (content), you will be producing two similar MD5 "ids".
So if you are just applying it to a rand() function, which is guaranteed not to create the same number twice, you are quite safe.
But for a stronger distribution of keys, I'd personally use SHA1 or SHAx etc'... but you will still have the problem of similar data leads to similar keys.
I have some strings that have been encrypted using the PHP function crypt().
The outputs look something like this:
$1$Vf/.4.1.$CgCo33ebiHVuFhpwS.kMI0
$1$84..vD4.$Ps1PdaLWRoaiWDKCfjLyV1
$1$or1.RY4.$v3xo04v1yfB7JxDj1sC/J/
While I believe crypt() is using the MD5 algorithm, the outputs are not valid MD5 hashes.
Is there a way of converting the produced hashes into valid MD5 hashes (16-byte hex values)?
Update:
Thanks for the replies and answers so far. I'm pretty sure the crypt function used is using some sort of MD5 algorithm. What I'm looking to do is convert the output that I have into an MD5 hash that looks something like the following:
9e107d9d372bb6826bd81d3542a419d6
e4d909c290d0fb1ca068ffaddf22cbd0
d41d8cd98f00b204e9800998ecf8427e
(taken from Wikipedia)
Is there a way of converting from the hashes I have to ones like the above?
OK, so maybe this answer is a year late, but I'll give it a shot. In your own answer, you note that crypt() is using the FreeBSD MD5, which also does some interesting transformations on the salt before running the hash, so the result of what I'm about to give you will never quite match up with the results of a call to md5(). That said, the only difference between the output you are seeing and the format you are used to is that the output you are seeing is encoded as follows
$1$ # this indicates that it is MD5
Vf/.4.1. # these eight characters are the significant portion of the salt
$ # this character is technically part of the salt, but it is ignored
CgCo33eb # the last 22 characters are the actual hash
iHVuFhpw # they are base64 encoded (to be printable) using crypt's alphabet
S.kMI0 # floor(22 * 6 / 8) = 16 (the length in bytes of a raw MD5 hash)
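The same breakdown can be reproduced in code by splitting on the '$' separators. This is a small sketch using the first example hash from the question. The variable names are illustrative.

```php
<?php
$hash = '$1$Vf/.4.1.$CgCo33ebiHVuFhpwS.kMI0'; // first example from the question

// Splitting on '$' gives: '', scheme id, salt, encoded digest.
list(, $scheme, $salt, $digest) = explode('$', $hash);

echo $scheme, "\n";         // 1
echo $salt, "\n";           // Vf/.4.1.
echo strlen($digest), "\n"; // 22
```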
To my knowledge, the alphabet used by crypt looks like this:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
So, with all of this borne in mind, here is how you can convert the 22 character crypt-base64 hash into a 32 character base16 (hexadecimal) hash:
First, you need something to convert the base64 (with custom alphabet) into a raw 16-byte MD5 hash.
define('CRYPT_ALPHA','./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz');
/**
* Decodes a base64 string based on the alphabet set in constant CRYPT_ALPHA
* Uses string functions rather than binary transformations, because said
* transformations aren't really much faster in PHP
* @param string $str The string to decode
* @return string The raw output, which may include unprintable characters
*/
function base64_decode_ex($str) {
// set up the array to feed numerical data using characters as keys
$alpha = array_flip(str_split(CRYPT_ALPHA));
// split the input into single-character (6 bit) chunks
$bitArray = str_split($str);
$decodedStr = '';
foreach ($bitArray as &$bits) {
if ($bits == '$') { // $ indicates the end of the string, to stop processing here
break;
}
if (!isset($alpha[$bits])) { // if we encounter a character not in the alphabet
return false; // then break execution, the string is invalid
}
// decbin will only return significant digits, so use sprintf to pad to 6 bits
$decodedStr .= sprintf('%06s', decbin($alpha[$bits]));
}
// there can be up to 6 unused bits at the end of a string, so discard them
$decodedStr = substr($decodedStr, 0, strlen($decodedStr) - (strlen($decodedStr) % 8));
$byteArray = str_split($decodedStr, 8);
foreach ($byteArray as &$byte) {
$byte = chr(bindec($byte));
}
return join($byteArray);
}
Now that you've got the raw data, you'll need a method to convert it to the base-16 format you're expecting, which couldn't be easier.
/**
* Takes an input in base 256 and encodes it to base 16 using the Hex alphabet
* This function will not be commented. For more info:
* @see http://php.net/str-split
* @see http://php.net/sprintf
*
* @param string $str The value to convert
* @return string The base 16 rendering
*/
function base16_encode($str) {
$byteArray = str_split($str);
foreach ($byteArray as &$byte) {
$byte = sprintf('%02x', ord($byte));
}
return join($byteArray);
}
Finally, since the output of crypt includes a lot of data we don't need (and, in fact, cannot use) for this process, a short and sweet function to not only tie these two together but to allow for direct input of output from crypt.
/**
* Takes a 22 byte crypt-base-64 hash and converts it to base 16
* If the input is longer than 22 chars (e.g., the entire output of crypt()),
* then this function will strip all but the last 22. Fails if under 22 chars
*
* @param string $hash The hash to convert
* @return string The equivalent base16 hash (therefore a number)
*/
function md5_b64tob16($hash) {
if (strlen($hash) < 22) {
return false;
}
if (strlen($hash) > 22) {
$hash = substr($hash,-22);
}
return base16_encode(base64_decode_ex($hash));
}
Given these functions, the base16 representations of your three examples are:
3ac3b4145aa7b9387a46dd7c780c1850
6f80dba665e27749ae88f58eaef5fe84
ec5f74086ec3fab34957d3ef0f838154
Of course, it is important to remember that they were always valid, just formatted differently.
$1$ indeed means that this is a MD5 hash, but crypt generates a random salt. This is why you find a different MD5 value. If you include the generated salt you will find the same result.
The salt is base64 encoded in the output, as the hash.
The algorithm used is a system wide parameter. Generally this is MD5, you are right.
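To illustrate the point about the salt: passing an existing crypt() output back in as the salt argument makes crypt() reuse the salt stored in it, so the same password reproduces the same string. The password below is an assumed example, not the one behind the hashes in the question.

```php
<?php
// Assumed example password and salt; '$1$' selects MD5-crypt.
$password = 'secret';
$hash     = crypt($password, '$1$Vf/.4.1.$');

// Passing the full hash back as the salt argument makes crypt() reuse its salt,
// so the same password reproduces the same string.
$again = crypt($password, $hash);

var_dump(hash_equals($hash, $again)); // bool(true)
```

hash_equals() (PHP 5.6+) is used for the comparison because it runs in constant time.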
I believe the answer to my original question is no, you can't convert from one format to the other.
The hashes generated by PHP's crypt() appear to be generated by a version of the FreeBSD MD5 hash implementation created by Poul-Henning Kamp.
http://people.freebsd.org/~phk/
From the documentation, this depends on the system. You can force the algorithm used by setting the salt parameter. From the docs:
The encryption type is triggered by the salt argument. At install time, PHP determines the capabilities of the crypt function and will accept salts for other encryption types. If no salt is provided, PHP will auto-generate a standard two character salt by default, unless the default encryption type on the system is MD5, in which case a random MD5-compatible salt is generated.
From http://php.net/crypt:
crypt() will return an encrypted string using the standard Unix DES-based encryption algorithm or alternative algorithms that may be available on the system.
You want the md5() function:
Calculates the MD5 hash of str using the RSA Data Security, Inc. MD5 Message-Digest Algorithm, and returns that hash.
If the optional raw_output is set to TRUE, then the md5 digest is instead returned in raw binary format with a length of 16.
Defaults to FALSE.
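A short sketch of the two output forms, using the empty string as input (its MD5 digest is the third Wikipedia example quoted in the question):

```php
<?php
$hex = md5('');       // 32 hexadecimal characters
$raw = md5('', true); // the same digest as 16 raw bytes

echo $hex, "\n";                  // d41d8cd98f00b204e9800998ecf8427e
var_dump(strlen($raw));           // int(16)
var_dump(bin2hex($raw) === $hex); // bool(true)
```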