I know about random_bytes() in PHP 7, and I want to use it for generating a cryptographically secure (e.g. hard to guess) random string for use as a one-time token or for longer term storage in a cookie.
Unfortunately, I don't know how to convert the output of random_bytes() to a string consisting only of human readable characters, so browsers don't get confused. I know about bin2hex(), but I'd prefer to use the full ASCII-range instead of hex numbers, for the sake of more bits per length.
Any ideas?
Unfortunately, Peter O. deleted his answer after receiving negative attention in a review queue, perhaps because he phrased it as a question. I believe it is a legitimate answer, so I will reprise it.
One easy solution is to encode your random data into the base64 alphabet using base64_encode(). This will not produce the "full ASCII-range" you asked for, but it covers most of it. A suitable base85 encoder outputs an even larger ASCII range, but PHP does not have one built in. You can probably find plenty of open-source base85 encoders for PHP, though. In my opinion, the decrease in length of base85 over base64 is unlikely to be worth the extra code you have to maintain.
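As a rough sketch of the base64 approach, assuming PHP 7+ for random_bytes(): the helper name randomToken() is made up for illustration, and the swap of "+" and "/" for "-" and "_" is an extra step (not part of the suggestion above) so the token needs no escaping in cookies or URLs.

```php
<?php
// Sketch: turn random bytes into a cookie- and URL-safe token.
// randomToken() is a hypothetical helper name, not a PHP built-in.
function randomToken(int $numBytes = 32): string
{
    $raw = random_bytes($numBytes);  // CSPRNG output (PHP 7+)
    $b64 = base64_encode($raw);      // alphabet: A-Z a-z 0-9 + / =
    // Swap the two symbols that need escaping in URLs and drop the '=' padding.
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

echo randomToken(), "\n"; // 43 characters for 32 bytes
```

Each character carries 6 bits, compared with 4 bits per character for bin2hex().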
I personally just use a GUID library and concatenate a couple of GUIDs to get a long unique token string. You can also remove the dashes to make the source harder to recognize, and if you want to make it even more complex you can randomly cut the string back by up to 10 characters so that its length is not fixed.
I use this library for generating my GUIDs
https://packagist.org/packages/ramsey/uuid
use Ramsey\Uuid\Uuid;
$token = Uuid::uuid4() . '-' . Uuid::uuid4();
Sorry, I overlooked the part about you wanting to use the full range of alphabetic and numeric characters... I'm not sure I have an answer for you in that respect, but you should have faith in the difficulty of guessing a UUID4, especially when you join a couple together and vary the length by up to 10 characters to make guessing more complex.
Actually, if you could safely generate an array of random numbers in the range of valid ASCII character codes, then you could convert the entire array of codes into the respective ASCII characters and implode them together as a single string.
function randomAsciiString($length) {
    // Draw one printable ASCII code (33..126) per output character.
    $codes = array_map(
        function ($value) {
            return random_int(33, 126);
        },
        array_fill(0, $length, null) // one slot per output character
    );
    // Convert each code to its character and join them into one string.
    return implode('', array_map('chr', $codes));
}
echo randomAsciiString(128); // Normal 128 char string
echo randomAsciiString(random_int(118, 128)); // obfuscated length char string for extra complexity.
Of course, you should be mindful that this uses all the standard keys on the keyboard, and some of those characters (e.g. quotes) are going to upset anything that is sensitive to them.
Let's consider the letters to be used. For the sake of simplicity I will assume that you intend only big and small English letters to be used. This means that you have 26 big letters and 26 small letters, 52 different possible values. If we view a byte array of n elements as a number of n digits in base 256 and we convert this number into a base 52 number, where A is 0, B is 1, C is 2, ..., a is 26, ..., z is 51, then converting these digits into the corresponding letters will yield the text you wanted.
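The base conversion described above can be sketched as follows. This is only an illustration: bytesToLetters() is a hypothetical name, and the long division is done by hand so that no big-number extension is assumed.

```php
<?php
// Hypothetical helper: re-express a byte string (a base-256 number)
// as a base-52 number written with the letters A-Z, a-z.
function bytesToLetters(string $bytes): string
{
    if ($bytes === '') {
        return '';
    }
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    // Base-256 digits, most significant first.
    $digits = array_values(unpack('C*', $bytes));
    $out = '';
    while (count($digits) > 0) {
        // Long division of the base-256 number by 52.
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 256 + $d;
            $q = intdiv($acc, 52);
            $remainder = $acc % 52;
            if ($q > 0 || count($quotient) > 0) { // skip leading zeros
                $quotient[] = $q;
            }
        }
        // The remainder is the next base-52 digit, least significant first.
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

echo bytesToLetters(random_bytes(16)), "\n"; // up to 23 letters
```

Note that leading zero bytes vanish in the conversion, so the output length varies slightly and the string cannot be decoded back to the exact byte count without storing the length separately.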
Consider:
$tag = "4F";
$tag is a string containing two characters, '4' and 'F'. I want to be able to treat these as the upper and lower nibbles respectively of a whole byte (4F) so that I can go on to compute the bit pattern (01001111).
As these are technically characters, they can be treated in their own right as a byte each: on the ASCII table, '4' is 0x34 (decimal 52) and 'F' is 0x46 (decimal 70).
Pretty much all the PHP built-in functions that allow manipulation of bytes (that I've seen so far) are variations on the latter description: '4' is 0x34, and not the upper nibble of a byte.
I don't know of any quick or built-in way to get PHP to handle this the way I want, but it feels like it should be there.
How do I convert a string "4F" to the byte 4F, or treat each char as a nibble in a nibble pair? Are there any built-in functions to get PHP to handle a string like "4F" or "3F0E" as pairs of nibbles?
Thanks.
If you're wanting "the decimal representation of a hex digit", hexdec is one way to go.
If you're wanting "bit pattern for hex digit", then use base_convert. The docs even show an example of this maneuver:
Example #1 base_convert() example
$hexadecimal = 'a37334';
echo base_convert($hexadecimal, 16, 2);
The above example will output:
101000110111001100110100
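Applied to the "4F" and "3F0E" strings from the question, a short sketch might look like this. It also uses hex2bin() (available since PHP 5.4), which is not mentioned above, and pads the binary digits because both decbin() and base_convert() drop leading zeros.

```php
<?php
$tag = '4F';

$value = hexdec($tag);                                  // 79, the integer 0x4F
$bits  = str_pad(decbin($value), 8, '0', STR_PAD_LEFT); // "01001111"
$byte  = hex2bin($tag);                                 // 1-byte string, "O"

// Longer strings: hex2bin() pairs up the nibbles, unpack() lists the byte values.
$bytes = array_values(unpack('C*', hex2bin('3F0E')));   // [63, 14], i.e. 0x3F and 0x0E

echo $value, ' ', $bits, ' ', implode(',', $bytes), "\n";
```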
I would like to prepare a simple regular expression for PHP's uniqid. I checked the uniqid manual for the set of characters used in the return value, but the documentation only mentions that:
@return string the unique identifier, as a string.
And
With an empty prefix, the returned string will be 13 characters long. If more_entropy is true, it will be 23 characters.
I would like to know what characters can I expect in the return value. Is it a hex string? How to know for sure? Where to find something more about the uniqid function?
The documentation doesn't specify the string contents, only its length. Generally, you shouldn't depend on them. If you print the value between a pair of delimiters, like quotation marks, you could use those in the regular expression:
"([^"]+)" ($1 contains the value)
As long as you develop for a particular PHP version, you can inspect its implementation and assume that it doesn't change. If you upgrade, you should check whether the assumption is still valid.
A comment in the uniqid documentation explains that it is essentially a hexadecimal number with an optional numeric suffix:
if (more_entropy) {
uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
Which gives you two possible output formats:
uniqid() - 13 characters, hexadecimal number
uniqid('', true) - 14 - 23 characters, hexadecimal number with a floating-point number suffix computed elsewhere
If you use other delimiters than alphanumeric characters and dot, you could use one of these simple regular expressions to grab the value in either of the two formats:
[0-9a-f]+
[.0-9a-f]+
If you need a 100% format guarantee for any PHP version, you could write your own function based on sprintf.
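A sketch of both ideas, assuming the format strings quoted above still apply to your PHP version: the anchored patterns are derived from those format strings, and myUniqid() is a hypothetical name for a sprintf-based replacement whose format is fixed by your own code.

```php
<?php
// Anchored patterns derived from the format strings quoted above.
$plain   = '/^[0-9a-f]{13}$/';          // %08x%05x
$entropy = '/^[0-9a-f]{13}\d\.\d{8}$/'; // %08x%05x%.8F

var_dump(preg_match($plain, uniqid()) === 1);
var_dump(preg_match($entropy, uniqid('', true)) === 1);

// Hypothetical replacement: same layout as uniqid(), but under your control.
function myUniqid(): string
{
    // microtime() returns "fraction seconds", e.g. "0.12345600 1700000000".
    [$usec, $sec] = explode(' ', microtime());
    return sprintf('%08x%05x', (int) $sec, (int) round((float) $usec * 1000000));
}

var_dump(preg_match($plain, myUniqid()) === 1);
```

Like uniqid(), this is time-based and therefore guessable, so it is suitable as an identifier but not as a secret.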
I admit that it is unlikely that uniqid would change significantly; I would expect other extensions to be created to provide different formats. Another comment in the uniqid documentation shows an RFC 4122 compliant UUID implementation. There was also a discussion on Stack Overflow about it.
I found this on the php site: http://www.php.net/manual/en/function.uniqid.php#95001
If this is to be believed, then the 13-character version is entirely hex.
However, the 23-character version has:
14 characters (13 hex digits followed by one decimal digit)
then a dot
then another 8 characters (decimal)
If you need to be entirely sure, you can verify this yourself: http://sandbox.onlinephpfunctions.com/code/c04c7854b764faee2548180eddb8c23288dcb5f7
I have an image upload form. After the user submits the form, my script processes the image and cleans the image filename (I'm appending a unique number series at the end of the filename to prevent possible duplicate filenames).
Often I'm receiving filenames (after processing) such as
"c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg"
How can I preg_replace the numbers if they have more than 5 digits? Numbers with 5 digits or fewer should be retained. The above example should give "c-id-1333-l-id--aid-3951-id--im-193-1.jpg" (Don't mind the multiple consecutive dashes [-]; my script can handle this.)
You can use the following to do this.
$str = 'c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg';
$str = preg_replace('/\d{5,}/', '', $str);
var_dump($str);
Explanation:
\d{5,} # digits (0-9) (at least 5 times)
Output:
string(41) "c-id-1333-l-id--aid-3951-id--im-193-1.jpg"
If you want to retain numbers with exactly 5 digits as well, use \d{6,} instead.
Is the user supplied name important?
If it is not, one technique I like for normalizing file names in that case is to simply hash them with something like sha1 or even md5, then add your timestamp, IDs, or whatnot to that. This takes care of a lot of issues with special characters such as ".", "\" and "/" (dot dot slash and directory traversal) in the file names.
@hwnd's regular expression should do the trick, but I thought I would throw this out there.
With hashing you'd get cleaner (but less meaningful) names like this:
da39a3ee5e6b4b0d3255bfef95601890afd80709.jpg
then you can add your unique numbers on
da39a3ee5e6b4b0d3255bfef95601890afd80709-1234568764564558.jpg
You could even salt the filename with the timestamp first and then hash it, to get filenames that are all 40 characters long. The chance of hash collisions is minimal unless you're dealing with tens of thousands of files, in which case just move up to sha256 etc.
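A minimal sketch of that idea follows. hashedFileName() is a hypothetical helper, and keeping the original extension is my own assumption; a real upload handler would normally check the extension against a list of allowed types.

```php
<?php
// Hypothetical helper: derive a storage name from an uploaded file's name.
function hashedFileName(string $originalName, string $uniqueSuffix): string
{
    // Keep only a lowercase alphanumeric extension from the user-supplied name.
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $ext = preg_replace('/[^a-z0-9]/', '', $ext);

    // Salt with the current time, then hash: always 40 hex characters.
    $hash = sha1(microtime() . $originalName);

    return $hash . '-' . $uniqueSuffix . ($ext !== '' ? '.' . $ext : '');
}

echo hashedFileName('../../My Photo.JPG', '1234568764564558'), "\n";
```

Because the user-supplied name only ever passes through sha1(), dots, slashes, and spaces in it never reach the file system.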
I want to generate random alphanumeric strings in PHP. They will be used in places where the strength of random numbers is important (publicly visible IDs in URLs and the like).
As I understand it, the main source of cryptographically strong randomness in PHP is openssl_random_pseudo_bytes(). This, however, returns a string of raw bytes, not alphanumeric characters.
To convert them to alphanumerics I could either hash them (which would produce a longer-than-necessary string of a limited set of hex characters), or base64_encode() them (which would produce a string with +, / and = in it - not alphanumerics).
So I think that instead I could use the random bytes as a source of entropy and generate my own string consisting only of the characters 0-9a-zA-Z.
The problem then becomes how to translate from 256 distinct values (one byte of input) to 62 distinct values (one character of output), and in such a way that all 62 characters are equally likely. (Otherwise there will be 8 characters that appear more often than the rest.)
Or perhaps I should use another approach entirely? I would like my string to be as short as possible (say, 20 characters or so - shorter URLs are better) and consist only of alphanumeric characters (so that it doesn't need to be specially escaped anywhere).
You can implement your own base64 encoding, sort of. If you can allow two specific symbols - these can be anything, for example . and -, it doesn't really matter. It can even be a space for one of them. In any case, what you would do is this:
$alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.-";
// using . and - for the two symbols here
$input = [123,193,21,13]; // whatever your input is, I'm assuming an array of bytes
$output = "";
foreach($input as $byte) {
$output .= $alphabet[$byte%64];
}
Assuming random input, all characters have equal probability of appearing.
That being said, if you can't allow anything except pure alphanumeric, cut the symbols from the $alphabet and use %62 instead of %64. While this does mean you have a small bias towards the characters 0 through 7, I don't think it's significant enough to worry about.
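If the bias does matter to you, one common way around it is rejection sampling: throw away byte values of 248 and above (248 = 4 * 62), so each of the 62 characters is reached by exactly four byte values. The sketch below is not part of the answer above; it assumes PHP 7+ for random_bytes(), and randomAlnum() is a made-up name.

```php
<?php
// Hypothetical helper: unbiased alphanumeric string via rejection sampling.
function randomAlnum(int $length): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $limit = 248; // 4 * 62: largest multiple of 62 that fits in one byte
    $out = '';
    while (strlen($out) < $length) {
        // Draw a batch of bytes; ask for a few extra since about 3% are discarded.
        $bytes = random_bytes(($length - strlen($out)) + 8);
        foreach (str_split($bytes) as $char) {
            $byte = ord($char);
            if ($byte >= $limit) {
                continue; // 248..255 would favour the first 8 characters
            }
            $out .= $alphabet[$byte % 62];
            if (strlen($out) === $length) {
                break;
            }
        }
    }
    return $out;
}

echo randomAlnum(20), "\n";
```

On PHP 7+, calling random_int(0, 61) once per character gives the same unbiased result with less code.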
I found this function on php.net in the user comments.
function crypto_rand($min, $max) {
    $range = $max - $min + 1; // number of possible values, $max included
    if ($range == 1) return $min; // not so random...
    $length = (int) (log($range, 2) / 8) + 1;
    // Note: the modulo leaves a slight bias unless $range divides 256^$length.
    return $min + (hexdec(bin2hex(openssl_random_pseudo_bytes($length, $s))) % $range);
}
Then do something like
$string = '';
for ($i = 0; $i < 20; $i++)
{
    $string .= chr(crypto_rand(1, 26) + 96); // or +64 for upper case
}
Or similar.
note: THIS IS WRONG! I leave this attempted answer for reference only.
(31 * 256) % 62 = 0
For each output alphanumeric character, generate 31 random values. Sum these 31 values and take the modulo 62.
Kind of brutal, but this is the only "mathematically correct" option I can think of :)
Basically, I'm looking for a function to perform the following
generateToken(128)
which will return a 128-bit string consisting of integers or alphabet characters.
Clarification: From the comments, I had to change the question. Apparently, I am looking for a string that is 16 characters long if it needs to be 128 bits.
Is there a reason you must restrict the string to integers? That actually makes the problem a lot harder because each digit gives you 3.3 bits (because 2^3.3 ~= 10). It's tricky to generate exactly 128 bits of token in this manner.
Much easier is to allow hexadecimal encoding (4 bits per character). You can then generate 128 genuine random bits, then encode them in hex for use in your application. Base64 encoding (6 bits per character) is also useful for this kind of thing.
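openssl_random_pseudo_bytes will give you a string of random bytes that you can encode with bin2hex. Otherwise, you can use mt_rand in your own token-generation routine, but note that mt_rand is not cryptographically secure, so avoid it when the token has to be hard to guess.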
EDIT: After reading the updates to the question it seems that you want to generate a token that represents 128 bits of data and the actual string length (in characters) is not so important. If I guess your intention correctly (that this is a unique ID, possibly for identification/authentication purposes) then I'd suggest you use openssl_random_pseudo_bytes to generate the right number of bits for your problem, in this case 128 (16 bytes). You can then encode those bits in any way you see fit: hex and base64 are two possibilities.
Note that hex encoding will use 32 characters to encode 128 bits of data since each character only encodes 4 bits (128 / 4 = 32). Base64 will use 22 characters (128 / 6 = 21.3). Each character takes up 8 bits of storage but only encodes 4 or 6 bits of information.
Be very careful not to confuse encoded string length with raw data length. If you choose a 16-character string using alphanumeric characters (a-z, A-Z, 0-9) then you only get 6 bits of information per character (log base 2 of 62 is nearly 6), so your 16-character string will only encode 96 bits of information. You should think of your token as an opaque byte array and only worry about turning it into / from a character string when you actually try to send it over the wire or put it in a cookie or whatever.
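To make the difference between raw length and encoded length concrete, here is a small sketch. It uses random_bytes() and so assumes PHP 7+; on older versions openssl_random_pseudo_bytes(16) would take its place.

```php
<?php
$raw = random_bytes(16);                // 128 bits of random data

$hex = bin2hex($raw);                   // 4 bits per character
$b64 = rtrim(base64_encode($raw), '='); // 6 bits per character, padding removed

// Prints: raw: 16 bytes, hex: 32 chars, base64: 22 chars
printf("raw: %d bytes, hex: %d chars, base64: %d chars\n",
    strlen($raw), strlen($hex), strlen($b64));
```

All three hold the same 128 bits; only the representation changes.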
As of PHP 5.3:
$rand128 = bin2hex(openssl_random_pseudo_bytes(16));
What is your purpose?
If you just want a unique id, then use uniqid:
http://www.php.net/manual/en/function.uniqid.php
It's not random; it's essentially a hex string based on microtime. If you do uniqid('', true), then it will return a hex string based on microtime and tack a bunch of random numbers onto the end of the id (so even if two calls come in on the same microsecond, it is unlikely that they'll share an id).
If you need a 16-character string exactly, then for what purpose? Are you salting passwords? How random should the string be? All in all, you can always just do:
$toShow = array();
for ($i = 0; $i < 16; $i++) {
    $toShow[] = chr(mt_rand(ord('a'), ord('z')));
}
return implode('', $toShow); // join the characters into a single string
Now this creates a string of characters that are between 'a' and 'z'. You can change "ord('a')" to 0, and "ord('z')" to 255 to get a fully random binary string... or any other range you need.