Match a unique string to a shorter unique string - php

Let's say I have to manage 1,000,000 phone numbers in 12-digit format. They are all distinct. Now I want to assign each number a shorter string (7 case-sensitive alphanumeric characters) that must also be distinct. What would be the best solution using PHP?

You can use PHP's base_convert() function to change integers into strings.
From integer to a-z0-9 base 36 string: $shortString = base_convert($phoneNumber, 10, 36);
From base 36 string to integer: $phoneNumber = base_convert($shortString, 36, 10);
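If that's not short enough and you want to use the full gamut of a-zA-Z0-9 characters, you'll need to use a custom function to convert to base 62. There are some great ones at http://php.net/base_convert. For 12-digit numbers the difference matters: base 36 can need 8 characters (36^7 is about 7.8 × 10^10), while 62^7 is about 3.5 × 10^12, so base 62 always fits in 7.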
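A minimal sketch of such a base 62 converter, assuming 64-bit PHP 7+ and non-negative integers. The names BASE62_ALPHABET, toBase62() and fromBase62() are hypothetical, and the alphabet order is an arbitrary choice:

```php
<?php
// Hypothetical alphabet: digits, then upper case, then lower case.
const BASE62_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Encode a non-negative integer as base 62, left-padded to $width characters.
function toBase62(int $number, int $width = 7): string
{
    if ($number < 0) {
        throw new InvalidArgumentException('Number must be non-negative.');
    }
    $out = '';
    do {
        $out = BASE62_ALPHABET[$number % 62] . $out;
        $number = intdiv($number, 62);
    } while ($number > 0);

    return str_pad($out, $width, '0', STR_PAD_LEFT);
}

// Decode a base 62 string back to the integer it represents.
function fromBase62(string $short): int
{
    $number = 0;
    foreach (str_split($short) as $char) {
        $pos = strpos(BASE62_ALPHABET, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid base 62 character: $char");
        }
        $number = $number * 62 + $pos;
    }

    return $number;
}

$short = toBase62(999999999999); // largest 12-digit number
echo $short, ' -> ', fromBase62($short), PHP_EOL;
```

Because the mapping is a plain change of base, distinct phone numbers always give distinct short strings, and no lookup table is needed to get the number back.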

Related

Cryptographically secure random ASCII-string in PHP

I know about random_bytes() in PHP 7, and I want to use it for generating a cryptographically secure (e.g. hard to guess) random string for use as a one-time token or for longer term storage in a cookie.
Unfortunately, I don't know how to convert the output of random_bytes() to a string consisting only of human readable characters, so browsers don't get confused. I know about bin2hex(), but I'd prefer to use the full ASCII-range instead of hex numbers, for the sake of more bits per length.
Any ideas?
Unfortunately Peter O. deleted his answer after receiving negative attention in a review queue, perhaps because he phrased it as a question. I believe it is a legitimate answer, so I will reprise it.
One easy solution is to encode your random data into the base64 alphabet using base64_encode(). This will not produce the "full ASCII-range" you have requested, but it will give you most of it. An even larger ASCII range is output by a suitable base85 encoder, but PHP does not have a built-in one. You can probably find plenty of open-source base85 encoders for PHP though. In my opinion the decrease in length of base85 over base64 is unlikely to be worth the extra code you have to maintain.
I personally just use a GUID library and concatenate a couple of GUIDs to get a long unique token string. You also have the option to remove the dashes to make the source harder to recognize, and if you want to make it even more complex you can randomly cut the string back by up to 10 characters so that its length is not known in advance.
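As a rough sketch of the base64 approach (the helper name randomToken() is hypothetical), the token below swaps the two base64 characters that are awkward in URLs and cookies and drops the padding:

```php
<?php
// Hypothetical helper: a URL- and cookie-safe token built from random_bytes().
function randomToken(int $bytes = 32): string
{
    // base64 carries 6 bits per character, versus 4 for bin2hex().
    $b64 = base64_encode(random_bytes($bytes));

    // Replace '+' and '/' with '-' and '_' and strip '=' padding
    // (the "base64url" variant of the alphabet).
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

echo randomToken(), PHP_EOL; // 43 characters for 32 random bytes
```

The same 32 bytes would take 64 characters with bin2hex().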
I use this library for generating my GUIDs
https://packagist.org/packages/ramsey/uuid
use Ramsey\Uuid\Uuid;
$token = Uuid::uuid4() . '-' . Uuid::uuid4();
Sorry, I overlooked the part about you wanting to use the full scope of 26 alpha char with numeric... Not sure I have an answer for you in this respect but you should have faith in the difficulty of guessing a UUID4, especially when you add a couple together and obfuscate the length by a factor of 10 to make guessing more complex.
Actually, if you could safely generate an array of random numbers in the range of valid ascii char codes then you could convert the entire random array of codes into the respective ascii char and implode them together as a single string.
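function randomAsciiString($length) {
    // Draw $length printable ASCII codes (33-126) and turn each into a character.
    return implode('', array_map(
        function ($value) {
            return chr($value);
        },
        array_map(
            function ($value) {
                return random_int(33, 126);
            },
            // Fill exactly $length slots; $length - 1 would return one character too few.
            array_fill(0, $length, null)
        )
    ));
}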
echo randomAsciiString(128); // Normal 128 char string
echo randomAsciiString(random_int(118, 128)); // obfuscated length char string for extra complexity.
Of course, keep in mind that this uses every printable character on a standard keyboard, and some of them (quotes, for example) will upset contexts that are sensitive to them.
Let's consider the letters to be used. For the sake of simplicity I will assume that you intend only big and small English letters to be used. This means that you have 26 big letters and 26 small letters, 52 different possible values. If we view a byte array of n elements as a number of n digits in base 256 and we convert this number into a base 52 number, where A is 0, B is 1, C is 2, ..., a is 26, ..., z is 51, then converting these digits into the corresponding letters will yield the text you wanted.
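The conversion described above can be sketched with schoolbook long division on the byte array, so no big-integer extension is needed. The function name bytesToLetters() is hypothetical. Note that leading zero bytes do not contribute digits, so the output length can vary slightly:

```php
<?php
// Hypothetical helper: reads $bytes as a big-endian base-256 number
// and rewrites it in base 52 using A-Z (0-25) and a-z (26-51).
function bytesToLetters(string $bytes): string
{
    if ($bytes === '') {
        return '';
    }
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $digits = array_values(unpack('C*', $bytes)); // base-256 digits, most significant first
    $out = '';

    // Repeatedly divide the whole number by 52; each remainder is one letter.
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $acc = $remainder * 256 + $digit;
            $q = intdiv($acc, 52);
            $remainder = $acc % 52;
            if ($q > 0 || count($quotient) > 0) { // skip leading zeros
                $quotient[] = $q;
            }
        }
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }

    return $out;
}

echo bytesToLetters(random_bytes(16)), PHP_EOL;
```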

How to treat two chars in a string as a byte?

Consider:
$tag = "4F";
$tag is a string containing two characters, '4' and 'F'. I want to be able to treat these as the upper and lower nibbles respectively of a whole byte (4F) so that I can go on to compute the bit-patterns (01001111)
As these are technically characters, they can be treated in their own right as a byte each - '4' on the ASCII table is 0x34 (decimal 52) and 'F' is 0x46 (decimal 70).
Pretty much all the PHP built-in functions that allow manipulation of bytes (that I've seen so far) are variations on the latter description: '4' is 0x34, and not the upper nibble of a byte.
I don't know of any quick or built-in way to get PHP to handle this the way I want, but it feels like it should be there.
How do I convert a string "4F" to the byte 4F, or treat each char as a nibble in a nibble-pair. Are there any built in functions to get PHP to handle a string like "4F" or "3F0E" as pairs of nibbles?
Thanks.
If you're wanting "the decimal representation of a hex digit", hexdec is one way to go.
If you're wanting "bit pattern for hex digit", then use base_convert. The docs even show an example of this maneuver:
Example #1 base_convert() example
$hexadecimal = 'a37334';
echo base_convert($hexadecimal, 16, 2);
The above example will output:
101000110111001100110100
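For the "4F" case from the question, a small sketch using only built-ins (hexdec(), decbin(), hex2bin()) might look like this. The helper name hexToBits() is hypothetical. It converts one hex digit at a time so that leading zero bits are kept, which base_convert() alone would drop:

```php
<?php
// Hypothetical helper: each hex digit becomes exactly one 4-bit nibble.
function hexToBits(string $hex): string
{
    return implode('', array_map(
        function ($digit) {
            return str_pad(decbin(hexdec($digit)), 4, '0', STR_PAD_LEFT);
        },
        str_split($hex)
    ));
}

$tag = '4F';
$value = hexdec($tag);        // 79, i.e. the byte 0x4F as an integer
$high  = ($value >> 4) & 0xF; // 4  (upper nibble)
$low   = $value & 0xF;        // 15 (lower nibble)
$raw   = hex2bin($tag);       // the single raw byte 0x4F, which is the character "O"

echo hexToBits($tag), PHP_EOL;    // 01001111
echo hexToBits('3F0E'), PHP_EOL;  // 0011111100001110
```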

PHP encoding to 64 char length

To generate some pretty URLs I would take two strings
- UNIX timestamp
- string (length 16 chars)
Is there some way to use both and encode them to a string with a fixed length of 64 chars?
You can use the str_pad() function along with base64_encode for the string:
http://php.net/manual/en/function.str-pad.php
This function returns the input string padded on the left, the right, or both sides to the specified padding length. If the optional argument pad_string is not supplied, the input is padded with spaces, otherwise it is padded with characters from pad_string up to the limit.
Example:
$string = urlencode(base64_encode($string));
$padded = str_pad($tstamp."_".$string."_", 64, "0");
Your url could then look like:
/order/timestamp_base64string_00000..../
Splitting on the _ separator gets you the parts of the URL you need. You will then have to urldecode() and base64_decode() the string part.
I suggest that you instead use sessions (cookies) to carry such information from one page to another, or use an intermediary storage system whose IDs could be used in URLs.
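A round trip of this scheme could be sketched as follows. The function names encodeSlug() and decodeSlug() are hypothetical. One caveat worth checking: str_pad() never truncates, and urlencode() expands '+', '/' and '=' to three characters each, so an unlucky 16-byte input could push the result past 64 characters.

```php
<?php
// Hypothetical helper: timestamp + "_" + url-encoded base64 + "_" + zero padding.
function encodeSlug(int $tstamp, string $string): string
{
    $encoded = urlencode(base64_encode($string));

    // Pads on the right with "0" up to 64 characters; longer input is left as-is.
    return str_pad($tstamp . '_' . $encoded . '_', 64, '0');
}

// Hypothetical helper: returns [timestamp, original string].
function decodeSlug(string $slug): array
{
    // Standard base64 never emits "_", so the separators are unambiguous.
    [$tstamp, $encoded] = explode('_', $slug, 3);

    return [(int) $tstamp, base64_decode(urldecode($encoded))];
}

$slug = encodeSlug(1700000000, 'abcdefghijklmnop');
echo $slug, PHP_EOL;
var_dump(decodeSlug($slug));
```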

What set of chars is php's uniqid composed of?

I would like to prepare a simple regular expression for PHP's uniqid. I checked the uniqid manual looking for the set of chars used in the return value, but the documentation only mentions that:
#return string the unique identifier, as a string.
And
With an empty prefix, the returned string will be 13 characters long. If more_entropy is true, it will be 23 characters.
I would like to know what characters can I expect in the return value. Is it a hex string? How to know for sure? Where to find something more about the uniqid function?
The documentation doesn't specify the string contents; only its length. Generally, you shouldn't depend on it. If you print the value between a pair of delimiters, like quotation marks, you could use them in the regular expression:
"([^"]+)" ($1 contains the value)
As long as you develop for a particular PHP version, you can inspect its implementation and assume that it doesn't change. If you upgrade, you should check whether the assumption is still valid.
A comment in the uniqid documentation explains that it is essentially a hexadecimal number with an optional numeric suffix:
if (more_entropy) {
uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
Which gives you two possible output formats:
uniqid() - 13 characters, hexadecimal number
uniqid('', true) - 14 - 23 characters, hexadecimal number with a floating-point suffix computed elsewhere
If you use other delimiters than alphanumeric characters and dot, you could use one of these simple regular expressions to grab the value in either of the two formats:
[0-9a-f]+
[.0-9a-f]+
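If you want stricter, anchored patterns, the format strings quoted above suggest something like the following sketch. These patterns are an assumption derived from that source snippet, not a documented guarantee, and may not hold on every PHP version or platform:

```php
<?php
// Assumed from "%s%08x%05x": 8 + 5 = 13 lowercase hex digits when the prefix is empty.
$shortPattern = '/^[0-9a-f]{13}$/';

// Assumed from "%s%08x%05x%.8F": the same 13 hex digits, then a decimal
// number with exactly 8 digits after the dot.
$longPattern = '/^[0-9a-f]{13}[0-9]+\.[0-9]{8}$/';

var_dump(preg_match($shortPattern, uniqid()) === 1);
var_dump(preg_match($longPattern, uniqid('', true)) === 1);
```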
If you need 100% format guarantee for any PHP version, you could write your own function based on sprintf.
I admit that it is unlikely that uniqid would change significantly; I would expect other extensions to be created to provide different formats. Another comment in the uniqid documentation shows an RFC 4122 compliant UUID implementation. There was also a discussion on Stack Overflow about it.
I found this on the php site: http://www.php.net/manual/en/function.uniqid.php#95001
If this is to be believed then the 13 character version is entirely hex.
However the 23 character version has:
14 characters (hex)
then a dot
then another 8 characters (decimal)
If you need to be entirely sure, you can verify this yourself: http://sandbox.onlinephpfunctions.com/code/c04c7854b764faee2548180eddb8c23288dcb5f7

How can I randomize an entire string of 62 characters?

I have 62 base64 characters that I want to randomize. How can I do this using PHP? The string would be all letters, upper and lower case as well as numbers from 0-9.
The thing that is most important to me is that the entire string be evaluated before a return value is given. In other words, if I request a string of 8 characters in length and my string starts out like:
1234567890ABCDE..... I don't want to get the first 8 numbers randomized. It should randomize the entire string first, then return 8 characters from that.
Try this:
$string = '1234567890ABCDE...';
$string = substr(str_shuffle($string), 0, 8);
str_shuffle randomizes the string, then substr takes the first 8 characters from it.
Take a look at str_shuffle.
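Note that str_shuffle() is not intended for security-sensitive values. If the result is meant to be hard to guess, one option is a Fisher-Yates shuffle driven by random_int(), sketched below under the hypothetical name secureShuffledPrefix(). As with the str_shuffle() version, the whole alphabet is shuffled first, so no character can appear twice in the result:

```php
<?php
// Hypothetical helper: shuffle the entire alphabet, then keep the first $length characters.
function secureShuffledPrefix(string $alphabet, int $length): string
{
    $chars = str_split($alphabet);

    // Fisher-Yates: swap each position with a uniformly chosen earlier-or-equal one.
    for ($i = count($chars) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);
        [$chars[$i], $chars[$j]] = [$chars[$j], $chars[$i]];
    }

    return implode('', array_slice($chars, 0, $length));
}

$alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
echo secureShuffledPrefix($alphabet, 8), PHP_EOL;
```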
