What set of chars is php's uniqid composed of? - php

I would like to prepare a simple regular expression for PHP's uniqid. I checked the uniqid manual for the set of characters used in the return value, but the documentation only mentions that:
@return string the unique identifier, as a string.
And
With an empty prefix, the returned string will be 13 characters long. If more_entropy is true, it will be 23 characters.
I would like to know what characters I can expect in the return value. Is it a hex string? How can I know for sure? Where can I find out more about the uniqid function?

The documentation doesn't specify the string contents, only its length. Generally, you shouldn't depend on the contents. If you print the value between a pair of delimiters, like quotation marks, you could use them in the regular expression:
"([^"]+)" ($1 contains the value)
As long as you develop for a particular PHP version, you can inspect its implementation and assume that it doesn't change. If you upgrade, you should check whether the assumption is still valid.
A comment in the uniqid documentation explains that it is essentially a hexadecimal number with an optional numeric suffix:
if (more_entropy) {
uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}
Which gives you two possible output formats:
uniqid() - 13 characters, hexadecimal number
uniqid('', true) - 14 - 23 characters, hexadecimal number with a floating-point suffix computed elsewhere
If you use other delimiters than alphanumeric characters and dot, you could use one of these simple regular expressions to grab the value in either of the two formats:
[0-9a-f]+
[.0-9a-f]+
If you need 100% format guarantee for any PHP version, you could write your own function based on sprintf.
I admit that it is unlikely that uniqid would change significantly; I would rather expect other extensions to be created to provide different formats. Another comment in the uniqid documentation shows an RFC 4122 compliant UUID implementation. There was also a discussion on Stack Overflow about it.
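The sprintf-based approach mentioned above could look roughly like this. It is only a sketch: myUniqid is a hypothetical name, and it assumes that copying the "%s%08x%05x" layout from the quoted C snippet is what you want.

```php
<?php
// Hypothetical helper (not part of PHP): builds an id in the same
// "%s%08x%05x" layout as the C snippet quoted above.
function myUniqid(string $prefix = ''): string
{
    $tv = gettimeofday(); // array with 'sec' and 'usec' keys, among others
    return sprintf('%s%08x%05x', $prefix, $tv['sec'], $tv['usec']);
}

$id = myUniqid();
// The format is now under your control, so a strict pattern is safe.
var_dump(preg_match('/^[0-9a-f]{13}$/', $id) === 1);
```

Because the format string lives in your own code, the regular expression no longer depends on what a given PHP version does internally.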

I found this on the php site: http://www.php.net/manual/en/function.uniqid.php#95001
If this is to be believed then the 13 character version is entirely hex.
However the 23 character version has:
14 characters (hex)
then a dot
then another 8 characters (decimal)
If you need to be entirely sure, you can verify this yourself: http://sandbox.onlinephpfunctions.com/code/c04c7854b764faee2548180eddb8c23288dcb5f7
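A quick local check along the same lines might look like this. The two patterns are assumptions derived from the format described in this thread, not something the manual guarantees.

```php
<?php
// Patterns below are assumptions based on the format discussed above.
$short = uniqid();          // no prefix, no extra entropy
$long  = uniqid('', true);  // no prefix, more_entropy enabled

// 13 hexadecimal characters.
var_dump(strlen($short), preg_match('/^[0-9a-f]{13}$/', $short));

// 13 hexadecimal characters, one decimal digit, a dot, 8 decimal digits.
var_dump(strlen($long), preg_match('/^[0-9a-f]{13}[0-9]\.[0-9]{8}$/', $long));
```

If either preg_match call returns 0 on your PHP build, the assumption does not hold there and the pattern needs adjusting.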

Related

Cryptographically secure random ASCII-string in PHP

I know about random_bytes() in PHP 7, and I want to use it for generating a cryptographically secure (e.g. hard to guess) random string for use as a one-time token or for longer term storage in a cookie.
Unfortunately, I don't know how to convert the output of random_bytes() to a string consisting only of human readable characters, so browsers don't get confused. I know about bin2hex(), but I'd prefer to use the full ASCII-range instead of hex numbers, for the sake of more bits per length.
Any ideas?
Unfortunately Peter O. deleted his answer after receiving negative attention in a review queue, perhaps because he phrased it as a question. I believe it is a legitimate answer, so I will reprise it.
One easy solution is to encode your random data into the base64 alphabet using base64_encode(). This will not produce the "full ASCII-range" as you have requested, but it will give you most of it. An even larger ASCII range is output by a suitable base85 encoder, but PHP does not have a built-in one. You can probably find plenty of open-source base85 encoders for PHP though. In my opinion the decrease in length of base85 over base64 is unlikely to be worth the extra code you have to maintain.
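A minimal sketch of the base64 route follows. randomToken is a hypothetical helper name, and swapping '+' and '/' for '-' and '_' (plus dropping the '=' padding) is my own assumption to keep the token friendly to cookies and URLs.

```php
<?php
// Hypothetical helper: $byteCount random bytes, encoded with the URL-safe
// base64 alphabet (A-Z, a-z, 0-9, '-', '_') and without '=' padding.
function randomToken(int $byteCount = 24): string
{
    $raw = random_bytes($byteCount); // CSPRNG output, PHP 7+
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo randomToken(), PHP_EOL;
```

Each output character carries 6 bits, compared with 4 bits per character for bin2hex().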
I personally just use a GUID library and concatenate a couple of GUIDs to get a long unique token string. You also have the option to remove the dashes to keep it difficult to know the source and if you want to make it even more complex you can randomly cut back the string by up to 10 char to add complexity to its unknown length.
I use this library for generating my GUIDs
https://packagist.org/packages/ramsey/uuid
use Ramsey\Uuid\Uuid;
$token = Uuid::uuid4() . '-' . Uuid::uuid4();
Sorry, I overlooked the part about you wanting to use the full scope of 26 alpha char with numeric... Not sure I have an answer for you in this respect but you should have faith in the difficulty of guessing a UUID4, especially when you add a couple together and obfuscate the length by a factor of 10 to make guessing more complex.
Actually, if you could safely generate an array of random numbers in the range of valid ascii char codes then you could convert the entire random array of codes into the respective ascii char and implode them together as a single string.
function randomAsciiString($length) {
    // Draw $length printable ASCII codes (33-126) from the CSPRNG and join them.
    return implode('', array_map(
        function () {
            return chr(random_int(33, 126));
        },
        array_fill(0, $length, null) // exactly $length slots (was $length - 1)
    ));
}
echo randomAsciiString(128); // Normal 128 char string
echo randomAsciiString(random_int(118, 128)); // obfuscated length char string for extra complexity.
Of course, you should be mindful that you're using all the standard keys on the keyboard, and some of those characters are going to upset things that are sensitive to them (e.g. quotes).
Let's consider the letters to be used. For the sake of simplicity I will assume that you intend only big and small English letters to be used. This means that you have 26 big letters and 26 small letters, 52 different possible values. If we view a byte array of n elements as a number of n digits in base 256 and we convert this number into a base 52 number, where A is 0, B is 1, C is 2, ..., a is 26, ..., z is 51, then converting these digits into the corresponding letters will yield the text you wanted.
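The base conversion described above can be sketched with schoolbook long division. bytesToLetters is a hypothetical name, and note one assumption: leading zero bytes do not contribute digits, so the output length can vary slightly.

```php
<?php
// Hypothetical helper: treats $bytes as one big base-256 number and
// rewrites it in base 52 using A-Z (0..25) and a-z (26..51).
function bytesToLetters(string $bytes): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    // Base-256 digits, most significant first.
    $digits = array_values(unpack('C*', $bytes));
    $out = '';
    while (count($digits) > 0) {
        // Divide the whole base-256 number by 52, keeping the remainder.
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 256 + $d;
            $q = intdiv($acc, 52);
            $remainder = $acc % 52;
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        // The remainder is the next base-52 digit, least significant first.
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

echo bytesToLetters(random_bytes(16)), PHP_EOL;
```

For example, the single byte 0x34 (decimal 52) is "10" in base 52, which maps to "BA".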

How to treat two chars in a string as a byte?

Consider:
$tag = "4F";
$tag is a string containing two characters, '4' and 'F'. I want to be able to treat these as the upper and lower nibbles respectively of a whole byte (4F) so that I can go on to compute the bit-patterns (01001111)
As these are technically characters, they can be treated in their own right as a byte each - '4' on the ASCII table is 0x34 (decimal 52) and 'F' is 0x46 (decimal 70).
Pretty much all the PHP built-in functions that allow manipulation of bytes (that I've seen so far) are variations on the latter description: '4' is 0x34, and not the upper nibble of a byte.
I don't know of any quick or built-in way to get PHP to handle this the way I want, but it feels like it should be there.
How do I convert a string "4F" to the byte 4F, or treat each char as a nibble in a nibble-pair. Are there any built in functions to get PHP to handle a string like "4F" or "3F0E" as pairs of nibbles?
Thanks.
If you're wanting "the decimal representation of a hex digit", hexdec is one way to go.
If you're wanting "bit pattern for hex digit", then use base_convert. The docs even show an example of this maneuver:
Example #1 base_convert() example
$hexadecimal = 'a37334';
echo base_convert($hexadecimal, 16, 2);
The above example will output:
101000110111001100110100
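Applied to the "4F" from the question, that might look like the sketch below. Zero-padding to 8 bits with str_pad is my addition, since decbin and base_convert drop leading zeros.

```php
<?php
$tag = '4F';

$value = hexdec($tag);                                   // integer value of the byte
$bits  = str_pad(decbin($value), 8, '0', STR_PAD_LEFT);  // 8-bit pattern, zero-padded
$high  = $value >> 4;                                    // upper nibble
$low   = $value & 0x0F;                                  // lower nibble

var_dump($value, $bits, $high, $low);

// For longer strings such as "3F0E", hex2bin() yields the raw bytes.
$bytes = hex2bin('3F0E');
var_dump(strlen($bytes), ord($bytes[0]), ord($bytes[1]));
```

Here $value is 79 and $bits is "01001111", matching the bit pattern given in the question.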

Preg replace numbers if more than 5 digits

I have an image upload form. After the user submits the form, my script processes the image and cleans the image filename (I'm appending a unique number series at the end of the filename to prevent possible duplicate filenames).
Often I'm receiving filenames (after processing) such as
"c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg"
How can I preg_replace the numbers if they have more than 5 digits, while numbers with 5 digits or fewer are retained? The above example should give "c-id-1333-l-id--aid-3951-id--im-193-1.jpg" (don't mind the multiple consecutive dashes [-]; my script can handle those).
You can use the following to do this.
$str = 'c-id-1333-l-id-1298491-aid-3951-id-13995346097186883-im-193-1.jpg';
$str = preg_replace('/\d{5,}/', '', $str);
var_dump($str);
Explanation:
\d{5,} # digits (0-9) (at least 5 times)
Output:
string(41) "c-id-1333-l-id--aid-3951-id--im-193-1.jpg"
If you want to retain 5 digits or less than 5 then you can use \d{6,} instead.
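To see the difference between the two quantifiers, here is a small sketch with a made-up filename containing 4-, 5- and 6-digit runs:

```php
<?php
// Made-up filename with runs of 4, 5 and 6 digits.
$name = 'a-1234-b-12345-c-123456.jpg';

// Removes runs of 5 or more digits.
$atLeastFive = preg_replace('/\d{5,}/', '', $name);

// Removes only runs of 6 or more digits, so 5-digit numbers survive.
$moreThanFive = preg_replace('/\d{6,}/', '', $name);

var_dump($atLeastFive, $moreThanFive);
```

The first call leaves "a-1234-b--c-.jpg", the second leaves "a-1234-b-12345-c-.jpg".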
Is the user supplied name important?
If it is not, one technique I like for normalizing file names is to simply hash them with something like sha1 or even md5, then add your timestamp, ids or whatnot to that. This takes care of a lot of issues with special characters such as ".", "\" and "/" (dot-dot-slash directory traversal) in the file names.
@hwnd's regular expression should do the trick, but I thought I would throw this out there.
so with hashing you'd get cleaner ( but less meaningful ) names like this
da39a3ee5e6b4b0d3255bfef95601890afd80709.jpg
then you can add your unique numbers on
da39a3ee5e6b4b0d3255bfef95601890afd80709-1234568764564558.jpg
You could even salt the filename with the timestamp first and then hash it, to get filenames that are all 40 characters long. The chance of hash collisions is minimal; if you are dealing with a very large number of files and are still worried about it, just move up to sha256 etc.
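A rough sketch of the hashing idea follows. hashedFileName is a hypothetical helper, and restricting the extension to a short alphanumeric string is my own assumption, not something from the answer.

```php
<?php
// Hypothetical helper: hashes the user-supplied name and appends the
// unique suffix, keeping the extension only if it looks harmless.
function hashedFileName(string $original, string $uniqueSuffix): string
{
    $ext = strtolower(pathinfo($original, PATHINFO_EXTENSION));
    if (!preg_match('/^[a-z0-9]{1,5}$/', $ext)) {
        $ext = ''; // drop anything unexpected rather than trust it
    }
    $name = sha1($original) . '-' . $uniqueSuffix;
    return $ext === '' ? $name : $name . '.' . $ext;
}

echo hashedFileName('../../my photo.JPG', '1234568764564558'), PHP_EOL;
```

Whatever the user typed, the stored name consists only of hex digits, a dash, the suffix and a short extension.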

PHP is_numeric or preg_match 0-9 validation

This isn't a big issue for me (as far as I'm aware), it's more of something that's interested me. But what is the main difference, if any, of using is_numeric over preg_match (or vice versa) to validate user input values.
Example One:
<?php
$id = $_GET['id'];
if (!preg_match('/^[0-9]*$/', $id)) {
// Error
} else {
// Continue
}
?>
Example Two:
<?php
$id = $_GET['id'];
if (!is_numeric($id)) {
// Error
} else {
// Continue
}
?>
I assume both do exactly the same but is there any specific differences which could cause problems later somehow? Is there a "best way" or something I'm not seeing which makes them different.
is_numeric() tests whether a value is a number. It doesn't necessarily have to be an integer though - it could be a decimal number or a number in scientific notation.
The preg_match() example you've given only checks that a value contains the digits zero to nine; any number of them, and in any sequence.
Note that the regular expression you've given also isn't a perfect integer checker, the way you've written it. It doesn't allow for negatives; it does allow for a zero-length string (i.e. with no digits at all, which presumably shouldn't be valid?), and it allows the number to have any number of leading zeros, which again may not be intended.
[EDIT]
As per your comment, a better regular expression might look like this:
/^[1-9][0-9]*$/
This forces the first digit to only be between 1 and 9, so you can't have leading zeros. It also forces it to be at least one digit long, so solves the zero-length string issue.
You're not worried about negatives, so that's not an issue.
You might want to restrict the number of digits, because as things stand, it will allow strings that are too big to be stored as integers. To restrict this, you would change the star into a length restriction like so:
/^[1-9][0-9]{0,15}$/
This would allow the string to be between 1 and 16 digits long (ie the first digit plus 0-15 further digits). Feel free to adjust the numbers in the curly braces to suit your own needs. If you want a fixed length string, then you only need to specify one number in the braces.
According to http://www.php.net/manual/en/function.is-numeric.php, is_numeric allows something like "+0123.45e6" or "0xFF" (hex strings are accepted only before PHP 7). I think this is not what you expect.
preg_match can be slow, and your pattern accepts values like 0000 or 0051.
I prefer using ctype_digit (works only with strings, it's ok with $_GET).
<?php
$id = $_GET['id'];
if (ctype_digit($id)) {
echo 'ok';
} else {
echo 'nok';
}
?>
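To compare the three checks side by side, something like this sketch can help. The sample values are my own and were chosen to behave the same across PHP versions.

```php
<?php
// Sample inputs chosen for illustration; all are strings, as in $_GET.
$samples = ['42', '0051', '', '-5', '9.1', '1e4', 'abc'];

foreach ($samples as $s) {
    printf(
        "%-6s is_numeric=%d regex=%d ctype_digit=%d\n",
        var_export($s, true),
        is_numeric($s),                // any numeric notation
        preg_match('/^[0-9]*$/', $s),  // digits only, empty string allowed
        ctype_digit($s)                // digits only, at least one
    );
}
```

Note that only the regex from the question accepts the empty string, and only is_numeric() accepts '-5', '9.1' and '1e4'.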
is_numeric() allows any form of number. so 1, 3.14159265, 2.71828e10 are all "numeric", while your regex boils down to the equivalent of is_int()
is_numeric would accept "-0.5e+12" as a valid ID.
Not exactly the same.
From the PHP docs of is_numeric:
'42' is numeric
'1337' is numeric
'1e4' is numeric
'not numeric' is NOT numeric
'Array' is NOT numeric
'9.1' is numeric
With your regex you only check for 'basic' numeric values.
Also is_numeric() should be faster.
is_numeric checks whether it is any sort of number, while your regex checks whether it is an integer, possibly with leading 0s. For an id, stored as an integer, it is quite likely that we will want to not have leading 0s. Following Spudley's answer, we can do:
/^[1-9][0-9]*$/
However, as Spudley notes, the resulting string may be too large to be stored as a 32-bit or 64-bit integer value. The maximum value of a signed 32-bit integer is 2,147,483,647 (10 digits), and the maximum value of a signed 64-bit integer is 9,223,372,036,854,775,807 (19 digits). However, many 10 and 19 digit integers are larger than the maximum 32-bit and 64-bit integers respectively. A simple regex-only solution would be:
/^[1-9][0-9]{0,8}$/
or
/^[1-9][0-9]{0,17}$/
respectively, but these "solutions" unhappily restrict each to 9 and 18 digit integers; hardly a satisfying result. A better solution might be something like:
$expr = '/^[1-9][0-9]*$/';
if (preg_match($expr, $id) && filter_var($id, FILTER_VALIDATE_INT)) {
echo 'ok';
} else {
echo 'nok';
}
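Wrapped into a function, the same combination could be sketched like this. parsePositiveId is a hypothetical name, and returning null for invalid input is my own choice.

```php
<?php
// Hypothetical helper: returns the id as an int, or null when the string
// is not a positive integer without leading zeros, or does not fit in
// the platform's integer type.
function parsePositiveId(string $id): ?int
{
    if (!preg_match('/^[1-9][0-9]*$/', $id)) {
        return null; // wrong shape: empty, signed, leading zero, non-digit
    }
    $value = filter_var($id, FILTER_VALIDATE_INT);
    return $value === false ? null : $value; // false means out of range
}

var_dump(parsePositiveId('42'));
var_dump(parsePositiveId('0042'));
var_dump(parsePositiveId('99999999999999999999'));
```

The regex handles the shape of the string, while filter_var() takes care of the range check that a regex cannot express neatly.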
is_numeric checks more:
Finds whether the given variable is numeric. Numeric strings consist
of optional sign, any number of digits, optional decimal part and
optional exponential part. Thus +0123.45e6 is a valid numeric value.
Hexadecimal notation (0xFF) is allowed too but only without sign,
decimal and exponential part.
You can use this code for number validation:
if (!preg_match("/^[0-9]+$/i", $phone)) {
$errorMSG = 'Invalid Number!';
$error = 1;
}
If you're only checking if it's a number, is_numeric() is much much better here. It's more readable and a bit quicker than regex.
The issue with your regex here is that it won't allow decimal values, so essentially you've just written is_int() in regex. Regular expressions should only be used when there is a non-standard data format in your input; PHP has plenty of built in validation functions, even an email validator without regex.
PHP's is_numeric function allows for floats as well as integers. At the same time, the is_int function is too strict if you want to validate form data (strings only). Therefore, you are usually best off using regular expressions for this.
Strictly speaking, integers are whole numbers positive and negative, and also including zero. Here is a regular expression for this:
/^0$|^[-]?[1-9][0-9]*$/
OR, if you want to allow leading zeros:
/^[-]?[0-9]+$/
Note that this will allow for values such as -0000, which does not cause problems in PHP, however. (MySQL will also cast such values as 0.)
You may also want to confine the length of your integer for considerations of 32/64-bit PHP platform features and/or database compatibility. For instance, to limit the length of your integer to 9 digits (excluding the optional - sign), you could use:
/^0$|^[-]?[1-9][0-9]{0,8}$/
Meanwhile, all the values above will only restrict the values to integer,
so i use
/^[1-9][0-9\.]{0,15}$/
to allow float values too.
You can use filter_var() to check for integers in strings
<?php
$intnum = "1000022";
if (filter_var($intnum, FILTER_VALIDATE_INT) !== false){
echo $intnum.' is an int now';
}else{
echo "$intnum is not an int.";
}
// will output 1000022 is an int now

inconsistency in converting string to integer, when string is hex, prefixed with '0x'

Using PHP 5.3.5. Not sure how this works on other versions.
I'm confused about using strings that hold numbers, e.g., '0x4B0' or '1.2e3'. The way how PHP works with such strings seems inconsistent to me. Is it only me? Or is it a bug? Or undocumented feature? Or am I just missing some magic sentence in docs?
<?php
echo $str = '0x4B0', PHP_EOL;
echo "is_numeric() -> ", var_dump(is_numeric($str)); // bool(true)
echo "*1 -> ", var_dump($str * 1); // int(1200)
echo "(int) -> ", var_dump((int)$str); // int(0)
echo "(float) -> ", var_dump((float)$str); // float(0)
echo PHP_EOL;
echo $str = '1.2e3', PHP_EOL;
echo "is_numeric() -> ", var_dump(is_numeric($str)); // bool(true)
echo "*1 -> ", var_dump($str * 1); // float(1200)
echo "(int) -> ", var_dump((int)$str); // int(1)
echo "(float) -> ", var_dump((float)$str); // float(1200)
echo PHP_EOL;
In both cases, is_numeric() returns true. Also, in both cases, $str * 1 parses string and returns valid number (integer in one case, float in another case).
Casting with (int)$str and (float)$str gives unexpected results.
(int)$str in any case is able to parse only digits, with optional "+" or "-" in front of them.
(float)$str is more advanced and can parse something like ^[+-]?\d*(\.\d*)?(e[+-]?\d*)?, i.e., optional "+" or "-", followed by optional digits, followed by optional decimal point with optional digits, followed by optional exponent which consists of "e" with optional "+" or "-" followed by optional digits. Fails on hex data though.
Related docs:
is_numeric() - states that "Hexadecimal notation (0xFF) is allowed too but only without sign, decimal and exponential part". If function, meant to test if a string holds numeric data, returns true, I expect PHP to be able to convert such string to a number. This seems to work with $str * 1, but not with casting. Why?
Converting to integer - states that "in most cases the cast is not needed, since a value will be automatically converted if an operator, function or control structure requires an integer argument". After such statement, I expect both $s * 10 and (int)$s * 10 expressions to work the same way and to return the same result. Though, as shown in example, those expressions are evaluated differently.
String conversion to numbers - states that "Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent". "Exponent" is "e" or "E", followed by digits, e.g., 1.2e3 is valid numeric data. Sign ("+" or "-") is not mentioned. It does not mention hexadecimal values. This conflicts with the definition of "numeric data" used in is_numeric(). Then, there is the suggestion "For more information on this conversion, see the Unix manual page for strtod(3)", and man strtod describes additional numeric values (including HEX notation). So, after reading this, is hexadecimal data supposed to be valid or invalid numeric data?
So...
Is there (or, rather, should there be) any relation between is_numeric() and the way how PHP treats strings when they are used as numbers?
Why do (int)$s, (float)$s and $s * 1 work differently, i.e,. give completely different results, when $s is 0x4B0 or 1.2e3?
Is there any way to convert a string to a number and keep its value, if it is written as 0x4B0 or as 1.2e3? floatval() does not work with HEX at all, intval() needs $base to be set to 16 to work with HEX, typecasting with (int)$str and (float)$str sometimes works, sometimes does not work, so these are not valid options. I'm also not considering $n *= 1;, as it looks more like data manipulation rather than converting. Self-written functions also are not considered in this case, as I'm looking for native solution.
The direct casts (int)$str and (float)$str don't really work differently at all: They both read as many characters from the string as they can interpret as a number of the respective type.
For "0x4B0", the int-conversion reads "0" (OK), then "x" and stops, because it cannot convert "x" into an integer. Likewise for the float-conversion.
For "1.2e3", the int-conversion reads "1" (OK), then "." and stops. The float-conversion recognises the entire string as valid float notation.
The automatic type recognition for an expression like $str * 1 is simply more flexible than the explicit casts. The explicit casts require the integers and floats to be in the format produced by %i and %f in printf, essentially.
Perhaps you can use intval and floatval rather than explicit casts-to-int for more flexibility, though.
Finally, your question "is hexadecimal data supposed to be valid or invalid numeric data?" is awkward. There is no such thing as "hexadecimal data". Hexadecimal is just a number base. What you can do is take a string like "4B0" and use strtoul etc. to parse it as an integer in any number base between 2 and 36. [Sorry, that was BS. There's no strtoul in PHP. But intval has the equivalent functionality, see above.]
intval uses strtol which recognizes oct/hex prefixes when the base parameter is zero, so
var_dump(intval('0xef')); // int(0)
var_dump(intval('0xff', 0)); // int(255)
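Applied to the strings from the question, a short sketch of the base parameter might look like this:

```php
<?php
// Default base 10: parsing stops at the "x".
$base10 = intval('0x4B0');

// Explicit base 16, and base 0 (prefix decides the base).
$base16 = intval('0x4B0', 16);
$auto   = intval('0x4B0', 0);

// With base 0, a leading "0" selects octal.
$octal  = intval('012', 0);

// Exponent notation is handled by the float conversion.
$float  = (float) '1.2e3';

var_dump($base10, $base16, $auto, $octal, $float);
```

So for hex input intval() with base 16 or 0 keeps the value, while exponent notation needs a float conversion.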
Is there (or, rather, should there be) any relation between is_numeric() and the way how PHP treats strings when they are used as numbers?
There is no datatype called numeric in PHP, the is_numeric() function is more of a test for something that can be interpreted as number by PHP.
As far as such number interpreting is concerned, adding a + in front of the value will actually make PHP convert it into a number:
$int = +'0x4B0';
$float = +'1.2e3';
You find this explained in the manual for string, look for the section String conversion to numbers.
As it's triggered by an operator, I don't see any need why there should be a function in PHP that does the same. That would be superfluous.
Internally, PHP uses a function called zendi_convert_scalar_to_number for the add operator (presumably +), which makes use of is_numeric_string to obtain the number.
The exact same function is called internally by is_numeric() when used with strings.
So to trigger the native conversion function, I would just use the + operator. This will ensure that you'll get back the numeric pseudo-type (int or float).
Ref: /Zend/zend_operators.c; /ext/standard/type.c
