A random-string function produces many duplicates - php

I am using the function below to generate random strings for filenames. I have no problems on Unix machines, but I get many duplicates on Windows. I just ran a test and generated 100,000 strings, with the result that each string occurs 227 (??) times. Could anyone explain this? Even with rand() I got duplicates, but srand() seems to work.
function generateRandomString($length = 6)
{
    $rows = array();
    array_push($rows, range('A', 'Z'));
    array_push($rows, range('a', 'z'));
    array_push($rows, range(0, 9));
    $signs = array();
    foreach ($rows as $row) {
        $signs = array_merge($signs, $row);
    }
    shuffle($signs);
    shuffle($signs);
    $password = '';
    for ($i = 0; $i < $length; $i++) {
        $password .= $signs[array_rand($signs, 1)];
    }
    return $password;
}

Well, for one, Windows file names aren't case sensitive, so you should only use either lowercase or uppercase characters in your source arrays, not both.
The rest is basically up to the actual implementation of the PHP processor you're using, rather than Windows vs. *nix. Perhaps it relies on timer resolution and keeps no internal state - that would be bad, given that the usual timer resolution is about 15 ms, which is quite coarse. In any case, it's most likely an issue with the PHP implementation, not Windows itself, as this code in C# nicely illustrates:
Random rnd = new Random();
byte[] buf = new byte[10 * 100000];
rnd.NextBytes(buf);
buf
    .Select((val, idx) => new { Index = idx, Value = val })
    .ToLookup(i => i.Index / 10)
    .Select(i => string.Join(string.Empty,
        i.Select(j => j.Value.ToString("X2")).ToArray()))
    .GroupBy(i => i)
    .Where(i => i.Count() > 1)
    .Dump();
This creates 100 000 random strings of 20 characters (0 to F) and looks for duplicates. In a few hundred tests, I haven't had a single collision. So if you've got trouble with the random generator in PHP, go look at the particular implementation of PHP you're using, rather than blaming Windows :)
It's interesting how your code does a few passes of the randomness (shuffling the $signs array twice and then picking randomly from that?). Doing this most likely reduces the randomness of the data rather than increasing it. Seems just like a stupid attempt at hiding the password generation mechanism behind layers of obscurity (and then open sourcing it... eh :D).
As for passwords (your code seems to indicate that's what it was used for), you should probably use the crypto-secure randoms anyway - they're far less predictable, more random and less prone to bias.
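A minimal sketch of the crypto-secure approach, assuming PHP 7 or later where random_int() is available. The function name secureRandomString and the lowercase-plus-digits alphabet are my own choices for illustration (lowercase only, because of the case-insensitive file names mentioned above), not something from the original code:

```php
<?php
// Sketch: build a filename-safe random string from a CSPRNG (random_int, PHP 7+).
// secureRandomString is a hypothetical name used only for this example.
function secureRandomString(int $length = 6): string
{
    // Lowercase letters and digits only, so names stay unique on
    // case-insensitive file systems.
    $alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $max = strlen($alphabet) - 1;
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int draws uniformly from [0, $max] and needs no manual seeding.
        $result .= $alphabet[random_int(0, $max)];
    }
    return $result;
}

echo secureRandomString(12), "\n";
```

Picking each character directly makes the shuffle() calls unnecessary. Note that six characters from a 36-symbol alphabet can still collide by chance once you generate a large number of names, so a longer string is the safer choice for filenames.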

Related

How to decrease runtime for generating permutations of a string?

I have written a function that takes in an MD5 hash value and finds its input/original value by trying all possible permutations of a set of characters. As per BIT_CHEETAH's answer on a SO question:
... you cannot decrypt MD5 without attempting something like brute force hacking which is extremely resource intensive, not practical, and unethical.
(Source: encrypt and decrypt md5)
I'm well aware of this; however, I am using this scenario to implement a string permutation function. I would also like to stick to the recursive methodology as opposed to others. The best summary of how to do this is probably Mark Byers' post:
- Try each of the letters in turn as the first letter and then find all
the permutations of the remaining letters using a recursive call.
- The base case is when the input is an empty string the only permutation is the empty string.
(Generating all permutations of a given string)
Anyway, so I implemented this and got the following:
function matchMD5($possibleChars, $md5, $concat, $length) {
    for($i = 0; $i < strlen($possibleChars); $i++) {
        $ch = $possibleChars[$i];
        $concatSubstr = $concat.$ch;
        if(strlen($concatSubstr) != $length) {
            matchMD5($possibleChars, $md5, $concatSubstr, $length);
        }
        else if(strlen($concatSubstr) == $length) {
            $tryHash = hash('md5', $concatSubstr);
            if ($tryHash == $md5) {
                echo "Match! $concatSubstr ";
                return $concatSubstr;
            }
        }
    }
}
Works 100%, however when I pass in a four-character input, my server runs for 10.7 seconds to find a match that lies approximately 1/10th of the way through all possible permutations. The valid characters that the function permutes, called $possibleChars, contain all alphanumeric characters plus a few selected punctuation marks:
0123456789.,;:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Question: Can the above code be written to run faster somehow?
When doing brute-force, you have to run through all the possibilities; there is no way of cutting a corner there. So you are left with profiling your code to find out what the application spends the most time doing and then trying to optimize that.
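One thing a profile would likely show is that the recursive calls in the question discard their return value, so the search keeps running after a match has been found. Below is a hedged sketch of a variant that propagates the match upward and stops. The name findMD5Match and its parameter order are hypothetical, and it assumes $length is at least 1 and $md5 is a lowercase hex digest:

```php
<?php
// Sketch: recursive brute force that stops as soon as a match is found.
// findMD5Match is a hypothetical name; it returns the matching string or null.
function findMD5Match(string $chars, string $md5, int $length, string $prefix = ''): ?string
{
    $n = strlen($chars);                              // computed once, not per iteration
    $isLast = (strlen($prefix) + 1 === $length);      // are we filling the final position?
    for ($i = 0; $i < $n; $i++) {
        $candidate = $prefix . $chars[$i];
        if ($isLast) {
            if (md5($candidate) === $md5) {
                return $candidate;
            }
        } else {
            $found = findMD5Match($chars, $md5, $length, $candidate);
            if ($found !== null) {
                return $found;                        // propagate the match, end the search
            }
        }
    }
    return null;                                      // nothing matched under this prefix
}

var_dump(findMD5Match('abc123', md5('c1a'), 3));
```

This does not change the worst case (every candidate still has to be hashed if there is no match), but it avoids walking the remaining candidates after a hit.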

Need an array-like structure in PHP with minimal memory usage

In my PHP script I need to create an array of >600k integers. Unfortunately my web server's memory_limit is set to 32M, so when initializing the array the script aborts with the message
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes) in /home/www/myaccount/html/mem_test.php on line 8
I am aware that PHP does not store the array values as plain integers, but rather as zvals, which are much bigger than a plain integer (8 bytes on my 64-bit system). I wrote a small script to estimate how much memory each array entry uses, and it turns out to be almost exactly 128 bytes. 128!!! I'd need >73M just to store the array. Unfortunately the web server is not under my control, so I cannot increase the memory_limit.
My question is, is there any possibility in PHP to create an array-like structure that uses less memory. I don't need this structure to be associative (plain index-access is sufficient). It also does not need to have dynamic resizing - I know exactly how big the array will be. Also, all elements would be of the same type. Just like a good old C-array.
Edit:
So deceze's solution works out-of-the-box with 32-bit integers. But even if you're on a 64-bit system, pack() does not seem to support 64-bit integers. In order to use 64-bit integers in my array I applied some bit-manipulation. Perhaps the below snippets will be of help for someone:
function push_back(&$storage, $value)
{
    // split the 64-bit value into two 32-bit chunks, then pass these to pack().
    $storage .= pack('ll', ($value>>32), $value);
}
function get(&$storage, $idx)
{
    // read two 32-bit chunks from $storage and glue them back together.
    // The low chunk is read as unsigned ('L'); read as signed, a set bit 31
    // would sign-extend and overwrite the high chunk.
    return (current(unpack('l', substr($storage, $idx * 8, 4)))<<32 |
            current(unpack('L', substr($storage, $idx * 8+4, 4))));
}
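For what it's worth, newer PHP versions make the splitting unnecessary: pack() and unpack() accept the 'q' format code (signed 64-bit, machine byte order) from PHP 5.6.3 on, provided the PHP build itself is 64-bit. A small sketch under that assumption; pushBack64 and get64 are hypothetical names chosen for this example:

```php
<?php
// Sketch: store 64-bit integers in a binary string using the 'q' format code.
// Assumes a 64-bit build of PHP 5.6.3 or later.
function pushBack64(string &$storage, int $value): void
{
    $storage .= pack('q', $value);          // always 8 bytes per entry
}

function get64(string $storage, int $idx): int
{
    // unpack() returns a 1-based array for unnamed format codes
    return unpack('q', substr($storage, $idx * 8, 8))[1];
}

$storage = '';
pushBack64($storage, 12345678900);
pushBack64($storage, -1);

echo get64($storage, 0), "\n";
echo get64($storage, 1), "\n";
```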
The most memory efficient you'll get is probably by storing everything in a string, packed in binary, and use manual indexing to it.
$storage = '';
$storage .= pack('l', 42);
// ...
// get 10th entry
$int = current(unpack('l', substr($storage, 9 * 4, 4)));
This can be feasible if the "array" initialisation can be done in one fell swoop and you're just reading from the structure. If you need a lot of appending to the string, this becomes extremely inefficient. Even this can be done using a resource handle though:
$storage = fopen('php://memory', 'r+');
fwrite($storage, pack('l', 42));
...
This is very efficient. You can then read this buffer back into a variable and use it as string, or you can continue to work with the resource and fseek.
A PHP Judy Array will use significantly less memory than either a standard PHP array or an SplFixedArray.
I quote "An array with 1 million entries using regular PHP array data structure takes 200MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation."
You could use an object if possible. These often use less memory than arrays.
SplFixedArray is also a good option.
But it really depends on what you need to implement. If you need a function that returns an array and you are using PHP 5.5, you could use a generator (yield) to stream the values back.
You can try an SplFixedArray; it's faster and takes less memory (the doc comment says ~30% less). Test here and here.
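A rough sketch of what the SplFixedArray route looks like for the 600k case. The size is taken from the question; the values stored are placeholders, and whether this fits under a 32M limit depends on the PHP version, so measure with memory_get_usage() on the target host rather than trusting any quoted figure:

```php
<?php
// Sketch: a fixed-size, integer-indexed container instead of a regular array.
$size = 600000;                       // element count from the question
$fixed = new SplFixedArray($size);    // size is set once, up front

for ($i = 0; $i < $size; $i++) {
    $fixed[$i] = $i;                  // placeholder values for illustration
}

echo $fixed->getSize(), "\n";
echo $fixed[599999], "\n";
echo memory_get_usage(), " bytes in use\n";
```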
Use a string - that's what I'd do. Store it in a string on fixed offsets (16 or 20 digits should do it I guess?) and use substr to get the one needed. Blazing fast write / read, super easy, and 600.000 integers will only take ~12M to store.
base_convert() - if you need something more compact but with minimum effort, convert your integers to base-36 instead of base-10; in this case, a 14-digit number would be stored in 9 alphanumeric characters. You'll need to make 2 pieces of 64-bit ints, but I'm sure that's not a problem. (I'd split them to 9-digit chunks where conversion gives you a 6-char version.)
pack()/unpack() - binary packing is the same thing with a bit more efficiency. Use it if nothing else works; split your numbers to make them fit to two 32-bit pieces.
600K is a lot of elements. If you are open to alternative methods, I personally would use a database for that. Then use standard sql/nosql select syntax to pull things out. Perhaps memcache or redis if you have an easy host for that, such as garantiadata.com. Maybe APC.
Depending on how you generate the integers, you could potentially use PHP's generators, assuming you are traversing the array and doing something with individual values.
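A small sketch of the generator idea, assuming PHP 7 for the type declarations (generators themselves exist since 5.5). The function integerSequence and the "$i * 2" computation are stand-ins for however the real integers are produced:

```php
<?php
// Sketch: yield values one at a time instead of materialising the whole array.
// integerSequence and its computation are hypothetical placeholders.
function integerSequence(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield $i * 2;                 // only one value exists in memory at a time
    }
}

$sum = 0;
foreach (integerSequence(600000) as $value) {
    $sum += $value;                   // process each value as it arrives
}
echo $sum, "\n";
```

This only helps when the values are consumed sequentially; a generator gives no random access by index.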
I took the answer by @deceze and wrapped it in a class that can handle 32-bit integers. It is append-only, but you can still use it as a simple, memory-optimized PHP Array, Queue, or Heap. AppendItem and ItemAt are both O(1), and it has no memory overhead. I added currentPosition/currentSize to avoid unnecessary fseek function calls. If you need to cap memory usage and switch to a temporary file automatically, use php://temp instead.
class MemoryOptimizedArray
{
    private $_storage;
    private $_currentPosition;
    private $_currentSize;
    const BYTES_PER_ENTRY = 4;
    function __construct()
    {
        // 'w+' opens the in-memory stream for reading and writing
        $this->_storage = fopen('php://memory', 'w+');
        $this->_currentPosition = 0;
        $this->_currentSize = 0;
    }
    function __destruct()
    {
        fclose($this->_storage);
    }
    function AppendItem($value)
    {
        if($this->_currentPosition != $this->_currentSize)
        {
            // fseek takes (handle, offset, whence): move to the end of the stream
            fseek($this->_storage, 0, SEEK_END);
        }
        fwrite($this->_storage, pack('l', $value));
        $this->_currentSize += self::BYTES_PER_ENTRY;
        $this->_currentPosition = $this->_currentSize;
    }
    function ItemAt($index)
    {
        $itemPosition = $index * self::BYTES_PER_ENTRY;
        if($this->_currentPosition != $itemPosition)
        {
            fseek($this->_storage, $itemPosition);
        }
        $binaryData = fread($this->_storage, self::BYTES_PER_ENTRY);
        $this->_currentPosition = $itemPosition + self::BYTES_PER_ENTRY;
        $unpackedElements = unpack('l', $binaryData);
        return $unpackedElements[1];
    }
}
$arr = new MemoryOptimizedArray();
for($i = 0; $i < 3; $i++)
{
    $v = rand(-2000000000,2000000000);
    $arr->AppendItem($v);
    print("added $v\n");
}
for($i = 0; $i < 3; $i++)
{
    print($arr->ItemAt($i)."\n");
}
for($i = 2; $i >=0; $i--)
{
    print($arr->ItemAt($i)."\n");
}

Calculate a % b for very large numbers - php

I have to calculate a % b for two very large numbers.
I cannot use the default modulo operator, because a and b are larger than PHP_INT_MAX, so I have to handle them as "strings".
I know that special math libraries like BC or GMP exist, but I can't use them, because my app will probably be hosted on a shared host where these are not enabled.
I have to write a function in PHP that will do the job. The function will take two strings (the two numbers) as parameters and has to return a % b, but I don't know how to start.
How to solve this problem?
Since PHP 4.0.4, libbcmath is bundled with PHP. You don't need any external libraries for this extension. These functions are only available if PHP was configured with --enable-bcmath .
The Windows version of PHP has built-in support for this extension. You do not need to load any additional extensions in order to use these functions. You should be able to enable these functions yourself, without any action on the part of the hosting company.
I thought of this solution:
$n represents a huge number, $m the (not so huge) modulus.
function getModulus($n, $m)
{
    $a = str_split($n);
    $r = 0;
    foreach($a as $v)
    {
        $r = ((($r * 10) + intval($v)) % $m);
    }
    return $r;
}
Hope it helps someone,
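The function above needs $m to fit in a native integer. If both operands are too large, the same digit-by-digit idea can be carried out entirely on strings. The following is only a sketch, and a slow one: decCompare, decSubtract and decMod are hypothetical helper names, and it assumes both inputs are non-negative decimal strings without leading zeros, with $b greater than zero:

```php
<?php
// Compare two non-negative decimal strings (no leading zeros): returns -1, 0 or 1.
function decCompare(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        return strlen($a) < strlen($b) ? -1 : 1;
    }
    return strcmp($a, $b) <=> 0;      // same length: lexicographic order == numeric order
}

// Subtract $b from $a digit by digit; requires $a >= $b.
function decSubtract(string $a, string $b): string
{
    $b = str_pad($b, strlen($a), '0', STR_PAD_LEFT);
    $result = '';
    $borrow = 0;
    for ($i = strlen($a) - 1; $i >= 0; $i--) {
        $digit = (int)$a[$i] - (int)$b[$i] - $borrow;
        $borrow = $digit < 0 ? 1 : 0;
        $result = (string)($digit + 10 * $borrow) . $result;
    }
    $result = ltrim($result, '0');
    return $result === '' ? '0' : $result;
}

// Schoolbook long division that keeps only the remainder.
function decMod(string $a, string $b): string
{
    $remainder = '0';
    foreach (str_split($a) as $digit) {
        // remainder = remainder * 10 + digit, done as string concatenation
        $remainder = ltrim($remainder . $digit, '0');
        if ($remainder === '') {
            $remainder = '0';
        }
        // at most 9 subtractions are needed per digit
        while (decCompare($remainder, $b) >= 0) {
            $remainder = decSubtract($remainder, $b);
        }
    }
    return $remainder;
}

echo decMod('100000000000000000000000000000', '99999999999999999999'), "\n";
```

If BCMath turns out to be available after all, bcmod() does the same job far faster.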
Depending on your processor, native integers go up to 2^63-1 on a 64-bit machine and 2^31-1 on a 32-bit machine; that bounds how many decimal digits your machine can compute with directly. Above that you will get wrong values.
You can do the same by splitting your number into chunks.
Example:
My number is 18 digits long, so I split it into chunks of 9, 7 and 2 digits (9 + 7 + 2 = 18).
Calculate the mod of the first chunk.
Prepend the mod of the first chunk to the front of the second chunk.
Example: if the result of the first mod is 23, the next value is 23XXXXXXX. Find the mod of that 23XXXXXXX and prepend it to the last chunk. Example: if that mod is 15, the last value is 15XX.
$string = '123456789123456789'; // 18 decimal long
$chunk[0] = '123456789'; // 9 decimal long
$chunk[1] = '1234567'; // 7 decimal long
$chunk[2] = '89'; // 2 decimal long
$modulus = null;
foreach($chunk as $value){
    $modulus = (int)($modulus.$value) % 45;
}
The result $modulus above should be the same as
$modulus = $string % 45
Better late than never.
Hope this will help. Anyone with a similar approach?
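The chunking can also be done automatically with str_split instead of by hand. A hedged sketch follows; chunkedMod is a hypothetical name, and the default chunk size of 7 assumes the modulus has at most two digits, so that remainder plus chunk stays within nine digits and fits even a 32-bit integer:

```php
<?php
// Sketch: modulus of a decimal string by a small integer, chunk by chunk.
// chunkedMod is a hypothetical helper; pick $chunkSize so that
// (digits of $m) + $chunkSize does not exceed what a native int can hold.
function chunkedMod(string $number, int $m, int $chunkSize = 7): int
{
    $remainder = 0;
    foreach (str_split($number, $chunkSize) as $chunk) {
        // Writing the previous remainder in front of the chunk is the same as
        // remainder * 10^(chunk length) + chunk.
        $remainder = (int)($remainder . $chunk) % $m;
    }
    return $remainder;
}

echo chunkedMod('123456789123456789', 45), "\n";
```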
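You can use fmod for values larger than PHP_INT_MAX, but note that it works on floats, so the result is only exact as long as both operands fit in a float's 53-bit mantissa.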
Read more about it here
http://php.net/manual/en/function.fmod.php

php5 pack is broken on x86_64 env

pack('H*', dechex(12345678900)) /* on 32bit */
!= pack('H*', dechex(12345678900)) /* on 64bit */
why ?
I don't know how to fix it, but I think I know why this is happening. No bug here - straight out of the manual http://php.net/manual/en/function.dechex.php
The largest number that can be converted is 4294967295 in decimal resulting to "ffffffff"
I do not know exactly what is happening "inside" PHP, but you are probably causing a 32-bit unsigned integer to overflow (12,345,678,900 > 4,294,967,295). Since on 64-bit this limit should be 18,446,744,073,709,551,615, dechex is returning "correct" values (the 32 vs 64 bit difference doesn't seem to be documented, and I might be wrong since I don't have a 64-bit system for testing).
//Edit:
As a last resort you could use the GMP extension to write your own dechex function for 32-bit systems, but that is going to produce lots and lots of overhead. It is probably going to be one of the slowest implementations known to modern programming.
//Edit2:
I wrote a function using BCMath; I'm on Windows at the moment and was struggling to find the correct DLL for GMP.
function dechex32($i) {
    //Cast string
    $i = (string)$i;
    //Initialize result string
    $r = NULL;
    //Map hex values 0-9, a-f to array keys
    $hex = array_merge(range(0, 9), range('a', 'f'));
    //While input is larger than 0
    while(bccomp($i, '0') > 0) {
        //Modulo 16 and append hex char to result
        $r.= $hex[$mod = bcmod($i, '16')];
        //i = (i - mod) / 16
        $i = bcdiv(bcsub($i, $mod), '16');
    }
    //Reverse result and return
    return strrev($r);
}
var_dump(dechex32(12345678900));
/*string(9) "2dfdc1c34"*/
I didn't test it thoroughly, but it seems to work. Use it as a last resort - rough benchmarking with 100,000 iterations showed that it's ~40 times slower than the native implementation.

prime generator optimization

I'm starting out my expedition into Project Euler. And as many others I've figured I need to make a prime number generator. Problem is: PHP doesn't like big numbers. If I use the standard Sieve of Eratosthenes function, and set the limit to 2 million, it will crash. It doesn't like creating arrays of that size. Understandable.
So now I'm trying to optimize it. One way, I found, is that instead of creating an array with 2 million variables, I only need 1 million (only odd numbers can be prime numbers). But now it's crashing because it exceeds the maximum execution time...
function getPrimes($limit) {
    $count = 0;
    for ($i = 3; $i < $limit; $i += 2) {
        $primes[$count++] = $i;
    }
    for ($n = 3; $n < $limit; $n += 2) {
        //array will be half the size of $limit
        for ($i = 1; $i < $limit/2; $i++) {
            if ($primes[$i] % $n === 0 && $primes[$i] !== $n) {
                $primes[$i] = 0;
            }
        }
    }
    return $primes;
}
The function works, but as I said, it's a bit slow...any suggestions?
One thing I've found to make it a bit faster is to switch the loop around.
foreach ($primes as $value) {
    //$limitSq is the sqrt of the limit, as that is as high as I have to go
    for ($n = 3; $n <= $limitSq; $n += 2) {
        if ($value !== $n && $value % $n === 0) {
            $primes[$count] = 0;
            break; //breaking the inner loop
        }
    }
    $count++;
}
And in addition to setting the time and memory limits (thanks Greg), I've finally managed to get an answer. Phew.
Without knowing much about the algorithm:
You're recalculating $limit/2 each time around the $i loop
Your if statement will be evaluated in order, so think about (or test) whether it would be faster to test $primes[$i] !== $n first.
Side note, you can use set_time_limit() to give it longer to run and give it more memory using
ini_set('memory_limit', '128M');
Assuming your setup allows this, of course - on a shared host you may be restricted.
From Algorithmist's proposed solution
This is a modification of the standard Sieve of Eratosthenes. It would be highly inefficient, using up far too much memory and time, to run the standard sieve all the way up to n. However, no composite number less than or equal to n will have a factor greater than sqrt(n), so we only need to know all primes up to this limit, which is no greater than 31622 (square root of 10^9). This is accomplished with a sieve. Then, for each query, we sieve through only the range given, using our pre-computed table of primes to eliminate composite numbers.
This problem has also appeared on UVA's and Sphere's online judges. Here's how it's enunciated on Sphere.
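The approach described in that quote could look roughly like this in PHP. It is only a sketch: primesInRange is a hypothetical name, it assumes 1 <= $low <= $high, PHP 7 for intdiv(), and it uses one byte per flag in a string rather than an array to keep memory down:

```php
<?php
// Sketch of a segmented sieve: find the primes in [$low, $high].
// primesInRange is a hypothetical helper name.
function primesInRange(int $low, int $high): array
{
    // Step 1: ordinary sieve for the base primes up to sqrt($high).
    $root = (int) floor(sqrt($high));
    $flags = str_repeat("\0", $root + 1);
    $basePrimes = [];
    for ($n = 2; $n <= $root; $n++) {
        if ($flags[$n] === "\0") {
            $basePrimes[] = $n;
            for ($m = $n * $n; $m <= $root; $m += $n) {
                $flags[$m] = "\1";
            }
        }
    }

    // Step 2: cross out multiples of the base primes inside the range only.
    $size = $high - $low + 1;
    $segment = str_repeat("\0", $size);
    foreach ($basePrimes as $p) {
        // first multiple of $p that is >= $low, but never $p itself
        $start = max($p * $p, intdiv($low + $p - 1, $p) * $p);
        for ($m = $start; $m <= $high; $m += $p) {
            $segment[$m - $low] = "\1";
        }
    }

    // Step 3: whatever is still unmarked (and at least 2) is prime.
    $primes = [];
    for ($i = 0; $i < $size; $i++) {
        if ($low + $i >= 2 && $segment[$i] === "\0") {
            $primes[] = $low + $i;
        }
    }
    return $primes;
}

print_r(primesInRange(10, 30));
```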
You can use a bit field to store your sieve. That is, it's roughly identical to an array of booleans, except you pack your booleans into a large integer. For instance if you had 8-bit integers you would store 8 bits (booleans) per integer which would further reduce your space requirements.
Additionally, using a bit field allows the possibility of using bit masks to perform your sieve operation. For example, if your sieve kept all numbers (not just odd ones), you could construct a bit mask of b01010101 which you could then AND against every element in your array. For threes you could use three integers as the mask: b00100100 b10010010 b01001001.
Finally, you do not need to check numbers that are lower than $n; in fact, you don't need to check any numbers less than $n*$n, because every smaller multiple of $n has already been crossed out by a smaller prime.
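To make the last point concrete, here is a hedged sketch of a compact sieve in PHP. Note that it is not a true bit field: it stores one byte per number in a string, which is simpler to write in PHP and already far smaller than an array of the same length. The name sievePrimes is made up for this example:

```php
<?php
// Sketch: Sieve of Eratosthenes with one byte per flag, stored in a string.
// sievePrimes is a hypothetical name; the crossing-out starts at $n * $n.
function sievePrimes(int $limit): array
{
    if ($limit < 2) {
        return [];
    }
    $isComposite = str_repeat("\0", $limit + 1);   // "\0" means "still possibly prime"
    $primes = [];
    for ($n = 2; $n <= $limit; $n++) {
        if ($isComposite[$n] === "\0") {
            $primes[] = $n;
            // smaller multiples of $n were already marked by smaller primes
            for ($multiple = $n * $n; $multiple <= $limit; $multiple += $n) {
                $isComposite[$multiple] = "\1";
            }
        }
    }
    return $primes;
}

echo count(sievePrimes(100)), "\n";
```

Packing eight flags into each byte would cut memory by another factor of eight, at the cost of some ord()/chr() and shifting work per access.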
Once you know the number is not a prime, I would exit the inner loop. I don't know PHP, but you need a statement like break in C or last in Perl.
If that is not available, I would set a flag and use it as a condition for continuing the inner loop. This should speed up your execution, as you are no longer checking $limit/2 items once the number is known not to be a prime.
if you want speed, don’t use PHP on this one :P
no, seriously, i really like PHP and it’s a cool language, but it’s not suited at all for such algorithms
