Create indexable non-repeating combinations with fixed length - php

Based on this question
Ordered Fixed Length Combination of a String
I created a PHP algorithm that creates combinations of characters of a fixed length (basically a rewrite of the Java answer):
private function getCombination($length, $input) {
    $result = array();
    if ($length == 0) {
        return $result;
    }
    $first = substr($input, 0, $length);
    $result[] = $first;
    if (strlen($input) == $length) {
        return $result;
    }
    $tails = $this->getCombination($length - 1, substr($input, 1));
    foreach ($tails as $tail) {
        $tmp = substr($input, 0, 1) . $tail;
        if (!in_array($tmp, $result)) {
            $result[] = $tmp;
        }
    }
    return array_merge($result, $this->getCombination($length, substr($input, 1)));
}
For another question, Create fixed length non-repeating permutation of larger set, I was given a (brilliant) algorithm that makes permutations indexable, effectively making them addressable by providing a "key" that always produces the exact same permutation when given the same set of characters and the same length.
Well, now I basically need the same but for combinations, in contrast to permutations as in my other question.
Can the algorithm above be modified in the same way? Meaning to create a function like
public function getCombinationByIndex($length, $index);
That would return one of the thousands of possible combinations produced by the algorithm above, without generating them all beforehand?

I have written a class in C# to handle common functions for working with the binomial coefficient, which is the type of problem that yours appears to fall under, assuming that you are working with combinations instead of permutations. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it is also faster than older iterative solutions.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coefficient.
It should be pretty straightforward to port this class over to PHP. You probably will not have to port the generic part of the class to accomplish your goals. Depending on the number of combinations you are working with, you might need to use a bigger word size than 4-byte ints.
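As a rough illustration of the index-to-combination conversion described above, here is a minimal PHP sketch. It is not a port of the C# class. It assumes the input characters are distinct, that combinations are numbered in lexicographic order of positions in the input string (which may differ from the order produced by the recursive function in the question), and that the counts fit in PHP's native integers. The names binomial and combinationByIndex are made up for this example.

```php
<?php
// Hypothetical helper: n choose k using integer arithmetic only.
function binomial(int $n, int $k): int {
    if ($k < 0 || $k > $n) {
        return 0;
    }
    $k = min($k, $n - $k);
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // After step i, $result equals C(n - k + i, i), so the division is exact.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

// Hypothetical function: returns the combination at position $index
// (0-based, lexicographic by position) without generating the others.
function combinationByIndex(string $chars, int $length, int $index): string {
    $n = strlen($chars);
    if ($index < 0 || $index >= binomial($n, $length)) {
        throw new OutOfRangeException('Index outside the range of combinations');
    }
    $result = '';
    $start = 0; // first input position that may still be picked
    for ($remaining = $length; $remaining > 0; $remaining--) {
        for ($pos = $start; ; $pos++) {
            // Number of combinations whose next character is $chars[$pos].
            $count = binomial($n - $pos - 1, $remaining - 1);
            if ($index < $count) {
                $result .= $chars[$pos];
                $start = $pos + 1;
                break;
            }
            $index -= $count; // skip that whole group
        }
    }
    return $result;
}
```

With the input 'abcde' and length 3 there are 10 combinations, and combinationByIndex('abcde', 3, 4) picks 'ace'.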

Related

How to decrease runtime for generating permutations of a string?

I have written a function that takes in an MD5 hash value and finds its input/original value by permuting all possible combinations of a string. As per BIT_CHEETAH's answer on a SO question:
... you cannot decrypt MD5 without attempting something like brute force hacking which is extremely resource intensive, not practical, and unethical.
(Source: encrypt and decrypt md5)
I'm well aware of this; however, I am using this scenario to implement a string permutation function. I would also like to stick to the recursive approach as opposed to others. The best summary of this approach is probably Mark Byers' post:
- Try each of the letters in turn as the first letter and then find all
the permutations of the remaining letters using a recursive call.
- The base case is when the input is an empty string the only permutation is the empty string.
(Generating all permutations of a given string)
Anyway, so I implemented this and got the following:
function matchMD5($possibleChars, $md5, $concat, $length) {
    for($i = 0; $i < strlen($possibleChars); $i++) {
        $ch = $possibleChars[$i];
        $concatSubstr = $concat.$ch;
        if(strlen($concatSubstr) != $length) {
            matchMD5($possibleChars, $md5, $concatSubstr, $length);
        }
        else if(strlen($concatSubstr) == $length) {
            $tryHash = hash('md5', $concatSubstr);
            if ($tryHash == $md5) {
                echo "Match! $concatSubstr ";
                return $concatSubstr;
            }
        }
    }
}
It works 100%; however, when I ask for a four-character string, my server runs for 10.7 seconds to find a match that lies approximately 1/10th of the way through all possible permutations. The set of valid characters that the function permutes, called $possibleChars, contains all alphanumeric characters plus a few selected punctuation marks:
0123456789.,;:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Question: Can the above code be written to run faster somehow?
When doing brute force, you have to run through all the possibilities; there is no way of cutting corners there. So you are left with profiling your code to find out what the application spends the most time doing and then trying to optimize that.
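If profiling points at the recursion and the repeated strlen() calls rather than at the hashing itself, one option is an iterative search that advances the candidate like an odometer. The following is only a sketch of that idea, not a measured improvement. The name findMd5Preimage is hypothetical, and the code assumes $chars is non-empty.

```php
<?php
// Hypothetical iterative replacement for the recursive search.
// Returns the matching string, or null if no candidate of that length matches.
function findMd5Preimage(string $chars, string $md5, int $length): ?string {
    $n = strlen($chars);                 // computed once, not per iteration
    $idx = array_fill(0, $length, 0);    // one "digit" per position
    $candidate = str_repeat($chars[0], $length);
    while (true) {
        if (md5($candidate) === $md5) {
            return $candidate;
        }
        // Advance the rightmost digit, carrying to the left on overflow.
        $pos = $length - 1;
        while ($pos >= 0) {
            $idx[$pos]++;
            if ($idx[$pos] < $n) {
                $candidate[$pos] = $chars[$idx[$pos]];
                break;
            }
            $idx[$pos] = 0;
            $candidate[$pos] = $chars[0];
            $pos--;
        }
        if ($pos < 0) {
            return null; // every candidate has been tried
        }
    }
}
```

The number of candidates is still strlen($chars) raised to the power $length, so any gain comes only from less work per candidate.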

PHP SourceCode Random

The function for rand() should be something like (SEED * A + C) mod M.
How can I find the values of A, C, and M? And if I find those values, can I predict the next number in the sequence?
I know that I can find the values of these variables in the PHP source code. But after looking around I really cannot find them...
Does anybody know what file it would be in? Or who else I could contact? (I've tried emailing internals@lists.php.net but haven't got a response.)
Also I'm doing all this in PHP versions prior to 7, where rand() and mt_rand() became synonymous.
EDIT: I have seen Is it possible to predict rand(0,10) in PHP? but those answers aren't about the constant values in PHP's rand() value by themselves.
Thank you!
I believe that the old school rand() function used a linear congruential generator.
This generator is system dependent. One algorithm employed by glibc was:
next = next * 1103515245 + 12345;
return next & 0x7fffffff;
so there you have your constants. The state, of course, is the initial value of 'next', which is zero unless set differently by srand().
There are ways of attacking a linear congruential generator; one possibility - the slowest, but the easiest to explain - is to brute force it. Say that you have four consecutive values: a0, a1, a2, a3 from your rand() implementation. You can check all values of seed that would yield that same sequence.
Note that if your a0 value is produced by, say, rand() % 7172, then your initial seed must obey the rule that "seed % 7172 === a0". This immediately reduces the space you need to brute force, speeding up operations proportionately. Also, you don't need to check all four numbers.
This would be the efficient equivalent of running (in PHP)
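// Returns the first seed that reproduces the observed values, or false if none does.
function findSeed($a0, $a1, $a2, $a3) {
    for ($seed = 0; $seed < MAX_SEED; $seed++) {
        srand($seed);
        // Replace rand() with the expression that produced the observed values, e.g. rand() % 7172.
        if ($a0 !== rand()) continue;
        if ($a1 !== rand()) continue;
        if ($a2 !== rand()) continue;
        if ($a3 !== rand()) continue;
        return $seed;
    }
    return false;
}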
Experiments
By checking against a trivial reference C program
#include <stdio.h>
#include <stdlib.h>
int main() {
    srand(1);
    printf("%d\n", rand());
    return 0;
}
I determined that PHP and C do indeed share the same underlying function (I tabulated different values for srand()).
I also found out that srand(0) and srand(1) yield the same result, which isn't consistent with my linear model.
And that's because glibc rand() is not so trivial a linear congruential generator. More info here. Actually it is quoted in a SO answer and the code I had was for the old, TYPE_0 generator.
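For reference, the simple recurrence quoted earlier (the old TYPE_0 generator) and the brute-force seed search can be sketched in pure PHP without calling rand() at all. This assumes 64-bit PHP integers so the multiplication does not overflow, and it does not model the more elaborate generator glibc uses by default. The function names and the optional $modulus parameter (standing in for something like rand() % 7172) are made up for this example.

```php
<?php
// One step of the TYPE_0 recurrence: the new state is also the output.
function lcgNext(int $state): int {
    return ($state * 1103515245 + 12345) & 0x7fffffff;
}

// Hypothetical helper: the first $count outputs for a given seed,
// optionally reduced modulo $modulus (0 means "no reduction").
function lcgOutputs(int $seed, int $count, int $modulus = 0): array {
    $out = [];
    $state = $seed;
    for ($i = 0; $i < $count; $i++) {
        $state = lcgNext($state);
        $out[] = $modulus > 0 ? $state % $modulus : $state;
    }
    return $out;
}

// Hypothetical brute force: every seed below $maxSeed that reproduces $observed.
function findSeeds(array $observed, int $maxSeed, int $modulus = 0): array {
    $matches = [];
    for ($seed = 0; $seed < $maxSeed; $seed++) {
        $state = $seed;
        $ok = true;
        foreach ($observed as $value) {
            $state = lcgNext($state);
            $out = $modulus > 0 ? $state % $modulus : $state;
            if ($out !== $value) {
                $ok = false; // first mismatch rules this seed out
                break;
            }
        }
        if ($ok) {
            $matches[] = $seed;
        }
    }
    return $matches;
}
```

Because several seeds can produce the same reduced outputs, findSeeds returns all candidates rather than stopping at the first one.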

PHP built in functions complexity (isAnagramOfPalindrome function)

I've been googling for the past 2 hours, and I cannot find a list of the time and space complexity of PHP's built-in functions. I have the isAnagramOfPalindrome problem to solve with the following maximum allowed complexity:
expected worst-case time complexity is O(N)
expected worst-case space complexity is O(1) (not counting the storage required for input arguments).
where N is the input string length. Here is my simplest solution, but I don't know if it is within the complexity limits.
class Solution {
    // Function to determine if the input string can make a palindrome by rearranging it
    static public function isAnagramOfPalindrome($S) {
        // here I am counting how many characters have odd number of occurrences
        $odds = count(array_filter(count_chars($S, 1), function($var) {
            return($var & 1);
        }));
        // If the string length is odd, then a palindrome would have 1 character with odd number occurrences
        // If the string length is even, all characters should have even number of occurrences
        return (int)($odds == (strlen($S) & 1));
    }
}
echo Solution::isAnagramOfPalindrome($_POST['input']);
Anyone have an idea where to find this kind of information?
EDIT
I found out that array_filter has O(N) complexity, and count has O(1) complexity. Now I need to find info on count_chars, but a full list would be very convenient for future problems.
EDIT 2
After some research on space and time complexity in general, I found out that this code has O(N) time complexity and O(1) space complexity because:
The count_chars call loops N times (the full length of the input string), giving it a starting complexity of O(N). It generates an array with a bounded number of entries (at most 26 if the input contains only lowercase letters, and never more than 256, since count_chars counts byte values), and the filter is then applied to this array, which means the filter loops a bounded number of times. As the input length grows towards infinity, this loop is insignificant and is treated as a constant. count is also applied to this bounded array, and in any case the complexity of count is O(1). Hence, the time complexity of the algorithm is O(N).
The same goes for space complexity. When calculating space complexity, we do not count the input, only the objects generated in the process. These objects are the bounded array (26 elements in this case) and the count variable, and both are treated as constants because their size cannot grow beyond that point, no matter how big the input is. So we can say that the algorithm has a space complexity of O(1).
Anyway, that list would be still valuable so we do not have to look inside the php source code. :)
A probable reason for not including this information is that it is likely to change between releases, as improvements and general-case optimizations are made.
PHP is built on C, and some of its functions are simply wrappers around their C counterparts. For example, for hypot you can try a Google search, a look at man hypot, or the docs for the math library:
http://www.gnu.org/software/libc/manual/html_node/Exponents-and-Logarithms.html#Exponents-and-Logarithms
The source actually provides no better info
https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/math/w_hypot.c (Not official, Just easy to link to)
Not to mention that this is only glibc; Windows will have a different implementation. So there MAY even be a different big O for each OS that PHP is compiled on.
Another reason could be that it would confuse most developers.
Most developers I know would simply choose the function with the "best" big O.
A worse upper bound doesn't always mean it's slower.
http://www.sorting-algorithms.com/
has a good visualization of what's happening with some algorithms, e.g. bubble sort is a "slow" sort, yet it's one of the fastest for nearly sorted data.
Quick sort is what many will use, yet it can be very slow for nearly sorted data.
Big O is usually quoted for the worst case. Between releases, PHP may decide to optimize a function for a certain condition, which changes its big O, and there's no easy way to document that.
There is a partial list here (which I guess you have seen)
List of Big-O for PHP functions
Which does list some of the more common PHP functions.
For this particular example...
It's fairly easy to solve without using the built-in functions.
Example code
function isPalAnagram($string) {
    $string = str_replace(" ", "", $string);
    $len = strlen($string);
    $oddCount = $len & 1;
    $string = str_split($string);
    while ($len > 0 && $oddCount >= 0) {
        $current = reset($string);
        $replace_count = 0;
        foreach($string as $key => &$char) {
            if ($char === $current){
                unset($string[$key]);
                $len--;
                $replace_count++;
                continue;
            }
        }
        $oddCount -= ($replace_count & 1);
    }
    return ($len - $oddCount) === 0;
}
Using the fact that there cannot be more than 1 odd count, you can return early from the loop.
I think mine is also O(N) time, because its worst case is O(N) as far as I can tell.
Test
$a = microtime(true);
for($i=1; $i<100000; $i++) {
    testMethod("the quick brown fox jumped over the lazy dog");
    testMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
    testMethod("testest");
}
printf("Took %s seconds, %s memory", microtime(true) - $a, memory_get_peak_usage(true));
Tests run using really old hardware
My way
Took 64.125452041626 seconds, 262144 memory
Your way
Took 112.96145009995 seconds, 262144 memory
I'm fairly sure that my way is not the quickest way either.
I actually can't see much info for languages other than PHP either (Java, for example).
I know a lot of this post is speculation about why it's not there, and not much of it draws on credible sources. I hope it has at least partially explained why big O isn't listed on the documentation pages, though.
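The counting argument from the question (one pass over the input, then one pass over a table of bounded size) can be made explicit with a fixed 256-entry table indexed by byte value. This is a minimal sketch, not a benchmarked alternative to either version above. The name canFormPalindrome is hypothetical, and the code treats the input as a sequence of single bytes.

```php
<?php
// Hypothetical single-pass check: O(N) time, O(1) extra space (256 counters).
function canFormPalindrome(string $s): bool {
    $counts = array_fill(0, 256, 0);   // one counter per possible byte value
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        $counts[ord($s[$i])]++;
    }
    $odd = 0;
    foreach ($counts as $c) {          // always 256 iterations, independent of N
        $odd += $c & 1;
    }
    // A palindrome allows at most one character with an odd number of occurrences.
    return $odd <= 1;
}
```

The table size never depends on the input length, which is why the extra space counts as constant.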

How to generate unique secure random string in PHP?

I want to add a random string as a token for form submission, and it should stay unique forever. I have spent too much time with Google, but I am confused about which combination to use.
I found so many ways to do this when I googled:
1) Combination of character and number.
2) Combination of character, number and special character.
3) Combination of character, number, special character and date time.
Which combination should I use?
How many characters should the random string have?
If there is any other method which is secure, please let me know.
Here are some considerations:
Alphabet
The set of allowed characters can be considered the alphabet for the encoding. It doesn't affect the string strength by itself, but a larger alphabet (numbers, non-alphanumeric characters, etc.) does allow for shorter strings of similar strength (aka keyspace), so it's useful if you are looking for shorter strings.
Input Values
To guarantee your string to be unique, you need to add something which is guaranteed to be unique.
Random value is a good seed value if you have a good random number generator
Time is a good seed value to add but it may not be unique in a high traffic environment
User ID is a good seed value if you assume a user isn't going to create sessions at the exact same time
Unique ID is something the system guarantees is unique. This is often something that the server will guarantee / verify is unique, either in a single server deployment or distributed deployment. A simple way to do this is to add a machine ID and machine unique ID. A more complicated way to do this is to assign key ranges to machines and have each machine manage their key range.
Systems that I've worked with that require absolute uniqueness have added a server unique id which guarantees an item is unique. This means the same item on different servers would be seen as different, which was what was wanted here.
Approach
Pick one or more input values that match your requirement for uniqueness. If you need absolute uniqueness forever, you need something that you control and that you are sure is unique, e.g. a machine-associated number (that won't conflict with others in a distributed system). If you don't need absolute uniqueness, you can use a random number with another value such as time. If you need randomness, add a random number.
Use an alphabet / encoding that matches your use case. For machine ids, encodings like hexadecimal and base64 are popular. For human-readable ids, I prefer base32 (Crockford) or base36 for case-insensitive encodings, and base58 or base62 for case-sensitive encodings. This is because base32, 36, 58 and 62 produce shorter strings and (vs. base64) are safe across multiple uses (e.g. URLs, XML, file names, etc.) and don't require transformation between different use cases.
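As one possible way to combine a random input value with an encoding, here is a minimal sketch that assumes PHP 7 or later, where random_bytes() and random_int() are available. The function names are made up for this example. Note that random tokens are only extremely unlikely to collide; if absolute uniqueness is required, the token still has to be checked against (or combined with) something the system guarantees is unique.

```php
<?php
// Hypothetical helper: hexadecimal token from cryptographically secure bytes.
// 16 bytes give 32 hex characters.
function randomHexToken(int $numBytes = 16): string {
    return bin2hex(random_bytes($numBytes));
}

// Hypothetical helper: token drawn from a caller-supplied alphabet,
// e.g. a base62 alphabet for URL-safe, case-sensitive ids.
function randomTokenFromAlphabet(int $length, string $alphabet): string {
    $max = strlen($alphabet) - 1;
    $token = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() picks each index uniformly from a secure source.
        $token .= $alphabet[random_int(0, $max)];
    }
    return $token;
}
```

A larger alphabet lets randomTokenFromAlphabet reach the same keyspace with fewer characters, which is the trade-off described under Alphabet.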
You can definitely get a lot fancier depending on your needs, but I'll just throw this out there since it's what I use frequently for stuff like what you are describing:
md5(rand());
It's quick, simple and easy to remember. And since it's hexadecimal it plays nicely with others.
Refer to this SO Protected Question. This might be what you are looking for.
I think it's better to redirect you to a previously asked question which has more substantive answers. You will find a lot of options.
Try this code: the function getUniqueToken() returns a unique string of length 10 (by default).
/*
This function will return unique token string...
*/
function getUniqueToken($tokenLength = 10){
    $token = "";
    //Combination of character, number and special character...
    $combinationString = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789*#&$^";
    for($i=0;$i<$tokenLength;$i++){
        $token .= $combinationString[uniqueSecureHelper(0,strlen($combinationString))];
    }
    return $token;
}
/*
This helper function will return unique and secure string...
*/
function uniqueSecureHelper($minVal, $maxVal) {
    $range = $maxVal - $minVal;
    if ($range < 0) return $minVal; // not so random...
    $log = log($range, 2);
    $bytes = (int) ($log / 8) + 1; // length in bytes
    $bits = (int) $log + 1; // length in bits
    $filter = (int) (1 << $bits) - 1; // set all lower bits to 1
    do {
        $rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
        $rnd = $rnd & $filter; // discard irrelevant bits
    } while ($rnd >= $range);
    return $minVal + $rnd;
}
Use this code (two functions); you can increase the string length by passing an int parameter, e.g. getUniqueToken(15).
I used your 2nd idea (combination of character, number and special character), which you found while googling. I hope my example helps.
You should go for option 3. Because it includes the date and time, the result is unique every time.
As for the method, have you tried
str_shuffle($string)
Every time, it generates a random string from $string.
And then use substr
($string, start, length)
to cut it down.
And if you want the date and time, concatenate them to the resulting string.
An easily understandable and effective code to generate random strings in PHP. I do not consider predictability concerns important in this connection.
<?php
$d = str_shuffle('0123456789');
$C = str_shuffle('ABCDEFGHIJKLMNOPQRSTUVWXYZ');
$m = str_shuffle('abcdefghijklmnopqrstuvwxyz');
$s = str_shuffle('#!$&()*+-_~');
$l=9; //min 4
$r=substr(str_shuffle($d.$C.$m.$s),0,$l);echo $r.'<br>';
$safe=substr($d,0,1).substr($C,0,1).substr($m,0,1).mb_substr($s,0,1);
$r=str_shuffle($safe.substr($r,0,$l-4));//always at least one digit, special, small and capital
// this also allows for 0,1 or 2 of each available characters in string
echo $r;
exit;
?>
For a unique string, use uniqid().
And to make it secure, use a hashing algorithm,
for example:
echo md5(uniqid());

How do I go about converting a math equation into php?

I am not so good at maths and I'm looking to transfer 3 math equations to php functions.
I've tried looking up how to individually do each part of the equation in php but I keep getting strange results so I must be doing something wrong.
Is there a php function for exponential growth?
The image with the equations is here:
http://i.imgur.com/zIhMEEu.jpg
Thanks
For the second equation this is what I have:
$rank = 50;
$xp = log(24000 * (2^($rank/6) - 1));
echo $xp;
The number is too small for this to be correct. I'm also not sure how to convert the 'ln 2' into PHP. The log() function seemed to come up under 'natural logarithm to php' search.
There are various functions that need to be combined in order to create these equations. The log function performs logarithm operations in a base of your choice (or ln if you do not provide a base). The pow function performs exponentiation.
Your equations would be:
function rank($xp) {
    return floor(6 * log(($xp * log(2) / 24000 + 1), 2));
}
function xp($rank) {
    return floor(24000 * (pow(2, ($rank / 6)) - 1) / log(2));
}
function kills($rank) {
    return floor(xp($rank) / 200);
}
There are a few more parentheses there than absolutely needed, for clarity's sake.
Mathematical notations in general are considerably more compact and expressive than most programming languages (not just PHP) due to the fact that you can use any symbol you can think of to represent various concepts. In programming, you're stuck calling functions.
Also, I'm not sure what the various hardcoded numbers represent, or if it makes sense to change them, in the context of the formula, but you might want to think about setting them up as extra parameters to the function. For example:
function kills($rank, $killsPerXp = 200) {
    return floor(xp($rank) / $killsPerXp);
}
This adds clarity to the code, because it lets you know what the numbers represent. At the same time, it allows you to change the numbers more easily in case you are using them in multiple places.
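One likely source of the strange results in the question's attempt is that ^ in PHP is the bitwise XOR operator, not exponentiation, while log() with a single argument is already the natural logarithm (so 'ln 2' is simply log(2)). The short sketch below shows the difference. xpForRank is a hypothetical name that mirrors the xp() function above so the snippet can run on its own.

```php
<?php
$xor     = 2 ^ 3;        // bitwise XOR: 0b10 ^ 0b11 = 0b01
$power   = pow(2, 3);    // exponentiation: 2 * 2 * 2
$ln2     = log(2);       // natural logarithm, i.e. "ln 2"
$log2of8 = log(8, 2);    // logarithm with an explicit base

// Hypothetical stand-alone copy of the xp formula from the answer.
function xpForRank(float $rank): float {
    return floor(24000 * (pow(2, $rank / 6) - 1) / log(2));
}

echo $xor, "\n";            // XOR result, not 8
echo $power, "\n";
echo xpForRank(6), "\n";    // floor(24000 / ln 2)
```

At rank 6 the exponent is exactly 1, so the formula reduces to floor(24000 / ln 2), which makes it a convenient value for checking a port by hand.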
