php - help improve the efficiency of this youtube style url generator - php

After some searching I found this youtube style url generator with encryption to hide the original id... however I am hoping to improve the efficiency as it will be used a lot. So far I have improved it by 20%... can anyone help me improve it more.
This is the original:
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
$index = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if ($passKey !== null) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
for ($n = 0; $n<strlen($index); $n++) {
$i[] = substr( $index,$n ,1);
}
$passhash = hash('sha256',$passKey);
$passhash = (strlen($passhash) < strlen($index))
? hash('sha512',$passKey)
: $passhash;
for ($n=0; $n < strlen($index); $n++) {
$p[] = substr($passhash, $n ,1);
}
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out = $out + strpos($index, substr($in, $t, 1)) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$a = floor($in / bcpow($base, $t));
$out = $out . substr($index, $a, 1);
$in = $in - ($a * bcpow($base, $t));
}
$out = strrev($out); // reverse
}
return $out;
}
Here is my modified code so far:
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if ($passKey !== null) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
$len = strlen($index);
$passhash = hash('sha256',$passKey);
$passhash = (strlen($passhash) < $len)
? hash('sha512',$passKey)
: $passhash;
for ($n=0; $n < $len; $n++) {
$p[] = substr($passhash, $n ,1);
}
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out = $out + strpos($index, substr($in, $t, 1)) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$a = floor($in / bcpow($base, $t));
$out = $out . substr($index, $a, 1);
$in = $in - ($a * bcpow($base, $t));
}
$out = strrev($out); // reverse
}
return $out;
}
As you can see no major difference, only I removed the strlen from the for loops and stored it in a variable and precalculated the array for the index (a bit clumsy... but the array generation is what was making up the bulk of the computation).
Credit where its due: here is the original authors info:
* #author Kevin van Zonneveld
* #author Simon Franz
* #copyright 2008 Kevin van Zonneveld (kevin dot vanzonneveld dot net)
* #license www dot opensource dot org/licenses/bsd-license dot php New BSD Licence
* #version SVN: Release: $Id: alphaID.inc.php 344 2009-06-10 17:43:59Z kevin $
* #link kevin dot vanzonneveld dot net
I can't post the url's as I have to low reputation :S

I've made some slight optimizations to remove some extra CPU cycles here and there. Mostly things like unnecessary assignments, extra comparisons, etc. Also, strings can be treated as arrays, so I worked that in, too:
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out += strpos($index, $in[$t]) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
}
$out = strrev($out); // reverse
}
return $out;
}
Edit: I've updated to include a cache. Try it now!

It looks like you've done a good job getting rid of the redundant calls.
It looks like there are a couple calls to bcpow($base, $t) that could be reduced to one...
$out = "";
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$newbase = bcpow($base, $t);
$a = floor($in / $newbase);
$out = $out . substr($index, $a, 1);
$in = $in - ($a * $newbase);
}
$out = strrev($out); // reverse
You probably won't notice a difference.

I think you should just generate an index and use it site wide.. let me explain why...
The hashing and everything above $base = strlen($index); is just to generate the index... you will have to use the same $passkey to decode a url. As you have no way of knowing which passkey was used you would have to make it site wide. Which would mean each call to generate the url will use the same passkey and thus generate the same index.
And so... I would strip the index generating code and generate a unique index and store it in the $index variable.

Related

Compare similarity between two string with PHP

Hey guys :) I want to ask for some solution. Now, I have dictionary words.txt, here some example:
happy
laugh
sad
And I have the slang string:
hppy
I want to search & match that slang string to my dictionary wich means it will return "happy" because those string refer to "happy" in dictionary.
Recently I've been using similar_text() but doesn't confident about its validity. Can you guys recommend better solution for my problem? Thank you :)
And here i put my codes:
function searchwords($tweet){
//echo $tweet;
$find = false;
$handle = #fopen("words.txt", "r");
if ($handle)
{
while (!feof($handle))
{
$buffer = fgets($handle);
similar_text(trim($tweet),trim($buffer),$percent);
if ($percent == 100){ // this exact match
$find = true;
}else if ($percent >= 90){ //there is the possibility of errors
$find = true;
}
}
fclose($handle);
}
if ($find == true){
unset($tweet);
}else{
return $tweet;
}
}
As answered here
I've found that to calculate a similarity percentage between strings,
the Levenshtein and Jaro Winkler algorithms work well for
spelling mistakes and small changes between strings, while the Smith
Waterman Gotoh algorithm works well for strings where significant
chunks of the text would be the same but with "noise" around the
edges. This answer to a similar question shows more detail on
this.
I included the php examples of using each of these three examples to return a similarity percentage between two strings:
Levenshtein
echo levenshtein("LEGENDARY","BARNEY STINSON");
Jaro Winkler
class StringCompareJaroWinkler
{
public function compare($str1, $str2)
{
return $this->JaroWinkler($str1, $str2, $PREFIXSCALE = 0.1 );
}
private function getCommonCharacters( $string1, $string2, $allowedDistance ){
$str1_len = mb_strlen($string1);
$str2_len = mb_strlen($string2);
$temp_string2 = $string2;
$commonCharacters='';
for( $i=0; $i < $str1_len; $i++){
$noMatch = True;
// compare if char does match inside given allowedDistance
// and if it does add it to commonCharacters
for( $j= max( 0, $i-$allowedDistance ); $noMatch && $j < min( $i + $allowedDistance + 1, $str2_len ); $j++){
if( $temp_string2[$j] == $string1[$i] ){
$noMatch = False;
$commonCharacters .= $string1[$i];
$temp_string2[$j] = '';
}
}
}
return $commonCharacters;
}
private function Jaro( $string1, $string2 ){
$str1_len = mb_strlen( $string1 );
$str2_len = mb_strlen( $string2 );
// theoretical distance
$distance = (int) floor(min( $str1_len, $str2_len ) / 2.0);
// get common characters
$commons1 = $this->getCommonCharacters( $string1, $string2, $distance );
$commons2 = $this->getCommonCharacters( $string2, $string1, $distance );
if( ($commons1_len = mb_strlen( $commons1 )) == 0) return 0;
if( ($commons2_len = mb_strlen( $commons2 )) == 0) return 0;
// calculate transpositions
$transpositions = 0;
$upperBound = min( $commons1_len, $commons2_len );
for( $i = 0; $i < $upperBound; $i++){
if( $commons1[$i] != $commons2[$i] ) $transpositions++;
}
$transpositions /= 2.0;
// return the Jaro distance
return ($commons1_len/($str1_len) + $commons2_len/($str2_len) + ($commons1_len - $transpositions)/($commons1_len)) / 3.0;
}
private function getPrefixLength( $string1, $string2, $MINPREFIXLENGTH = 4 ){
$n = min( array( $MINPREFIXLENGTH, mb_strlen($string1), mb_strlen($string2) ) );
for($i = 0; $i < $n; $i++){
if( $string1[$i] != $string2[$i] ){
// return index of first occurrence of different characters
return $i;
}
}
// first n characters are the same
return $n;
}
private function JaroWinkler($string1, $string2, $PREFIXSCALE = 0.1 ){
$JaroDistance = $this->Jaro( $string1, $string2 );
$prefixLength = $this->getPrefixLength( $string1, $string2 );
return $JaroDistance + $prefixLength * $PREFIXSCALE * (1.0 - $JaroDistance);
}
}
$jw = new StringCompareJaroWinkler();
echo $jw->compare("LEGENDARY","BARNEY STINSON");
Smith Waterman Gotoh
class SmithWatermanGotoh
{
private $gapValue;
private $substitution;
/**
* Constructs a new Smith Waterman metric.
*
* #param gapValue
* a non-positive gap penalty
* #param substitution
* a substitution function
*/
public function __construct($gapValue=-0.5,
$substitution=null)
{
if($gapValue > 0.0) throw new Exception("gapValue must be <= 0");
//if(empty($substitution)) throw new Exception("substitution is required");
if (empty($substitution)) $this->substitution = new SmithWatermanMatchMismatch(1.0, -2.0);
else $this->substitution = $substitution;
$this->gapValue = $gapValue;
}
public function compare($a, $b)
{
if (empty($a) && empty($b)) {
return 1.0;
}
if (empty($a) || empty($b)) {
return 0.0;
}
$maxDistance = min(mb_strlen($a), mb_strlen($b))
* max($this->substitution->max(), $this->gapValue);
return $this->smithWatermanGotoh($a, $b) / $maxDistance;
}
private function smithWatermanGotoh($s, $t)
{
$v0 = [];
$v1 = [];
$t_len = mb_strlen($t);
$max = $v0[0] = max(0, $this->gapValue, $this->substitution->compare($s, 0, $t, 0));
for ($j = 1; $j < $t_len; $j++) {
$v0[$j] = max(0, $v0[$j - 1] + $this->gapValue,
$this->substitution->compare($s, 0, $t, $j));
$max = max($max, $v0[$j]);
}
// Find max
for ($i = 1; $i < mb_strlen($s); $i++) {
$v1[0] = max(0, $v0[0] + $this->gapValue, $this->substitution->compare($s, $i, $t, 0));
$max = max($max, $v1[0]);
for ($j = 1; $j < $t_len; $j++) {
$v1[$j] = max(0, $v0[$j] + $this->gapValue, $v1[$j - 1] + $this->gapValue,
$v0[$j - 1] + $this->substitution->compare($s, $i, $t, $j));
$max = max($max, $v1[$j]);
}
for ($j = 0; $j < $t_len; $j++) {
$v0[$j] = $v1[$j];
}
}
return $max;
}
}
class SmithWatermanMatchMismatch
{
private $matchValue;
private $mismatchValue;
/**
* Constructs a new match-mismatch substitution function. When two
* characters are equal a score of <code>matchValue</code> is assigned. In
* case of a mismatch a score of <code>mismatchValue</code>. The
* <code>matchValue</code> must be strictly greater then
* <code>mismatchValue</code>
*
* #param matchValue
* value when characters are equal
* #param mismatchValue
* value when characters are not equal
*/
public function __construct($matchValue, $mismatchValue) {
if($matchValue <= $mismatchValue) throw new Exception("matchValue must be > matchValue");
$this->matchValue = $matchValue;
$this->mismatchValue = $mismatchValue;
}
public function compare($a, $aIndex, $b, $bIndex) {
return ($a[$aIndex] === $b[$bIndex] ? $this->matchValue
: $this->mismatchValue);
}
public function max() {
return $this->matchValue;
}
public function min() {
return $this->mismatchValue;
}
}
$o = new SmithWatermanGotoh();
echo $o->compare("LEGENDARY","BARNEY STINSON");
Hopefully it'll serve your purpose. But there are some things to keep in mind while working with similar_text().
matching percentage can differ depending on the order of parameters. ie: similar_text($a, $b, $percent) and similar_text($b, $a, $percent) here in both cases $percent may not be same.
Note that this function is case sensitive: so while passing $a and $b pass both either in uppercase or lowercase.
for more info check this page

Can you help me understand this simple PHP function, please?

I would like to know how this function works so I can re-write it in ColdFusion. I've never programmed in PHP. I think, the function checks if $string is 11 numbers in length.
But how is it doing it?
Why is it using a loop instead of a simple len() function?
I read about str_pad(), and understand it. But what is the function of the loop iteration $i as a third argument? What's it doing?
if($this->check_fake($cpf, 11)) return FALSE;
function check_fake($string, $length)
{
for($i = 0; $i <= 9; $i++) {
$fake = str_pad("", $length, $i);
if($string === $fake) return(1);
}
}
The function is meant to validate a CPF, the Brazilian equivalent of a US SSN #. The CPF is 11 characters in length. Basically, I need to know what it's doing so I can write the function in Coldfusion.
If it's just a length, can't it be if (len(cpf) != 11) return false; ?
Here is the entire code snippet if it interests you:
<?
/*
*# Class VALIDATE - It validates Brazilian CPF/CNPJ numbers
*# Developer: André Luis Cupini - andre#neobiz.com.br
************************************************************/
class VALIDATE
{
/*
*# Remove ".", "-", "/" of the string
*****************************************************/
function cleaner($string)
{
return $string = str_replace("/", "", str_replace("-", "", str_replace(".", "", $string)));
}
/*
*# Check if the number is fake
*****************************************************/
function check_fake($string, $length)
{
for($i = 0; $i <= 9; $i++) {
$fake = str_pad("", $length, $i);
if($string === $fake) return(1);
}
}
/*
*# Validates CPF
*****************************************************/
function cpf($cpf)
{
$cpf = $this->cleaner($cpf);
$cpf = trim($cpf);
if(empty($cpf) || strlen($cpf) != 11) return FALSE;
else {
if($this->check_fake($cpf, 11)) return FALSE;
else {
$sub_cpf = substr($cpf, 0, 9);
for($i =0; $i <=9; $i++) {
$dv += ($sub_cpf[$i] * (10-$i));
}
if ($dv == 0) return FALSE;
$dv = 11 - ($dv % 11);
if($dv > 9) $dv = 0;
if($cpf[9] != $dv) return FALSE;
$dv *= 2;
for($i = 0; $i <=9; $i++) {
$dv += ($sub_cpf[$i] * (11-$i));
}
$dv = 11 - ($dv % 11);
if($dv > 9) $dv = 0;
if($cpf[10] != $dv) return FALSE;
return TRUE;
}
}
}
}
?>
it basically makes sure the number isn't:
11111111111
22222222222
33333333333
44444444444
55555555555
etc...
With comments:
function check_fake($string, $length)
{
// for all digits
for($i = 0; $i <= 9; $i++)
{
// create a $length characters long string, consisting of
// $length number of the number $i
// e.g. 00000000000
$fake = str_pad("", $length, $i);
// return true if the provided CPN is equal
if($string === $fake) return(1);
}
}
In short, it tests to see if the provided string is just the same digit repeated $length times.

How to calculate the hamming distance of two binary sequences in PHP?

hamming('10101010','01010101')
The result of the above should be 8.
How to implement it?
without installed GMP here is easy solution for any same-length binary strings
function HammingDistance($bin1, $bin2) {
$a1 = str_split($bin1);
$a2 = str_split($bin2);
$dh = 0;
for ($i = 0; $i < count($a1); $i++)
if($a1[$i] != $a2[$i]) $dh++;
return $dh;
}
echo HammingDistance('10101010','01010101'); //returns 8
You don't need to implement it because it already exists:
http://php.net/manual/en/function.gmp-hamdist.php
(If you have GMP support)
The following function works with hex strings (equal length), longer than 32 bits.
function hamming($hash1, $hash2) {
$dh = 0;
$len1 = strlen($hash1);
$len2 = strlen($hash2);
$len = 0;
do {
$h1 = hexdec(substr($hash1, $len, 8));
$h2 = hexdec(substr($hash2, $len, 8));
$len += 8;
for ($i = 0; $i < 32; $i++) {
$k = (1 << $i);
if (($h1 & $k) !== ($h2 & $k)) {
$dh++;
}
}
} while ($len < $len1);
return $dh;
}
If you don't have GMP support there is always something like this. Downside it only works on binary strings up to 32 bits in length.
function hamdist($x, $y){
for($dist = 0, $val = $x ^ $y; $val; ++$dist){
$val &= $val - 1;
}
return $dist;
}
function hamdist_str($x, $y){
return hamdist(bindec($x), bindec($y));
}
echo hamdist_str('10101010','01010101'); //8
Try this function:
function hamming($b1, $b2) {
$b1 = ltrim($b1, '0');
$b2 = ltrim($b2, '0');
$l1 = strlen($b1);
$l2 = strlen($b2);
$n = min($l1, $l2);
$d = max($l1, $l2) - $n;
for ($i=0; $i<$n; ++$i) {
if ($b1[$l1-$i] != $b2[$l2-$i]) {
++$d;
}
}
return $d;
}
You can easily code your hamming function with the help of substr_count() and the code provided in this comment on the PHP manual.
/* hamdist is equivilent to: */
echo gmp_popcount(gmp_xor($ham1, $ham2)) . "\n";
Try:
echo gmp_hamdist('10101010','01010101')

question about merge sort under php

i'v been trying to implement merge sort under php. but seems unsuccessful :( couldn't find source of error. any kind of help is very much appreciated!
function merge_sort(&$input, $start, $end) {
if($start < $end) {
$mid = (int) floor($start + $end / 2);
merge_sort($input, $start, $mid);
merge_sort($input, $mid + 1, $end);
merge($input, $start, $mid, $end);
}
}
function merge(&$input, $p, $q, $r) {
$a = $q - $p + 1;
$b = $r - $q;
for($i = $p;$i <= $q;$i++) {
$arr1[] = $input[$i];
}
for($i = $q+1;$i <= $r;$i++) {
$arr2[] = $input[$i];
}
$c = $d = 0;
for($i = $p; $i <= $r; $i++) {
$s = $arr1[$c];
$t = $arr2[$d];
if($a && (($s <= $t) || !$b)) {
$input[$i] = $s;
$a--;$c++;
} else if($b) {
$input[$i] = $t;
$b--;$d++;
}
}
return true;
}
here's the info xdebug throw back:
Fatal error: Maximum function nesting level of '100' reached, aborting!
To reach a nesting level of 100 on a merge sort, you would need to have an input array of size 2^100(about 1e30), which is impossible. I suspect your recursion is incorrect. For instance, you wrote $start + $end / 2 instead of ($start + $end) / 2.
Set
xdebug.max_nesting_level=500
in my php.ini
Here's my working solution, feel free to compare...
/**
* #param array $items array to sort
* #param int $l left index (defaults to 0)
* #param int $r right index (defaults to count($items)-1)
*/
function mergeSort(&$items, $l = 0, $r = null)
{
if (!isset($r)) {
$r = count($items) - 1;
}
if ($l < $r) {
$m = floor(($r - $l) / 2) + $l;
mergeSort($items, $l, $m);
mergeSort($items, $m + 1, $r);
merge($items, $l, $m, $r);
}
}
/**
* #param array $items array to merge
* #param int $l left index
* #param int $m middle index
* #param int $r right index
*/
function merge(&$items, $l, $m, $r)
{
$itemsA = array_slice($items, $l, $m + 1 - $l);
$itemsB = array_slice($items, $m + 1, ($r + 1) - ($m + 1));
$a = 0;
$aCount = count($itemsA);
$b = 0;
$bCount = count($itemsB);
for ($i = $l; $i <= $r; $i++) {
if ($a < $aCount && ($b == $bCount || $itemsA[$a] <= $itemsB[$b])) {
$items[$i] = $itemsA[$a++];
} else {
$items[$i] = $itemsB[$b++];
}
}
}
$items = array(5,3,6,1,2,3,9,10,7,2,4,8);
mergeSort($items);
echo implode(',', $items) . "\n";
Outputs:
1,2,2,3,3,4,5,6,7,8,9,10
There is a maximum function nesting level of '100', and you have reached it. Your recursive function goes too deep.
From http://www.xdebug.org/docs/all_settings:
xdebug.max_nesting_level
Type: integer, Default value: 100
Controls the protection mechanism for infinite recursion protection. The value of this setting is the maximum level of nested functions that are allowed before the script will be aborted.
You are going too deep in your recursive function triggering this xdebug error. Try raising this limit and see if that helps.
Also, here is an interesting article about recursion and PHP: http://www.alternateinterior.com/2006/09/tail-recursion-in-php.html
PHP is not a good language for recursive algorithms. From the manual:
It is possible to call recursive
functions in PHP. However avoid
recursive function/method calls with
over 100-200 recursion levels as it
can smash the stack and cause a
termination of the current script.
If you need to do it in PHP, you'll probably have to find an iterative version of the algorithm. You've hit a hard-coded limit in the language.
Cosi dovrebbe funzionare!!!! www.dslpg.it
function merge ( &$a, $left, $center, $right ) { //left = p right = r center = q
$n1 = $center - $left + 1;
$n2 = $right - $center;
for ($i = 1; $i <= $n1; $i++) $L[$i] = $a[$left+$i-1];
for ($i = 1; $i <= $n2; $i++) $R[$i] = $a[$center+$i];
$L[$n1+1] = 99999;
$R[$n2+1] = 99999;
$i = 1;
$j = 1;
for ($k = $left; $k <= $right; $k++) {
if ($L[$i] <= $R[$j] ) {
$a[$k] = $L[$i];
echo $a[$k];
$i++;
}
else {
$a[$k] = $R[$j];
echo $a[$k];
$j++;
}
}
return $a;
}
function merge_sort ( &$a, $left, $right ) { //left = p right = r
if ( $left < $right ) {
$center = (int) floor(($left + $right) / 2 );
merge_sort ($a, $left, $center);
merge_sort ($a, $center+1, $right);
merge ($a, $left, $center, $right);
}
}
merge_sort ( $a, 1, $n );

PHP - help fix a bug caused by two number dividing exactly

I have been working with a piece of code for youtube style URL's but I have found a bug and I am hoping someone can show me the most efficient way to fix it.
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out += strpos($index, $in[$t]) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
}
$out = strrev($out); // reverse
}
return $out;
}
The bug is only when encoding a single number 238328 as it is my base to the power of three. As a result it divides exactly and because of use of 'floor' it goes unnoticed and the script tries to add the 62nd character which does not exist and only produces a three character code rather than four... thus 'aa' is the result rather than 'aaab'.
Here is the problem part of the code:
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
And to make it even easier here is the call to get the error
echo alphaID(238328);
credits: Orginally written by Kevin Vanzonneveld: kevin dot vanzonneveld dot net, modified by Simon Franz: blog dot snaky dot org and optimised by Stackoverflows very own mattbasta
Here you go:
function preciseDivision($x,$y)
{
// Correct floor's failures by adding a bit of overhead
$epsilon = 0.00000001;
return floor(($x/$y) + $epsilon);
}
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out += strpos($index, $in[$t]) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = preciseDivision(log10($in),log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = preciseDivision($in, $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
}
$out = strrev($out); // reverse
}
return $out;
}
The problem here was not floor, but floating point precision. The division resulted 2.99999999, and the floor(2.999999) is equal to 2, not 3. This happens because limited size of floating point variables.
That's why it did not work.
I wrote a function preciseDivision, which automatically adds a very small value to the division, to get through this.
And I still believe that there should exist cleaner solutions to this url hashing problem. I'll see what I can do.
As per my answer to your other question, try replacing log10($in) / log10($base) with log($in, $base).
This avoids the inaccuracies associated with dividing the results of the two logarithms as floating point numbers and gives you the correct result.
Adding another answer as the first one is also working, though this one is cleaner.
I got rid of BC Math functions. If you're going to work with really large integers, this may not work. Otherwise, this is much cleaner solution:
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
// A conversion from base $base to base 10
$out = 0; // End number
$shift = 1; // Starting shift
$len = strlen($in); // Length of string
for ($t = 0; $t < $len; $t++)
{
$out += strpos($index, $in[$t]) * $shift; // $out is a number form alphabet * base^shift
$shift *= $base; // increase shift
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
// A simple conversion from base 10 to base $base
while ($in > 0)
{
$remainder = $in % $base;
$in = intval(($in-$remainder)/$base);
$out .= $index[$remainder];
}
}
return $out;
}
The code is cleaner, and should be faster as well.
Now it is much easier to see that this is only conversion from base 10 to base $base (62?) and vica-versa.
It does not involve floating point division, thus it does not have the bug mentioned above.
If you need multiplying large integers, and so on, this can be implemented this way as well with some bright thinking.
Added BC Maths, as you said you need large integers
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
// A conversion from base $base to base 10
$out = '0'; // End number
$shift = 1; // Starting shift
$len = strlen($in); // Length of string
for ($t = 0; $t < $len; $t++)
{
$out = bcadd($out, bcmul(strpos($index, $in[$t]),$shift)); // $out is a number from alphabet * base^shift
$shift = bcmul($shift, $base); // increase shift
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
// A simple conversion from base 10 to base $base
while ($in > '0') // We're treating integer as a string, so BC math works
{
$remainder = bcmod($in,$base);
$in = bcdiv($in, $base);
$out .= $index[$remainder];
}
}
return $out;
}

Categories