PHP's levenshtein function only works on strings of at most 255 characters. What are good alternatives for computing a similarity score between sentences in PHP?
Basically I have a database of sentences, and I want to find approximate duplicates.
The similar_text function is not giving me the results I expect. What is the easiest way to detect similar sentences like the ones below?
$ss="Jack is a very nice boy, isn't he?";
$pp="jack is a very nice boy is he";
$ss=strtolower($ss); // convert to lower case as we dont care about case
$pp=strtolower($pp);
$score=similar_text($ss, $pp);
echo "$score %\n"; // Outputs just 29 %
$score=levenshtein ( $ss, $pp );
echo "$score\n"; // Outputs '5', which indicates they are very similar. But, it does not work for more than 255 chars :(
The levenshtein algorithm has a time complexity of O(n*m), where n and m are the lengths of the two input strings. This is pretty expensive and computing such a distance for long strings will take a long time.
For whole sentences, you might want to use a diff algorithm instead, see for example: Highlight the difference between two strings in PHP
Having said this, PHP also provides the similar_text function which has an even worse complexity (O(max(n,m)**3)) but seems to work on longer strings.
I've found Smith-Waterman-Gotoh to be the best algorithm for comparing sentences. More info in this answer. Here is a PHP code example:
class SmithWatermanGotoh
{
private $gapValue;
private $substitution;
/**
* Constructs a new Smith Waterman metric.
*
* @param gapValue
* a non-positive gap penalty
* @param substitution
* a substitution function
*/
public function __construct($gapValue=-0.5,
$substitution=null)
{
if($gapValue > 0.0) throw new Exception("gapValue must be <= 0");
//if(empty($substitution)) throw new Exception("substitution is required");
if (empty($substitution)) $this->substitution = new SmithWatermanMatchMismatch(1.0, -2.0);
else $this->substitution = $substitution;
$this->gapValue = $gapValue;
}
public function compare($a, $b)
{
if (empty($a) && empty($b)) {
return 1.0;
}
if (empty($a) || empty($b)) {
return 0.0;
}
$maxDistance = min(mb_strlen($a), mb_strlen($b))
* max($this->substitution->max(), $this->gapValue);
return $this->smithWatermanGotoh($a, $b) / $maxDistance;
}
private function smithWatermanGotoh($s, $t)
{
$v0 = [];
$v1 = [];
$t_len = mb_strlen($t);
$max = $v0[0] = max(0, $this->gapValue, $this->substitution->compare($s, 0, $t, 0));
for ($j = 1; $j < $t_len; $j++) {
$v0[$j] = max(0, $v0[$j - 1] + $this->gapValue,
$this->substitution->compare($s, 0, $t, $j));
$max = max($max, $v0[$j]);
}
// Find max
for ($i = 1; $i < mb_strlen($s); $i++) {
$v1[0] = max(0, $v0[0] + $this->gapValue, $this->substitution->compare($s, $i, $t, 0));
$max = max($max, $v1[0]);
for ($j = 1; $j < $t_len; $j++) {
$v1[$j] = max(0, $v0[$j] + $this->gapValue, $v1[$j - 1] + $this->gapValue,
$v0[$j - 1] + $this->substitution->compare($s, $i, $t, $j));
$max = max($max, $v1[$j]);
}
for ($j = 0; $j < $t_len; $j++) {
$v0[$j] = $v1[$j];
}
}
return $max;
}
}
class SmithWatermanMatchMismatch
{
private $matchValue;
private $mismatchValue;
/**
* Constructs a new match-mismatch substitution function. When two
* characters are equal a score of <code>matchValue</code> is assigned. In
* case of a mismatch a score of <code>mismatchValue</code>. The
* <code>matchValue</code> must be strictly greater than
* <code>mismatchValue</code>
*
* @param matchValue
* value when characters are equal
* @param mismatchValue
* value when characters are not equal
*/
public function __construct($matchValue, $mismatchValue) {
if($matchValue <= $mismatchValue) throw new Exception("matchValue must be > mismatchValue");
$this->matchValue = $matchValue;
$this->mismatchValue = $mismatchValue;
}
public function compare($a, $aIndex, $b, $bIndex) {
return ($a[$aIndex] === $b[$bIndex] ? $this->matchValue
: $this->mismatchValue);
}
public function max() {
return $this->matchValue;
}
public function min() {
return $this->mismatchValue;
}
}
$str1 = "Jack is a very nice boy, isn't he?";
$str2 = "jack is a very nice boy is he";
$o = new SmithWatermanGotoh();
echo $o->compare($str1, $str2);
You could try using similar_text.
It can get quite slow with 20,000+ characters (3-5 seconds), but since your example only involves sentences, it will work just fine for that usage.
One thing to note is that when comparing strings of different lengths you will not get 100%. For example, if you compare "he" with "head" you only get about a 67% match (2 matching characters, counted in both strings, out of 6 characters in total).
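As a minimal sketch of the point above: similar_text returns the number of matching characters, and the percentage is only available through its optional third argument, which is filled in by reference. The variable names here are just for illustration.

```php
<?php
$first  = strtolower("Jack is a very nice boy, isn't he?");
$second = strtolower("jack is a very nice boy is he");

// The return value is the number of matching characters, not a percentage.
$matching = similar_text($first, $second, $percent);

// $percent is set by reference to 2 * matching / (len1 + len2) * 100.
printf("%d matching characters, %.1f%% similar\n", $matching, $percent);

// Strings of different lengths cannot reach 100%.
similar_text("he", "head", $shortPercent);
printf("he vs head: %.1f%%\n", $shortPercent);
```

If you only echo the return value, a high similarity can look like a low "percentage", so it is worth checking which of the two numbers you are printing.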
I am looking for a way to generate a specific quantity of unique random numbers (say 10,000), within a specific range of numbers like (000000000,999999999).
I'd like to do this without repeatedly using rand() or mt_rand() under a for loop as this would be computationally inefficient.
Are there any PHP libraries or solutions which meet these requirements?
One method is to use a Format Preserving Encryption, with the output limited to the range 0 to 999999999. If you encrypt the numbers 0 to 9,999 you will get 10,000 unique outputs in the required range. With an encryption, unique inputs guarantee unique outputs as long as you don't change the key.
1) Create a class that keeps state of generation:
class Randomizer {
private $min;
private $max;
private $maxGeneration;
public function __construct($min = 0, $max = 100) {
if ($min >= $max) {
throw new Exception('Minimal value is more than or equal to Max value');
}
if ($max - $min < 3) {
throw new Exception('Nothing to randomize');
}
$this->min = $min;
$this->max = $max;
$this->maxGeneration = $max - $min - 1;
}
public function pick($quantity = 1) {
$count = 0;
$generated = [];
while($count < $quantity) {
$num = $this->generate();
if (sizeof($generated) === $this->maxGeneration) {
break;
}
if (!in_array($num, $generated)) {
$generated[] = $num;
$count++;
}
}
return ($quantity === 1) ? $generated[0] : $generated;
}
public function generate() {
return mt_rand($this->min, $this->max);
}
}
2) Use it:
$randomizer = new Randomizer(0, 999999999);
$number = $randomizer->pick(); // returns 1 number
$numbers = $randomizer->pick(100); // returns array(A, B, C...) of numbers
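One thing to watch in the class above is that in_array scans the whole list on every pick, which adds up for 10,000 numbers. A possible variation, sketched here under the assumption that a plain function is enough (pickUnique is a hypothetical name, not part of the Randomizer class), uses the generated number as an array key so that duplicates collapse on their own:

```php
<?php
/**
 * Pick $quantity distinct integers from [$min, $max] (inclusive).
 * Hypothetical helper for illustration only.
 */
function pickUnique($quantity, $min, $max) {
    if ($quantity > $max - $min + 1) {
        throw new InvalidArgumentException('Range is too small for the requested quantity');
    }
    $seen = [];
    while (count($seen) < $quantity) {
        // The number is the array key, so a repeated value just overwrites
        // itself and the lookup is a hash access instead of a linear scan.
        $seen[mt_rand($min, $max)] = true;
    }
    return array_keys($seen);
}

$numbers = pickUnique(10000, 0, 999999999);
echo count($numbers), "\n"; // 10000
```

This still calls mt_rand in a loop, so it works best when the requested quantity is small compared to the range; as the quantity approaches the size of the range, more and more picks are rejected as duplicates.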
I need to make a PHP script that generates the interpolation function from the set of points.
I have decided to use Lagrange interpolation because it was the easiest one for which I could find an example that generates the function from a list of input points. The issue with the other methods is that I couldn't find an example that generates the function: all the examples for other interpolation methods only generate additional points, not the function itself from the existing points.
The source that I've used to find the example for the Lagrange Interpolation is: http://www2.lawrence.edu/fast/GREGGJ/Math420/Section_3_1.pdf
I've decided to replicate this example in my PHP code.
/**
* Generate one basis polynomial function
* @param array $basisPolynomial Each basis polynomial is stored in this array (passed by reference) so that it can be appended to the final function
* @param array $allXValues all x values of the points
* @param int $i current index of the basis polynomial
*/
function generateBasisPolynomial(&$basisPolynomial, $allXValues, $i) {
$basisPolynomial[$i] = "(";
$divisor = "(";
for ($j = 0; $j < count($allXValues); $j++) {
if ($j == $i) {
continue;
}
$basisPolynomial[$i] .= "(x-$allXValues[$j])*";
$divisor .="($allXValues[$i]-$allXValues[$j])*";
}
//multiply the divisor by 1, because the previous loop has * at the end of the equation
$divisor .="1)";
$basisPolynomial[$i] .="1)/$divisor";
}
/**
* Function that generates the Lagrange interpolation from the list of points
* @param array $points
* @return string
*/
function generateLagrangeInterpolation($points) {
$numberOfPoints = count($points);
if ($numberOfPoints < 2) {
return "NaN";
} else {
//source http://www2.lawrence.edu/fast/GREGGJ/Math420/Section_3_1.pdf
//for each point, construct the basis polynomial
//for a sequence of x values, we will have n basis polynomials,
//Example:
//if we, for example have a sequence of four points, with their sequence of x values being {x0,x1,x2,x3}
//then we construct the basis polynomial for x0 by doing the following calculation:
//F(x) = ((x-x1)*(x-x2)*(x-x3))/((x0-x1)*(x0-x2)*(x0-x3)) -> where x is an unknown variable.
$basisPolynomial = array();
//get all x values from the array of points so that we can access them by index
$allXValues = array_keys($points);
$allYValues = array_values($points);
//Because the Y values are percentages, we want to divide them by 100.
$allYValues = array_map(function($val) {
return $val / 100;
}, $allYValues);
$returnFunction = "";
for ($i = 0; $i < $numberOfPoints; $i++) {
generateBasisPolynomial($basisPolynomial, $allXValues, $i);
//multiply this basis polynomial by y value
$returnFunction .="$allYValues[$i]*$basisPolynomial[$i]+";
}
//Append 0 to the end of the function because the above loop returns a function with a +
//at the end so we want to make it right
$returnFunction .="0";
echo $returnFunction;
}
}
//$points = array("4.1168" => "0.213631", "4.19236" => "0.214232", "4.20967" => "0.21441", "4.46908" => "0.218788");
$points = array("0.1" => "5", "0.3" => "10", "0.5" => "30", "0.6" => "60", "0.8" => "70");
generateLagrangeInterpolation($points);
What I am getting as a result is the following function:
0.05*((x-0.3)*(x-0.5)*(x-0.6)*(x-0.8)*1)/((0.1-0.3)*(0.1-0.5)*(0.1-0.6)*(0.1-0.8)*1)+0.1*((x-0.1)*(x-0.5)*(x-0.6)*(x-0.8)*1)/((0.3-0.1)*(0.3-0.5)*(0.3-0.6)*(0.3-0.8)*1)+0.3*((x-0.1)*(x-0.3)*(x-0.6)*(x-0.8)*1)/((0.5-0.1)*(0.5-0.3)*(0.5-0.6)*(0.5-0.8)*1)+0.6*((x-0.1)*(x-0.3)*(x-0.5)*(x-0.8)*1)/((0.6-0.1)*(0.6-0.3)*(0.6-0.5)*(0.6-0.8)*1)+0.7*((x-0.1)*(x-0.3)*(x-0.5)*(x-0.6)*1)/((0.8-0.1)*(0.8-0.3)*(0.8-0.5)*(0.8-0.6)*1)+0
I don't need the expression to be simplified and fully calculated (however, if you have any advice or code that could do that for me, it would be a huge plus).
If I look at the simplified expression it looks something like this:
(47500*x^4-79300*x^3+42245*x^2-8699*x+480)/(-840)
However, if I paste that function into http://fooplot.com, the graph passes through the points given as input parameters, but I'm not sure whether the graph is correct elsewhere, because its Y values become negative when x <= 0 or x >= 1.
Would you advise using a different function, or can the existing interpolation error be reduced by providing more input points? I have to be honest that I am a poor mathematician, so any real example of a more accurate method, or an example in code, would be greatly appreciated.
Thanks
Here's what you can try:
function basisPolynomial($points, $j, $x) {
$xj = $points[$j][0]; //Assume a point is an array of 2 numbers
$partialProduct = 1;
//Product loop
for ($m = 0; $m < count($points); $m++) {
if ($m === $j) { continue; }
$partialProduct *= ($x - $points[$m][0])/($xj-$points[$m][0]);
}
return $partialProduct;
}
function lagrangePolynomial($points,$x) {
$partialSum = 0;
for ($j = 0;$j < count($points);$j++) {
$partialSum += $points[$j][1]*basisPolynomial($points,$j,$x);
}
return $partialSum;
}
Now if you need to plot it you can generate a list of points that can be used in a plotting function e.g.
$points = <my points>;
$plotPoints = [];
for ($i = 0;$i < 10;$i+= 0.1) { //for example
$plotPoints[] = [ $i, lagrangePolynomial($points,$i) ];
}
If you want to plot the function directly, you need a plotting tool like gnuplot, in which you define the functions and let it determine how to plot them.
Update: http://www.physics.brocku.ca/Courses/5P10/lectures/lecture_10_handout.pdf seems to have a gnuplot example of exactly what you need but it feels like cheating of sorts
I'm not very familiar with the object-oriented side of PHP, but in Java I wrote this little baby ;)
import java.util.*;
import java.util.Arrays;
import java.util.List;
public class Run{
public static void main(String[] args){
int[] parameters = new int[]{-1, 2, 4, 3};
Binom b = new Binom(1, 1, parameters[1], 0);
Polinom p = new Polinom(b);
for(int i = 2; i < parameters.length; i++)
p.joinBinom(new Binom(1, 1, -1 * parameters[i], 0));
System.out.println(p.toString() + " / (" + getDenominator(parameters) + ")");
}
public static int getDenominator(int[] params){
int result = 1;
for(int i = 1; i < params.length; i++)
result *= params[0] - params[i];
return result;
}
}
class Monomial{
private int constant = 1;
private int pow = 0;
public int getConstant(){
return this.constant;
}
public void sumConstant(int value){
this.constant += value;
}
public boolean hasVariable(){
return this.pow > 0;
}
public int getPow(){
return this.pow;
}
public Monomial(int constant, int pow){
this.constant = constant;
this.pow = pow;
}
public ArrayList<Monomial> multiply(Binom a){
Monomial first = new Monomial(this.constant * a.getFirst().getConstant(), this.pow + a.getFirst().getPow());
Monomial second = new Monomial(this.constant * a.getSecond().getConstant(), this.pow + a.getSecond().getPow());
System.out.print("\t" + this.toString() + "\t* (" + a.getFirst().toString() + " " + a.getSecond().toString() + ")");
System.out.print("\t= " + first.toString() + "\t");
System.out.println(second.toString());
return (new Binom(first, second)).toList();
}
public String toString(){
String result = "";
if(this.constant == 1){
if(!this.hasVariable())
result += this.constant;
}
else
result += this.constant;
if(this.hasVariable()){
result += "X";
if(this.pow > 1)
result += "^" + this.pow;
}
return result;
}
}
class Binom{
private Monomial first;
private Monomial second;
public Monomial getFirst(){
return this.first;
}
public Monomial getSecond(){
return this.second;
}
public Binom(int constant1, int pow1, int constant2, int pow2){
this.first = new Monomial(constant1, pow1);
this.second = new Monomial(constant2, pow2);
}
public Binom(Monomial a, Monomial b){
this.first = a;
this.second = b;
}
public ArrayList<Monomial> toList(){
ArrayList<Monomial> result = new ArrayList<>();
result.add(this.first);
result.add(this.second);
return result;
}
}
class Polinom{
private ArrayList<Monomial> terms = new ArrayList<>();
public Polinom(Binom b){
this.terms.add(b.getFirst());
this.terms.add(b.getSecond());
}
private void compact(){
for(int i = 0; i < this.terms.size(); i++){
Monomial term = this.terms.get(i);
for(int j = i + 1; j < this.terms.size(); j++){
Monomial test = this.terms.get(j);
if(term.getPow() == test.getPow()){
term.sumConstant(test.getConstant());
this.terms.remove(test);
j--;
}
}
}
}
public void joinBinom(Binom b){
ArrayList<Monomial> result = new ArrayList<>();
for(Monomial t : this.terms){
result.addAll(t.multiply(b));
}
this.terms = result;
this.compact();
}
public String toString(){
String result = "";
for(Monomial t : this.terms)
result += (t.getConstant() < 0 ? " " : " +") + t.toString();
return "(" + result + ")";
}
}
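which returns (note that the first factor comes out as (X + 2), since parameters[1] is not negated the way the later parameters are):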
X * (X -4) = X^2 -4X
2 * (X -4) = 2X -8
X^2 * (X -3) = X^3 -3X^2
-2X * (X -3) = -2X^2 6X
-8 * (X -3) = -8X 24
( +X^3 -5X^2 -2X +24) / (-60)
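The same kind of expansion can be sketched in PHP, which is what the question asked for. This is only an illustration under a few assumptions: polynomials are stored as arrays of coefficients indexed by the power of x, points are [x, y] pairs with y already divided by 100, and polyMultiply and lagrangeCoefficients are hypothetical names.

```php
<?php
// Multiply two polynomials given as coefficient arrays (index = power of x).
function polyMultiply(array $a, array $b) {
    $result = array_fill(0, count($a) + count($b) - 1, 0.0);
    foreach ($a as $i => $ai) {
        foreach ($b as $j => $bj) {
            $result[$i + $j] += $ai * $bj;
        }
    }
    return $result;
}

// Coefficients of the Lagrange polynomial through $points ([[x, y], ...]).
function lagrangeCoefficients(array $points) {
    $n = count($points);
    $coefficients = array_fill(0, $n, 0.0);
    for ($j = 0; $j < $n; $j++) {
        $basis = [1.0];       // numerator of the j-th basis polynomial
        $denominator = 1.0;   // product of (x_j - x_m)
        for ($m = 0; $m < $n; $m++) {
            if ($m === $j) {
                continue;
            }
            // Multiply by (x - x_m), i.e. coefficients [-x_m, 1].
            $basis = polyMultiply($basis, [-$points[$m][0], 1.0]);
            $denominator *= $points[$j][0] - $points[$m][0];
        }
        foreach ($basis as $power => $c) {
            $coefficients[$power] += $points[$j][1] * $c / $denominator;
        }
    }
    return $coefficients;
}

$points = [[0.1, 0.05], [0.3, 0.10], [0.5, 0.30], [0.6, 0.60], [0.8, 0.70]];
print_r(lagrangeCoefficients($points));
```

The resulting array holds the constant term first and the x^4 coefficient last, so it can be compared against the simplified expression in the question by dividing each of its numerator coefficients by -840.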
It looks like my current implementation of Lagrange interpolation gives correct results. To reduce the interpolation error, more base points can be provided. Also, one of the answers includes example code for multiplying out the terms that contain the unknown variable.
Thanks everyone.
I set out to make a small project around a bunch of classes that return generators (PHP 5.5).
The main motivation for the small project was to expand on my TDD journey, fiddle with generators and have a package I could throw on Packagist for later use.
The current state of the whole "project" can be found on GitHub.
All tests are green and the methods do what I want. Now I want to refactor, as there is a lot of duplication.
/**
* Returns a Generator with a even range.
*
* getEven(10); // 10,12,14,16,18,20,22 ...
* getEven(null, 10); // 10,8,6,4,2,0,-2,-4 ...
* getEven(10, null, 2); // 10,6,2, -2 ...
* getEven(10,20); // 10,12,14,16,18,20
* getEven(20,10); // 20,18,16,14,12,10
* getEven(10,20,2); // 10,14,18
*
* @param int|null $start
* @param int|null $end
* @param int $step
* @throws InvalidArgumentException|LogicException
* @return Generator
*/
public function getEven( $start = null, $end = null, $step = 1 )
{
// Throws LogicException
$this->throwExceptionIfAllNulls( [$start, $end] );
$this->throwExceptionIfInvalidStep($step);
// Throws InvalidArgumentException
$this->throwExceptionIfNotNullOrInt( [$start, $end] );
// infinite increase range
if(is_int($start) && is_null($end))
{
// throw LogicException
$this->throwExceptionIfOdd($start);
$Generator = function() use ($start, $step)
{
for($i = $start; true; $i += $step * 2)
{
yield $i;
}
};
}
// infinite decrease range
elseif(is_int($end) && is_null($start))
{
// throws LogicException
$this->throwExceptionIfUneven($end);
$Generator = function() use ($end, $step)
{
for($i = $end; true; $i -= $step * 2)
{
yield $i;
}
};
}
// predetermined range
else
{
// throws LogicException
$this->throwExceptionIfUneven($start);
$this->throwExceptionIfUneven($end);
// decrease
if($start >= $end)
{
$Generator = function() use ($start, $end, $step)
{
for($i = $start; $i >= $end; $i -= $step * 2)
{
yield $i;
}
};
}
// increase
else
{
$Generator = function() use ($start, $end, $step)
{
for($i = $start; $i <= $end; $i += $step * 2)
{
yield $i;
}
};
}
}
return $Generator();
}
The class also has a method named getOdd (and yes, it looks a lot like it ;) )
The main duplication is in the closures $Generator = function() ..., and the differences are mostly operators such as + - * / and the arguments in the for loop. It is much the same in the rest of the class.
I read Dynamic Comparison Operators in PHP and came to the conclusion that there is no native method like compare(...).
Should I make a private/protected method for the comparison? If so, should I make a new class/function for this? I do not think it belongs in the current class.
Is there something else I am missing? I am unsure how to DRY this up properly.
By the way, I know getEven and getOdd are kind of silly when I have a getRange function with a step, but this is a more general refactoring/pattern question.
Update
On GitHub, getEven and getOdd have now been removed...
The code below has not been tested or verified to work, but I have faith in it and at least it shows one possible way of removing the multiple generator functions.
As you state yourself, the duplication you are trying to remove is mainly in the generator function. If you look into this you can see that every generator function you have can be written as this:
function createGenerator($index, $limit, $step) {
return function() use($index, $limit, $step) {
$incrementing = $step > 0;
for ($i = $index; true; $i += 2 * $step) {
if (($incrementing && $i <= $limit) || (!$incrementing && $i >= $limit)) {
yield $i;
}else {
break;
}
}
};
}
In order to use this you need to do some magic with the input arguments, and it helps (or at least looks nicer) to define some constants. PHP already has a PHP_INT_MAX constant holding the largest possible integer value; PHP_INT_MIN, however, only exists natively from PHP 7.0 onwards. On older versions (such as the PHP 5.5 used here) I would define it myself.
defined('PHP_INT_MIN') || define('PHP_INT_MIN', ~PHP_INT_MAX);
Now lets take a look at the four cases in your function.
1) Infinite increase range
Infinite is a rather bold claim here. If we change it to "the greatest value possible given the constraints of an int", we get a finite range from $index to PHP_INT_MAX; hence, by setting $limit = PHP_INT_MAX, the generator function above stays the same.
//$limit = PHP_INT_MAX;
createGenerator($index, PHP_INT_MAX, $step);
2) Infinite decrease range
The same argument can be used here, but with a negative $step, starting from $limit and counting down to PHP_INT_MIN.
//$index = $limit;
//$limit = PHP_INT_MIN;
//$step *= -1;
createGenerator($limit, PHP_INT_MIN, -1 * $step);
3) Predetermined decreasing range
Here $index is already the larger value, so no swap is needed; only the step is negated.
//$step *= -1;
createGenerator($index, $limit, -1 * $step);
4) Predetermined increasing range
Well this is just the default case, where all arguments are given. And nothing needs to change.
createGenerator($index, $limit, $step);
The revised code
public function getEven($index = null, $limit = null, $step = 1) {
    // Throws LogicException
    $this->throwExceptionIfAllNulls([$index, $limit]);
    $this->throwExceptionIfInvalidStep($step);
    // Throws InvalidArgumentException
    $this->throwExceptionIfNotNullOrInt([$index, $limit]);
    // infinite increasing range
    if (is_int($index) && is_null($limit)) {
        // throws LogicException
        $this->throwExceptionIfOdd($index);
        return $this->createGenerator($index, PHP_INT_MAX, $step);
    }
    // infinite decreasing range
    if (is_int($limit) && is_null($index)) {
        // throws LogicException
        $this->throwExceptionIfOdd($limit);
        return $this->createGenerator($limit, PHP_INT_MIN, -1 * $step);
    }
    // predetermined range, throws LogicException
    $this->throwExceptionIfOdd($index);
    $this->throwExceptionIfOdd($limit);
    // decreasing: start at $index and walk down to $limit
    if ($index >= $limit) {
        return $this->createGenerator($index, $limit, -1 * $step);
    }
    // increasing
    return $this->createGenerator($index, $limit, $step);
}
// A private generator method is used instead of a named function nested in
// getEven(): the nested declaration would fail with a "cannot redeclare" error
// on the second call, and the closure it returned was never invoked.
private function createGenerator($index, $limit, $step) {
    $incrementing = $step > 0;
    for ($i = $index; $incrementing ? $i <= $limit : $i >= $limit; $i += 2 * $step) {
        yield $i;
    }
}
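To try the idea outside the class, here is a small standalone sketch. It assumes PHP 5.5+ for yield, leaves out the validation helpers, and only covers ranges that have a start value; evenRange is a hypothetical name. The direction is folded into the sign of the increment so that a single loop serves all cases.

```php
<?php
/**
 * Yield every ($step * 2)-th integer from $start towards $end.
 * A null $end is treated as PHP_INT_MAX.
 * Hypothetical standalone function; input validation is left out.
 */
function evenRange($start, $end = null, $step = 1) {
    $ascending = $end === null || $start <= $end;
    if ($end === null) {
        $end = PHP_INT_MAX;
    }
    // The sign of the increment encodes the direction.
    $delta = ($ascending ? 2 : -2) * $step;
    for ($i = $start; $ascending ? $i <= $end : $i >= $end; $i += $delta) {
        yield $i;
    }
}

print_r(iterator_to_array(evenRange(10, 20)));    // 10, 12, 14, 16, 18, 20
print_r(iterator_to_array(evenRange(20, 10)));    // 20, 18, 16, 14, 12, 10
print_r(iterator_to_array(evenRange(10, 20, 2))); // 10, 14, 18
```

Because the function body contains yield, calling it returns a Generator directly, so there is no closure that has to be invoked afterwards.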
Earlier I wrote a code in Matlab for this sort of lottery function, just to test if it was possible. However, I actually needed it in PHP so I've just rewritten the code and it does seem to work, but as it involves a lot of looping I want to make sure I'm doing it as efficiently as possible.
What the code does:
You can call the function $lotto -> type($users,$difficulty) and it will return two numbers. Here's the explanation, $users is the number of users registered on the website, i.e the people who will potentially buy a ticket. $difficulty is a number between 1 and 10, where 5 is normal, 1 is easy and 10 is hard. Difficulty here means how hard it is to match all numbers on a lottery ticket.
So what are the numbers that the function returns? That would be $n and $r. $n is the amount of numbers there will be on the lottery ticket, and $r is the amount of numbers you can choose from the lottery ticket. For example, in the UK a national lottery ticket has 49 numbers, of which you choose 6, i.e. $n = 49 and $r = 6.
How does the function calculate these two numbers? In the UK national lottery there are 13,983,816 different possible ticket combinations. If I were to run $lotto -> type(13983816,1) it would return array(49,6). Basically it tries to make it so there are as many combinations of tickets as there are registered users.
tl;dr, here's the code:
<?php
class lotto {
public function type($users,$difficulty){
$current_r = $r = 2;
$current_n = 0;
$difficulty = ($difficulty + 5) / 10; // sliding scale from 1 - 10
$last_tickets_sold = 200; // tickets sold in last lotto
$last_users = 100; // how many users there were in the last lotto
$last_factor = $last_tickets_sold / $last_users; // tickets per user
$factor = $last_factor * $difficulty;
$users *= $factor;
while($r <= 10){
$u = 0;
$n = $r;
while($u < $users && $n < 50){
$u = $this -> nCr(++$n,$r);
}
if($r == 2){
$current_n = $n;
} elseif(abs($this -> nCr($n,$r) - $users) < abs($this -> nCr($current_n,$current_r) - $users)){
// this is a better match so update current n and r
$current_r = $r;
$current_n = $n;
}
$r++;
}
return array($current_n,$current_r);
}
private function nCr($n,$r){
return $this -> factorial($n) / (
$this -> factorial($r) * $this -> factorial($n - $r)
);
}
private function factorial($x){
$f = $x;
while(--$x){
$f *= $x;
}
return $f;
}
}
$lotto = new lotto;
print_r($lotto -> type(1000,5));
?>
I did a quick scan and spotted a few places that can be further optimized.
Combination
Your algorithm is a brute force one and can be further optimized
private function nCr($n,$r){
return $this -> factorial($n) / (
$this->factorial($r) * $this->factorial($n - $r)
);
}
to
function nCr($n,$r) {
$top = 1;
$sub = 1;
for($i = $r+1; $i <= $n; $i++)
$top *= $i;
$n -= $r;
for($i = 2; $i <= $n; $i++)
$sub *= $i;
return $top / $sub;
}
Too Much Combination Calculation
Calculating combinations is expensive, and each C(n, r) can be derived from the previous one instead of being recomputed from scratch.
$u = 0;
$n = $r;
while($u < $users && $n < 50){
$u = $this -> nCr(++$n,$r);
}
to
$n = $r + 1;
$u = nCr($n, $r);
while ($u < $users && $n < 50) {
$n++;
$u *= $n;
$u /= ($n - $r);
}
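Along the same lines, the binomial coefficient itself can be computed without ever building a large factorial. This is a sketch that assumes PHP 7+ (for intdiv) on a 64-bit build; it is a variation on the nCr above, not a drop-in replacement taken from the original code.

```php
<?php
// Multiplicative form: C(n, r) = product over i = 1..r of (n - r + i) / i.
function nCr($n, $r) {
    if ($r < 0 || $r > $n) {
        return 0;
    }
    $r = min($r, $n - $r); // C(n, r) == C(n, n - r), so use the smaller r
    $result = 1;
    for ($i = 1; $i <= $r; $i++) {
        // After step i, $result equals C(n - r + i, i), so the division is exact.
        $result = intdiv($result * ($n - $r + $i), $i);
    }
    return $result;
}

echo nCr(49, 6), "\n"; // 13983816
```

For the ranges used in the question (n below 50, r up to 10) the intermediate products stay well within 64-bit integers, so the result remains an exact integer instead of a float.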
An immediate observation is that you have the possibility of a divide by 0 error
$last_factor = $last_tickets_sold / $last_users;
Could be solved by putting a simple if statement around it
$last_factor = ($last_users == 0) ? 0 : $last_tickets_sold / $last_users;
Without examining your code in detail: are you sure that your loops do not need a continue or break?
The range of factorial() in your algo is [0,50], so why not just precompute this statically?
private static $factorial = array(1);
private static function genFactorial($max) {
if (count(self::$factorial) > $max) return;
foreach (range(count(self::$factorial), $max) as $n) {
self::$factorial[$n] = $n * self::$factorial[$n - 1];
}
}
Now add a self::genFactorial(50); to __construct() or to type() and replace references to $this -> factorial($n) by self::$factorial[$n].
This is just a quick code dump; not even compile checked so forgive any typos, etc. but what this does is to replace a function call (which includes a while loop) by an array element fetch.
I found Marcel Jackwerth's response to How to code a URL shortener? to be a good answer for the problem, however my question is how it'll look in PHP? Here's Marcel's answer:
You need a Bijective Function f (there must be no x1 != x2, that will make f(x1) = f(x2); and for every y you will find a x so that f(x)=y). This is necessary so that you can find a inverse function g('abc') = 123 for your f(123)='abc' function.
I would continue your "convert number to string" approach (however you will realize that your proposed algorithm fails if your id is a prime and greater than 52).
How to convert the id to a shortened url:
Think of an alphabet you want to use. In your case that's [a-zA-Z0-9]. It contains 62 letters.
Take the auto-generated unique numerical key (auto-incremented id): for example 125 (a decimal number)
Now you have to convert the 125 (base 10) to X (base 62). This will then be {2}{1} (2×62+1=125).
Now map the symbols {2} and {1} to your alphabet. Say {0} = 'a', {25} = 'z' and so on. We will have {2} = 'c' and {1} = 'b'. So '/cb' will be your shortened url.
How to resolve a shortened url abc to the initial id:
If you want to do this in reverse, it's not that difficult. 'e9a' will be resolved to "4th,61st,0th letter in alphabet" = {4}{61}{0}, which is 4×62×62 + 61×62 + 0 = 19158. You will then just have to find your database-record with id 19158.
function convert($src, $srcAlphabet, $dstAlphabet) {
$srcBase = strlen($srcAlphabet);
$dstBase = strlen($dstAlphabet);
$wet = $src;
$val = 0;
$mlt = 1;
while ($l = strlen($wet)) {
$digit = $wet[$l - 1];
$val += $mlt * strpos($srcAlphabet, $digit);
$wet = substr($wet, 0, $l - 1);
$mlt *= $srcBase;
}
$wet = $val;
$dst = '';
while ($wet >= $dstBase) {
$digitVal = $wet % $dstBase;
$digit = $dstAlphabet[$digitVal];
$dst = $digit . $dst;
$wet = (int) ($wet / $dstBase); // keep $wet an integer so it stays a valid string offset
}
$digit = $dstAlphabet[$wet];
$dst = $digit . $dst;
return $dst;
}
// prints cb
print convert('125', '0123456789', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');
// prints 19158
print convert('e9a', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', '0123456789');
I like this PHP function which allows you to customise the alphabet (and remove confusing 0/O's etc.)
// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
$str = '';
do {
$i = (int) fmod($val, $base); // fmod returns a float; cast it so it is a valid string offset
$str = $chars[$i] . $str;
$val = ($val - $i) / $base;
} while($val > 0);
return $str;
}
Follow the URL to find the reverse 'decode' function too.
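For reference, a decode counterpart could look roughly like the sketch below. It is not the function from the linked snippet: baseEncodeSketch and baseDecodeSketch are hypothetical names, the encoder is an integer-only variant included just to make the round trip self-contained, and intdiv assumes PHP 7+.

```php
<?php
// Integer-only encoder, included so the round trip below is self-contained.
function baseEncodeSketch($val, $chars) {
    $base = strlen($chars);
    $str = '';
    do {
        $str = $chars[$val % $base] . $str;
        $val = intdiv($val, $base);
    } while ($val > 0);
    return $str;
}

// Hypothetical decoder: reads the digits left to right, most significant first.
function baseDecodeSketch($str, $chars) {
    $base = strlen($chars);
    $val = 0;
    foreach (str_split($str) as $char) {
        $pos = strpos($chars, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("Character not in alphabet: $char");
        }
        $val = $val * $base + $pos;
    }
    return $val;
}

$alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$code = baseEncodeSketch(125, $alphabet);
echo $code, ' ', baseDecodeSketch($code, $alphabet), "\n"; // 21 125
```

Passing the alphabet as a parameter keeps the two functions in sync, so removing confusing characters such as 0 and O only requires changing one string.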
The main problem with Marcel's solution is that it uses a zero digit as a placeholder. By converting between bases, inevitably the numeral chosen to represent 0 can't appear at the front of the converted number.
For example, if you convert base 10 integers to base 4 using "ABCD" with the provided mechanism, there is no way to obtain output that starts with the letter "A", since that represents a zero in the new base and won't prefix the number. You might expect 4 to be "AA" (the next code after "A" through "D"), but instead it is "BA". There is no way to coerce that algorithm into producing "AA", because it would be like writing "00" in decimal, which has the same value as "0".
Here's an alternate solution in PHP that uses the entire gamut:
function encode($n, $alphabet = 'ABCD') {
    $base = strlen($alphabet);
    $output = '';
    // Emit one letter, then subtract 1 after each division so that the first
    // letter of the alphabet can also appear as a leading digit.
    do {
        $output = $alphabet[$n % $base] . $output;
        $n = (int) floor($n / $base) - 1;
    } while ($n >= 0);
    return $output;
}
function decode($code, $alphabet = 'ABCD') {
    $n = 0;
    $code = str_split($code);
    $unit = 1;
    // Compare against null explicitly so that a "0" letter does not end the loop.
    while (($letter = array_pop($code)) !== null) {
        $n += (strpos($alphabet, $letter) + 1) * $unit;
        $unit = $unit * strlen($alphabet);
    }
    return $n - 1;
}
echo encode(25); // should output "ABB"
echo decode('ABB'); // should output 25
Change/pass the second parameter to a list of characters to use instead of the short 4-character dictionary of "ABCD".
All you need to do is convert between base systems: base 10 to base 62.
https://github.com/infinitas/infinitas/blob/dev/core/short_urls/models/short_url.php