I saw this question, and it gave me this idea.
Is there an efficient way to invert a matrix in PHP?
EDIT
Ideally with a demo.
You could use the PEAR package Math_Matrix for this.
This package claims to be able to do what you are looking for.
There is this open source PHP Library that is able to invert a Matrix.
All you need to do is
<?php
include_once ("Matrix.class.php");
$matrixA = new Matrix(array(array(0, 1), array(2, 6)));
echo $matrixA->getInverse()->getMathMl();
?>
Here is tested code: https://gist.github.com/unix1/7510208
Only the identity_matrix() and invert() functions are needed.
Yes, there are several ways to accomplish this in PHP. There are a handful of available libraries. Alternatively, you could maintain your own class and customize it as needed. Here is an excerpt from our in-house library, based on the mathematical method described in the link below. There is a demonstration at the end of the class for further reference.
https://www.intmath.com/matrices-determinants/inverse-matrix-gauss-jordan-elimination.php
class MatrixLibrary
{
//Gauss-Jordan elimination method for matrix inverse
public function inverseMatrix(array $matrix)
{
//TODO $matrix validation
$matrixCount = count($matrix);
$identityMatrix = $this->identityMatrix($matrixCount);
$augmentedMatrix = $this->appendIdentityMatrixToMatrix($matrix, $identityMatrix);
$inverseMatrixWithIdentity = $this->createInverseMatrix($augmentedMatrix);
$inverseMatrix = $this->removeIdentityMatrix($inverseMatrixWithIdentity);
return $inverseMatrix;
}
private function createInverseMatrix(array $matrix)
{
$numberOfRows = count($matrix);
for($i=0; $i<$numberOfRows; $i++)
{
$matrix = $this->oneOperation($matrix, $i, $i);
for($j=0; $j<$numberOfRows; $j++)
{
if($i !== $j)
{
$matrix = $this->zeroOperation($matrix, $j, $i, $i);
}
}
}
$inverseMatrixWithIdentity = $matrix;
return $inverseMatrixWithIdentity;
}
private function oneOperation(array $matrix, $rowPosition, $zeroPosition)
{
if($matrix[$rowPosition][$zeroPosition] !== 1)
{
$numberOfCols = count($matrix[$rowPosition]);
if($matrix[$rowPosition][$zeroPosition] == 0) // loose comparison so that a float 0.0 pivot is caught as well
{
$divisor = 0.0000000001;
$matrix[$rowPosition][$zeroPosition] = 0.0000000001;
}
else
{
$divisor = $matrix[$rowPosition][$zeroPosition];
}
for($i=0; $i<$numberOfCols; $i++)
{
$matrix[$rowPosition][$i] = $matrix[$rowPosition][$i] / $divisor;
}
}
return $matrix;
}
private function zeroOperation(array $matrix, $rowPosition, $zeroPosition, $subjectRow)
{
$numberOfCols = count($matrix[$rowPosition]);
if($matrix[$rowPosition][$zeroPosition] !== 0)
{
$numberToSubtract = $matrix[$rowPosition][$zeroPosition];
for($i=0; $i<$numberOfCols; $i++)
{
$matrix[$rowPosition][$i] = $matrix[$rowPosition][$i] - $numberToSubtract * $matrix[$subjectRow][$i];
}
}
return $matrix;
}
private function removeIdentityMatrix(array $matrix)
{
$inverseMatrix = array();
$matrixCount = count($matrix);
for($i=0; $i<$matrixCount; $i++)
{
$inverseMatrix[$i] = array_slice($matrix[$i], $matrixCount);
}
return $inverseMatrix;
}
private function appendIdentityMatrixToMatrix(array $matrix, array $identityMatrix)
{
//TODO $matrix & $identityMatrix compliance validation (same number of rows/columns, etc)
$augmentedMatrix = array();
for($i=0; $i<count($matrix); $i++)
{
$augmentedMatrix[$i] = array_merge($matrix[$i], $identityMatrix[$i]);
}
return $augmentedMatrix;
}
public function identityMatrix(int $size)
{
//TODO validate $size
$identityMatrix = array();
for($i=0; $i<$size; $i++)
{
for($j=0; $j<$size; $j++)
{
if($i == $j)
{
$identityMatrix[$i][$j] = 1;
}
else
{
$identityMatrix[$i][$j] = 0;
}
}
}
return $identityMatrix;
}
}
$matrix = array(
array(11, 3, 12),
array(8, 7, 10),
array(13, 14, 15),
);
$matrixLibrary = new MatrixLibrary();
$inverseMatrix = $matrixLibrary->inverseMatrix($matrix);
print_r($inverseMatrix);
/*
Array
(
[0] => Array
(
[0] => 0.33980582524272
[1] => -1.1941747572816
[2] => 0.52427184466019
)
[1] => Array
(
[0] => -0.097087378640777
[1] => -0.087378640776699
[2] => 0.13592233009709
)
[2] => Array
(
[0] => -0.20388349514563
[1] => 1.1165048543689
[2] => -0.51456310679612
)
)
*/
/**
* matrix_inverse
*
* Matrix Inverse
* Gauss-Jordan Elimination Method
* Reduced Row Echelon Form (RREF)
*
* In linear algebra an n-by-n (square) matrix A is called invertible (some
* authors use nonsingular or nondegenerate) if there exists an n-by-n matrix B
* such that AB = BA = I_n, where I_n denotes the n-by-n identity matrix and the
* multiplication used is ordinary matrix multiplication. If this is the case,
* then the matrix B is uniquely determined by A and is called the inverse of A,
* denoted by A^-1. It follows from the theory of matrices that if AB = I for
* finite square matrices A and B, then also BA = I. Non-square matrices
* (m-by-n matrices for which m != n) do not have an inverse. However, in some
* cases such a matrix may
* have a left inverse or right inverse. If A is m-by-n and the rank of A is
* equal to n, then A has a left inverse: an n-by-m matrix B such that BA = I.
* If A has rank m, then it has a right inverse: an n-by-m matrix B such that
* AB = I.
*
* A square matrix that is not invertible is called singular or degenerate. A
* square matrix is singular if and only if its determinant is 0. Singular
* matrices are rare in the sense that if you pick a random square matrix over
* a continuous uniform distribution on its entries, it will almost surely not
* be singular.
*
* While the most common case is that of matrices over the real or complex
* numbers, all these definitions can be given for matrices over any commutative
* ring. However, in this case the condition for a square matrix to be
* invertible is that its determinant is invertible in the ring, which in
* general is a much stricter requirement than being nonzero. The conditions for
* existence of left-inverse resp. right-inverse are more complicated since a
* notion of rank does not exist over rings.
*/
public function matrix_inverse($m1)
{
$rows = $this->rows($m1);
$cols = $this->columns($m1);
if ($rows != $cols)
{
die("Matrix is not square. Cannot be inverted.");
}
$m2 = $this->eye($rows);
for ($j = 0; $j < $cols; $j++)
{
$factor = $m1[$j][$j];
if ($this->debug)
{
fms_writeln('Divide Row [' . $j . '] by ' . $m1[$j][$j] . ' (to give us a "1" in the desired position):');
}
$m1 = $this->rref_div($m1, $j, $factor);
$m2 = $this->rref_div($m2, $j, $factor);
if ($this->debug)
{
$this->disp2($m1, $m2);
}
for ($i = 0; $i < $rows; $i++)
{
if ($i != $j)
{
$factor = $m1[$i][$j];
if ($this->debug)
{
$this->writeln('Row[' . $i . '] - ' . number_format($factor, 4) . ' × Row[' . $j . '] (to give us 0 in the desired position):');
}
$m1 = $this->rref_sub($m1, $i, $factor, $j);
$m2 = $this->rref_sub($m2, $i, $factor, $j);
if ($this->debug)
{
$this->disp2($m1, $m2);
}
}
}
}
return $m2;
}
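Both implementations above divide by the diagonal element as it stands, so a matrix with a zero on the diagonal (such as the 2x2 example earlier in this thread) needs extra care. A minimal sketch of the same Gauss-Jordan idea with partial pivoting (row swapping) follows. The function name invertMatrix() and the $epsilon tolerance are my own assumptions for illustration, not part of either answer.

```php
<?php
// Sketch only: Gauss-Jordan inversion with partial pivoting.
// invertMatrix() and $epsilon are illustrative names, not taken from the answers above.
function invertMatrix(array $a, float $epsilon = 1e-12): ?array
{
    $n = count($a);

    // Build the augmented matrix [A | I].
    for ($i = 0; $i < $n; $i++) {
        if (count($a[$i]) !== $n) {
            return null; // not a square matrix
        }
        for ($j = 0; $j < $n; $j++) {
            $a[$i][] = ($i === $j) ? 1.0 : 0.0;
        }
    }

    for ($col = 0; $col < $n; $col++) {
        // Choose the row (at or below the diagonal) with the largest absolute value in this column.
        $pivot = $col;
        for ($row = $col + 1; $row < $n; $row++) {
            if (abs($a[$row][$col]) > abs($a[$pivot][$col])) {
                $pivot = $row;
            }
        }
        if (abs($a[$pivot][$col]) < $epsilon) {
            return null; // singular, or numerically too close to singular
        }

        // Swap the pivot row into place.
        [$a[$col], $a[$pivot]] = [$a[$pivot], $a[$col]];

        // Scale the pivot row so the diagonal entry becomes 1.
        $divisor = $a[$col][$col];
        for ($j = 0; $j < 2 * $n; $j++) {
            $a[$col][$j] /= $divisor;
        }

        // Subtract multiples of the pivot row to clear the column in all other rows.
        for ($row = 0; $row < $n; $row++) {
            if ($row === $col) {
                continue;
            }
            $factor = $a[$row][$col];
            for ($j = 0; $j < 2 * $n; $j++) {
                $a[$row][$j] -= $factor * $a[$col][$j];
            }
        }
    }

    // The right half of the augmented matrix now holds the inverse.
    return array_map(fn(array $row) => array_slice($row, $n), $a);
}

print_r(invertMatrix([[0, 1], [2, 6]])); // rows: [-3, 0.5] and [1, 0]
```

Returning null for a singular matrix is just one possible convention; throwing an exception would work equally well.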
Related
Let's say we have the following data in an array:
$data1 = [3,5,7,6,8,9,13,14,17,15,16,16,16,18,22,20,21,20];
$data2 = [23,18,17,17,16,15,16,14,15,10,11,7,4,5];
For $data1 we can say the data is increasing, while for $data2 it is decreasing.
Using PHP, how can you tell whether the data is increasing or decreasing, and is there a way to measure
the rate of increase or decrease, e.g. as a percentage?
Edit
From the comments I received I got an idea and here is what I have tried.
What I want to achieve:
I want to know whether the trend of the incoming data is upwards or downwards.
I also want to know the rate at which the data is rising or dropping. For example, $data1 = [1,3,5]; is not the same as $data2 = [1, 20, 55];. Both are increasing, but $data2 rises much faster than $data1.
function increaseOrDecrease($streams = []) : array
{
// Example input (increasing): [3,5,7,6,8,9,13,14,17,15,16,16,16,18,22,20,21,20]
// Example input (decreasing): [23,18,17,17,16,15,16,14,15,10,11,7,4,5]
$first = 0;
$diff = [];
foreach ($streams as $key => $number) {
if ($key != 0) {
$diff[] = $number - $first;
}
$first = $number;
}
$avgdifference = array_sum($diff)/count($diff); //Get the average
$side = $avgdifference > 0 ? 'UP' : 'DOWN';
$avgsum = array_sum($streams)/count($streams);
$percentage = abs($avgdifference)/$avgsum * 100;
if ($side == 'UP') {
$data = [
'up' => true,
'percent' => $percentage,
];
}else {
$data = [
'up' => false,
'percent' => $percentage,
];
}
return $data;
}
I would like some help to refactor this code or the best approach to solve the issue.
There are several ways to analyze data and extract a trend. The classic method is least squares, which fits a straight line through the data. The method computes the slope and the intercept of the line, and the trend is simply the slope.
The formulas are given here.
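For reference, the usual closed-form expressions for a least-squares line y = α + βx through n points (x_i, y_i) are written out here. The symbols α (intercept) and β (slope) are chosen to match the variable names in the PHP implementation; this is the textbook form and not necessarily the notation of the linked page.

```latex
\beta = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
             {n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2},
\qquad
\alpha = \frac{1}{n} \sum_{i=1}^{n} y_i - \beta \cdot \frac{1}{n} \sum_{i=1}^{n} x_i
```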
A PHP implementation is the following:
function linearRegression($x, $y)
{
$x_sum = array_sum($x);
$y_sum = array_sum($y);
$xy_sum = 0;
$x2_sum = 0;
$n = count($x);
for($i=0;$i<$n;$i++)
{
$xy_sum += $x[$i] * $y[$i];
$x2_sum += $x[$i] * $x[$i];
}
$beta = ($n * $xy_sum - $x_sum * $y_sum) / ($n * $x2_sum - $x_sum * $x_sum);
$alpha = $y_sum / $n - $beta * $x_sum / $n;
return ['alpha' => $alpha, 'beta' => $beta];
}
function getTrend($data)
{
$x = range(1, count($data)); // [1, 2, 3, ...]
$fit = linearRegression($x, $data);
return $fit['beta']; // slope of fitted line
}
Examples:
echo getTrend([1, 2, 3]); // 1
echo getTrend([1, 0, -1]); // -1
echo getTrend([3,5,7,6,8,9,13,14,17,15,16,16,16,18,22,20,21,20]); // 1.065
echo getTrend([23,18,17,17,16,15,16,14,15,10,11,7,4,5]); // -1.213
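The question also asks for a rate as a percentage. One possible convention (my assumption, not something the answer above defines) is to divide the fitted slope by the mean of the data, which gives "percent of the average level per step". The helper name trendPercent() is hypothetical.

```php
<?php
// Sketch only: least-squares slope expressed as a percentage of the mean value.
// trendPercent() is an illustrative helper; the normalization by the mean is an assumption.
function trendPercent(array $data): float
{
    $data = array_values($data);
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points.');
    }

    $x = range(1, $n); // [1, 2, 3, ...]
    $xSum = array_sum($x);
    $ySum = array_sum($data);
    $xySum = 0;
    $x2Sum = 0;
    foreach ($data as $i => $y) {
        $xySum += $x[$i] * $y;
        $x2Sum += $x[$i] * $x[$i];
    }

    // Same slope formula as the linear regression above.
    $slope = ($n * $xySum - $xSum * $ySum) / ($n * $x2Sum - $xSum * $xSum);

    $mean = $ySum / $n;
    if ($mean == 0) {
        throw new InvalidArgumentException('Mean is zero; a relative rate is undefined.');
    }

    return $slope / abs($mean) * 100;
}

printf("%.2f%%\n", trendPercent([1, 3, 5]));   // slope 2 over mean 3
printf("%.2f%%\n", trendPercent([1, 20, 55])); // slope 27 over mean 25.33
```

A positive result means the data trends upwards and a negative one downwards, so the sign and the magnitude come from a single number.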
You are asking for a type of data structure that can represent ascending as well as descending data. PHP has SplMinHeap and SplMaxHeap for this purpose. These built-in classes make life easier when dealing with ascending or descending datasets.
A quick example ...
<?php
declare(strict_types=1);
namespace Marcel;
use SplMinHeap;
$numbers = [128, 32, 64, 8, 256];
$heap = new SplMinHeap();
foreach ($numbers as $number) {
$heap->insert($number);
}
$heap->rewind();
while($heap->valid()) {
// 8, 32, 64, 128, 256
echo $heap->current() . PHP_EOL;
$heap->next();
}
The SplMinHeap class automatically keeps the minimum on top, so you can use a heap instead of an unstructured array. The same goes for SplMaxHeap, which keeps the highest value on top.
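For completeness, here is the SplMaxHeap counterpart as a minimal sketch, using the same sample numbers. Note that extracting from (or iterating over) an SPL heap removes the elements as you go, so the heap is empty afterwards.

```php
<?php
// Sketch: SplMaxHeap hands back the values from largest to smallest.
$heap = new SplMaxHeap();
foreach ([128, 32, 64, 8, 256] as $number) {
    $heap->insert($number);
}

$descending = [];
while (!$heap->isEmpty()) {
    $descending[] = $heap->extract(); // removes and returns the current maximum
}

echo implode(', ', $descending) . PHP_EOL; // 256, 128, 64, 32, 8
```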
Finding the differences
If you want to iterate over all the data and find the difference between each value and the next, you just have to iterate over the heap. It's ordered anyway.
$heap->rewind();
$smallest = $heap->current();
while($heap->valid()) {
// 8, 32, 64, 128, 256
$current = $heap->current();
echo $current . PHP_EOL;
// 0 (8 - 8), 24 (32 - 8), 32 (64 - 32), 64 (128 - 64), 128 (256 - 128)
echo "difference to the value before: " . ($current - $smallest) . PHP_EOL;
$smallest = $current;
$heap->next();
}
I would do something simple like this:
$data1 = [3,5,7,6,8,9,13,14,17,15,16,16,16,18,22,20,21,20];
$data2 = [23,18,17,17,16,15,16,14,15,10,11,7,4,5];
getTrend($data1); // returns "up"
getTrend($data2); // returns "down"
function getTrend($arr)
{
$up = 0;
$down = 0;
$prev = null;
foreach($arr as $val)
{
// accumulate the total rise and the total fall between neighbouring values
if($prev !== null && $val > $prev)
{
$up += $val-$prev;
}
else if($prev !== null && $val < $prev)
{
$down += $prev-$val;
}
$prev = $val;
}
if($up > $down)
{
return "up";
}
else if($down > $up)
{
return "down";
}
else {
return "flat";
}
}
I have a number of participants and a number of groups, and I have to organize the participants into groups.
Example:
10/3 = 3, 3 and 4.
10/4 = 2, 2, 2 and 4.
23/4 = 6, 6, 6 and 5.
I tried array_chunk, using the rounded result of participants/groups as the size parameter, but it did not work well.
Edit with my problem solved.
$groups = $this->request->data['phases_limit'];
$classified_lmt = $this->request->data['classified_limit'];
$participants = count($game->user_has_game);
$participants_lmt = floor($participants / $groups);
$remainders = $participants % $groups;
if ($groups > $participants) {
throw new \Exception("There are more groups than participants");
}
for ($i=0; $i < $groups; $i++) {
$p = $this->Phase->newEntity();
$p->name = 'Group #' . $game->id;
$p->game_id = $game->id;
$p->classified_limit = $classified_lmt;
$this->Phase->save($p);
// add the number of participants per group
for ($j=0; $j < $participants_lmt; $j++) {
$user_has_game = array_pop($game->user_has_game);
$g = $this->Phase->GroupUserHasGame->newEntity();
$g->group_id = $p->id;
$g->user_has_game_id = $user_has_game->id;
$this->Phase->GroupUserHasGame->save($g);
}
// check if it is the last iteration
if (($groups - 1) == $i) {
// add the remainders on the last iteration
for ($k=0; $k < $remainders; $k++) {
$user_has_game = array_pop($game->user_has_game);
$g = $this->Phase->GroupUserHasGame->newEntity();
$g->group_id = $p->id;
$g->user_has_game_id = $user_has_game->id;
$this->Phase->GroupUserHasGame->save($g);
}
}
}
Have you tried the modulus operator? It gives you the remainder after dividing the numerator by the denominator.
For example, if you want to split 10 people into 3 groups:
floor(10 / 3); // 3 people per group
10 % 3; // 1 person left over to add to an existing group
Edit: I included the following function as part of my original answer. It doesn't work for the OP, but I am leaving it here because it may help others.
function group($total, $groups)
{
// Calculate participants per group and remainder
$group = floor($total / $groups);
$remainder = $total % $groups;
// Prepare groupings and append remaining participant to first group
$groupings = array_fill(0, $groups, $group);
$groupings[0] += $remainder;
return $groupings;
}
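If you would rather spread the leftover participants across groups than pile them all onto one, a small variation hands out one extra participant per group until the remainder is used up. This is a sketch under my own assumptions; distributeEvenly() is a hypothetical name. With it, 23 participants in 4 groups come out as 6, 6, 6 and 5.

```php
<?php
// Sketch only: spread the remainder one participant at a time over the first groups.
// distributeEvenly() is an illustrative name, not part of the answers above.
function distributeEvenly(int $total, int $groups): array
{
    if ($groups < 1 || $total < $groups) {
        throw new InvalidArgumentException('Need at least one group and no more groups than participants.');
    }

    $base = intdiv($total, $groups); // minimum size of every group
    $remainder = $total % $groups;   // participants still to be placed

    $sizes = array_fill(0, $groups, $base);
    for ($i = 0; $i < $remainder; $i++) {
        $sizes[$i]++; // the first $remainder groups get one extra participant
    }

    return $sizes;
}

echo implode(', ', distributeEvenly(23, 4)) . PHP_EOL; // 6, 6, 6, 5
echo implode(', ', distributeEvenly(10, 3)) . PHP_EOL; // 4, 3, 3
```

Group sizes never differ by more than one this way.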
Not sure there are off-the-shelf libraries for that. I just implemented something similar in Java if you need some ideas:
public List<Integer> createDistribution(int population_size, int groups) {
List<Integer> lst = new LinkedList<>();
int total = 0;
for (double d : createDistribution(groups)) {
// this makes smaller groups first int i = new Double(population_size * d).intValue();
int i = (int)Math.round(population_size * d);
total += i;
lst.add(i);
}
// Fix rounding errors
while (total < population_size) {
int i = r.nextInt(groups);
lst.set(i, lst.get(i) + 1);
total += 1;
}
while (total > population_size) {
int i = r.nextInt(groups);
if (lst.get(i) > 0) {
lst.set(i, lst.get(i) - 1);
total -= 1;
}
}
return lst;
}
I'm working on Advent of Code as a way to practice TDD and learn PHPSpec. I'm stuck on Day 17, which is essentially the coin change puzzle.
The elves bought too much eggnog again - 150 liters this time. To fit it all into your refrigerator, you'll need to move it into smaller containers. You take an inventory of the capacities of the available containers.
For example, suppose you have containers of size 20, 15, 10, 5, and 5 liters. If you need to store 25 liters, there are four ways to do it:
15 and 10
20 and 5 (the first 5)
20 and 5 (the second 5)
15, 5, and 5
Filling all containers entirely, how many different combinations of containers can exactly fit all 150 liters of eggnog?
Here's my code. I wrote a test using the examples above. The combinations method should return 4 per the example, but it returns 3. It doesn't seem to be able to handle the fact that there's more than one container of size 5 litres.
Any suggestions please?
<?php
namespace Day17;
class Calculator
{
private $containers = [];
public function combinations($total, array $containers)
{
$combinations = $this->iterate($total, $containers);
return count($combinations);
}
/**
* http://stackoverflow.com/questions/12837431/find-combinations-sum-of-elements-in-array-whose-sum-equal-to-a-given-number
*
* @param $array
* @param array $combinations
* @param array $temp
* @return array
*/
private function iterate($sum, $array, $combinations = [], $temp = [])
{
if (count($temp) && !in_array($temp, $combinations)) {
$combinations[] = $temp;
}
$count = count($array);
for ($i = 0; $i < $count; $i++) {
$copy = $array;
$elem = array_splice($copy, $i, 1);
if (count($copy) > 0) {
$add = array_merge($temp, array($elem[0]));
sort($add);
$combinations = $this->iterate($sum, $copy, $combinations, $add);
} else {
$add = array_merge($temp, array($elem[0]));
sort($add);
if (array_sum($combinations) == $sum) {
$combinations[] = $add;
}
}
}
return array_filter($combinations, function ($combination) use ($sum) {
return array_sum($combination) == $sum;
});
}
}
Use the array indices of the available containers, rather than their sizes, as the combination values. That way two containers of the same size still count as distinct.
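A minimal sketch of that idea: recurse over container positions so that each index is used at most once, and only count the selections whose sizes add up to the target. The function name countCombinations() is hypothetical, and the sketch assumes a zero-indexed list of positive sizes.

```php
<?php
// Sketch only: count subsets of containers (identified by index) that sum to $total.
// countCombinations() is an illustrative name; it assumes positive sizes in a 0-indexed list.
function countCombinations(array $containers, int $total, int $start = 0): int
{
    if ($total === 0) {
        return 1; // the containers chosen so far fit the amount exactly
    }
    if ($total < 0) {
        return 0; // overshot the target
    }

    $count = 0;
    $n = count($containers);
    for ($i = $start; $i < $n; $i++) {
        // Use container $i, then only consider containers after it,
        // so each index appears at most once per combination.
        $count += countCombinations($containers, $total - $containers[$i], $i + 1);
    }

    return $count;
}

echo countCombinations([20, 15, 10, 5, 5], 25) . PHP_EOL; // 4
```

Because the two 5-litre containers sit at different indices, "20 and the first 5" and "20 and the second 5" are counted separately, which is what the puzzle expects.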
Does PHP have existing functionality for irregular step ranges, is there a common solution to provide this functionality, or how can the following function be optimized?
The first function is the function I am concerned about. The second function is a real world use case that generates an array to populate values for a function that outputs a select dropdown for HTML.
<?php
function range_multistep($min, $max, Array $steps, $jmp = 10) {
$steps = array_unique($steps);
sort($steps, SORT_NUMERIC);
$bigstep = ($jmp > 0) ? $jmp : $jmp * -1;
$e = ($min > 0) ? floor(log($min, $bigstep)) : 0;
for (; ; $e++) {
foreach ($steps as $step) {
$jump = pow($bigstep, $e);
$num = $step * $jump;
if ($num > $max) {
break 2;
} elseif ($num >= $min) {
$arr[] = $num;
}
}
}
$arr = array_unique($arr);
sort($arr, SORT_NUMERIC);
return $arr;
}
function prices() {
$price_steps = range_multistep(50, 100000, array(5, 10, 25));
$prev_step = 0;
foreach ($price_steps as $price) {
$price_str = '$' . $prev_step . ' - $' . ($price - 1);
$price_arr[] = $price_str;
$prev_step = $price;
}
$price_arr[] = '$' . end($price_steps) . "+";
return $price_arr;
}
print_r(prices());
The result of the previous:
Array
(
[0] => $0 - $49
[1] => $50 - $99
[2] => $100 - $249
[3] => $250 - $499
[4] => $500 - $999
[5] => $1000 - $2499
[6] => $2500 - $4999
[7] => $5000 - $9999
[8] => $10000 - $24999
[9] => $25000 - $49999
[10] => $50000 - $99999
[11] => $100000+
)
Repeated addition is best replaced by multiplication, and repeated multiplication is best replaced by raising to powers -- which you've done.
I see nothing here that requires improvement assuming you don't need "bulletproof" behavior in the face of $jmp = 1 or $min >= $max badly-behaved inputs.
The for loop with the $e incrementor is effectively an endless while(1) loop.
So instead of misusing the incrementor in pow(), compute the power yourself by multiplying once per iteration. Calling pow() can be fairly expensive, so doing the calculation yourself spreads the multiplication across the iterations.
Edit: The following is a variant of your function that distributes the pow() calculation over the iterations. It also initialises variables properly (the return value was not set, for example), raises a notice if $min and $max are swapped and corrects the order, uses abs() instead of your ternary, throws an exception if an invalid value is given for log(), renames some variables, and first adds $num to the return value as a key, which spares the array_unique operation at the end:
/**
* @param int $min
* @param int $max
* @param array $steps
* @param int $jmp
* @return array range
*/
function range_multistep($min, $max, Array $steps, $jmp = 10) {
$range = array();
if (!$steps) return $range;
if ($min > $max) { // swap only when the bounds really are the wrong way round
trigger_error(__FUNCTION__.'(): Minima and Maxima mal-aligned.', E_USER_NOTICE);
list($max, $min) = array($min, $max);
}
$steps = array_unique($steps);
sort($steps, SORT_NUMERIC);
$bigstep = abs($jmp);
if ($bigstep === 0) {
throw new InvalidArgumentException(sprintf('Value %d is invalid for jmp', $jmp));
}
$initExponent = ($min > 0) ? floor(log($min, $bigstep)) : 0;
for ($multiplier = pow($bigstep, $initExponent); ; $multiplier *= $bigstep) {
foreach ($steps as $step) {
$num = $step * $multiplier;
if ($num > $max) {
break 2;
} elseif ($num >= $min) {
$range[$num] = 1;
}
}
}
$range = array_keys($range);
sort($range, SORT_NUMERIC);
return $range;
}
In case you feel experimental, it's also possible to turn the two loops (for+foreach) into one, but the readability of the code does not benefit from it:
for(
$multiplier = pow($bigstep, $initExponent),
$step = reset($steps)
;
$num = $step * $multiplier,
$num <= $max
;
# infinite array iterator:
($step=next($steps))?:
(
$step=reset($steps)
# with reset expression:
AND $multiplier *= $bigstep
)
){
if ($num >= $min)
$range[$num] = 1;
}
I think if you take care not to re-use variables (like the function parameter) and give them more readable names, improvement comes on its own.
The levenshtein function in PHP works only on strings with a maximum length of 255. What are good alternatives for computing a similarity score between sentences in PHP?
Basically, I have a database of sentences, and I want to find approximate duplicates.
The similar_text function is not giving me the results I expect. What is the easiest way to detect similar sentences like the ones below?
$ss="Jack is a very nice boy, isn't he?";
$pp="jack is a very nice boy is he";
$ss=strtolower($ss); // convert to lower case as we dont care about case
$pp=strtolower($pp);
$score=similar_text($ss, $pp);
echo "$score %\n"; // Outputs just 29 %
$score=levenshtein ( $ss, $pp );
echo "$score\n"; // Outputs '5', which indicates they are very similar. But, it does not work for more than 255 chars :(
The levenshtein algorithm has a time complexity of O(n*m), where n and m are the lengths of the two input strings. This is pretty expensive and computing such a distance for long strings will take a long time.
For whole sentences, you might want to use a diff algorithm instead, see for example: Highlight the difference between two strings in PHP
Having said this, PHP also provides the similar_text function which has an even worse complexity (O(max(n,m)**3)) but seems to work on longer strings.
I've found the Smith-Waterman-Gotoh algorithm to be the best for comparing sentences. More info in this answer. Here is the PHP code example:
class SmithWatermanGotoh
{
private $gapValue;
private $substitution;
/**
* Constructs a new Smith Waterman metric.
*
* @param gapValue
* a non-positive gap penalty
* @param substitution
* a substitution function
*/
public function __construct($gapValue=-0.5,
$substitution=null)
{
if($gapValue > 0.0) throw new Exception("gapValue must be <= 0");
//if(empty($substitution)) throw new Exception("substitution is required");
if (empty($substitution)) $this->substitution = new SmithWatermanMatchMismatch(1.0, -2.0);
else $this->substitution = $substitution;
$this->gapValue = $gapValue;
}
public function compare($a, $b)
{
if (empty($a) && empty($b)) {
return 1.0;
}
if (empty($a) || empty($b)) {
return 0.0;
}
$maxDistance = min(mb_strlen($a), mb_strlen($b))
* max($this->substitution->max(), $this->gapValue);
return $this->smithWatermanGotoh($a, $b) / $maxDistance;
}
private function smithWatermanGotoh($s, $t)
{
$v0 = [];
$v1 = [];
$t_len = mb_strlen($t);
$max = $v0[0] = max(0, $this->gapValue, $this->substitution->compare($s, 0, $t, 0));
for ($j = 1; $j < $t_len; $j++) {
$v0[$j] = max(0, $v0[$j - 1] + $this->gapValue,
$this->substitution->compare($s, 0, $t, $j));
$max = max($max, $v0[$j]);
}
// Find max
for ($i = 1; $i < mb_strlen($s); $i++) {
$v1[0] = max(0, $v0[0] + $this->gapValue, $this->substitution->compare($s, $i, $t, 0));
$max = max($max, $v1[0]);
for ($j = 1; $j < $t_len; $j++) {
$v1[$j] = max(0, $v0[$j] + $this->gapValue, $v1[$j - 1] + $this->gapValue,
$v0[$j - 1] + $this->substitution->compare($s, $i, $t, $j));
$max = max($max, $v1[$j]);
}
for ($j = 0; $j < $t_len; $j++) {
$v0[$j] = $v1[$j];
}
}
return $max;
}
}
class SmithWatermanMatchMismatch
{
private $matchValue;
private $mismatchValue;
/**
* Constructs a new match-mismatch substitution function. When two
* characters are equal a score of <code>matchValue</code> is assigned. In
* case of a mismatch a score of <code>mismatchValue</code>. The
* <code>matchValue</code> must be strictly greater than
* <code>mismatchValue</code>
*
* @param matchValue
* value when characters are equal
* @param mismatchValue
* value when characters are not equal
*/
public function __construct($matchValue, $mismatchValue) {
if($matchValue <= $mismatchValue) throw new Exception("matchValue must be > mismatchValue");
$this->matchValue = $matchValue;
$this->mismatchValue = $mismatchValue;
}
public function compare($a, $aIndex, $b, $bIndex) {
return ($a[$aIndex] === $b[$bIndex] ? $this->matchValue
: $this->mismatchValue);
}
public function max() {
return $this->matchValue;
}
public function min() {
return $this->mismatchValue;
}
}
$str1 = "Jack is a very nice boy, isn't he?";
$str2 = "jack is a very nice boy is he";
$o = new SmithWatermanGotoh();
echo $o->compare($str1, $str2);
You could try using similar_text.
It can get quite slow with 20,000+ characters (3-5 seconds), but since your example only compares sentences, it will work just fine for that usage.
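One thing to note is that you will not get 100% when comparing strings of different lengths. For example, comparing "he" with "head" gives only about a 67% match, because similar_text computes the percentage as twice the number of matching characters divided by the combined length of both strings.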
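Keep in mind that the return value of similar_text is the number of matching characters, not a percentage. The percentage is written into an optional third argument, passed by reference. A small sketch using the sentences from the question (the variable names are my own):

```php
<?php
// Sketch: similar_text returns the count of matching characters;
// the percentage is delivered through the third, by-reference argument.
$a = strtolower("Jack is a very nice boy, isn't he?");
$b = strtolower("jack is a very nice boy is he");

$matching = similar_text($a, $b, $percent);
printf("%d matching characters, %.2f%% similar\n", $matching, $percent);

// Strings of different lengths never reach 100%:
similar_text("he", "head", $shortPercent);
printf("%.2f%%\n", $shortPercent); // 66.67%
```

The 29 reported in the question is therefore the count of matching characters, and the percentage for those two sentences is considerably higher.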