similar text in PHP

I have a PHP array like this:
$array = array("foo", "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "foo", "bard", "hzallo", "w44orld");
I want to compare each element of the array with all the remaining elements.
Ex: I want to compare "foo" with "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "foo", "bard", "hzallo" and "w44orld".
Then I want to compare "bar" with "foo", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "foo", "bard", "hzallo", "w44orld"
and so on until the last element.
Let's call the element being compared $var_1 and each of the remaining elements $var_2.
If similar_text($var_1, $var_2, $percent); sets $percent to a value greater than 90, I want to print
$var_1 and every $var_2 whose matching percentage is greater than 90.
Currently I'm planning to use two loops to achieve this: an outer loop for $var_1 and an inner loop for $var_2.
Each element of the array can be up to 5000 characters long and there can be 1000 elements in the array, so my current logic is very expensive.
Is there a better way to handle this?

In order for the indexing to work, the array $arr must have unique values:
$arr = array("foo", "bar", "hallo", "world", "fooo", "bar1", "hall_o", "wor1ld", "bard", "hzallo", "w44orld");
$dexed = array();
foreach ($arr as $key => $value){
$dexed[$key]['val'] = $value;
$dexed[$key]['key'] = $key;
}
$out = array();//output
$rev = array();//reverse lookup array
$t = 80;//threshold value
$cnt = count($dexed);
$k = 0;
for ($i=0; $i<$cnt-1; $i++){
for ($j=$i+1; $j<$cnt; $j++){
//similar_text calculates differently depending on order of arguments
similar_text($dexed[$i]['val'], $dexed[$j]['val'], $percent1);
similar_text($dexed[$j]['val'], $dexed[$i]['val'], $percent2);
if (($percent1 >= $t) || ($percent2 >= $t)){
//check if value already exists under different key
if (in_array($dexed[$i]['val'], array_keys($rev))){
if ( ! in_array($dexed[$j]['val'], array_keys($rev))){
$fkey = $rev[$dexed[$i]['val']];//key found
$next = count($out[$fkey]);
$out[$fkey][$next]['val'] = $dexed[$j]['val'];
$out[$fkey][$next]['key'] = $dexed[$j]['key'];
$rev[$dexed[$j]['val']] = $fkey;
}
} else {
$out[$k][0]['val'] = $dexed[$i]['val'];
$out[$k][0]['key'] = $dexed[$i]['key'];
$out[$k][1]['val'] = $dexed[$j]['val'];
$out[$k][1]['key'] = $dexed[$j]['key'];
$rev[$dexed[$i]['val']] = $k;
$rev[$dexed[$j]['val']] = $k;
$k++;
}
}
}
}
Once $out is generated, use the following to generate an index array:
$index = array();
foreach ($out as $key => $group){
$cnt = count($group);
foreach ($group as $key2 => $word){
for ($i=0; $i<$cnt; $i++){
if ($i != $key2){
$index[$word['key']][] = $key.':'.$i;
}
}
}
}
To access all similar words for a given key (the key of the word in the original array $arr):
$key = 2;
foreach ($index[$key] as $value){
$parts = explode(':', $value);
echo '<p>'.$out[$parts[0]][$parts[1]]['val'].'</p>';
}

Unfortunately, what you're proposing is slow once the list grows beyond a trivial size, and it won't work very well. Here's an approach that might work better and is also algorithmically efficient.
First, create an inverted index of letter bigrams (http://en.wikipedia.org/wiki/Bigram). For example (assuming case insensitivity):
"foo" => ^f,fo,oo,o$
"hzallo" => ^h,hz,za,al,ll,lo,o$
You can use an underscore instead of ^ and $, which are pseudocharacters. I think they'll help you rank the results.
Now, to find similar words, you can use a typical ranking algorithm (see tf*idf and simpler token-count-based algorithms) to rank the best matches. So, given "hallo":
QUERY(^h,ha,al,ll,lo,o$) AGAINST index_of_words
and you'll get a good match for "hzallo" because ^h,al,ll,lo,o$ all match.
You'll need something like Solr or your database's TEXT index to do this unless you want to write a simple inverted index, but it's worth it. The lookup will be orders of magnitude faster than what you're entertaining, and the results will be ranked by closeness.
Afterwards, you can refine the ranked candidates with something like levenshtein, but I don't think you'll need to in many cases.
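The bigram index described above can be sketched in plain PHP. This is only a minimal illustration, not a replacement for Solr or a database index: the function names bigrams(), buildIndex() and similarWords() are made up for this example, the score is a plain count of shared bigrams (no tf*idf), and the code assumes single-byte characters because it uses strlen() and substr().

```php
<?php
// Hypothetical helper: split a word into letter bigrams, padded with "_" markers.
function bigrams(string $word): array
{
    $padded = '_' . strtolower($word) . '_';
    $grams = [];
    for ($i = 0, $n = strlen($padded) - 1; $i < $n; $i++) {
        $grams[] = substr($padded, $i, 2);
    }
    return array_unique($grams);
}

// Build the inverted index: bigram => list of positions of the words containing it.
function buildIndex(array $words): array
{
    $index = [];
    foreach ($words as $pos => $word) {
        foreach (bigrams($word) as $gram) {
            $index[$gram][] = $pos;
        }
    }
    return $index;
}

// Rank candidates by the number of bigrams they share with the query.
function similarWords(string $query, array $words, array $index): array
{
    $scores = [];
    foreach (bigrams($query) as $gram) {
        foreach ($index[$gram] ?? [] as $pos) {
            $scores[$pos] = ($scores[$pos] ?? 0) + 1;
        }
    }
    arsort($scores); // highest score first, keys (positions) preserved
    $ranked = [];
    foreach ($scores as $pos => $score) {
        $ranked[] = [$words[$pos], $score];
    }
    return $ranked;
}

$words = ["foo", "bar", "hallo", "world", "hzallo"];
$index = buildIndex($words);
$ranked = similarWords("hallo", $words, $index);
```

Only words that share at least one bigram with the query are ever touched, which is where the speed-up over comparing every pair comes from. The query word matches itself with the top score, so a caller would normally skip that entry.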

Related

Sort a flat array in recurring ascending sequences

I am trying to sort it in a repeating, sequential pattern of numerical order with the largest sets first.
Sample array:
$array = [1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3];
In the above array, the highest value is 5, which appears twice, so the first two sets would be 1,2,3,4,5; then it would revert to the set ending in the second-highest value, and so on.
Desired result:
[1,2,3,4,5,1,2,3,4,5,1,2,3,4,1,2,4]
I am pretty sure I can split the array into chunks of the integer values then cherrypick an item from each subarray sequentially until there are no remaining items, but I just feel that this is going to be poor for performance and I don't want to miss a simple trick that PHP can already handle.
Here's my attempt at a very manual, loop-based process; the idea is simply to sort the numbers into containers and then array_shift one item from each container in turn. I'm sure this is terrible and I'd love someone to do this in five lines or less :)
$array = array(1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3);
sort($array);
// Build the container array
$numbers = array_fill_keys(array_unique($array),array());
// Assignment
foreach( $array as $number )
{
$numbers[ $number ][] = $number;
}
// Worker Loop
$output = array();
while( empty( $numbers ) === false )
{
foreach( $numbers as $outer => $inner )
{
$output[] = array_shift( $numbers[ $outer ] );
if( empty( $numbers[ $outer ] ) )
{
unset( $numbers[ $outer ] );
}
}
}
var_dump( $output );
I think I'd look at this not as a sorting problem, but alternating values from multiple lists, so rather than coming up with sets of distinct numbers I'd make sets of the same number.
Since there's no difference between one 1 and another, all you actually need is to count the number of times each appears. It turns out PHP can do this for you with array_count_values.
$sets = array_count_values ($input);
Then we can make sure the sets are in order by sorting by key:
ksort($sets);
Now, we iterate round our sets, counting down how many times we've output each number. Once we've "drained" a set, we remove it from the list, and once we have no sets left, we're all done:
$output = [];
while ( count($sets) > 0 ) {
foreach ( $sets as $number => $count ) {
$output[] = $number;
if ( --$sets[$number] == 0 ) {
unset($sets[$number]);
}
}
}
This algorithm could be adapted for cases where the values are actually distinct but can be put into sets, by having the value of each set be a list rather than a count. Instead of -- you'd use array_shift, and then check if the length of the set was zero.
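A minimal sketch of that adaptation, assuming a made-up input in which distinct strings are grouped by their first letter (the data and the grouping key are hypothetical, chosen only to illustrate the idea):

```php
<?php
// Hypothetical input: distinct values grouped into sets by a shared key.
$sets = [
    'a' => ['apple', 'avocado', 'apricot'],
    'b' => ['banana'],
    'c' => ['cherry', 'cranberry'],
];
ksort($sets);

$output = [];
while (count($sets) > 0) {
    foreach (array_keys($sets) as $key) {
        // Take the next value from this set instead of decrementing a counter.
        $output[] = array_shift($sets[$key]);
        if (count($sets[$key]) === 0) {
            unset($sets[$key]); // this set is drained
        }
    }
}
```

Iterating over array_keys($sets) rather than $sets itself makes it explicit that the sets are modified while the loop runs.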
After sorting, you can do this with a single linear pass using PHP functions. Here is an optimized way to fill the data structures. It can be used for streams, generators, or anything else you can iterate over and compare.
$array = array(1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3);
sort($array);
$chunks = [];
$index = [];
foreach($array as $i){
if(!isset($index[$i])){
$index[$i]=0;
}
if(!isset($chunks[$index[$i]])){
$chunks[$index[$i]]=[$i];
} else {
$chunks[$index[$i]][] = $i;
}
$index[$i]++;
}
$result = call_user_func_array('array_merge', $chunks);
print_r($result);
<?php
$array = array(1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3);
sort($array);
while($array) {
$n = 0;
foreach($array as $k => $v) {
if($v>$n) {
$result[] = $n = $v;
unset($array[$k]);
}
}
}
echo implode(',', $result);
Output:
1,2,3,4,5,1,2,3,4,5,1,2,3,4,1,2,4
New, more elegant, more performant, more concise answer:
Create a sorting array where each number gets its own independent counter to increment. Then use array_multisort() to sort by this grouping array, then sort by values ascending.
Code:
$encounters = [];
foreach ($array as $v) {
$encounters[] = $e[$v] = ($e[$v] ?? 0) + 1;
}
array_multisort($encounters, $array);
var_export($array);
Or, in a functional style with no global variable declarations:
array_multisort(
array_map(
function($v) {
static $e;
return $e[$v] = ($e[$v] ?? 0) + 1;
},
$array
),
$array
);
var_export($array);
Old answer:
My advice is functionally identical to #El''s snippet, but is implemented in a more concise/modern/attractive fashion.
After ensuring that the input array is sorted, make only one pass over the array and push each re-encountered value into its next row of values. The $counter variable indicates which row (in $grouped) the current value should be pushed into. When finished looping and grouping, $grouped will have unique values in each row. The final step is to merge/flatten the rows (preserving their order).
Code:
$grouped = [];
$counter = [];
sort($array);
foreach ($array as $v) {
$counter[$v] = ($counter[$v] ?? -1) + 1;
$grouped[$counter[$v]][] = $v;
}
var_export(array_merge(...$grouped));

PHP get ranges of same values from array

Is there any way to get the key range of same values and make a new array?
Let's say we have an Array Like this in php :
$first_array = ['1'=>'a','2'=>'a','3'=>'a','4'=>'b','5'=>'b','6'=>'a','7'=>'a'];
How can i get this array? Is there any function for this?
$second_array = ['1-3'=>'a','4-5'=>'b','6-7'=>'a'];
Loop through it, extract the keys, generate the ranges and insert them into the new array:
$first_array = ['1'=>'a','2'=>'a','3'=>'a','4'=>'b','5'=>'b'];
$flip = array();
foreach($first_array as $key => $val) {
$flip[$val][] = $key;
}
$second_array = [];
foreach($flip as $key => $value) {
$newKey = array_shift($value).' - '.end($value);
$second_array[$newKey] = $key;
}
Output
array(2) {
["1 - 3"]=>
string(1) "a"
["4 - 5"]=>
string(1) "b"
}
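Grouping keys by value, as above, assumes that each value occurs in a single run. For the array in the question, where 'a' comes back at keys 6 and 7, a sketch that tracks consecutive runs instead might look like this (the variable names $start, $prevKey and $prevVal are chosen for this example only):

```php
<?php
$first_array = ['1'=>'a','2'=>'a','3'=>'a','4'=>'b','5'=>'b','6'=>'a','7'=>'a'];

$second_array = [];
$start = null;   // first key of the current run
$prevKey = null; // most recent key seen
$prevVal = null; // value of the current run

foreach ($first_array as $key => $val) {
    if ($start === null) {
        $start = $key;
    } elseif ($val !== $prevVal) {
        // The value changed: close the previous run and start a new one.
        $second_array[$start . '-' . $prevKey] = $prevVal;
        $start = $key;
    }
    $prevKey = $key;
    $prevVal = $val;
}
if ($start !== null) {
    $second_array[$start . '-' . $prevKey] = $prevVal; // close the final run
}
```

A run of length one produces a key such as "4-4"; whether that should be written as "4" instead is a formatting choice the question does not settle.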
Regarding your first question, you can get the range of each value using a foreach() loop.
$first_array = ['1'=>'a','2'=>'a','3'=>'a','4'=>'b','5'=>'b'];
foreach($first_array as $key=>$value)
{
//do your coding here, $key is the index of the array and $value is the value at that range, you can use that index and value to perform array manipulations
}
Regarding your second question, it is not exactly clear what you are trying to implement there. But whatever you want to do, such as creating a new array with modified indexes, can be done within this foreach() loop itself.
I hope this helps you.
If someone is still looking for an answer, here is what I did.
Given the array
$first_array = ['0'=>'a',
'1'=>'a',
'2'=>'a',
'3'=>'a',
'4'=>'a',
'5'=>'b',
'6'=>'b',
'7'=>'a',
'8'=>'a'];
I build a multidimensional array, in which each element is an array of three more elements:
[0] - The value in the first array
[1] - The key where the value starts repeating
[2] - The last key where the value stops repeating
The code
$arrayRange = [];
for($i = 0; $i < count($first_array); $i++){
if(count($arrayRange) == 0){
// The multidimensional array is still empty
$arrayRange[0] = array($first_array[$i], $i, $i);
}else{
if($first_array[$i] == $arrayRange[count($arrayRange)-1][0]){
// It's still the same value, I update the value of the last key
$arrayRange[count($arrayRange)-1][2] = $i;
}else{
// It's a new value, I insert a new array
$arrayRange[count($arrayRange)] = array($first_array[$i], $i, $i);
}
}
}
This way you get a multidimensional array like this:
$arrayRange[0] = ['a', 0, 4];
$arrayRange[1] = ['b', 5, 6];
$arrayRange[2] = ['a', 7, 8];

PHP : How to get sum of multidimensional arrays with specific keys?

I would like to know how to get the sum of some key values of multi-dimensional arrays, knowing that some keys are variables; here is an example of the situation :
The array could be written like this :
$array[$dim1][$dim2][$dim3][$dim4] = $variable_value;
$dim1, 2, 3 and 4 are arrays with dimensions, and we don't know the name of the $dim1, 2, 3 and 4.
We want the sum of all $variable_value of each dimensions, but we can't do array_sum($array[$dim1][$dim2][$dim3][$dim4]) because the $dim are not known.
The algorithm I need to find must permit me to apply filters on the sums, like "get the sum of all the $variable_value where $dim3 = $variableX...", so a function like this :
function array_sum_filter($array, $dimension, [$filter_on_key_value])
Any ideas?
Use for/foreach loops to walk through the multidimensional array, and an if statement to check whether $dim3 == $variableX...
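One way to sketch that idea is a recursive function that carries the current depth along. The signature below is only a guess at what the question's array_sum_filter() could look like: $dimension is taken to be a 1-based depth and $filter the key required at that depth, and both are assumptions of this example. It also assumes all leaves sit at the same depth; a leaf above the filtered depth would be summed unfiltered. The sample keys are invented.

```php
<?php
// Hypothetical helper: sums every leaf value. When $dimension and $filter are
// given, branches whose key at that depth (1-based) differs from $filter are skipped.
function array_sum_filter(array $array, ?int $dimension = null, $filter = null, int $depth = 1)
{
    $sum = 0;
    foreach ($array as $key => $value) {
        // Compare as strings so that numeric keys such as 2020 match "2020".
        if ($dimension === $depth && (string) $key !== (string) $filter) {
            continue;
        }
        if (is_array($value)) {
            $sum += array_sum_filter($value, $dimension, $filter, $depth + 1);
        } else {
            $sum += $value;
        }
    }
    return $sum;
}

$array = [];
$array["2020"]["north"]["red"]["x"] = 5;
$array["2020"]["north"]["blue"]["x"] = 3;
$array["2021"]["south"]["red"]["y"] = 2;

$total = array_sum_filter($array);           // every leaf
$red   = array_sum_filter($array, 3, "red"); // only where the third key is "red"
$y2020 = array_sum_filter($array, 1, "2020");
```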
You can try this using RecursiveArrayIterator and RecursiveIteratorIterator
$sum = 0;
$specific = 0;
$array = array();
$array["A"]["B"]["C"]["D"] = 5;
$array["A"]["B"]["K"]["D"] = 3;
$array["A"]["C"] = 2;
$array_obj = new RecursiveIteratorIterator(new RecursiveArrayIterator($array));
foreach ( $array_obj as $key => $value ) {
$sum += $value;
if ($key == "D")
$specific += $value;
}
echo $sum, PHP_EOL;
echo $specific, PHP_EOL;
Output
10
8

Exploding Arrays in PHP While Keeping the Original Key

How can I do the following without lots of complicated code?
Explode each value of an array in PHP (I sort of know how to do this step)
Discard the first part
Keep the original key for the second part (I know there will be only two parts)
By this, I mean the following:
$array[1]=blue,green
$array[2]=yellow,red
becomes
$array[1]=green //it exploded [1] into blue and green and discarded blue
$array[2]=red // it exploded [2] into yellow and red and discarded yellow
I just realized, could I do this with a for...each loop? If so, just reply yes. I can code it once I know where to start.
given this:
$array[1] = "blue,green";
$array[2] = "yellow,red";
Here's how to do it:
foreach ($array as $key => $value) {
$temp = explode(",", $value, 2); // makes sure there's only 2 parts
$array[$key] = $temp[1];
}
Another way you could do it would be like this:
foreach ($array as $key => $value) {
$array[$key] = preg_replace("/^.+?,/", "", $value); // strip everything up to and including the first comma
}
... or use a combination of substr() and strpos()
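A minimal sketch of the substr()/strpos() variant, assuming that values without a comma should be left untouched (the question does not say what should happen to them):

```php
<?php
$array = [1 => "blue,green", 2 => "yellow,red"];

foreach ($array as $key => $value) {
    $pos = strpos($value, ',');
    // strpos() returns false when there is no comma; leave such values unchanged.
    if ($pos !== false) {
        $array[$key] = substr($value, $pos + 1);
    }
}
```

The strict !== false check matters because a comma at position 0 would otherwise be mistaken for "not found".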
Try this:
$arr = explode(',','a,b,c');
unset($arr[0]);
Although, really, what you're asking doesn't make sense. If you know there are two parts, you probably want something closer to this:
list(,$what_i_want) = explode('|','A|B',2);
foreach ($array as $k => &$v) {
$v = (array) explode(',', $v);
$v = (!empty($v[1])) ? $v[1] : $v[0];
}
The array you start with:
$array[1] = "blue,green";
$array[2] = "yellow,red";
One way of coding it:
function reduction($values)
{
// Assumes the last part is what you want (regardless of how many you have.)
$parts = explode(",", $values); // array_pop() takes its argument by reference, so it needs a variable
return array_pop($parts);
}
$prime = array_map('reduction', $array);
Note: array_map() returns a new array, $prime, and leaves $array unchanged.
The keys are preserved, so $prime[1] is "green" and $prime[2] is "red".

foreach php statement

I need to combine two foreach statement into one for example
foreach ($categories_stack as $category)
foreach ($page_name as $value)
I need to add these into the same foreach statement
Is this possible if so how?
(I am not sure I have understood your question completely. I am assuming that you want to iterate through the two lists in parallel)
You can do it using a for loop as follows:
$n = min(count($categories_stack), count($page_name));
for($c = 0; $c < $n; $c = $c + 1){
$category = $categories_stack[$c];
$value = $page_name[$c];
...
}
To achieve the same with foreach you need a function similar to Python's zip() function.
In Python, it would be :
for categories_stack, pagename in zip(categories, values):
print categories_stack, pagename
PHP has no function named zip(), but array_map() with null as the callback pairs up the elements of several arrays in the same way. Otherwise, write such a function on your own or go with the for loop solution.
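A minimal sketch of zip-style iteration in PHP using array_map() with a null callback. The sample values are invented for illustration, and the [$category, $value] destructuring in foreach needs PHP 7.1 or later.

```php
<?php
// Hypothetical sample data; both arrays are assumed to be plain lists.
$categories_stack = ["news", "sports", "tech"];
$page_name = ["index", "scores", "reviews"];

// With null as the callback, array_map() pairs up elements by position.
$pairs = array_map(null, $categories_stack, $page_name);

foreach ($pairs as [$category, $value]) {
    echo $category, ' => ', $value, PHP_EOL;
}
```

Unlike Python's zip(), this does not stop at the shorter array: array_map() pads the shorter one with null, so unequal lengths need an explicit check.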
You can do nested foreachs if that's what you want. But without knowing more of your data, it's impossible to say if this helps:
foreach ($categories_stack as $category) {
foreach ($page_name as $value) {
}
}
Probably you want to print out all pages in a category? That probably won't work, so can you give a bit more info on what the arrays look like and how they relate to each other?
This loop will continue to the length of the longest array and return null for where there are no matching elements in either of the arrays. Try it out!
$a = array(1 => "a",25 => "b", 10 => "c",99=>"d");
$b = array(15=>1,5=>2,6=>3);
$ao = new ArrayObject($a);
$bo = new ArrayObject($b);
$ai = $ao->getIterator();
$bi = $bo->getIterator();
for (
$ai->rewind(),$bi->rewind(),$av = $ai->current(),$bv = $bi->current();
list($av,$bv) =
array(
($ai->valid() ? $ai->current() : null),
($bi->valid() ? $bi->current() : null)
),
($ai->valid() || $bi->valid());
($ai->valid() ? $ai->next() : null),($bi->valid() ? $bi->next() : null))
{
echo "\$av = $av\n";
echo "\$bv = $bv\n";
}
I cannot really tell from the question exactly how you want to traverse the two arrays. For a nested foreach you simply write
foreach ($myArray as $k => $v) {
foreach ($mySecondArray as $kb => $vb) {
}
}
However you can do all sorts of things with some creative use of callback functions. In this case an anonymous function returning two items from each array on each iteration. It's then easy to use the iteration value as an array or split it into variables using list() as done below.
This also has the added benefit of working regardless of key structure. It's purely based on the ordering of array elements. Just use the appropriate sorting function if the elements are out of order.
It does not worry about the length of the arrays as there is no error reported, so make sure you keep an eye out for empty values.
$a = array("a","b","c");
$b = array(1,2,3);
foreach (
array_map(
// create_function() was removed in PHP 8; an anonymous function does the same job
function ($a, $b) { return array($a, $b); }
,$a,$b
)
as $value
)
{
list($a,$b) = $value;
echo "\$a = $a\n";
echo "\$b = $b\n";
}
Output
$a = a
$b = 1
$a = b
$b = 2
$a = c
$b = 3
Here's another one for you that stops as soon as either of the lists ends, the same as using min(count($a), count($b)). Useful if you have arrays of the same length. If someone can make it continue to max(count($a), count($b)), let me know.
$ao = new ArrayObject($a);
$bo = new ArrayObject($b);
$ai = $ao->getIterator();
$bi = $bo->getIterator();
for (
$ai->rewind(),$bi->rewind();
$ai->valid() && $bi->valid(); // stop as soon as either iterator is exhausted
$ai->next(),$bi->next())
{
$av = $ai->current();
$bv = $bi->current();
echo "\$av = $av\n";
echo "\$bv = $bv\n";
}
This is where the venerable for loop comes in handy:
for(
$i = 0,
$n = sizeof($categories_stack),
$m = sizeof($page_name);
$i < $n && $i < $m;
$i++
) {
$category = $categories_stack[$i];
$value = $page_name[$i];
// do stuff here ....
}
Surely you can just merge the arrays before looping?
$data = array_merge($categories_stack, $page_name);
foreach($data AS $item){
...
}
Do the array elements have a direct correspondence with one another, i.e. is there an element in $page_name for each element in $categories_stack? If so, just iterate over the keys and values (assuming they have the same keys):
foreach ($categories_stack as $key => $value)
{
$category = $value;
$page = $page_name[$key];
// ...
}
Could you just nest them with variables outside the scope of the foreach, or perhaps store the content as an array similar to a KVP setup? My answer is vague, but I'm not really sure what you're trying to accomplish.
