Delete duplicate words in array with sentences in PHP

I have a string array with words and sentences included.
For example:
array("dog","cat","the dog is running","some other text","some","text")
And I want to remove duplicate words, leaving only unique words in it. I want to remove these words even in sentences.
The result should look like:
array("dog","cat","the is running","other","some","text")
I tried the array_unique function but it didn't work.

You can loop over the array, split each element with explode, collect the words with array_push, and then call array_unique on the result:
$res = [];
foreach($arr as $e) {
array_push($res, ...explode(" ", $e));
}
print_r(array_unique($res));
Reference:
array_push, explode, array_unique
Live example: 3v4l
If you want to keep the sentences use:
$arr = array("dog","cat","the dog is running","some other text","some","text");
// sort first to get the shortest sentence first
usort($arr, function ($a, $b) {return count(explode(" ", $a)) - count(explode(" ", $b)); });
$words = [];
$res = [];
foreach($arr as $e) {
$res[] = trim(preg_replace('/\s+/', ' ', strtr($e, $words))); // strip already-seen words, then collapse the leftover spaces
foreach(explode(" ", $e) as $w)
$words[$w] = ''; // mark each word as seen by mapping it to an empty string
}
print_r($res);

This solution is not pretty, but it should get the job done and handle some of the edge cases at hand. I'm assuming that no more than one space separates words in a sentence string and that you want to preserve the original ordering.
The approach is to walk the array twice, once to filter out duplicate single words, then once again to filter out duplicate words in sentences. This guarantees priority for single words. Finally, ksort the array (this is the ugly part from a time complexity standpoint: everything is O(max_len_sentence * n) up until now).
$arr = ["dog","cat","the dog is running","some other text","some","text"];
$seen = [];
$result = [];
foreach ($arr as $i => $e) {
if (preg_match("/^\w+$/", $e) &&
!array_key_exists($e, $seen)) {
$result[$i] = $e;
$seen[$e] = 1;
}
}
foreach ($arr as $i => $e) {
$words = explode(" ", $e);
if (count($words) > 1) {
$filtered = [];
foreach ($words as $word) {
if (!array_key_exists($word, $seen)) {
$seen[$word] = 0;
}
if (++$seen[$word] < 2) {
$filtered[]= $word;
}
}
if ($filtered) {
$result[$i] = implode(" ", $filtered);
}
}
}
ksort($result);
$result = array_values($result);
print_r($result);
Output
Array
(
[0] => dog
[1] => cat
[2] => the is running
[3] => other
[4] => some
[5] => text
)

Related

Parse formatted strings containing 3 delimiters to create multiple flat arrays

I have strings in following format:
$strings[1] = 'cat:others;id:4,9,13';
$strings[2] = 'id:4,9,13;cat:electric-products';
$strings[3] = 'id:4,9,13;cat:foods;';
$strings[4] = 'cat:drinks,foods;';
where cat means category and id is identity number of a product.
I want to split these strings and convert into arrays $cats = array('others'); and $ids = array('4','9','13');
I know that it can be done by foreach and explode function through multiple steps. I think I am somewhere near, but the following code does not work.
Also, I wonder if it can be done by preg_match or preg_split in fewer steps. Or any other simpler method.
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
foreach($temps as $temp) {
$tempnest = explode(':', $temp);
$array[$tempnest[0]] .= explode(',', $tempnest[1]);
}
}
My desired result should be:
$cats = ['others', 'electric-products', 'foods', 'drinks'];
and
$ids = ['4','9','13'];
One option could be doing a string compare for the first item after explode for cat and id to set the values to the right array.
$strings = ["cat:others;id:4,9,13", "id:4,9,13;cat:electric-products", "id:4,9,13;cat:foods", "cat:drinks,foods"];
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
$cats = [];
$ids = [];
foreach ($temps as $temp) {
$tempnest = explode(':', $temp);
if ($tempnest[0] === "cat") {
$cats = explode(',', $tempnest[1]);
}
if ($tempnest[0] === "id") {
$ids = explode(',', $tempnest[1]);
}
}
print_r($cats);
print_r($ids);
}
Php demo
Output for the first item would, for example, look like this:
Array
(
[0] => others
)
Array
(
[0] => 4
[1] => 9
[2] => 13
)
If you want to aggregate all the values in 2 arrays, you can array_merge the results, and at the end get the unique values using array_unique.
$strings = ["cat:others;id:4,9,13", "id:4,9,13;cat:electric-products", "id:4,9,13;cat:foods", "cat:drinks,foods"];
$cats = [];
$ids = [];
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
foreach ($temps as $temp) {
$tempnest = explode(':', $temp);
if ($tempnest[0] === "cat") {
$cats = array_merge(explode(',', $tempnest[1]), $cats);
}
if ($tempnest[0] === "id") {
$ids = array_merge(explode(',', $tempnest[1]), $ids);
}
}
}
print_r(array_unique($cats));
print_r(array_unique($ids));
Output
Array
(
[0] => drinks
[1] => foods
[3] => electric-products
[4] => others
)
Array
(
[0] => 4
[1] => 9
[2] => 13
)
Php demo
I don't generally recommend using variable variables, but you are looking for a sleek snippet which uses regex to avoid multiple explode() calls.
Here is a script that will use no explode() calls and no nested foreach() loops.
You can see how the \G ("continue" metacharacter) allows continuous matches relative to the "bucket" label (id or cat) by calling var_export($matches);.
If this were my own code, I'd probably not create separate variables, but a single array containing id and cat; this would remove the need for variable variables.
By using the encountered value as the key for the element to be added to the bucket, you are assured to have no duplicate values in any bucket. Just call array_values() if you want to re-index the bucket elements.
Code: (Demo) (Regex101)
$count = preg_match_all(
'/(?:^|;)(id|cat):|\G(?!^),?([^,;]+)/',
implode(';', $strings),
$matches,
PREG_UNMATCHED_AS_NULL
);
$cat = [];
$id = [];
for ($i = 0; $i < $count; ++$i) {
if ($matches[1][$i] !== null) {
$arrayName = $matches[1][$i];
} else {
${$arrayName}[$matches[2][$i]] = $matches[2][$i];
}
}
var_export(array_values($id));
echo "\n---\n";
var_export(array_values($cat));
All that said, I probably wouldn't rely on regex because it isn't very readable to the novice regex developer. The required logic is much simpler and easier to maintain with nested loops and explosions. Here is my adjustment of your code.
Code: (Demo)
$result = ['id' => [], 'cat' => []];
foreach ($strings as $string) {
foreach (explode(';', $string) as $segment) {
[$key, $values] = explode(':', $segment, 2);
array_push($result[$key], ...explode(',', $values));
}
}
var_export(array_unique($result['id']));
echo "\n---\n";
var_export(array_unique($result['cat']));
P.s. your posted coding attempt was using a combined operator .= (assignment & concatenation) instead of the more appropriate combined operator += (assignment & array union).
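Two of the sample strings in the question end with a trailing semicolon, which leaves an empty segment after explode(';'). Below is a minimal sketch, under that assumption about the input, that combines the nested-loop approach with the "value as key" idea for deduplication. The $buckets name and the guards (trim, array_pad, isset) are my additions for illustration, not part of the answers above.

```php
<?php
// Sample input copied from the question, trailing semicolons included.
$strings = [
    'cat:others;id:4,9,13',
    'id:4,9,13;cat:electric-products',
    'id:4,9,13;cat:foods;',
    'cat:drinks,foods;',
];

$buckets = ['cat' => [], 'id' => []];
foreach ($strings as $string) {
    // trim() drops a trailing ";" so explode() yields no empty segment.
    foreach (explode(';', trim($string, ';')) as $segment) {
        // array_pad() guards against a segment that has no ":".
        [$key, $values] = array_pad(explode(':', $segment, 2), 2, '');
        if (!isset($buckets[$key])) {
            continue; // ignore labels other than "cat" and "id"
        }
        foreach (explode(',', $values) as $value) {
            // Keying by value makes a repeated value overwrite itself.
            $buckets[$key][$value] = $value;
        }
    }
}

$cats = array_values($buckets['cat']);
$ids  = array_values($buckets['id']);
print_r($cats);
print_r($ids);
```

Because a repeated value keeps the position of its first occurrence, the categories come out in first-seen order.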

php remove duplicate words in an array

Sorry, English is not my native language, so the question title may not be very good. I want to do something like this.
$str = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
Here is an array. "Lincoln Crown" contains "Lincoln" and "Crown", so any later element containing either of these words should be removed; "Crown Court" (which contains "Crown") is therefore removed.
Similarly, "John Hinton" contains "John" and "Hinton", so "Hinton Jailed" (which contains "Hinton") is removed. The final output should look like this:
$output = array("Lincoln Crown","go holiday","house fire","John Hinton");
My PHP skills are not good, and this cannot be done simply with array_unique() or array_diff(), so I am asking for help. Thanks.
I think this might work :P
function cool_function($strs){
// Black list
$toExclude = array();
foreach($strs as $s){
// If it's not on blacklist, then search for it
if(!in_array($s, $toExclude)){
// Explode into blocks
foreach(explode(" ",$s) as $block){
// Search the block on array
$found = preg_grep("/" . preg_quote($block, "/") . "/", $strs);
foreach($found as $k => $f){
if($f != $s){
// Place each found item that's different from current item into blacklist
$toExclude[$k] = $f;
}
}
}
}
}
// Unset all keys that were found
foreach($toExclude as $k => $v){
unset($strs[$k]);
}
// Return the result
return $strs;
}
$strs = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
print_r(cool_function($strs));
Dump:
Array
(
[0] => Lincoln Crown
[2] => go holiday
[3] => house fire
[4] => John Hinton
)
Seems like you would need a loop and then build a list of words in the array.
Like:
<?php
// Store existing array's words; elements will compare their words to this array
// if an element's words are already in this array, the element is deleted
// else the element has its words added to this array
$arrayWords = array();
// Loop through your existing array of elements
foreach ($existingArray as $key => $phrase) {
// Get element's individual words
$words = explode(" ", $phrase);
// Assume the element will not be deleted
$keepWords = true;
// Loop through the element's words
foreach ($words as $word) {
// If one of the words is already in arrayWords (another element uses the word)
if (in_array($word, $arrayWords)) {
// Delete the element
unset($existingArray[$key]);
// Indicate we are not keeping any of the element's words
$keepWords = false;
// Stop the foreach loop
break;
}
}
// Only add the element's words to arrayWords if the entire element stays
if ($keepWords) {
$arrayWords = array_merge($arrayWords, $words);
}
}
?>
As I would do in your case:
$words = array();
foreach($str as $key =>$entry)
{
$entryWords = explode(' ', $entry);
$isDuplicated = false;
foreach($entryWords as $word)
if(in_array($word, $words))
$isDuplicated = true;
if(!$isDuplicated)
$words = array_merge($words, $entryWords);
else
unset($str[$key]);
}
var_dump($str);
Output:
array (size=4)
0 => string 'Lincoln Crown' (length=13)
2 => string 'go holiday' (length=10)
3 => string 'house fire' (length=10)
4 => string 'John Hinton' (length=11)
I can imagine quite a few techniques that can provide your desired output, but the logic that you require is poorly defined in your question. I am assuming that whole word matching is required -- so word boundaries should be used in any regex patterns. Case sensitivity isn't mentioned. I am unsure if only fully unique elements (multi-word strings) should have their words entered into the black list. I'll offer a few snippets, but choosing the appropriate technique will depend on exact logical requirements.
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
if (!$blacklist || !preg_match('/\b(?:' . implode('|', $blacklist) . ')\b/', $string)) {
$output[] = $string;
}
foreach(explode(' ', $string) as $word) {
$blacklist[$word] = preg_quote($word);
}
}
var_export($output);
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
$words = explode(' ', $string);
foreach ($words as $word) {
if (in_array($word, $blacklist)) {
continue 2;
}
}
array_push($blacklist, ...$words);
$output[] = $string;
}
var_export($output);
And my favorite because it performs fewest iterations in the parent loop, is more compact, and doesn't require the declaration/maintenance of a blacklist array.
Demo
$output = [];
while ($input) {
$output[] = $words = array_shift($input);
$input = preg_grep('~\b(?:\Q' . str_replace(' ', '\E|\Q', $words) . '\E)\b~', $input, PREG_GREP_INVERT);
}
var_export($output);
You can explode each string in the original array and then compare word by word using a loop (comparing each word from one element with each word from another, and if they match, remove the whole element).
array_unique() example
<?php
$input = array("a" => "green", "red", "b" => "green", "blue", "red");
$result = array_unique($input);
print_r($result);
?>
output:
Array
(
[a] => green
[0] => red
[1] => blue
)
Source

Select words from string according to array list

I want to select specific words from a sentence according to my array list
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
I want the output to be "words display in browser".
Could somebody please help me with this one? Thanks.
I wonder if this one liner would do it :
echo join(" ", array_intersect(explode(" ",$sentence), $list));
Use at your own risk :)
edit : yay, it does the job, just tested
You can do it with preg_match_all:
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
preg_match_all('/\b'.implode('\b|\b', $list).'\b/i', $sentence, $matches) ;
print_r($matches);
You'll get the words in order
Array
(
[0] => Array
(
[0] => words
[1] => display
[2] => in
[3] => browser
)
)
But be careful with regular expressions performance if the text is not that simple.
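One more thing to watch for: if a list entry contains regex metacharacters such as "." or "+", the pattern above would treat them as syntax. A minimal sketch of the same approach with each entry passed through preg_quote first (the $escaped variable is my own name, and the sample data is the question's):

```php
<?php
$sentence = "please take this words only to display in my browser";
$list = array("display", "browser", "words", "in");

// Escape each entry so that regex metacharacters are matched literally.
$escaped = array_map(function ($word) {
    return preg_quote($word, '/');
}, $list);

// One alternation wrapped in word boundaries; matches come back in sentence order.
preg_match_all('/\b(?:' . implode('|', $escaped) . ')\b/i', $sentence, $matches);

echo implode(' ', $matches[0]); // words display in browser
```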
I don't know any short version for this rather than checking word by word.
$words = explode(" ", $sentence);
$new_sentence_array = array();
foreach($words as $word) {
if(in_array($word, $list)) {
$new_sentence_array[] = $word;
}
}
$new_sentence = implode(" ", $new_sentence_array);
echo $new_sentence;
I think you could search the string for each value in the array and store each match in a new array, keyed by its strpos position. That gives you an array you can sort by key and then output in the order in which the terms appear in the string. See the example below.
<?php
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
$found = array();
foreach($list as $k => $v){
$position = strpos(strtolower($sentence), strtolower($v));
if($position !== false){
$found[$position] = $v;
}
}
ksort($found);
foreach($found as $v){
echo $v.' ';
}
?>
$narray = array();
foreach ($list as $value) {
// stristr() returns false when $value does not occur in $sentence
if (stristr($sentence, $value) !== false) {
$narray[] = $value;
}
}
echo implode(" ", $narray);

"Unfolding" a String

I have a set of strings, each string has a variable number of segments separated by pipes (|), e.g.:
$string = 'abc|b|ac';
Each segment with more than one char should be expanded into all the possible one char combinations, for 3 segments the following "algorithm" works wonderfully:
$result = array();
$string = explode('|', 'abc|b|ac');
foreach (str_split($string[0]) as $i)
{
foreach (str_split($string[1]) as $j)
{
foreach (str_split($string[2]) as $k)
{
$result[] = implode('|', array($i, $j, $k)); // more...
}
}
}
print_r($result);
Output:
$result = array('a|b|a', 'a|b|c', 'b|b|a', 'b|b|c', 'c|b|a', 'c|b|c');
Obviously, for more than 3 segments the code starts to get extremely messy, since I need to add (and check) more and more inner loops. I tried coming up with a dynamic solution but I can't figure out how to generate the correct combination for all the segments (individually and as a whole). I also looked at some combinatorics source code but I'm unable to combine the different combinations of my segments.
I appreciate if anyone can point me in the right direction.
Recursion to the rescue (you might need to tweak a bit to cover edge cases, but it works):
function explodinator($str) {
$segments = explode('|', $str);
$pieces = array_map('str_split', $segments);
return e_helper($pieces);
}
function e_helper($pieces) {
if (count($pieces) == 1)
return $pieces[0];
$first = array_shift($pieces);
$subs = e_helper($pieces);
$result = array();
foreach($first as $char) {
foreach ($subs as $sub) {
$result[] = $char . '|' . $sub;
}
}
return $result;
}
print_r(explodinator('abc|b|ac'));
Outputs:
Array
(
[0] => a|b|a
[1] => a|b|c
[2] => b|b|a
[3] => b|b|c
[4] => c|b|a
[5] => c|b|c
)
As seen on ideone.
This looks like a job for recursive programming! :P
When I first looked at this, I thought it was going to be a one-liner (and it probably is in Perl).
There are other non-recursive ways (enumerate all combinations of indexes into segments then loop through, for example) but I think this is more interesting, and probably 'better'.
$str = explode('|', 'abc|b|ac');
$strlen = count( $str );
$results = array();
function splitAndForeach( $bchar , $oldindex, $tempthread) {
global $strlen, $str, $results;
$temp = $tempthread;
$newindex = $oldindex + 1;
if ( $bchar != '') { array_push($temp, $bchar ); }
if ( $newindex <= $strlen ){
print "starting foreach loop on string '".$str[$newindex-1]."' \n";
foreach(str_split( $str[$newindex - 1] ) as $c) {
print "Going into next depth ($newindex) of recursion on char $c \n";
splitAndForeach( $c , $newindex, $temp);
}
} else {
$found = implode('|', $temp);
print "Array length (max recursion depth) reached, result: $found \n";
array_push( $results, $found );
$temp = $tempthread;
$index = 0;
print "***************** Reset index to 0 *****************\n\n";
}
}
splitAndForeach('', 0, array() );
print "your results: \n";
print_r($results);
You could have two arrays: the alternatives and a current counter.
$alternatives = array(array('a', 'b', 'c'), array('b'), array('a', 'c'));
$counter = array(0, 0, 0);
Then, in a loop, you increment the "last digit" of the counter, and if that is equal to the number of alternatives for that position, you reset that "digit" to zero and increment the "digit" to its left. This works just like counting with decimal numbers.
The string for each step is built by concatenating the $alternatives[$i][$counter[$i]] for each digit.
You are finished when the "first digit" becomes as large as the number of alternatives for that digit.
Example: for the above variables, the counter would get the following values in the steps:
0,0,0
0,0,1
1,0,0 (overflow in the last two digits)
1,0,1
2,0,0 (overflow in the last two digits)
2,0,1
3,0,0 (finished, since the first "digit" has only 3 alternatives)
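The counter idea above could be sketched roughly as follows. The function name unfoldWithCounter is hypothetical, and the sketch assumes every segment contains at least one character. Instead of testing the first "digit" against its limit, it stops when the carry runs off the left end, which amounts to the same condition.

```php
<?php
// Hypothetical helper; a sketch of the counter approach, not the answerer's code.
function unfoldWithCounter(string $string): array
{
    // One list of single-character alternatives per segment.
    $alternatives = array_map('str_split', explode('|', $string));
    $n = count($alternatives);
    $counter = array_fill(0, $n, 0);
    $results = [];

    while (true) {
        // Build the string for the current counter state.
        $parts = [];
        foreach ($counter as $i => $digit) {
            $parts[] = $alternatives[$i][$digit];
        }
        $results[] = implode('|', $parts);

        // Increment the last digit, carrying to the left on overflow.
        $pos = $n - 1;
        while ($pos >= 0 && ++$counter[$pos] === count($alternatives[$pos])) {
            $counter[$pos] = 0;
            $pos--;
        }
        if ($pos < 0) {
            break; // the carry passed the first digit: all combinations are done
        }
    }
    return $results;
}

print_r(unfoldWithCounter('abc|b|ac'));
```

This works for any number of segments without recursion, since the nesting depth lives in the $counter array rather than in the call stack.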

How to get first and last occurence of an array of words in text using PHP?

$arr = array('superman','gossipgirl',...);
$text = 'arbitary stuff here...';
What I want to do is find the index of the first and last occurrence of each word in $arr within $text. How can I do this efficiently in PHP?
What I think you want is array_keys: http://uk3.php.net/manual/en/function.array-keys.php
<?php
$array = array("blue", "red", "green", "blue", "blue");
$keys = array_keys($array, "blue");
print_r($keys);
?>
The above example will output:
Array
(
[0] => 0
[1] => 3
[2] => 4
)
echo 'First '.$keys[0] will echo the first.
You can get the last various ways, one way would be to count the elements and then echo last one e.g.
$count = count($keys);
echo ' Last '.$keys[$count -1]; # -1 as count will return the number of entries.
The above example will output:
First 0 Last 4
I think you want:
<?php
$arr = array('superman','gossipgirl',...);
$text = 'arbitary stuff here...';
$occurence_array = array();
foreach ($arr as $value) {
$first = strpos($text, $value);
$last = strrpos($text, $value);
$occurence_array[$value] = array($first,$last);
}
?>
strpos-based methods tell you nothing about word positions; they can only find substrings of the text. Try regular expressions:
preg_match_all('~\b(?:' . implode('|', $words) . ')\b~', $text, $m, PREG_OFFSET_CAPTURE);
$map = array();
foreach($m[0] as $e) $map[$e[0]][] = $e[1];
this generates a word-position map like this
'word1' => array(pos1, pos2, ...),
'word2' => array(pos1, pos2, ...),
Once you've got this, you can easily find first/last positions by using
$firstPosOfEachWord = array_map('min', $map);
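For completeness, here is a self-contained sketch of the word-position map that also derives the last positions with max. The sample $words and $text are made up for illustration, and the preg_quote step is my addition so that entries are matched literally.

```php
<?php
// Made-up sample data for illustration.
$words = ['superman', 'gossipgirl'];
$text  = 'superman meets gossipgirl, then superman leaves';

// Escape each word before joining them into one alternation.
$quoted = array_map(function ($w) {
    return preg_quote($w, '~');
}, $words);
preg_match_all('~\b(?:' . implode('|', $quoted) . ')\b~', $text, $m, PREG_OFFSET_CAPTURE);

// Each match is a [matched string, byte offset] pair.
$map = [];
foreach ($m[0] as [$word, $offset]) {
    $map[$word][] = $offset;
}

$firstPosOfEachWord = array_map('min', $map);
$lastPosOfEachWord  = array_map('max', $map);
print_r($firstPosOfEachWord);
print_r($lastPosOfEachWord);
```

Words that never occur in the text simply do not appear in the map, so check with isset() before reading a position.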
You could do this by using strpos and strrpos together with a simple foreach loop.
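A runnable sketch of that loop, with made-up sample data. Note that strpos returns false when the substring is missing and 0 when it is found at the very start, so the comparison has to be strict. The 'first' and 'last' keys and the null marker are my own choices for illustration.

```php
<?php
// Made-up sample data for illustration.
$arr  = ['superman', 'gossipgirl', 'batman'];
$text = 'superman meets gossipgirl, then superman leaves';

$positions = [];
foreach ($arr as $word) {
    $first = strpos($text, $word);
    if ($first === false) {
        $positions[$word] = null; // the word does not occur in the text
        continue;
    }
    // strrpos() finds the start of the last occurrence.
    $positions[$word] = ['first' => $first, 'last' => strrpos($text, $word)];
}
print_r($positions);
```

Keep in mind that this matches substrings, not whole words, so "man" would also be found inside "superman".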
