php remove duplicate words in an array - php

Sorry, English is not my first language, so the title may not be clear. I want to do something like this:
$str = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
Here is an array. "Lincoln Crown" contains the words "Lincoln" and "Crown", so any later element that contains either of those words should be removed; "Crown Court" (contains "Crown") is therefore removed.
Likewise, "John Hinton" contains "John" and "Hinton", so "Hinton Jailed" (contains "Hinton") is removed. The final output should look like this:
$output = array("Lincoln Crown","go holiday","house fire","John Hinton");
My PHP skills are not good, and this cannot be done simply with array_unique() or array_diff(), so I am asking for help. Thanks.

I think this might work :P
function cool_function($strs){
// Black list
$toExclude = array();
foreach($strs as $s){
// If it's not on blacklist, then search for it
if(!in_array($s, $toExclude)){
// Explode into blocks
foreach(explode(" ",$s) as $block){
// Search the array for the block as a whole word; preg_quote() escapes regex characters and the delimiter
$found = preg_grep("/\\b" . preg_quote($block, "/") . "\\b/", $strs);
foreach($found as $k => $f){
if($f != $s){
// Place each found item that's different from current item into blacklist
$toExclude[$k] = $f;
}
}
}
}
}
// Unset all keys that were found
foreach($toExclude as $k => $v){
unset($strs[$k]);
}
// Return the result
return $strs;
}
$strs = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
print_r(cool_function($strs));
Dump:
Array
(
[0] => Lincoln Crown
[2] => go holiday
[3] => house fire
[4] => John Hinton
)

Seems like you would need a loop and then build a list of words in the array.
Like:
<?php
// Store existing array's words; elements will compare their words to this array
// if an element's words are already in this array, the element is deleted
// else the element has its words added to this array
$arrayWords = array();
// Loop through your existing array of elements
foreach ($existingArray as $key => $phrase) {
// Get element's individual words
$words = explode(" ", $phrase);
// Assume the element will not be deleted
$keepWords = true;
// Loop through the element's words
foreach ($words as $word) {
// If one of the words is already in arrayWords (another element uses the word)
if (in_array($word, $arrayWords)) {
// Delete the element
unset($existingArray[$key]);
// Indicate we are not keeping any of the element's words
$keepWords = false;
// Stop the foreach loop
break;
}
}
// Only add the element's words to arrayWords if the entire element stays
if ($keepWords) {
$arrayWords = array_merge($arrayWords, $words);
}
}
?>

Here is how I would do it in your case:
$words = array();
foreach($str as $key =>$entry)
{
$entryWords = explode(' ', $entry);
$isDuplicated = false;
foreach($entryWords as $word)
if(in_array($word, $words))
$isDuplicated = true;
if(!$isDuplicated)
$words = array_merge($words, $entryWords);
else
unset($str[$key]);
}
var_dump($str);
Output:
array (size=4)
0 => string 'Lincoln Crown' (length=13)
2 => string 'go holiday' (length=10)
3 => string 'house fire' (length=10)
4 => string 'John Hinton' (length=11)

I can imagine quite a few techniques that can provide your desired output, but the logic that you require is poorly defined in your question. I am assuming that whole word matching is required -- so word boundaries should be used in any regex patterns. Case sensitivity isn't mentioned. I am unsure if only fully unique elements (multi-word strings) should have their words entered into the black list. I'll offer a few snippets, but choosing the appropriate technique will depend on exact logical requirements.
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
if (!$blacklist || !preg_match('/\b(?:' . implode('|', $blacklist) . ')\b/', $string)) {
$output[] = $string;
}
foreach(explode(' ', $string) as $word) {
$blacklist[$word] = preg_quote($word, '/');
}
}
var_export($output);
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
$words = explode(' ', $string);
foreach ($words as $word) {
if (in_array($word, $blacklist)) {
continue 2;
}
}
array_push($blacklist, ...$words);
$output[] = $string;
}
var_export($output);
My favorite, because it performs the fewest iterations in the parent loop, is more compact, and doesn't require the declaration/maintenance of a blacklist array:
Demo
$output = [];
while ($input) {
$output[] = $words = array_shift($input);
$input = preg_grep('~\b(?:\Q' . str_replace(' ', '\E|\Q', $words) . '\E)\b~', $input, PREG_GREP_INVERT);
}
var_export($output);
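The answer above points out that whole-word matching and case sensitivity are left open by the question. As a rough sketch of what those choices mean in practice, the snippet below compares a substring pattern, a whole-word pattern, and a case-insensitive whole-word pattern. The sample strings and variable names are made up for illustration and are not part of the original data.

```php
<?php
$word = 'Crown';
$candidates = ['Crown Court', 'Crowned King', 'The CROWN'];

// Substring match: also catches "Crowned King"
$substring = preg_grep('/' . preg_quote($word, '/') . '/', $candidates);

// Whole-word match: \b requires a word boundary on both sides
$wholeWord = preg_grep('/\b' . preg_quote($word, '/') . '\b/', $candidates);

// Whole-word and case-insensitive: the i flag also accepts "The CROWN"
$anyCase = preg_grep('/\b' . preg_quote($word, '/') . '\b/i', $candidates);

var_export([$substring, $wholeWord, $anyCase]);
```

preg_grep() preserves the original keys, so the third result keeps keys 0 and 2.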

You can explode each string in the original array into words and compare them in a loop (each word of one element against each word of another); if any word matches, remove the whole element.
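A minimal sketch of that idea, assuming words are separated by single spaces and that matching is case-sensitive. The function name remove_entries_sharing_words is hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: keep an entry only if none of its words
// appeared in an earlier entry that was kept.
function remove_entries_sharing_words(array $entries): array
{
    $seen = [];   // words from kept entries, stored as keys for fast lookup
    $kept = [];
    foreach ($entries as $entry) {
        $words = array_flip(explode(' ', $entry)); // words become keys
        if (array_intersect_key($words, $seen)) {
            continue; // at least one word was already used, so drop the entry
        }
        $seen += $words;
        $kept[] = $entry;
    }
    return $kept;
}

$str = ["Lincoln Crown", "Crown Court", "go holiday", "house fire", "John Hinton", "Hinton Jailed"];
$result = remove_entries_sharing_words($str);
print_r($result);
```

Unlike the unset()-based answers, this version returns a re-indexed array.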

array_unique() example
<?php
$input = array("a" => "green", "red", "b" => "green", "blue", "red");
$result = array_unique($input);
print_r($result);
?>
output:
Array
(
[a] => green
[0] => red
[1] => blue
)
Source

Related

Parse formatted strings containing 3 delimiters to create multiple flat arrays

I have strings in following format:
$strings[1] = 'cat:others;id:4,9,13';
$strings[2] = 'id:4,9,13;cat:electric-products';
$strings[3] = 'id:4,9,13;cat:foods;';
$strings[4] = 'cat:drinks,foods;';
where cat means category and id is identity number of a product.
I want to split these strings and convert into arrays $cats = array('others'); and $ids = array('4','9','13');
I know that it can be done by foreach and explode function through multiple steps. I think I am somewhere near, but the following code does not work.
Also, I wonder if it can be done by preg_match or preg_split in fewer steps. Or any other simpler method.
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
foreach($temps as $temp) {
$tempnest = explode(':', $temp);
$array[$tempnest[0]] .= explode(',', $tempnest[1]);
}
}
My desired result should be:
$cats = ['others', 'electric-products', 'foods', 'drinks'];
and
$ids = ['4','9','13'];
One option is to compare the first item after each explode() against "cat" and "id", then store the values in the matching array.
$strings = ["cat:others;id:4,9,13", "id:4,9,13;cat:electric-products", "id:4,9,13;cat:foods", "cat:drinks,foods"];
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
$cats = [];
$ids = [];
foreach ($temps as $temp) {
$tempnest = explode(':', $temp);
if ($tempnest[0] === "cat") {
$cats = explode(',', $tempnest[1]);
}
if ($tempnest[0] === "id") {
$ids = explode(',', $tempnest[1]);
}
}
print_r($cats);
print_r($ids);
}
Php demo
The output for the first item would look like this:
Array
(
[0] => others
)
Array
(
[0] => 4
[1] => 9
[2] => 13
)
If you want to aggregate all the values in 2 arrays, you can array_merge the results, and at the end get the unique values using array_unique.
$strings = ["cat:others;id:4,9,13", "id:4,9,13;cat:electric-products", "id:4,9,13;cat:foods", "cat:drinks,foods"];
$cats = [];
$ids = [];
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
foreach ($temps as $temp) {
$tempnest = explode(':', $temp);
if ($tempnest[0] === "cat") {
$cats = array_merge(explode(',', $tempnest[1]), $cats);
}
if ($tempnest[0] === "id") {
$ids = array_merge(explode(',', $tempnest[1]), $ids);
}
}
}
print_r(array_unique($cats));
print_r(array_unique($ids));
Output
Array
(
[0] => drinks
[1] => foods
[3] => electric-products
[4] => others
)
Array
(
[0] => 4
[1] => 9
[2] => 13
)
Php demo
I don't generally recommend using variable variables, but you are looking for a sleek snippet which uses regex to avoid multiple explode() calls.
Here is a script that will use no explode() calls and no nested foreach() loops.
You can see how \G (the "continue" metacharacter) allows consecutive matches relative to the "bucket" label (id or cat) by calling var_export($matches);.
If this were my own code, I'd probably not create separate variables, but a single array containing id and cat --- this would alleviate the need for variable variables.
By using the encountered value as the key for the element to be added to the bucket, you are assured to have no duplicate values in any bucket -- just call array_values() if you want to re-index the bucket elements.
Code: (Demo) (Regex101)
$count = preg_match_all(
'/(?:^|;)(id|cat):|\G(?!^),?([^,;]+)/',
implode(';', $strings),
$matches,
PREG_UNMATCHED_AS_NULL
);
$cat = [];
$id = [];
for ($i = 0; $i < $count; ++$i) {
if ($matches[1][$i] !== null) {
$arrayName = $matches[1][$i];
} else {
${$arrayName}[$matches[2][$i]] = $matches[2][$i];
}
}
var_export(array_values($id));
echo "\n---\n";
var_export(array_values($cat));
All that said, I probably wouldn't rely on regex because it isn't very readable to the novice regex developer. The required logic is much simpler and easier to maintain with nested loops and explosions. Here is my adjustment of your code.
Code: (Demo)
$result = ['id' => [], 'cat' => []];
foreach ($strings as $string) {
foreach (explode(';', $string) as $segment) {
[$key, $values] = explode(':', $segment, 2);
array_push($result[$key], ...explode(',', $values));
}
}
var_export(array_unique($result['id']));
echo "\n---\n";
var_export(array_unique($result['cat']));
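P.S. Your posted attempt used the combined operator .= (assignment and concatenation), which cannot append an array; an array operation such as += (assignment and array union) or array_merge() is needed instead. Note that += keeps the left-hand element whenever a key already exists, so array_merge() is usually the safer choice for indexed arrays.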
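To illustrate the difference between array union and array_merge() on indexed arrays, here is a small sketch. The sample values are invented for the example.

```php
<?php
$ids  = ['4', '9'];
$more = ['13', '21', '34'];

// Array union keeps the left-hand value whenever a key already exists,
// so only the element at key 2 of $more is added here.
$union = $ids;
$union += $more;

// array_merge() renumbers integer keys and appends every value.
$merged = array_merge($ids, $more);

var_export($union);
echo "\n---\n";
var_export($merged);
```

With string keys the two behave more alike, but for lists of values collected from explode() the union silently drops elements.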

Split strings in a flat array on the space between mixed-case words and all-caps words

I have this array
$arr2 = array(
"SUBTITLE",
"Test Your Might RUNTIME",
"1 hr 41 mins GENRE",
"Science-Fiction/Fantasy SYNOPSIS",
"his film adaptation of the wildly popular video game comes complete with dazzling special effects and plenty of martial arts action. LIGHTNING AND EFFECT"
);
I would like to separate the all-upper-case words from the lower-case or mixed-case text, like this:
$arr2 = array(
"SUBTITLE",
"Test Your Might",
"RUNTIME",
"1 hr 41 mins",
"GENRE",
"Science-Fiction/Fantasy",
"SYNOPSIS",
"his film adaptation of the wildly popular video game comes complete with dazzling special effects and plenty of martial arts action.",
"LIGHTNING AND EFFECT"
);
How would I do this? If there is a way to do this with regex, that would be preferred.
I used a separator to join the cases whenever there is a change. I don't know what other kind of data you could be passing so I left the separator as a variable to be changed easily.
$separator = '<br>';
foreach($arr2 as $index => $str) {
$final_string = '';
$last_case = '';
foreach(explode(" ", $str) as $word) {
// Words without letters (such as "1" or "41") keep the case of the previous word
if (!preg_match('/[A-Za-z]/', $word)) $case = $last_case ?: 'lower';
else $case = ($word === strtoupper($word)) ? 'upper' : 'lower';
// Join with a space while the case is unchanged, with the separator when it changes
if ($last_case === '') $final_string = $word;
elseif ($case === $last_case) $final_string .= ' ' . $word;
else $final_string .= $separator . $word;
$last_case = $case;
}
$arr2[$index] = explode($separator, $final_string);
}
$arr2 = array_reduce($arr2, 'array_merge', []);
print_r($arr2); // prints the nine elements requested in the question, without stray spaces
Here's what I would do:
<?php
$a = ['SUBTITLE', 'Test Your Might RUNTIME', '1 hr 41 mins GENRE', 'Science-Fiction/Fantasy SYNOPSIS', 'his film adaptation of the wildly popular video game comes complete with dazzling special effects and plenty of martial arts action. LIGHTNING AND EFFECT'];
$r = [];
foreach($a as $v){
if(preg_match('/(?:[A-Z]{2,}(?:\s+|$))+/', $v, $m)){
foreach($m as $s){
if($s === $v){
$r[] = $s;
}
else{
// rtrim() removes the space left behind where the all-caps words were cut out
$r[] = rtrim(str_replace($s, '', $v)); $r[] = $s;
}
}
}
else{
$r[] = $v;
}
}
print_r($r);
?>
Please confirm it works as expected, but one way would be to do something like the following:
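function splitter($jumbled) {
$output = [];
foreach ($jumbled as $element) {
// Reset both buckets for every element
$uppercase = [];
$lowercase = [];
foreach (explode(" ", $element) as $word) {
// A word counts as upper case only if it has a capital letter and no lower-case letters
if (preg_match('/[A-Z]/', $word) && $word === strtoupper($word)) {
$uppercase[] = $word;
} else {
$lowercase[] = $word;
}
}
// In the sample data the mixed-case text comes before the all-caps label
if ($lowercase) $output[] = implode(' ', $lowercase);
if ($uppercase) $output[] = implode(' ', $uppercase);
}
return $output;
}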
Then call the function with your array: splitter($arr2).
(Note: this returns the flat array you requested, but the result would be easier to use afterwards if each label were paired with its text as an $uppercase => $lowercase key/value pair in the returned array.)
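The key/value idea mentioned in the note could look roughly like the sketch below. It assumes each all-caps label is followed by its text, possibly in the next element, which is how the sample data appears to be laid out. The function name pair_labels and the shortened input are hypothetical.

```php
<?php
// Hypothetical helper: pairs each all-caps label with the text that follows it.
function pair_labels(array $lines): array
{
    $pairs = [];
    $label = null;
    foreach ($lines as $line) {
        // Split around runs of all-caps words and keep those runs as separate parts
        $parts = preg_split(
            '/ *\b([A-Z]+(?: [A-Z]+)*)\b */',
            $line,
            -1,
            PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
        );
        foreach ($parts as $part) {
            if (preg_match('/^[A-Z]+(?: [A-Z]+)*$/', $part)) {
                $label = $part;
                $pairs[$label] = null;   // text for this label not seen yet
            } elseif ($label !== null) {
                $pairs[$label] = $part;  // attach the text to the latest label
            }
        }
    }
    return $pairs;
}

$result = pair_labels([
    'SUBTITLE',
    'Test Your Might RUNTIME',
    '1 hr 41 mins GENRE',
    'Science-Fiction/Fantasy',
]);
var_export($result);
```

A label with no following text, such as a trailing one, keeps the value null.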
Use array_reduce() for a functional-style approach to return a result array with a different count than the input array.
Split on sequences of all-caps words, including their leading and trailing spaces. Capture the sequence of words so that only the outermost spaces are lost while splitting.
Merge all newly generated elements with the result array as you iterate so that you end up with a flat array.
Code: (Demo)
var_export(
array_reduce(
$array,
fn($result, $v) => array_merge(
$result,
preg_split(
'/ *\b([A-Z]+(?: [A-Z]+)*)\b */',
$v,
0,
PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
)
),
[]
)
);
Specifically tailored to your input, which only has the all-caps words at the end of the string, you can use / ([A-Z]+(?: [A-Z]+)*)$/ with the same effect.
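As a small sketch of the end-anchored variant, the loop below applies the pattern / ([A-Z]+(?: [A-Z]+)*)$/ with preg_split() and collects the parts into a flat array. The three sample lines and the variable names are assumptions made for the example.

```php
<?php
$lines = ['SUBTITLE', 'Test Your Might RUNTIME', '1 hr 41 mins GENRE'];

$flat = [];
foreach ($lines as $line) {
    // The pattern needs a leading space, so a line made only of
    // all-caps words (like "SUBTITLE") is returned unchanged.
    $parts = preg_split(
        '/ ([A-Z]+(?: [A-Z]+)*)$/',
        $line,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
    foreach ($parts as $part) {
        $flat[] = $part;
    }
}
var_export($flat);
```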

Delete duplicate words in array with sentences in PHP

I have a string array with words and sentences included.
For example:
array("dog","cat","the dog is running","some other text","some","text")
And I want to remove duplicate words, leaving only unique words in it. I want to remove these words even in sentences.
The result should look like:
array("dog","cat","the is running","other","some","text")
I tried the array_unique function but it didn't work.
You can call array_unique() after a loop that combines explode() and array_push():
$res = [];
foreach($arr as $e) {
array_push($res, ...explode(" ", $e));
}
print_r(array_unique($res));
Reference:
array_push, explode, array_unique
Live example: 3v4l
If you want to keep the sentences, use:
$arr = array("dog","cat","the dog is running","some other text","some","text");
// sort first so the entries with the fewest words are processed first
usort($arr, function ($a, $b) {return count(explode(" ", $a)) - count(explode(" ", $b)); });
$words = []; // words seen so far
$res = [];
foreach($arr as $e) {
$kept = array_diff(explode(" ", $e), $words); // whole-word comparison; drops words already seen
$res[] = implode(" ", $kept);
$words = array_merge($words, $kept); // remember the new words
}
print_r($res);
This solution is not pretty, but should get the job done and meet some of the edge cases at hand. I'm assuming that no more than one space separates words in a sentence string and that you want to preserve original ordering.
The approach is to walk the array twice, once to filter out duplicate single words, then once again to filter out duplicate words in sentences. This guarantees priority for single words. Finally, ksort the array (this is the ugly part from a time complexity standpoint: everything is O(max_len_sentence * n) up until now).
$arr = ["dog","cat","the dog is running","some other text","some","text"];
$seen = [];
$result = [];
foreach ($arr as $i => $e) {
if (preg_match("/^\w+$/", $e) &&
!array_key_exists($e, $seen)) {
$result[$i] = $e;
$seen[$e] = 1;
}
}
foreach ($arr as $i => $e) {
$words = explode(" ", $e);
if (count($words) > 1) {
$filtered = [];
foreach ($words as $word) {
if (!array_key_exists($word, $seen)) {
$seen[$word] = 0;
}
if (++$seen[$word] < 2) {
$filtered[]= $word;
}
}
if ($filtered) {
$result[$i] = implode(" ", $filtered);
}
}
}
ksort($result);
$result = array_values($result);
print_r($result);
Output
Array
(
[0] => dog
[1] => cat
[2] => the is running
[3] => other
[4] => some
[5] => text
)

Select words from string according to array list

I want to select specific words from a sentence according to my array list
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
I want the output to be "words display in browser".
Could somebody please help me with this? Thanks.
I wonder if this one liner would do it :
echo join(" ", array_intersect(explode(" ", $sentence), $list));
Use at your own risk :)
edit : yay, it does the job, just tested
You can do it with preg_match_all:
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
preg_match_all('/\b'.implode('\b|\b', $list).'\b/i', $sentence, $matches) ;
print_r($matches);
You'll get the words in order
Array
(
[0] => Array
(
[0] => words
[1] => display
[2] => in
[3] => browser
)
)
But be careful with regular expressions performance if the text is not that simple.
I don't know any short version for this rather than checking word by word.
$words = explode(" ", $sentence);
$new_sentence_array = array();
foreach($words as $word) {
if(in_array($word, $list)) {
$new_sentence_array[] = $word;
}
}
$new_sentence = implode(" ", $new_sentence_array);
echo $new_sentence;
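The word-by-word check can also be written with a lookup table instead of in_array(). This is only a sketch: it assumes case-sensitive matching and single spaces between words, and the variable names $wanted and $selected are made up for the example. It needs PHP 7.4 or later for the arrow function.

```php
<?php
$sentence = "please take this words only to display in my browser";
$list = ["display", "browser", "words", "in"];

// Flip the list once so each membership test is a key lookup instead of a scan
$wanted = array_flip($list);

// Keep the sentence's words, in sentence order, that appear in the list
$selected = array_filter(
    explode(" ", $sentence),
    fn($word) => isset($wanted[$word])
);

echo implode(" ", $selected);
```

For a four-word list the difference is negligible; the lookup table mainly helps when the list is long.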
I think you could search the string for each value in the array and assign it to a new array with the strpos value as the key; that would give you a sortable array that you could then output in the order that the terms appear in the string. See the example below.
<?php
$sentence = "please take this words only to display in my browser";
$list = array ("display","browser","words","in");
$found = array();
foreach($list as $k => $v){
$position = strpos(strtolower($sentence), strtolower($v));
// strpos() returns 0 for a match at the very start, so compare strictly against false
if($position !== false){
$found[$position] = $v;
}
}
ksort($found);
foreach($found as $v){
echo $v.' ';
}
?>
$narray=array();
foreach ($list as $value) {
$status = stristr($sentence, $value);
if ($status !== false) {
$narray[] = $value;
}
}
echo implode(" ", $narray);

How to get first and last occurrence of an array of words in text using PHP?

$arr = array('superman','gossipgirl',...);
$text = 'arbitary stuff here...';
What I want to do is find the index of the first and last occurrence of each word in $arr within $text. How can I do this efficiently in PHP?
What I think you want is array_keys: http://uk3.php.net/manual/en/function.array-keys.php
<?php
$array = array("blue", "red", "green", "blue", "blue");
$keys = array_keys($array, "blue");
print_r($keys);
?>
The above example will output:
Array
(
[0] => 0
[1] => 3
[2] => 4
)
echo 'First '.$keys[0] will echo the first.
You can get the last various ways, one way would be to count the elements and then echo last one e.g.
$count = count($keys);
echo ' Last '.$keys[$count -1]; # -1 as count will return the number of entries.
The above example will output:
First 0 Last 4
I think you want:
<?php
$arr = array('superman','gossipgirl',...);
$text = 'arbitary stuff here...';
$occurence_array = array();
foreach ($arr as $value) {
$first = strpos($text, $value);
$last = strrpos($text, $value);
$occurence_array[$value] = array($first,$last);
}
?>
strpos-based methods tell you nothing about word positions; they can only find substrings of the text. Try regular expressions:
preg_match_all('~\b(?:' . implode('|', $words) . ')\b~', $text, $m, PREG_OFFSET_CAPTURE);
$map = array();
foreach($m[0] as $e) $map[$e[0]][] = $e[1];
This generates a word-position map like this:
'word1' => array(pos1, pos2, ...),
'word2' => array(pos1, pos2, ...),
Once you have this map, you can find the first and last position of each word with:
$firstPosOfEachWord = array_map('min', $map);
$lastPosOfEachWord = array_map('max', $map);
You could do this by using strpos and strrpos together with a simple foreach loop.
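A rough sketch of that loop, with the caveat raised above that strpos() and strrpos() find substrings rather than whole words. The function name first_last_offsets and the sample text are hypothetical.

```php
<?php
// Hypothetical helper: first and last byte offset of each needle, or null when absent.
function first_last_offsets(array $needles, string $text): array
{
    $offsets = [];
    foreach ($needles as $needle) {
        $first = strpos($text, $needle);
        // strpos() returns false when nothing is found; 0 is a valid offset,
        // so the comparison has to be strict
        $offsets[$needle] = $first === false
            ? null
            : ['first' => $first, 'last' => strrpos($text, $needle)];
    }
    return $offsets;
}

$text = 'superman met gossipgirl, then superman left';
$result = first_last_offsets(['superman', 'gossipgirl', 'batman'], $text);
var_export($result);
```

When a word occurs only once, the first and last offsets are the same.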
