PHP - Most Efficient Dictionary Code

I'm using the following code to pull the definition of a word from a tab-delimited file with only two columns (word, definition). Is this the most efficient code for what I'm trying to do?
<?php
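$haystack = file("dictionary.txt", FILE_IGNORE_NEW_LINES); // one line per element, newlines stripped
$needle = 'apple';
$flipped_haystack = array_flip($haystack);
$defined = false; // initialise so the check below never reads an undefined variable
foreach($haystack as $value)
{
// split "word<TAB>definition" into its two columns
$columns = explode("\t", $value, 2);
if ($columns[0] == $needle)
{
echo "Definition of $needle: $columns[1]";
$defined = true;
break;
}
}
if(!$defined)
{
echo "$needle not found!";
}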
?>

Right now you're doing a lot of pointless work
1) load the file into a per-line array
2) flip the array
3) iterate over and explode every value of the array
4) test that exploded value
You can't really avoid step 1, but why do you have to do all that useless "busy work" for 2&3?
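For example, if your dictionary text were set up something like this:
word:definition
then a simple:
$matches = preg_grep('/^' . preg_quote($word, '/') . ':(.*)$/', $haystack);
would do the trick for you, with far less code. (The pattern has to be built by concatenation, because $word is not interpolated inside single quotes. preg_grep() returns the matching lines themselves, with their keys preserved.)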
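The question's file separates word and definition with a tab rather than a colon, so the same preg_grep() idea can anchor on a tab instead. This is a minimal sketch, assuming each line holds exactly one tab; the function name `lookup_definition` is hypothetical:

```php
<?php
// Sketch: look up a word in "word<TAB>definition" lines with preg_grep().
// Assumption: each line contains exactly one tab; the function name is hypothetical.
function lookup_definition(array $lines, string $word): ?string
{
    // preg_quote() escapes regex metacharacters in the word; \t matches the tab separator
    $pattern = '/^' . preg_quote($word, '/') . '\t/';
    $matches = preg_grep($pattern, $lines);
    if (!$matches) {
        return null; // no line starts with "$word<TAB>"
    }
    // preg_grep() returns whole lines (keys preserved); keep the part after the first tab
    $line = reset($matches);
    return explode("\t", $line, 2)[1];
}

$lines = ["apple\tA round fruit", "pear\tA sweet fruit"];
echo lookup_definition($lines, 'apple'), "\n";
```

With the real file, `$lines` would come from `file('dictionary.txt', FILE_IGNORE_NEW_LINES)`. Note that preg_grep() still tests every line, so this shortens the code rather than reducing the amount of work.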

No. A trie would most likely be more efficient; as it stands, your dictionary isn't sorted and your code doesn't use a binary search tree or a ternary search tree. If you need to search a huge dictionary, your method is simply too slow.
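Of the options mentioned, the simplest to try in plain PHP is sorting the entries once and then binary-searching them, which needs O(log n) string comparisons per lookup instead of a full scan. A minimal sketch, assuming the lines are parsed into [word, definition] pairs and sorted with strcmp(), the same byte order the search uses; the function name is hypothetical:

```php
<?php
// Sketch: binary search over [word, definition] pairs sorted by word.
// Assumption: the pairs are sorted with strcmp(), the same order the search compares with.
function binary_search_definition(array $entries, string $word): ?string
{
    $lo = 0;
    $hi = count($entries) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        $cmp = strcmp($entries[$mid][0], $word);
        if ($cmp === 0) {
            return $entries[$mid][1]; // found the word
        }
        if ($cmp < 0) {
            $lo = $mid + 1; // middle word sorts before $word: search the upper half
        } else {
            $hi = $mid - 1; // middle word sorts after $word: search the lower half
        }
    }
    return null; // not in the dictionary
}

// Parse "word<TAB>definition" lines and sort them once, up front
$lines = ["pear\tA sweet fruit", "apple\tA round fruit", "fig\tA soft fruit"];
$entries = array_map(function ($line) {
    return explode("\t", $line, 2);
}, $lines);
usort($entries, function ($a, $b) {
    return strcmp($a[0], $b[0]);
});
echo binary_search_definition($entries, 'fig'), "\n";
```

Parsing and sorting still touch every entry, so this only pays off if the sorted data is built once and reused across many lookups.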

Is this the most efficient code for what I'm trying to do?
Surely not.
To find only one needle you are processing all the entries.
I will be building up to have 100,000+ entries.
Use a database then.
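For a dictionary of that size, one way to follow this advice is SQLite through PDO, where a primary key on the word column gives an indexed lookup instead of a scan. A minimal sketch, assuming the pdo_sqlite extension is enabled; the table, column and function names are hypothetical, and an in-memory database stands in for a real file:

```php
<?php
// Sketch: store the dictionary in SQLite and let an indexed lookup find the word.
// Assumptions: pdo_sqlite is enabled; table, column and function names are hypothetical.
function build_dictionary_db(array $lines): PDO
{
    // In-memory database for illustration only
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // PRIMARY KEY gives the word column an index, so lookups don't scan every row
    $db->exec('CREATE TABLE dictionary (word TEXT PRIMARY KEY, definition TEXT NOT NULL)');
    $insert = $db->prepare('INSERT INTO dictionary (word, definition) VALUES (?, ?)');
    $db->beginTransaction(); // one transaction makes bulk inserts much faster
    foreach ($lines as $line) {
        [$word, $definition] = explode("\t", $line, 2);
        $insert->execute([$word, $definition]);
    }
    $db->commit();
    return $db;
}

function db_definition(PDO $db, string $word): ?string
{
    $stmt = $db->prepare('SELECT definition FROM dictionary WHERE word = ?');
    $stmt->execute([$word]);
    $definition = $stmt->fetchColumn(); // false when no row matches
    return $definition === false ? null : $definition;
}

$db = build_dictionary_db(["apple\tA round fruit", "pear\tA sweet fruit"]);
echo db_definition($db, 'pear'), "\n";
```

In a real setup the table would be filled once (for example with a DSN such as `sqlite:dictionary.db`) and each request would only run the SELECT. Because `word` is the primary key, a duplicate word in the source file makes the INSERT throw.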

Related

PHP - Matching words in dynamic arrays

I've taken a look around but can't seem to find anything that does what I need.
Let's say I have 2 arrays in a function; however, they are completely dynamic. Each time this function is run, the arrays are created based on a page that has been submitted.
I need to somehow match these arrays and look for any phrases/words that appear in both.
Example: (with only a single element in each array)
Array 1: "This is some sample text that will display on the web"
Array 2: "You could always use some sample text for testing"
So in that example, the 2 arrays have a phrase that appears exactly the same in each: "sample text".
Since these arrays are always dynamic, I am unable to do anything like a regex, because I will never know what words will be in the arrays.
You could find all words in an array of strings like this:
function find_words(array $arr)
{
return array_reduce($arr, function(array $result, $item) {
// str_word_count($item, 1) returns the words of $item as an array (empty if there are none)
return array_merge($result, str_word_count($item, 1));
}, array());
}
To use it, you run the end results through array_intersect:
$a = array('This is some sample text that', 'will display on the web');
$b = array('You could always use some sample text for testing');
$similar = array_intersect(find_words($a), find_words($b));
// [2 => "some", 3 => "sample", 4 => "text"] (array_intersect() preserves the keys of the first array)
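The word-level approach above reports "some", "sample" and "text" separately, while the question asks about shared phrases such as "sample text". One way to extend it, sketched here as an assumption rather than taken from the answer, is to intersect runs of n consecutive words (n-grams) instead of single words. Matching is lower-cased here and the helper names are hypothetical:

```php
<?php
// Sketch: find word sequences (n-grams) that occur in both arrays of strings.
// Assumptions: case-insensitive matching, words split with str_word_count();
// helper names are hypothetical.
function ngrams(array $strings, int $n): array
{
    $grams = [];
    foreach ($strings as $string) {
        $words = str_word_count(strtolower($string), 1);
        // every run of $n consecutive words within one string
        for ($i = 0; $i + $n <= count($words); $i++) {
            $grams[] = implode(' ', array_slice($words, $i, $n));
        }
    }
    return array_unique($grams);
}

function shared_phrases(array $a, array $b, int $n = 2): array
{
    // array_values() drops the keys that array_intersect() preserves
    return array_values(array_intersect(ngrams($a, $n), ngrams($b, $n)));
}

$a = ['This is some sample text that will display on the web'];
$b = ['You could always use some sample text for testing'];
print_r(shared_phrases($a, $b));
```

Setting `$n` to 1 reproduces the single-word comparison; larger values only report longer exact phrases.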
array_intersect() should do this for you:
http://www.php.net/manual/en/function.array-intersect.php
*array_intersect() returns an array containing all the values of array1 that are present in all the arguments. Note that keys are preserved.*
Maybe something like this:
foreach($arr as $v) {
$pos = strpos($v, "sample text");
if($pos !== false) {
// success
}
}
Here is the manual:
http://de3.php.net/manual/de/function.strpos.php
Explode the two strings by spaces, and it is a simple case of comparing arrays.

Need every permutation of capitalized letters in php

I want to build an array in PHP that contains every possible capitalization permutation of a word, so it would be (pseudocode):
function permutate($word){
for ($i=0; $i<count($word); $i++){
...confused here...
array_push($myArray, $newWord)
}
return $myArray;
}
So say I put in "School" I should get an array back of
{school, School, sChool, SCHool, schOOl, ... SCHOOL}
I know of functions that capitalize the string or the first character, but I am really struggling with how to accomplish this.
This should do it for you:
function permute($word){
if(!$word)
return array($word);
$permutations = array();
foreach(permute(substr($word, 1)) as $permutation){
$lower = strtolower($word[0]);
$permutations[] = $lower . $permutation;
$upper = strtoupper($word[0]);
if($upper !== $lower)
$permutations[] = $upper . $permutation;
}
return $permutations;
}
However, for your particular use case there may be a better solution. Since there are 2^n permutations for a string of length n, it will be infeasible to run this (or even to generate all those strings using any method at all) on a much longer string.
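The 2^n count is easiest to see in an iterative variant, a different technique from the recursive answer above: each integer from 0 to 2^n - 1 is used as a bitmask, and bit i decides whether character i is upper-cased. A sketch, assuming single-byte (ASCII) input; unlike the recursive version, it produces duplicates for non-letters, which array_unique() removes. The function name is hypothetical:

```php
<?php
// Sketch: enumerate case variants with a bitmask instead of recursion.
// Bit $i of $mask decides whether character $i is upper-cased, so there are 2^n masks.
// Assumptions: ASCII input; the function name is hypothetical.
function case_variants(string $word): array
{
    $n = strlen($word);
    $variants = [];
    for ($mask = 0; $mask < (1 << $n); $mask++) {
        $variant = '';
        for ($i = 0; $i < $n; $i++) {
            $variant .= (($mask >> $i) & 1) ? strtoupper($word[$i]) : strtolower($word[$i]);
        }
        $variants[] = $variant;
    }
    // non-letters map to themselves, so different masks can yield the same string
    return array_values(array_unique($variants));
}

print_r(case_variants('ab'));
```

For "School" this yields 64 strings; every extra letter doubles the work, which is why the advice below to normalise case instead is usually the better route.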
In reality, if you want to do case-insensitive matching, you should probably convert strings to one particular case before hashing them or storing them in the database.
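Normalising once means every capitalisation maps to the same stored key, so no permutations are needed at all. A minimal sketch, assuming ASCII input (mb_strtolower() would be the multibyte-aware choice) and SHA-256 as the hash; the function name is hypothetical:

```php
<?php
// Sketch: lower-case a word before hashing it, so every capitalisation gives the same key.
// Assumptions: ASCII input and SHA-256; the function name is hypothetical.
function case_insensitive_key(string $word): string
{
    return hash('sha256', strtolower($word));
}

var_dump(case_insensitive_key('School') === case_insensitive_key('sCHOOL')); // bool(true)
```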

Check if a comma-separated string is contained in another comma-separated string

I'm having trouble checking whether a comma-separated string contains another comma-separated string.
Suppose I have two strings
$stringA="red,blue,yellow,green,black,grey,purple,pink,khaki,lemon,orange,white,maroon";
$stringB="blue,green,white,pink,maroon";
All I want to check is whether the colors in $stringB are contained in $stringA or not. The only way I could think of is converting $stringA into an array and checking the colors one by one using the in_array function. Is there an easier way?
Thanks in advance
$stringA="red,blue,yellow,green,black,grey,purple,pink,khaki,lemon,orange,white,maroon";
$stringB="blue,green,white,pink,maroon";
$arrayA = explode(',', $stringA);
$arrayB = explode(',', $stringB);
// $stringB is contained in $stringA when every element of $arrayB also appears in $arrayA
$AcontainsB = (count(array_intersect($arrayB, $arrayA)) == count($arrayB));
I think comparing arrays is not a bad idea, but you can also do something like this:
$stringATmp = ','.$stringA.',';
$colors = explode(',', $stringB);
$contains = true;
foreach ($colors as $color) {
if (strpos($stringATmp, ','.$color.',') === false) {
$contains = false;
break;
}
}
There are ways of doing it that are faster than others, but no ways that are conceptually easier than loading the data into some kind of data structure. Since you are talking about checking a list of items in arbitrary order against another list of items that can be in arbitrary order, there are no shortcuts around getting the reference list (stringA) into a data structure, and then looking up the stringB list in that data structure.
One way to speed it up.
Explode stringA into an array.
Flip the stringA array with array_flip() so that the colors become keys in the array (it does not matter what the values are).
Now you can look up each color from the exploded stringB with code like the following:
$stringAArray = explode(',', $stringA);
$stringAArray = array_flip($stringAArray);
$stringBArray = explode(',',$stringB);
$itemsToFind = count($stringBArray);
foreach ($stringBArray as $colorFromB) {
if (array_key_exists($colorFromB, $stringAArray)) {
$itemsToFind--;
}
}
if ($itemsToFind == 0) {
echo "All B items are in A";
}
This is a very fast lookup and scales well for lots of items in A and B.
Final note: for smallish arrays, doing it via in_array is going to be comparably fast.

Compare all strings in an array to all strings in another array, PHP

What I am trying to do is really simple, but I am going into a lot of detail to make sure it is easily understandable.
I have an array that has a few strings in it. I then have another array that has a few other short strings in it, usually one or two words each.
I need my app to proceed to the next action if it finds one of the strings from the second array inside one of the strings of the first array.
So for example, if one of the strings in the first array is "This is PHP Code" and one of the strings in the second is "PHP", then it finds a match and proceeds to the next action. I can do this using this code:
for ( $i = 0; $i < count($Array); $i++) {
$Arrays = strpos($Array[$i],$SecondArray[$i]);
if ($Arrays === false) {
echo 'Not Found Array String';
}
else {
echo 'Found Array String';
}
}
However, this only compares the element of the first array at the current loop index with the element of the second array at the same index.
I need it to compare all the values, so that it searches every value in the first array for the first value in the second array, then every value in the first array for the second value in the second array, and so on.
I think I have to use two loops? I tried this but had problems with the array only returning the first value.
If anyone could help, it would be appreciated!
I'll mark the correct answer and +1 any helpful comments!
Thanks!
Maybe the following is a solution:
// loop through array1
foreach($array1 as $line) {
// check if the word is found
$word_found = false;
// explode on every word
$words = explode(" ", $line);
// loop through every word
foreach($words as $word) {
if(in_array($word, $array2)) {
$word_found = true;
break;
}
}
// if the word is found do something
if($word_found) {
echo "There is a match found.";
} else {
echo "No match found.";
}
}
This should give you the result you want. I'm absolutely sure there is a more efficient way to do this, but that's for you to find out, I guess. Good luck!
You can first normalize your data and then use PHP's built-in array functions to get the intersection between two arrays.
First, convert each array of multi-word strings into an array containing only the individual words.
A helpful function to get all words from a string can be str_word_count.
Then compare those two "all words" arrays with each other using array_intersect.
Something like this:
$words1 = array_unique(str_word_count(implode(' ', $Array), 1));
$words2 = array_unique(str_word_count(implode(' ', $SecondArray), 1));
$intersection = array_intersect($words1, $words2);
if(count($intersection))
{
# there is a match!
}
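Splitting into words, as in both answers above, will not match the two-word entries the question says the second array can contain. The two nested loops the question suspects are needed handle that directly, because strpos() searches for substrings. A minimal sketch, keeping the question's case-sensitive matching (stripos() would ignore case); the function name `first_match` is hypothetical:

```php
<?php
// Sketch: the "two loops" approach, testing every needle against every haystack string.
// strpos() searches for substrings, so two-word needles such as "sample text" match too.
// Assumptions: case-sensitive matching as in the question; the function name is hypothetical.
function first_match(array $haystacks, array $needles): ?array
{
    foreach ($needles as $needle) {
        foreach ($haystacks as $haystack) {
            // skip empty needles: they would trivially match in PHP 8 (and warn in older versions)
            if ($needle !== '' && strpos($haystack, $needle) !== false) {
                return [$needle, $haystack]; // first hit: the caller can proceed to the next action
            }
        }
    }
    return null; // no entry of $needles occurs in any string of $haystacks
}

$Array = ['This is some text', 'This is PHP Code'];
$SecondArray = ['Java', 'PHP'];
var_dump(first_match($Array, $SecondArray));
```

Substring matching also means "PHP" is found inside "PHPUnit"; if only whole words should count, the word-splitting answers are the better fit for one-word entries.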
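function findUnit($packaging_units, $packaging)
{
// return the first unit that occurs in the upper-cased text of $packaging[3]
foreach ($packaging_units as $packaging_unit) {
if (str_contains(strtoupper($packaging[3]), $packaging_unit)) {
return $packaging_unit;
}
}
return null; // no unit found
}
Here the first parameter is the array of units to look for, and the second is an array whose element at index 3 holds the text to search in. Because that text is upper-cased before comparing, the units are expected to be upper-case too; str_contains() requires PHP 8.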

Remove composed words

I have a list of words in which some are composed words, for example
palanca
plato
platopalanca
I need to remove "plato" and "palanca" and keep only "platopalanca".
I used array_unique to remove duplicates, but those composed words are tricky...
Should I sort the list by word length and compare one by one?
A regular expression is the answer?
update: The list of words is much bigger and mixed, not only related words
update 2: I can safely implode the array into a string.
update 3: I'm trying to avoid doing this as if it were a bubble sort. There must be a more efficient way of doing this.
Well, I think that a bubble-sort-like approach is the only possible one :-(
I don't like it, but it's what I have...
Any better approach?
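// Despite its name, this comparator sorts by ascending length (shortest first), which the
// loop below relies on: a word can only be contained in a later word of equal or greater length
function sortByLengthDesc($a,$b){
return strlen($a)-strlen($b);
}
usort($words,'sortByLengthDesc');
$count = count($words);
$delete = array();
for($i=0;$i<$count;$i++) {
for($j=$i+1;$j<$count;$j++) {
// $words[$i] occurs inside a longer word, so mark it for deletion once
if(strpos($words[$j], $words[$i]) !== false){
$delete[]=$i;
break;
}
}
}
foreach($delete as $i) {
unset($words[$i]);
}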
update 5: Sorry, all. I'm a moron. Jonathan Swift made me realize I was asking the wrong question.
Given x words which START the same, I need to remove the shortest ones.
"hot, dog, stand, hotdogstand" should become "dog, stand, hotdogstand"
"car, pet, carpet" should become "pet, carpet"
"palanca, plato, platopalanca" should become "palanca, platopalanca"
"platoother, other" should be left untouched; they start differently
I think you need to define the problem a little more, so that we can give a solid answer. Here are some pathological lists. Which items should get removed?
hot, dog, hotdogstand.
hot, dog, stand, hotdogstand
hot, dogs, stand, hotdogstand
SOME CODE
This code should be more efficient than the one you have:
$words = array('hatstand','hat','stand','hot','dog','cat','hotdogstand','catbasket');
$count = count($words);
for ($i=0; $i<=$count; $i++) {
if (isset($words[$i])) {
$len_i = strlen($words[$i]);
for ($j=$i+1; $j<$count; $j++) {
if (isset($words[$j])) {
$len_j = strlen($words[$j]);
if ($len_i<=$len_j) {
if (substr($words[$j],0,$len_i)===$words[$i]) {
unset($words[$i]);
break; // $words[$i] is gone, so stop comparing it against later words
}
} else {
if (substr($words[$i],0,$len_j)==$words[$j]) {
unset($words[$j]);
}
}
}
}
}
}
foreach ($words as $word) {
echo "$word<br>";
}
You could optimise this by storing word lengths in an array before the loops.
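For the refined problem in update 5, a sort can replace the pairwise comparisons altogether. If a word is a prefix of another word, then after sorting in byte order (SORT_STRING) the word immediately following it also starts with it, because any string that sorts between a word and one of its extensions must share that prefix. One pass over neighbours therefore finds every word to drop, costing O(n log n) for the sort instead of O(n^2) comparisons. A sketch, assuming "start the same" means "is a prefix of another word in the list" and case-sensitive matching; the function name is hypothetical:

```php
<?php
// Sketch: drop every word that is a prefix of another word in the list.
// After a byte-order sort, a prefix word is immediately followed by a word that starts with it.
// Assumptions: case-sensitive matching; the function name is hypothetical.
function remove_prefix_words(array $words): array
{
    // remove duplicates first, otherwise "hot", "hot" would each count as a prefix of the other
    $words = array_values(array_unique($words));
    $sorted = $words;
    sort($sorted, SORT_STRING);
    $remove = [];
    for ($i = 0; $i + 1 < count($sorted); $i++) {
        $current = $sorted[$i];
        if (strncmp($sorted[$i + 1], $current, strlen($current)) === 0) {
            $remove[$current] = true; // the next word starts with $current
        }
    }
    // keep the surviving words in their original order
    return array_values(array_filter($words, function ($word) use ($remove) {
        return !isset($remove[$word]);
    }));
}

print_r(remove_prefix_words(['hot', 'dog', 'stand', 'hotdogstand']));
```

The four examples in update 5 come out as listed there, with the survivors kept in their original order.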
You can take each word and check whether any other word in the array starts or ends with it. If so, this word should be removed (unset()).
You could put the words into an array, sort the array alphabetically, and then loop through it, checking whether the next words start with the word at the current index, which makes them composed words. If they do, you can remove the word at the current index and the latter parts of the next words...
Something like this:
$array = array('palanca', 'plato', 'platopalanca');
// ok, the example array is already sorted alphabetically, but anyway...
sort($array);
// another array for words to be removed
$removearray = array();
// loop through the array, the last index won't have to be checked
for ($i = 0; $i < count($array) - 1; $i++) {
$current = $array[$i];
// use another loop in case there is more than one combined word
// if the words are case sensitive, use strpos() instead to compare
while ($i + 1 < count($array) && stripos($array[$i + 1], $current) === 0) {
$next = $array[$i + 1];
// the next word starts with the current one, so remove current
$removearray[] = $current;
// get the other word to remove
$removearray[] = substr($next, strlen($current));
$i++;
}
}
// now just get rid of the words to be removed:
// array_diff() keeps only the words that are not in $removearray
$result = array_values(array_diff($array, $removearray));
Regex could work. You can specify within the regex that the match must start at the beginning of the string and end at its end.
^ defines the start
$ defines the end
so something like
foreach($array as $value)
{
//$term is the value that you want to remove
if(preg_match('/^' . preg_quote($term, '/') . '$/', $value))
{
//Here you can be confident that $term is $value, and then either remove it from
//$array, or you can add all not-matched values to a new result array
}
}
would avoid your issue
But if you are just checking that two values are equal, == will work just as well as (and possibly faster than) preg_match.
If the lists of $terms and $values are huge, this won't be the most efficient strategy, but it is a simple solution.
If performance is an issue, sorting (note the provided sort function) the lists and then iterating down the lists side by side might be more useful. I'm going to actually test that idea before I post the code here.
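The answer's closing idea of sorting both lists and walking them side by side could look like the following. The author did not post code, so this is only a sketch, assuming exact, case-sensitive string equality and a hypothetical function name. After the two sorts, a single pass finds the common values:

```php
<?php
// Sketch: find values present in both lists by sorting them and walking them side by side.
// Assumptions: exact, case-sensitive equality; the function name is hypothetical.
function sorted_common(array $terms, array $values): array
{
    sort($terms, SORT_STRING);
    sort($values, SORT_STRING);
    $common = [];
    $i = 0;
    $j = 0;
    while ($i < count($terms) && $j < count($values)) {
        $cmp = strcmp($terms[$i], $values[$j]);
        if ($cmp === 0) {
            $common[] = $terms[$i]; // same value in both lists
            $i++;
            $j++;
        } elseif ($cmp < 0) {
            $i++; // current term sorts first, advance in $terms
        } else {
            $j++; // current value sorts first, advance in $values
        }
    }
    return $common;
}

print_r(sorted_common(['plato', 'hot', 'dog'], ['dog', 'cat', 'plato']));
```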
