Is there any way in PHP to get every word from a word dictionary that can be made from a given set of letters?
For example, for the input "werflo":
flower
fowler
reflow
wolfer
Take your word list and order each word's letters (alphabetically or otherwise, as long as it's consistent).
Associate each word with its ordered letter string.
Apply the same letter ordering to your input.
Find the matching words, which is now trivial: you just need the words whose ordered letter sequence equals that of the input.
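The steps above can be sketched in PHP roughly as follows. This is a minimal sketch, assuming the dictionary is already loaded into an array; the function names `letterKey` and `buildIndex` and the sample word list are made up for illustration.

```php
<?php
// Sort a word's letters to get its ordered letter string, e.g. "flower" -> "eflorw".
function letterKey(string $word): string
{
    $letters = str_split(strtolower($word));
    sort($letters);
    return implode('', $letters);
}

// Group every dictionary word under its ordered letter string.
function buildIndex(array $words): array
{
    $index = [];
    foreach ($words as $word) {
        $index[letterKey($word)][] = $word;
    }
    return $index;
}

// Assumed sample dictionary; a real one would be read from a file or a table.
$index = buildIndex(['flower', 'fowler', 'reflow', 'wolfer', 'follow']);

// Apply the same ordering to the input and look it up.
$anagrams = $index[letterKey('werflo')] ?? [];
print_r($anagrams);
```

Building the index once and reusing it keeps each lookup to a single array access.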
I don't know PHP, but you could:
sort the letters of every word in the dictionary, remembering each word's original position (for example, "flower" will be stored as "eflorw"), then sort the dictionary lexicographically;
sort the letters in your input word the same way;
with binary search, find the sorted word within the sorted dictionary;
by the stored index, find the original words in the original dictionary.
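A possible PHP version of this binary-search variant is sketched below. The helper names (`sortedLetters`, `buildSortedPairs`, `findAnagrams`) and the sample dictionary are assumptions for the example, not part of the answer.

```php
<?php
// Ordered letter string of a word, e.g. "flower" -> "eflorw".
function sortedLetters(string $word): string
{
    $letters = str_split(strtolower($word));
    sort($letters);
    return implode('', $letters);
}

// Build [orderedLetters, originalIndex] pairs and sort them lexicographically.
function buildSortedPairs(array $dictionary): array
{
    $pairs = [];
    foreach ($dictionary as $i => $word) {
        $pairs[] = [sortedLetters($word), $i];
    }
    usort($pairs, fn($a, $b) => strcmp($a[0], $b[0]));
    return $pairs;
}

// Binary search for the first matching pair, then collect the run of equal keys.
function findAnagrams(array $dictionary, array $pairs, string $input): array
{
    $key = sortedLetters($input);
    $lo = 0;
    $hi = count($pairs);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($pairs[$mid][0], $key) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    $result = [];
    for ($i = $lo; $i < count($pairs) && $pairs[$i][0] === $key; $i++) {
        $result[] = $dictionary[$pairs[$i][1]]; // stored index -> original word
    }
    return $result;
}

// Assumed sample dictionary.
$dictionary = ['apple', 'flower', 'fowler', 'reflow', 'wolfer', 'zebra'];
$pairs = buildSortedPairs($dictionary);
$found = findAnagrams($dictionary, $pairs, 'werflo');
sort($found); // order among equal keys is not guaranteed on older PHP versions
print_r($found);
```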
So I'm doing a random MATCH AGAINST for phrases/words:
WHERE
MATCH (keywords.keyword) AGAINST ('$keyword*' IN BOOLEAN MODE)
I've got these phrases in the database:
how are you?
how will you?
When the search/$keyword is identical to one of the phrases - the other one still gets selected from time to time.
An identical match should be exactly that, identical, right?
Is it because the search is matching a single word and not the entire string/phrase?
I can't see how else to solve this. Any help is greatly appreciated! Thanks.
You need to learn about stop words and minimum word length. These are key parameters that control which "words" get indexed.
The stop word list consists of common words, such as "are", that are ignored in the index (and in searching).
The minimum word length is the length of the shortest word that gets indexed. By default it is 3 for InnoDB (innodb_ft_min_token_size = 3) or 4 for MyISAM (ft_min_word_len = 4).
Both of these can be overridden, but you have to rebuild the index afterwards.
I have a content description and a few listed words ("Google" and "Gmail"). If these words appear in the content description, I have to replace them with their links. I have created a regular expression and replaced them successfully using preg_match. But now I want to limit the replacements. For example:
If 2 found words are very close to each other, the second one should not be replaced.
My description is as follows:
"This is my description for Google and Gmail. I need to replace Google with its link and also Gmail"
Now my requirement is that the first Gmail should not be replaced, because the first "Google" is very near to it (only 1 word apart), while the rest of the words should be replaced because they are far from each other. So my result should be:
This is my description for Google and Gmail. I need to replace Google with its link and also Gmail.
I have used lookahead matching but it is not working.
OK, I got the solution.
I used preg_match_all for each word one by one, then maintained an array of matched words with their offsets (PREG_OFFSET_CAPTURE).
That gave me a list of all matched words with their positions, which I sorted according to each word's weight. From there, any algorithm can be used to track the nearest replacement in the text. I did the following:
1: Replace the first word in the list in the body and maintain a temp tracking array with the position of this word.
2: For the second word in the list, first check the temp tracking array and find the position nearest to the second word. You can then count the words between the first word and the second word using the str_word_count function.
3: Repeat this for all words in the list.
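A simplified sketch of this idea is shown below. It is not the exact code from the answer: it walks the matches in text order instead of sorting them by weight, and the function name `linkKeywords`, the `$minGap` threshold and the URLs are assumptions made for the example.

```php
<?php
// Link each keyword unless fewer than $minGap words separate it
// from the previously linked keyword.
function linkKeywords(string $text, array $links, int $minGap = 2): string
{
    // Collect every match together with its byte offset.
    $matches = [];
    foreach ($links as $word => $url) {
        preg_match_all('/\b' . preg_quote($word, '/') . '\b/', $text, $found, PREG_OFFSET_CAPTURE);
        foreach ($found[0] as [$match, $offset]) {
            $matches[] = ['word' => $match, 'offset' => $offset, 'url' => $url];
        }
    }
    usort($matches, fn($a, $b) => $a['offset'] <=> $b['offset']);

    // Keep a match only if it is far enough from the last kept one.
    $accepted = [];
    $lastEnd = null;
    foreach ($matches as $m) {
        if ($lastEnd !== null) {
            $between = substr($text, $lastEnd, $m['offset'] - $lastEnd);
            if (str_word_count($between) < $minGap) {
                continue; // too close to the previous link
            }
        }
        $accepted[] = $m;
        $lastEnd = $m['offset'] + strlen($m['word']);
    }

    // Replace from the end so that earlier offsets stay valid.
    foreach (array_reverse($accepted) as $m) {
        $anchor = '<a href="' . htmlspecialchars($m['url']) . '">' . $m['word'] . '</a>';
        $text = substr_replace($text, $anchor, $m['offset'], strlen($m['word']));
    }
    return $text;
}

// Placeholder URLs, assumed for the example.
$links = ['Google' => 'https://www.google.com', 'Gmail' => 'https://mail.google.com'];
$out = linkKeywords(
    'This is my description for Google and Gmail. I need to replace Google with its link and also Gmail',
    $links
);
echo $out, PHP_EOL;
```

With the sample description, the first "Gmail" stays plain text because only one word ("and") separates it from the linked "Google"; the other three occurrences are linked.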
I am in the process of learning MySQL and querying, and right now working with PHP to begin with.
For learning purposes I chose a small anagram solver kind of project to begin with.
I found a very old English language word list on the internet freely available to use as the DB.
I tried plain queries, FIND_IN_SET and full-text search matching, but failed.
How can I:
Match a result letter by letter?
For example, let's say that I have the letters S-L-A-O-G to match against the database entry.
Since the database is large and surely contains many matching words, I want the query to return:
lag
goal
goals
slag
log
... and so on.
Without any other results that use a letter more often than it appears in my letters (for example, a word with the same letter twice).
How would I solve this with SQL?
Thank you very much for your time.
$str_search = 'SLAOG';
SELECT word
FROM table_name
WHERE word REGEXP '^[{$str_search}]+$' # '^[SLAOG]+$'
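// Filter the results in PHP afterwards
// Loop START (runs once for each fetched $row)
$is_valid = true;
for ($i = 0; $i < strlen($row->word); $i++) {
    $h = substr($row->word, $i, 1); // current letter of the found word
    // Count that letter in the found word and in the search string (case-insensitive)
    preg_match_all("/{$h}/i", $row->word, $arr_matches);
    preg_match_all("/{$h}/i", $str_search, $arr_matches2);
    if (count($arr_matches[0]) > count($arr_matches2[0])) {
        $is_valid = false; // Amount doesn't add up
        break;
    }
}
// Loop END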
Basically, run a REGEXP with the given letters and then filter the result based on how many times each letter occurs in the found word compared with the search word.
The REGEXP checks every word, from beginning to end, against any combination of the given letters. This may return more rows than you need, but it is a good first filter nonetheless.
The loop part filters out words in which a letter is used more times than in the search string. I run preg_match_all() for each letter of the found word, against both the found word and the search word, to get the number of occurrences, and compare them with count().
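The same per-letter comparison can also be written without regular expressions. The sketch below uses PHP's count_chars() instead of preg_match_all(); the function name `canBuild` and the sample rows are assumptions standing in for the rows the REGEXP query would return.

```php
<?php
// True when $word can be spelled using each letter of $letters at most once.
function canBuild(string $word, string $letters): bool
{
    // Mode 1 returns byte value => number of occurrences, for bytes that occur.
    $available = count_chars(strtolower($letters), 1);
    foreach (count_chars(strtolower($word), 1) as $byte => $needed) {
        if ($needed > ($available[$byte] ?? 0)) {
            return false; // letter used more often than the search string allows
        }
    }
    return true;
}

// Assumed rows, all of which would pass REGEXP '^[SLAOG]+$'.
$rows = ['lag', 'goal', 'goals', 'slag', 'log', 'gag', 'solo'];
$valid = array_values(array_filter($rows, fn($w) => canBuild($w, 'SLAOG')));
print_r($valid);
```

Here "gag" and "solo" are dropped because they need a letter twice.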
If you want a quick and dirty solution....
Split the word you're trying to get anagrams for into individual letters. Assign each letter an individual prime number value, and multiply them all together; eg:
C - 2
A - 3
T - 5
For a total of 30
Then step through your dictionary list, and do the same operation on each word in that. If your target word's value is divisible exactly by the dictionary word's value, then you know that the dictionary word has only letters that occur in your target word.
You can speed it up by pre-calculating the dictionary values, and then querying for just the right values:
SELECT * FROM dictionary WHERE ($searchWordTotal % wordTotal) = 0
(searchWordTotal is the total for the word you're looking for, and wordTotal is the one from the database)
I should get around to writing this properly one of these days....
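A rough PHP sketch of the prime idea follows. The alphabetical letter-to-prime assignment, the constant `PRIMES` and the function name `wordTotal` are assumptions for the example (any fixed mapping works), and it only suits short words because the product overflows PHP's integer range for long ones.

```php
<?php
// First 26 primes, assigned to a..z in alphabetical order (an assumption).
const PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

// Multiply the primes of all letters; expects letters a-z only.
function wordTotal(string $word): int
{
    $total = 1;
    foreach (str_split(strtolower($word)) as $ch) {
        $total *= PRIMES[ord($ch) - ord('a')];
    }
    return $total;
}

$searchWordTotal = wordTotal('slaog');

// Assumed dictionary; keep the words whose total divides the search total.
$dictionary = ['goal', 'gag', 'dog', 'slag', 'log'];
$hits = array_values(array_filter(
    $dictionary,
    fn($w) => $searchWordTotal % wordTotal($w) === 0
));
print_r($hits);
```

A repeated letter repeats its prime factor, so "gag" is rejected because the search total contains the prime for "g" only once.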
Since you only want words made from the given letters and no others, but you don't need to use all the letters, I suggest logic like this:
* take your candidate word,
* for each letter in your match set, do a string replace of its first occurrence in the candidate,
* replacing it with an empty string,
* then finally wrap all that in a string length check to see if there are any characters left.
You can do all of that in SQL, but a little procedure will probably look more familiar to most coders.
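As a small procedure, the logic above might look like this in PHP. This is a sketch; the function name `usesOnlyLetters` is made up for the example.

```php
<?php
// Remove one occurrence of each available letter from the candidate;
// the candidate is valid when no characters are left afterwards.
function usesOnlyLetters(string $candidate, string $letters): bool
{
    $remaining = strtolower($candidate);
    foreach (str_split(strtolower($letters)) as $letter) {
        $pos = strpos($remaining, $letter);
        if ($pos !== false) {
            // Cut out only the first occurrence of this letter.
            $remaining = substr_replace($remaining, '', $pos, 1);
        }
    }
    return strlen($remaining) === 0;
}

var_dump(usesOnlyLetters('goals', 'SLAOG'));
var_dump(usesOnlyLetters('gag', 'SLAOG'));
```

Because each available letter removes at most one character, a candidate that needs a letter twice always has something left over.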
I have a list of words in a dropdown, and I have a single word that is looking for a suitable partner (the user chooses it).
To make this easier for the user (because the list can be very long and the process has to be fast), I want to suggest a likely option.
I have already looked at how I can change the selected word.
I want to find the alphabetically "nearest" option, but I have no idea how to find out which word is the nearest neighbour.
I have already googled with every term I could think of, but I couldn't find anything.
Does someone have an idea how I can do it?
The levenshtein function will compute the 'closeness' of 2 strings. You could rank the words you have relative to the user's string and return the string with the lowest value.
Have a look at this library; it contains fuzzy string matching functions for JavaScript, including stemming, Levenshtein distance and metaphones: http://code.google.com/p/yeti-witch/
If by alphabetically you mean matching letters read from the left, the answer is easy. Simply go through every letter of the word and compare it with the ones in the select dropdown. The word that shares the longest starting substring is your "nearest".
The simplest (and probably fastest) thing in JavaScript is finding (by binary search) where to put the word in a sorted array of your option words, using the < and > string operators.
For more advanced and precise results, use Levenshtein distance.
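The Levenshtein ranking can be sketched as follows. It is written in PHP, where levenshtein() is built in, so it would rank the options on the server side; the question itself concerns a JavaScript dropdown. The function name `nearestOption` and the option list are assumptions for the example.

```php
<?php
// Return the option with the smallest Levenshtein distance to $input
// (the first option wins on ties), or null for an empty list.
function nearestOption(string $input, array $options): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($options as $option) {
        $distance = levenshtein(strtolower($input), strtolower($option));
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $option;
        }
    }
    return $best;
}

// Assumed dropdown options.
$options = ['apple', 'banana', 'cherry', 'grape'];
$suggestion = nearestOption('grap', $options);
echo $suggestion, PHP_EOL;
```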
So I have a database of words between 3 and 20 characters long. I want to code something in PHP that finds all of the smaller words that are contained within a larger word. For example, in the word "inward" there are the words "rain", "win", "rid", etc.
At first I thought about adding a field to the Words tables (Words3 through Words20, denoting the number of letters in the words), something like "LetterCount"... for example, "rally" would be represented as 10000000000200000100000010: 1 instance of the letter A, 0 instances of the letter B, ... 2 instances of the letter L, etc. Then, go through all the words in each table (or one table if the target length of found words was specified) and compare the LetterCount of each word to the LetterCount of the source word ("inward" in the example above).
But then I started thinking that that would place too much of a load on the MySQL database as well as the PHP script, calling each and every word's LetterCount, comparing each and every digit to that of the source word, etc.
Is there an easier, perhaps more intuitive way of doing this? I'm open to using stored procedures if it will help with overhead in any way. Just some suggestions would be greatly appreciated. Thanks!
Here is a simple solution that should be pretty efficient, but it will only work up to a certain word size (it will probably break down at about 15-20 characters, depending on whether the word is made up of high-frequency letters with lower values or low-frequency letters with higher values):
Assign each letter a prime number according to its frequency. So e = 2, t = 3, a = 5, etc., using frequency values from here or some similar source.
Precalculate the value of each word in your word list by multiplying the prime values for the letters in the word, and store in the table in a bigint data type column. For instance, tea would have a value of 3*2*5=30. If a word has repeated letters, repeat the factor, so that teat should have a value of 3*2*5*3=90.
When checking if a word, such as rain, is contained inside of another word, such as inward, it's sufficient to check if the value for rain divides the value for inward. In this case, inward = 14213045, rain = 7315, and 14213045 is divisible by 7315, so the word rain is inside the word inward.
A bigint column maxes out at 9223372036854775807, which should be fine up to about 15-20 characters (depending on the frequencies of the letters in the word). For instance, I picked up the first 20-letter word from here, which is antiinstitutionalism, and has a value of 6901041299724096525, which would just barely fit inside the bigint column. However, the 14-letter word xylopyrography has a value of 635285791503081662905, which is too big. You might have to handle the really large ones as special cases using an alternate method, but hopefully there are few enough of them that it would still be relatively efficient.
The query would work something like the demo I've prepared here: http://www.sqlfiddle.com/#!2/9bd27/8
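For the PHP side, the precalculation and the overflow check could be sketched like this. The letter order in `LETTER_ORDER` is an assumption, chosen so that the tea, rain and inward figures above come out the same; it is not necessarily the full frequency table the answer used. The function names are made up for the example, and a 64-bit PHP build is assumed so that PHP_INT_MAX equals the signed bigint maximum.

```php
<?php
// Assumed frequency order, most common letter first.
const LETTER_ORDER = 'etainosrhdlucmfygpwbvkxjqz';
const LETTER_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                       43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];

// Prime product of a word, or null if it contains a character outside a-z
// or the product would not fit into a signed 64-bit integer.
function primeProduct(string $word): ?int
{
    $value = 1;
    foreach (str_split(strtolower($word)) as $ch) {
        $pos = strpos(LETTER_ORDER, $ch);
        if ($pos === false) {
            return null;
        }
        $prime = LETTER_PRIMES[$pos];
        if ($value > intdiv(PHP_INT_MAX, $prime)) {
            return null; // the next multiplication would overflow
        }
        $value *= $prime;
    }
    return $value;
}

// True when every letter of $small (with repeats) occurs in $large.
function isContained(string $small, string $large): bool
{
    $a = primeProduct($small);
    $b = primeProduct($large);
    return $a !== null && $b !== null && $b % $a === 0;
}

var_dump(isContained('rain', 'inward'));
var_dump(isContained('rally', 'inward'));
```

Words for which `primeProduct` returns null are the special cases that would need the alternate method mentioned above.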