Select alphabetically "nearest" option from a dropdown - php

I have a list of words in a dropdown, and a single word that needs a suitable partner from that list (the user chooses it).
To make this easier for the user (the list can be very long and the process has to be fast), I want to preselect a likely option.
I have already worked out how to change the selected word.
I want to find the alphabetically "nearest" option, but I have no idea how to determine which word is the nearest neighbour.
I have already googled every term I could think of, but I couldn't find anything.
Does anyone have an idea how I can do this?

The levenshtein() function computes the 'closeness' of two strings. You could rank your words by their distance to the user's string and return the one with the lowest value.
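A minimal sketch of that ranking in PHP, assuming the dropdown options are available as a plain array. The function name nearestByLevenshtein is made up for illustration; levenshtein() itself is PHP's built-in.

```php
<?php
// Hypothetical helper: returns the option with the smallest edit distance
// to $needle, or null when the option list is empty.
function nearestByLevenshtein(string $needle, array $options): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($options as $option) {
        // Compare case-insensitively so "Apple" and "apple" count as equal.
        $distance = levenshtein(strtolower($needle), strtolower($option));
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $option;
        }
    }
    return $best;
}
```

On ties the first option in the list wins, so sort the options beforehand if the order matters.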

Have a look at this library; it contains fuzzy string matching functions for JavaScript, including stemming, Levenshtein distance and metaphones: http://code.google.com/p/yeti-witch/

If by "alphabetically" you mean matching letters read from the left, the answer is easy. Go through the word letter by letter and compare it with each option in the dropdown. The option that shares the longest starting substring is your "nearest".
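The prefix comparison could look roughly like this in PHP. Both function names are invented for the example, and the options are assumed to be a plain array of strings.

```php
<?php
// Number of leading characters two strings have in common (case-insensitive).
function commonPrefixLength(string $a, string $b): int
{
    $a = strtolower($a);
    $b = strtolower($b);
    $max = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }
    return $i;
}

// Hypothetical helper: the option sharing the longest starting substring with
// $word. Falls back to the first option when nothing matches at all.
function nearestByPrefix(string $word, array $options): ?string
{
    $best = null;
    $bestLength = -1;
    foreach ($options as $option) {
        $length = commonPrefixLength($word, $option);
        if ($length > $bestLength) {
            $bestLength = $length;
            $best = $option;
        }
    }
    return $best;
}
```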

The simplest (and probably fastest) approach in JavaScript is to find, by binary search, where the word would be inserted into a sorted array of your option words, using the < and > string operators. The options on either side of that position are the alphabetically nearest ones.
For more advanced and precise results, use Levenshtein distance.
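The answer talks about JavaScript; since the question is tagged php, here is the same idea sketched in PHP, with strcmp() standing in for the < and > operators. The function name insertionIndex is an assumption for the example, and the array must already be sorted with the same comparison.

```php
<?php
// Binary search for the position at which $word would be inserted into
// $sorted (ascending, strcmp order) to keep it sorted.
function insertionIndex(string $word, array $sorted): int
{
    $lo = 0;
    $hi = count($sorted);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($sorted[$mid], $word) < 0) {
            $lo = $mid + 1; // word belongs to the right of $mid
        } else {
            $hi = $mid;     // word belongs at or to the left of $mid
        }
    }
    return $lo;
}
```

The candidates for "nearest" are then $sorted[$index - 1] and $sorted[$index], where they exist; which of the two to preselect is up to you.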

Related

Full text document similarity search

I have a big database of articles. Before adding new items to the DB, I'd like to check whether similar items already exist and, if so, group them together so that I can later display them as a group of similar items.
Currently we use PHP's similar_text() function, which is very simple but surprisingly precise and fully satisfies our needs. The problem is that before we add an item to the DB, we first need to pull X items from the DB and loop through every single one to check whether our new item is at least 75% similar to any of them. This uses a lot of resources and time that we don't really have.
We use MySQL and Solr for all our queries. I've tried MySQL Full-Text Search and Solr's More Like This. Compared to PHP's implementation, they are super fast and efficient, but I just can't get a robust percentage score like the one PHP's similar_text() provides. It is crucial for our grouping to be accurate.
For example, using this MySQL query:
SELECT id, body, ROUND(((MATCH(body) AGAINST ('ARTICLE TEXT')) / scores.max_score) * 100) as relevance
FROM natural_text_test,
(SELECT MAX(MATCH(body) AGAINST('ARTICLE TEXT')) as max_score FROM natural_text_test LIMIT 1) scores
HAVING relevance > 75
ORDER BY relevance DESC
I get that an article with 130 words is 85% similar to another article with 4700 words. In comparison, PHP's similar_text() returns a similarity score of only 3%, which is well below our threshold and is correct in our case.
I've also looked into the Levenshtein distance algorithm, but it seems to have the same problem as MySQL and Solr.
There has to be a better way to handle similarity checks, maybe I'm using the algorithms incorrectly?
Based on some of the Comments, I might propose this...
It seems that 75%-similar documents would have a lot of the same sentences in the same order.
Break the doc into sentences.
Take a crude hash of each sentence and map it to a visible ASCII character. This gives you a string that is, perhaps, 1/100th the size of the original doc.
Store that with the doc.
When searching, use levenshtein() on this string to find 'similar' documents.
Sure, hashing is imperfect, etc. But this is fast. And you could apply some other technique to double-check the few docs that are close.
For a hash, I might do
$md5 = md5($sentence);
$x = hexdec(substr($md5, 0, 2)) & 0x3F; // one way to get 6 bits: mask the first two hex digits
$hash = chr(ord('0') + $x); // a visible ASCII character between '0' and 'o'
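Putting the steps together, a rough sketch might look like this. The function name sentenceFingerprint and the sentence-splitting regex are assumptions for illustration, not something the approach depends on; a real splitter would need to cope with abbreviations and the like. Also note that before PHP 8.0, levenshtein() refuses strings longer than 255 characters, so very long fingerprints may need chunking there.

```php
<?php
// Hypothetical helper: one visible ASCII character per sentence.
function sentenceFingerprint(string $doc): string
{
    // Crude split: whitespace that follows '.', '!' or '?'.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($doc), -1, PREG_SPLIT_NO_EMPTY);
    $fingerprint = '';
    foreach ($sentences as $sentence) {
        // 6 bits from the first two hex digits of the MD5 hash.
        $x = hexdec(substr(md5($sentence), 0, 2)) & 0x3F;
        $fingerprint .= chr(ord('0') + $x);
    }
    return $fingerprint;
}

// Edit distance between two stored fingerprints; smaller means more similar.
function fingerprintDistance(string $docA, string $docB): int
{
    return levenshtein(sentenceFingerprint($docA), sentenceFingerprint($docB));
}
```

The fingerprint would be computed once when the article is stored, so only the short strings need comparing at insert time.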

SQL MATCH phrase difficulty

So I'm doing a random MATCH AGAINST for phrases/words.
WHERE
MATCH (keywords.keyword) AGAINST ('$keyword*' IN BOOLEAN MODE)
I've got these phrases in the database:
how are you?
how will you?
When the search/$keyword is identical to one of the phrases, the other one still gets selected from time to time.
An identical match should be just that, identical, right?
Is it because the search is matching a single word and not the entire string/phrase?
I can't see how else to solve this. Any help is greatly appreciated! Thanks.
You need to learn about stop words and minimum word length. These are key parameters that control what "words" get indexed.
The stop word list consists of common words, such as "are", that are ignored in the index (and in searching).
The minimum word length is the length of the shortest word that gets indexed. By default it is 3 for InnoDB (innodb_ft_min_token_size = 3) and 4 for MyISAM (ft_min_word_len = 4).
Both of these can be overridden, but you have to rebuild the index.

mysql: matching a query letter by letter

I am in the process of learning MySQL and querying, and I'm working with PHP to begin with.
For learning purposes I chose a small anagram-solver kind of project.
I found a very old, freely available English word list on the internet to use as the DB.
I tried plain querying, FIND_IN_SET and full-text search matching, but failed.
How can I:
Match a result letter by letter?
For example, let's say that I have the letters S-L-A-O-G to match against the database entry.
Since I have a large database which surely contains many words, I want the query to return:
lag
goal
goals
slag
log
... and so on.
Without any other results that use a letter more often than it was given.
How would I solve this with SQL?
Thank you very much for your time.
$str_search = 'SLAOG';
SELECT word
FROM table_name
WHERE word REGEXP '^[{$str_search}]+$' # '^[SLAOG]+$'
// Filter the results in PHP afterwards
// Loop START (run for each $row of the result)
$keep = TRUE;
for ($i = 0; $i < strlen($row->word); $i++) {
    $h = substr($row->word, $i, 1); // current letter of the found word
    preg_match_all("/{$h}/i", $row->word, $arr_matches);
    preg_match_all("/{$h}/i", $str_search, $arr_matches2);
    if (count($arr_matches[0]) > count($arr_matches2[0])) {
        $keep = FALSE; // letter occurs more often than in the search string
        break;
    }
}
// Loop END
Basically, run a REGEXP on the given letters and filter the result based on how many occurrences of each letter the word has compared with the search word.
The REGEXP checks the whole column value, from beginning to end, against any combination of the given letters. This may return more rows than you need, but it is a good first filter nonetheless.
The loop part filters out words where a letter is used more times than in the search string. I run preg_match_all() for each letter of the found word, against both the found word and the search word, to get the number of occurrences, and compare them with count().
If you want a quick and dirty solution....
Split the word you're trying to get anagrams for into individual letters. Assign each letter its own prime number and multiply them all together, e.g.:
C - 2
A - 3
T - 5
For a total of 30
Then step through your dictionary list, and do the same operation on each word in that. If your target word's value is divisible exactly by the dictionary word's value, then you know that the dictionary word has only letters that occur in your target word.
You can speed it up by pre-calculating the dictionary values, and then querying for just the right values:
SELECT * FROM dictionary WHERE ($searchWordTotal % wordTotal) = 0
(searchWordTotal is the total for the word you're looking for, and wordTotal is the one from the database)
I should get around to writing this properly one of these days....
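A possible PHP sketch of the prime-product idea. Here each letter a to z is mapped to one of the first 26 primes (an assumption for the example; the answer above just numbers the letters of one word), and the function names are made up. The product grows quickly, so this only works as written for short words; longer ones would overflow a 64-bit integer and need something like GMP or BCMath.

```php
<?php
// Product of one prime per letter; letters outside a-z are ignored.
function primeProduct(string $word)
{
    static $primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                      43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101];
    $product = 1;
    foreach (str_split(strtolower($word)) as $char) {
        $index = ord($char) - ord('a');
        if ($index < 0 || $index > 25) {
            continue;
        }
        $product *= $primes[$index];
    }
    return $product;
}

// True when $candidate can be spelled from the letters of $search,
// using each letter at most as often as it appears in $search.
function usesOnlyLetters(string $search, string $candidate): bool
{
    return primeProduct($search) % primeProduct($candidate) === 0;
}
```

Because repeated letters multiply the same prime in again, a word that uses a letter twice only divides evenly if the search letters contain it twice as well.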
Since you only want words made of the given letters and no others, but you don't need to use all the letters, I suggest logic like this:
* take your candidate word,
* do a string replace of the first occurrence of each letter in your match set,
* replacing it with an empty string,
* then finally wrap all that in a string-length check to see if there are any characters left.
You can do all that in SQL, but a little procedure will probably look more familiar to most coders.
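As a small procedure in PHP, that logic could be sketched like this. The function name canBuild is an assumption for the example.

```php
<?php
// Hypothetical helper: true when $candidate can be built from $letters,
// using each given letter at most once.
function canBuild(string $candidate, string $letters): bool
{
    $remaining = strtolower($candidate);
    foreach (str_split(strtolower($letters)) as $letter) {
        if ($letter === '') {
            continue; // str_split('') yields [''] on some PHP versions
        }
        $pos = strpos($remaining, $letter);
        if ($pos !== false) {
            // Remove only the first occurrence of this letter.
            $remaining = substr_replace($remaining, '', $pos, 1);
        }
    }
    // Anything left over is a letter that was not available.
    return strlen($remaining) === 0;
}
```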

Compare popularity of keywords within string

I want to take a long string (hundreds of thousands of characters) and to compare it against an array of keywords to determine which one of the keywords in the array is mentioned more than the rest.
This seems pretty easy, but I am a bit worried about strstr underperforming for this task.
Should I do it in a different way?
Thanks,
I think you can do it in a different way, with a single scan, and if you do it the right way, it can give you a dramatic improvement in performance.
Create an associative array where the keys are the keywords and the values are the occurrence counts.
Read the string word by word: take a word and put it in a variable, then check it against the keywords (there are several ways to do this; you can query the associative array with isset). When a keyword is found, increment its counter.
I hope PHP implements associative arrays with some hashmap-like thingie...
Parse the words out in linear fashion. For each word you encounter, increment its count in the associative array of words you are looking for (skipping those you aren't interested in, of course). This will be much faster than strstr.
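A minimal sketch of the single-scan count in PHP, assuming single-word keywords and a simple set of delimiter characters. The function name countKeywords and the delimiter list are choices made for the example.

```php
<?php
// Counts how often each keyword occurs as a whole word in $text.
// Returns keyword => count, sorted with the most frequent first.
function countKeywords(string $text, array $keywords): array
{
    // Keys are the keywords, values start at zero.
    $counts = array_fill_keys(array_map('strtolower', $keywords), 0);

    $delimiters = " \t\r\n.,;:!?()\"'";
    $word = strtok(strtolower($text), $delimiters);
    while ($word !== false) {
        // isset() is a hash lookup, so each word costs constant time.
        if (isset($counts[$word])) {
            $counts[$word]++;
        }
        $word = strtok($delimiters);
    }

    arsort($counts);
    return $counts;
}
```

The first key of the returned array is then the most mentioned keyword.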

Random word from giving letter

Is there any code in PHP for getting all possible words from a word dictionary that can be made from a given set of letters?
For example, for the word "werflo":
flower
fowler
reflow
wolfer
Take your word list and order each word's letters (alphabetically or otherwise, as long as it's consistent).
Associate each word with its ordered letter string.
Apply the same letter ordering to your input.
Find the matching words, which is now trivial, as you just need to find those where the ordered letter sequence matches.
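In PHP this could be sketched as follows, assuming the dictionary fits in memory as an array of words. The function names are invented for the example, and the lookup uses an associative array keyed by the ordered letter string.

```php
<?php
// Letters of a word in sorted order, e.g. "flower" becomes "eflorw".
function signature(string $word): string
{
    $letters = str_split(strtolower($word));
    sort($letters, SORT_STRING);
    return implode('', $letters);
}

// Groups the dictionary words by their signature.
function buildAnagramIndex(array $dictionary): array
{
    $index = [];
    foreach ($dictionary as $word) {
        $index[signature($word)][] = $word;
    }
    return $index;
}

// All dictionary words that use exactly the letters of $input.
function findAnagrams(string $input, array $index): array
{
    return $index[signature($input)] ?? [];
}
```

The index only needs to be built once; the signatures could equally be stored in an extra database column and looked up with a plain equality match.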
I don't know PHP, but you could:
sort the letters of every word in the dictionary, remembering each word's original position (for example, "flower" will be stored as "eflorw"), then sort the dictionary lexicographically;
sort the letters in your input word the same way;
with binary search, find the sorted word within the sorted dictionary;
by the stored index, find the original words in the original dictionary.
