look for most matching word - php

I have a table with Tags (words). Each time I want to add a new item (word) to the table, I want to first see words that look the most like the word I am entering, so I could come realize I already have a word in the table that looks like it.
Kind of like using the match() function in Mysql, but I don't want a score of how many words are corresponding. But a score of within a word, how many characters are corresponding.
So something like: select * from tags order by look_a_like_score(#newword)
But is there such a function like look_a_like_score() ?
Example, I already have in table:
Restaurant
Elevator
Swimming pool
Wifi
Now I want to add:
Free swimmer facilities
What I would like to have now is a list with 'Swimming pool' on top, because the part 'swimm' is most matching.
Can you help me do this?
PS. I collect the entire table into PHP and then put them into an array. So a PHP approach is also welcome.

On MySQL side you have soundex, not really working good as I like.
You may want to implement a MySQL module to use levenshtein (you'll need to compile in C either).
On PHP side you have levenshtein() available which is quite decent to have similarity score
You may use too:
soundex() - Calculate the soundex key of a string
similar_text() - Calculate the similarity between two strings
metaphone() - Calculate the metaphone key of a string
Check the manual to know how to use them

You can look here here for an implementation of the levenshtein distance formula this is good for finding the edit distance between to string.
Other things that might work out for you is using soundex or possibly double metaphone to do "Sounds like" matches.

There is no function. But, you can do this with some SQL. Let me assume that #newtag contains your new tag and that you have a numbers table. You can do something like this:
select t.tag, max(len) as biggestmatch
from (select concat('%', substr(#newtag, n1.n, n2.n), '%') as pat,
n1.n as start, n2.n as len
from numbers n1 cross join
numbers n2
where n1.n <= length(#newtag) and n1+n2 <= length(#newtag)
) patterns join
tags t
on t.tag like patterns.pat
group by t.tag
order by max(len)
limit 1 /* you only need this if you want the best one */
I'm not promising that this will perform particularly well. But for a handful of tags and strings that are not too long, it might suit your purposes.

Related

How to not search for exact word match using LIKE?

Hello guys I am using the following query to select data
mysql_query("SELECT * FROM prodcuts WHERE prod_name LIKE '%".$search."%'");
The problem is when you search let's say 'he' it select more than I would like. It takes 'hello' 'helblabla' everything with 'he' in it.
What I would like to do is when let's say you are searching for playstation it accepts the following searches playstatioq, yplaystation, playstationq etc..
So I want to allow the search to differ one letter from the exact product name. Any ideas how this can be done?
Thanks in advance
Sounds like what you want is the Levenshtein distance. Check out MySQL Levenshtein here on StackOverflow for more information.
Expanding on what Glen Solsberry wrote, you can use an '_' (underscore) for a single letter wild card search. For example:
"SELECT * FROM prodcuts WHERE prod_name LIKE '%_".$search."_%'"
If your $search var is 'laystatio', this query will match 'playstation', 'yplaystation', 'playstationq' and 'playstatioq'.
If your $search var is 'playstation', or 'playstatioq', or 'yplaystation', using soundex, this next query should return the result you are after, plus a heap that are a close match:
"SELECT * FROM prodcuts WHERE SOUNDEX(prod_name) LIKE SOUNDEX('".$search."')"
You could potentially solve your problem obliquely by building an index with something like Elastic Search or Solr that provides fuzzy matching for full text queries.
Depending on the scale of your problem, efficiency requirements, and how fuzzy you need your matches to be, it could save you a lot of effort. On the other hand, it could of course be overkill.

Searching a big mysql database with relevance

I'm building a rather large "search" engine for our company intranet, it has 1miljon plus entries
it's running on a rather fast server and yet it takes up to 1 min for some search queries.
This is how the table looks
I tried create an index for it, but it seems as if i'm missing something, this is how the show index is showing
and this is the query itself, it is the ordering that slows the query mostly but even a query without the sorting is somewhat slow.
SELECT SQL_CALC_FOUND_ROWS *
FROM `businessunit`
INNER JOIN `businessunit-postaddress` ON `businessunit`.`Id` = `businessunit-postaddress`.`BusinessUnit`
WHERE `businessunit`.`Name` LIKE 'tanto%'
ORDER BY `businessunit`.`Premium` DESC ,
CASE WHEN `businessunit`.`Name` = 'tanto'
THEN 0
WHEN `businessunit`.`Name` LIKE 'tanto %'
THEN 1
WHEN `businessunit`.`Name` LIKE 'tanto%'
THEN 2
ELSE 3
END , `businessunit`.`Name`
LIMIT 0 , 30
any help is very much appreciated
Edit:
What's choking this query 99% is ordering by relevance with the wildcharacter %
When i Do an explain it says using where; using fsort
You should try sphinx search solution which is full-text search engine will give you very good performance along with lots of options to set relevancy.
Click here for more details.
Seems like the index doesn't cover Premium, yet that is the first ORDER BY argument.
Use EXPLAIN your query here to figure out the query plan and change your index to remove any table scans as explained in http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
MySQL is good for storing data but not great when it comes down to fast text based search.
Apart from Sphinx which has been already suggested I recommend two fantastic search engines:
Solr with http://pecl.php.net/package/solr - very popular search engine. Used on massive services like NetFlix.
Elastic Search - relatively new software but with very active community and lots of respect
Both solution are based on the same library Apache Lucene
If the "ORDER BY" is really the bottleneck, the straight-forward solution would be to remove the "ORDER BY" logic from your query, and re-implement the sorting directly in your application's code using C# sorting. Unfortunately, this means you'd also have to move your pagination into your application, since you'd need to obtain the complete result set before you can sort & paginate it. I'm just mentioning this because no-one else so far appears to have thought of it.
Frankly (like others have pointed out), the query you showed at the top should not need full-text indexing. A single suffix wildcard (e.g., LIKE 'ABC%') should be very effective as long as a BTREE (and not a HASH) index is available on the column in question.
And, personally, I have no aversion to even double-wildcard (e.g., LIKE '%ABC%"), which of course can never make use of indexes, as long as a full table scan is cheap. Probably 250,000 rows is the point where I'll start to seriously consider full-text indexing. 100,000 is definitely no problem.
I always make sure my SELECT's are dirty-reads, though (no transactionality applied to the select).
It's dirty once it gets to the user's eyeballs in any case!
Most of the search engine oriended sites are use FULL-TEXT-SEARCH.
It will be very faster compare to select and LIKE...
I have added one example and some links ...
I think it will be useful for you...
In this full text search have some conditions also...
STEP:1
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
);
STEP:2
INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','DBMS stands for DataBase ...'),
('How To Use MySQL Well','After you went through a ...'),
('Optimizing MySQL','In this tutorial we will show ...'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL vs. YourSQL','In the following database comparison ...'),
('MySQL Security','When configured properly, MySQL ...');
STEP:3
Natural Language Full-Text Searches:
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('database');
Boolean Full-Text Searches
SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
Go through this links
viralpatel.net,devzone.zend.com,sqlmag.com,colorado.edu,en.wikipedia.org
It's so strange query :)
Let's try to understand what it does.
The results are less than 30 rows from the table "businessunit" with some conditions.
The first condition is a foreign key of the "businessunit-postaddress" table.
Please check if you have an index on the column businessunit-postaddress.BusinessUnit.
The second one is a filter for returning rows only with businessunit.Name begining with 'tanto'.
If I didn't make a mistake you have a very complex index 'Business' consists of 11 fields!
And field 'Name' is not the first field in this index.
So this index is useless when you run "like tanto%"'s query.
I have strong doubt about necessity of this index at all.
By the way it demands quite big resources for its maintaining and slow down edit operations with this table.
You have to make an index with the only field 'Name'.
After filtering the query is sorting results and do it in some strange way too.
At first it sorts by field businessunit.Premium - it's normal.
However next statements with CASE are useless too.
That's why.
The zero are assigned to Name = 'tanto' (exactly).
The next rows with the one are rows with space after 'tanto' - these will be after 'tanto' in any cases (except special symbols) cause space is lower than any letter.
The next rows with the two are rows with some letters after 'tanto' (include space!). These rows will be in this order too by definition.
And the three is "reserved" for "other" rows but you won't get "other" rows - remeber about [WHERE businessunit.Name LIKE 'tanto%'] condition.
So this part of ORDER BY is meaningless.
And at the end of ORDER BY there is businessunit.Name again...
My advice: you need rebuild the query from scratch keeping in mind what you want to get.
Anyway I guess you can use
SELECT SQL_CALC_FOUND_ROWS *
FROM `businessunit`
INNER JOIN `businessunit-postaddress` ON `businessunit`.`Id` = `businessunit-postaddress`.`BusinessUnit`
WHERE `businessunit`.`Name` LIKE 'tanto%'
ORDER BY `businessunit`.`Premium` DESC,
`businessunit`.`Name`
LIMIT 0 , 30
Don't forget about an index on field businessunit-postaddress.BusinessUnit!
And I have strong assumption about field Premium.
I guess it is designed for storing binary data (yes/no).
So an ordinary (BTREE) index doesn't match.
You have to use bitmap index.
P.S. I'm not sure that you really need to use SQL_CALC_FOUND_ROWS
MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query
Its either full-text(http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html) or the pattern matching (http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html) from php and mysql side.
From experience and theory:
Advantages of full-text -
1) Results are very relevant and de-limit characters like spacing in the search query does not hinder the search.
Disadvantages of full-text -
1) There are stopwords that are used as restrictions by webhosters to prevent excess load of data.(E.g. search results containing the word 'one' or 'moz' are not displayed. And this can be avoided if you're running your own server by keeping no stopwords.
2) If I type 'ree' it only displays words containing exactly 'ree' not 'three' or 'reed'.
Advantages of pattern matching -
1) It does not have any stopwords as in full-text and if you search for 'ree', it displays any word containing 'ree' like 'reed' or 'three' unlike fulltext where only the exact word is retreived.
Disadvantages of pattern matching-
1) If delimiters like spaces are used in your search words and if these spaces are not there in the results, because each word is separate from any delimiters, then it returns no result.
If the argument of LIKE doesn't begin with a wildchard character, like in your example, LIKE operator should be able to take advantage of indexes.
In this case, LIKE operator should perform better than LOCATE or LEFT, so I suspect that changing the condition like this could make things worse, but I still think it's worth trying (who knows?):
WHERE LOCATE('tanto', `businessunit`.`Name`)=1
or:
WHERE LEFT(`businessunit`.`Name`,5)='tanto'
I would also change your order by clause:
ORDER BY
`businessunit`.`Premium` DESC ,
CASE WHEN `businessunit`.`Name` LIKE 'tanto %' THEN 1
WHEN `businessunit`.`Name` = 'tanto' THEN 0
ELSE 2 END,
`businessunit`.`Name`
Name has to be LIKE 'tanto%' already, so you can skip a condition (CASE will never return value 3). Of course, make sure that Premium field is indexed.
Hope this helps.
I think you need to collect the keys only, sort them, then join last
SELECT A.*,B.* FROM
(
SELECT * FROM (
SELECT id BusinessUnit,Premium
CASE
WHEN Name = 'tanto' THEN 0
WHEN Name LIKE 'tanto %' THEN 1
WHEN Name LIKE 'tanto%' THEN 2
ELSE 3
END SortOrder
FROM businessunit Name LIKE 'tanto%'
) AA ORDER BY Premium,SortOrder LIMIT 0,30
) A LEFT JOIN `businessunit-postaddress` B USING (BusinessUnit);
This will still generate a filesort.
You may want to consider preloading the needed keys in a separate table you can index.
CREATE TABLE BusinessKeys
(
id int not null auto_increment,
BusinessUnit int not null,
Premium int not null,
SortOrder int not null,
PRIMARY KEY (id),
KEY OrderIndex (Premuim,SortOrder,BusinessUnit)
);
Populate all keys that match
INSERT INTO BusinessKeys (BusinessUnit,Premuim,SortOrder)
SELECT id,Premium
CASE
WHEN Name = 'tanto' THEN 0
WHEN Name LIKE 'tanto %' THEN 1
WHEN Name LIKE 'tanto%' THEN 2
ELSE 3
END
FROM businessunit Name LIKE 'tanto%';
Then, to paginate, run LIMIT on the BusinessKeys only
SELECT A.*,B.*
FROM
(
SELECT FROM BusinessKeys
ORDER BY Premium,SortOrder
LIMIT 0,30
) BK
LEFT JOIN businessunit A ON BK.BusinessUnit = A.id
LEFT JOIN `businessunit-postaddress` B ON A.BusinessUnit = B.BusinessUnit
;
CAVEAT : I use LEFT JOIN instead of INNER JOIN because LEFT JOIN preserves the order of the keys from the left side of the query.
I've read the answer to use Sphinx to optimize the search. But regarding my experience I would advise a different solution. We used Sphinx for some years and had a few nasty problems with segmentation faults and corrupted indice. Perhaps Sphinx isn't as buggy as a few years before, but for a year now we are very happy with a different solution:
http://www.elasticsearch.org/
The great benefits:
Scalability - you can simply add another server with nearly zero configuration. If you know mysql replication, you'll love this feature
Speed - Even under heavy load you get good results in much less than a second
Easy to learn - Only by knowing HTTP and JSON you can use it. If you are a Web-Developer, you feel like home
Easy to install - it is useable without touching the configuration. You just need simple Java (no Tomcat or whatever) and a Firewall to block direct access from the public
Good Javascript integration - even a phpMyAdmin-like Tool is a simple HTML-Page using Javascript: https://github.com/mobz/elasticsearch-head
Good PHP Integration with https://github.com/ruflin/Elastica
Good community support
Good documentation (it is not eye friendly, but it covers nearly every function!)
If you need an additional storage solution, you can easily combine the search engine with http://couchdb.apache.org/

How to find 'similar' records in a MySQL table based on 'title' and 'description' columns?

I have a MySQL table storing some user generated content. For each piece of content, I have a title (VARCHAR 255) and a description (TEXT) column.
When a user is viewing a record, I want to find other records that are 'similar' to it, based on the title/description being similar.
What's the best way to go about doing this? I'm using PHP and MySQL.
My initial ideas are:
1) Either to strip out common words from the title and description to be left with 'unique' keywords, and then find other records which share those keywords.
E.g in the sentence: "Bob woke up at 5 am and went to school", the keywords would be: "Bob, woke, 5, went, school". Then if there's another record whose title talks about 'bob' and 'school', they would be considered 'similar'.
2) Or to use MySQL's full text search, though I don't know if this would be any good for something like this?
Which method would be better out of the two, or is there another method which is even better?
I'll keep this short (it could be way too long)...
I would not select they keywords 'manually' or modify your original data.
MySQL supports full text search with MyISAM (not InnoDB) engine. A full description of the options available when querying the DB are available here. The query can automatically get rid of common stop-words and words too common in the data set (more than 50% of the rows contains them) depending on the querying method. Query expansion is also available and the query type should be decided depending on your needs.
Consider also using a separate engine like Lucene. With Lucene you will probably have more functionalities and better indexing/searching. You can automatically get rid of common words (they get a low score and do not influence the search) and use things as stemming for instance. There is a little bit of a learning curve but I'll definitely look into it.
EDIT:
The MySQL 'full-text natural language search' returns the most similar rows (and their relevance score) and is not a boolean matching search.
You would start by defining what similar means to you and how you want to score the similarity between two different documents.
Using that algorithm you can processing all your documents and build a table of similarity scores.
Depending on the complexity of your scoring algorithm and size of data set, this may not be something you would run realtime, but instead batch it through something like Hadoop.
I have done something like this. I replace all of the spaces in the string with % then use LIKE in the where clause. Here, I will give you my code. It is from MSSQL but minor adjustments can be made to work it with MySQL. Hope it helps.
CREATE FUNCTION [dbo].[fss_MakeTextSearchable] (#text NVARCHAR(MAX)) RETURNS NVARCHAR(MAX)
--replaces spaces with wildcard characters to return more matches in a LIKE condition
-- for example:
-- #text = 'my file' will return '%my%file%'
-- SELECT WHERE 'my project files' like #text would return true
AS
BEGIN
DECLARE #searchableText NVARCHAR(MAX)
SELECT #searchableText = '%' + replace(#text, ' ', '%') + '%'
RETURN #searchableText
END
Then use the function like this:
SELECT #searchString = dbo.fss_MakeTextSearchable(#String)
Then in your query:
Select * from Table where title LIKE #searchString

Relevancy search with PHP and Mysql

Whats the best way to go around doing this?
I have columns: track_name, artist_name, album_name
I want all columns to be matched against the search query. and some flexibility while matching.
mysql like is too strict, even with %XXX%. It matches the string as a whole, not the parts.
Your MySQL query could have several OR clauses, searching for each space-delimited word entered by the user. For example, a user search for "Queens of the Stoneage" may be represented in SQL as SELECT * FROM songs WHERE artist_name LIKE "%Queens%" OR artist_name LIKE "Stoneage".
However, that could be undesirable because LIKE searches which start with an % are inefficient and could be terribly slow on a large database.
Though I can't speak to the performance implications, you should have a look at natural language full-text searches. It's probably the most effective solution you'll find:
SELECT * FROM songs WHERE MATCH(track_name, artist_name, album_name) AGAINST('Queens of the Stoneage' IN NATURAL LANGUAGE MODE);
Some PHP functions do exist for determining the similarity of strings of text, but keeping this work in the database will probably be most efficient (and less frustrating):
levenshtein()
similar_text()
soundex()
I think you need to rethink your application. What I understood from your comment is that you need to implement some logic operator like "and", "or" and "not" in your program. It's not only about fancy algorithm like fulltext index or longest common substring like this mysql match query. But I can be wrong.

How to find similarity between mySQL rows?

I am trying to create a script that finds a matching percentage between my table rows. For example my mySQL database in the table products contains the field name (indexed, FULLTEXT) with values like
LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz
LG TV 50PK350 PLASMA 50"
LG S24AW 24000 BTU
Aircondition LG S24AW 24000 BTU Inverter
As you may see all of them have some same keyword. But the 1st name and 2nd name are more similar. Additionally, 3rd and 4th have more similar keywords between them than 1st and 2nd.
My mySQL DB has thousands of product names. What I want is to find those names that have more than a percentage (let's say 60%) of similarity.
For example, as I said, 1st, 2nd (and any other name) that match between them with more than 60%, will be echoed in a group-style-format to let me know that those products are similar. 3rd and 4th and any other with more than 60% matching will be echoed after in another group, telling me that those products match.
If it is possible, it would be great to echo the keywords that satisfy all the grouped matching names. For example LG S24AW 24000 BTU is the keyword that is contained in 3rd and 4th name.
At the end I will create a list of all those keywords.
What I have now is the following query (as Jitamaro suggested)
Select t1.name, t2.name From products t1, products t2
that creates a new name field next to all other names. Excuse me that I don't know how to explain it right but this is what it does: (The real values are product names like above)
Before the query
-name-
A
B
C
D
E
After the query
-name- -name-
A A
B A
C A
D A
E A
A B
B B
C B
D B
E B
.
.
.
Is there a way either with mySQL or PHP that will find me the matching names and extract the keywords as I described above? Please share code examples.
Thank you community.
Query the DB with LIKE OR REGEXP:
SELECT * FROM product WHERE product_name LIKE '%LG%';
SELECT * FROM product WHERE product_name REGEXP "LG";
Loop the results and use similar_text():
$a = "LG 50PK350 PLASMA TV 50\" Plasma TV Full HD 600Hz"; // DB value
$b = "LG TV 50PK350 PLASMA 50\"" ; // USER QUERY
$i = similar_text($a, $b, $p);
echo("Matched: $i Percentage: $p%");
//outputs: Matched: 21 Percentage: 58.3333333333%
Your second example matches 62.0689655172%:
$a = "LG S24AW 24000 BTU"; // DB value
$b = "Aircondition LG S24AW 24000 BTU Inverter" ; // USER QUERY
$i = similar_text($a, $b, $p);
echo("Matched: $i Percentage: $p%");
You can define a percentage higher than, lets say, 40%, to match products.
Please note that similar_text() is case SensItivE so you should lower case the string.
As for your second question, the levenshtein() function (in MySQL) would be a good candidate.
When I look at your examples, I consider how I would try to find similar products based on the title. From your two examples, I can see one thing in each line that stands out above anything else: the model numbers. 50PK350 probably doesn't show up anywhere other than as related to this one model.
Now, MySQL itself isn't designed to deal with questions like this, but some bolt-on tools above it are. Part of the problem is that querying across all those fields in all positions is expensive. You really want to split it up a certain way and index that. The similarity class of Lucene will grant a high score to words that rarely appear across all data, but do appear as a high percentage of your data. See High level explanation of Similarity Class for Lucene?
You should also look at Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
Scoring each word against the Lucene similarity class ought to be faster and more reliable. The sum of your scores should give you the most related products. For the TV, I'd expect to see exact matches first, then some others of the same size, then brand, then TVs in general, etc.
Whatever you do, realize that unless you alter the data structures by using another tool on top of the SQL system to create better data structures, your queries will be too slow and expensive. I think Lucene is probably the way to go. Sphinx or other options not mentioned may also be up for consideration.
This is trickier than it seems and there is information missing in your post:
How are people going to use this auto-complete function?
Is it relevant that you can find all names for a product? Because apparently not all stores name their products similarly so a clerk might not be able to find the product (s)he found.
Do you have information about which product names are for the same product?
Is it relevant from which store you're searching? where is this auto-complete used?
Should the auto-complete really only suggest products that match all the words you typed? (it's not so hard, technically, to correct typos)
I think you need a more clear picture of what you (or better yet: the users) want this auto-complete function to do.
An auto-complete function is very much a user-friendly type feature. It aids the user, possibly in a fuzzy way so there is no single right answer. You have to figure out what works best, not what is easiest to do technically.
First figure out what you want, then worry about technology.
One possible solution is to use Damerau-Levenstein distance. It could be used like this
select *
from products p
where DamerauLevenstein(p.name, '*user input here*')<=*X*
You'll have to figure out X that suites your needs best. It should be integer greater than zero. You could have it hard-coded, parameterized or calculated as needed.
The trickiest thing here is DamerauLevenstein. It has to be stored procedure, that implements Damerau-Levenstein algorithm. I don't have MySQL here, so I might write it for you later this day.
Update: MySQL does not support arrays in stored procedures, so there is no way to implement Damerau-Levenstein in MySQL, except using temporary table for each function call. And that will result in terrible performance. So you have two options: loop through the results in PHP with levenstein like Alix Axel suggests, or migrate your database to PostgreSQL, where arrays are supported.
There is also an option to create User-Defined function, but this requires writing this function in C, linking it to MySQL and possibly rebuilding MySQL, so this way you'll just add more headache.
Your approach seems sound. For matching similar products, I would suggest a trigram search. There's a pretty decent explanation of how this works along with the String::Trigram Perl module.
I would suggest using trigram search to get a list of matches, perhaps coupled with some manual review depending on how much data you have to deal with and how frequent you need to add new products. I've found this approach to work quite well in practice.
Maybe you want to find the longest common substring from the 2 strings? Then you need to compute a suffix tree for each of your strings see here http://en.wikipedia.org/wiki/Longest_common_substring_problem.
If you want to check all names against each other you need a cross join in mysql. There are many ways to achieve this:
1. Select a, b From t1, t2
2. Select a, b From t1 Join t2
3. Select a, b From t1 Cross Join t2
Then you can loop through the result. This is the same when I say create a 2d array with n^2-(n-1) elements and each element is connected with each other.
P.S.: Select t1.name, t2.name From products t1, products t2
It sounds like you've gone through all this trouble to explain a complex scenario, then said that you want to ignore the optimal answers and just get us to give you the "handshake" protocol (everything is compared to everything that hasn't been compared to it yet). So... pseudocode:
select * from table order by id
while (result) {
select * from table where id > result_id
}
That will do it.
If your database simply had a UPC code as one of it's fields, and this field was well-maintained, i.e., you could trust that it was entered correctly by the database maintainer and correctly reflected what the item was -- then you wouldn't need to do all of the work you suggest.
An even better idea might be to have a UPC field in your next database -- and constrain it as unique.
Database users attempt to put an-already-existing UPC into the database -- they get an error.
Database maintains its integrity.
And if such a database maintained its integrity -- the necessity of doing what you suggest never arises.
This probably doesn't help much with your current task (apologies) -- but for a future similar database -- you might wish to think about it...
I`d advise you to use some fulltext search engine, like sphinx. It has possibilities to implement any algorithm you want. For example, you may use "quorom" or "any" searches.
It seems that you might always want to return the shortest string?? That's more or a question than anything. But then you might have something like...
SELECT * FROM products LIMIT 1
WHERE product_name like '%LG%'
ORDER BY LENGTH(product_name) ASC
This is a clustering problem, which can be resolved by a data mining method. ( http://en.wikipedia.org/wiki/Cluster_analysis) It requires a lot of memory and computation intensive operations which is not suitable for database engine. Otherwise, separate data mining, text mining, or business analytics software wouldn't have existed.
This question is similar :) to this one:
What is the best way to implement a substring search in SQL?
Trigram can easily find similar rows, and in that question i posted a php+mysql+trigram solution.
You can use LIKE to find similar product names within the table. For example:
SELECT * FROM product WHERE product_name LIKE 'LG%';
Here is another idea (but I'm voting for levenshtein()):
Create a temporary table of all words used in names and their frequencies.
Choose range of results (most popular words are probably words like LCD or LED, most unique words could be good, they might be product actual names).
Suggest for each of result words either:
results with those words
results containing longest substring (like this: http://forums.mysql.com/read.php?10,277997,278020#msg-278020 ) of those words.
Ok, I think I was trying to implement very much similar thing. It can work the same as the google chrome address box. When you type the address it gives you the suggestions. This is what you are trying to achieve as far I am concerned.
I cannot give you exact solution to that but some advice.
You need to implement the dropdown box where someone starts to enter the product they are looking for
Then you need to get the current value of the dropdown and then run query like guy posted above. Can be "SELECT * FROM product WHERE product_name LIKE 'LG%';"
Save results of the query
Refresh the page
Add the results of the query to the dropdown
Note:
You need to save the query results somewhere like the text file with the HTML code i.e. "option" LG TS 600"/option" (add <> brackets to option of course). This values will be used for populating your option box after the page refresh. You need to set up the users session for the user to get the same results for the same user, otherwise if more users would use the search at the same time it could clash. So, with the search id and session id you can match them then. You can save it in the file or the table. Table would be more convenient. It is actually in my sense the whole subsystem for that what are you looking for.
I hope it helps.

Categories