I am writing a PHP/MySQL search script that searches a very large table (over 2,000,000 rows), and I want it to be fast. I also want spell checking and smart word detection, for example to query for "phone" when the user types "phrone" or "hpone". The best way I have found is REGEXP, but with a complicated expression MySQL's REGEXP is rather slow. Do you have any advice for me?
Regexp example intended to let "phrone" match "phone":
[a-zA-Z]*[phrone]{3,}[a-zA-Z]{3,}
Please read the MySQL documentation on full-text search:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
It enables the kind of search you have described.
If you want more power and features, use Elasticsearch or another search engine such as Solr; they are very fast and offer more features.
I'm busy with a program that needs to find similar text on a webpage. In SQL we have 400,000 search terms. For example, the search terms can be ‘San Miguel Pale Pilsen’, ‘Schaumburger Bali’ and ‘Rizmajer Cortez’.
Currently I check each word on the webpage against the database. For each word on the webpage I send a SELECT query with a LIKE '%word%' condition, and for each result I call similar_text() in PHP. If the word and the search term do not contain the same number of words, the script takes some extra words from the webpage to make them equal.
(And yes, I know that it isn’t smart.)
The problem is that it takes a lot of time and puts a heavy load on the server.
What is the best and fastest way to find similar text on a webpage?
The LIKE operator will always be slow if the pattern starts with a % wildcard, because a leading wildcard prevents MariaDB from using an index on the column.
Since you need to find words at any position in the VARCHAR column, the best solution is to implement proper full-text search. See MariaDB's Full-Text Index Overview.
Searches will become orders of magnitude faster, and the approach scales far better.
I'm looking for a search solution. There are a few products with names like:
USB Kingston 8GB
USB Kingmax 8GB
USB Transcend 8GB
USB Sandisk 4GB
I'm using a MySQL database and I've tried full-text search:
SELECT * FROM PRODUCTS WHERE MATCH(productName) AGAINST ('usb 8g');
I also tried Sphinx, but I didn't get any results when typing "usb 8g". With "usb 8gb" it worked.
I also need it to return the correct results when the user types 'ubs 8gb'.
Is there any solution that auto-corrects the query like Google does?
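You have to use a wildcard to match part of a word. In a MySQL full-text query the wildcard is * rather than % (which belongs to LIKE), and it is only recognized IN BOOLEAN MODE.
Have not tested, but it should work like this:
SELECT * FROM PRODUCTS WHERE MATCH(productName) AGAINST ('usb 8g*' IN BOOLEAN MODE)
P.S. Please sanitize user input before passing it to the SQL statement.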
In Sphinx, for this specific situation, setting min_prefix_len=2 and expand_keywords=1 would work. This makes partial word matches possible, i.e. '8g' will match '8gb'; in effect the query becomes '8g*'. A wildcard is also added to the end of 'usb', so it matches 'usb*' as well, but that shouldn't really affect anything, as you are unlikely to have many other words beginning with those characters.
Ultimately it's a tradeoff in how 'fuzzy' to make the search, as this could introduce all sorts of side effects. It is difficult to think of a good example, but searching for 'case' would then also match 'casebook', and case and casebook are completely different things.
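To make the tradeoff concrete, here is a minimal pure-Python imitation of prefix expansion. It is not Sphinx itself: the function names and the way `min_prefix_len` is applied here are assumptions made for illustration only.

```python
def term_matches(term, word, min_prefix_len):
    # Terms that are long enough are treated as prefixes ("8g" -> "8g*");
    # shorter terms must match the whole word.
    if len(term) >= min_prefix_len:
        return word.startswith(term)
    return word == term


def prefix_search(query, titles, min_prefix_len=2):
    """Return the titles in which every query term matches some word."""
    terms = query.lower().split()
    hits = []
    for title in titles:
        words = title.lower().split()
        if all(any(term_matches(t, w, min_prefix_len) for w in words) for t in terms):
            hits.append(title)
    return hits


products = ["USB Kingston 8GB", "USB Kingmax 8GB", "USB Transcend 8GB", "USB Sandisk 4GB"]
print(prefix_search("usb 8g", products))                      # the three 8GB products
print(prefix_search("case", ["phone case", "law casebook"]))  # both titles: the side effect
```

The second call shows the downside: 'case' now also pulls in 'casebook'.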
I am in need of a lightweight, fast search solution.
Today I use full-text search in boolean mode, where every search word is mandatory in the results.
The function is fast, works, and meets the requirements.
BUT some of the full-text limitations (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html) have turned out to be a problem. The site is on a hosted server and I'm not allowed to change the MySQL settings (e.g. the minimum word length).
E.g. the search must be able to find red, 11 and ab.cd, which today's full-text solution can't.
http://sphinxsearch.com/ is what you're looking for,
though you have to understand that the shorter the words you want to find, the bigger the indexes become.
Use Lucene. It's very often used alongside MySQL, and it'll be both faster and more featureful.
Using the built-in FTS engine is relatively bad practice, especially since (at least before MySQL 5.6) it doesn't work with the slightly more reliable InnoDB engine.
The only thing that comes to mind is basing your search on the number of occurrences you can find. Your actual indexing method could vary, depending on what the DB offers.
Assuming DB size isn't an issue, a (very) basic approach would be to break each searchable blob (say, a post on Stack Overflow) into words, normalize them (remove plurals, strip 'logic' words such as "and", etc.), then insert each word as a new record, together with the ID that identifies your indexed resource.
To search, count the instances of each ID and order by count: higher number = more relevant.
Not exactly my field though, so tread carefully! =]
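A rough sketch of that word-count idea, using an in-memory dict instead of a database table. The stop-word list, the function names and the sample documents are all made up for illustration, and plural removal is left out to keep it short.

```python
from collections import Counter, defaultdict

# Illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "in"}


def normalize(text):
    """Lowercase, strip surrounding punctuation, drop stop words."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word and word not in STOP_WORDS:
            words.append(word)
    return words


def build_index(docs):
    """Map each word to the IDs of the documents containing it, one entry per occurrence."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for word in normalize(text):
            index[word].append(doc_id)
    return index


def search(index, query):
    """Return document IDs ordered by how many matching word occurrences they have."""
    counts = Counter()
    for word in normalize(query):
        counts.update(index.get(word, []))
    return [doc_id for doc_id, _ in counts.most_common()]


docs = {
    1: "MySQL full text search and MySQL indexes",
    2: "Search engines",
    3: "Cooking pasta",
}
index = build_index(docs)
print(search(index, "mysql search"))  # document 1 first (3 hits), then document 2 (1 hit)
```

In a database the index would be a table of (word, resource ID) rows, and the ranking a GROUP BY on the ID with ORDER BY COUNT(*) DESC.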
I'd recommend you try distance searching: Levenshtein
Or search for "N-gram fulltext indexing".
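For reference, a minimal sketch of Levenshtein distance and how it could rank candidate words. The `closest` helper, its `max_distance` default and the sample vocabulary are assumptions for illustration; comparing against every word like this would be far too slow for a large table without some pre-filtering.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def closest(term, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance of term, nearest first."""
    scored = [(levenshtein(term, word), word) for word in vocabulary]
    return [word for distance, word in sorted(scored) if distance <= max_distance]


print(levenshtein("kitten", "sitting"))               # 3
print(closest("rad", ["red", "bed", "reed", "blue"]))  # ['red', 'bed', 'reed']
```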
I haven't mucked around with it, but I read up on the theory of full-text searching (with MySQL at least) a little while back.
If memory serves me correctly, you can use full-text search for what you want, but you need to configure it (and, I think, recompile) to get it to work with shorter search words. I think the default minimum is 4 characters. You'll want to change it to 2 characters, with a few other options thrown in, and test the results you get.
Someone correct me if this is wrong. I would rather not send him chasing a red herring.
How do you make a search for "alien vs predator" also return results containing the string "alienS vs predator", with the "S"?
Example: http://www.torrentz.com/search?q=alien+vs+predator
How have they implemented this?
Is this advanced search engine stuff?
This is known as Word Stemming. When the text is indexed, words are "stemmed" to their "roots". So fighting becomes fight, skiing becomes ski, runs becomes run, etc. The same thing is done to the text that a user enters at search time, so when the search terms are compared to the values in the index, they match.
The Lucene project supports this. I wouldn't consider it an advanced feature, especially given the expectations that Google has set.
Checking for plurals is a form of stemming. Stemming is a common feature of search engines and other text-matching tools. See the Wikipedia page http://en.wikipedia.org/wiki/Stemming for a host of stemming algorithms.
Typically when one sets up a search engine to search for text, one will construct a query that's something like:
SELECT * FROM TBLMOVIES WHERE NAME LIKE '%ALIEN%'
This means that the substring ALIEN can appear anywhere in the NAME field, so you'll get back strings like ALIENS.
When words are indexed, they are indexed by their root form. For example, "aliens", "alien", "alien's" and "aliens'" are all stored as "alien".
And when words are searched for, the search engine also looks up only the root form, "alien".
This is called stemming, and a well-known method is the Porter Stemming Algorithm. You can download an implementation for your favorite language here: http://tartarus.org/~martin/PorterStemmer/
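To illustrate the idea only, here is a deliberately naive stemmer that handles the "alien" examples above. It is not the Porter algorithm, just a hypothetical rule that drops apostrophes and a trailing "s"; the point is that the same function is applied both to the indexed text and to the query.

```python
def naive_stem(word):
    """Toy stemmer: lowercase, drop apostrophes, strip one trailing plural 's'."""
    word = word.lower().replace("'", "")
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def stem_text(text):
    return [naive_stem(word) for word in text.split()]


def matches(query, title):
    """True when every stemmed query word appears among the stemmed title words."""
    title_stems = set(stem_text(title))
    return all(stem in title_stems for stem in stem_text(query))


print([naive_stem(w) for w in ["aliens", "alien", "alien's", "aliens'"]])  # all become 'alien'
print(matches("alien vs predator", "AlienS vs Predator"))                  # True
```

A real stemmer such as Porter's also handles suffixes like -ing and -ed, which this sketch ignores.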
This is a basic feature of a search engine, rather than just a program that matches your query with a set of pre-defined results.
If you have the time, this is a great read, all about different algorithms, and how they are implemented.
You could try using soundex() as a fuzzy match on your strings. If you save the soundex value with the title and then compare that stored value against a prefix of the search term's soundex using LIKE 'XXX%', you should get a decent match. The longer the prefix you compare, the closer the strings have to be.
See the docs: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex
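To show what such codes look like, here is a small sketch of the classic four-character American Soundex in Python. This is an assumption-laden stand-in: MySQL's SOUNDEX() uses a slightly different variant and can return longer strings, so its values may not match these exactly.

```python
# Letter groups of the classic Soundex coding; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Classic American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            encoded += code
        if c not in "hw":  # h and w do not separate letters with the same code
            previous = code
    return (encoded + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("alien"), soundex("aliens"))   # A450 A452
```

The last line is why the prefix comparison matters: "alien" and "aliens" get different full codes, but both start with "A45", so LIKE 'A45%' would catch them both.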
I'm trying to search for a string in my DB, but the search should match any of the words, not just the whole phrase.
Suppose, a table data has text like a b c d e f g. Then if I search for d c b it should be able to show the results.
field LIKE '%d c b%' doesn't work in this way.
Can someone suggest a more robust way to search, possibly one that also shows a relevance counter?
I don't mind using PHP also for the above, but prefer to do the search at DB level.
For best results, you need to create FULLTEXT index on your data.
CREATE TABLE mytable (id INT NOT NULL, data TEXT NOT NULL, FULLTEXT KEY fx_mytable_data (data)) ENGINE=MyISAM;
SELECT *
FROM mytable
WHERE MATCH(data) AGAINST ('+word1 +word2 +word3' IN BOOLEAN MODE)
Note that to index one-letter words (as in your example), you'll need to set ft_min_word_len to 1 in the MySQL configuration.
This syntax can work even if you don't have an index (as long as your table is MyISAM), but will be quite slow.
I think what you want to do is, for any of the letters:
field LIKE '%d%' or field like '%c%' or field like '%b%'
for all of the letters
field LIKE '%d%' and field like '%c%' and field like '%b%'
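If you build those conditions from user input, it is safer to generate them with placeholders. Below is a small sketch of that, run against an in-memory SQLite table purely as a stand-in for MySQL; the helper name `build_like_query` and the sample rows are made up, and the table and column names are assumed to be trusted identifiers, never user input.

```python
import sqlite3


def build_like_query(search, column="data", table="mytable", match_all=True):
    """Build a parameterized query with one LIKE condition per search word."""
    words = search.split()
    if not words:
        raise ValueError("empty search")
    joiner = " AND " if match_all else " OR "
    where = joiner.join(f"{column} LIKE ?" for _ in words)
    params = [f"%{word}%" for word in words]
    return f"SELECT {column} FROM {table} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (data TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("a b c d e f g",), ("x y z",), ("b c",)])

sql, params = build_like_query("d c b")  # all words must appear
print(sql)
print([row[0] for row in conn.execute(sql, params)])  # only 'a b c d e f g'

sql_any, params_any = build_like_query("d c b", match_all=False)  # any word
print(sorted(row[0] for row in conn.execute(sql_any, params_any)))  # also 'b c'
```

Every condition still starts with %, so this scans the whole table; it is fine for small data but will not scale.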
If your table uses MyISAM, you can use the FULLTEXT search integrated in MySQL: see 11.8. Full-Text Search Functions in the manual.
There will be some restrictions, though (for instance, if I remember correctly, you cannot search for words shorter than X characters, X generally being 3 or 4).
Another solution would be to use a dedicated full-text engine such as Lucene, Solr, or Sphinx. Those generally do a better job when it comes to full-text searching: it is their job (MySQL's job being to store data, not to do full-text search).
There have been lots of questions about those on SO, for instance:
php mysql fulltext search: lucene, sphinx, or ?
Choosing a stand-alone full-text search server: Sphinx or SOLR?
Pros & cons of full text search engine Lucene, Sphinx, Postgresql full text search, MySQL full text search
how much more performant is sphinx than MySQL default fulltext search?
And many others (use the... search engine... at the top right of the site ;-) )
If you are using PHP and cannot install anything else, there is a pure-PHP implementation of Lucene: Zend_Search_Lucene
In the end, MySQL LIKE clauses are not meant to be used as 'powerful' search tools for word-based matching. LIKE is a simple tool for finding partial phrases. It also isn't known for scaling well, so if you are doing this on a high-throughput website, you will probably want another solution.
That being said, there ARE some options for getting what you want:
1. REGEX support. MySQL supports REGEX-based searches. Using that, and with a complicated enough REGEX, you can find what you are looking for.
2. True full-text indexing in MySQL. MySQL does have a way to create FULLTEXT indexes. You need to be using the MyISAM storage engine, and there are restrictions on what exactly you can and can't do. But it's much more powerful than the basic LIKE functionality that SQL has. I'd recommend reading up on it if you are interested.
3. Third-party indexers. This is actually the route that most people go. They use Lucene / Solr, or other similar indexing technologies that are specifically designed for full-text searching of words with various logic, just like modern web search engines. They are extremely efficient because they essentially keep their own database, where they break everything up and store it in a manner that works best for exactly those types of searches.
Hopefully one of those three options will work for you.
When using the LIKE clause, take care that the pattern is %variable% or variable%, not %variable.
Secondly, to make the search effective, use PHP's explode() function to break the query into words, so that a search for "learn php" is treated as "learn+php", as in Google.