MySQL LIKE query - PHP

I'm trying to search for a string in my DB, but it should match any of the words rather than the whole phrase.
Suppose a table column contains text like a b c d e f g. If I search for d c b, that row should show up in the results.
field LIKE '%d c b%' doesn't work in this way.
Can someone suggest a more robust way to search, possibly showing a relevance counter as well?
I don't mind using PHP for this, but I'd prefer to do the search at the DB level.

For best results, you need to create a FULLTEXT index on your data.
CREATE TABLE mytable (id INT NOT NULL, data TEXT NOT NULL, FULLTEXT KEY fx_mytable_data (data)) ENGINE=MyISAM
SELECT *
FROM mytable
WHERE MATCH(data) AGAINST ('+word1 +word2 +word3' IN BOOLEAN MODE)
Note that to index one-letter words (as in your example), you'll need to set ft_min_word_len to 1 in the MySQL configuration, then restart the server and rebuild the index.
This syntax can work even if you don't have an index (as long as your table is MyISAM), but it will be quite slow.
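The '+word1 +word2 +word3' string has to be built from whatever the user typed. A minimal sketch of that step follows, in Python for brevity (the same logic ports directly to PHP). The helper name to_boolean_query is made up for illustration, and the %s placeholder is the style used by common Python MySQL drivers; with PDO in PHP it would be ?.

```python
import re


def to_boolean_query(user_input: str) -> str:
    """Build a boolean-mode search string in which every word is required."""
    # Keep only runs of letters, digits and underscores, so operators the
    # user may have typed (+ - * " and so on) never reach MATCH ... AGAINST.
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + word for word in words)


# The search string is passed as a bound parameter, never concatenated into SQL.
sql = "SELECT * FROM mytable WHERE MATCH(data) AGAINST (%s IN BOOLEAN MODE)"
params = (to_boolean_query("d c b"),)
print(sql)
print(params)
```

Words shorter than ft_min_word_len are still ignored by the index, so the setting mentioned above matters for one-letter terms like these.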

I think what you want to do is, for any of the letters:
field LIKE '%d%' OR field LIKE '%c%' OR field LIKE '%b%'
and for all of the letters:
field LIKE '%d%' AND field LIKE '%c%' AND field LIKE '%b%'
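For an arbitrary number of words, the clause can be generated instead of written by hand. This is only a sketch, in Python for brevity: build_like_clause is a hypothetical helper, the %s placeholders assume a driver that uses that style, and the column name must come from your own code, never from user input.

```python
def build_like_clause(search: str, column: str = "field", match_all: bool = False):
    """Return (sql_fragment, params) with one LIKE test per search word."""
    words = search.split()
    if not words:
        # Nothing to search for: produce a clause that matches no rows.
        return "1 = 0", []
    joiner = " AND " if match_all else " OR "
    fragment = joiner.join(f"{column} LIKE %s" for _ in words)
    params = []
    for word in words:
        # Escape the LIKE wildcards so a typed % or _ is matched literally.
        escaped = word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        params.append("%" + escaped + "%")
    return fragment, params


print(build_like_clause("d c b"))
print(build_like_clause("d c b", match_all=True))
```

Every pattern still starts with %, so no index can be used; this is fine for small tables only.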

If your table is in MyISAM, you can use the FULLTEXT search integrated in MySQL : 11.8. Full-Text Search Functions
Though there will be some restrictions (for instance, if I remember correctly, you cannot search for words shorter than X characters -- X generally being 3 or 4).
Another solution would be to use some Fulltext engine, like Lucene, Solr, or Sphinx -- those generally do a better job when it comes to fulltext-searching : it is their job (MySQL's job being to store data, not do fulltext-search)
There have been lots of questions about those on SO ; for instance :
php mysql fulltext search: lucene, sphinx, or ?
Choosing a stand-alone full-text search server: Sphinx or SOLR?
Pros & cons of full text search engine Lucene, Sphinx, Postgresql full text search, MySQL full text search
how much more performant is sphinx than MySQL default fulltext search?
And many others (use the... search engine... on the top right of the site ;-) )
If you are using PHP and cannot install anything else, there is a full-PHP implementation of Lucene : Zend_Search_Lucene

In the end, MySQL LIKE clauses are not meant to be used as 'powerful' search tools for word-based matching. LIKE is a simple tool for finding partial phrases. It also isn't known for scaling well, so if you are doing this on a high-throughput website, you will probably want another solution.
So that being said, there ARE some options for you, to get what you are wanting:
REGEX support. MySQL supports REGEX-based searches. Using that, and with a complicated enough REGEX, you can find what you are looking for.
True Full Text Indexing in MySQL. MySQL does have a way to create FULLTEXT indexes. You need to be using the MyISAM engine (InnoDB also supports them from MySQL 5.6 onward), and there are restrictions on what exactly you can or can't do. But it's much more powerful than the basic 'like' functionality that SQL has. I'd recommend reading up on it if you are interested.
3rd party indexers. This is actually the route that most people go. They will use Lucene / Solr, or other similar indexing technologies that are specifically designed for doing full text searching of words with various logic, just like how modern web search engines work. They are extremely efficient because they, essentially, keep their own database where they break everything up and store it in a manner that works best for exactly those types of searches.
Hopefully one of those three options will work for you.

When using the LIKE clause, take care that the pattern is %variable% or variable%, not %variable.
Secondly, to make an effective search, use PHP's explode() function to break the input into words. For example, if I search for "learn php", it should be searched as "learn+php", as Google does.

Related

PHP: searching with search terms for similar text on webpage

I'm busy with a program that needs to find similar text on a webpage. In SQL we have 400.000 search terms. For example, the search terms can be ‘San Miguel Pale Pilsen’, ‘Schaumburger Bali’ and ‘Rizmajer Cortez’.
Now I'm checking each word on the webpage against the database. For each word on the webpage I send a SELECT query with a LIKE '%...%' pattern. For each result I run PHP's similar_text(). If the search term has more words than the word being checked, I take some extra words from the webpage so that both have the same number of words.
(And yes I know that it isn’t smart)
The problem is that it takes a lot of time and the server has to work hard for it.
What is the best and fastest way to find similar text on a webpage?
The LIKE operator will always be slow if you start the pattern with a % wildcard, because that prevents MariaDB from using an index on the column.
Considering you need to find words at any location in the VARCHAR column, the best solution is to implement bona fide Full Text Search. See MariaDB's Full-Text Index Overview.
Searches will become orders of magnitude faster, and the solution will scale much better.

Php mysql search advice

I am writing a PHP/MySQL search script that searches a very big database (over 2 000 000 rows) and I want it to be fast. I want it to have spell checking and smart word detection, for example to query phone when the user input is phrone or hpone. The best way I have found so far is REGEXP. But when I use REGEXP in MySQL with a complicated expression, it is rather slow. Do you have any advice for me?
Regexp example for phrone to match phone
[a-zA-Z]*[phrone]{3,}[a-zA-Z]{3,}
Please read this doc:
https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
It enables you to perform the kind of search you have described.
If you want more power and features, use Elasticsearch or another search engine (Solr); they are blazingly fast and have more features.
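For the spell-checking part specifically (phrone or hpone to phone), one option is to compare the input against a list of known words by edit distance instead of using REGEXP. A minimal sketch in Python follows; the vocabulary list and the suggest helper are assumptions for illustration, and with millions of rows you would run this against a separate table of distinct words, not the full data.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: list, max_distance: int = 2) -> list:
    """Return known words within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, known), known) for known in vocabulary)
    return [known for distance, known in scored if distance <= max_distance]


vocabulary = ["phone", "photo", "tablet"]  # hypothetical word list
print(suggest("phrone", vocabulary))
print(suggest("hpone", vocabulary))
```

Plain Levenshtein counts a swap of two neighbouring letters (hp for ph) as two edits, which is why the threshold here is 2 rather than 1.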

PHP/SQL: Multiple fuzzy keyword search based on likeness (Advanced SQL Search)

Current Situation:
I am currently running a keyword search using multiple keywords in PHP and SQL. The field I'm applying the search to is the title field, which is a 250 VARCHAR field.
A user can input a single keyword, e.g. "apple" or also multiple, e.g. "apple banana yellow". The first option is trivial. For the second option, my current algorithm works like this:
Try and find items that match the exact entire string "apple banana yellow" in the title. Order the results by index id.
If no more results matching the exact entire string are found, or if none are found in the first place, search for all titles containing either "apple", "banana", or "yellow". Order the results by index id.
The algorithm is very basic but, funnily enough, works pretty well.
What I'm looking for:
However I am now looking to implement a smarter search algorithm without having to rely on external paid scripts like Amazon services. I'm looking for a way to implement the following:
fuzzy search (I've read about SOUNDEX or levenshtein which may realize this)
smarter keyword search (Don't just either return items that match ALL words or JUST A SINGLE WORD, but maybe also 2 words or 3 words before)
order by relevance/likeness (Order by likeness of the search to the title, and not just the index id)
(Bonus: maybe even implement search for exact strings, like using " " on google to find exactly the words between the quotation marks)
What is the best way to get started with such a search? I am using InnoDB for MySQL.
Assuming MySQL, you can add a FULLTEXT index (InnoDB supports these from MySQL 5.6 onward). Then there are a number of functions that will allow you to do basic searches that meet all the needs you list: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
You end up using syntax like:
SELECT * FROM table_name WHERE MATCH(column_with_fulltext_index_on_it)
AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)
To see the match score:
SELECT column_with_fulltext_index_on_it,
       MATCH(column_with_fulltext_index_on_it) AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE) AS score
FROM table_name
WHERE MATCH(column_with_fulltext_index_on_it) AGAINST('apple banana yellow' IN NATURAL LANGUAGE MODE)
There is a bit of a learning curve in tuning the MATCH clause to your needs, but your examples seem pretty basic (except for the smarter search).
Also good to note: there are system settings that control the min/max length of the words/tokens that get indexed. You can read https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html to get a deeper understanding of the indexing options. Percona is a good resource as well: https://www.percona.com/blog/2013/02/26/myisam-vs-innodb-full-text-search-in-mysql-5-6-part-1/ (typically more human-digestible than the MySQL docs).
If you need to do more complex searches, you can look at adding other technologies like Solr, but I've always recommended getting the basics working with what you've got, and only adopting a new tech if you hit a brick wall, or if you have good metrics on the existing solution and know the new tech will somehow improve on it (speed, storage space, quality of results, etc.). If you can't quantify it, stick to the basics until you can.
Here's a good tutorial: http://www.w3resource.com/mysql/mysql-full-text-search-functions.php
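If you want to see what "order by relevance" means before committing to FULLTEXT, the "2 or 3 words out of N" idea from the question can be prototyped in application code. This is a rough sketch in Python, not how MySQL computes its score: each distinct keyword found in the title counts 1, and an exact phrase match adds a bonus. The function names and the bonus size are arbitrary assumptions.

```python
def score_title(title: str, query: str) -> int:
    """Count distinct query words present in the title, plus a phrase bonus."""
    title_lower = title.lower()
    title_words = set(title_lower.split())
    words = query.lower().split()
    score = sum(1 for word in set(words) if word in title_words)
    # Bonus when the whole query appears verbatim, so exact matches rank first.
    if len(words) > 1 and " ".join(words) in title_lower:
        score += len(words)
    return score


def rank(titles: list, query: str) -> list:
    """Return matching titles, best score first; ties keep their input order."""
    scored = [(score_title(title, query), title) for title in titles]
    scored.sort(key=lambda pair: -pair[0])
    return [title for score, title in scored if score > 0]


titles = [
    "yellow banana bread",
    "apple banana yellow smoothie",
    "green apple",
    "pear tart",
]
print(rank(titles, "apple banana yellow"))
```

This scans every title, so it only makes sense on a candidate set already narrowed down by the database.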

Need suggestion of alternative to Fulltext search

I am in need of a lightweight fast search solution.
Today I use Fulltext in boolean mode, where every searchword is mandatory in the results.
The function is fast, working and meets the requirements.
BUT some of the fulltext limitations, http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html, have turned out to be a problem. The site is on a hosted server and I'm not allowed to change the MySQL settings (e.g. minimum word length).
E.g.
the search must be able to find red, 11 and ab.cd, which today's fulltext solution can't.
http://sphinxsearch.com/ is what you're looking for
though you have to understand that the shorter the words you want to find, the bigger the indexes you will need.
Use Lucene, it's very often implemented with MySQL and it'll be both faster and more featureful.
Using the built-in FTS engine is relatively bad practice, especially since (before MySQL 5.6) it doesn't work with the slightly more reliable InnoDB engine.
The only thing that comes to mind would be to base your search on the number of occurrences you can find. Your actual index method could vary, depending on what the DB offers.
Assuming DB size isn't an issue, a (very) basic approach would be to break the search blobs (say, a post on stackoverflow) into each word, normalize it (remove plurals, strip 'logic' words such as and, etc.) then insert each word as a new record, together with the ID that identifies your indexed resource.
Count the instances of the ID, order by count, higher number = more relevant.
Not exactly my field though, so tread carefully! =]
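The word-per-record approach described above can be sketched in memory as follows (Python here; in practice the index would be a table with word and resource ID columns). The stop-word list is a tiny made-up sample, and plural normalization is left out to keep it short.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"and", "or", "the", "a", "an", "of"}  # sample list, extend as needed


def tokenize(text: str) -> list:
    """Lower-case, split on anything that is not a letter or digit, drop stop words."""
    return [word for word in re.findall(r"[a-z0-9]+", text.lower())
            if word not in STOP_WORDS]


def build_index(documents: dict) -> dict:
    """Map each word to the IDs of the documents it occurs in, once per occurrence."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].append(doc_id)
    return index


def search(index: dict, query: str) -> list:
    """Return (doc_id, hit_count) pairs, highest count first."""
    hits = Counter()
    for word in tokenize(query):
        hits.update(index.get(word, []))
    return hits.most_common()


documents = {1: "red car and red bike", 2: "blue car", 3: "green tree"}
index = build_index(documents)
print(search(index, "red car"))  # [(1, 3), (2, 1)]
```

Because nothing here enforces a minimum word length, short terms like red or 11 are indexed as well, and ab.cd is stored as the two words ab and cd.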
I'd recommend you try distance searching: Levenshtein
Or search for "N-gram fulltext indexing".
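To give an idea of what N-gram matching does, here is a small Python sketch that compares two strings by the character bigrams they share (Jaccard similarity). It illustrates the general idea only and is not the algorithm of any particular N-gram fulltext index.

```python
def ngrams(text: str, n: int = 2) -> set:
    """Return the set of overlapping character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Shared n-grams divided by all distinct n-grams (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(ngrams("red"))
print(ngram_similarity("alien", "aliens"))
```

Since the unit is a fixed-size slice of characters rather than a whole word, short terms and terms containing punctuation (such as ab.cd) remain searchable.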
I haven't mucked around with it, but I read the theory of full text searching (with mysql at least) a little while back.
If memory serves me correctly you can use full text search for what you want, but you need to configure (and I think a recompile) to get it to work on smaller number of search characters. I think it is set to a default number of 4 characters. You'll want to change it to 2 characters in length with a few other options thrown in and test the results you get.
Someone correct me if this is incorrect. I would rather not throw him on a red herring.

search query "alien vs predator"

How do you make a search for "alien vs predator" also return results containing the string "alienS vs predator", with the "S"?
example http://www.torrentz.com/search?q=alien+vs+predator
How have they implemented this?
Is this advanced search engine stuff?
This is known as Word Stemming. When the text is indexed, words are "stemmed" to their "roots". So fighting becomes fight, skiing becomes ski, runs becomes run, etc. The same thing is done to the text that a user enters at search time, so when the search terms are compared to the values in the index, they match.
The Lucene project supports this. I wouldn't consider it an advanced feature. Especially with the expectations that Google has set.
Checking for plurals is a form of stemming. Stemming is a common feature of search engines and other text matching. See the wikipedia page: http://en.wikipedia.org/wiki/Stemming for a host of algorithms to perform stemming.
Typically when one sets up a search engine to search for text, one will construct a query that's something like:
SELECT * FROM TBLMOVIES WHERE NAME LIKE '%ALIEN%'
This means that the substring ALIEN can appear anywhere in the NAME field, so you'll get back strings like ALIENS.
When words are indexed they are indexed by root form. For example for "aliens", "alien", "alien's", "aliens'" are all stored as "alien".
And when words are searched for, the search engine also looks only for the root form "alien".
A well-known algorithm for this is the Porter Stemming Algorithm. You can download an implementation for your favorite language here - http://tartarus.org/~martin/PorterStemmer/
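To show the principle, here is a deliberately naive Python sketch that only strips possessives and a trailing plural "s". It is not the Porter algorithm, which handles many more suffixes; both function names are made up for illustration.

```python
def naive_stem(word: str) -> str:
    """Reduce a word to a crude root: drop possessive endings and a plural 's'."""
    stem = word.lower()
    for suffix in ("'s", "s'"):
        if stem.endswith(suffix):
            stem = stem[:-len(suffix)]
            break
    # Drop a plural 's', but leave short words ("vs") and "-ss" words ("glass") alone.
    if stem.endswith("s") and not stem.endswith("ss") and len(stem) > 3:
        stem = stem[:-1]
    return stem


def matches(query: str, title: str) -> bool:
    """True when every stemmed query word occurs among the stemmed title words."""
    title_stems = {naive_stem(word) for word in title.split()}
    return all(naive_stem(word) in title_stems for word in query.split())


print([naive_stem(w) for w in ["aliens", "alien", "alien's", "aliens'"]])
print(matches("alien vs predator", "Aliens vs Predator"))
```

The key point is that the same function is applied both when indexing the titles and when processing the search terms.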
This is a basic feature of a search engine, rather than just a program that matches your query with a set of pre-defined results.
If you have the time, this is a great read, all about different algorithms, and how they are implemented.
You could try using soundex() as a fuzzy match on your strings. If you save the soundex value with the title and then compare that stored value against a prefix of the search term's soundex using LIKE 'XXX%', you should get a decent match. The longer the prefix you compare, the closer the match will be.
see docs: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex
