Say if I had a table of books in a MySQL database and I wanted to search the 'title' field for keywords (input by the user in a search field); what's the best way of doing this in PHP? Is the MySQL LIKE command the most efficient way to search?
Yes, the most efficient way usually is searching in the database. To do that you have three alternatives:
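LIKE to match exact substrings (MySQL has no ILIKE; with the default collations LIKE is already case-insensitive)
RLIKE (a synonym for REGEXP) to match regular expressions
FULLTEXT indexes, which support three search modes aimed at natural-language text: natural language, boolean, and query expansion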
So which one is best depends on what you will actually be searching for. For book titles I'd offer a LIKE search for exact substring matches, useful when people know the book they're looking for, and also a FULLTEXT search to help find titles similar to a word or phrase. I'd give them different names in the interface of course, probably something like "exact" for the substring search and "similar" for the fulltext search.
An example about fulltext: http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
Here's a simple way you can break apart some keywords to build some clauses for filtering a column on those keywords, either ANDed or ORed together.
$terms = explode(',', $_GET['keywords'] ?? '');
$clauses = array();
$params = array();
foreach ($terms as $term)
{
    //remove any chars you don't want to be searching - adjust to suit
    //your requirements
    $clean = trim(preg_replace('/[^a-z0-9]/i', '', $term));
    if ($clean !== '')
    {
        //bind the value through a placeholder (mysql_escape_string was
        //removed in PHP 7) - while not strictly required in this example
        //due to the preg_replace earlier, it's good practice to keep user
        //input out of the SQL text in case you modify that filter...
        $clauses[] = "title LIKE ?";
        $params[] = '%' . $clean . '%';
    }
}
if (!empty($clauses))
{
    //concatenate the clauses together with AND or OR, depending on
    //your requirements
    $filter = '(' . implode(' AND ', $clauses) . ')';
    //build and execute the required SQL
    //($pdo is assumed to be an existing PDO connection)
    $sql = "SELECT * FROM foo WHERE $filter";
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
}
else
{
    //no search term, do something else, find everything?
}
Consider using Sphinx. It's an open source full-text engine that can consume your MySQL database directly. It's far more scalable and flexible than hand-coding LIKE statements (and far less susceptible to SQL injection).
You may also want to look at the Soundex functions (SOUNDEX, SOUNDS LIKE) in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
They are useful as a fallback: return these matches when strict checking (by LIKE or =) did not return any results.
Paul Dixon's code example gets the main idea across well for the LIKE-based approach.
I'll just add this usability idea: provide an (AND | OR) radio button set in the interface, defaulting to AND. Then, if a user's query results in zero matches and contains at least two words, respond with an option to the effect of:
"Sorry, no matches were found for your search phrase. Expand search to match on ANY word in your phrase?"
Maybe there's a better way to word this, but the basic idea is to guide the person toward another query (that may be successful) without the user having to think in terms of the Boolean logic of AND and ORs.
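As a rough sketch of that flow (in Python with the built-in sqlite3 module standing in for MySQL; the books table, the title column, and both function names are assumptions made for illustration), the AND search runs first and the OR option is only offered when it finds nothing for a multi-word phrase:

```python
import sqlite3


def search_titles(conn, phrase, mode="AND"):
    """Return titles containing ALL (mode='AND') or ANY (mode='OR') of the words."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    terms = phrase.split()
    if not terms:
        return []
    # One "title LIKE ?" clause per word; the values are bound, never interpolated.
    where = (" " + mode + " ").join(["title LIKE ?"] * len(terms))
    params = ["%" + term + "%" for term in terms]
    sql = "SELECT title FROM books WHERE " + where
    return [row[0] for row in conn.execute(sql, params)]


def should_offer_any_word(phrase, rows):
    """True when the AND search found nothing and the phrase has two or more words."""
    return not rows and len(phrase.split()) >= 2
```

The interface would call search_titles with the default mode, show the "expand search" prompt when should_offer_any_word returns True, and rerun with mode="OR" only if the user accepts.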
I think LIKE is the most efficient way if the search is a single word. A multi-word query can be split with the explode function, as already mentioned, and each word searched for individually in a loop. To avoid returning the same row twice, collect the results in an array and skip any row that is already in it. The count function then tells you where to stop when printing the results in a loop. For sorting, the similar_text function gives a similarity percentage that you can sort the array by.
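The dedupe-then-sort step could look roughly like this. It is only a sketch in Python: difflib.SequenceMatcher is used as a stand-in for PHP's similar_text (the two do not compute identical scores), and rank_results is a made-up name.

```python
from difflib import SequenceMatcher


def rank_results(query, titles):
    """Drop duplicate titles, then sort by similarity to the query (best first)."""
    unique = list(dict.fromkeys(titles))  # keeps first occurrence, preserves order

    def score(title):
        # Ratio in [0, 1]; compared case-insensitively.
        return SequenceMatcher(None, query.lower(), title.lower()).ratio()

    return sorted(unique, key=score, reverse=True)
```

Here titles would be the combined rows returned by the per-word queries.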
Related
I have a table dictionary which contains a list of words Like:
ID|word
---------
1|hello
2|google
3|similar
...
Suppose somebody writes a text like
"helo iam looking for simlar engines for gogle".
I want to check whether each word exists in the database, and if it does not, get
the most similar word instead. For example: helo = hello, simlar = similar, gogle = google.
In other words, I want to fix the spelling errors. My database holds a full dictionary of all English words. I couldn't find any MySQL function that helps me; LIKE isn't helpful in my situation.
You can use the soundex() function to compare words phonetically.
Your query should be something like:
select * from dictionary where soundex(word) = soundex('helo');
and this will return the hello row.
There is a function that does roughly what you want, but it's intensive and will slow queries down. You might be able to use it in your circumstances; I have used it before. It's called Levenshtein. You can get it from the question "How to add levenshtein function in mysql?"
What you want to do is called a fuzzy search. You could use the SOUNDEX function in MySQL, documented here:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex
Your query would look like:
SELECT * FROM dictionary where SOUNDEX(word) = SOUNDEX(:yourSearchTerm)
... where your search term is bound to the :yourSearchTerm parameter value.
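To see why this catches the misspellings from the question, here is a small Python sketch of the classic four-character American Soundex code. It is an illustration only: MySQL's SOUNDEX differs in details (for example, it can return codes longer than four characters), and the function name is my own.

```python
# Consonant groups of the classic Soundex scheme.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")  # vowels get no code and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]
```

With this, "helo" and "hello" both encode to H400, "gogle" and "google" to G240, and "simlar" and "similar" to S546, so an equality test on the codes pairs each typo with the intended word.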
A next step would be to try implementing and making use of a Levenshtein function in MySQL. One is described here:
http://www.artfulsoftware.com/infotree/qrytip.php?id=552
The Levenshtein distance between two strings is the minimum number of
operations needed to transform one string into the other, where an
operation may be insertion, deletion or substitution of one character.
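If computing this in the database turns out to be too slow, the same distance can be calculated in application code. A minimal Python sketch of the standard dynamic-programming version follows; closest_word is a hypothetical helper that scans an in-memory word list, which is only practical for small dictionaries.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def closest_word(word, dictionary):
    """Return the dictionary entry with the smallest distance to word."""
    return min(dictionary, key=lambda candidate: levenshtein(word, candidate))
```

For the example from the question, closest_word("gogle", ["hello", "google", "similar"]) picks "google", one insertion away.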
You might also consider looking into search engines that are aimed at full-text searching, such as Elasticsearch, which provides fuzzy matching natively:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html
I am building an application in Laravel, and I can't decide whether to go with MATCH() or LIKE for text searching.
I only want to do a text search on one column, which is a VARCHAR(42).
I will also filter the query with some where() conditions, so it will not do a text search on all rows.
I am using MySQL 5.6+, so MATCH works with my InnoDB engine.
Does MATCH() perform well on a table that has about 30k rows?
Laravel's ORM doesn't support MATCH, so my query looks like this:
$q = Input::get('query');
Post::whereRaw("MATCH(title) AGAINST(? IN BOOLEAN MODE)", array($q))->get();
Do I need to sanitize the "$q" in order to be safe from SQL injections? Since I'm using whereRaw()
The two capabilities are quite different, so the choice should be easy. MATCH works on whole words within the text, so if you want to search by one or more words, MATCH should be faster. The flip side is that searching on numbers, stop words, and short words requires extra effort.
LIKE with a leading wildcard cannot make use of an index. This slows down such queries because every row needs to be processed. Of course, if the rest of the filtering reduces this to 100 rows, then it is not a big deal.
LIKE can, however, use an index for "prefix" searches, that is, searches at the beginning of the string. So LIKE 'abc%' can use an index; LIKE '%abc%' cannot.
Is there a PHP or MySQL function which will check how relevant a matching field is? Could it review the string and match against a percentage of characters?
For example I am doing a basic search script pulling back results but how can I make the more relevant results appear at the top?
A lot depends on your data and the type of searches that you are expecting. But basically, you could be looking for a fuzzy search. Soundex and Levenshtein distance are two of the many functions that you can use for string matches:
http://php.net/manual/en/function.levenshtein.php
Well, you are asking a few complicated questions here. Mostly, I think you are looking for information retrieval techniques. There are answers on this all over Stack Overflow.
The question "What tried and true algorithms for suggesting related articles are out there?" is a great one, I think.
You might want to use the levenshtein distance if you are just looking for how closely a keyword matches an existing keyword.
I tried :P
MySQL has a MATCH function.
You can use it like this:
SELECT * FROM `table` WHERE MATCH(content) AGAINST('search text')
It scores each row by how relevant its content is to the search text.
You need a FULLTEXT index on the content field, which requires a MyISAM table (or an InnoDB table on MySQL 5.6 and later).
When MATCH is used in the WHERE clause like this, with no ORDER BY, the rows come back sorted automatically with the most relevant first.
Hope this helps.
I am building a site with a requirement to include plural words but exclude singular words, as well as include longer phrases but exclude shorter phrases found within them.
For example:
a search for "Breads" should return results with 'breads' within it, but not 'bread' or 'read'.
a search for "Paperback book" should return results with 'paperback book' within it, but not 'paperback' or 'book'.
The query I have tried is:
SELECT * FROM table WHERE (field LIKE '%breads%') AND (field NOT LIKE '%bread%')
...which clearly returned no results, even though there are records with 'breads' and 'bread' in it.
I understand why this query is failing (I'm telling it to both include and exclude the same strings) but I cannot think of the correct logic to apply to the code to get it working.
Searching for %breads% would NEVER return bread or read, as the 's' is a required character for the match. So just eliminate the and clause:
SELECT ... WHERE (field LIKE '%breads%')
SELECT ... WHERE (field LIKE '%paperback book%');
You should consider using full-text search.
This will solve your bread/read issue.
I don't believe wildcards are useful here. Let's say you are using '%read%': this would also return bread, breads, etc., which is why I recommend full-text search.
With MySQL you can use REGEXP instead of LIKE, which gives you better control over your query:
SELECT * FROM table WHERE field REGEXP '[[:<:]]read[[:>:]]'
The [[:<:]] and [[:>:]] markers enforce word boundaries around the term, including at the start and end of the field. On MySQL 8.0.4 and later, which uses the ICU regex library, write '\\bread\\b' instead. This gives you much better control over your matching, with the downside of a performance hit.
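The same whole-word test can also be done in application code once the candidate rows are fetched. A small Python sketch (the function name is made up; \b is Python's word-boundary assertion):

```python
import re


def contains_word(text, word):
    """True if word (or phrase) occurs in text as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

So a search for "breads" matches "fresh breads daily" but not "fresh bread daily", and a search for "read" does not match "bread".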
I have a table that lists people and all their contact info. I want for users to be able to perform an intelligent search on the table by simply typing in some stuff and getting back results where each term they entered matches at least one of the columns in the table. To start I have made a query like
SELECT * FROM contacts WHERE
firstname LIKE '%Bob%'
OR lastname LIKE '%Bob%'
OR phone LIKE '%Bob%' OR
...
But now I realize that this will completely fail on something as simple as 'Bob Jenkins', because it is not smart enough to search for the first and last name separately. What I need to do is split up the search terms, search for them individually, and then intersect the results from each term somehow. At least that seems like the solution to me. But what is the best way to go about it?
I have heard about fulltext and MATCH()...AGAINST() but that sounds like a rather fuzzy search and I don't know how much work it is to set up. I would like precise yes or no results with reasonable performance. The search needs to be done on about 20 columns by 120,000 rows. Hopefully users wouldn't type in more than two or three terms.
Oh sorry, I forgot to mention I am using MySQL (and PHP).
I just figured out fulltext search and it is a cool option to consider (is there a way to adjust how strict it is? LIMIT would just chop off the results regardless of how well they matched). But this requires a fulltext index, and my website is using a view, and you can't index a view, right? So...
I would suggest using MATCH / AGAINST. Full-text searches are more advanced searches, more like Google's, less elementary.
It can match across multiple tables and rank them to how many matches they have.
Otherwise, if the word is there at all, esp. across multiple tables, you have no ranking. You can do ranking server-side, but that is going to take more programming/time.
Depending on what database you're using, the ability to do cross columns can become more or less difficult. You probably don't want to do 20 JOINs as that will be a very slow query.
There are also engines such as Sphinx and Lucene dedicated to do these types of searches.
BOOLEAN MODE
SELECT * FROM contacts WHERE
MATCH(firstname,lastname,email,webpage,country,city,street...)
AGAINST('+bob +jenkins' IN BOOLEAN MODE)
Boolean mode is very powerful. It might even fulfil all my needs; I will have to do some testing. Placing + in front of a search term makes that term required (the row must match 'bob' AND 'jenkins' instead of 'bob' OR 'jenkins'). On MyISAM tables this mode even works on non-indexed columns, and thus I can use it on a view, although it will be slower (that is what I need to test). One final problem I had was that it wasn't matching partial search terms, so 'bob' wouldn't find 'bobby', for example. The usual % wildcard doesn't work; instead you append an asterisk to the term, as in 'bob*'.
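Turning whatever the user typed into such a boolean-mode string is mostly string handling. A minimal Python sketch, assuming every word should be required and that any operators the user typed should be dropped (boolean_query and its prefix flag are names I made up):

```python
import re


def boolean_query(phrase, prefix=False):
    """Build a string for AGAINST(... IN BOOLEAN MODE) where every word is required.

    Only runs of word characters are kept, so operators typed by the user
    (+, -, *, quotes, parentheses) are discarded rather than interpreted.
    """
    words = re.findall(r"\w+", phrase.lower())
    suffix = "*" if prefix else ""  # trailing * turns each word into a prefix match
    return " ".join("+" + word + suffix for word in words)
```

boolean_query("Bob Jenkins") gives '+bob +jenkins', and with prefix=True the single word 'bob' becomes '+bob*', which would also find 'bobby'. The resulting string should still be passed to the query as a bound parameter rather than concatenated into the SQL.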