How do I include plurals but exclude singulars? [duplicate] - php

I am building a site with a requirement to include plural words but exclude singular words, as well as include longer phrases but exclude the shorter phrases found within them.
For example:
a search for "Breads" should return results with 'breads' within it, but not 'bread' or 'read'.
a search for "Paperback book" should return results with 'paperback book' within it, but not 'paperback' or 'book'.
The query I have tried is:
SELECT * FROM table WHERE (field LIKE '%breads%') AND (field NOT LIKE '%bread%')
...which clearly returned no results, even though there are records with 'breads' and 'bread' in them.
I understand why this query is failing (I'm telling it to both include and exclude the same strings) but I cannot think of the correct logic to apply to the code to get it working.

Searching for %breads% would NEVER return bread or read, as the 's' is a required character for the match. So just eliminate the AND clause:
SELECT ... WHERE (field LIKE '%breads%')
SELECT ... WHERE (field LIKE '%paperback book%');
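A quick way to convince yourself of this is to try it on a few rows. The following is only a minimal sketch: it uses Python's standard sqlite3 module as a stand-in for MySQL, and the table name `items` and its rows are made up for illustration. Plain substring matching with LIKE behaves the same way in both databases.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (field TEXT)")
conn.executemany(
    "INSERT INTO items (field) VALUES (?)",
    [
        ("fresh breads daily",),
        ("one bread roll",),
        ("read a book",),
        ("paperback book sale",),
    ],
)


def search(term):
    """Return the rows whose field contains term as a substring."""
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT field FROM items WHERE field LIKE ? ORDER BY field", (pattern,)
    )
    return [row[0] for row in rows]


breads_hits = search("breads")
phrase_hits = search("paperback book")
print(breads_hits)
print(phrase_hits)
```

Only the row containing 'breads' comes back for the first search; the 'bread' and 'read' rows are never candidates because they lack the trailing 's'.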

You should consider using FULL TEXT SEARCH.
This will solve your Bread/read issue.
I believe wildcards aren't useful here. Let's say you are using '%read%'; this would also return bread, breads, etc., which is why I recommend Full Text Search.

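With MySQL you can use REGEXP instead of LIKE, which gives you better control over your query...
SELECT * FROM table WHERE field REGEXP '[[:<:]]read[[:>:]]'
That would at least enforce word boundaries around your search term and give you much better control over your matching, with the downside of a performance hit. (A pattern like '\s+read\s+' does not do the job: MySQL drops the backslashes from the string literal, and requiring whitespace on both sides misses a word at the very start or end of the field. [[:<:]] and [[:>:]] are the word-boundary markers before MySQL 8.0.4; from 8.0.4 on, use '\\bread\\b' instead.)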
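The word-boundary idea can be tried outside the database first. This is a rough sketch in Python, where `\b` in the `re` module plays the role of the database's word-boundary markers; the helper name `contains_whole_word` is made up for illustration.

```python
import re


def contains_whole_word(text, word):
    """Return True if word occurs in text as a whole word, ignoring case."""
    # re.escape keeps any regex metacharacters in the user's word literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(contains_whole_word("I read it", "read"))
print(contains_whole_word("fresh bread", "read"))
print(contains_whole_word("fresh breads", "bread"))
```

The first call matches; the other two do not, because 'read' inside 'bread' and 'bread' inside 'breads' are not bounded on both sides.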

Related

Searching for a term using SQL MATCH which includes spaces

This may be a newbie question, as I'm not an expert in SQL. However, couldn't find the answer using Google.
I have a table called record_fields which contains the majority of my system's content, which I want to search in. The content cell is defined as LONGTEXT as it can include extremely long input.
Originally, I used (simplifying the query a bit for clarity sake):
SELECT * FROM record_fields WHERE LOWER(content) LIKE LOWER('%{$keyword}%')
Execution time aside, this query has one major issue. If I search for the term "post" it will return all content which has words like "poster", "posting" and others. I wanted to add a FULLTEXT search.
Now the query looks like this (again, simplified):
SELECT * FROM record_fields WHERE MATCH (content) AGAINST ('{$keyword}')
However, this is still problematic. With MATCH, if my system's users search for the words "Bank of America", for example, all records that have either the word "Bank" or "America" will be returned.
TL;DR - my question is this:
How do I use MATCH to search for exact phrases with spaces in them?
Any help would be highly appreciated, thanks in advance!
%{keyword}% matches all text sub-strings that include your keyword anywhere in the string. MATCH usually takes all keywords in the match string as individual search terms, and matches against each. You can use boolean mode and use a + symbol before each required keyword. Take a look at the MySQL reference for this.
Edited the answer to reflect Idan's response in not getting the results from the suggested %keyword solution.
You can use MATCH ... AGAINST in boolean mode and wrap your input string in double quotes: '"{$keyword}"'.
See the last example at the link below:
https://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
SELECT * FROM record_fields WHERE MATCH (content) AGAINST ('"{$keyword}"' IN BOOLEAN MODE )
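Since the keyword comes from user input, it is safer to pass the quoted phrase as a bound parameter than to interpolate it into the SQL. A minimal sketch in Python, under a few assumptions: the function name `phrase_query` is made up, and the `%s` placeholder is the style used by common Python MySQL drivers (check what yours expects).

```python
def phrase_query(keyword):
    """Build a parameterised boolean-mode query matching keyword as an exact phrase."""
    # Double quotes delimit a phrase in boolean mode, so drop any the user typed.
    cleaned = keyword.replace('"', " ").strip()
    sql = (
        "SELECT * FROM record_fields "
        "WHERE MATCH (content) AGAINST (%s IN BOOLEAN MODE)"
    )
    return sql, ('"' + cleaned + '"',)


sql, params = phrase_query("Bank of America")
print(sql)
print(params)
```

The pair would then be handed to the driver's execute call, e.g. `cursor.execute(sql, params)`.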

Querying a table where the field contains any order of given text strings

I want to query a table as follows:
I have a field called "category" and my input match contains N separate words. I want the query to match all rows that contain all N words, but in any order.
For example if the field category contains "hello good morning world", my input query can contain "hello morning" or "good" or "world hello" and all are matches to the query.
How do I formulate such an SQL expression?
Also it would be good if the query can be made case insensitive.
If you are using MySQL you can use the boolean fulltext search feature to achieve this. You can put a + in front of each term and then only results with all the terms, in any order, will be returned. You will need to make sure the column containing the category field has a fulltext index specified on it for this to work. Other database engines probably have similar features. So for example you might do something like the following assuming there were a fulltext index over the category column...
SELECT * FROM myTable WHERE MATCH (category) AGAINST ('+term1 +term2 +term3' IN BOOLEAN MODE);
I would avoid using the LIKE operator as others have suggested. You would have to worry about the headache of mixed upper/lower case, and on a large database a % at the front of a LIKE search term causes a full table scan instead of using an index, which is horrible for performance.
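Building the '+term1 +term2' string from whatever the user typed could look roughly like this. It is only a sketch in Python; the function name `all_terms_expression` is hypothetical, and stripping everything except word characters is one possible policy for keeping user-typed operators out of the expression.

```python
import re


def all_terms_expression(user_input):
    """Turn free text into a boolean-mode expression requiring every word."""
    # Keep only runs of letters, digits and underscores so that user-typed
    # operators such as + - * " ( ) cannot change the meaning of the expression.
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + word for word in words)


print(all_terms_expression("world hello"))
print(all_terms_expression('good -"morning"'))
```

The resulting string is what would be bound as the AGAINST (... IN BOOLEAN MODE) argument.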
I'm not writing the loop that will build this query for you. This will get the job done, but it will be pretty inefficient.
SELECT * FROM table
WHERE (
UPPER(category) LIKE '%HELLO%' AND
UPPER(category) LIKE '%GOOD%' AND
UPPER(category) LIKE '%MORNING%' AND
UPPER(category) LIKE '%WORLD%'
);
You could also research using REGEXes with SQL.
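For what it's worth, the loop that builds such a query is short. Here is a hedged sketch in Python, run against an in-memory SQLite database as a stand-in for whatever engine you use; the table name `my_table`, the sample rows and the function name are made up. Each word becomes one bound LIKE parameter rather than being pasted into the SQL.

```python
import sqlite3


def find_with_all_words(conn, words):
    """Return categories containing every word as a substring, in any order."""
    if not words:
        return []
    # One "UPPER(category) LIKE ?" clause per word, joined with AND.
    where = " AND ".join("UPPER(category) LIKE ?" for _ in words)
    params = ["%" + word.upper() + "%" for word in words]
    sql = "SELECT category FROM my_table WHERE " + where + " ORDER BY category"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (category TEXT)")
conn.executemany(
    "INSERT INTO my_table (category) VALUES (?)",
    [("hello good morning world",), ("good night world",)],
)
print(find_with_all_words(conn, ["world", "hello"]))
print(find_with_all_words(conn, ["good"]))
```

As noted above, this still scans every row, so it is only reasonable for small tables.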

Searching a field for the whole word, and nothing but the word

I have a PHP interface with a keyword search, working off a DB(MySQL) which has a Keywords field.
The way in which the keywords field is set up is as follows, it is a varchar with all the words formatted as shown below...
the, there, theyre, their, thermal etc...
If I want to return just the exact word 'the' from the search, how would this be achieved?
I have tried using 'the%' and '%the' in the PHP, but it fails to return all of the rows in which the keyword appears.
is there a better (more accurate) way to go about this?
Thanks
If you want to select the rows that have exactly the keyword the:
SELECT * FROM table WHERE keyword='the'
If you want to select the rows that have the keyword the anywhere in them:
SELECT * FROM table WHERE keyword LIKE '%the%'
If you want to select the rows that start with the keyword the:
SELECT * FROM table WHERE keyword LIKE 'the%'
If you want to select the rows that end with the keyword the:
SELECT * FROM table WHERE keyword LIKE '%the'
Try this
SELECT * FROM tablename
WHERE fieldname REGEXP '[[:<:]]test[[:>:]]'
[[:<:]] and [[:>:]] are markers for word boundaries.
MySQL Regular Expressions
If you also search for the commas, you can be sure you are getting the whole word.
where keywordField like '%, the, %'
or keywordField like '%, the'
or keywordField like 'the, %'
or keywordField = 'the'
Maybe I didn't understand the question properly... but if you want all the rows where 'the' appears, a LIKE '%word%' should work.
If the DB of words is HUGE, MySQL may become too slow with this kind of query; that can be solved in 2 ways...
1- use a DB with better built-in text search (not many people would choose this one though). For example, SQL Server has a CONTAINS function that works better than LIKE '%word%'.
2- use an external search tool that uses inverted index search. I used Sphinx for a project and it works quite well. This is better if you rarely UPDATE the rows of the data you want to search, which should be the case.
Sphinx, for example, would generate a file from your MySQL table and use this file to answer the search (it's very fast). This file has to be re-indexed every time you do an insert or update on the table, making it a much better solution if you rarely update or insert new rows.
It looks like you have a one to many relationship going on within a column. It might be better to create a separate table for keywords with a row for each keyword and a foreign key to whatever it is you're searching on.
Doing LIKE '%???%' is generally a bad idea because the DB can't make use of an index, so it will scan the whole table. Whether this matters will depend on the size of the data you're working with, but it's worth considering up front. The single best way to help DB performance is in the initial table design. This can be tricky to change later.
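To make the separate-table suggestion concrete, here is a small sketch in Python using an in-memory SQLite database as a stand-in for MySQL. The table names `items` and `item_keywords`, the sample data and the helper names are all made up for illustration. With one row per keyword, a whole-word search becomes a plain equality test that an index can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_keywords (
        item_id INTEGER REFERENCES items(id),
        keyword TEXT
    );
    CREATE INDEX idx_item_keywords_keyword ON item_keywords (keyword);
    """
)


def add_item(item_id, name, raw_keywords):
    """Store an item and split its comma-separated keywords into one row each."""
    conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", (item_id, name))
    keywords = [part.strip() for part in raw_keywords.split(",") if part.strip()]
    conn.executemany(
        "INSERT INTO item_keywords (item_id, keyword) VALUES (?, ?)",
        [(item_id, keyword) for keyword in keywords],
    )


def find_items(keyword):
    """Return the names of items tagged with exactly this keyword."""
    rows = conn.execute(
        "SELECT items.name FROM items "
        "JOIN item_keywords ON item_keywords.item_id = items.id "
        "WHERE item_keywords.keyword = ? ORDER BY items.name",
        (keyword,),
    )
    return [row[0] for row in rows]


add_item(1, "first", "the, there, their")
add_item(2, "second", "there, thermal")
print(find_items("the"))
print(find_items("there"))
```

Searching for 'the' no longer touches 'there' or 'thermal', and no wildcard is needed.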

Exclude keywords from an empty query in Sphinx

Is it possible to exclude certain keywords from an empty query through Sphinx?
What I had in mind is to use the Extended2 match mode, and in order to exclude keywords, I'll be using the - or ! operator. I only need to fetch data through Sphinx without using any query (except for the exclusion operators).
In Sphinx, I fetch data using the following method:
$data = $sphinx->query('');
This query returns data which doesn't have to match anything (it means it'll return all data, and of course limited to the query limit). The problem is, if I add a keyword with the ! or - operator, it doesn't return anything. For instance:
$data = $sphinx->query('-google');
$data is returned as false
Maybe there is another method for this to work. Please help.
Thank you.
Sphinx doesn't like negation-only queries. If you check GetLastError()/GetLastWarning() it will explicitly say so.
The main reason is that it can't efficiently use its index. Sphinx is based on the concept of inverted indexes. So to run this query, it needs to fetch a list of every document, then remove the ones matching the keyword.
But you can make it work. You just need to give Sphinx a keyword that will be in every single document. Then you can just do
$data = $sphinx->query('popularword -google');
If you don't have a word that will be in every document, just add a fake one :)
sql_query = SELECT id, '__ALL__' as dummy, title .......
can then just do
$data = $sphinx->query('__ALL__ -google');
As the word will be on every document.
Don't expect the query to be very fast.
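If the search string comes from users, the dummy keyword can be added automatically whenever a query contains only exclusions. A minimal sketch in Python; the function name `with_match_all` is hypothetical, and it assumes the `__ALL__` token has been indexed on every document as described above.

```python
def with_match_all(query, all_token="__ALL__"):
    """Prefix all_token when the query consists only of excluded terms."""
    terms = query.split()
    # A term counts as positive unless it starts with an exclusion operator.
    has_positive = any(not term.startswith(("-", "!")) for term in terms)
    if has_positive:
        return query
    return " ".join([all_token] + terms)


print(with_match_all("-google"))
print(with_match_all("popularword -google"))
```

The returned string is what would be passed to the Sphinx client's query call.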
As far as I know it is not possible. Sphinx will fail at computing any query that involves searching through an entire collection.
For both options allowing you to use -, it is explicitly stated in the documentation that it is impossible:
SPH_MATCH_BOOLEAN:
Queries like "-dog", which implicitly include all documents from the collection, can not be evaluated.
SPH_MATCH_EXTENDED:
However, the query must be possible to compute without involving an implicit list of all documents
Short answer is: no, it is not possible. The only alternative is if you want to implement this for a small list of keywords, then you can add in your database a flag and set the value to true if the text contains this keyword. You'll be able to exclude them with a SetFilter() from your search results. I'm using this trick to exclude documents containing a certain set of keywords from my listings.

What's the best way to search a MySQL database with PHP?

Say if I had a table of books in a MySQL database and I wanted to search the 'title' field for keywords (input by the user in a search field); what's the best way of doing this in PHP? Is the MySQL LIKE command the most efficient way to search?
Yes, the most efficient way usually is searching in the database. To do that you have three alternatives:
LIKE to match exact substrings (MySQL has no ILIKE; LIKE is already case-insensitive under the default collations)
RLIKE to match POSIX regexes
FULLTEXT indexes to match another three different kinds of search aimed at natural language processing
So what's best depends on what you will actually be searching for. For book titles I'd offer a LIKE search for exact substring matches, useful when people know the book they're looking for, and also a FULLTEXT search to help find titles similar to a word or phrase. I'd give them different names in the interface of course, probably something like "exact" for the substring search and "similar" for the fulltext search.
An example about fulltext: http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
Here's a simple way you can break apart some keywords to build some clauses for filtering a column on those keywords, either ANDed or ORed together.
$terms = explode(',', $_GET['keywords'] ?? '');
$clauses = array();
$params = array();
foreach ($terms as $term)
{
    //remove any chars you don't want to be searching - adjust to suit
    //your requirements
    $clean = preg_replace('/[^a-z0-9]/i', '', $term);
    if ($clean !== '')
    {
        //bind each term as a parameter instead of concatenating it into
        //the SQL (mysql_escape_string was removed in PHP 7) - while not
        //strictly required in this example due to the preg_replace
        //earlier, it keeps the query safe in case you modify that filter...
        $clauses[] = "title like ?";
        $params[] = '%' . $clean . '%';
    }
}
if (!empty($clauses))
{
    //concatenate the clauses together with AND or OR, depending on
    //your requirements
    $filter = '(' . implode(' AND ', $clauses) . ')';
    //build the required SQL, then execute it as a prepared statement
    //with $params as the bound values
    $sql = "select * from foo where $filter";
}
else
{
    //no search term, do something else, find everything?
}
Consider using Sphinx. It's an open source full-text engine that can consume your MySQL database directly. It's far more scalable and flexible than hand-coding LIKE statements (and far less susceptible to SQL injection).
You may also check the soundex functions (SOUNDEX, SOUNDS LIKE) in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex
It's useful to return these matches if, for example, strict checking (by LIKE or =) did not return any results.
Paul Dixon's code example gets the main idea across well for the LIKE-based approach.
I'll just add this usability idea: provide an (AND | OR) radio button set in the interface, defaulting to AND. Then, if a user's query results in zero (0) matches and contains at least two words, respond with an option to the effect:
"Sorry, no matches were found for your search phrase. Expand search to match on ANY word in your phrase?"
Maybe there's a better way to word this, but the basic idea is to guide the person toward another query (that may be successful) without the user having to think in terms of the Boolean logic of AND and ORs.
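The decision logic behind that prompt is small. Here is a rough sketch in Python over an in-memory list of titles rather than a real database; the function names and sample titles are made up for illustration.

```python
def search_titles(titles, phrase, mode="AND"):
    """Return titles containing all (AND) or any (OR) of the words in phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    combine = all if mode == "AND" else any
    return [t for t in titles if combine(w in t.lower() for w in words)]


def search_with_suggestion(titles, phrase):
    """Search with AND; report whether to offer the ANY-word fallback."""
    results = search_titles(titles, phrase, "AND")
    # Only offer the fallback when nothing matched and there are 2+ words.
    offer_any_word = not results and len(phrase.split()) >= 2
    return results, offer_any_word


titles = ["Learning PHP", "MySQL Cookbook"]
print(search_with_suggestion(titles, "php mysql"))
print(search_titles(titles, "php mysql", "OR"))
```

When the second value is true, the interface would show the "expand search" option and rerun the search in OR mode if the user accepts.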
I think LIKE is the most efficient way if it's a single word. Multiple words may be split with the explode function, as said already, then looped over and searched for individually in the database. To avoid returning the same result twice, read the values into an array and ignore any row that already exists in it. The count function then tells you where to stop when printing with a loop. Sorting may be done with the similar_text function, using the percentage it reports to order the array.
