MySQL full text fails on a particular table - php

I've been tearing my hair out over why this fails. I have the following code:
$query = "
SELECT DISTINCT title, caption, message, url, MATCH(title, caption, message, url) AGAINST ('$searchstring') AS score
FROM news WHERE (valid = 1) AND MATCH(title, caption, message, url) AGAINST ('$searchstring')
UNION ALL
SELECT DISTINCT title, caption, message, url, MATCH(title, caption, message, url) AGAINST ('$searchstring') AS score
FROM paged WHERE (valid = 1) AND MATCH(title, caption, message, url) AGAINST ('$searchstring')
ORDER BY score DESC";
I'm able to get search results from the paged table, but not from the news table.

My guess as to what the problem is: stopwords.
From the documentation:
The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match.
If the search words hit one of these rules (stopword list or 50% threshold) in news but not in paged, you'd get results from one table and not the other.
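To check whether the 50% rule is what hides your rows, it can help to measure how common each search word is in each table. Below is a simplified Python model, not MySQL's real tokenizer: it lowercases and splits on whitespace, and ignores the stopword list and minimum word length. The `news` and `paged` rows are made-up sample data.

```python
def common_words(rows, search_words, threshold=0.5):
    """Return the search words that appear in at least `threshold` of the rows.

    Simplified model of the natural-language 50% rule: a row "contains" a word
    if the word is among its lowercased, whitespace-separated tokens.
    """
    if not rows:
        return []
    tokenized = [set(row.lower().split()) for row in rows]
    result = []
    for word in search_words:
        hits = sum(1 for tokens in tokenized if word.lower() in tokens)
        if hits / len(rows) >= threshold:
            result.append(word)
    return result


news = ["mysql fulltext tips"]  # one row: every word is in 100% of the rows
paged = ["mysql fulltext tips", "php sessions", "css grid", "html forms"]

print(common_words(news, ["mysql", "tips"]))   # ['mysql', 'tips']
print(common_words(paged, ["mysql", "tips"]))  # []
```

A one-row table is the extreme case: every word in it is "common", so a natural-language search can never match it.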

Are both tables MyISAM? Or is news InnoDB?
Use this query to find out if you don't know.
SELECT table_name, engine
FROM information_schema.tables
WHERE table_name IN ('news', 'paged');
This matters because InnoDB tables did not support FULLTEXT indexes before MySQL 5.6.

Got it at last, thanks guys! Here is the cause of the problem:
If a word is present in more than 50% of the rows it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones.
A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given.
On the one hand, the table had only one entry, so every word was in 100% of the rows and the 50% threshold was exceeded.
Even when I duplicated the entry 5 times, the 50% threshold was still an issue and the relevance stayed 0, so I added the IN BOOLEAN MODE modifier, e.g.:
SELECT * FROM table WHERE MATCH(col1,col2) AGAINST('search_term' IN BOOLEAN MODE)
This is my first time posting on Stack Overflow; I didn't expect to get responses so fast.
Thanks!

Related

MYSQL - Search words in multiple columns

I want to make a search tool for my website.
If I search for a phrase, I want it to search multiple columns.
For example, if I search for "dewalt drill" and the title has the words "dewalt power drill", I want it to come up.
Also, if I search for "dewalt drill" and the title has "dewalt" and the description has "drill", I want it to come up.
But all words of the search must be contained in some combination of the fields.
Can someone help me with the query?
Currently:
SELECT * FROM products WHERE sku LIKE '%{$searchwords}%' OR title LIKE '%{$searchwords}%' OR `desc` LIKE '%{$searchwords}%'
If your table is MyISAM, you can create a FULLTEXT index and then search IN BOOLEAN MODE.
To add the index (desc is a reserved word in MySQL, so it must be quoted with backticks):
ALTER TABLE products ADD FULLTEXT (sku, title, `desc`);
Then your query would be:
$searchwords = '+' . join(' +', explode(' ', $searchwords)); // every word, including the first, gets a leading +
$query = "SELECT * FROM products WHERE MATCH (sku, title, `desc`) AGAINST ('{$searchwords}' IN BOOLEAN MODE)";
You probably want FULLTEXT searching (starting with MySQL 5.6, this is also available for InnoDB tables). You can require all words with BOOLEAN MODE.
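Note that `join(' +', ...)` on its own only puts `+` between words, so the first word needs its own `+` or it stays optional (`dewalt +drill`). A hedged Python sketch of building the boolean string so every word is required; stripping the characters MySQL treats as boolean operators is an assumption about what input you want to allow. The result should still be passed to `AGAINST (... IN BOOLEAN MODE)` as a bound parameter rather than interpolated into the SQL.

```python
import re

# Characters with special meaning in MySQL boolean-mode search strings.
BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def require_all_words(search):
    """Turn 'dewalt drill' into '+dewalt +drill' for IN BOOLEAN MODE.

    Operator characters are replaced with spaces first, so user input cannot
    change the meaning of the search; str.split() also drops extra whitespace.
    """
    words = BOOLEAN_OPERATORS.sub(" ", search).split()
    return " ".join("+" + word for word in words)


print(require_all_words("dewalt drill"))          # +dewalt +drill
print(require_all_words('  dewalt  "drill" -x'))  # +dewalt +drill +x
```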

keyword relevance PHP MySQL Search Engine

I don't know why I can't find this anywhere; I would think this is a pretty common request. I am writing a search engine in PHP to search a MySQL database of For Sale listings for keywords entered by the user.
There are several columns in the table, but only 2 need to be searched. They are named file_Title & file_Desc. Think of it like a classified ad: an item title and a description.
So, for example, a user would search for 'John Deere Lawn Tractor'. What I would like to happen is that classifieds that have all 4 of those words show up at the top of the list, then results that only have 3, and so on.
I've read a very good webpage at http://www.roscripts.com/PHP_search_engine-119.html
From that author's example, I have the following code:
<?php
$search = 'John Deere Lawn Tractors';
$keywords = explode(' ', $search); // split() was removed in PHP 7; explode() does the same for a plain space
$sql = "SELECT DISTINCT COUNT(*) As relevance, id, file_Title, file_Desc FROM Listings WHERE (";
foreach ($keywords as $keyword) {
echo 'Keyword is ' . $keyword . '<br />';
$sql .= "(file_Title LIKE '%$keyword%' OR file_Desc LIKE '%$keyword%') OR ";
}
$sql=substr($sql,0,(strLen($sql)-3));//this will eat the last OR
$sql .= ") GROUP BY id ORDER BY relevance DESC";
echo 'SQL is ' . $sql;
$query = mysql_query($sql) or die(mysql_error());
$Count = mysql_num_rows($query);
if($Count != 0) {
echo '<br />' . $Count . ' RESULTS FOUND';
while ($row_sql = mysql_fetch_assoc($query)) {//echo out the results
echo '<h3>'.$row_sql['file_Title'].'</h3><br /><p>'.$row_sql['file_Desc'].'</p>';
}
} else {
echo "No results to display";
}
?>
The SQL String outputted is this:
SELECT DISTINCT COUNT(*) As relevance, id, file_Title, file_Desc FROM Listings
WHERE ((file_Title LIKE '%John%'
OR file_Desc LIKE '%John%')
OR (file_Title LIKE '%Deere%'
OR file_Desc LIKE '%Deere%')
OR (file_Title LIKE '%Lawn%'
OR file_Desc LIKE '%Lawn%')
OR (file_Title LIKE '%Tractors%'
OR file_Desc LIKE '%Tractors%') )
GROUP BY id
ORDER BY relevance DESC
With this code I get 275 results from my DB. My problem is it really doesn't order by the number of keywords found in the row. It seems to order the results by id instead. If I remove 'GROUP BY id' then it only returns 1 result instead of all of them, which is really messing with me!
I've also tried shifting to FULLTEXT in the db but can't seem to get that going either so I'd prefer to stick with LIKE %Keyword% syntax.
Any help is appreciated! Thanks!
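One likely reason the ordering does not work, assuming `id` is unique: with `GROUP BY id` every group holds exactly one row, so `COUNT(*)` is always 1 and all rows tie. Without `GROUP BY`, `COUNT(*)` collapses every matching row into a single aggregate row, which is why only one result comes back. A different way to rank by the number of matched keywords is to add up the per-keyword conditions, because a condition like `(file_Title LIKE ... OR file_Desc LIKE ...)` evaluates to 1 or 0. The sketch below demonstrates this with Python's built-in `sqlite3` (SQLite, not MySQL, but it also evaluates these conditions to 1/0) and placeholders instead of string interpolation; the sample rows are made up.

```python
import sqlite3


def build_relevance_query(keywords):
    """Return (sql, params) ranking listings by how many keywords they match.

    Each keyword contributes one 0/1 term, so the sum is the number of
    distinct keywords found in the title or description.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    term = "(file_Title LIKE ? OR file_Desc LIKE ?)"
    terms = [term] * len(keywords)
    relevance = " + ".join(terms)
    condition = " OR ".join(terms)
    sql = (
        f"SELECT id, file_Title, file_Desc, {relevance} AS relevance "
        f"FROM Listings WHERE {condition} "
        "ORDER BY relevance DESC, id"
    )
    like_params = []
    for keyword in keywords:
        pattern = f"%{keyword}%"
        like_params += [pattern, pattern]
    # The placeholders appear twice (SELECT list, then WHERE), in the same order.
    return sql, like_params + like_params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Listings (id INTEGER PRIMARY KEY, file_Title TEXT, file_Desc TEXT)")
conn.executemany(
    "INSERT INTO Listings (id, file_Title, file_Desc) VALUES (?, ?, ?)",
    [
        (1, "Lawn mower", "Gas powered, runs well"),
        (2, "John Deere lawn tractor", "Lightly used"),
        (3, "Deere parts", "Assorted tractor parts"),
        (4, "Bicycle", "Red, 21 speed"),
    ],
)
sql, params = build_relevance_query("John Deere Lawn Tractor".split())
rows = conn.execute(sql, params).fetchall()
print([(row[0], row[3]) for row in rows])  # [(2, 4), (3, 2), (1, 1)]
```

In MySQL the same SQL shape works with your driver's placeholder style; keep in mind that `%` and `_` inside a keyword act as LIKE wildcards.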
I would suggest a totally different approach. Your approach is cumbersome, inefficient, heavy on the DB and will likely be very slow with more and more records added to your database.
What I would suggest is the following:
1. Create a separate table for keywords.
2. Create a list of non-keywords you don't want to index (like common English prepositions etc.) so that they are not included. You can probably find a list of them online, readily available.
3. When a new entry is added, split the string into separate keywords, omitting the ones from step 2, and insert them into the table created in step 1 (if not already in it).
4. In a separate table, with a foreign key pointing to the keywords table, associate the classified ad with the keyword.
5. Steps 3 and 4 must happen again if your classified ad is edited (i.e. the associations inserted in step 4 are deleted from the association table, and the keywords are analysed again and reassociated with the classified ad).
Once you have this structure, all you have to do is search the association table and order by the number of matched keywords. You can even add an extra column to it and put the number of occurrences of that keyword in the article, so that you order by that too.
That will be much faster.
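The steps above amount to an inverted index. A minimal in-memory Python sketch under stated assumptions: dictionaries stand in for the keywords table and the association table, the stopword set is a tiny made-up sample, and tokenizing is a lowercase split on anything that is not a letter or digit. Search ranks ads by the number of distinct matched keywords, then by total occurrences (the optional count column).

```python
import re
from collections import Counter, defaultdict

# Tiny sample of non-keywords (step 2); a real list would be much longer.
STOPWORDS = {"a", "an", "and", "the", "for", "with", "of"}


def keywords_of(text):
    """Lowercase, split on anything that is not a letter or digit, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class KeywordIndex:
    """In-memory stand-in for a keywords table plus an ad/keyword association table."""

    def __init__(self):
        self.keyword_ids = {}                 # keywords table: keyword -> id
        self.ad_keywords = defaultdict(dict)  # association: ad_id -> {keyword_id: occurrences}

    def index_ad(self, ad_id, text):
        """Add or re-index an ad: drop its old associations, then rebuild them."""
        self.ad_keywords.pop(ad_id, None)
        for word, count in Counter(keywords_of(text)).items():
            keyword_id = self.keyword_ids.setdefault(word, len(self.keyword_ids) + 1)
            self.ad_keywords[ad_id][keyword_id] = count

    def search(self, query):
        """Return ad ids ordered by matched keywords, then total occurrences."""
        wanted = {self.keyword_ids[w] for w in keywords_of(query) if w in self.keyword_ids}
        ranked = []
        for ad_id, counts in self.ad_keywords.items():
            matched = wanted.intersection(counts)
            if matched:
                occurrences = sum(counts[k] for k in matched)
                ranked.append((-len(matched), -occurrences, ad_id))
        return [ad_id for _, _, ad_id in sorted(ranked)]


index = KeywordIndex()
index.index_ad(1, "John Deere lawn tractor for sale")
index.index_ad(2, "Lawn mower, lawn edger and lawn rake")
index.index_ad(3, "Deere tractor parts")
print(index.search("John Deere Lawn Tractor"))  # [1, 3, 2]
```

Ad 2 mentions "lawn" three times but still ranks last, because matching more distinct keywords is weighted ahead of repeated occurrences.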
I once used a script called Sphider that does something similar. I'm not sure if it is still maintained, but it works in a very similar way on the web pages it parses.
I know you said you had problems with FULLTEXT, but I would highly encourage you to go back and try it again. FULLTEXT indexes and search are designed to do what you are doing, and when MATCH() is used in the WHERE clause, MySQL automatically sorts the rows from highest to lowest relevance.
For more information on FULLTEXT, check out http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Also, pay special attention to the comment by Patrick O'Lone on the same page, some of which is quoted below...
It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to:
SELECT MATCH(Content) AGAINST ('keyword1 keyword2') AS Relevance FROM table WHERE MATCH(Content) AGAINST ('+keyword1 +keyword2' IN BOOLEAN MODE) HAVING Relevance > 0.2 ORDER BY Relevance DESC
Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.
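The quoted pattern can also be assembled from a keyword list. A hedged Python sketch that returns the SQL plus parameters for a driver using `%s` placeholders (as PyMySQL and mysqlclient do); the table name `docs`, the column `Content`, and the 0.2 default threshold (taken from the quote) are illustrative assumptions. The plain string feeds the natural-language relevance expression, and the `+`-prefixed string feeds the boolean WHERE filter.

```python
def relevance_query(keywords, threshold=0.2):
    """Return (sql, params): boolean mode filters rows, natural mode scores them.

    `docs` and `Content` are placeholder names (hypothetical); the search text
    is passed as parameters so it is never interpolated into the SQL.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    natural = " ".join(keywords)                   # 'keyword1 keyword2'
    boolean = " ".join("+" + k for k in keywords)  # '+keyword1 +keyword2'
    sql = (
        "SELECT *, MATCH(Content) AGAINST (%s) AS Relevance "
        "FROM docs "
        "WHERE MATCH(Content) AGAINST (%s IN BOOLEAN MODE) "
        "HAVING Relevance > %s "
        "ORDER BY Relevance DESC"
    )
    return sql, (natural, boolean, threshold)


sql, params = relevance_query(["keyword1", "keyword2"])
print(params)  # ('keyword1 keyword2', '+keyword1 +keyword2', 0.2)
```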

Optimizing auto-complete FULLTEXT SQL query

I have the following query which is used in order to do an auto-complete of a search box:
SELECT *, MATCH (screen_name, name) AGAINST ('+query*' IN BOOLEAN MODE) AS SCORE
FROM users
WHERE MATCH (screen_name, name) AGAINST ('+query*' IN BOOLEAN MODE)
ORDER BY SCORE DESC LIMIT 3
I also have a FULLTEXT index on screen_name and name (together). When this table was relatively small (50k rows), this worked great. Now the table has ~200k rows and each query takes seconds(!) to complete. I'm using MySQL with MyISAM. Is this reasonable? In which directions should I look to improve this, since it surely doesn't satisfy the needs of an auto-complete query?
MySQL MATCH ... AGAINST is really slow; you should look into alternatives like the Sphinx search server.

MySQL match against - IN BOOLEAN MODE?

I'm using PDO to execute a MATCH AGAINST query.
The following returns nothing:
SELECT title, author, isbn, MATCH(title, isbn) AGAINST (:term) AS score
FROM books
WHERE MATCH(title, isbn) AGAINST (:term)
ORDER BY score DESC LIMIT 0,10
Where as this returns perfectly:
SELECT title, author, isbn, MATCH(title, isbn) AGAINST (:term) AS score
FROM books
WHERE MATCH(title, isbn) AGAINST (:term IN BOOLEAN MODE)
ORDER BY score DESC LIMIT 0,10
Could anyone tell me why IN BOOLEAN MODE is making such a difference, and whether or not I should be using it in my query?
The first query is running as a "natural language search", as that is the default when no search modifier is specified. This type of search additionally filters out words that are present in 50% or more of the rows automatically.
"IN BOOLEAN MODE" does not do this additional filtering, and thus may return matches if you are searching on a common term.
Whether or not you should be using a boolean search depends on the specifics of your situation and cannot be determined without more information. However, some considerations may include the size of the input data set vs. how large a matching data set you want returned, and whether you want to return results for words that occur frequently.
(Ref: http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html)
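A toy Python model of the difference described above (not MySQL's real scoring, stopword list, or tokenizer): natural-language mode drops any search word found in 50% or more of the rows before matching, while boolean mode keeps it. With a small made-up `books` list where the term appears in every row, only the boolean version finds anything.

```python
def fulltext_match(rows, term, boolean_mode=False):
    """Toy MATCH ... AGAINST: return rows containing any word of `term`."""
    if not rows:
        return []
    tokenized = [set(row.lower().split()) for row in rows]
    words = term.lower().split()
    if not boolean_mode:
        # Natural language mode (simplified): words found in 50% or more of
        # the rows are considered common and dropped before matching.
        words = [
            w for w in words
            if sum(w in tokens for tokens in tokenized) / len(rows) < 0.5
        ]
    return [row for row, tokens in zip(rows, tokenized)
            if any(w in tokens for w in words)]


books = ["php cookbook", "learning php", "php objects patterns"]
print(fulltext_match(books, "php"))                     # []
print(fulltext_match(books, "php", boolean_mode=True))  # all three titles
```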

Mysql Unique Query

I have a programme listing database with all the information needed for one programme packed into one table (I should have split programmes and episodes into their own tables). Since there are multiple episodes for any given show, I wish to display the main page with just the title names, in ascending order, starting with the chosen letter. I know how to do the basic query, but this is all I know:
SELECT DISTINCT title FROM programme_table WHERE title LIKE '$letter%'
I know that works; I use it. But I am using dynamic image loading that requires a series number to return the full image, so how do I get the title to be distinct but also load the series number for that title?
I hope I have been clear.
Thanks for any help
Paul
You can replace the DISTINCT keyword with a GROUP BY clause.
SELECT
title
, series_number
FROM
programme_table
WHERE title LIKE '$letter%'
GROUP BY
title
, series_number
There are currently two other valid options:
The option suggested by Mohammad is to use a HAVING clause instead of the WHERE clause; this is actually less optimal:
The WHERE clause is used to restrict records, and is also used by the query optimizer to determine which indexes and tables to use. HAVING is a "filter" on the result set that is applied after GROUP BY (and before ORDER BY), so MySQL cannot use it to optimize the query.
So HAVING is a lot less optimal and you should only use it when you cannot use 'WHERE' to get your results.
quosoo points out that the DISTINCT keyword applies to all listed columns in the query. This is true, but people generally do not recommend it (in some specific cases there is a performance difference). The MySQL optimizer, however, produces the same execution plan for both queries, so usually there is no actual performance difference.
Update
Although MySQL applies the same optimization to both queries, there is actually a difference: when DISTINCT is used in combination with a LIMIT clause, MySQL stops as soon as it finds enough unique rows. So
SELECT DISTINCT
title
, series_number
FROM
programme_table
WHERE
title LIKE '$letter%'
is actually the best option.
select title,series_number from programme_table group by title,series_number having title like '$letter%';
The DISTINCT keyword actually works on a list of columns, so if you just add the series to your query, it should return a set of unique (title, series) combinations:
SELECT DISTINCT title, series FROM programme_table WHERE title LIKE '$letter%'
Hey, thanks for that, but I have about 1000 entries with the same series, so it would single out the series as well, leaving about 999 programmes useless because they don't show.
However, I found a way to make it unique and show the series number:
SELECT * FROM four a INNER JOIN (SELECT title, MIN(series) AS MinPid FROM four WHERE title LIKE '$letter%' GROUP BY title) b ON a.title = b.title AND a.series = b.MinPid
Hopefully it helps anyone in the future and thank you for the replies :)
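The join above is a "groupwise minimum" query: the derived table finds the lowest series per title, and the join fetches the matching full row. A self-contained sketch using Python's built-in `sqlite3` (SQLite rather than MySQL, but this SQL runs on both), with made-up rows and an added `ORDER BY a.title` for the ascending listing; the letter is bound as a parameter instead of interpolating `$letter`. Note that if several episodes share a title's lowest series number, each of them is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE four (id INTEGER PRIMARY KEY, title TEXT, series INTEGER, episode INTEGER)"
)
conn.executemany(
    "INSERT INTO four (title, series, episode) VALUES (?, ?, ?)",
    [
        ("Doctor Who", 3, 1),
        ("Doctor Who", 1, 1),
        ("Doctor Who", 2, 4),
        ("Dad's Army", 5, 2),
        ("Blackadder", 1, 1),
    ],
)

letter = "D"
rows = conn.execute(
    """
    SELECT a.title, a.series, a.episode
    FROM four a
    INNER JOIN (
        SELECT title, MIN(series) AS MinPid
        FROM four
        WHERE title LIKE ?
        GROUP BY title
    ) b ON a.title = b.title AND a.series = b.MinPid
    ORDER BY a.title
    """,
    (letter + "%",),
).fetchall()
print(rows)  # [("Dad's Army", 5, 2), ('Doctor Who', 1, 1)]
```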
