MySQL fulltext search Boolean mode confusion - php

I'm getting a bit confused when trying to set up a search utilizing fulltext search in boolean mode. Here is the query I'm using:
$query = "SELECT *,
MATCH(title) AGAINST('$q' IN BOOLEAN MODE) AS score
FROM results
WHERE MATCH(title) AGAINST('$q' IN BOOLEAN MODE)
ORDER BY score DESC";
When I run a search for +divorce+refinance, the results returned are:
1) Divorce: Paying Off Spouse = Rate/Term Refinance
2) Divorce - What to Look Out For Regarding Divorced Borrowers
Am I right in thinking that the second result should not be appearing, as it does not have both words? If not, how can I create that functionality?

Maybe I am mistaken, but if you search for the string +divorce+refinance you get a weird result. If you want to require both words, you should search for +divorce +refinance (with a space between them).
I tested it and it returns only one row:
Divorce: Paying Off Spouse = Rate/Term Refinance
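A minimal sketch of how the search string could be normalized before it reaches the query, so that every word gets its own + operator separated by a space. The helper name require_all_terms is hypothetical and not part of the original question; the result should still be passed to MySQL as a bound parameter rather than interpolated into the SQL.

```python
import re


def require_all_terms(raw_query: str) -> str:
    """Return a boolean-mode search string in which every word is required.

    Hypothetical helper: splits on whitespace and '+' so that
    "+divorce+refinance" is treated as two separate words.
    """
    words = [w for w in re.split(r"[\s+]+", raw_query) if w]
    # Prefix each word with '+' and separate the words with a single space.
    return " ".join("+" + w for w in words)


print(require_all_terms("+divorce+refinance"))
```

This only handles the + operator; other boolean-mode operators (-, *, quotes) would need their own handling.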

Your problem is about building a prioritized Boolean query, and to do that you need to understand how a Boolean search is performed. Let me explain in simple terms why the second result is shown.
First, what does Boolean mean in programming?
It means a condition is either true or false, i.e. 1 or 0.
Now, how is the Boolean search performed? You have given two words, and the search goes through the table row by row. Wherever the first word is found, the engine marks the row as true, gives it a score of 1, and records the number of matching words in that row.
It then moves on to the next word and repeats the process: it marks each row containing the word as true and again records the number of matching words in the row.
There are now two result lists. They are merged, with priority given to the rows with the largest number of matching words, and this is where the main problem lies.
Example
First word   Total nos. results   Second word   Total nos. of words   Final results   Row no.   Answer
1            2                    1             1                     1.33            1         1.33
0            0                    2             2                     1.25            2         1.25
0            0                    1             0                     1.25            3         1
When the two result lists are merged, true combined with false is still true, just as 1 + 0 = 1, whereas the results that should be shown are those with a value greater than 1. So, when relevance is scored by the words found, the search engine ends up showing rows in which it found any of the words.
Relevance scoring can be handled in two ways: either ignore scores equal to 1 and only work with rows whose score is greater than 1, or write the query so that it never returns rows with a score equal to 1. In your case you can also do the following to get the correct results for two words:
SELECT *,
( (1.3 * (MATCH(title) AGAINST ('+term +term2' IN BOOLEAN MODE)))
+ (0.6 * (MATCH(text) AGAINST ('+term +term2' IN BOOLEAN MODE))) ) AS score
FROM results
WHERE ( MATCH(title, text) AGAINST ('+term +term2' IN BOOLEAN MODE) )
HAVING score > 0
ORDER BY score DESC;
I know that using HAVING makes the query a little slower, but there is no other solution available. Hope this solves your problem.

Related

MySQL Query, make ORDER BY rand() put a specific row in the bottom

I have an ORDER BY rand() SQL query for choosing a random row in my table. Is it possible to make it so that it won't choose a specific row? For example, if I have a column called "Boolean" in the table, and I only want rand() to choose among the rows where "Boolean" is equal to "1" and put the rest at the bottom, is that possible?
Sorry if this is a stupid question, or if I've explained it badly, but I'm pretty new to SQL.
You can do this using order by:
order by (boolean = 1) desc,
rand()
order by can take more than one key value. The first says to put values with boolean = 1 first. In a numeric context, a boolean expression is treated as a number, with 1 for true and 0 for false. Hence the desc for the first expression.
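The same two-key ordering can be sketched outside the database. This is only an illustration of the idea, assuming rows are held as dictionaries with a "Boolean" key as in the question; the function name flagged_first_random is hypothetical.

```python
import random


def flagged_first_random(rows, rng=random):
    """Order rows so that those with Boolean == 1 come first, each group shuffled.

    Mirrors ORDER BY (boolean = 1) DESC, rand(): the first sort key groups
    the rows, the second key randomizes the order inside each group.
    """
    # False sorts before True, so "Boolean != 1" puts the flagged rows first.
    return sorted(rows, key=lambda row: (row["Boolean"] != 1, rng.random()))


rows = [{"id": i, "Boolean": i % 2} for i in range(1, 7)]
for row in flagged_first_random(rows):
    print(row)
```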

Mysql Select Query with match and against issue

This is my code to perform a search
SELECT d.deal_id,
d.deal_title,
d.friendly_url
FROM wp_deals AS d
WHERE MATCH (d.deal_title) AGAINST ('Historic China eight day tour' IN BOOLEAN MODE)
GROUP BY d.deal_id
It works fine and gives 14 results.
One of them is
Great Sandy Straits two night Natural Encounters Tour for two with sunset cruise & more. Up to $1,071 off!
But when I search for "with" or "more", the query becomes
SELECT d.deal_id,
d.deal_title,
d.friendly_url
FROM wp_deals AS d
WHERE MATCH (d.deal_title) AGAINST ('more' IN BOOLEAN MODE)
GROUP BY d.deal_id
SELECT d.deal_id,
d.deal_title,
d.friendly_url
FROM wp_deals AS d
WHERE MATCH (d.deal_title) AGAINST ('with' IN BOOLEAN MODE)
GROUP BY d.deal_id
and neither gives any result, although "with" and "more" are both there. I am not an expert with this type of search query.
But when I search for "tour", which is also there, it works fine. What's going on here?
I don't understand, as "tour", "with" and "more" all contain four letters and are all in the title.
You people are geniuses.
What do you suggest to make it work?
Thanks in advance.
In Boolean full-text searches the stopword list applies: common words such as "some" or "then" are stopwords and do not match if present in the search string: http://dev.mysql.com/doc/refman/5.5/en//fulltext-boolean.html
Stopword list: http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html
The best-known solutions are to disable the list or to add or remove values from it; see "How to reset stop words in MYSQL?"
Also:
SELECT d.deal_id,
d.deal_title,
d.friendly_url
FROM
wp_deals AS d
WHERE MATCH (d.deal_title) AGAINST ('+more*' IN BOOLEAN MODE)
GROUP BY d.deal_id

count rows in result with fulltext search mysql

I would like to ask if anybody can help me with this query
SELECT count(MATCH(product_text) AGAINST('lorem*' in boolean mode)) AS score FROM table_products WHERE MATCH (text) AGAINST ('lorem*' in boolean mode) limit 0,50000
There is a fulltext index on the column text. The above-mentioned query returns the number of rows as score.
What I want is to count the fulltext search results. If the number of results (rows) is higher than 50000, then a count of 50000 should be returned; otherwise the exact count of results is returned.
The problem is that it is not fast on a table with 1.5 million rows if the user searches for, for example, the word "lorem" and this word appears in the table more than 500,000 times.
I tried also
SELECT id,name,product_text,price, MATCH(product_text) AGAINST('lorem*' in boolean mode) AS score FROM table_products WHERE MATCH(product_text) AGAINST('lorem*' in boolean mode) and show_product='1' limit 0,50000
... with PHP mysql_num_rows.
Another problem is that in the following query MySQL uses only the fulltext index, so sorting by another column or by score is then slow:
SELECT id,name,product_text,price, MATCH(product_text) AGAINST('lorem*' in boolean mode) AS score FROM table_products WHERE MATCH(product_text) AGAINST('lorem*' in boolean mode) and show_product='1' order by score desc limit 0,50000
or, respectively, (... order by name, order by price).
Is there a better and faster way?
Many thanks for any help.

MySQL Fulltext won't find rows

I've got a car search website, and when I set up the search system it failed to give any dynamic searching. For instance, it's got 500 cars on there and ~5 are 'Toyota Ist'.
So when I search 'Ist' I get no results. (See query)
SELECT *,MATCH(aTitle, aDescribe, aCarModel, aCarWheels, aCarStereo, aCarIntTrim, aCarTrans, aCarDrive, aCarFuel, aCarPlate, aCarColour) AGAINST('toyota ist' IN BOOLEAN MODE) AS score FROM at_auction WHERE status = '1' AND aCarYear >= 1992 AND aCarYear <= 2012 AND startBid >= 0 AND startBid <= 20000 AND MATCH(aTitle, aDescribe, aCarModel, aCarWheels, aCarStereo, aCarIntTrim, aCarTrans, aCarDrive, aCarFuel, aCarPlate, aCarColour) AGAINST('toyota ist' IN BOOLEAN MODE) AND closeTime >= '201201060842' ORDER BY opt_feature DESC, score DESC, score DESC LIMIT 0,10
But if I search 'Toyota Ist', I'll get a whole lot of Toyota results, and the Ist car isn't necessarily high in the list.
So the underlying problem is: how do I set it up so that if someone searches just one word, say the car's model, it'll return the row, and how can it return the row if they search with multiple words like 'toyota camry' (although that actually seems to work)?
The MATCH fields are all FULLTEXT, and aCarModel etc. store the car's model; they're usually just one word like 'Ist' or 'Camry'.
Thanks.
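If over 50% of the rows contain "Toyota", then "Toyota" is ignored in a natural-language search; note that this 50% threshold does not apply IN BOOLEAN MODE. Also, with the default settings FULLTEXT doesn't index words under 4 letters, which would explain why searching for 'Ist' finds nothing.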
50% limit and other tuning bits about FULLTEXT
http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
Minimum word length
http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_ft_min_word_len
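As a rough illustration of the minimum word length, the sketch below flags search terms that would be too short to be indexed. It assumes the limit of 4 mentioned above (the actual value depends on the server's ft_min_word_len setting), and the function name too_short_terms is hypothetical.

```python
# Assumption: mirrors a server configured with ft_min_word_len = 4.
FT_MIN_WORD_LEN = 4


def too_short_terms(search: str, min_len: int = FT_MIN_WORD_LEN) -> list:
    """Return the words in the search string that are shorter than min_len."""
    return [word for word in search.split() if len(word) < min_len]


print(too_short_terms("toyota ist"))
```

A search form could use a check like this to warn the user, or to fall back to a LIKE query for short terms.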

Select a random row but with odds

I have a dataset of rows each with an 'odds' number between 1 and 100. I am looking to do it in the most efficient way possible. The odds do not necessarily add up to 100.
I have had a few ideas.
a)
Select the whole dataset and then add all the odds up and generate a random number between 1 and that number. Then loop through the dataset deducting the odds from the number until it is 0.
I was hoping to minimize the impact on the database so I considered if I could only select the rows I needed.
b)
SELECT * FROM table WHERE (100*RAND()) < odds
I considered LIMIT 0,1
But then if items have the same probability, only one of them will be returned.
Alternatively, take the whole dataset and pick a random one from there... but then the odds are affected, as it becomes a random selection with odds followed by a random selection without odds, so the odds become tilted in favour of the higher odds (even more so).
I guess I could order by odds ASC then take the whole dataset and then with PHP take a random out of the rows with the same odds as the first record (the lowest).
Seems like a clumsy solution.
Does anyone have a superior solution? If not which one of the above is best?
Do some up-front work: add some columns to your table that help the selection. For example, suppose you have these rows
X 2
Y 3
Z 1
We add some cumulative values
Key  Odds  Start  End
X    2     0      1    // range 0->1, 2 values == odds
Y    3     2      4    // range 2->4, 3 values == odds
Z    1     5      5    // range 5->5, 1 value == odds
Start and End are chosen as follows. The first row has a start of zero. Subsequent rows have a start one more than previous end. End is the (Start + Odds - 1).
Now pick a random number R in the range 0 to Max(End)
Select * from T where R >= T.Start and R <= T.End
If the database is sufficiently clever, we may be able to use
Select * from T where R >= T.Start and R <= (T.Start + T.Odds - 1)
I'm speculating that having an End column with an index may give better performance. Also, Max(End) could perhaps be stashed somewhere and updated by a trigger when necessary.
Clearly there's some hassle in updating Start/End. This may not be too bad if either
the table contents are stable,
or insertions are in some way naturally ordered, so that each new row just continues from the old highest.
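The Start/End bookkeeping described in this answer can be sketched in a few lines. This is an in-memory illustration only, assuming every row has odds of at least 1; the function names build_ranges and pick are hypothetical, and in the database the lookup would be the range query shown above.

```python
import random
from bisect import bisect_right


def build_ranges(rows):
    """rows: list of (key, odds) pairs. Returns a list of (key, start, end)."""
    ranges, start = [], 0
    for key, odds in rows:
        end = start + odds - 1          # End = Start + Odds - 1
        ranges.append((key, start, end))
        start = end + 1                 # next Start is one more than this End
    return ranges


def pick(ranges, r):
    """Return the key whose [start, end] range contains r."""
    starts = [start for _, start, _ in ranges]
    # Last range whose start is <= r, the same row the SQL range check finds.
    return ranges[bisect_right(starts, r) - 1][0]


ranges = build_ranges([("X", 2), ("Y", 3), ("Z", 1)])
r = random.randint(0, ranges[-1][2])    # R in the range 0 to Max(End)
print(ranges, r, pick(ranges, r))
```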
What if you took your code, and added an ORDER BY RAND() and LIMIT 1?
SELECT * FROM table WHERE (100*RAND()) < odds ORDER BY RAND() LIMIT 1
This way, even if you have multiples of the same probability, it will always come back randomly ordered, then you just take the first entry.
select * from table
where id between 1 and 100 and ((id % 2) <> 0)
order by NewId()
Hmm. Not entirely clear what result you want, so bear with me if this is a bit crazy. That being said, how about:
Make a new table. The table is a fixed data table, and looks like this:
Odds
====
1
2
2
3
3
3
4
4
4
4
etc,
etc.
Then join from your dataset to that table on the odds column. You'll get as many rows back for each row in your table as the given odds of that row.
Then just pick one of that set at random.
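A small in-memory sketch of what the join produces: each row appears as many times as its odds, and one entry of the expanded set is picked uniformly. The names expand and pick_by_expansion are hypothetical, and this assumes odds are non-negative integers.

```python
import random


def expand(rows):
    """rows: list of (key, odds). Repeat each key 'odds' times, like the join."""
    return [key for key, odds in rows for _ in range(odds)]


def pick_by_expansion(rows, rng=random):
    """Pick one key uniformly from the expanded set."""
    return rng.choice(expand(rows))


rows = [("A", 1), ("B", 3)]
print(expand(rows))
print(pick_by_expansion(rows))
```

The expanded set grows with the sum of the odds, so this trades memory (or join size) for simplicity.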
If you have an index on the odds column, and a primary key, this would be very efficient:
SELECT id, odds FROM table WHERE odds > 0
The database wouldn't even have to read from the table, it would get everything it needed from the odds index.
Then, you'll select a random value between 1 and the number of rows returned.
Then select that row from the array of rows returned.
Then, finally, select the whole target row:
SELECT * FROM table WHERE id = ?
This assures an even distribution between all rows with an odds value.
Alternatively, put the odds in a different table, with an autoincrement primary key.
Odds
ID odds
1 4
2 9
3 56
4 12
Store the ID foreign key in the main table instead of the odds value, and index it.
First, get the max value. This never touches the table; it uses the index:
SELECT MAX(ID) FROM Odds
Get a random value between 1 and the max.
Then select the record.
SELECT * FROM table
JOIN Odds ON Odds.ID = table.ID
WHERE Odds.ID >= ?
LIMIT 1
This will require some maintenance to keep the distribution even if you tend to delete Odds values or roll back inserts.
There is a whole chapter on random selection in the book SQL Antipatterns.
I didn't try it, but maybe something like this (with ? a random number from 0 to SUM(odds) - 1)?
SET @prob := 0;
SELECT
T.*,
(@prob := @prob + T.odds) AS prob
FROM table T
HAVING prob > ?
LIMIT 1
This is basically the same as your idea a), but entirely within one (well, technically two if you count the variable set-up) SQL commands.
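The running-total idea can be written out as a short sketch for comparison. It assumes the rows are already loaded as (key, odds) pairs and that r is an integer from 0 to the sum of all odds minus 1; the function name pick_by_running_total is hypothetical.

```python
import random
from itertools import accumulate


def pick_by_running_total(rows, r):
    """rows: list of (key, odds); r: integer in 0 .. sum(odds) - 1.

    Walks the rows while keeping a running total of the odds and returns
    the first key whose running total exceeds r.
    """
    totals = accumulate(odds for _, odds in rows)
    for (key, _), total in zip(rows, totals):
        if total > r:
            return key
    raise ValueError("r must be less than the sum of all odds")


rows = [("X", 2), ("Y", 3), ("Z", 1)]
r = random.randrange(sum(odds for _, odds in rows))
print(r, pick_by_running_total(rows, r))
```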
A general solution, suitable for O(log(n)) updates, is something like this:
Store objects as leaves of a (balanced) tree.
At each branch node, store the weights of all objects under it.
When adding, removing, or modifying nodes, update weights of their parents.
Then pick a number between 0 and (total weight - 1) and navigate down the tree until you find the right object.
Since you don't care about the order of things in the tree, you can store them as an array of N pointers and N-1 numbers.
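One possible way to realize the tree described above is an array-backed sum tree: the N leaves hold the weights and the N-1 internal slots hold the sums of their children. This is a sketch under the assumption of non-negative integer weights and at least one object; the class name WeightTree and its methods are hypothetical.

```python
import random


class WeightTree:
    """Array-backed tree: slots n..2n-1 are leaves, slots 1..n-1 hold sums."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0] * self.n + list(weights)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]

    def update(self, index, weight):
        """Change one weight and refresh the sums of its ancestors: O(log n)."""
        i = index + self.n
        self.tree[i] = weight
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def find(self, r):
        """Return the index of the object owning position r (0 <= r < total)."""
        i = 1
        while i < self.n:
            left = 2 * i
            if r < self.tree[left]:
                i = left                    # target is in the left subtree
            else:
                r -= self.tree[left]        # skip the left subtree's weight
                i = left + 1
        return i - self.n


tree = WeightTree([2, 3, 1])
print(tree.find(random.randrange(tree.total())))
```

Objects are not visited in index order during the descent, which is fine here because only the probability of each object matters.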
