MySQL Full-text search - search for short words - php

I have a problem. I made a simple search engine that searches by car brand and model. Because of query performance and the amount of data in the database, I decided to use full-text search. It works, but now I have run into a problem:
I would like to find all cars with brand "Audi" and model "Q7". For now I have this SQL query, but it doesn't work properly because the word "Q7" is too short:
SELECT `a`.`id`, `a`.`title`, `a`.`askprice`, `a`.`description`, `a`.`picture`
FROM (`mm_ads` as a)
WHERE `a`.`category` = '227'
AND `a`.`askprice` >= '0'
AND `a`.`askprice` <= '144000'
AND (MATCH(a.title) AGAINST ('+audi +q7' IN BOOLEAN MODE ))
GROUP BY `a`.`id`
ORDER BY `a`.`id` ASC
LIMIT 30
I don't have access to the MySQL config file, so I can't set ft_min_word_len to 2. The current value is 3. Is there any other way to deal with this?
Here is another problem:
I would like to get all cars brand "BMW" and model "116". For example, I have a car named BMW, 1, 116i. My SQL query is:
SELECT `a`.`id`, `a`.`title`, `a`.`askprice`, `a`.`description`, `a`.`picture`
FROM (`mm_ads` as a)
WHERE `a`.`category` = '227'
AND `a`.`askprice` >= '0'
AND `a`.`askprice` <= '144000'
AND (MATCH(a.title) AGAINST ('+bmw +116' IN BOOLEAN MODE))
GROUP BY `a`.`id`
ORDER BY `a`.`id` ASC
LIMIT 30
The search returns 0 rows. Why? Both input strings ("BMW", "116") have the minimum length of 3. What am I doing wrong?
Regards, Mario

I had a similar issue with MATCH ... AGAINST and word length. My answer was to check the length of the string first (strlen) and switch between LIKE and MATCH ... AGAINST, using LIKE for the shorter words. Not what I would call graceful, but it was all I could do since I too had no access to the config.
As for the second question, are you sure the default isn't 4? I recall I couldn't search on the term "art" in my case, which is 3 letters. I had to go with LIKE for everything below 4 characters.
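The length-based switch described above could be sketched roughly as follows. This is a Python illustration rather than the PHP used in the thread; the function name `title_condition`, the `min_word_len` default of 4 and the `%s` placeholder style are assumptions made for the example, not part of the original answer.

```python
def title_condition(words, min_word_len=4, column="a.title"):
    """Return (sql_fragment, params) for use in a WHERE clause.

    Words of at least min_word_len characters go into one boolean-mode
    MATCH; shorter words fall back to LIKE, which cannot use the
    full-text index. min_word_len is assumed to mirror the server's
    ft_min_word_len setting.
    """
    long_words = [w for w in words if len(w) >= min_word_len]
    short_words = [w for w in words if len(w) < min_word_len]

    clauses, params = [], []
    if long_words:
        clauses.append(f"MATCH({column}) AGAINST (%s IN BOOLEAN MODE)")
        params.append(" ".join("+" + w for w in long_words))
    for w in short_words:
        clauses.append(f"{column} LIKE %s")
        params.append(f"%{w}%")
    return " AND ".join(clauses), params


sql, params = title_condition(["audi", "q7"])
print(sql)
print(params)
```

The fragment and its parameters would then be passed to the database driver together, so the search words never get concatenated into the SQL text. Note that the LIKE fallback matches substrings, so "q7" would also match a title containing "sq7".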

Unless you have access to the config file and can change it I fear there is very little to do.
A change to ft_min_word_len requires a server restart and a full rebuild of the full text index.

Try this:
for this search: "bmw 116i"
(MATCH(a.title) AGAINST ('+bmw +116i "bmw 116i"' IN BOOLEAN MODE ))
not the best solution but might help...

Related

MySQL: Search for a sentence and ignore order of words

Is there something in MySQL that allows searching for a sentence regardless of the order of its words? For example, a search for "Europe league" should also match "league Europe", and the same for sentences containing 3 words, etc.
I know I can handle this in Programming language and generate MySQL query like this:
select p.title
from post p
where p.title = 'Europe league' or p.title = 'league Europe'
order by case
when p.title = 'Europe league' then 1
when p.title = 'league Europe' then 2
end
limit 10;
But is there a built-in way to handle this in MySQL? Thanks.
You're looking for full text search. Read up on the docs, so you can use the following kind of query:
SELECT p.title
FROM post p
WHERE MATCH (p.title) AGAINST ("+europe +league" IN BOOLEAN MODE)
LIMIT 10
Obviously the number of permutations becomes prohibitive as you add more words; one solution would be to use the INSTR function (http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_instr). INSTR(str, substr) returns the position of substr within str, or 0 if it is absent, so you could do something along the lines of the following:
select p.title
from post p
where instr(p.title,'Europe')>0
AND instr(p.title,'League')>0
AND instr(p.title,'Match')>0
limit 10;
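To see the order-independent match in action without a MySQL server, here is a small Python sketch that uses SQLite as a stand-in (its `instr` takes its arguments in the same order). The `post` table contents and the helper name `titles_with_all_words` are made up for the illustration, and `lower()` is applied because SQLite's `instr` is case-sensitive.

```python
import sqlite3


def titles_with_all_words(conn, words, limit=10):
    """Return titles that contain every word, in any order."""
    if not words:
        return []
    # One instr() test per word, all joined with AND
    condition = " AND ".join(
        "instr(lower(title), lower(?)) > 0" for _ in words
    )
    sql = f"SELECT title FROM post WHERE {condition} ORDER BY title LIMIT ?"
    return [row[0] for row in conn.execute(sql, [*words, limit])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (title TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?)",
    [
        ("Europe league match",),
        ("match of the league in Europe",),
        ("Asia league match",),
    ],
)
print(titles_with_all_words(conn, ["league", "Europe"]))
```

Like the INSTR query above, this matches substrings rather than whole words, so "league" would also match "leagues".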

MySQL - Full Text Search - Losing index

So here is the scenario:
MySQL
I have one MyISAM table
a column referenced as v.value has a full-text index
Basic query works fine, uses the index as expected:
SELECT p.online_identifier
FROM (...)
WHERE r.area_id = 3 AND s.state_id= 4
AND (snap.area_has_catalogues_attributes_id = 7028
AND MATCH (v.value) AGAINST('+SomeBrand' IN BOOLEAN MODE))
Now when I add an OR, the full text search index (on v.value) is not used.
I run Explain to verify it.
The query would look something like this:
(...)
WHERE r.area_id = 3 AND s.state_id= 4 AND
(snap.area_has_catalogues_attributes_id = 7028 AND MATCH (v.value) AGAINST('+SomeBrand' IN BOOLEAN MODE))
OR (snap.area_has_catalogues_attributes_id = 7045 AND MATCH (v.value) AGAINST('+OtherBrand' IN BOOLEAN MODE))
I don't understand why.
Any ideas?
Here is something interesting: The query you have is actually causing the FULLTEXT indexing to be ignored. Don't worry, it is not your fault. I wrote about this before : Is there a way to hint to query optimizer to MySQL which constraints should be done first?
After looking over my answer and its subsequent links, now let's look at your original query:
SELECT p.online_identifier
FROM (...)
WHERE r.area_id = 3 AND s.state_id= 4
AND (snap.area_has_catalogues_attributes_id = 7028
AND MATCH (v.value) AGAINST('+SomeBrand' IN BOOLEAN MODE))
You will have to execute the FULLTEXT search alone and retrieve IDs only.
Use those IDs to join to the other table aliases.
Let me take a query from one of my other links to demonstrate what to do to your query
Instead of this query
select * from seeds WHERE MATCH(text) AGAINST
("mount cameroon" IN BOOLEAN MODE) = 4;
you must refactor it into something like this:
SELECT B.* FROM
(
SELECT id,MATCH(text) AGAINST
("mount cameroon" IN BOOLEAN MODE) score
FROM seeds
WHERE MATCH(text) AGAINST
("mount cameroon" IN BOOLEAN MODE)
) A
INNER JOIN seeds B USING (id)
WHERE A.score = 4;
Give it a Try !!!

SQL Query for closest match at the beginning of a string

I am currently using MySQL and PHP.
I'm looking for a query that takes a number and finds the closest match for the beginning of a set of digits. For example, I have the numbers 019235678910, 026725678910 and 026825678910, and my table looks like this.
Table - Destination
Name      Number
Watford   01923
Oxford    026
Romford   026
Crawford  0267
Topford   02672
So when I pass 019235678910 the result would be Watford, 026725678910 would be Topford and 026825678910 would be Oxford and Romford.
I'm also not sure if MYSQL can do this directly or would need to work in conjunction with PHP?
Here is one way of getting all of them:
select d.*
from Destination d join
(select length(Number) as maxlen, number
from destination d
where YOURVALUE like concat(Number, '%')
order by maxlen desc
limit 1
) dsum
on d.Number = dsum.Number
Because you are looking for initial sequences, there is only one maximum match on the numbers (hence the limit 1 works).
By the way, the field called number is clearly a character field. Personally, I think it bad practice to call a character field "number" -- something called cognitive dissonance.
SELECT Name, Number
FROM Destination
WHERE LEFT('026725678910', LENGTH(Number)) = Number
or perhaps
WHERE '026725678910' LIKE CONCAT(Number, '%')
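The longest-prefix lookup can be tried out with the sample data from the question. The sketch below is Python with an in-memory SQLite database standing in for MySQL (so `||` replaces `CONCAT`); the helper name `closest_destinations` is hypothetical, and it selects the longest matching prefix with `MAX(LENGTH(...))` rather than the `ORDER BY ... LIMIT 1` join shown above.

```python
import sqlite3


def closest_destinations(conn, dialled):
    """Return the names whose Number is the longest prefix of `dialled`."""
    sql = """
        SELECT Name FROM Destination
        WHERE ? LIKE Number || '%'
          AND LENGTH(Number) = (
              SELECT MAX(LENGTH(Number)) FROM Destination
              WHERE ? LIKE Number || '%')
        ORDER BY Name
    """
    return [row[0] for row in conn.execute(sql, (dialled, dialled))]


conn = sqlite3.connect(":memory:")
# Number is stored as text so the leading zeros survive
conn.execute("CREATE TABLE Destination (Name TEXT, Number TEXT)")
conn.executemany(
    "INSERT INTO Destination VALUES (?, ?)",
    [
        ("Watford", "01923"),
        ("Oxford", "026"),
        ("Romford", "026"),
        ("Crawford", "0267"),
        ("Topford", "02672"),
    ],
)
for number in ("019235678910", "026725678910", "026825678910"):
    print(number, closest_destinations(conn, number))
```

So the lookup can be done in SQL alone; PHP only needs to pass the dialled number in as a parameter.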

PHP mysql search queries

I'm trying to create a search engine for an inventory based site. The issue is that I have information inside bbtags (like in [b]test[/b] sentence, the test should be valued at 3, whereas sentence should be valued at 1).
Here is an example of an index:
My test sentence, my my (has a SKU of TST-DFS)
The Database:
|Product| word |relevancy|
| 1 | my | 3 |
| 1 | test | 1 |
| 1 |sentence| 1 |
| 1 | TST-DFS| 10 |
But how would I match TST-DFS if the user typed in TST DFS? I would like that SKU to have a relevancy of say 8, instead of the full 10..
I have heard that the FULL TEXT search feature in MySQL would help, but I can't seem to find a good way to do it. I would like to avoid things like UNIONS, and to keep the query as optimized as possible.
Any help with coming up with a good system for this would be great.
Thanks,
Max
But how would I match TST-DFS if the user typed in TST DFS?
I would like that SKU to have a relevancy of say 8, instead of the full 10..
If I got the question right, the answer is actually easy.
Well, if you forge your query a little before sending it to mysql.
Ok, let's say we have $query and it contains TST-DFS.
Are we gonna focus on word spans?
I suppose we should, as most search engines do, so:
$ok=preg_match_all('#\w+#',$query,$m);
Now if that pattern matched... $m[0] contains the list of words in $query.
This can be fine-tuned to your SKU, but matching against full words in an AND fashion is pretty much what the user presumes is happening (as it does on Google and Yahoo).
Then we need to cook a $expr expression that will be injected into our final query.
if(!$ok) { // the search string is non-alphanumeric
$expr="false";
} else { // the words of the search string are now in $m[0]
$expr='';
foreach($m[0] as $word) {
if($expr)
$expr.=" AND "; // put an AND between the "LIKE" subexpressions
$s_word=addslashes($word); // the s_ prefix reminds me the variable
// is safe to include in a SQL statement (with a live connection,
// mysqli_real_escape_string is the more robust choice)
$expr.="word LIKE '%$s_word%'";
}
}
Now $expr should look like "word LIKE '%TST%' AND word LIKE '%DFS%'"
With that value, we can build the final query:
$s_expr="($expr)";
$s_query=addslashes($query);
$s_fullquery=
"SELECT Product,word,if((word LIKE '$s_query'),relevancy,relevancy-2) as relevancy ".
"FROM some_index ".
"WHERE word LIKE '$s_query' OR $s_expr";
Which shall read, for "TST-DFS":
SELECT Product,word,if((word LIKE 'TST-DFS'),relevancy,relevancy-2) as relevancy
FROM some_index
WHERE word LIKE 'TST-DFS' OR (word LIKE '%TST%' AND word LIKE '%DFS%')
As you can see, in the first SELECT line, if the match is partial, mysql will return relevancy-2
In the third one, the WHERE clause, if the full match fails, $s_expr, the partial match query we cooked in advance, is tried instead.
I like to lower case everything and strip out special characters (like in a phone number or credit card I take everything out on both sides that isn't a number)
Rather than try to create your own FTS solution, you could try to fit the MySQL FTS engine to your requirements. What I've seen done is create a new table to store your FTS data. Create a column for each different piece of data that you want to have a different relevance. For your sku field you could store the raw sku, with spaces, underscores, hyphens and any other special character intact. Then store a stripped down version with all these things removed. You may also want to store a version with leading zeros removed, as people often leave things like that out. You can store all these variations in the same column. Store your product name in another column, and the product description in another column. Create a separate index on each column. Then when you do your search, you can search each column individually, and multiply the rank of the results based on how important you think that column is. So you could multiply sku results by 10, title by 5 and leave description results as is. You may have to do a little experimentation to get the results you want, but it may ultimately be simpler than creating your own index.
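The idea of storing several variations of a SKU in the same column could be sketched like this in Python. The function name `sku_variants` and the exact stripping rule (drop everything that is not a letter or digit) are assumptions for the illustration; the answer above only says to store the raw SKU, a stripped version, and a version without leading zeros.

```python
import re


def sku_variants(sku):
    """Return the distinct forms of a SKU to store in the search column."""
    raw = sku.strip()
    # Remove spaces, underscores, hyphens and any other special character
    stripped = re.sub(r"[^0-9A-Za-z]+", "", raw)
    # People often leave leading zeros out when typing a SKU
    no_leading_zeros = stripped.lstrip("0")

    variants = []
    for candidate in (raw, stripped, no_leading_zeros):
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants


print(sku_variants("TST-DFS"))
print(sku_variants("00-12_A"))
```

The variants could then be joined with spaces and written to the SKU column of the search table, so that any of the forms a user types has a chance of matching.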
Create a keywords table. Something along the lines of:
integer keywordId (autoincrement) | varchar keyword | int pointValue
Assign all possible keywords, skus, etc, into this table. Create another table, a post-keywords bridge, (assuming postId is the id you've assigned in your original table) along the lines of:
integer keywordId | integer postId
Once you have this, you can easily add keywords to each post as it is inserted. To calculate the total point value for a given post, a query such as the following should do the trick:
SELECT sum(pointValue) FROM keywordPostsBridge kpb
JOIN keywords k ON k.keywordId = kpb.keywordId
WHERE kpb.postId = YOUR_INTENDED_POST
I think the solution is quite straightforward unless I missed something.
Basically, run two searches: one is an exact match, the other is a LIKE match or a regex match.
Join the two result sets together, with the LIKE match left-joined to the exact match. Then for example:
final_relevancy = (IFNULL(like_relevancy, 0) + IFNULL(exact_relevancy, 0) * 3) / 4
I didn't try this myself though. Just an idea.
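As a quick worked check of that weighting, here is the same formula as a small Python function (a sketch only; `None` plays the role of SQL NULL, and the function name simply mirrors the formula above):

```python
def final_relevancy(like_relevancy, exact_relevancy):
    """Blend the two scores, weighting the exact match three times as much."""
    like_score = like_relevancy if like_relevancy is not None else 0
    exact_score = exact_relevancy if exact_relevancy is not None else 0
    return (like_score + exact_score * 3) / 4


print(final_relevancy(8, 10))    # both searches matched
print(final_relevancy(8, None))  # only the LIKE search matched
```

With these weights a row found only by the LIKE search keeps a quarter of its relevancy, which may or may not be the drop-off you want; the 3:1 ratio is the knob to experiment with.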
I would add a column that is stripped of all special characters and misspellings, and then upcased (or create a function that compares on text that has been stripped and upcased). That way your relevancy will be consistent.
/*
q and q1 stand in for your table.
This query takes too many resources;
turn it into an UPDATE query (a scheduled task, or call it on save if you are developing a new system)
*/
SELECT
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
/* chain more REPLACE calls for other junk characters,
or create a custom function,
or, if you have full db access, install https://launchpad.net/mysql-udf-regexp
*/
THEN REPLACE(REPLACE( word, '-', ' ' ), '#', ' ')
ELSE word
END word ,
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
THEN 8
ELSE relevancy
END relevancy
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q
UNION
SELECT *
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q1
This is the code for a page that shows the query result.
I did not use functions here, but using them would make the work easier.
<html>
<head>
</head>
<body>
<?php
//author S_A_KHAN
//date 10/02/2013
$dbconnect = mysqli_connect("127.0.0.1", "root");
if (!$dbconnect)
{
    die('unable to connect: ' . mysqli_connect_error());
}
else
{
    echo "connection successfully <br>";
}
$data_base = mysqli_select_db($dbconnect, "connect");
if ($data_base == FALSE) {
    die('unable to select database: ' . mysqli_error($dbconnect));
}
else
{
    echo "connection successfully done<br>";
    // Bind the id as an integer parameter instead of concatenating $_GET into the SQL
    $id = isset($_GET["search"]) ? (int) $_GET["search"] : 0;
    $stmt = mysqli_prepare($dbconnect, "select * from user where id = ?");
    mysqli_stmt_bind_param($stmt, "i", $id);
    mysqli_stmt_execute($stmt);
    $QueryResult = mysqli_stmt_get_result($stmt); // requires the mysqlnd driver
    echo "<table width='100%' border='1'>\n";
    echo "<tr><th bgcolor=gray>Id</th><th bgcolor=gray>Name</th></tr>\n";
    // mysqli_fetch_row() returns null once the rows are exhausted
    while ($Row = mysqli_fetch_row($QueryResult)) {
        echo "<tr><td bgcolor=tan>{$Row[0]}</td>";
        echo "<td bgcolor=tan>{$Row[1]}</td></tr>";
    }
    echo "</table>\n";
}
?>
</body>
</html>

Php/ MySql 'Advanced Search' Page

I'm working on an 'advanced search' page on a site where you would enter a keyword such as 'I like apples' and it can search the database using the following options:
Find: With all the words, With the exact phrase, With at least one of the words, Without the words
I can take care of the 'Exact phrase' by:
SELECT * FROM myTable WHERE field='$keyword';
'At least one of the words' by:
SELECT * FROM myTable WHERE field LIKE '%$keyword%';//Let me know if this is the wrong approach
But it's the 'With at least one of the words' and 'Without the words' that I'm stuck on.
Any suggestions on how to implement these two?
Edit: Regarding 'At least one word' it wouldn't be a good approach to use explode() to break the keywords into words, and run a loop to add
(field='$keyword1') OR (field='$keyword2') OR ...
Because there are some other AND/OR clauses in the query also and I'm not aware of the maximum number of clauses there can be.
I would suggest using MySQL full-text search. With the Boolean Full-Text Searches functionality you should be able to get your desired result.
Edit:
Requested example based on your requested conditions ("Its just one field and they can pick either of the 4 options (i.e 1 word, exact words, at least 1 word, without the term).")
I am assuming you are using php based on your initial post
<?php
$choice = $_POST['choice'];
//Escape the user input before embedding it in the SQL string
$query = mysql_real_escape_string($_POST['query']);
if ($choice == "oneWord") {
//Not 100% sure what you mean by one word but this is the simplest form
//This assumes $query = a single word
$result = mysql_query("SELECT * FROM table WHERE MATCH (field) AGAINST ('{$query}' IN BOOLEAN MODE)");
} elseif ($choice == "exactWords") {
$result = mysql_query("SELECT * FROM table WHERE MATCH (field) AGAINST ('\"{$query}\"' IN BOOLEAN MODE)");
} elseif ($choice == "atLeastOneWord") {
//The default with no operators if given multiple words will return rows that contain at least one of the words
$result = mysql_query("SELECT * FROM table WHERE MATCH (field) AGAINST ('{$query}' IN BOOLEAN MODE)");
} elseif ($choice == "withoutTheTerm") {
//A boolean search made up only of "-" terms returns an empty result, so negate the whole MATCH instead
$result = mysql_query("SELECT * FROM table WHERE NOT MATCH (field) AGAINST ('{$query}' IN BOOLEAN MODE)");
}
?>
Hope this helps. For full use of the operators in boolean matches, see Boolean Full-Text Searches.
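The mapping from the four form options to boolean-mode search strings can also be written as one small helper. The sketch below is Python rather than PHP; the mode names, the function name `search_condition` and the `%s` placeholder style are assumptions for the illustration. It splits the input into words first, which also strips any boolean operators the user may have typed.

```python
import re


def search_condition(text, mode, column="field"):
    """Return (sql_condition, parameter) for one of the four search options."""
    words = re.findall(r"\w+", text)
    match = f"MATCH ({column}) AGAINST (%s IN BOOLEAN MODE)"
    if mode == "all":       # every word must be present
        return match, " ".join("+" + w for w in words)
    if mode == "exact":     # the words as one quoted phrase
        return match, '"' + " ".join(words) + '"'
    if mode == "any":       # at least one of the words
        return match, " ".join(words)
    if mode == "without":   # none of the words: negate an "any" match
        return "NOT " + match, " ".join(words)
    raise ValueError(f"unknown mode: {mode}")


for mode in ("all", "exact", "any", "without"):
    print(mode, search_condition("I like apples", mode))
```

The "without" case negates the whole MATCH instead of prefixing each word with `-`, because a boolean-mode search that contains only excluded terms returns an empty result in MySQL.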
You could use
With at least one of the words
SELECT * FROM myTable WHERE field LIKE '%$keyword%'
or field LIKE '%$keyword2%'
or field LIKE '%$keyword3%';
Without the word
SELECT * FROM myTable WHERE field NOT LIKE '%$keyword%';
I'm not sure you could easily do those search options in a naive manner as the other two.
It would be worth your while implementing a better search engine if you need to support those scenarios. A simple one that could probably get you by is something along these lines:
When an item is added to the database, it is split up into the individual words. At this point "common" words (the, a, etc...) are removed (probably based on a common_words table). The remaining words are added to a words table if they are not already present. There is then a link made between the word entry and the item entry.
When searching, it is then a case of getting the word ids from the word table and the appropriate lookup of item ids in the joining table.
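The word-table scheme described above can be sketched in memory with a few lines of Python. The class name `WordIndex`, the tiny `COMMON_WORDS` set and the choice to require every search word to be present are assumptions for the illustration; in a real system the two dictionaries would be the words table and the joining table.

```python
import re

# Stand-in for the common_words table
COMMON_WORDS = {"the", "a", "an", "and", "i", "am", "it"}


def _words(text):
    """Split text into lowercase words, dropping the common ones."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in COMMON_WORDS]


class WordIndex:
    def __init__(self):
        self.word_ids = {}  # word -> word id (the "words" table)
        self.links = {}     # word id -> set of item ids (the joining table)

    def add_item(self, item_id, text):
        for word in _words(text):
            # Add the word only if it is not already present
            word_id = self.word_ids.setdefault(word, len(self.word_ids) + 1)
            self.links.setdefault(word_id, set()).add(item_id)

    def search(self, query):
        """Return the ids of items that contain every non-common query word."""
        result = None
        for word in _words(query):
            ids = self.links.get(self.word_ids.get(word), set())
            result = ids if result is None else result & ids
        return sorted(result) if result else []


index = WordIndex()
index.add_item(1, "I like apples")
index.add_item(2, "The apples and the pears")
print(index.search("apples"))
print(index.search("like apples"))
```

Switching the intersection (`&`) to a union (`|`) would give the "at least one of the words" behaviour, and subtracting the matching ids from the full set of items would give "without the words".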
Search is notoriously difficult to do well.
You should consider using a third-party search engine such as Lucene or Sphider.
Giraffe and Re0sless posted two good answers.
Notes:
"SELECT *" sucks... only select the columns that you need.
Re0sless puts an "OR" between keywords.
- you should eliminate common words ("i", "am", "and", etc.)
- MySQL limits the size of a query (the max_allowed_packet setting), so really long SELECTs may have to be split into separate queries.
- try to eliminate duplicate keywords (if I search for "you know you like it", the SELECT should search for "you" only once and drop common words such as "it")
Also try "LIKE" and "MATCH ... AGAINST" (see the MySQL manual); they can do wonders for "fuzzy" searches.
