I'm helping a friend build a dictionary of sorts for a project he's working on. Part of the project is a search feature. The database is MySQL and the backend is PHP.
Now, running our simple query was a piece of cake:
SELECT *,
(
(CASE WHEN word LIKE '%$q%' THEN 1 ELSE 0 END) +
(CASE WHEN defin LIKE '%$q%' THEN 1 ELSE 0 END)
) AS relev
FROM dictionary
WHERE word LIKE '%$q%'
OR defin LIKE '%$q%'
ORDER BY relev DESC;
It produced good results; for example, searching for "fire" gave us fire, firemen, firetruck, on fire, etc. However, we also want some tolerance for typos: we want the misspelling "prnk" to give us prank, prink and also pink, and the word "mule" to also suggest "mole".
Surprisingly, we weren't able to find any information on this. The relevance system is entirely superficial because we don't need actual relevance (just a rough pointer), but we do need something. That's why we went with LIKE rather than MATCH...AGAINST, where we found no way to sort by relevance.
The table has only three columns: id, word, defin. That's all the complexity (or simplicity) that was required.
Thanks in advance to anyone who can help.
Try testing whether the query sounds like a word in the dictionary, with something along the lines of:
SELECT *,
(
(CASE WHEN word LIKE '%$q%' THEN 1 ELSE 0 END) +
(CASE WHEN defin LIKE '%$q%' THEN 1 ELSE 0 END) +
(CASE WHEN LEFT(SOUNDEX(word), 4) = LEFT(SOUNDEX('$q'), 4) THEN 1 ELSE 0 END) +
(CASE WHEN LEFT(SOUNDEX(defin), 4) = LEFT(SOUNDEX('$q'), 4) THEN 1 ELSE 0 END)
) AS relev
FROM dictionary
WHERE word LIKE '%$q%'
OR defin LIKE '%$q%'
ORDER BY relev DESC;
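To see what the SOUNDEX comparison can and cannot catch, here is a minimal Python sketch of the standard American Soundex code (first letter plus three digits). It is an illustration, not MySQL's implementation: MySQL's documentation says its SOUNDEX() follows the original algorithm, which discards vowels before collapsing duplicate digits, and it can return strings longer than four characters (hence the LEFT(..., 4) above), so codes may differ for some words.

```python
# Digit groups used by American Soundex; vowels, Y, H and W have no digit.
SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ('' if no letters)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate equal digits; vowels do (they reset `prev`).
        if ch not in "hw":
            prev = digit
    return (code + "000")[:4]


def sounds_alike(a: str, b: str) -> bool:
    """Rough analogue of LEFT(SOUNDEX(a), 4) = LEFT(SOUNDEX(b), 4)."""
    return soundex(a) == soundex(b)
```

With this sketch, "mule" and "mole" both give M400, and "prnk", "prank" and "prink" all give P652, but "pink" gives P520, so phonetic matching alone will not suggest "pink" for "prnk".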
Regarding the prank...
http://webarto.com/80/did-you-mean-api
$q = "prnk";
$dym = new DYM; // DYM class from the page linked above
$spell = $dym->check($q);
if (!empty($spell)) {
    echo $spell; // prank
}
(It's not really an API and not entirely reliable, but it responds in under 0.5 s.)
For the mule/mole part, try finding a Levenshtein implementation for SQL:
http://www.artfulsoftware.com/infotree/queries.php?&bw=1280#552 (the link may not work, but search for it)
http://php.net/manual/en/function.levenshtein.php
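PHP's built-in levenshtein() already computes the edit distance. As an illustration, here is the standard dynamic-programming version in Python, plus a hypothetical suggest() helper that keeps dictionary words within a chosen distance of the query. Scanning every word like this is only practical for a small dictionary.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(query: str, words: list[str], max_distance: int = 1) -> list[str]:
    """Hypothetical helper: words within max_distance, closest first, then A-Z."""
    scored = sorted((levenshtein(query.lower(), w.lower()), w) for w in words)
    return [w for dist, w in scored if dist <= max_distance]
```

An edit distance of 1 is enough to reach prank, prink and pink from "prnk", and "mole" from "mule".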
Try SOUNDS LIKE:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#operator_sounds-like
Take a look at this: Soundex (Sounds Like) function in mysql
Apart from Soundex and Levenshtein, you could also look into Metaphone or Double Metaphone, even though neither has built-in support in MySQL. PHP supports Metaphone, though (see metaphone()), and there are a few Double Metaphone implementations floating around as well (e.g. http://swoodbridge.com/DoubleMetaPhone/).
Related
I'm trying to build a small search engine for my website.
First, the user types in some keywords and goes to a results page that uses this code:
$result = array();
// Split the user's input into keywords and escape each one for SQL
$keyword_tokens = explode(' ', $keywords);
$keyword_tokens = array_map(
    function($token) {
        return mysql_real_escape_string(trim($token));
    },
    $keyword_tokens
);
// The full phrase is also used in ORDER BY, so escape it as well
$safe_keywords = mysql_real_escape_string($keywords);
$sql = "SELECT * FROM search_table WHERE concat(title, description) LIKE '%";
$sql .= implode("%' OR concat(title, description) LIKE '%", $keyword_tokens) . "%' ORDER BY instr(description, '$safe_keywords') DESC, instr(title, '$safe_keywords') DESC";
$result = mysql_query($sql);
This code searches for the keywords the user entered and sorts by $keywords, i.e. by the exact full string of keywords.
What I'm trying to do is order the results by how many of the keywords each row contains.
For example, if one row of my SQL result contains 5 of the keywords and another contains 3, the row with 5 should come first. In other words, I want to sort my results by the number of matching keywords.
I hope everything is clear.
Any help will be much appreciated!
Here's a lovely thing about MySQL, depending on your point of view: it treats data the way it thinks you want it treated. 1 + '1'? You clearly want to treat the string as a number, so it happily performs the addition for you.
This works for boolean operands too (you just have to watch the operator precedence). In (a like '%foo%') + (a like '%bar%'), MySQL recognises a numeric context and treats a true result as 1 and a false result as 0, essentially counting the keyword matches for you.
How can you use that? Take your WHERE clause, replace every OR with +, make sure each individual LIKE comparison is wrapped in parentheses, and then order by the result.
e.g.:
order by (concat(title, description) like '%keyword1%')
+ (concat(title, description) like '%keyword2%') desc
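Here is a runnable sketch of that idea using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the demo is self-contained): SQLite also evaluates LIKE to 1 or 0, but it concatenates with || instead of CONCAT(). The table rows are made up, the secondary sort on title is only there to make ties deterministic, and each %keyword% pattern is bound as a parameter instead of being pasted into the SQL string.

```python
import sqlite3


def build_query(n_keywords: int) -> str:
    """SQL that keeps rows matching any keyword and sorts by how many match."""
    # A space separator stops a keyword from matching across the title/description join.
    haystack = "(title || ' ' || description)"
    terms = [f"({haystack} LIKE ?)" for _ in range(n_keywords)]
    return (
        f"SELECT title, {' + '.join(terms)} AS matches "
        f"FROM search_table WHERE {' OR '.join(terms)} "
        "ORDER BY matches DESC, title"
    )


def search(conn: sqlite3.Connection, keywords: str) -> list[tuple]:
    tokens = keywords.split()  # split() also drops empty tokens from repeated spaces
    if not tokens:
        return []
    # Note: % and _ typed by the user still act as LIKE wildcards here.
    patterns = [f"%{t}%" for t in tokens]
    # The placeholders appear twice: once in the score, once in WHERE.
    return conn.execute(build_query(len(tokens)), patterns + patterns).fetchall()


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE search_table (title TEXT, description TEXT);
        INSERT INTO search_table VALUES
            ('red apple', 'a sweet fruit'),
            ('green apple pie', 'sweet dessert with apple'),
            ('banana', 'yellow fruit');
    """)
    return conn
```

For "apple sweet pie", the pie row matches all three keywords and ranks above the row that matches two; the banana row matches none and is filtered out by the WHERE clause.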
You can accomplish something similar using a fulltext index, but the weighting can be a little weird. Syntax for that would be:
create fulltext index idx on search_table(title, description)
(You only need to create the index once.)
select *
from search_table
where match(title, description) against ('keyword1 keyword2 ...')
order by match(title, description) against ('keyword1 keyword2 ...') desc
The big bonus here is that this query is far less annoying to construct.
Here's a proof of concept that demonstrates both methods (albeit only against a single column, but it gets the point across).
How can I find similar words in SQL? For example,
I have these words:
hell, llo, hl, lh
I want to find the words that match "hello" or "helloworld",
ordered by the number of matching letters.
I tried doing it with LIKE, but that limited the results.
You may have some luck with fulltext indexes
I think you are looking for something like this:
SELECT words.*
FROM words
WHERE 'hello' LIKE CONCAT('%', word, '%')
ORDER BY
LENGTH(word) DESC
See the fiddle here. Note that this query won't be very fast, because it can't make use of an index.
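The same trick as a self-contained Python sketch, again using the built-in sqlite3 module in place of MySQL (SQLite spells CONCAT() as ||). The words table mirrors the example in the question; the secondary sort on word is added only to make ties deterministic.

```python
import sqlite3


def make_words_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany("INSERT INTO words VALUES (?)",
                     [("hell",), ("llo",), ("hl",), ("lh",)])
    return conn


def words_inside(conn: sqlite3.Connection, text: str) -> list[str]:
    """Stored words that occur somewhere inside `text`, longest first."""
    rows = conn.execute(
        "SELECT word FROM words "
        "WHERE ? LIKE ('%' || word || '%') "  # the user's input is the haystack here
        "ORDER BY LENGTH(word) DESC, word",    # word breaks ties deterministically
        (text,),
    )
    return [word for (word,) in rows]
```

Because the stored column sits inside the pattern, every row has to be checked, which is why this approach cannot use an index.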
Try using the LOCATE function, which returns the position of the first occurrence of a needle in a haystack (or 0 if it isn't found). You could do that for each of your terms with IF(LOCATE(...) > 0, 1, 0):
SELECT * FROM tablename
ORDER BY
(
IF(LOCATE('hell', field_to_search) > 0, 1, 0)
+ IF(LOCATE('llo', field_to_search) > 0, 1, 0)
+ IF(LOCATE('hl', field_to_search) > 0, 1, 0)
+ IF(LOCATE('lh', field_to_search) > 0, 1, 0)
) DESC
If field_to_search contains all four terms (case-insensitively, for non-binary strings), its ORDER BY score will be 4, and if it matches none of them the score will be 0. Unlike WHERE field_to_search LIKE '%hell%' and so on, this won't limit your results.
The IF(LOCATE(...)) structure also lets you give different weights to different search terms: for example, a match on hell might earn 4 points and llo 3 points. If you weight the bottom two terms higher than the top two, a row that matches hl and lh (but not hell or llo) could even rank above a row that matches the top two.
Documentation: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate
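A runnable sketch of the weighted variant, using Python's sqlite3 module in place of MySQL. The table and column names come from the answer; the weights and sample rows are hypothetical. SQLite has no LOCATE(), so instr() is used instead (note the reversed argument order), CASE WHEN stands in for IF(), and lower() makes the test case-insensitive.

```python
import sqlite3

# Hypothetical weights: the longer fragments are worth more points.
WEIGHTS = {"hell": 4, "llo": 3, "hl": 1, "lh": 1}


def weighted_search(conn: sqlite3.Connection, weights: dict[str, int]) -> list[tuple]:
    """Score every row by the weights of the terms it contains, best first."""
    parts, params = [], []
    for term, points in weights.items():
        # instr(haystack, needle) is the mirror image of MySQL's LOCATE(needle, haystack).
        parts.append("CASE WHEN instr(lower(field_to_search), lower(?)) > 0 THEN ? ELSE 0 END")
        params += [term, points]
    sql = (
        f"SELECT field_to_search, {' + '.join(parts)} AS score "
        "FROM tablename ORDER BY score DESC, field_to_search"
    )
    return conn.execute(sql, params).fetchall()


def make_locate_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (field_to_search TEXT)")
    conn.executemany(
        "INSERT INTO tablename VALUES (?)",
        [("Hello world",), ("hlhlh",), ("nothing",)],
    )
    return conn
```

Rows that match nothing still come back with a score of 0, just at the bottom of the list.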
Please help!
I need to be able to search for exact words in my database. I've already tried different methods.
Method 1
$param2 = "SELECT * from item WHERE prodname REGEXP '[[:<:]]($param)[[:>:]]'
order by CASE WHEN instr(prodname, '$param') = 0 then 1 else 0 end,
instr(prodname, '$param') ASC";
This works really well, but when I search for words containing \ or " it returns an error. I've already tried htmlspecialchars and mysql_real_escape_string, but the problem still exists.
Method 2
$param2 = "WHERE prodname LIKE '$param %' OR prodname LIKE '% $param'
OR prodname LIKE ' $param%' OR prodname LIKE '%$param '
OR prodname LIKE '% $param %'
order by CASE WHEN instr(prodname, '$param') = 0 then 1 else 0 end, instr(prodname, '$param') ASC";
This also works well, but when I type an exact product name, e.g. "STAMP PAD INK (RED)", it returns "NOT FOUND", although the product shows up when I only type "STAMP PAD INK". It does work when I add '%$param%', but then when I type a single word, e.g. "INK", "WRINKLED" also shows up, and I don't want that.
I can't use fulltext.
For method 2:
To get the exact product, use LIKE '$param'.
The % wildcard matches any number of characters, even zero, but the space characters in a pattern like '% $param %' must still be present, so a product name that is exactly $param doesn't match any of your patterns.
More info: http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like
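A quick check of method 2 with the exact-match pattern added, sketched with Python's sqlite3 module as a stand-in for MySQL and made-up product rows. Only % and _ are special in a LIKE pattern, so the parentheses in "STAMP PAD INK (RED)" are matched literally; binding param as a query parameter also sidesteps the quoting problems with \ and " (though a % or _ typed by the user would still act as a wildcard).

```python
import sqlite3


def make_item_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (prodname TEXT)")
    conn.executemany(
        "INSERT INTO item VALUES (?)",
        [("STAMP PAD INK",), ("STAMP PAD INK (RED)",), ("WRINKLED PAPER",), ("INK",)],
    )
    return conn


def find_products(conn: sqlite3.Connection, param: str) -> list[str]:
    """Exact name, or `param` as a space-delimited word anywhere in the name."""
    rows = conn.execute(
        """
        SELECT prodname FROM item
        WHERE prodname LIKE ?   -- exact product name
           OR prodname LIKE ?   -- word at the start
           OR prodname LIKE ?   -- word at the end
           OR prodname LIKE ?   -- word in the middle
        ORDER BY prodname
        """,
        (param, f"{param} %", f"% {param}", f"% {param} %"),
    )
    return [name for (name,) in rows]
```

Searching for "INK" finds "INK", "STAMP PAD INK" and "STAMP PAD INK (RED)" but not "WRINKLED PAPER", and the full name "STAMP PAD INK (RED)" is now found through the exact-match pattern.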
I think you should try using the PHP function addslashes() on $param, as in:
$param = addslashes($param);
This will make method 1 work.
The behavior of mysql_real_escape_string() depends on the character set of the current connection, and I don't know which one you are using. Since it isn't working for you, addslashes() probably will.
BTW, you may want to look further into fulltext indexing, because REGEXP can become mind-numbingly slow as the database grows. And method 2 as shown will always be slow, because most of its LIKE patterns start with %, which prevents MySQL from using an index.
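The REGEXP route breaks on input like \ or ( because those characters mean something to both the SQL string literal and the regular expression. As a Python analogue (not MySQL code), re.escape() makes regex metacharacters literal, and lookarounds stand in for [[:<:]] and [[:>:]] so that terms beginning or ending with punctuation, such as "(RED)", still match as whole words. The function names and product list are hypothetical.

```python
import re


def whole_word_pattern(term: str) -> re.Pattern:
    # re.escape() makes \, ", (, ) etc. literal; (?<!\w) and (?!\w) act like word
    # boundaries but also work when the term starts or ends with punctuation.
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)


def matching_products(term: str, names: list[str]) -> list[str]:
    """Names that contain `term` as a whole word (case-insensitive)."""
    pattern = whole_word_pattern(term)
    return [name for name in names if pattern.search(name)]
```

In MySQL the same input would additionally need escaping for the SQL string literal, since a backslash is special there too.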
I'm trying to create a search engine for an inventory-based site. The issue is that I have information inside BBCode tags (e.g. in "[b]test[/b] sentence", test should be valued at 3, whereas sentence should be valued at 1).
Here is an example of an index:
My test sentence, my my (has a SKU of TST-DFS)
The database:
| Product | word     | relevancy |
|       1 | my       |         3 |
|       1 | test     |         1 |
|       1 | sentence |         1 |
|       1 | TST-DFS  |        10 |
But how would I match TST-DFS if the user typed in TST DFS? I would like that SKU to have a relevancy of, say, 8 instead of the full 10.
I have heard that MySQL's full-text search feature would help, but I can't seem to find a good way to use it. I would like to avoid things like UNIONs and keep the query as optimized as possible.
Any help with coming up with a good system for this would be great.
Thanks,
Max
But how would I match TST-DFS if the user typed in TST DFS?
I would like that SKU to have a relevancy of say 8, instead of the full 10..
If I got the question right, the answer is actually easy.
That is, if you massage your query a little before sending it to MySQL.
Ok, let's say we have $query and it contains TST-DFS.
Are we gonna focus on word spans?
I suppose we should, as most search engines do, so:
$ok=preg_match_all('#\w+#',$query,$m);
Now if that pattern matched... $m[0] contains the list of words in $query.
This can be fine-tuned for your SKU, but matching all of the whole words (in an AND fashion) is pretty much what users expect to happen, as it does on Google and Yahoo.
Then we need to build an $expr expression that will be injected into our final query.
if (!$ok) { // the search string contains no alphanumeric words
    $expr = "false";
} else { // the search string contains words; they are now in $m[0]
    $expr = '';
    foreach ($m[0] as $word) {
        if ($expr)
            $expr .= " AND "; // put an AND between the LIKE subexpressions
        $s_word = addslashes($word); // the s_ prefix reminds me the variable
                                     // is safe to include in a SQL statement
        $expr .= "word LIKE '%$s_word%'";
    }
}
Now $expr should look like "word LIKE '%TST%' AND word LIKE '%DFS%'".
With that value, we can build the final query:
$s_expr = "($expr)";
$s_query = addslashes($query);
$s_fullquery =
"SELECT Product, word, IF(word LIKE '$s_query', relevancy, relevancy-2) AS relevancy ".
"FROM some_index ".
"WHERE word LIKE '$s_query' OR $s_expr";
Which shall read, for "TST-DFS":
SELECT Product, word, IF(word LIKE 'TST-DFS', relevancy, relevancy-2) AS relevancy
FROM some_index
WHERE word LIKE 'TST-DFS' OR (word LIKE '%TST%' AND word LIKE '%DFS%')
As you can see, in the first line (the SELECT), if the match is only partial, MySQL returns relevancy-2.
In the third line (the WHERE clause), if the full match fails, $s_expr, the partial-match expression we built in advance, is tried instead.
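Putting those steps together in a runnable Python sketch: sqlite3 stands in for MySQL, CASE WHEN replaces MySQL's IF(), and the some_index rows come from the question's example. Values are bound as parameters instead of using addslashes(), and re.findall(r"\w+", ...) plays the role of preg_match_all('#\w+#', ...).

```python
import re
import sqlite3


def make_index_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE some_index (Product INTEGER, word TEXT, relevancy INTEGER)")
    conn.executemany(
        "INSERT INTO some_index VALUES (?, ?, ?)",
        [(1, "my", 3), (1, "test", 1), (1, "sentence", 1), (1, "TST-DFS", 10)],
    )
    return conn


def search_index(conn: sqlite3.Connection, query: str) -> list[tuple]:
    words = re.findall(r"\w+", query)  # same idea as preg_match_all('#\w+#', ...)
    # AND together one LIKE per word; "0" (false) if there are no words at all.
    partial = " AND ".join("word LIKE ?" for _ in words) if words else "0"
    sql = (
        "SELECT Product, word, "
        "CASE WHEN word LIKE ? THEN relevancy ELSE relevancy - 2 END AS relevancy "
        f"FROM some_index WHERE word LIKE ? OR ({partial})"
    )
    params = [query, query] + [f"%{w}%" for w in words]
    return conn.execute(sql, params).fetchall()
```

Searching for "TST-DFS" returns the SKU with its full relevancy of 10, while "TST DFS" only matches through the partial expression and gets 8.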
I like to lower-case everything and strip out special characters (for a phone number or credit card number, for example, I remove everything that isn't a digit on both sides of the comparison).
Rather than trying to create your own FTS solution, you could try to fit the MySQL FTS engine to your requirements. What I've seen done is to create a new table to store your FTS data, with a column for each piece of data that you want to have a different relevance. For your SKU field, you could store the raw SKU with spaces, underscores, hyphens and any other special characters intact, then store a stripped-down version with all of these removed. You may also want to store a version with leading zeros removed, as people often leave those out. You can store all these variations in the same column. Store your product name in another column, and the product description in yet another. Create a separate index on each column. Then, when you search, you can search each column individually and multiply the rank of the results based on how important you think that column is: for example, multiply SKU results by 10 and title results by 5, and leave description results as they are. You may have to experiment a little to get the results you want, but it may ultimately be simpler than creating your own index.
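A small Python sketch of the two mechanical parts of this answer, with hypothetical function names: building the searchable SKU text from the raw SKU plus its stripped and zero-trimmed variants, and combining per-column ranks with the example weights (SKU x10, title x5, description x1). How the individual ranks are obtained (for example, separate MATCH ... AGAINST calls per column) is left out.

```python
import re


def sku_search_text(sku: str) -> str:
    """Raw SKU, SKU without special characters, and that without leading zeros."""
    raw = sku.strip()
    stripped = re.sub(r"[^0-9A-Za-z]", "", raw)
    no_leading_zeros = stripped.lstrip("0")
    variants: list[str] = []
    for v in (raw, stripped, no_leading_zeros):
        if v and v not in variants:  # skip empty strings and duplicates
            variants.append(v)
    return " ".join(variants)


def weighted_rank(sku_rank: float, title_rank: float, description_rank: float,
                  weights: tuple[float, float, float] = (10, 5, 1)) -> float:
    """Combine per-column relevance scores; the weights are only a starting point."""
    w_sku, w_title, w_desc = weights
    return w_sku * sku_rank + w_title * title_rank + w_desc * description_rank
```

For example, "00-1234" would be stored as "00-1234 001234 1234", so a search for any of the three forms can hit the same column.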
Create a keywords table. Something along the lines of:
integer keywordId (autoincrement) | varchar keyword | int pointValue
Add all possible keywords, SKUs, etc. to this table. Then create another table, a post-keywords bridge (assuming postId is the id you've assigned in your original table), along the lines of:
integer keywordId | integer postId
Once you have this, you can easily add keywords to each post as it is inserted. To calculate the total point value for a given post, a query such as the following should do the trick:
SELECT sum(pointValue) FROM keywordPostsBridge kpb
JOIN keywords k ON k.keywordId = kpb.keywordId
WHERE kpb.postId = YOUR_INTENDED_POST
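A runnable version of this schema, using Python's sqlite3 module as a stand-in for MySQL (INTEGER PRIMARY KEY plays the role of the autoincrement column). The keywords, point values and post ids are made up; COALESCE() is added so a post with no keywords totals 0 instead of NULL.

```python
import sqlite3


def make_keyword_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE keywords (
            keywordId  INTEGER PRIMARY KEY,  -- assigned automatically by SQLite
            keyword    TEXT UNIQUE,
            pointValue INTEGER
        );
        CREATE TABLE keywordPostsBridge (
            keywordId INTEGER REFERENCES keywords(keywordId),
            postId    INTEGER
        );
        INSERT INTO keywords (keyword, pointValue) VALUES
            ('test', 3), ('sentence', 1), ('TST-DFS', 10), ('TSTDFS', 8);
        INSERT INTO keywordPostsBridge VALUES (1, 1), (2, 1), (3, 1), (4, 1), (2, 2);
    """)
    return conn


def post_points(conn: sqlite3.Connection, post_id: int) -> int:
    """Total pointValue of all keywords attached to one post."""
    row = conn.execute(
        "SELECT COALESCE(SUM(k.pointValue), 0) FROM keywordPostsBridge kpb "
        "JOIN keywords k ON k.keywordId = kpb.keywordId "
        "WHERE kpb.postId = ?",
        (post_id,),
    ).fetchone()
    return row[0]
```

Post 1 carries all four sample keywords (3 + 1 + 10 + 8 = 22 points), while post 2 only has "sentence".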
I think the solution is quite straightforward unless I missed something.
Basically, run two searches: one for exact matches, the other a LIKE or regex match.
Join the two result sets together (LIKE matches LEFT JOIN exact matches). Then, for example:
final_relevancy = (IFNULL(like_relevancy, 0) + IFNULL(exact_relevancy, 0) * 3) / 4
I didn't try this myself though. Just an idea.
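The formula's arithmetic, sketched in Python with dictionaries standing in for the two result sets (product id mapped to relevancy; all values and names hypothetical). Iterating over the LIKE results mirrors the LEFT JOIN, and a missing exact match counts as 0, as IFNULL() would make it.

```python
def final_relevancy(like_relevancy, exact_relevancy):
    """(IFNULL(like, 0) + IFNULL(exact, 0) * 3) / 4"""
    like = like_relevancy if like_relevancy is not None else 0
    exact = exact_relevancy if exact_relevancy is not None else 0
    return (like + exact * 3) / 4


def combine(like_results: dict[int, float], exact_results: dict[int, float]) -> dict[int, float]:
    # LEFT JOIN: every LIKE match is kept; .get() yields None when there is no exact match.
    return {pid: final_relevancy(rel, exact_results.get(pid))
            for pid, rel in like_results.items()}
```

With a LIKE relevancy of 8 and an exact relevancy of 10, a product scores (8 + 30) / 4 = 9.5; with no exact match it drops to 8 / 4 = 2.0.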
I would add a column that has been stripped of all special characters and misspellings and then upper-cased (or create a function that compares text after stripping and upper-casing it). That way your relevancy will be consistent.
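A minimal Python sketch of the stripping and upper-casing step (the function names are hypothetical). It keeps only ASCII letters and digits, so accented characters would also be dropped, and it does nothing about misspellings.

```python
import re


def normalize(text: str) -> str:
    """Upper-case `text` and keep only ASCII letters and digits."""
    return re.sub(r"[^0-9A-Z]", "", text.upper())


def same_after_normalizing(stored: str, typed: str) -> bool:
    """Compare two strings the way a stripped, upper-cased column would."""
    return normalize(stored) == normalize(typed)
```

Storing normalize(word) in an extra column lets "TST DFS", "tst-dfs" and "TST-DFS" all compare equal.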
/*
q and q1 stand in for your table.
This query uses a lot of resources, so turn it into an UPDATE query
(run it as a scheduled task, or on save if you are developing a new system).
*/
SELECT
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
/* chain more REPLACE calls for other junk characters,
or create a custom function,
or, if you have full DB access, install https://launchpad.net/mysql-udf-regexp
*/
THEN REPLACE(REPLACE( word, '-', ' ' ), '#', ' ')
ELSE word
END word ,
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
THEN 8
ELSE relevancy
END relevancy
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q
UNION
SELECT *
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q1
This is the code for a page that shows the query result.
I didn't use functions here, but using them would make the work easier.
<html>
<head>
</head>
<body>
<?php
//author S_A_KHAN
//date 10/02/2013
$dbconnect = mysql_connect("127.0.0.1", "root");
if (!$dbconnect)
{
    die('unable to connect: ' . mysql_error());
}
else
{
    echo "connection successfully <br>";
}
$data_base = mysql_select_db("connect", $dbconnect);
if ($data_base == FALSE) {
    die('unable to select database: ' . mysql_error($dbconnect));
}
else
{
    echo "connection successfully done<br>";
    // Cast to int so the search value cannot inject SQL into the query
    $id = (int) $_GET["search"];
    $SQLString = "select * from user where id = " . $id;
    $QueryResult = mysql_query($SQLString, $dbconnect);
    echo "<table width='100%' border='1'>\n";
    echo "<tr><th bgcolor=gray>Id</th><th bgcolor=gray>Name</th></tr>\n";
    while (($Row = mysql_fetch_row($QueryResult)) !== FALSE) {
        echo "<tr><td bgcolor=tan>{$Row[0]}</td>";
        echo "<td bgcolor=tan>{$Row[1]}</td></tr>";
    }
    echo "</table>\n";
}
?>
</body>
</html>
I'm a developer/designer for a community-driven website: http://www.thegamesdb.net
The problem we have is quite straightforward:
Pac-Man is a game on the site. A person should be able to search "pacman" or "pac man" and the "Pac-Man" result should be shown. Currently, this does not happen.
Search code snippet is below but full code can be seen at: http://code.google.com/p/thegamesdb/source/browse/trunk/tab_listseries.php
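if ($function == 'Search')
{
    // Escape the user's search string before putting it inside the SQL
    $safe_string = mysql_real_escape_string($string);
    $query = "SELECT g.*, p.name FROM games as g, platforms as p WHERE GameTitle LIKE '%$safe_string%' and g.Platform = p.id";
    if (!empty($sortBy))
    {
        // $sortBy is inserted verbatim, so it should be checked against the allowed columns
        $query .= " ORDER BY $sortBy, GameTitle ASC";
    }
    else
    {
        $query .= " ORDER BY GameTitle";
    }
}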
I'm not that familiar with coding search techniques, so any help would be appreciated... I've tried searching around on the net and all I've found is some pre-fabricated site search engines. We do not really want to go down this route... it's a little overkill for our needs.
Looking forward to some discussion,
Alex
MySQL has a string comparison feature called SOUNDS LIKE. This may be a good use case for it.
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#operator_sounds-like
You would probably modify the query like so:
SELECT * FROM blah WHERE SOUNDEX(column) LIKE CONCAT('%', SOUNDEX('$search_string'), '%')
Ultimately, you will need to decide how to perform a fuzzy match.
For your example, you might consider removing all whitespace and any '-' characters, then performing the LIKE '%$string%' match. You might also consider applying UPPER() to both sides of that LIKE.
You can do more or less of this, depending on how fuzzy you want the matching to be.
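A sketch of that last suggestion using Python's sqlite3 module in place of MySQL (SQLite also has REPLACE() and UPPER(), but concatenates with || instead of CONCAT()). The games rows are made up, and the search term is bound as a parameter rather than interpolated as $string, which also avoids SQL injection.

```python
import sqlite3

# Removes spaces and hyphens, then upper-cases; {0} is a column name or "?".
NORMALIZE = "UPPER(REPLACE(REPLACE({0}, ' ', ''), '-', ''))"


def make_games_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games (GameTitle TEXT)")
    conn.executemany("INSERT INTO games VALUES (?)",
                     [("Pac-Man",), ("Ms. Pac-Man",), ("Pac-Land",), ("Pong",)])
    return conn


def search_games(conn: sqlite3.Connection, search: str) -> list[str]:
    """Titles containing `search` once spaces and hyphens are ignored on both sides."""
    sql = (
        "SELECT GameTitle FROM games "
        f"WHERE {NORMALIZE.format('GameTitle')} "
        f"LIKE ('%' || {NORMALIZE.format('?')} || '%') "
        "ORDER BY GameTitle"
    )
    return [title for (title,) in conn.execute(sql, (search,))]
```

"pacman", "pac man" and "Pac-Man" all normalize to PACMAN, so each of them finds both "Pac-Man" and "Ms. Pac-Man".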