I'm trying to create a search engine for an inventory-based site. The issue is that some information sits inside BBCode tags and should carry more weight: in "[b]test[/b] sentence", "test" should be valued at 3, whereas "sentence" should be valued at 1.
Here is an example of an index:
My test sentence, my my (has a SKU of TST-DFS)
The Database:
| Product | word     | relevancy |
| 1       | my       | 3         |
| 1       | test     | 1         |
| 1       | sentence | 1         |
| 1       | TST-DFS  | 10        |
But how would I match TST-DFS if the user typed in TST DFS? I would like that SKU to have a relevancy of, say, 8 instead of the full 10.
I have heard that the FULLTEXT search feature in MySQL would help, but I can't seem to find a good way to do it. I would like to avoid things like UNIONs and to keep the query as optimized as possible.
Any help with coming up with a good system for this would be great.
Thanks,
Max
But how would I match TST-DFS if the user typed in TST DFS?
I would like that SKU to have a relevancy of, say, 8 instead of the full 10.
If I understood the question correctly, the answer is actually easy,
provided you massage your query a little before sending it to MySQL.
OK, let's say we have $query and it contains TST-DFS.
Are we going to match on individual words?
I suppose we should, as most search engines do, so:
$ok=preg_match_all('#\w+#',$query,$m);
Now if that pattern matched... $m[0] contains the list of words in $query.
This can be fine-tuned to your SKU format, but matching against full words in an AND fashion is pretty much what the user presumes is happening (as it does on Google and Yahoo).
Then we need to cook a $expr expression that will be injected into our final query.
if (!$ok) { // the search string contains no alphanumeric characters
    $expr = "false";
} else { // the words of the search string are now in $m[0]
    $expr = '';
    foreach ($m[0] as $word) {
        if ($expr)
            $expr .= " AND "; // put an AND between the "LIKE" subexpressions
        $s_word = addslashes($word); // the s_ prefix reminds me that the variable
                                     // is safe to include in a SQL statement
        $expr .= "word LIKE '%$s_word%'";
    }
}
Now $expr should look like "word LIKE '%TST%' AND word LIKE '%DFS%'".
With that value, we can build the final query:
$s_expr = "($expr)";
$s_query = addslashes($query);
$s_fullquery =
    "SELECT Product, word, IF(word LIKE '$s_query', relevancy, relevancy-2) AS relevancy ".
    "FROM some_index ".
    "WHERE word LIKE '$s_query' OR $s_expr";
Which reads, for "TST-DFS":
SELECT Product, word, IF(word LIKE 'TST-DFS', relevancy, relevancy-2) AS relevancy
FROM some_index
WHERE word LIKE 'TST-DFS' OR (word LIKE '%TST%' AND word LIKE '%DFS%')
As you can see, in the SELECT list, MySQL returns relevancy-2 when the match is only partial.
In the WHERE clause, if the full match fails, $s_expr, the partial-match expression we prepared in advance, is tried instead.
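As a rough, runnable illustration of the same idea, here is a minimal sketch in Python with an in-memory SQLite table instead of PHP and MySQL (an assumption made only so the example runs standalone). The function name `search_index` is hypothetical, the exact match uses `=` rather than LIKE, and the values are passed as query parameters instead of being escaped by hand.

```python
import re
import sqlite3


def search_index(conn, query):
    """Return (product, word, relevancy) rows; partial matches lose 2 points."""
    words = re.findall(r"\w+", query)
    if not words:
        return []  # nothing alphanumeric to search for
    # One "word LIKE ?" per token, joined with AND; values go in as parameters.
    partial = " AND ".join("word LIKE ?" for _ in words)
    sql = (
        "SELECT product, word, "
        "CASE WHEN word = ? THEN relevancy ELSE relevancy - 2 END AS relevancy "
        "FROM some_index "
        f"WHERE word = ? OR ({partial})"
    )
    params = [query, query] + [f"%{w}%" for w in words]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_index (product INTEGER, word TEXT, relevancy INTEGER)")
conn.executemany(
    "INSERT INTO some_index VALUES (?, ?, ?)",
    [(1, "my", 3), (1, "test", 1), (1, "sentence", 1), (1, "TST-DFS", 10)],
)

print(search_index(conn, "TST-DFS"))  # exact match keeps the stored relevancy
print(search_index(conn, "TST DFS"))  # partial match is reduced by 2
```

With the sample rows from the question, the exact search keeps the stored relevancy of 10 and the "TST DFS" search comes back with 8.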
I like to lowercase everything and strip out special characters (for a phone number or credit card number, I strip out everything that isn't a digit, on both the stored value and the search term).
Rather than try to create your own FTS solution, you could try to fit the MySQL FTS engine to your requirements. What I've seen done is create a new table to store your FTS data. Create a column for each different piece of data that you want to have a different relevance. For your sku field you could store the raw sku, with spaces, underscores, hyphens and any other special character intact. Then store a stripped down version with all these things removed. You may also want to store a version with leading zeros removed, as people often leave things like that out. You can store all these variations in the same column. Store your product name in another column, and the product description in another column. Create a separate index on each column. Then when you do your search, you can search each column individually, and multiply the rank of the results based on how important you think that column is. So you could multiply sku results by 10, title by 5 and leave description results as is. You may have to do a little experimentation to get the results you want, but it may ultimately be simpler than creating your own index.
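The SKU variations described above could be generated along these lines. This is only a sketch in Python, and `sku_variants` is a hypothetical helper name; the exact set of characters to strip is an assumption you would adapt to your own SKU format.

```python
import re


def sku_variants(sku):
    """Return the distinct forms of a SKU to store in the searchable column."""
    raw = sku.strip()
    # Drop spaces, hyphens, underscores and any other non-alphanumeric character.
    stripped = re.sub(r"[^0-9A-Za-z]", "", raw)
    no_leading_zeros = stripped.lstrip("0")
    variants = []
    for v in (raw, stripped, no_leading_zeros):
        if v and v not in variants:  # keep order, skip empties and duplicates
            variants.append(v)
    return variants


# The joined string is what would go into the SKU column of the FTS table.
print(" ".join(sku_variants("TST-DFS")))
print(" ".join(sku_variants("00-451_A")))
```

Storing the variants space-separated in one column keeps the single-index-per-column layout the answer suggests.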
Create a keywords table. Something along the lines of:
integer keywordId (autoincrement) | varchar keyword | int pointValue
Assign all possible keywords, skus, etc, into this table. Create another table, a post-keywords bridge, (assuming postId is the id you've assigned in your original table) along the lines of:
integer keywordId | integer postId
Once you have this, you can easily add keywords to each post as it is inserted. To calculate the total point value for a given post, a query such as the following should do the trick:
SELECT sum(pointValue) FROM keywordPostsBridge kpb
JOIN keywords k ON k.keywordId = kpb.keywordId
WHERE kpb.postId = YOUR_INTENDED_POST
I think the solution is quite straightforward unless I missed something.
Basically, run two searches: one is an exact match, the other a LIKE match or regex match.
Join the two result sets together (LIKE match LEFT JOIN exact match). Then, for example:
final_relevancy = (IFNULL(like_relevancy, 0) + IFNULL(exact_relevancy, 0) * 3) / 4
I didn't try this myself though. Just an idea.
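To see what that weighting does to a score, here is the formula as a small Python function. The name `final_relevancy` is taken from the formula itself, and `None` stands in for the SQL NULL that the LEFT JOIN produces when a row has no exact match; the input values are made up.

```python
def final_relevancy(like_relevancy, exact_relevancy):
    """Weighted mean in which an exact match counts three times as much.

    None plays the role of SQL NULL, mirroring IFNULL(x, 0).
    """
    like_part = like_relevancy if like_relevancy is not None else 0
    exact_part = exact_relevancy if exact_relevancy is not None else 0
    return (like_part + exact_part * 3) / 4


print(final_relevancy(10, 10))    # row found by both searches
print(final_relevancy(10, None))  # row found only by the LIKE search
```

A row found by both searches keeps its full score, while a row found only by the LIKE search drops to a quarter of it, which may or may not be the penalty you want.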
I would add a column holding the text stripped of all special characters, corrected for misspellings, and then upcased (or create a function that compares on text that has been stripped and upcased). That way your relevancy will be consistent.
/*
q and q1 stand in for your table.
This query takes too many resources;
turn it into an UPDATE query (run as a scheduled task, or call it on save if you are developing a new system).
*/
SELECT
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
/* chain many REPLACE calls for the junk characters,
or create a custom function,
or, if you have full DB access, install this: https://launchpad.net/mysql-udf-regexp
*/
THEN REPLACE(REPLACE( word, '-', ' ' ), '#', ' ')
ELSE word
END word ,
CASE
WHEN word NOT REGEXP "^[a-zA-Z]+$"
THEN 8
ELSE relevancy
END relevancy
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q
UNION
SELECT *
FROM ( SELECT 'my' word,
3 relevancy
UNION
SELECT 'test' word,
1 relevancy
UNION
SELECT 'sentence' word,
1 relevancy
UNION
SELECT 'TST-DFS' word,
10 relevancy
)
q1
This is the code for a page that shows the query results.
**I did not use functions here, though using them would make the work easier.**
<html>
<head>
</head>
<body>
<?php
//author S_A_KHAN
//date 10/02/2013
$dbcoonect=mysql_connect("127.0.0.1","root");
if (!$dbcoonect)
{
die ('unable to connect: '.mysql_error());
}
else
{
echo "connection successfully <br>";
}
$data_base=mysql_select_db("connect",$dbcoonect);
if ($data_base==FALSE){
die ('unable to select database: '.mysql_error($dbcoonect));
}
else
{
echo "connection successfully done<br>";
// cast to int so the request value cannot inject SQL
$SQLString = "select * from user where id= " . (int)$_GET["search"];
$QueryResult=mysql_query($SQLString,$dbcoonect);
echo "<table width='100%' border='1'>\n";
echo "<tr><th bgcolor=gray>Id</th><th bgcolor=gray>Name</th></tr>\n";
while (($Row = mysql_fetch_row($QueryResult)) !== FALSE) {
echo "<tr><td bgcolor=tan>{$Row[0]}</td>";
echo "<td bgcolor=tan>{$Row[1]}</td></tr>";
}
}
?>
</body>
</html>
I am searching welds.welder_id and welds.bal_welder_id, which hold space-separated lists of unique welder IDs entered by the users.
The values look like "99", "199", "99 w259", "w259 259 5-a".
99, 199, 259, 5-a and w259 are unique welder ID numbers.
I cannot use the MySQL INSTR() function by itself, as a search for "99" will pull up records with "199".
Users on each project format their welder IDs a different way (000, a000, 0aa), usually to match their customer's records.
I really want to avoid using PHP code, for a number of reasons.
To select records with "w259" in the welder_id OR in the bal_welder_id columns, my query looks like this.
SELECT * FROM `welds`
WHERE `omit`=0
AND( (`welder_id`='w259' OR `bal_welder_id`='w259')
OR (`welder_id` LIKE 'w259 %' OR `bal_welder_id` LIKE 'w259 %')
OR (`welder_id` LIKE '% w259' OR `bal_welder_id` LIKE '% w259')
OR (INSTR(`welder_id`, ' w259 ') > 0 OR INSTR(`bal_welder_id`,' w259 ') > 0))
ORDER BY `date_welded` DESC
LIMIT 100;
It works but it takes 0.0030 seconds with 1300 test records on my workstation's SSD.
The actual DB will have hundreds of thousands after a year or two.
Is there a better way?
Thanks.
If I understand your question correctly, one option is to use FIND_IN_SET(str, strlist) string function, which returns the position of the string str in the comma separated string list strlist, for example:
SELECT FIND_IN_SET('b','a,b,c,d');
will return 2. Since your string is not separated by commas, but by spaces, you could use REPLACE() to replace spaces with commas. Your query can be like this:
SELECT * FROM `welds`
WHERE
`omit`=0
AND
(FIND_IN_SET('w259', REPLACE(welder_id, ' ', ','))>0
OR
FIND_IN_SET('w259', REPLACE(bal_welder_id, ' ', ','))>0)
The optimizer, however, cannot do much, since FIND_IN_SET cannot make use of an index even if one is present. I would suggest normalizing your table, if that is possible.
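One way the normalization could look is sketched below with Python and an in-memory SQLite database (chosen only so the example runs standalone). The junction table `weld_welders`, its columns and the sample rows are all hypothetical, and only `welder_id` is modelled; `bal_welder_id` could go into the same table with an extra column marking which list the ID came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE welds (id INTEGER PRIMARY KEY, omit INTEGER, welder_id TEXT);
    -- Hypothetical junction table: one row per (weld, welder) pair.
    CREATE TABLE weld_welders (weld_id INTEGER, welder TEXT);
    CREATE INDEX idx_weld_welders_welder ON weld_welders (welder);
    """
)

rows = [(1, 0, "99 w259"), (2, 0, "199"), (3, 0, "w259 259 5-a"), (4, 1, "w259")]
conn.executemany("INSERT INTO welds VALUES (?, ?, ?)", rows)
# Split each space-separated list once, at write time.
for weld_id, _omit, id_list in rows:
    conn.executemany(
        "INSERT INTO weld_welders VALUES (?, ?)",
        [(weld_id, token) for token in id_list.split()],
    )


def welds_for(welder):
    """Return ids of non-omitted welds that list exactly this welder ID."""
    sql = (
        "SELECT DISTINCT w.id FROM welds w "
        "JOIN weld_welders ww ON ww.weld_id = w.id "
        "WHERE w.omit = 0 AND ww.welder = ? ORDER BY w.id"
    )
    return [r[0] for r in conn.execute(sql, (welder,))]


print(welds_for("w259"))
print(welds_for("99"))
```

Because the lookup is a plain equality on an indexed column, a search for "99" can no longer pick up "199", and the database is free to use the index.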
I am currently using MySQL and PHP.
I am looking for a query that takes a number and finds the closest match on its leading digits. For example, I have the numbers 019235678910, 026725678910 and 026825678910, and my table looks like this.
Table - Destination
Name      Number
Watford   01923
Oxford    026
Romford   026
Crawford  0267
Topford   02672
So when I pass 019235678910 the result would be Watford, 026725678910 would be Topford, and 026825678910 would be Oxford and Romford.
I'm also not sure whether MySQL can do this directly or whether it would need to work in conjunction with PHP.
Here is one way of getting all of them:
select d.*
from Destination d join
(select length(Number) as maxlen, number
from destination d
where YOURVALUE like concat(Number, '%')
order by maxlen desc
limit 1
) dsum
on d.Number = dsum.Number
Because you are looking for initial sequences, there is only one maximum match on the numbers (hence the limit 1 works).
By the way, the field called number is clearly a character field. Personally, I think it bad practice to call a character field "number" -- something called cognitive dissonance.
SELECT Name, Number
FROM Destination
WHERE LEFT('026725678910', LENGTH(Number)) = Number
or perhaps
WHERE '026725678910' LIKE CONCAT(Number, '%')
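Note that either WHERE clause returns every destination whose prefix matches, not only the longest one, so the rows still need to be narrowed down. A minimal sketch of that last step in Python, using the data from the question (the function name `closest_destinations` is made up for illustration):

```python
DESTINATIONS = [
    ("Watford", "01923"),
    ("Oxford", "026"),
    ("Romford", "026"),
    ("Crawford", "0267"),
    ("Topford", "02672"),
]


def closest_destinations(dialled, destinations=DESTINATIONS):
    """Return the names whose prefix is the longest one that starts `dialled`."""
    matches = [(name, prefix) for name, prefix in destinations if dialled.startswith(prefix)]
    if not matches:
        return []
    longest = max(len(prefix) for _name, prefix in matches)
    # Several destinations may share the longest prefix, so return all of them.
    return [name for name, prefix in matches if len(prefix) == longest]


for number in ("019235678910", "026725678910", "026825678910"):
    print(number, closest_destinations(number))
```

The same narrowing can be done in SQL by ordering on LENGTH(Number) and keeping the rows that tie for first place.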
I am trying to make a search box for an e-commerce website.
The search works as follows.
When a user searches for a product, the search value is sent to a file called searchResults.php using the POST method via Ajax:
$searchVal=$_POST['searchVal'];
It is then looked up in the database, in a table named products, with the following query:
$searchResult = mysql_query("SELECT * FROM products WHERE name LIKE '$searchVal'");
and the results are sent back as the Ajax response by the following if condition:
if($searchResult){
echo "result";
}
else{
echo "No products found";
}
Everything above works as expected.
Let's assume a user is searching for cellphones and types "cell phone". We only have products in the category "cellphones", not "cell phone", so the result is "No products found" even though records for cellphones are present.
I want the search to work regardless of whitespace and of singular or plural forms. How can I do that?
The right way to implement a search engine is to maintain a separate table of words, with links to the records they appear in. Then:
$qry="SELECT p.*, COUNT(*)
FROM products p
INNER JOIN srchwords s
ON p.id=s.product_id ";
$searchVals=explode(' ',$_POST['searchVal']);
foreach ($searchVals as $k=>$s) {
$searchVals[$k]="'" . mysql_real_escape_string(trim($s)) . "'";
}
$qry.="WHERE s.word IN (" . implode(",",$searchVals) . ") GROUP BY p.id ORDER BY COUNT(*) DESC";
An ugly and inefficient hack would be:
$qry="SELECT p.*
FROM products p";
$join=" WHERE ";
$searchVals=explode(' ',$_POST['searchVal']);
foreach ($searchVals as $k=>$s) {
$qry.=$join . " p.desc LIKE '%" . mysql_real_escape_string(trim($s)) . "%'";
$join=' OR ';
}
Neither method caters for plurals (just add an additional comparison for words ending in S, with the S removed). You should also clean up the string to remove multiple spaces and punctuation (/[^a-z0-9 ]/i).
Or just use one of the many, well written off-the-shelf search engine solutions (e.g. the mnogo engine or Google's site search service).
Step 1: remove leading and trailing spaces:
$searchResult = mysql_query("SELECT * FROM products WHERE name LIKE TRIM('$searchVal')");
Step 2: replace the remaining spaces with '%' (the wildcard in LIKE syntax), using MySQL's REPLACE():
$searchResult = mysql_query("SELECT * FROM products WHERE name LIKE REPLACE(TRIM('$searchVal'), ' ', '%')");
A first step would be to explode() the search term on spaces: $terms = explode(' ', $query) and then do a 'SELECT * FROM products WHERE name LIKE "%'.$terms[0].'%" AND name LIKE "%'.$terms[1].'%" ...'.
Of course, this doesn't really solve your plurals issue. It can also be very slow, because MySQL can't use indexes for LIKE patterns that start with a wildcard.
Another course of action could be to just have an "aliases" table that would look something like this:
cellphone | cell phone
cellphone | cell phones
cellphone | cellphones
...
Then you would replace all occurrences of the right-hand variants in a search query with the term on the left before querying the database.
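The replacement step might look roughly like this in Python. The `ALIASES` dictionary stands in for the aliases table and `canonicalize` is a made-up name; a real version would load the pairs from the database.

```python
import re

# Hypothetical contents of the aliases table: variant -> canonical term.
ALIASES = {
    "cell phone": "cellphone",
    "cell phones": "cellphone",
    "cellphones": "cellphone",
}


def canonicalize(query, aliases=ALIASES):
    """Lower-case the query, collapse whitespace, and swap variants for canonical terms."""
    text = " ".join(query.lower().split())
    # Try longer variants first so "cell phones" wins over "cell phone".
    variants = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b")
    return pattern.sub(lambda m: aliases[m.group(1)], text)


print(canonicalize("Cheap  Cell Phones"))
print(canonicalize("cell phone case"))
```

The product names would need to be stored, or at least indexed, in the same canonical form for the lookup to match.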
The third and best and most complicated way is to use an index table. You wouldn't want to write that yourself, but I'd bet there are some great solutions out there. Personally, I'm using Doctrine, which has this feature built in.
You can use trim() in PHP to strip whitespace (or other characters) from the beginning and end of a string.
One of my standard behaviors for pagination within my CMSs is to show an alphabetic quickbar when sorting by an alpha column. For example, if the results are being sorted by Last Name, under the pagination I output a series of links, A to Z, to take you directly to the page for a particular first character.
Currently I'm doing this by getting all the results for that column, sorted alphabetically, and then looping through them all in PHP and recording what page the record appears on. This works fine when you're only dealing with a few hundred results, but I'm now working on a project that could potentially have several hundred thousand rows and it simply isn't a viable option.
Is there a more efficient method to produce this kind of index? Note that it also needs to handle more than just A-Z, since rows may begin with numbers or punctuation.
Edit for clarification:
I'm not looking for a simple list of all the first characters, that's easy. I need to calculate what page of the total results the field starting with that character would be on. So say we're looking for someone named Walter, and I have 1000 rows, I need to know where in that 1-1000 range the W's start at.
I presume it's a varchar field, so have you considered the following:
SELECT DISTINCT SUBSTRING(lastname FROM 1 FOR 1) FROM mytable;
This will get you a distinct list of the first letters of the last name.
You can also use UPPER() to ensure you just get upper case characters. LEFT() will also achieve something similar, so you should experiment to see which performs quickest on your dataset.
Edit: If you also want counts:
SELECT SUBSTRING(lastname FROM 1 FOR 1) AS firstletter, COUNT(*) AS counter FROM mytable GROUP BY firstletter;
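Per-letter counts like these are enough to work out the page on which each letter starts, without fetching every row: keep a running total of the rows that come before each letter. A minimal sketch in Python, where `first_pages` is a hypothetical helper and the counts are invented; it assumes the counts arrive in the same order the paginated listing uses (for example via an ORDER BY on the first letter, with matching case handling).

```python
def first_pages(letter_counts, per_page):
    """Map each first letter to the 1-based page on which its first row appears.

    `letter_counts` is a list of (letter, count) pairs in listing order.
    """
    pages = {}
    rows_before = 0
    for letter, count in letter_counts:
        pages[letter] = rows_before // per_page + 1
        rows_before += count
    return pages


# Made-up counts, shaped like the result of the GROUP BY query.
counts = [("3", 4), ("A", 49), ("B", 1), ("C", 120), ("W", 30)]
print(first_pages(counts, per_page=25))
```

The work is proportional to the number of distinct first characters rather than to the number of rows.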
No need to do a second query for each letter.
$sql = "SELECT left(name, 1) AS firstchar FROM mytable ORDER BY name";
$result = mysql_query($sql) or die(mysql_error());
$letters = array();
$rownum = 0; // separate counter, so it is not overwritten by the fetched row
while($row = mysql_fetch_assoc($result)) {
$rownum++;
if (!isset($letters[$row['firstchar']])) {
$letters[$row['firstchar']] = $rownum;
}
}
This would give you an array keyed by the first letters, and the row number they first appeared on for the value:
a => 1,
b => 50,
c => 51,
etc...
There's probably some way of doing it purely in SQL, but MySQL itself doesn't have any 'row number' support built in, so it'd be a highly ugly query.
Just like standard pagination, this is just a matter of fetching and ordering: simply add a WHERE clause with LIKE 'A%' (don't forget to create an index on this column).
<?php
$result1 = mysql_query("SELECT LEFT(name, 1) AS fl FROM comics GROUP BY fl");
while ($row = mysql_fetch_array($result1))
{
$result11 = mysql_query("SELECT * FROM comics WHERE name LIKE '".$row['fl']."%'");
$countresult11 = mysql_num_rows($result11);
?>
<?php echo $row['fl']; ?>
<?php } ?>
This might be roughly what you are looking for, if you replace my variable and table names with yours.
It checks the table, pulls the first letter from each name, groups by that letter and outputs it as
1 3 7 9 A B R W X Y Z
depending on what you have in the table.
I'm working on an 'advanced search' page on a site where you would enter a keyword such as 'I like apples' and it can search the database using the following options:
Find: With all the words, With the exact phrase, With at least one of the words, Without the words
I can take care of the 'Exact phrase' by:
SELECT * FROM myTable WHERE field='$keyword';
'With all the words' by:
SELECT * FROM myTable WHERE field LIKE '%$keyword%';//Let me know if this is the wrong approach
But it's the 'With at least one of the words' and 'Without the words' options that I'm stuck on.
Any suggestions on how to implement these two?
Edit: Regarding 'At least one word', it wouldn't be a good approach to use explode() to break the keywords into words and run a loop to add
(field='$keywords') OR (field='$keywords') (OR)....
because there are some other AND/OR clauses in the query as well, and I'm not aware of the maximum number of clauses there can be.
I would suggest using MySQL full-text search; with its Boolean full-text search functionality you should be able to get your desired result.
Edit:
Requested example based on your requested conditions ("Its just one field and they can pick either of the 4 options (i.e 1 word, exact words, at least 1 word, without the term).")
I am assuming you are using php based on your initial post
<?php
$choice = $_POST['choice'];
// escape the user input before embedding it in SQL
$query = mysql_real_escape_string($_POST['query']);
if ($choice == "oneWord") {
    //Not 100% sure what you mean by one word but this is the simplest form
    //This assumes $query = a single word
    $result = mysql_query("SELECT * FROM myTable WHERE MATCH (field) AGAINST ('{$query}' IN BOOLEAN MODE)");
} elseif ($choice == "exactWords") {
    $result = mysql_query("SELECT * FROM myTable WHERE MATCH (field) AGAINST ('\"{$query}\"' IN BOOLEAN MODE)");
} elseif ($choice == "atLeastOneWord") {
    //With no operators, multiple words return rows that contain at least one of the words
    $result = mysql_query("SELECT * FROM myTable WHERE MATCH (field) AGAINST ('{$query}' IN BOOLEAN MODE)");
} elseif ($choice == "withoutTheTerm") {
    //Note: a boolean search made up only of excluded (-) terms returns an empty result,
    //so in practice combine the excluded term with at least one other search term
    $result = mysql_query("SELECT * FROM myTable WHERE MATCH (field) AGAINST ('-{$query}' IN BOOLEAN MODE)");
}
?>
Hope this helps. For full use of the operators in boolean matches, see Boolean Full-Text Searches.
You could use
With at least one of the words
SELECT * FROM myTable WHERE field LIKE '%$keyword%'
or field LIKE '%$keyword2%'
or field LIKE '%$keyword3%';
Without the word
SELECT * FROM myTable WHERE field NOT LIKE '%$keyword%';
I'm not sure you could do those search options as easily in a naive manner as the other two.
It would be worth your while implementing a better search engine if you need to support those scenarios. A simple one that could probably get you by is something along these lines:
When an item is added to the database, it is split up into the individual words. At this point "common" words (the, a, etc...) are removed (probably based on a common_words table). The remaining words are added to a words table if they are not already present. There is then a link made between the word entry and the item entry.
When searching, it is then a case of getting the word ids from the word table and the appropriate lookup of item ids in the joining table.
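The scheme described above can be sketched in memory with Python. The names `WordIndex`, `tokenize` and `COMMON_WORDS` are hypothetical, and the dictionaries stand in for the words table and the joining table; it also shows how the "at least one of the words" and "without the words" options fall out as set operations.

```python
import re
from collections import defaultdict

# Stand-in for a common_words table.
COMMON_WORDS = {"the", "a", "an", "and", "i", "am", "it"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop common words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in COMMON_WORDS]


class WordIndex:
    def __init__(self):
        self.items_by_word = defaultdict(set)  # word -> ids of items containing it
        self.all_items = set()

    def add(self, item_id, text):
        self.all_items.add(item_id)
        for word in tokenize(text):
            self.items_by_word[word].add(item_id)

    def _sets(self, query):
        return [self.items_by_word.get(w, set()) for w in tokenize(query)]

    def any_word(self, query):
        return set().union(*self._sets(query))

    def all_words(self, query):
        sets = self._sets(query)
        return set.intersection(*sets) if sets else set()

    def without_words(self, query):
        return self.all_items - self.any_word(query)


index = WordIndex()
index.add(1, "I like apples")
index.add(2, "Apples and pears")
index.add(3, "I like the pears")
print(sorted(index.any_word("like apples")))
print(sorted(index.all_words("like apples")))
print(sorted(index.without_words("apples")))
```

In a database the same three operations become joins against the word and link tables, with OR, AND (or a HAVING count) and NOT EXISTS respectively.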
Search is notoriously difficult to do well.
You should consider using a third-party search engine such as Lucene or Sphider.
Giraffe and Re0sless posted 2 good answers.
notes:
"SELECT *" sucks... only select the columns that you need.
Re0sless puts an "OR" between keywords.
- you should eliminate common words ("i", "am", "and", etc.)
- MySQL limits the size of a query (the limit is set by max_allowed_packet), so for really long SELECTs you should split them into separate queries.
- try to eliminate duplicate keywords (if I search for "you know you like it", the SELECT should search for "you" only once and drop common words such as "it")
Also try "LIKE" and "MATCH ... AGAINST" (see the MySQL manual); they can do wonders for "fuzzy" searches.