keyword relevance PHP MySQL Search Engine - php

I don't know why I can't find this anywhere. I would think this would be pretty common request. I am writing a search engine in PHP to search a MySQL database of For Sale listings for keywords inputted by the user.
There are several columns in the table but only 2 that will need to be searched. They are named file_Title & file_Desc. Think of it like a classified ad. An item title and a description.
So for example, a user would search for 'John Deere Lawn Tractor'. What I would like to happen is for classifieds that contain all 4 of those words to show up at the top of the list, then results that contain only 3, and so on.
I've read a very good webpage at http://www.roscripts.com/PHP_search_engine-119.html
From that author's example I have the following code:
<?php
$search = 'John Deere Lawn Tractors';
$keywords = split(' ', $search);
$sql = "SELECT DISTINCT COUNT(*) As relevance, id, file_Title, file_Desc FROM Listings WHERE (";
foreach ($keywords as $keyword) {
echo 'Keyword is ' . $keyword . '<br />';
$sql .= "(file_Title LIKE '%$keyword%' OR file_Desc LIKE '%$keyword%') OR ";
}
$sql=substr($sql,0,(strLen($sql)-3));//this will eat the last OR
$sql .= ") GROUP BY id ORDER BY relevance DESC";
echo 'SQL is ' . $sql;
$query = mysql_query($sql) or die(mysql_error());
$Count = mysql_num_rows($query);
if($Count != 0) {
echo '<br />' . $Count . ' RESULTS FOUND';
while ($row_sql = mysql_fetch_assoc($query)) {//echo out the results
echo '<h3>'.$row_sql['file_Title'].'</h3><br /><p>'.$row_sql['file_Desc'].'</p>';
}
} else {
echo "No results to display";
}
?>
The SQL String outputted is this:
SELECT DISTINCT COUNT(*) As relevance, id, file_Title, file_Desc FROM Listings
WHERE ((file_Title LIKE '%John%'
OR file_Desc LIKE '%John%')
OR (file_Title LIKE '%Deere%'
OR file_Desc LIKE '%Deere%')
OR (file_Title LIKE '%Lawn%'
OR file_Desc LIKE '%Lawn%')
OR (file_Title LIKE '%Tractors%'
OR file_Desc LIKE '%Tractors%') )
GROUP BY id
ORDER BY relevance DESC
With this code I get 275 results from my DB. My problem is it really doesn't order by the number of keywords found in the row. It seems to order the results by id instead. If I remove 'GROUP BY id' then it only returns 1 result instead of all of them, which is really messing with me!
I've also tried shifting to FULLTEXT in the db but can't seem to get that going either so I'd prefer to stick with LIKE %Keyword% syntax.
Any help is appreciated! Thanks!

I would suggest a totally different approach. Your approach is cumbersome, inefficient, heavy on the DB and will likely be very slow with more and more records added to your database.
What I would suggest is the following:
1. Create a separate table for keywords.
2. Create a list of non-keywords you don't want to index (common English prepositions etc.) so that they are not included. You can probably find such a list readily available online.
3. When a new entry is added, split the string into separate keywords, omitting the ones from step 2, and insert them into the table created in step 1 (if not already in it).
4. In a separate table, with a foreign key pointing to the keywords table, associate the classified_ad with the keyword.
5. Steps 3 and 4 must happen again if your classified_ad is edited (i.e. any associations inserted in step 4 are deleted from the association table, and the keywords are analysed again and reassociated with the classified ad).
Once you have this structure, all you have to do is search the association table and order by the number of matched keywords. You can even add an extra column to it and put the number of occurrences of that keyword in the article, so that you order by that too.
That will be much faster.
I had used a script once called Sphider which does something similar. Not sure if it is still maintained, but it works in a very similar way on web pages it parses.
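The keyword-splitting step from the list above could be sketched roughly like this. It is only an illustration: the `STOP_WORDS` list and the `extract_keywords()` name are made up for the example, and a real stop-word list would be much longer.

```php
<?php
// Hypothetical, deliberately tiny stop-word list (not from the original answer).
const STOP_WORDS = ['a', 'an', 'and', 'for', 'in', 'of', 'on', 'the', 'to', 'with'];

/**
 * Split free text into unique, lower-cased keywords, skipping stop words.
 * strtolower() only lower-cases ASCII; use mb_strtolower() for other alphabets.
 */
function extract_keywords(string $text): array
{
    // Split on every run of characters that is not a letter or a digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Drop stop words, drop duplicates, and re-index the result from 0.
    return array_values(array_unique(array_diff($words, STOP_WORDS)));
}

print_r(extract_keywords('John Deere Lawn Tractor for sale in the Midwest'));
```

Each returned keyword would then be inserted into the keywords table (if missing) and linked to the ad in the association table.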

I know you said you had problems with FULLTEXT, but I would highly encourage you to go back and try it again. FULLTEXT indexes and searches are designed to do exactly what you are doing, and when MATCH() is used in the WHERE clause of a natural language search (with no explicit ORDER BY), MySQL automatically sorts the rows from highest to lowest relevance.
For more information on FULLTEXT, check out http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Also, pay special note to the comment by Patrick O'Lone on the same page, some of which is quoted below...
It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to:
SELECT MATCH('Content') AGAINST ('keyword1 keyword2') as Relevance
FROM table
WHERE MATCH('Content') AGAINST('+keyword1 +keyword2' IN BOOLEAN MODE)
HAVING Relevance > 0.2
ORDER BY Relevance DESC
Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.

Related

Is it possible to treat hyphens and spaces the same in SQL?

So I'm working on a search function for a social networking site, and it searches user posts. Some users like to put hyphens instead of spaces, but I would like the search function to look for both hyphens and spaces in a result.
For example, if they have a post named "SQL-IS AWESOME" and I search for "SQL IS AWESOME", can I still find that post? I tried using 2 sql queries, one for the original search query, and one modified to change all spaces to hyphens.
But if I search "SQL IS-AWESOME" it still won't find it. Is there an easier way?
My current code:
$sql = "SELECT * FROM posts
WHERE (post_title='".$query."'
OR post_title LIKE '%".$query."'
OR post_title LIKE '%".$query."%'
OR post_title LIKE '".$query."%')
".$locquery."
".$cat."
ORDER BY date DESC
LIMIT 18";
As someone has suggested, you could just adapt and use fulltext searching.
If you choose to take this route, you will need to enable fulltext searching on the fields required.
I'll assume you will check post_title and post_body (?), which needs you to run this;
ALTER TABLE `posts` ADD FULLTEXT KEY `post_search` (`post_title`,`post_body`);
When that is done, your search query can easily be edited to become;
$sql = "SELECT * FROM `posts` WHERE MATCH(post_title,post_body) AGAINST ('$search')";
If you'd like better matching, it is also possible to give it a score and order by that, which would require code similar to this:
$sql = "SELECT *, MATCH(post_title, post_body) AGAINST ('{$search}') AS score ".
"FROM `posts` WHERE MATCH(post_title, post_body) AGAINST ('{$search}') ORDER BY `score` DESC";
--- NOTES
For the search, you need to work out how you will be searching.
The last time I did something similar, I simply had a form with a search field (named "search"), so $_POST['search'] was sent to the server.
I then used:
$search = (array_key_exists('search', $_POST) && is_string($_POST['search'])) ? mysql_real_escape_string($_POST['search'], $c) : FALSE ;
if ($search) {
// Run the fulltext query (see above)
}
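Since a natural language fulltext search treats the hyphen as a word separator (in boolean mode a leading - is an operator that excludes a word), you are left with plain space-separated words, which works well for fulltext, especially if you opt for scored results.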
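If you would rather stay with LIKE, one possible approach is to treat hyphens as spaces on both sides of the comparison: normalise the search term in PHP and compare it against REPLACE(post_title, '-', ' ') in SQL. The sketch below only builds the SQL and its parameter. The function names are invented for the example, and it assumes you run the result through a prepared statement (PDO or mysqli).

```php
<?php
/**
 * Replace hyphens with spaces and collapse repeated whitespace.
 */
function normalize_separators(string $term): string
{
    $term = str_replace('-', ' ', $term);
    return trim((string) preg_replace('/\s+/', ' ', $term));
}

/**
 * Build a parameterised LIKE query that ignores the hyphen/space difference.
 * Returns [sql, params]; table and column names are taken from the question.
 */
function build_title_search(string $term): array
{
    $sql = "SELECT * FROM posts "
         . "WHERE REPLACE(post_title, '-', ' ') LIKE ? "
         . "ORDER BY date DESC LIMIT 18";

    // Escape the LIKE wildcards % and _ (and the escape character itself).
    $needle = addcslashes(normalize_separators($term), '%_\\');

    return [$sql, ['%' . $needle . '%']];
}

list($sql, $params) = build_title_search('SQL IS-AWESOME');
echo $sql, "\n", $params[0], "\n";
```

With this, "SQL-IS AWESOME", "SQL IS-AWESOME" and "SQL IS AWESOME" all produce the same search parameter. Applying REPLACE() to the column prevents index use, but a LIKE with a leading % cannot use an index anyway.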

MYSQL - Search words in multiple columns

I want to make a search tool for my website.
If I search for a phrase, I want it to search multiple columns.
For example, if I search for dewalt drill and the title contains the words dewalt power drill, I want it to come up.
Also, if I search for dewalt drill and the title has dewalt and the description has drill, I want it to come up.
But all words of the search must be contained in some combination of the fields.
Can someone help me with the query?
Currently:
Select * from products where sku like '%{$searchwords}%' or title like '%{$searchwords}%' or desc like '%{$searchwords}%'
If your table is MyISAM, you can create a FULLTEXT index and then search IN BOOLEAN MODE. Note that desc is a reserved word in MySQL, so it has to be quoted with backticks.
To add the key:
alter table products ADD FULLTEXT (sku, title, `desc`)
Then prefix every word with + so that all of them are required:
$searchwords = '+' . join(' +', explode(' ', $searchwords));
$query = "SELECT * FROM products WHERE MATCH (sku, title, `desc`) AGAINST ('{$searchwords}' IN BOOLEAN MODE)";
You probably want FULLTEXT searching (starting with MySQL 5.6, this is also available for InnoDB tables). You can require all words with BOOLEAN MODE.
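As a rough sketch of the "require all words" idea, the helper below turns raw user input into a boolean-mode search string in which every word is prefixed with +. The function name is hypothetical. It also strips everything except letters and digits, on the assumption that you don't want users to inject boolean operators such as - or * themselves.

```php
<?php
/**
 * Turn "dewalt drill" into "+dewalt +drill" for use with
 * MATCH (...) AGAINST (? IN BOOLEAN MODE).
 */
function to_boolean_all_words(string $search): string
{
    // Split on anything that is not a letter or digit; this also discards
    // boolean-mode operators typed by the user.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $search, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Prefix every word with + so that each one is required.
    $required = array_map(function ($word) {
        return '+' . $word;
    }, $words);

    return implode(' ', $required);
}

// `desc` is a reserved word in MySQL, hence the backticks.
$sql = "SELECT * FROM products WHERE MATCH (sku, title, `desc`) AGAINST (? IN BOOLEAN MODE)";
$param = to_boolean_all_words('dewalt drill');
echo $sql, "\n", $param, "\n";
```

The search string is passed as a bound parameter, for example `$stmt = $pdo->prepare($sql); $stmt->execute([$param]);` with PDO, rather than being interpolated into the SQL.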

how can i add condition in mysql query executed in another table?

I am developing a website for e-books. The database has a table for authors and a table for publishers, and sometimes an author's name is also added to the publishers table as a publisher.
When that happens he exists both as an author and as a publisher, so searching the site for his name returns him twice, because I search the authors table and the publishers table separately and then merge the two result sets.
This is my code:
function generate_results($keyword, $row = 0) {
$result1 = $this->db->query("SELECT au_id,au_name,au_state,SUBSTR(au_info,1,190) AS au_info,au_img FROM d_author where (au_name LIKE '%$keyword%' or au_info LIKE '%$keyword%') and au_state = '1' limit $row,20");
$result2 = $this->db->query("SELECT pub_id,pub_name,pub_state,SUBSTR(pub_info,1,190) AS pub_info,pub_img FROM d_publishing where (pub_name LIKE '%$keyword%' or pub_info LIKE '%$keyword%') and pub_state = '1' limit $row,20");
$results = array_merge($result1->result_array(), $result2->result_array());
return $results;
}
Now I want to modify the second query to something like this:
select all publishers from the publishers table where the publisher's name is like $keyword and that name doesn't exist in the authors table.
In other words, if the name exists in authors, don't select it from publishers.
How can I translate that into a MySQL query?
First, read up on SQL injection before you continue developing your e-book application. It looks like your keyword is not checked to be a safe parameter. And just to be sure, do you know about CSRF and XSS? If not, read up on those too. This is very important.
Secondly, you should consider reworking your database design to avoid duplicated values. Check out "database normalization" for more information. It seems you could add another table that holds the "contact information" such as name, state, id etc. Your author table and publisher table could then each use a "contact_id" referencing that contact-information table.
Last but not least, to answer your question: you can generally solve such problems with an "anti join". In the second query, LEFT JOIN the authors table to the publishers table on the name, and keep only the rows where the author columns are NULL, i.e. publishers with no matching author. More information here: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
I do not know if I understand your database design completely, but it also seems like a UNION combined with DISTINCT could help you, and you wouldn't even need the array_merge step. I would suggest checking out these two commands in the MySQL docs.
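A minimal sketch of the anti-join for the second query might look like the following. It uses the table and column names from the question, while the function name is made up. It also assumes that "the same person" means au_name equals pub_name exactly, which may not hold for your data.

```php
<?php
/**
 * Build the publisher search so that publishers whose name also exists in
 * d_author are skipped (LEFT JOIN ... IS NULL anti-join).
 * Returns [sql, params] with ? placeholders for the keyword.
 */
function build_publisher_query(string $keyword, int $row = 0): array
{
    $sql = "SELECT p.pub_id, p.pub_name, p.pub_state, "
         . "SUBSTR(p.pub_info, 1, 190) AS pub_info, p.pub_img "
         . "FROM d_publishing AS p "
         . "LEFT JOIN d_author AS a ON a.au_name = p.pub_name "
         . "WHERE (p.pub_name LIKE ? OR p.pub_info LIKE ?) "
         . "AND p.pub_state = '1' "
         . "AND a.au_id IS NULL "              // no author with that name
         . "LIMIT " . max(0, $row) . ", 20";   // $row is an int, safe to inline

    $like = '%' . $keyword . '%';

    return [$sql, [$like, $like]];
}

list($sql, $params) = build_publisher_query('Smith', 20);
echo $sql, "\n";
```

The `$this->db->query()` call in the question looks like CodeIgniter, whose query method accepts an array of bindings as a second argument. If that assumption is right, `$this->db->query($sql, $params)` would run it, and the keyword would no longer be interpolated into the SQL string.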

PHP/MySQL If FROM Statements

Sorry for the extremely basic question! I'm very new to PHP/MySQL.
How would it be possible to write an if statement that checks whether or not data came from a certain table?
Would this work?
if ( $search FROM "table2" )
{
function for table2.. etc
}
TABLE:INSTRUMENTS
COLUMNS:
id instrument grade standard comments
instrument2 grade2 standard2 comments2
instrument3 grade3 standard3 comments3
instrument4 grade4 standard4 comments4
instrument5 grade5 standard5 comments5
TABLE: PEOPLE
COLUMNS:
id first last snumber course email graduate inumber
Basically a person from the PEOPLE table is linked via ID to instruments in the INSTRUMENTS table. I have an e-mail search function that needs to send out the relevant data for the relevant instruments.
I want to get the comments[i], grade[i], standard[i] of the matching instrument[i].
Somewhere in your code you must already have specified the table to select from. You can use the same logic to see which table you initially used.
For example:
$sTable = 'table2';
$rResult = mysql_query(sprintf($sQuery, $sTable));
if ($sTable == 'table2') {
    // use $rResult
}
You know what table it came from, because you retrieved it from that table.
Update
Perhaps:
$search = trim(stripslashes($_POST['search']));
$search = mysql_real_escape_string($search);
//Find instruments searched for:
$query = "
SELECT *,
(`instrument` = '$search') AS `matched1`,
(`instrument2` = '$search') AS `matched2`,
(`instrument3` = '$search') AS `matched3`,
(`instrument4` = '$search') AS `matched4`,
(`instrument5` = '$search') AS `matched5`
FROM `instruments`
WHERE '$search' IN (`instrument`, `instrument2`, `instrument3`, `instrument4`, `instrument5`)
";
Explore the results and find out how to use them to your advantage.
This isn't the most glamorous solution and you can no doubt improve this with a different approach, but it's something to get you started.
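Once a row has been fetched, picking the grade, standard and comments that belong to the matching instrument column can be done in PHP. The sketch below assumes the wide table layout from the question (instrument, instrument2 ... instrument5 with matching grade/standard/comments columns) and a row fetched as an associative array. The function name is invented for the example.

```php
<?php
/**
 * Given one row of the wide `instruments` table, return the grade, standard
 * and comments that belong to the instrument column matching $search.
 * Returns null when no instrument column matches.
 */
function matched_instrument(array $row, string $search): ?array
{
    // Column suffixes: '' for the first set of columns, then '2' to '5'.
    foreach (['', '2', '3', '4', '5'] as $suffix) {
        $name = $row['instrument' . $suffix] ?? null;

        // Case-insensitive comparison of the stored name with the search term.
        if ($name !== null && strcasecmp($name, $search) === 0) {
            return [
                'instrument' => $name,
                'grade'      => $row['grade' . $suffix] ?? null,
                'standard'   => $row['standard' . $suffix] ?? null,
                'comments'   => $row['comments' . $suffix] ?? null,
            ];
        }
    }

    return null;
}

$row = [
    'id' => 1,
    'instrument'  => 'Piano',  'grade'  => '5', 'standard'  => 'Good', 'comments'  => 'ok',
    'instrument2' => 'Violin', 'grade2' => '8', 'standard2' => 'High', 'comments2' => 'great',
];
print_r(matched_instrument($row, 'violin'));
```

This keeps the suffix bookkeeping in one place, although a one-row-per-instrument table makes it unnecessary.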
Hmm,
The tables need a redesign IMHO.
I would change the instruments table to:
id, people_id, instrument, grade, standard, comments
This query:
SELECT
people.email
,instruments.instrument
,instruments.grade
,instruments.standard
,instruments.comments
FROM people
INNER JOIN instruments ON (instruments.people_id = people.id)
WHERE people.id = 10
Will give you all instruments for person number 10.
You can change the WHERE clause to WHERE people.snumber = x or whatever you feel like.
Or add extra conditions to limit the instruments returned, for example AND instruments.grade > 7 at the end.
With this setup people can have any number of instruments.
It feels a bit odd to reverse-link the instruments to people like this, but believe me, it works.
I recommend reading up on joins (google 'mysql join') and always using explicit joins with the JOIN keyword; it makes your queries much easier to understand.
If this is the result you were looking for, then you can drop the inumber field from the people table, since the link between people and instruments has moved to the instruments table.
In a 1-to-many link, the linking field should always be on the 'many' table.
You would need to store what table the search result was from in a variable, and compare the two variables.
if($search === "table2")
There is no from operator in PHP.

I'm not getting the expected result from an SQL query

I'm developing a search function for a website. I have a table called keywords with two fields id and keyword. I have two separate search queries for AND and OR. The problem is with the AND query. It is not returning the result that I expect.
The printed SQL is :
SELECT COUNT(DISTINCT tg_id)
FROM tg_keywords
WHERE tg_keyword='keyword_1'
AND tg_keyword='keyword_2'
The count returned is 0, while if I perform the same SQL with OR instead of AND the count returned is 1. I expected the count to be 1 in both cases, and I need it to be this way as the AND results will take priority over the OR results.
Any advice will be much appreciated.
Thanks
Archie
It will always return 0, unless keyword_1=keyword_2. tg_keyword can only have one value, and when you say AND, you're asking for both conditions to be true.
It's the same, logically speaking, as asking "How many friends do I have whose name is 'JACK' and 'JILL'"? None, nobody is called both JACK and JILL.
I don't know what your table looks like and how things are related to each other, but this query makes no sense. You're returning rows where the keyword is one thing and another thing at the same time? That's impossible.
You probably have another table that links to the keywords? You should search with that, using a join, and search for both keywords. We could give you a more precise answer if you could tell us what your tables look like.
EDIT: Based on what you wrote in a comment below (please edit your question!!), you're probably looking for this:
SELECT COUNT(DISTINCT kw1.tg_id)
FROM tg_keywords AS kw1, tg_keywords AS kw2
WHERE kw1.tg_id = kw2.tg_id
AND kw1.tg_keyword='keyword_1'
AND kw2.tg_keyword='keyword_2'
Your query can't work because its condition is always false, so no record will ever be selected:
tg_keyword='keyword_1' AND tg_keyword='keyword_2'
What are you trying to do? Could you post the columns of this table?
tg_keyword='keyword_1' AND tg_keyword='keyword_2'
Logically this cannot be true, ever. It cannot be both. Did you mean something like:
SELECT * FROM tg_keywords
WHERE tg_keyword LIKE '%keyword_1%' OR tg_keyword LIKE '%keyword_2%'
ORDER BY (tg_keyword LIKE '%keyword_1%') + (tg_keyword LIKE '%keyword_2%') DESC;
Based on the OP's clarification:
I have a table with multiple keywords with the same id. How can I get more than one keyword compared for the same id, as the search results need to be based on how many keywords from a search array match keywords in the keywords table from each unique id. Any ideas?
I assume you're looking to return search results based on a ranking of how many of the selected keywords are a match with those results? In other words, is the ID field that multiple keywords share the ID of a potential search result?
If so, assuming you pass in an array of keywords of the form {k1, k2, k3, k4}, you might use a query like this:
SELECT tg_id, COUNT(tg_id) AS ResultRank FROM tg_keywords WHERE tg_keyword IN ('k1', 'k2', 'k3', 'k4') GROUP BY tg_id ORDER BY ResultRank DESC
This example also assumes a given keyword might appear in the tables multiple times with different IDs (because a keyword might apply to multiple search results). The query will return a list of IDs in descending order based on the number of times they appear with any of the selected keywords. In the given example, the highest rank for a given ID should be 4, meaning ALL keywords apply to the result with that ID...
I think you will need to join tg_keywords to itself. Try playing around with something like
select *
from tg_keywords k1
join tg_keywords k2 on k1.tg_id = k2.tg_id
where k1.tg_keyword = 'keyword_1' and k2.tg_keyword = 'keyword_2'
Try:
SELECT tg_id
FROM tg_keywords
WHERE tg_keyword in ('keyword_1','keyword_2')
GROUP BY tg_id
HAVING COUNT(DISTINCT tg_keyword) = 2
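Since the keywords come from a search array, the GROUP BY / HAVING query above can be generated for any number of keywords. The following is only a sketch: the function name and the `$requireAll` flag are invented for the example, and it assumes the tg_keywords table with tg_id and tg_keyword columns from the question.

```php
<?php
/**
 * Build a query over tg_keywords for a list of keywords.
 * $requireAll = true  : only ids that have every keyword (the "AND" search)
 * $requireAll = false : ids with at least one keyword, best matches first
 * Returns [sql, params] with one ? placeholder per keyword.
 */
function build_keyword_query(array $keywords, bool $requireAll): array
{
    // Duplicates would make the "all keywords" count unreachable.
    $keywords = array_values(array_unique($keywords));
    if ($keywords === []) {
        throw new InvalidArgumentException('At least one keyword is required');
    }

    $placeholders = implode(', ', array_fill(0, count($keywords), '?'));

    $sql = "SELECT tg_id, COUNT(DISTINCT tg_keyword) AS matched "
         . "FROM tg_keywords "
         . "WHERE tg_keyword IN ($placeholders) "
         . "GROUP BY tg_id ";

    $sql .= $requireAll
        ? "HAVING COUNT(DISTINCT tg_keyword) = " . count($keywords)
        : "ORDER BY matched DESC";

    return [$sql, $keywords];
}

list($sql, $params) = build_keyword_query(['keyword_1', 'keyword_2'], true);
echo $sql, "\n";
```

Running the `$requireAll = false` variant alone may be enough: ids whose `matched` value equals the number of keywords are exactly the AND results, and they sort to the top.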
