Fulltext search: showing the rows where the search words repeat most at the top - php

I added a search engine to my website and it works great, but the results come back in a messy order. I would like the rows where the search word or words are repeated the most to appear at the top.
I looked at online tutorials, but I can't get it to work; the results always come out messy and I don't understand why.
Right now I have it like this:
$sql="SELECT art,tit,tem,cred,info,que,ano,url
FROM contenido
WHERE MATCH (art,tit,tem,cred,info)
AGAINST ('" .$busqueda. "' IN BOOLEAN MODE)
ORDER BY id DESC";
There is not much information on the Internet about refining or optimizing MySQL FULLTEXT searches. Hopefully some experts come through here so we can all learn.
How could I refine my search? Thank you.

I think the issue is that you're sorting by the id.
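A fulltext search calculates a match score for each row, with stronger matches scoring higher. When you apply ORDER BY id DESC, you lose any ordering by that score. Note that BOOLEAN MODE does not sort by score on its own, so you need to order by the score explicitly.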
You can see the actual score in your result set if you want by:
SELECT art,tit,tem,cred,info,que,ano,url,
MATCH (art,tit,tem,cred,info)
AGAINST ('your term' IN BOOLEAN MODE) AS score
FROM contenido
WHERE MATCH (art,tit,tem,cred,info)
AGAINST ('your term' IN BOOLEAN MODE)
ORDER BY score DESC
BTW: Use prepared statements for the 'your term' portion.
If your search string has spaces but each term matters, you need to treat them as separate pieces. So if it's important to have BOTH "Mercedes" AND "Benz":
Don't: AGAINST ('Mercedes Benz' IN BOOLEAN MODE) <--- This means either Mercedes or Benz
Do: AGAINST ('+Mercedes +Benz' IN BOOLEAN MODE)
If you want rows that must contain the first term and optionally the second (ranking higher when both are found), do:
AGAINST ('+Mercedes Benz' IN BOOLEAN MODE)
Here's a long list of combinations: https://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
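The '+term' rule above can be applied to whatever the user types before the string is bound to the prepared statement. This is only a rough sketch in Python: the function name `to_boolean_query` and the operator list are my own assumptions, and the same string handling carries over to PHP.

```python
import re

# Characters that act as operators in MySQL boolean full-text mode.
BOOLEAN_OPERATORS = '+-<>()~*"@'


def to_boolean_query(user_input, require_all=True):
    """Turn free text into a boolean-mode search string.

    With require_all=True every word gets a leading '+', so all of
    them must be present; otherwise the words are left optional.
    (Hypothetical helper, not part of any library.)
    """
    # Replace operator characters with spaces so user input cannot
    # change the meaning of the query.
    cleaned = re.sub("[" + re.escape(BOOLEAN_OPERATORS) + "]", " ", user_input)
    words = cleaned.split()
    prefix = "+" if require_all else ""
    return " ".join(prefix + word for word in words)


print(to_boolean_query("Mercedes Benz"))                     # +Mercedes +Benz
print(to_boolean_query("Mercedes Benz", require_all=False))  # Mercedes Benz
```

The returned string is what you would bind as the AGAINST parameter; it is not meant to be concatenated into the SQL text.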
And don't forget to get rid of the ORDER BY id DESC and sort by the score instead. I think your final query should look something like this for "Mercedes Benz":
SELECT art,tit,tem,cred,info,que,ano,url,
MATCH (art,tit,tem,cred,info)
AGAINST ('+Mercedes Benz' IN BOOLEAN MODE) AS score
FROM contenido
WHERE MATCH (art,tit,tem,cred,info)
AGAINST ('+Mercedes Benz' IN BOOLEAN MODE)
ORDER BY score DESC;
Yep, fulltext in MySQL has a lot of quirks, but play around and you'll get the hang of it.

Related

How to make FULL TEXT SEARCH more accurate using PHP

I am implementing a search feature for my project. I am using a FULL TEXT SEARCH query to return accurate results to the user. I am a beginner in PHP programming and I do not know much about FULL TEXT SEARCH.
This is my query:
$sql = $conn->prepare("SELECT *, MATCH(title, keyword) AGAINST(? IN BOOLEAN MODE) AS relevance FROM table ORDER BY relevance DESC LIMIT 20");
$sql->bind_param("s", $q);
$sql->execute();
$rs = $sql->get_result();
This query works, but it shows old results first instead of the most accurate ones. The second problem is that the query does not work correctly when the search is only one word long (e.g. keyword = Google).
Please do not give suggestions about Elastic search, Sphinx, Algolia etc.
When MATCH() is used in a WHERE clause, the rows returned are automatically sorted with the highest relevance first.
So all you have to do is remove the MATCH() from the SELECT list and put it in the WHERE condition.
Source: https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html
Why are you not using the SQL LIKE operator? Here is an example for multiple words, matched against the title and keywords columns of a table named products:
$db = mysqli_connect('localhost', 'root', '', 'project');
$search = $_GET['userinput'];
$searcharray = explode(' ', $search);
$searchpdo = array();   // values to bind, two per search word
$finalstate = "";       // the WHERE condition built so far
foreach ($searcharray as $ind => $query) {
    $sql = array();
    $exp = '%' . $query . '%';
    // each word may match either column
    array_push($sql, "(title LIKE ?)");
    array_push($searchpdo, $exp);
    array_push($sql, "(keywords LIKE ?)");
    array_push($searchpdo, $exp);
    if ($finalstate == "") {
        $finalstate = "(" . implode(" OR ", $sql) . ")";
    } else {
        // every word must match, so the groups are joined with AND
        $finalstate = $finalstate . " AND " . "(" . implode(" OR ", $sql) . ")";
    }
}
$stmt = $db->prepare("SELECT * FROM products WHERE (" . $finalstate . ") ");
$types = str_repeat('s', count($searchpdo));
$stmt->bind_param($types, ...$searchpdo);
$stmt->execute();
$result = $stmt->get_result();
This will give you the correct result for a single word or for multiple words.
I think you have to tweak your query a little bit and you will get the desired results, as below:
$sql = mysql_query("SELECT * FROM
patient_db WHERE
MATCH ( Name, id_number )
AGAINST ('+firstWord +SecondWord +ThirdWord' IN BOOLEAN MODE);");
and if you want to do an exact phrase search:
$sql = mysql_query("SELECT *
FROM patient_db
WHERE MATCH ( Name, id_number )
AGAINST ('\"Exact phrase/Words\"' IN BOOLEAN MODE);");
I had also posted the same answer in another SO post, but I don't remember which one.
There are multiple aspects to your question.
If available, use the mysql client to run the query first instead of PHP, until the query behaves the way you want.
If you want recent documents (records) to show up at the top of the search results, you need to change your ORDER BY clause. Currently, it is supposed to return the closest match first (i.e. by relevance).
You need to strike a balance between relevance and recency (it is not clear how you define this) in your custom logic. A simple example that prioritizes the last week over the last month, and the last month over the rest:
SELECT
....
, DATEDIFF(CURDATE(), ItemDate) AS ItemAgeInDays
ORDER BY
relevance
* 100
* CASE
WHEN ItemAgeInDays BETWEEN 0 AND 7 -- last week
THEN 20
WHEN ItemAgeInDays BETWEEN 0 AND 30 -- last month
THEN 10
ELSE 1
END
DESC
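To see what that weighting does to the ordering, here is the same arithmetic as a small Python sketch. The row data, the titles and the fixed "today" date are made up for illustration; only the 20 / 10 / 1 weights and the factor of 100 come from the query above.

```python
from datetime import date


def recency_weight(item_date, today):
    """Mirror the CASE expression: 20 for the last week, 10 for the
    last month, 1 for anything older (or dated in the future)."""
    age_in_days = (today - item_date).days
    if 0 <= age_in_days <= 7:
        return 20
    if 0 <= age_in_days <= 30:
        return 10
    return 1


def combined_score(relevance, item_date, today):
    """relevance * 100 * recency weight, as in the ORDER BY."""
    return relevance * 100 * recency_weight(item_date, today)


# Made-up rows: a fulltext relevance score plus the item's date.
today = date(2024, 1, 31)
rows = [
    {"title": "old but very relevant", "relevance": 2.0, "item_date": date(2023, 6, 1)},
    {"title": "recent, less relevant", "relevance": 0.5, "item_date": date(2024, 1, 29)},
    {"title": "last month", "relevance": 0.5, "item_date": date(2024, 1, 10)},
]
rows.sort(
    key=lambda r: combined_score(r["relevance"], r["item_date"], today),
    reverse=True,
)
for row in rows:
    print(row["title"], combined_score(row["relevance"], row["item_date"], today))
```

With these weights a recent row can outrank an older row that is four times as relevant, so the 20 / 10 / 1 values probably need tuning against your own data.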
You say a single-word search does not work. In BOOLEAN MODE you build boolean logic for your search, and it uses special characters for that. For example, +apple means 'apple' must exist. It is possible that your single word conflicts with these characters.
Please review this reference; it explains BOOLEAN MODE in great detail.
https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html
You say the query is not returning the correct result. FULL TEXT search looks for your search term in each document (row) and counts how many times it appears in that document. It then offsets that by how many times the term appears in ALL documents. This means it prioritizes records where your term appears much more often than average. If your term is not distinctive enough, the results might seem incorrect when most documents are similar to each other with respect to that term. See the above link for more details.
BOOLEAN MODE does not sort the result by relevance by default. You need to add ORDER BY yourself, which you already did. I just wanted to note it here for others.

MySQL Fulltext Boolean Mode search returns too many results

I'm having an issue with my fulltext search query and I've pretty much gone through all the forums and threads I could find but I'm still having this issue.
I am using the MySQL Fulltext Boolean Mode search to return matching rows based on two columns (artist name and track title). The user can enter any phrase they wish into the search bar, and the results are supposed to be only rows that contain ALL parts of the search query in EITHER of the columns.
This is the query I have so far. It works for most searches, but for some it returns results too loosely and I'm not sure why.
SELECT * FROM tracks WHERE MATCH(artist, title) AGAINST('+paul +van +dyk ' IN BOOLEAN MODE)
This query however, returns rows containing Paul, without the 'Van' or 'Dyk'. Again, I want the query to return only rows that contain ALL of the keywords in EITHER the Artist or Track Name column.
Thanks in advance
To enhance sorting of the results in boolean mode you can use the following:
SELECT column_names, MATCH (text) AGAINST ('word1 word2 word3')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2 +word3' in boolean mode)
order by col1 desc;
Using the first MATCH() we get the score in non-boolean search mode (more distinctive). The second MATCH() ensures we really get back only the results we want (with all 3 words).
So your query will become:
SELECT *, MATCH (artist, title) AGAINST ('paul van dyk')
AS score FROM tracks
WHERE MATCH (artist, title)
AGAINST ('+paul +van +dyk' in boolean mode)
order by score desc;
Hopefully you will get better results now.
Whether it works or not, please let me know.
Piggybacking off of @Avidan's answer. These are the operators that can be used in full-text searches to help create the query you desire.
The following examples demonstrate some search strings that use
boolean full-text operators:
'apple banana'
Find rows that contain at least one of the two words.
'+apple +juice'
Find rows that contain both words.
'+apple macintosh'
Find rows that contain the word “apple”, but rank rows higher if they
also contain “macintosh”.
'+apple -macintosh'
Find rows that contain the word “apple” but not “macintosh”.
'+apple ~macintosh'
Find rows that contain the word “apple”, but if the row also contains
the word “macintosh”, rate it lower than if row does not. This is
“softer” than a search for '+apple -macintosh', for which the presence
of “macintosh” causes the row not to be returned at all.
'+apple +(>turnover <strudel)'
Find rows that contain the words “apple” and “turnover”, or “apple”
and “strudel” (in any order), but rank “apple turnover” higher than
“apple strudel”.
'apple*'
Find rows that contain words such as “apple”, “apples”, “applesauce”,
or “applet”.
'"some words"'
Find rows that contain the exact phrase “some words”.
Docs: https://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html

MySQL Full text search on multiple keywords one-many

I'm trying to use MySQL's FTS to search through indexed content for certain keywords. For what I'm trying to make, the text needs to contain one or more of the keywords, and the keywords must be exact matches. However, it doesn't matter if a keyword is in the middle of another word; for example, when searching for "STACK", it should match:
Hi, I have a stack of overflows
Stacked against the wall
These bookcases are overstacked completely
I was using the following method before:
SELECT ... FROM ... WHERE text LIKE '%keyword1%' OR text LIKE '%keyword2%' OR text LIKE '%keyword3%'
This would return any text that contained any of the keywords. However, this began to slow down pretty much everything, because most of the indexed content is big (stored in a blob) and I have over 500 of those rows to search through. As LIKE does not use any index with this method, I tried converting to FTS using the following:
SELECT ... FROM ... WHERE MATCH(text) AGAINST ('+keyword1 +keyword2 +keyword3' IN BOOLEAN MODE)
This worked well with single keywords, but with multiple keywords it fails, because FTS with the + operator requires every one of the given words to be present. But without the + operators, FTS returns fuzzy results, not exact results. I end up missing content that most definitely contains the keywords.
What can I use to get all content that contains one or more of the exact keywords?
Try something like this:
SELECT * FROM table WHERE MATCH (text) AGAINST ('+"keyword1" +"keyword2"' IN BOOLEAN MODE)
It sounds like you are looking for something like the following:
SELECT ... FROM ...
WHERE MATCH(text) AGAINST ('+keyword1' IN BOOLEAN MODE)
OR MATCH(text) AGAINST ('+keyword2' IN BOOLEAN MODE)
OR MATCH(text) AGAINST ('+keyword3' IN BOOLEAN MODE)
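In boolean mode a word with no operator in front of it is optional, so a single MATCH with the bare keywords should also return rows that contain at least one of them. A rough Python sketch of building that query with a bound parameter follows. The table `documents`, the columns `id` and `body`, and the helper name are all hypothetical, and `%s` is the placeholder style used by the common Python MySQL drivers.

```python
# Characters that act as operators in MySQL boolean full-text mode.
BOOLEAN_OPERATORS = '+-<>()~*"@'


def any_keyword_search(keywords):
    """Return (sql, params) for rows containing at least one keyword.

    Hypothetical helper: table and column names are placeholders.
    """
    cleaned = []
    for keyword in keywords:
        # Drop operator characters so a keyword cannot alter the query logic.
        word = "".join(ch for ch in keyword if ch not in BOOLEAN_OPERATORS).strip()
        if word:
            cleaned.append(word)
    sql = (
        "SELECT id FROM documents "
        "WHERE MATCH(body) AGAINST (%s IN BOOLEAN MODE)"
    )
    # Bare words (no leading '+') are optional: any one of them is enough.
    return sql, (" ".join(cleaned),)


sql, params = any_keyword_search(["keyword1", "+keyword2", "keyword3"])
print(sql)
print(params)  # ('keyword1 keyword2 keyword3',)
```

Keep in mind that FULLTEXT matches whole words (or prefixes with *), so it will not find a keyword in the middle of another word the way LIKE '%keyword%' does.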

Using a title to determine possible categories in SphinxQL

I have a database with over 60 million records indexed by SphinxQL 2.1.1. Each record has a title and a catid (among other things). When a new record is inserted into the database, I am trying to get sphinx to guess the catid based on the text in the title.
I have managed to get it working for single words like so:
SELECT @groupby, catid, count(*) c FROM sphinx WHERE MATCH('*LANDLORDS*') group by catid order by c desc
However the actual title is likely to be something like this:
Looking for Landlords - Long term lease - No fees!!!
Is there any way to just dump the whole title string into sphinx and have it break down each of the words and perform some sort of fuzzy match, returning the most likely category?
Well, as such Sphinx isn't 'magical', and it doesn't have a 'fuzzy match' function.
But you can approximate one :) Two main steps...
Changing from requiring all 'words' to just requiring some,
changing the ranking, so that the best 'intersection' between the query and the title gets a high weight and therefore 'bubbles' to the top.
You can then just take the top result and treat it as a 'best guess'.
(There is actually a third: words like 'for' and 'the' are likely to cause lots of false positives, so you may want to exclude them, either using stopwords on the index or by just stripping them from the query.)
A prototype of such a query might be something like
SELECT catid FROM sphinx WHERE MATCH('"Looking Landlords Long term lease No fees"/1') OPTION ranker=wordcount LIMIT 1;
That's using quorum to affect matching, and choosing a different ranker.
Using this version with grouping probably won't work, as it will include lots of low-quality matches. Although you could perhaps try using avg or sum to get a composite weight?
SELECT SUM(WEIGHT()) as w, catid FROM sphinx WHERE MATCH('"Looking Landlords Long term lease No fees"/1') GROUP BY catid ORDER BY w DESC OPTION ranker=wordcount LIMIT 1
There are lots of ways to tweak this...
You can try other rankers, e.g. matchany. Or even some custom ranking expressions.
Or change the quorum, e.g. rather than requiring 1 word, you could require at least a few.
Or if you can extract phrases, e.g.
'"Looking Landlords" | "Long term lease" | "No fees"'
might work?
Also, rather than just taking the top result, you could take the top 5-10 results and show them all to the user, which compensates for the fact that the results are very approximate.
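For what it's worth, turning a raw title into the quorum query string used in the prototype above could look roughly like this in Python. The stopword list and the function name are just placeholders I made up; in practice you would use your index's own stopwords.

```python
import re

# Illustrative stopword list only; a real one would match the index config.
STOPWORDS = {"for", "the", "a", "an", "and", "of", "to", "in"}


def quorum_query(title, min_matches=1, stopwords=STOPWORDS):
    """Build a Sphinx quorum expression: "word word ..."/N.

    Punctuation is dropped, stopwords are removed, and min_matches is
    the number of words that have to match (hypothetical helper).
    """
    words = re.findall(r"[A-Za-z0-9]+", title)
    kept = [w for w in words if w.lower() not in stopwords]
    return '"{}"/{}'.format(" ".join(kept), min_matches)


print(quorum_query("Looking for Landlords - Long term lease - No fees!!!"))
# "Looking Landlords Long term lease No fees"/1
```

Raising min_matches (say to 2 or 3) is the "change the quorum" tweak mentioned above.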

php/mysql/ajax: google style search with suggestions

I have an ajax script that searches database tables for expressions similar to google search. The SELECT statement just uses LIKE and finds matches in the relevant fields. It worked fine at first but as content has grown, it is giving way too many matches for most search strings.
For example, if you search for att, you get att but also attention, attaboy, buratta etc.
Good search engines such as Google seem to have an intermediate table of suggestions that have been vetted by others. Rather than search the data directly, they seem to search the approved phrases such as AT&T and succeed in narrowing the number of results. Has anyone coded something like this, and can you suggest the right database schema and query to get relevant results?
Right now I am searching a table of, say, names directly with something like
$sql = "SELECT lastname from people WHERE lastname LIKE '%$searchstring%'";
I imagine besides people I should create some intermediate table along the lines of
people
id|firstname|lastname|description
niceterms
id|niceterm|peopleid
Then the query could be:
$sql = "SELECT p.lastname, p.id, n.niceterm, n.peopleid
FROM `people` p
LEFT JOIN `niceterms` n
on p.id = n.peopleid
WHERE niceterm LIKE '%$searchterm%'";
..so when you type something in the search box, you get nice search terms that will yield better results.
But how do I populate the niceterms table? Is this the right approach? I'm not trying to create a whole backweb or pagerank. I just want to narrow search results so they are relevant.
Thanks for any suggestions.
You might want to take a look at FULLTEXT search in MySQL. It allows you to create powerful queries based on relevance. You can, for example, create a BOOLEAN search which lets you add a score column to your result. The score will be based on rules like: does the text start with a character combination (yes? +2; no, but it does contain the combination: +1).
The below code is just another column and it has 3 rules in it:
Does the p1.name field contain Bl or rock? if yes -> add score
Does the p1.name field start with either Bl or rock? if yes -> add score
Is the p1.name equal to Bl rock? if yes -> add score
MATCH (p1.name) AGAINST('>Bl* >rock* >((+Bl*) (+rock*)) >("Bl rock")' IN BOOLEAN MODE) AS `match`
Now just order by match and it will show you the most relevant searches. You can also combine the order by with multiple statements and add a limit like below:
Orders by most recent date, highest match and then orders the matches that have the same score by their character length
ORDER BY `date` DESC, `match` DESC, LENGTH(`p1`.`name`) ASC
Keep in mind that the above code only approximates a relevant result based on common cases. Copying Google will be impossible, since their algorithms for optimal results and speed are incredible.
If FULLTEXT search is a step too far, try making a tag system. Tagging content with unique tag combinations will also give more reliable search results.
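The scoring idea described above (+2 when the text starts with the search string, +1 when it only contains it, ties broken by length) can also be tried outside the database. A small Python sketch follows; the function names and the sample names are invented for the example.

```python
def suggestion_score(candidate, search):
    """+2 if the candidate starts with the search string, +1 if it only
    contains it, 0 if there is no match (case-insensitive)."""
    c = candidate.lower()
    s = search.lower()
    if c.startswith(s):
        return 2
    if s in c:
        return 1
    return 0


def rank_suggestions(candidates, search):
    """Return matching candidates, best score first, shorter ones first
    among equal scores (hypothetical helper)."""
    scored = [(suggestion_score(c, search), c) for c in candidates]
    matches = [(score, c) for score, c in scored if score > 0]
    matches.sort(key=lambda pair: (-pair[0], len(pair[1])))
    return [c for _, c in matches]


print(rank_suggestions(["buratta", "attention", "att", "attaboy", "smith"], "att"))
# ['att', 'attaboy', 'attention', 'buratta']
```

This does not reduce the number of matches by itself, but combined with a LIMIT it pushes the closest ones to the top.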
