Keyword search using PHP MySql? - php

I have title (varchar), description (text), keywords (varchar) fields in my mysql table.
I added the keywords field because I thought I would only be searching that field, but I now need to search across all three fields. So for the keywords "word1 word2 word3", my query becomes:
SELECT * FROM myTable
WHERE (
name LIKE '%word1%' OR description LIKE '%word1%' OR keywords LIKE '%word1%'
OR name LIKE '%word2%' OR description LIKE '%word2%' OR keywords LIKE '%word2%'
OR name LIKE '%word3%' OR description LIKE '%word3%' OR keywords LIKE '%word3%')
AND status = 'live'
It looks a bit messy, but it works. Now I need to implement synonym search. Assuming a few synonyms are available for a given word, the query becomes even messier as I loop through all of the words. As the requirements get clearer, I will also need to join myTable to some other tables.
So
Do you think the above approach is messy and will cause problems as the data grows?
How can I avoid the above mess? Is there a cleaner solution I can use? Any example will help me.
Is there any other method/technique you can recommend to me?
With thanks
EDIT
@Peter Stuifzand suggested that I could create one search_index table, store the contents of all 3 fields (title, keywords, description) in it, and do a full text search on that. I understand that this table would also include a reference to the myTable primary key.
But my advanced search may include joining myTable with a Category table and a geographic_location table (for searching within 10, 20 miles etc.), filtering by some other criteria and, of course, sorting the search results. Do you think using MySQL fulltext will slow it down?

When your queries are getting out of hand, it's sometimes better to write parts of it in SQL and other parts in your programming language of choice.
You could also use fulltext search. You can create a separate table with all the fields that you want to search and add a FULLTEXT index to it.
CREATE TABLE `search_index` (
`id` INT NOT NULL,
`data` TEXT,
FULLTEXT (`data`)
);
SELECT `id` FROM `search_index` WHERE MATCH(`data`) AGAINST('word1 word2 word3');
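
A minimal sketch of the "write parts of it in your programming language" idea, shown in Python for illustration (the same pattern should carry over to PHP with bound parameters). It generates the OR chain from a word list instead of writing it by hand, and expands each word with synonyms first. The function names, the synonym dictionary and the `%s` placeholder style (as used by common Python MySQL drivers) are assumptions, not something from the question.

```python
def expand_with_synonyms(words, synonyms):
    """Return the words plus their synonyms, in order, without duplicates.

    `synonyms` is a hypothetical dict such as {"car": ["auto", "vehicle"]}.
    """
    expanded = []
    for word in words:
        for candidate in [word] + list(synonyms.get(word, [])):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_keyword_query(words, columns=("name", "description", "keywords")):
    """Return (sql, params) matching any word in any column via LIKE.

    Column names are interpolated into the SQL, so they must come from
    trusted code, never from user input. The words are bound as parameters.
    """
    if not words:
        raise ValueError("at least one search word is required")
    clauses = []
    params = []
    for word in words:
        # Escape LIKE wildcards so the user's text is matched literally.
        escaped = (
            word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        for column in columns:
            clauses.append(f"{column} LIKE %s")
            params.append(f"%{escaped}%")
    sql = (
        "SELECT * FROM myTable WHERE ("
        + " OR ".join(clauses)
        + ") AND status = %s"
    )
    params.append("live")
    return sql, params
```

The returned pair would be passed to the driver's `execute(sql, params)` call. This keeps the query readable in code, but it does not change the fact that each `LIKE '%...%'` forces a scan, so the fulltext approach above is still likely to scale better.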

One more way (sometimes it's better but it depends...)
SELECT
id, name, description, keywords
FROM
myTable
WHERE
name REGEXP '.*(word1|word2|word3).*' OR
description REGEXP '.*(word1|word2|word3).*' OR
keywords REGEXP '.*(word1|word2|word3).*'
;
PS: But MATCH(cols) AGAINST('expr') possibly is better for your case.

If at all possible, you should look into fulltext search.

Given the expanded requirements, you might want to consider using Apache Solr (see http://lucene.apache.org/solr/). It is a faceted search engine designed for full text searching, with a RESTful interface that can return XML or JSON. I am using it on a few projects and it works well.
The only area I see you hitting some problems is potentially with the proximity search, but with some additional logic for building the query it should work.

Related

Searching a field for the whole word, and nothing but the word

I have a PHP interface with a keyword search, working off a DB(MySQL) which has a Keywords field.
The way in which the keywords field is set up is as follows, it is a varchar with all the words formatted as shown below...
the, there, theyre, their, thermal etc...
If I want to return only rows containing the exact word 'the', how would this be achieved?
I have tried using 'the%' and '%the' in the PHP, but that does not return all of the rows in which the keyword appears.
Is there a better (more accurate) way to go about this?
Thanks
If you want to select the rows that have exactly the keyword the:
SELECT * FROM table WHERE keyword='the'
If you want to select the rows that have the keyword the anywhere in them:
SELECT * FROM table WHERE keyword LIKE '%the%'
If you want to select the rows that start with the keyword the:
SELECT * FROM table WHERE keyword LIKE 'the%'
If you want to select the rows that end with the keyword the:
SELECT * FROM table WHERE keyword LIKE '%the'
Try this
SELECT * FROM tablename
WHERE fieldname REGEXP '[[:<:]]test[[:>:]]'
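[[:<:]] and [[:>:]] are markers for word boundaries. (From MySQL 8.0.4 onward, which uses the ICU regular expression library, these markers are not supported; use \\b instead, e.g. REGEXP '\\btest\\b'.)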
MySQL Regular Expressions
If you also search for the commas, you can be sure you are getting the whole word (the last condition covers a field that contains only that word):
where keywordField like '%, the, %'
or keywordField like '%, the'
or keywordField like 'the, %'
or keywordField = 'the'
Maybe I didn't understand the question properly, but if you want all the words in which 'the' appears, a LIKE '%word%' should work.
If the table is HUGE, LIKE '%word%' can become too slow to be practical, because it has to scan every row. That can be solved in 2 ways:
1- Use a DB with better built-in text search (not many people would choose this one, though). For example, SQL Server has a 'CONTAINS' predicate that works better than LIKE '%word%'.
2- Use an external search tool based on an inverted index. I used Sphinx for a project and it works quite well. This is better if you rarely UPDATE the rows you want to search, which should be the case.
Sphinx, for example, generates an index file from your MySQL table and uses this file to answer the search (it's very fast). The file has to be re-indexed every time you insert or update rows in the table, which makes it a much better solution if you rarely update or insert rows.
It looks like you have a one to many relationship going on within a column. It might be better to create a separate table for keywords with a row for each keyword and a foreign key to whatever it is you're searching on.
Doing like '%???%' is generally a bad idea because the DB can't make use of an index, so it will scan the whole table. Whether this matters will depend on the size of the data you're working with, but it's worth considering up front. The single best way to help DB performance is in the initial table design, which can be tricky to change later.
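
A rough sketch of the separate-keyword-table idea, run against an in-memory SQLite database from Python so that it is self-contained (with MySQL the schema and query would look much the same). The table and column names `item`, `item_keyword`, `item_id` and the helper `items_with_keyword` are made up for illustration. With one keyword per row, a whole-word match becomes a plain equality test that can use an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
    -- One row per (keyword, item) pair instead of a comma-separated column.
    CREATE TABLE item_keyword (
        item_id INTEGER NOT NULL REFERENCES item(id),
        keyword TEXT NOT NULL,
        PRIMARY KEY (keyword, item_id)
    );
    """
)
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "first"), (2, "second")])
conn.executemany(
    "INSERT INTO item_keyword VALUES (?, ?)",
    [(1, "the"), (1, "thermal"), (2, "there"), (2, "their")],
)


def items_with_keyword(conn, word):
    """Return ids of items tagged with exactly `word` (no partial matches)."""
    rows = conn.execute(
        "SELECT i.id FROM item AS i "
        "JOIN item_keyword AS k ON k.item_id = i.id "
        "WHERE k.keyword = ? ORDER BY i.id",
        (word,),
    ).fetchall()
    return [row[0] for row in rows]
```

Here `items_with_keyword(conn, "the")` finds only item 1, even though item 2 has keywords that start with "the".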

MySQL PHP match words in query with database column

I have the following structure of a table [id - title - tag - ..]
I want to achieve the following:
If there is a record in table with title "I love my job and it is my hobby"
then a query containing two words from that sentence, e.g. "love hobby", should select it. It should give me the above title and not, for example, "I love my job". At the very least, titles matching more of the query keywords should come first and those matching fewer should come later.
How can I do this search on the title column of my table?
I apologize if the explanation is not clear; I am more than happy to clarify.
Thank you all
Try this :
SELECT title FROM your_table WHERE title LIKE '%love%' AND title LIKE
'%hobby%'
Look into MySQL's built-in full text search capabilities. In boolean mode, you could transform your query to +love +hobby and have results returned without full table scans. Be aware that before MySQL 5.6 this only works with MyISAM tables, so on older versions you might want to move the indexed data out of the main tables, since MyISAM doesn't support things like foreign keys or transactions.
For more advanced free text indexing you could try Sphinx (which also has a MySQL-like interface) or Solr.
If you're using MyISAM or InnoDB (MySQL 5.6 or later), you can use the MySQL fulltext search:
SELECT * FROM table_name WHERE MATCH (title) AGAINST ('love hobby' IN BOOLEAN MODE);
Written like this, it also matches rows that contain only one of the words; use '+love +hobby' to require both.
Read this: https://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
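
The "+love +hobby" transformation mentioned above could be sketched like this in Python (the function name `to_boolean_mode` is hypothetical). It keeps only word characters, so boolean-mode operators typed by the user, such as `-` or `*`, are dropped rather than passed through to MySQL.

```python
import re


def to_boolean_mode(query):
    """Turn 'love hobby' into '+love +hobby' so every word is required."""
    # \w+ keeps letters, digits and underscores and discards punctuation,
    # including boolean-mode operators like + - * " ( ).
    words = re.findall(r"\w+", query)
    return " ".join("+" + word for word in words)
```

The result is meant to be bound as the parameter of `AGAINST (... IN BOOLEAN MODE)`. Note that this simple split also breaks words at apostrophes and hyphens, which may or may not be what you want.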
You can also use MySQL REGEXP
SELECT title FROM table WHERE title REGEXP ' love .+ hobby';
If you have it as a single string then try:
SELECT title FROM table WHERE title REGEXP REPLACE('love hobby', ' ', '.+ ');

Search a string in multiple fields of a table

I have a table User which has the fields (id, first_name, middle_name, last_name).
I want to write a query to find a user by his name. The name may be first name, middle name or last name.
$sql = "SELECT * FROM user
WHERE first_name like '%$name%' OR
middle_name like '%$name%' OR
last_name like '%$name%'";
Is this an efficient query?
(Leave the security issue for the time being.)
Alter the table to add a composite FULLTEXT index on first_name, middle_name and last_name, then use this query:
select *
from table_name
where match (`first_name`,`middle_name`,`last_name`) against('name')
It is much faster than your query.
As soon as you have a LIKE '%something%' in your WHERE clause, you force a table scan. So yes, it is inefficient, but one or three LIKE statements will make little difference.
The table scan is the big performance hit.
Consider looking at MySQL's Full Text Search capability. It is designed to answer this type of query much more efficiently.
If you need to search for a pattern in more than one field, and you have permission to change the table schema, I would suggest implementing FULL-TEXT SEARCH.
Hope it helps :)

MySQL, PHP Relative search results / Optimization

I understand LIKE results with wildcards etc. What I need to know is a good way to get search results with the most relevant at the top.
For Example:
I search for "Front Brake CarModel" or something similar.
Currently I explode the string by spaces and add an OR condition to the WHERE clause for each word, so the query looks something like this:
SELECT * FROM table WHERE article_text LIKE '%Front%' OR article_text LIKE '%Brake%' OR article_text LIKE '%CarModel%'
Due to my novice searching skills, this is not great, as it returns results for every word in the search term in no particular order. What I would like is to sort the results so that the articles containing the most search words come first, if that makes sense.
Advice?
EDIT: The table is InnoDB and I cannot change the engine because of foreign key constraints, which removes the ability for me to use FULLTEXT indexing :(
This can be done easily with a fulltext index (InnoDB supports FULLTEXT indexes from MySQL 5.6 onward).
ALTER TABLE table ADD FULLTEXT INDEX `ft_search` (`article_text`);
SELECT *, MATCH(article_text) AGAINST('Front Brake CarModel') AS score
FROM table
WHERE MATCH(article_text) AGAINST('Front Brake CarModel') ORDER BY score DESC;
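
If a fulltext index really is not available, one fallback is to count how many of the search words each row matches and sort by that count. The sketch below does this in Python against an in-memory SQLite table so that it runs on its own; it relies on a `LIKE` comparison evaluating to 1 or 0, which is also how MySQL behaves. The table name `article`, the sample rows and the function `rank_by_matches` are assumptions for illustration, and every query is still a full table scan.

```python
import sqlite3


def rank_by_matches(conn, words):
    """Return (id, score) rows, where score = number of words found."""
    if not words:
        return []
    # Each LIKE evaluates to 1 or 0, so their sum is the match count.
    score = " + ".join("(article_text LIKE ?)" for _ in words)
    sql = (
        f"SELECT id, {score} AS score FROM article "
        f"WHERE {score} > 0 ORDER BY score DESC, id"
    )
    params = [f"%{word}%" for word in words]
    # The score expression appears twice, so the parameters are bound twice.
    return conn.execute(sql, params * 2).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, article_text TEXT)")
conn.executemany(
    "INSERT INTO article VALUES (?, ?)",
    [
        (1, "Front brake pads for CarModel"),
        (2, "Rear brake discs"),
        (3, "Front seat covers for CarModel"),
        (4, "Engine oil"),
    ],
)
print(rank_by_matches(conn, ["Front", "Brake", "CarModel"]))
# prints [(1, 3), (3, 2), (2, 1)]
```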

Tag based searching with MySQL

I want to write a tag based search engine in MySQL, but I don't really know how to get to a pleasant result.
I used LIKE, but as I stored over 18k keywords in the database, it's pretty slow.
What I got is a table like this:
id(int, primary key) article_cloud(text) keyword(varchar(40), FULLTEXT INDEX)
So I store one keyword per row and save all the referring article numbers in article_cloud.
I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword. But I also want a suggest search, so that there are relevant articles popping up, while the user is typing. So I still need a similar statement to LIKE, but faster. And I have no idea what I could do.
Maybe this is the wrong concept of tag based searching. If you know a better one, please let me know. I'm fighting with this for days and can't figure out a satisfying solution. Thanks for reading :)
MATCH() AGAINST() / FULLTEXT searching is a quick fix to a problem - but your schema makes no sense at all - surely there are multiple keywords in each article? And using a fulltext index on a column which only contains a single word is rather dumb.
"and save all the referring article numbers in article_cloud"
No! storing multiple values in a single column is VERY bad practice. When those values are keys to another table, it's a mortal sin!
It looks like you've got a long journey ahead of you to create something which will work efficiently; the quickest route to the goal is probably to use Google or Yahoo's indexing services on your own data. But if you want to fix it yourself....
See this answer on creating a search engine - the keywords should be in a separate table with a N:1 relationship to your articles, primary key on keyword and article id, e.g.
CREATE TABLE article (
id INTEGER NOT NULL AUTO_INCREMENT,
modified TIMESTAMP,
content TEXT,
...
PRIMARY KEY (id)
);
CREATE TABLE keyword (
word VARCHAR(20),
article_id INTEGER, /* references article.id */
relevance FLOAT DEFAULT 0.5, /* allow users to record relevance of keyword to article */
PRIMARY KEY (word, article_id)
);
CREATE TEMPORARY TABLE search (
word VARCHAR(20),
PRIMARY KEY (word)
);
Then split the words entered by the user, convert them to a consistent case (same as used for populating the keyword table) and populate the search table, then find matches using....
SELECT article.id, SUM(keyword.relevance)
FROM article, keyword, search
WHERE article.id=keyword.article_id
AND keyword.word=search.word
GROUP BY article_id
ORDER BY SUM(keyword.relevance) DESC
LIMIT 0,3
It'll be a lot more efficient if you can maintain a list of words or rules about words NOT to use as keywords (e.g. ignore any words of 3 chars or less in mixed or lower case will omit stuff like 'a', 'to', 'was', 'and', 'He'...).
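
The splitting and filtering step described above might look like this in Python. It is only a sketch: the function name `normalise_terms` is made up, and the rule it applies is one reading of the suggestion (drop words of 3 characters or fewer unless they are all upper case, then lower-case everything). The same function would be used both when populating the keyword table and when filling the temporary search table.

```python
import re


def normalise_terms(text):
    """Split text into lower-case search terms, skipping short filler words."""
    terms = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # Drop short words in mixed or lower case ('a', 'to', 'He', ...),
        # but keep short all-upper-case words, which may be acronyms.
        if len(word) <= 3 and not word.isupper():
            continue
        term = word.lower()
        if term not in terms:
            terms.append(term)
    return terms
```

For example, "He was at the USA Front brake" would be reduced to the terms usa, front and brake.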
Have a look at Sphinx and Lucene
"I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword."
What do you think FULLTEXT means?
I had 40,000 entries in my table with no indexes (local use), and a search with LIKE '%SOMETHING%' took at most 0.1 sec.
You can also LIMIT your query's output.
