I want to write a tag based search engine in MySQL, but I don't really know how to get to a pleasant result.
I used LIKE, but as I stored over 18k keywords in the database, it's pretty slow.
What I got is a table like this:
id(int, primary key) article_cloud(text) keyword(varchar(40), FULLTEXT INDEX)
So I store one keyword per row and save all the referring article numbers in article_cloud.
I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword. But I also want a suggest search, so that there are relevant articles popping up, while the user is typing. So I still need a similar statement to LIKE, but faster. And I have no idea what I could do.
Maybe this is the wrong concept of tag based searching. If you know a better one, please let me know. I'm fighting with this for days and can't figure out a satisfying solution. Thanks for reading :)
MATCH() AGAINST() / FULLTEXT searching is a quick fix to a problem - but your schema makes no sense at all - surely there are multiple keywords in each article? And using a fulltext index on a column which only contains a single word is rather dumb.
and save all the referring article numbers in article_cloud
No! storing multiple values in a single column is VERY bad practice. When those values are keys to another table, it's a mortal sin!
It looks like you've got a long journey ahead of you to create something which will work efficiently; the quickest route to the goal is probably to use Google or Yahoo's indexing services on your own data. But if you want to fix it yourself....
See this answer on creating a search engine - the keywords should be in a separate table with a N:1 relationship to your articles, primary key on keyword and article id, e.g.
CREATE TABLE article (
id INTEGER NOT NULL AUTO_INCREMENT,
modified TIMESTAMP,
content TEXT,
...
PRIMARY KEY (id)
);
CREATE TABLE keyword (
word VARCHAR(20),
article_id INTEGER, /* references article.id */
relevance FLOAT DEFAULT 0.5, /* allow users to record relevance of keyword to article */
PRIMARY KEY (word, article_id)
);
CREATE TEMPORARY TABLE search (
word VARCHAR(20),
PRIMARY KEY (word)
);
Then split the words entered by the user, convert them to a consistent case (same as used for populating the keyword table) and populate the search table, then find matches using....
SELECT article.id, SUM(keyword.relevance)
FROM article, keyword, search
WHERE article.id=keyword.article_id
AND keyword.word=search.word
GROUP BY article.id
ORDER BY SUM(keyword.relevance) DESC
LIMIT 0,3;
It'll be a lot more efficient if you can maintain a list of words, or rules about words, NOT to use as keywords (e.g. ignoring any words of 3 chars or less in mixed or lower case will omit stuff like 'a', 'to', 'was', 'and', 'He'...).
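A minimal sketch of the steps above, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained). The sample rows and the helper names rank_articles and suggest are hypothetical. suggest is an addition that is not part of the answer: it covers the "search while typing" case with a prefix pattern, LIKE 'abc%', which in MySQL can use the index that starts with word, unlike '%abc%'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE keyword (
    word       VARCHAR(20),
    article_id INTEGER REFERENCES article (id),
    relevance  FLOAT DEFAULT 0.5,
    PRIMARY KEY (word, article_id)
);
CREATE TEMPORARY TABLE search (word VARCHAR(20) PRIMARY KEY);
""")

# Hypothetical sample data: three articles and their keywords.
conn.executemany("INSERT INTO article (id, content) VALUES (?, ?)",
                 [(1, "..."), (2, "..."), (3, "...")])
conn.executemany(
    "INSERT INTO keyword (word, article_id, relevance) VALUES (?, ?, ?)",
    [("mysql", 1, 0.9), ("search", 1, 0.8), ("mysql", 2, 0.4),
     ("sphinx", 3, 0.7), ("search", 3, 0.5)],
)


def rank_articles(user_input, limit=3):
    """Split the input, lower-case it, fill the search table, rank matches."""
    words = {w.lower() for w in user_input.split()}
    conn.execute("DELETE FROM search")
    conn.executemany("INSERT INTO search (word) VALUES (?)",
                     [(w,) for w in words])
    rows = conn.execute("""
        SELECT article.id, SUM(keyword.relevance) AS score
        FROM article
        JOIN keyword ON article.id = keyword.article_id
        JOIN search ON keyword.word = search.word
        GROUP BY article.id
        ORDER BY score DESC
        LIMIT ?""", (limit,)).fetchall()
    return [(article_id, round(score, 2)) for article_id, score in rows]


def suggest(prefix, limit=5):
    """Return stored keywords that start with what the user has typed so far."""
    # Keep only letters and digits so the input cannot inject LIKE wildcards.
    cleaned = "".join(ch for ch in prefix.lower() if ch.isalnum())
    rows = conn.execute(
        "SELECT DISTINCT word FROM keyword WHERE word LIKE ? ORDER BY word LIMIT ?",
        (cleaned + "%", limit)).fetchall()
    return [word for (word,) in rows]


print(rank_articles("MySQL search"))
print(suggest("s"))
```

In MySQL the same queries apply; only the connection code and the parameter placeholder style would differ.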
Have a look at Sphinx and Lucene
I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword.
what do you think that FULLTEXT means?
I had 40,000 entries in my table, using no indexes (local use), and a search with LIKE '%SOMETHING%' took at most 0.1 sec.
You can also LIMIT your query's output.
Related
I know this question was discussed a lot of times.
Anyway, I would like to figure out again.
Well, I have a table "articles" that contains these fields:
title (varchar 255)
keywords (varchar 255)
content_body_1 (mediumtext)
content_body_2 (mediumtext)
There is an index on "title" and "keywords". However, there is no index on MEDIUMTEXT fields.
I need to perform "whole word" search on all these fields. I am now doing this using REGEXP:
SELECT * FROM `articles` WHERE `content_body_1` REGEXP '[[:<:]]"keyword"[[:>:]]'
And so on. It's okay for 100 articles, but it's VERY slow (2-3 seconds) on 1000 articles. REGEXP does not use indexing in MySQL. What if I have 10000 articles? Is there any way for faster search by whole keyword?
How can I get that? Is FULLTEXT much faster? If yes - how can I design my database? And also what should I do with FULLTEXT limit of minimum characters to search?
Thanks.
FULLTEXT is much faster than REGEXP, when it applies. As a test, I found a word in 4 rows out of 173,979 rows in 0.06 seconds.
You need to do ALTER TABLE tbl ADD FULLTEXT(content_body_1); to build a FT index for that one column.
You can combine multiple columns into a single FT index -- if you want to search across all of them. If you also want to search individual columns, then add single-column FT indexes.
Study the details; MyISAM has one set of caveats; InnoDB has a different set.
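To see why an index on words beats scanning every row with REGEXP, here is a toy inverted index in Python. It is only a conceptual sketch, not MySQL's actual FULLTEXT implementation; the names build_index and search_word, the sample rows, and the min_len parameter (standing in for the minimum word length the question asks about) are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_index(rows, min_len=4):
    """Map each sufficiently long word to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text.lower()):
            if len(word) >= min_len:  # shorter words are never indexed
                index[word].add(row_id)
    return index


def search_word(index, word):
    """Whole-word lookup: one dictionary access instead of a scan of all rows."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample rows keyed by id.
rows = {
    1: "Replacing the front brake pads",
    2: "Brakes and brake fluid explained",
    3: "How to fix a flat tire",
}
index = build_index(rows)
print(search_word(index, "brake"))  # rows containing the whole word "brake"
print(search_word(index, "fix"))    # too short to be indexed with min_len=4
```

The second lookup shows the minimum-length caveat: a word shorter than the limit is simply not in the index. In MySQL that limit is set by ft_min_word_len for MyISAM (default 4) and innodb_ft_min_token_size for InnoDB (default 3), and the index has to be rebuilt after changing it.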
Why are you using a regexp for a full text search? You could just as easily use the % character and it's probably much faster than doing a regex.
SELECT * FROM articles WHERE content_body_1 LIKE '%keyword%'
This will find any rows where your content_body_1 contains the keyword somewhere in it.
I have a table with 2 columns as Email and ID. I want to search exact matching Email value in column.
I have set up my table with the MyISAM engine and put a FULLTEXT index on the Email column. When I run a query to search for an exact match, it sometimes works and sometimes fails.
this is my table definition
CREATE TABLE `tbl_email` (
`email` varchar(60),
`uid` int(11),
FULLTEXT KEY `EmailIndex` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
And this is my Query to match against my email value
select uid from tbl_email where MATCH(email) AGAINST ('abcdefghi#yahoo.com')
limit 1;
It sometimes works and sometimes fails to return a matching result even though there is a matching row in the table. Am I doing anything wrong? What should I do to match an exact value with FULLTEXT searching?
I also tried IN BOOLEAN MODE, but that was no use either:
select uid from tbl_email where MATCH(email) AGAINST ('abcdefghi#yahoo.com'
IN BOOLEAN MODE) limit 1;
As far as I know, FULLTEXT index searching interprets the search string as a phrase in natural human language and breaks it into words where necessary, as described at http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html. Most important, look at this:
*The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. *
And I believe every email in your table will contain .com. The parser treats characters such as # and . as word separators rather than as part of a word, so each address is split into separate words, and a word such as com that appears in most rows (and is shorter than the default minimum word length) is ignored.
You would be better off with an ordinary index on the email column in InnoDB, as it will be better for fast inserts, and a simple WHERE clause.
I don't know what algorithm FULLTEXT searching uses compared with a normal index for string search, but I would guess that a FULLTEXT lookup takes longer here than a normal index lookup, because every email value breaks down into the same common pieces such as .com. This is just my understanding; I am not a search algorithm designer.
You don't have to match using full text, you can simply run:
SELECT uid FROM tbl_email WHERE email='abcedfghi#yahoo.com' LIMIT 1;
That query should return exactly what you want to fetch.
According to the MySQL 5.5 reference page, to find an exact phrase with FULLTEXT in boolean mode you enclose it in single and double quotes. The single quotes are the string delimiters, while the double quotes mark the phrase.
e.g. : ... MATCH(email) AGAINST('"someone#example.com"' IN BOOLEAN MODE) ...
However, echoing what others have already said, a simple WHERE clause ought to get you by from the looks of your query. I think FULLTEXT is better suited to finding keywords in heaps of information within a record, not single-value fields like an email field.
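The plain WHERE approach that several answers recommend can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL (an assumption so the example runs on its own); the index name idx_email, the helper find_uid, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_email (email VARCHAR(60), uid INTEGER);
-- An ordinary (B-tree) index, not a FULLTEXT one.
CREATE INDEX idx_email ON tbl_email (email);
""")
conn.executemany(
    "INSERT INTO tbl_email (email, uid) VALUES (?, ?)",
    [("abcdefghi#yahoo.com", 1), ("someone#example.com", 2)],
)


def find_uid(email):
    """Exact match on the whole column value; no word splitting involved."""
    row = conn.execute(
        "SELECT uid FROM tbl_email WHERE email = ? LIMIT 1", (email,)
    ).fetchone()
    return row[0] if row else None


print(find_uid("abcdefghi#yahoo.com"))
print(find_uid("abcdefghi"))  # a fragment of an address is not a match
```

Because the comparison is on the complete value, stopwords, minimum word length, and the 50% rule never come into play.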
I understand LIKE results with wildcards etc. What I need to know is a good way to get search results with the most relevant at the top.
For Example:
I search for "Front Brake CarModel" or something similar.
Currently I explode the string by spaces and add an OR condition to the WHERE clause for each word, so the query looks something like this:
SELECT * FROM table WHERE article_text LIKE '%Front%' OR article_text LIKE '%Brake%' OR article_text LIKE '%CarModel%'
Due to my novice searching skills, this is not great, as it returns results for every word in the search term in no useful order. What I would like is to get the results sorted so that the articles with the most matched words are at the top. If that makes sense.
Advice?
EDIT : Table is type InnoDB and I cannot change the type due to foreign key constraints. That removes my ability to use FULLTEXT indexing :(
This can be done easily with a fulltext index (InnoDB supports FULLTEXT indexes as of MySQL 5.6).
ALTER TABLE `table` ADD FULLTEXT INDEX `ft_search` (`article_text`);
SELECT *, MATCH(article_text) AGAINST('Front Brake CarModel') AS score
FROM `table`
WHERE MATCH(article_text) AGAINST('Front Brake CarModel') ORDER BY score DESC;
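If a fulltext index is not available, the ordering the question asks for (most matched words first) can also be had from the existing LIKE approach by adding the comparisons together, since each LIKE evaluates to 1 or 0. A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL; the articles table name, its id column, the helper rank_by_hits, and the sample rows are assumptions. This still scans every row, so it only addresses the ordering, not the speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, article_text TEXT)")
conn.executemany(
    "INSERT INTO articles (id, article_text) VALUES (?, ?)",
    [
        (1, "Front brake pads for CarModel"),
        (2, "Rear brake discs"),
        (3, "Front bumper for CarModel"),
        (4, "Wiper blades"),
    ],
)


def rank_by_hits(search_string):
    """Return (id, hits) pairs, articles matching the most words first."""
    words = search_string.split()
    if not words:
        return []
    # One "(article_text LIKE ?)" term per word; each is 1 or 0, so the sum
    # is the number of search words found in the article.
    hits = " + ".join("(article_text LIKE ?)" for _ in words)
    sql = (
        f"SELECT id, hits FROM (SELECT id, {hits} AS hits FROM articles) AS scored "
        "WHERE hits > 0 ORDER BY hits DESC, id"
    )
    # Note: % and _ inside a search word are not escaped in this sketch.
    params = [f"%{word}%" for word in words]
    return conn.execute(sql, params).fetchall()


print(rank_by_hits("Front Brake CarModel"))
```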
I haven't touched any code in a good 4-5 months so just getting back into it today, usually takes me a week or so to get all the info flowing through my brain again once I take months off like that. So my project I am about to start will be a PHP/MySQL backend bookmarks database.
I want to create a nice searchable database with all my favorite websites/bookmarks. Each record will have multiple keywords assigned to it so I can easily search all my bookmarks for the term "php", and all records with "php" in their keyword column or title or otherwise will come back in a result set.
Here is my idea for the database so far...
auto_id = /*Auto incremented ID number for database*/
name/title = /*Name/title of the Website*/
description = /*brief description of the site*/
URL = /*URL to open when I click a link*/
clicks = /*increments by 1 everytime I click the link*/
date_created = /*datetime that URL bookmark was added*/
date_accessed = /*datetime field for when last clicked on*/
category = /*category name or number to create a folder like structure of bookmarks in groups*/
sub_category = /*some categories will have subcategories (ie programming->c## programming->PHP )*/
keywords = /*Keywords used for searching*/
It is pretty straightforward for me how to build this system, except that I am looking for help/advice on the best way to store the keywords. Each website/record I add to the DB can have one or more keywords. These keywords need to help with the searching part of my app. So how should I store keywords for a site in my database? I know I could just have a "keywords" column in the table and store the keywords for each record like this "php, web, etc, keyword4", so all keywords for each site are saved in 1 column, but this does not seem to be the best method when it comes to searching the database.
Please tell me how you would do this part? Thanks for any help
The best way to do this is to create a separate table to contain your keywords and then add an intersection (or join) table to join keywords with bookmarks.
CREATE TABLE bookmarks (
id INT NOT NULL,
... etc.
);
CREATE TABLE keywords (
id INT NOT NULL,
... etc.
);
CREATE TABLE bookmark_keywords (
bookmark_id INT NOT NULL,
keyword_id INT NOT NULL,
PRIMARY KEY (bookmark_id, keyword_id),
FOREIGN KEY (bookmark_id) REFERENCES bookmarks (id),
FOREIGN KEY (keyword_id) REFERENCES keywords (id)
);
When you insert a bookmark, you'd also insert any keywords that are being used and aren't already in the keywords table, as well as a row in bookmark_keywords in order to join the keyword with the bookmark.
Then, when you want to query for what keywords a bookmark has:
SELECT k.*
FROM keywords AS k
JOIN bookmark_keywords AS kb
ON kb.keyword_id = k.id
WHERE kb.bookmark_id = [ID of the bookmark]
And to query for what bookmarks share a particular keyword:
SELECT b.*
FROM bookmarks AS b
JOIN bookmark_keywords AS kb
ON kb.bookmark_id = b.id
WHERE kb.keyword_id = [ID of the keyword]
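The insert-and-query flow described above can be sketched end to end. This uses Python's sqlite3 module as a stand-in for MySQL (an assumption so the example runs on its own); the title, url, and word columns, the helper names add_bookmark and bookmarks_for, and the example.com URLs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT NOT NULL);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE bookmark_keywords (
    bookmark_id INTEGER NOT NULL REFERENCES bookmarks (id),
    keyword_id  INTEGER NOT NULL REFERENCES keywords (id),
    PRIMARY KEY (bookmark_id, keyword_id)
);
""")


def add_bookmark(title, url, words):
    """Insert a bookmark, any keywords not stored yet, and the join rows."""
    cur = conn.execute("INSERT INTO bookmarks (title, url) VALUES (?, ?)", (title, url))
    bookmark_id = cur.lastrowid
    for word in {w.strip().lower() for w in words}:
        # SQLite spelling; the MySQL equivalent is INSERT IGNORE.
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        keyword_id = conn.execute(
            "SELECT id FROM keywords WHERE word = ?", (word,)).fetchone()[0]
        conn.execute(
            "INSERT INTO bookmark_keywords (bookmark_id, keyword_id) VALUES (?, ?)",
            (bookmark_id, keyword_id))
    return bookmark_id


def bookmarks_for(word):
    """Titles of all bookmarks tagged with the given keyword."""
    rows = conn.execute("""
        SELECT b.title
        FROM bookmarks AS b
        JOIN bookmark_keywords AS kb ON kb.bookmark_id = b.id
        JOIN keywords AS k ON k.id = kb.keyword_id
        WHERE k.word = ?
        ORDER BY b.title""", (word.lower(),)).fetchall()
    return [title for (title,) in rows]


add_bookmark("PHP Manual", "https://example.com/php", ["php", "web"])
add_bookmark("MySQL Docs", "https://example.com/mysql", ["mysql", "web"])
print(bookmarks_for("web"))
```

The keyword "web" is stored once in keywords and linked twice in bookmark_keywords, which is what keeps the lookup an indexed equality match instead of a string scan.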
You're right, storing a comma-separated list in one column is not a good way to do it (this is called a repeating group and it violates the First Normal Form of relational database design).
Using a LIKE predicate with a leading wildcard (as in '%php%') is not a good choice, because it cannot benefit from an index. Searching for keywords this way can be hundreds or thousands of times slower than designing a proper database in normal form and adding indexes.
You need to store a second table listing keywords, and a third many-to-many table to pair keywords to applicable bookmarks. This is a pretty standard design for "tagging" in a relational database.
In non-relational databases like CouchDB or MongoDB, you can make one field a set of keywords, and index them so queries can be efficient. But not in a relational database.
See also:
How do you recommend implementing tags or tagging
Database Design for Tagging
Also when viewing those questions, check the many related questions in the column on the right.
The easiest, and fastest, search technique to implement is MySQL's LIKE operator. LIKE lets you search through a column for a specific string. Consider the following example...
auto_id  name           description
1        Cool PHP Site  you know you love it
2        PLARP!         its Ruby gems gems gems!
3        SqlWha         sql for the masses
4        FuzzD00dle     fun in the sun, with some fuzz
You could find all rows that contain the string 'php' in either the 'name' or 'description' field using the following query...
SELECT * FROM bookmarks WHERE name LIKE '%php%' OR description LIKE '%php%';
'%' is a wildcard character.
Reference on MySQL LIKE: http://www.tutorialspoint.com/mysql/mysql-like-clause.htm
You could also add a 'keywords' column and store the keywords in a comma delimited format (ie: plarp1, plarp2, plarp3), then search through that.
I have title (varchar), description (text), keywords (varchar) fields in my mysql table.
I kept keywords field as I thought I would be searching in this field only. But I now require to search among all three fields. so for keywords "word1 word2 word3", my query becomes
SELECT * FROM myTable
WHERE (
name LIKE '%word1%' OR description LIKE '%word1%' OR keywords LIKE '%word1%'
OR name LIKE '%word2%' OR description LIKE '%word2%' OR keywords LIKE '%word2%'
OR name LIKE '%word3%' OR description LIKE '%word3%' OR keywords LIKE '%word3%')
AND status = 'live'
Looks a bit messy but this works. But now I need to implement synonym search. so for a given word assuming there are a few synonyms available this query becomes more messy as I loop through all of the words. As the requirements are getting clearer, I will need to join this myTable to some other tables as well.
So
Do you think the above way is messy and will cause problems as the data grow?
How can I avoid above mess? Is there any cleaner solution I can go by? Any example will help me.
Is there any other method/technique you can recommend to me?
With thanks
EDIT
#Peter Stuifzand suggested that I could create one search_index table, store the info from all 3 fields (title, keyword, desc) in it, and do a full text search on that. I understand that this table will additionally include a reference to the myTable primary key.
But my advanced search may include joining mytable with a Category table and a geographic_location table (for searching within 10, 20 miles etc), filtering by some other criteria and, of course, sorting of search results. Do you think using mysql fulltext will not slow it down?
When your queries are getting out of hand, it's sometimes better to write parts of it in SQL and other parts in your programming language of choice.
And you could also use fulltext search for searching. You can create a separate table with all the fields that you want to search and add a FULLTEXT index.
CREATE TABLE `search_index` (
`id` INT NOT NULL,
`data` TEXT,
FULLTEXT (`data`)
);
SELECT `id` FROM `search_index` WHERE MATCH(`data`) AGAINST('word1 word2 word3');
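The first suggestion above, building part of the query in application code, can be sketched like this for the LIKE version with synonyms. The helper name build_search_query and the synonyms mapping are hypothetical; the table, column, and status names are taken from the question. The function only builds a parameterized statement and its values, to be passed to whatever MySQL client library is in use (the ? placeholder style is an assumption; some drivers use %s).

```python
def build_search_query(words, synonyms=None, columns=("name", "description", "keywords")):
    """Build one parameterized query covering every word, synonym, and column."""
    synonyms = synonyms or {}
    # Expand each word with its synonyms, keeping order and dropping duplicates.
    terms = []
    for word in words:
        for term in [word, *synonyms.get(word, [])]:
            if term not in terms:
                terms.append(term)
    if not terms:
        raise ValueError("at least one search word is required")
    # One "column LIKE ?" condition per (term, column) pair.
    conditions = " OR ".join(f"{col} LIKE ?" for _ in terms for col in columns)
    params = [f"%{term}%" for term in terms for _ in columns]
    sql = f"SELECT * FROM myTable WHERE ({conditions}) AND status = ?"
    return sql, params + ["live"]


sql, params = build_search_query(["car", "brake"], synonyms={"car": ["auto"]})
print(sql)
print(params)
```

The loop replaces the hand-written block of OR conditions, so adding a synonym or a fourth column does not change the code, only the inputs.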
One more way (sometimes it's better but it depends...)
SELECT
id, name, description, keywords
FROM
myTable
WHERE
name REGEXP 'word1|word2|word3' OR
description REGEXP 'word1|word2|word3' OR
keywords REGEXP 'word1|word2|word3'
;
PS: But MATCH(cols) AGAINST('expr') possibly is better for your case.
If at all possible, you should look into fulltext search.
Given the expanded requirements, you might want to consider using Apache Solr (see http://lucene.apache.org/solr/). It is a faceted search engine designed for full text searching. It has a RESTful interface that can return XML or JSON. I am using it on a few projects and it works well.
The only area I see you hitting some problems is potentially with the proximity search, but with some additional logic for building the query it should work.