Match exact value with FullText search in MyISAM - php

I have a table with two columns, email and uid. I want to search for an exact match on the email column.
I have set up the table with the MyISAM engine and put a FULLTEXT index on the email column. When I run a query to search for an exact match, it sometimes works and sometimes fails.
This is my table definition:
CREATE TABLE `tbl_email` (
`email` varchar(60),
`uid` int(11),
FULLTEXT KEY `EmailIndex` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
And this is my query to match against the email value:
select uid from tbl_email where MATCH(email) AGAINST ('abcdefghi#yahoo.com')
limit 1;
It sometimes works and sometimes fails to return the matching row even though there is a matching row in the table. Am I doing anything wrong? What should I do to match an exact value with full-text searching?
I also tried IN BOOLEAN MODE, like this, but it made no difference:
select uid from tbl_email where MATCH(email) AGAINST ('abcdefghi#yahoo.com'
IN BOOLEAN MODE) limit 1;

As far as I know, FULLTEXT index searching interprets the search string as a phrase in natural human language and breaks it into words for searching, as described at http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html. Most importantly, look here:
*The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match.*
The parser treats characters such as # and . as word separators, so each address is indexed as separate words (abcdefghi, yahoo, com), not as one value. I believe almost every email in your table will contain .com, and many will share a domain such as yahoo. The word com is shorter than the default minimum word length (ft_min_word_len is 4), so it is ignored, and any word present in 50% or more of the rows does not match in natural language mode. Only part of the address is left to match on, which is why the results look inconsistent.
You are better off with a regular index and a simple WHERE clause; InnoDB is also the better choice for fast inserts.
I don't know exactly what algorithm full-text search uses compared with a normal index for string lookups, but my guess is that the full-text route will be slower than a normal index for this purpose, because every email value breaks into the same common pieces (the # separator, .com and so on). This is just my understanding; I am not a search algorithm designer.
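To make the reasoning above concrete, here is a rough Python model of which query words would still take part in a natural language match. It is only a sketch under stated assumptions: the tokenizer is a simplification of MySQL's parser, the stopword list is left out, and the function names and sample rows are made up for illustration.

```python
import re

MIN_WORD_LEN = 4  # MyISAM's default ft_min_word_len


def tokenize(text):
    """Simplified parser: split on anything that is not a letter, digit or underscore."""
    return [w.lower() for w in re.split(r"[^A-Za-z0-9_]+", text) if w]


def searchable_terms(query, rows, threshold=0.5):
    """Return the query words that survive the length and 50% rules (hypothetical helper)."""
    row_words = [set(tokenize(r)) for r in rows]
    kept = []
    for word in tokenize(query):
        if len(word) < MIN_WORD_LEN:
            continue  # too short to be indexed at all
        share = sum(word in ws for ws in row_words) / len(rows)
        if share >= threshold:
            continue  # present in 50% or more of the rows: treated as too common
        kept.append(word)
    return kept


# Made-up sample data, using the same # placeholder as the question
emails = ["abcdefghi#yahoo.com", "john#yahoo.com", "mary#yahoo.com", "bob#gmail.com"]
print(searchable_terms("abcdefghi#yahoo.com", emails))
print(searchable_terms("bob#gmail.com", emails))
```

In this model the first search is carried by the local part alone, while the second cannot match on "bob" at all because it is shorter than four characters, which fits the "sometimes works, sometimes fails" behaviour.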

You don't have to match using full text, you can simply run:
SELECT uid FROM tbl_email WHERE email='abcdefghi#yahoo.com' LIMIT 1;
That query should return exactly what you want to fetch.

According to the MySQL 5.5 reference manual, to find an exact phrase with FULLTEXT you enclose it in double quotes inside the single-quoted string and search in boolean mode. The single quotes are the string delimiters, while the double quotes mark the phrase.
e.g. : ... MATCH(email) AGAINST('"someone#example.com"' IN BOOLEAN MODE) ...
However, echoing what others have already said, a simple WHERE clause ought to get you by, from the looks of your query. I think FULLTEXT is better suited to finding keywords in heaps of information within a record, not single-value fields like an email field.

Related

Why is MATCH against less effective than Exact match?

I have around 20 rows in a MySQL table whose Title column is Elsewhere; the other columns differ.
I'm currently using a query like the one below, since most of my searches (via a PHP file) need to work from a close guess. So I use a FULLTEXT index:
SELECT * FROM `my_db` WHERE MATCH (`Title`) AGAINST ('Elsewhere' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AND (Type ='movie' OR Type='series' OR Type IS NULL)
This works just fine, but sometimes the most obvious of matches, like this one, gives me 0 rows. By contrast, if I do something like:
SELECT * FROM `my_db` WHERE `Title` = "Elsewhere";
It gives me all 20 rows.
Shouldn't the first query return more results than the second, since it is less specific?
Note: I'm using MATCH for a search. I do not want to perform an exact match every time. Basically, user input from the client side is being searched for in the DB.
It turns out that "elsewhere" is a stopword for MyISAM full-text indexes: http://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html

Querying a table where the field contains any order of given text strings

I want to query a table as follows:
I have a field called "category" and my input match contains N separate words. I want the query to match all rows that contain all N words, but in any order.
For example if the field category contains "hello good morning world", my input query can contain "hello morning" or "good" or "world hello" and all are matches to the query.
How do I formulate such an SQL expression?
Also it would be good if the query can be made case insensitive.
If you are using MySQL you can use the boolean fulltext search feature to achieve this. You can put a + in front of each term and then only results with all the terms, in any order, will be returned. You will need to make sure the column containing the category field has a fulltext index specified on it for this to work. Other database engines probably have similar features. So for example you might do something like the following assuming there were a fulltext index over the category column...
SELECT * FROM myTable WHERE MATCH (category) AGAINST ('+term1 +term2 +term3' IN BOOLEAN MODE);
I would avoid using the LIKE operator as others have suggested: you would have to worry about mixed upper/lower case, and on a large table a % at the front of a LIKE search term causes a full table scan instead of an index lookup, which is horrible for performance.
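If the terms come from user input, the '+term1 +term2' string has to be built in application code. A minimal Python sketch follows; the helper name is hypothetical, and the %s placeholder assumes a MySQL driver that uses that style (PyMySQL and mysql-connector-python do). No database call is made here.

```python
import re


def build_boolean_query(user_input):
    """Turn free text into a '+word +word' boolean-mode search string."""
    # Keep only word characters, so user input cannot smuggle in
    # boolean operators such as -, ~, *, ", ( or ).
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + w for w in words)


# The search string is passed as a bound parameter, never concatenated into the SQL
SQL = "SELECT * FROM myTable WHERE MATCH (category) AGAINST (%s IN BOOLEAN MODE)"

params = (build_boolean_query("world hello"),)
print(SQL, params)
```

Every word gets a leading +, so a row must contain all of them, in any order. Words that are stopwords or shorter than the minimum word length are still subject to the index's own rules.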
I'm not writing the loop that will build this query for you. This will get the job done, but it will be pretty inefficient.
SELECT * FROM myTable
WHERE (
UPPER(category) LIKE '%HELLO%' AND
UPPER(category) LIKE '%GOOD%' AND
UPPER(category) LIKE '%MORNING%' AND
UPPER(category) LIKE '%WORLD%'
);
You could also research using regular expressions (REGEXP) in SQL.

MySQL Match Against Reserved Word in Field

In a database I work with, there are a few million rows of customers. To search this database, we use a match against Boolean expression. All was well and good, until we expanded into an Asian market, and customers are popping up with the name 'In'. Our search algorithm can't find this customer by name, and I'm assuming that it's because it's an InnoDB reserved word. I don't want to convert my query to a LIKE statement because that would reduce performance by a factor of five. Is there a way to find that name in a full text search?
The query in production is very long, but the portion that's not functioning as needed is:
SELECT
`customer`.`name`
FROM
`customer`
WHERE
MATCH(`customer`.`name`) AGAINST("+IN*+KYU*+YANG*" IN BOOLEAN MODE);
Oh, and the innodb_ft_min_token_size variable is set to 1 because our customers "need" to be able to search by middle initial.
It isn't a reserved word, but it is in the stopword list. For MyISAM you can override the list with ft_stopword_file to supply your own stopwords; for InnoDB full-text indexes, which is what you are using, the list is instead configured through innodb_ft_server_stopword_table or innodb_ft_user_stopword_table (or switched off with innodb_ft_enable_stopword). Two possible problems: (1) after changing it, you need to rebuild your fulltext index; (2) ft_stopword_file is a global variable: you can't alter it per session / location / language, so if you really need all the words and use a lot of different languages in one database, providing an empty list is almost the only way to go, which can hurt a bit in cases where you would like a stopword list to be used.

optimize tables for search using LIKE clause in MySQL

I am building a search feature for the messages part of my site. The messages table has a little over 9,000,000 rows and an index on the sender, subject, and message fields. I was hoping to use the MySQL LIKE clause in my query, for example:
SELECT sender, subject, message FROM Messages WHERE message LIKE '%EXAMPLE_QUERY%';
to retrieve results. Unfortunately, MySQL doesn't use indexes when a leading wildcard is present, and the leading wildcard is necessary because the search term could appear anywhere in the message (this is how the wildcards work, no?). Queries are very, very slow, and I cannot use a full-text index either because of the annoying 50% rule (I just can't afford to rule that much out). Is there any way (or any alternative) to optimize a query that uses LIKE with two wildcards? Any help is appreciated.
You should either use full-text indexes (you said you can't), design a full-text search by yourself or offload the search from MySQL and use Sphinx/Lucene. For Lucene you can use Zend_Search_Lucene implementation from Zend Framework or use Solr.
Normal indexes in MySQL are B+Trees, and they can't be used if the start of the string is not known (which is the case when you have a wildcard at the beginning).
Another option is to implement search on your own, using a reference table. Split the text into words and create a table that contains word, record_id. Then, at search time, split the query into words and look up each word in the reference table. This way you are not limited to the beginning of the whole text, only to the beginning of each word (and you'll match the rest of the words anyway).
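As a rough illustration of the reference-table idea, here is an in-memory Python sketch. It assumes a sorted list of (word, record_id) pairs as a stand-in for an indexed table, with bisect playing the role of the B+Tree lookup; all names and sample messages are made up.

```python
import bisect
import re


def build_index(records):
    """records: {record_id: text}. Returns a sorted list of (word, record_id) pairs."""
    pairs = set()
    for record_id, text in records.items():
        for word in re.findall(r"\w+", text.lower()):
            pairs.add((word, record_id))
    return sorted(pairs)


def ids_with_prefix(index, prefix):
    """Ids of records containing at least one word that starts with prefix."""
    prefix = prefix.lower()
    # Jump straight to the first candidate, as an index lookup on 'prefix%' would
    start = bisect.bisect_left(index, (prefix,))
    ids = set()
    for word, record_id in index[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        ids.add(record_id)
    return ids


def search(index, query):
    """Ids of records that match every word of the query (each treated as a prefix)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = ids_with_prefix(index, words[0])
    for w in words[1:]:
        result &= ids_with_prefix(index, w)
    return result


messages = {
    1: "Meeting example tomorrow",
    2: "Another EXAMPLE_QUERY here",
    3: "nothing relevant",
}
index = build_index(messages)
print(search(index, "exam"))
print(search(index, "xampl"))
```

The second search finds nothing: the lookup only works from the beginning of a word, which is the trade-off this approach accepts.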
'%EXAMPLE_QUERY%' is a very, very bad idea. Here are some suggestions:
A. Avoid wildcards at the start of LIKE patterns; use 'EXAMPLE_QUERY%' instead.
B. Create keywords on which you can easily use MATCH.
If you want to stick with MySQL, you should use FULLTEXT indexes. Full-text indexes index the words in a block of text. You can then search on whole words, or on word prefixes in boolean mode (example*), and return the results in order of relevance. So you can find the word "example" within a block of text, but you still can't search efficiently on "xampl" to find "example".
MySQL's full-text search is not great, but it is functional.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
select * from emp where ename like '%e';
returns the names that end with the letter e.
select * from emp where ename like 'A%';
returns the names that begin with the letter a.
select * from emp where ename like '_a%';
returns the names whose second letter is a.
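A quick way to try these three patterns is an in-memory SQLite table driven from Python. SQLite stands in for MySQL here (an assumption; for plain ASCII names like these, LIKE behaves the same in both), and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT)")
conn.executemany(
    "INSERT INTO emp (ename) VALUES (?)",
    [("Alice",), ("Adam",), ("Dave",), ("Bob",)],
)


def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT ename FROM emp WHERE ename LIKE ? ORDER BY ename", (pattern,)
    )
    return [r[0] for r in rows]


print(names_like("%e"))   # names ending in e
print(names_like("A%"))   # names starting with a
print(names_like("_a%"))  # names whose second letter is a
```

Only the 'A%' pattern has a fixed start, so it is the only one of the three that an ordinary index on ename could serve.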

Tag based searching with MySQL

I want to write a tag based search engine in MySQL, but I don't really know how to get to a pleasant result.
I used LIKE, but as I stored over 18k keywords in the database, it's pretty slow.
What I got is a table like this:
id(int, primary key) article_cloud(text) keyword(varchar(40), FULLTEXT INDEX)
So I store one keyword per row and save all the referring article numbers in article_cloud.
I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword. But I also want a suggest search, so that there are relevant articles popping up, while the user is typing. So I still need a similar statement to LIKE, but faster. And I have no idea what I could do.
Maybe this is the wrong concept of tag based searching. If you know a better one, please let me know. I'm fighting with this for days and can't figure out a satisfying solution. Thanks for reading :)
MATCH() AGAINST() / FULLTEXT searching is a quick fix to a problem - but your schema makes no sense at all - surely there are multiple keywords in each article? And using a fulltext index on a column which only contains a single word is rather dumb.
and save all the refering article numbers in article_cloud
No! storing multiple values in a single column is VERY bad practice. When those values are keys to another table, it's a mortal sin!
It looks like you've got a long journey ahead of you to create something which will work efficiently; the quickest route to the goal is probably to use Google or Yahoo's indexing services on your own data. But if you want to fix it yourself....
See this answer on creating a search engine - the keywords should be in a separate table with a N:1 relationship to your articles, primary key on keyword and article id, e.g.
CREATE TABLE article (
id INTEGER NOT NULL AUTO_INCREMENT,
modified TIMESTAMP,
content TEXT,
...
PRIMARY KEY (id)
);
CREATE TABLE keyword (
word VARCHAR(20),
article_id INTEGER, /* references article.id */
relevance FLOAT DEFAULT 0.5, /* allow users to record relevance of keyword to article */
PRIMARY KEY (word, article_id)
);
CREATE TEMPORARY TABLE search (
word VARCHAR(20),
PRIMARY KEY (word)
);
Then split the words entered by the user, convert them to a consistent case (same as used for populating the keyword table) and populate the search table, then find matches using....
SELECT article.id, SUM(keyword.relevance)
FROM article, keyword, search
WHERE article.id=keyword.article_id
AND keyword.word=search.word
GROUP BY article_id
ORDER BY SUM(keyword.relevance) DESC
LIMIT 0,3
It'll be a lot more efficient if you can maintain a list of words or rules about words NOT to use as keywords (e.g. ignore any words of 3 chars or less in mixed or lower case will omit stuff like 'a', 'to', 'was', 'and', 'He'...).
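For anyone who wants to try the keyword-table idea end to end, here is a small runnable Python sketch. It makes two substitutions, both assumptions of this example: an in-memory SQLite database stands in for MySQL, and an IN (...) list replaces the temporary search table. The sample articles and relevance values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE keyword (
        word TEXT,
        article_id INTEGER REFERENCES article(id),
        relevance REAL DEFAULT 0.5,
        PRIMARY KEY (word, article_id)
    );
""")
conn.executemany(
    "INSERT INTO article (id, content) VALUES (?, ?)",
    [(1, "MySQL full-text search"), (2, "Tagging with MySQL"), (3, "Cooking pasta")],
)
conn.executemany(
    "INSERT INTO keyword (word, article_id, relevance) VALUES (?, ?, ?)",
    [("mysql", 1, 0.9), ("search", 1, 0.8), ("mysql", 2, 0.6),
     ("tags", 2, 0.9), ("pasta", 3, 1.0)],
)


def find_articles(conn, user_input, limit=3):
    """Rank articles by the summed relevance of the keywords the user typed."""
    # Same case as used when the keyword table was populated (lower case here)
    words = sorted(set(user_input.lower().split()))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT article.id, SUM(keyword.relevance) AS score
        FROM article
        JOIN keyword ON article.id = keyword.article_id
        WHERE keyword.word IN ({placeholders})
        GROUP BY article.id
        ORDER BY score DESC
        LIMIT ?
    """
    return conn.execute(sql, (*words, limit)).fetchall()


results = find_articles(conn, "MySQL search")
print(results)
```

Article 1 ranks first because it matches both words. For a suggest-as-you-type box, the equality on keyword.word could be swapped for a prefix match (word LIKE 'my%'), which can still use the primary key index since the pattern has a fixed start.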
Have a look at Sphinx and Lucene
I tried the MATCH() AGAINST() stuff, which works fine as long as the user types in the whole keyword.
What do you think FULLTEXT means?
I had 40,000 entries in my table with no indexes (local use), and a search with LIKE '%SOMETHING%' took at most 0.1 sec.
You can also LIMIT your query's output.
