MySQL MATCH escape email - PHP

I would like to search for an email address in the content field.
SELECT *
FROM message m
WHERE MATCH (m.content) AGAINST (
'robert.oppenheimer#gmail.com' IN BOOLEAN MODE
)
It finds every message that contains 'robert' or 'oppenheimer'. But I would like it to behave like this:
SELECT *
FROM message m
WHERE m.content LIKE '%robert.oppenheimer#gmail.com%'
Any ideas?

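Wrap the address in double quotes so that it is searched as a phrase. A phrase search only requires the same words in the same order, so the punctuation between them need not match exactly: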
SELECT *
FROM message m
WHERE MATCH (m.content) AGAINST (
'+"robert.oppenheimer#gmail.com"' IN BOOLEAN MODE
)

Combine them.
SELECT *
FROM message m
WHERE MATCH (m.content) AGAINST ('annie.testman#example.com')
AND m.content LIKE '%annie.testman#example.com%';
Many people realize that LIKE '%thing%' is almost always a bad construct, an antipattern even: every row of the table is scanned for matches, so the query performs badly (the leading % prevents the use of an index for lookup). What many may not realize is that the MySQL optimizer, when faced with multiple AND conditions, all of which must be true, tries to decide which condition will involve the least amount of work.
Once that has been decided (the "query plan"), the server (ideally at the storage engine layer) evaluates that condition first, regardless of the order in which the conditions appear in the WHERE clause, and then filters the resulting rows by any remaining criteria.
The optimizer considers fulltext indexes one of the best possible choices, so it finds the fulltext matches first and then removes anything that doesn't also match the LIKE before returning results to you.
The only conditions that immediately come to mind that will typically trump a fulltext search in the optimizer's choice of plan are an indexed column with an equality comparison and no matching value, or an impossible WHERE such as WHERE 2 < 1, both of which of course always return 0 rows.
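The two-stage evaluation described above can be imitated in a few lines of Python. This is only a toy model and not how MySQL is implemented: the inverted index, the `tokenize` helper and the sample rows are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample rows standing in for message.content.
ROWS = {
    1: "Contact annie.testman#example.com for details",
    2: "annie went to the example store",
    3: "testman is not here",
    4: "nothing relevant",
}

def tokenize(text):
    """Split on non-word characters, roughly like a full-text parser."""
    return [t.lower() for t in re.split(r"\W+", text) if t]

# Build a toy inverted index: word -> set of row ids.
INDEX = defaultdict(set)
for row_id, content in ROWS.items():
    for token in tokenize(content):
        INDEX[token].add(row_id)

def search(needle):
    # Stage 1: cheap index lookup; any row sharing a word is a candidate.
    candidates = set()
    for token in tokenize(needle):
        candidates |= INDEX.get(token, set())
    # Stage 2: exact substring check, run only on the candidates.
    hits = sorted(r for r in candidates if needle in ROWS[r])
    return sorted(candidates), hits

candidates, hits = search("annie.testman#example.com")
print(candidates, hits)
```

The index lookup narrows four rows down to three candidates, and the substring check (the LIKE in the SQL above) only has to inspect those three.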

Related

Hey! I'm looking for a MySQL/SQL function for searching strings by percentages . as like similar_text() in PHP [duplicate]

I've been experimenting with fulltext search lately and am curious about the meaning of the Score value. For example I have the following query:
SELECT table.*,
MATCH (col1, col2, col3) AGAINST ('+(Term1) +(Term1)') AS Score
FROM table
WHERE MATCH (col1, col2, col3) AGAINST ('+(Term1) +(Term1)')
In the results for Score I've seen values, for one query, ranging from 0.4667041301727 to 11.166275978088. I get that it's MySQL's idea of relevance (the higher, the more weight).
What I don't get is how MySQL comes up with that score. Why isn't the number returned as a simple decimal or something similar?
And why does a query run "IN BOOLEAN MODE" always return a score of 1 or 0? Wouldn't all the results be 1?
Just hoping for some enlightenment. Thanks.
Take the query "word1 word2" as an example.
In BOOLEAN mode, the score indicates whether the document matches your entire query (e.g. it contains both word1 AND word2). Boolean mode is a strict match.
The formula normally used is based on the Vector Space Model of searching. Very simplified, it uses two measures to determine how important a word is to a query: the term frequency (terms that occur often in a document are more important than other terms) and the inverse document frequency (a term that occurs in many documents is weighted lower than a term that occurs in few documents). This is known as tf-idf, and these scores form the basis of the Vector Space Model, which someone else can explain thoroughly. :)
Generally, relevance is based on how many matches each row has for the words given to the search. The exact value depends on many things, but it really only matters for comparison with other relevance values in the same query.
If you really want the math behind it, you can find it in the MySQL internals manual.
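To make the tf-idf idea concrete, here is a minimal sketch using the textbook formula (term frequency times the logarithm of the inverse document frequency). It is not MySQL's actual scoring formula, which uses its own weighting and normalization; the sample documents and the `tf_idf` function are made up for illustration.

```python
import math

# Hypothetical documents, one string per row.
DOCS = [
    "mysql tutorial for beginners",
    "optimizing mysql queries",
    "mysql security and mysql backups",
    "cooking with garlic",
]

def tf_idf(term, doc, docs):
    words = doc.split()
    # Term frequency: share of the document's words that are the term.
    tf = words.count(term) / len(words)
    # Document frequency: number of documents containing the term.
    containing = sum(1 for d in docs if term in d.split())
    if containing == 0:
        return 0.0
    # Rare terms get a large idf, common terms a small one.
    idf = math.log(len(docs) / containing)
    return tf * idf

common = tf_idf("mysql", DOCS[2], DOCS)   # term found in 3 of 4 documents
rare = tf_idf("garlic", DOCS[3], DOCS)    # term found in 1 of 4 documents
print(common, rare)
```

A word that appears in almost every row ("mysql") scores low even where it occurs twice, while a rare word ("garlic") scores high. That is why the scores are unbounded-looking floats that are only meaningful relative to each other.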

Why is MATCH against less effective than Exact match?

I have around 20 rows in a MySQL table whose Title column is Elsewhere; the other columns differ.
I'm currently using a query like this, since most of my searches (via a PHP file) only need a close guess, so I use a FULLTEXT index:
SELECT * FROM `my_db` WHERE MATCH (`Title`) AGAINST ('Elsewhere' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AND (Type ='movie' OR Type='series' OR Type IS NULL)
This works just fine, but sometimes even the most obvious match, like this one, gives me 0 rows. By contrast, if I do something like:
SELECT * FROM `my_db` WHERE `Title` = "Elsewhere";
It gives me all 20 rows.
Shouldn't the first query return more results than the second, since it is less specific?
Note: I'm using MATCH for a search. I do not want to perform an exact match every time. Basically, user input from the client side is searched for in the DB.
It turns out that elsewhere is a stopword for MyISAM full-text search indexes: http://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html

Full Text searching functionality not working as expected

I've used the full-text search functionality, but it's not working as expected.
I have the following code in search.php:
$kws = $_POST['kw'];
$kws = mysql_real_escape_string($kws);
$query = "SELECT * FROM products WHERE MATCH (product_name,brand,short_desc)
AGAINST ('*".$kws."*' IN BOOLEAN MODE) limit 14" ;
$res_old = mysql_query($query);
'kw' is whatever I type in the search box. For example, if I search for 'Dove Intense', it places Dove Antihairfall on top because that row comes first in the database.
I understand that this can happen because I'm running the full-text search over separate columns, i.e. brand and product_name. However, is there any way to have it the other way round, so that a row ranks higher if it matches in both columns? Basically, whatever the user types in should rank higher in the search results.
Can anyone give me an idea of how to achieve that?
You (or others) may still be interested in the answer:
From the MySQL documentation on MATCH ... AGAINST in boolean mode:
"They do not automatically sort rows in order of decreasing relevance. You can see this from the preceding query result:..."
You should use boolean operators to achieve what you expect. A search for 'Dove Intense' returns rows that contain at least one of the two words (i.e. rows with only Dove, rows with only Intense, and rows with both Dove and Intense in any order), simply because having no operator between the two words implies an OR!
This may not be the result you expect, since you may want an AND between the two words. But what about the order?
If you don't mind the order (i.e. any row containing both words is included in the results, e.g. 'Intense whatever whatever dove ...'), you should match against:
'+Dove +Intense'
In this search, rows containing only one of the two words are not included in the result.
If you are trying to implement a strict search, i.e. only rows containing the phrase "Dove Intense", you should match against '"Dove Intense"'.
Now about ranking:
If you want all results that contain at least "Dove", but want rows ranked higher if they also contain "Intense", you should match against '+Dove Intense'. Since boolean mode does not sort by relevance automatically, also ORDER BY the MATCH ... AGAINST score in descending order.
Hope this is useful for what you are trying to implement.
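The difference between the search strings discussed in this answer can be seen with a small Python emulation. This is a sketch of the semantics only, not of MySQL's parser; the product names and all function names are invented for the example.

```python
# Hypothetical product names.
PRODUCTS = [
    "Dove Antihairfall Shampoo",
    "Dove Intense Repair",
    "Intense Care Dove Soap",
    "Pantene Intense",
    "Lux Soap",
]

def words(text):
    return text.lower().split()

def match_any(row, terms):
    """Emulates 'Dove Intense': at least one word present (implicit OR)."""
    return any(t in words(row) for t in terms)

def match_all(row, terms):
    """Emulates '+Dove +Intense': every word present, in any order."""
    return all(t in words(row) for t in terms)

def match_phrase(row, terms):
    """Emulates '"Dove Intense"': the words adjacent and in order."""
    w = words(row)
    n = len(terms)
    return any(w[i:i + n] == terms for i in range(len(w) - n + 1))

def rank_required_optional(rows, required, optional):
    """Emulates '+Dove Intense' plus ORDER BY score DESC."""
    kept = [r for r in rows if required in words(r)]
    # False sorts before True, so rows with the optional word come first.
    return sorted(kept, key=lambda r: optional not in words(r))

terms = ["dove", "intense"]
any_rows = [r for r in PRODUCTS if match_any(r, terms)]
all_rows = [r for r in PRODUCTS if match_all(r, terms)]
phrase_rows = [r for r in PRODUCTS if match_phrase(r, terms)]
ranked = rank_required_optional(PRODUCTS, "dove", "intense")
print(any_rows, all_rows, phrase_rows, ranked, sep="\n")
```

With this sample data the implicit OR keeps four rows, the double + keeps two, the phrase keeps one, and the required/optional form keeps all three Dove rows with "Dove Antihairfall Shampoo" last.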

Searching a big mysql database with relevance

I'm building a rather large "search" engine for our company intranet; it has over 1 million entries.
It's running on a rather fast server, and yet some search queries take up to 1 minute.
This is how the table looks
I tried to create an index for it, but it seems I'm missing something. This is what SHOW INDEX shows
And this is the query itself. It is mostly the ordering that slows the query down, but even a query without the sorting is somewhat slow.
SELECT SQL_CALC_FOUND_ROWS *
FROM `businessunit`
INNER JOIN `businessunit-postaddress` ON `businessunit`.`Id` = `businessunit-postaddress`.`BusinessUnit`
WHERE `businessunit`.`Name` LIKE 'tanto%'
ORDER BY `businessunit`.`Premium` DESC ,
CASE WHEN `businessunit`.`Name` = 'tanto'
THEN 0
WHEN `businessunit`.`Name` LIKE 'tanto %'
THEN 1
WHEN `businessunit`.`Name` LIKE 'tanto%'
THEN 2
ELSE 3
END , `businessunit`.`Name`
LIMIT 0 , 30
Any help is very much appreciated.
Edit:
What chokes this query 99% of the time is ordering by relevance with the % wildcard.
When I run EXPLAIN, it says Using where; Using filesort
You should try Sphinx, a full-text search engine that will give you very good performance along with lots of options for tuning relevancy.
It seems the index doesn't cover Premium, yet that is the first ORDER BY argument.
Run EXPLAIN on your query to see the query plan, and change your index to remove any table scans, as explained in http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
MySQL is good for storing data but not great when it comes to fast text-based search.
Apart from Sphinx, which has already been suggested, I recommend two fantastic search engines:
Solr with http://pecl.php.net/package/solr - a very popular search engine, used on massive services like Netflix.
Elastic Search - relatively new software, but with a very active community and lots of respect
Both solutions are based on the same library, Apache Lucene.
If the "ORDER BY" is really the bottleneck, the straight-forward solution would be to remove the "ORDER BY" logic from your query, and re-implement the sorting directly in your application's code using C# sorting. Unfortunately, this means you'd also have to move your pagination into your application, since you'd need to obtain the complete result set before you can sort & paginate it. I'm just mentioning this because no-one else so far appears to have thought of it.
Frankly (like others have pointed out), the query you showed at the top should not need full-text indexing. A single suffix wildcard (e.g., LIKE 'ABC%') should be very effective as long as a BTREE (and not a HASH) index is available on the column in question.
And, personally, I have no aversion even to a double wildcard (e.g., LIKE '%ABC%'), which of course can never make use of indexes, as long as a full table scan is cheap. Probably 250,000 rows is the point where I'll start to seriously consider full-text indexing. 100,000 is definitely no problem.
I always make sure my SELECTs are dirty reads, though (no transactionality applied to the select).
It's dirty once it gets to the user's eyeballs in any case!
Most search-oriented sites use full-text search.
It is much faster than SELECT with LIKE.
I have added an example and some links that I think will be useful to you.
Full-text search also comes with some conditions of its own.
STEP:1
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
);
STEP:2
INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','DBMS stands for DataBase ...'),
('How To Use MySQL Well','After you went through a ...'),
('Optimizing MySQL','In this tutorial we will show ...'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL vs. YourSQL','In the following database comparison ...'),
('MySQL Security','When configured properly, MySQL ...');
STEP:3
Natural Language Full-Text Searches:
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('database');
Boolean Full-Text Searches
SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
Go through these links:
viralpatel.net, devzone.zend.com, sqlmag.com, colorado.edu, en.wikipedia.org
It's a rather strange query :)
Let's try to understand what it does.
It returns at most 30 rows from the table "businessunit", subject to some conditions.
The first condition is the join on a foreign key of the "businessunit-postaddress" table.
Please check that you have an index on the column businessunit-postaddress.BusinessUnit.
The second is a filter that returns only rows whose businessunit.Name begins with 'tanto'.
If I'm not mistaken, you have a very complex index 'Business' consisting of 11 fields!
And the field 'Name' is not the first field in this index.
So this index is useless for a LIKE 'tanto%' query.
I strongly doubt this index is needed at all.
By the way, maintaining it takes considerable resources and slows down writes to this table.
You should create an index on the field 'Name' alone.
After filtering, the query sorts the results, and does so in a strange way too.
First it sorts by the field businessunit.Premium, which is normal.
However, the CASE expression that follows is useless too.
Here is why.
Zero is assigned to Name = 'tanto' (exactly).
Next, one is assigned to rows with a space after 'tanto'. These sort right after 'tanto' in any case (special symbols aside), because a space sorts lower than any letter.
Then two is assigned to rows with any other characters after 'tanto' (the pattern would also match the rows with a space, but those were already assigned one). These rows also fall in this order by definition.
And three is "reserved" for "other" rows, but you won't get any "other" rows: remember the [WHERE businessunit.Name LIKE 'tanto%'] condition.
So this part of the ORDER BY is meaningless.
And at the end of the ORDER BY there is businessunit.Name again...
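The argument that the CASE expression adds nothing can be checked with a quick Python sketch. Note the assumption: Python compares strings by code point, which may not match the collation of your Name column, so treat this as an illustration of the reasoning rather than proof for your table. The sample names and `case_rank` are made up.

```python
# Hypothetical names that all satisfy Name LIKE 'tanto%'.
names = ["tantoa", "tanto", "tanto-x", "tanto bar", "tantos", "tanto 2"]

def case_rank(name):
    """Mirrors the CASE expression from the original ORDER BY."""
    if name == "tanto":
        return 0
    if name.startswith("tanto "):
        return 1
    if name.startswith("tanto"):
        return 2
    return 3

# ORDER BY CASE ..., Name
with_case = sorted(names, key=lambda n: (case_rank(n), n))
# ORDER BY Name
plain = sorted(names)

print(with_case == plain)
```

Both orderings come out identical here, because the exact match is a prefix of every other name and a space sorts before the other characters used. A name with a character that sorts below a space after 'tanto' would be the "special symbols" exception mentioned above.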
My advice: rebuild the query from scratch, keeping in mind what you want to get.
Anyway, I guess you can use
SELECT SQL_CALC_FOUND_ROWS *
FROM `businessunit`
INNER JOIN `businessunit-postaddress` ON `businessunit`.`Id` = `businessunit-postaddress`.`BusinessUnit`
WHERE `businessunit`.`Name` LIKE 'tanto%'
ORDER BY `businessunit`.`Premium` DESC,
`businessunit`.`Name`
LIMIT 0 , 30
Don't forget the index on the field businessunit-postaddress.BusinessUnit!
I also have a strong suspicion about the field Premium.
I guess it is designed to store binary data (yes/no).
If so, an ordinary (BTREE) index on it alone is a poor fit, because the column has very low selectivity.
A bitmap index would suit such a column, but MySQL does not provide one.
P.S. I'm not sure that you really need SQL_CALC_FOUND_ROWS; see
MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query
It's either full-text search (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html) or pattern matching (http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html) on the PHP and MySQL side.
From experience and theory:
Advantages of full-text -
1) Results are very relevant, and delimiter characters such as spaces in the search query do not hinder the search.
Disadvantages of full-text -
1) There are stopwords, which some web hosts use as restrictions to prevent excess load (e.g. search results containing the word 'one' or 'moz' are not displayed). This can be avoided by keeping no stopwords if you're running your own server.
2) If I type 'ree', it only finds the exact word 'ree', not 'three' or 'reed'.
Advantages of pattern matching -
1) It has no stopwords, unlike full-text, and if you search for 'ree' it finds any word containing 'ree', like 'reed' or 'three', whereas full-text retrieves only the exact word.
Disadvantages of pattern matching -
1) If your search words contain delimiters such as spaces and those spaces do not appear in the stored text in the same way, it returns no result, because the whole search string is matched as one pattern rather than word by word.
If the argument of LIKE doesn't begin with a wildcard character, as in your example, the LIKE operator should be able to take advantage of indexes.
In this case LIKE should perform better than LOCATE or LEFT, so I suspect that changing the condition as follows could make things worse, but I still think it's worth trying (who knows?):
WHERE LOCATE('tanto', `businessunit`.`Name`)=1
or:
WHERE LEFT(`businessunit`.`Name`,5)='tanto'
I would also change your order by clause:
ORDER BY
`businessunit`.`Premium` DESC ,
CASE WHEN `businessunit`.`Name` LIKE 'tanto %' THEN 1
WHEN `businessunit`.`Name` = 'tanto' THEN 0
ELSE 2 END,
`businessunit`.`Name`
Name already has to be LIKE 'tanto%', so you can skip a condition (CASE will never return the value 3). Of course, make sure that the Premium field is indexed.
Hope this helps.
I think you need to collect the keys only, sort them, then join last:
SELECT A.*,B.* FROM
(
SELECT * FROM (
SELECT id BusinessUnit,Premium,
CASE
WHEN Name = 'tanto' THEN 0
WHEN Name LIKE 'tanto %' THEN 1
WHEN Name LIKE 'tanto%' THEN 2
ELSE 3
END SortOrder
FROM businessunit WHERE Name LIKE 'tanto%'
) AA ORDER BY Premium,SortOrder LIMIT 0,30
) A LEFT JOIN `businessunit-postaddress` B USING (BusinessUnit);
This will still generate a filesort.
You may want to consider preloading the needed keys in a separate table you can index.
CREATE TABLE BusinessKeys
(
id int not null auto_increment,
BusinessUnit int not null,
Premium int not null,
SortOrder int not null,
PRIMARY KEY (id),
KEY OrderIndex (Premium,SortOrder,BusinessUnit)
);
Populate all keys that match:
INSERT INTO BusinessKeys (BusinessUnit,Premium,SortOrder)
SELECT id,Premium,
CASE
WHEN Name = 'tanto' THEN 0
WHEN Name LIKE 'tanto %' THEN 1
WHEN Name LIKE 'tanto%' THEN 2
ELSE 3
END
FROM businessunit WHERE Name LIKE 'tanto%';
Then, to paginate, run LIMIT on the BusinessKeys only:
SELECT A.*,B.*
FROM
(
SELECT BusinessUnit FROM BusinessKeys
ORDER BY Premium,SortOrder
LIMIT 0,30
) BK
LEFT JOIN businessunit A ON BK.BusinessUnit = A.id
LEFT JOIN `businessunit-postaddress` B ON A.id = B.BusinessUnit
;
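CAVEAT: I use LEFT JOIN instead of INNER JOIN because, in practice, LEFT JOIN tends to preserve the order of the keys from the left side of the query, although SQL only guarantees an order through an ORDER BY on the outermost query.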
I've read the answer suggesting Sphinx to optimize the search, but from my experience I would advise a different solution. We used Sphinx for some years and had a few nasty problems with segmentation faults and corrupted indices. Perhaps Sphinx isn't as buggy as it was a few years ago, but for a year now we have been very happy with a different solution:
http://www.elasticsearch.org/
The great benefits:
Scalability - you can simply add another server with nearly zero configuration. If you know MySQL replication, you'll love this feature
Speed - even under heavy load you get good results in much less than a second
Easy to learn - you can use it knowing only HTTP and JSON. If you are a web developer, you will feel at home
Easy to install - it is usable without touching the configuration. You just need plain Java (no Tomcat or whatever) and a firewall to block direct access from the public
Good JavaScript integration - even a phpMyAdmin-like tool is a simple HTML page using JavaScript: https://github.com/mobz/elasticsearch-head
Good PHP integration with https://github.com/ruflin/Elastica
Good community support
Good documentation (it is not easy on the eyes, but it covers nearly every function!)
If you need an additional storage solution, you can easily combine the search engine with http://couchdb.apache.org/

How can I search for multiple terms in multiple table columns?

I have a table that lists people and all their contact info. I want users to be able to perform an intelligent search on the table by simply typing in some terms and getting back results where each term they entered matches at least one of the columns in the table. To start, I have made a query like
SELECT * FROM contacts WHERE
firstname LIKE '%Bob%'
OR lastname LIKE '%Bob%'
OR phone LIKE '%Bob%' OR
...
But now I realize that this will completely fail on something as simple as 'Bob Jenkins', because it is not smart enough to search for the first and last name separately. What I need to do is split up the search terms, search for them individually, and then intersect the results from each term somehow. At least that seems like the solution to me. But what is the best way to go about it?
I have heard about fulltext and MATCH()...AGAINST() but that sounds like a rather fuzzy search and I don't know how much work it is to set up. I would like precise yes or no results with reasonable performance. The search needs to be done on about 20 columns by 120,000 rows. Hopefully users wouldn't type in more than two or three terms.
Oh sorry, I forgot to mention I am using MySQL (and PHP).
I just figured out fulltext search and it is a cool option to consider (is there a way to adjust how strict it is? LIMIT would just chop off the results regardless of how well they matched). But this requires a fulltext index, and my website is using a view, and you can't index a view, right? So...
I would suggest using MATCH / AGAINST. Full-text searches are more advanced searches, more like Google's, less elementary.
It can match across multiple columns and rank rows by how many matches they have.
Otherwise, if the word is there at all, especially across multiple columns, you have no ranking. You can do the ranking server-side, but that is going to take more programming and time.
Depending on what database you're using, searching across columns can be more or less difficult. You probably don't want to do 20 JOINs, as that will be a very slow query.
There are also engines such as Sphinx and Lucene dedicated to these types of searches.
BOOLEAN MODE
SELECT * FROM contacts WHERE
MATCH(firstname,lastname,email,webpage,country,city,street...)
AGAINST('+bob +jenkins' IN BOOLEAN MODE)
Boolean mode is very powerful. It might even fulfil all my needs, though I will have to do some testing. Placing + in front of a search term makes that term required (the row must match 'bob' AND 'jenkins' instead of 'bob' OR 'jenkins'). This mode even works on non-indexed columns, so I can use it on a view, although it will be slower (that is what I need to test). One final problem I had was that it wasn't matching partial search terms, so 'bob' wouldn't find 'bobby', for example. The usual % wildcard doesn't work; instead you append an asterisk to the term, as in 'bob*'.
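Turning what the user typed into such a boolean-mode string could look roughly like this. It is a sketch under assumptions: `boolean_query` is a hypothetical helper, not part of MySQL or any library, and it simply drops every character that is not a letter, digit or underscore so that user input cannot inject boolean operators.

```python
import re

def boolean_query(user_input, prefix=True):
    """Turn free text into a boolean-mode search string.

    Hypothetical helper: every word becomes required (+) and, if
    prefix is true, gets a trailing * so that 'bob' also matches 'bobby'.
    """
    # \w+ keeps only word characters, so operators such as
    # + - * " ( ) < > ~ typed by the user are discarded.
    terms = re.findall(r"\w+", user_input)
    suffix = "*" if prefix else ""
    return " ".join(f"+{term}{suffix}" for term in terms)

query = boolean_query("Bob  Jenkins")
print(query)
```

The resulting string would then be passed to AGAINST (... IN BOOLEAN MODE) as a bound parameter rather than concatenated into the SQL text.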
