I am building a search feature for the messages part of my site. The messages table has a little over 9,000,000 rows and an index on the sender, subject, and message fields. I was hoping to use the MySQL LIKE clause in my query, for example:
SELECT sender, subject, message FROM Messages WHERE message LIKE '%EXAMPLE_QUERY%';
to retrieve results. Unfortunately, MySQL doesn't use indexes when a leading wildcard is present, and the leading wildcard is necessary because the search term could appear anywhere in the message (this is how the wildcards work, no?). Queries are very slow, and I cannot use a full-text index either, because of the annoying 50% rule (I just can't afford to rule that much out). Is there any way to optimize a query using LIKE with two wildcards, or any alternative to it? Any help is appreciated.
You should either use full-text indexes (you said you can't), design a full-text search by yourself or offload the search from MySQL and use Sphinx/Lucene. For Lucene you can use Zend_Search_Lucene implementation from Zend Framework or use Solr.
Normal indexes in MySQL are B+Trees, and they can't be used if the start of the string is not known (which is the case when the pattern begins with a wildcard).
Another option is to implement search on your own, using a reference table. Split the text into words and create a table that contains word, record_id. Then, at search time, split the query into words and look up each word in the reference table. This way you are not limited to matching the beginning of the whole text, only the beginning of a given word (and you'll match the rest of the words anyway).
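A minimal sketch of the reference-table idea, using Python with an in-memory SQLite database as a stand-in for MySQL. The table name word_index, the index name and both function names are made up for illustration. Each query word is matched with a trailing wildcard only, so the index on word stays usable.

```python
import re
import sqlite3

def build_word_index(conn, messages):
    """Create a (word, record_id) reference table from {record_id: text}."""
    conn.execute("CREATE TABLE word_index (word TEXT, record_id INTEGER)")
    conn.execute("CREATE INDEX idx_word ON word_index (word)")
    for record_id, text in messages.items():
        # Lowercase and keep only letters/digits, so no LIKE wildcards survive.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        conn.executemany(
            "INSERT INTO word_index (word, record_id) VALUES (?, ?)",
            [(word, record_id) for word in words],
        )

def search(conn, query):
    """Return sorted ids of records that have a word starting with each query word."""
    result = None
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        rows = conn.execute(
            "SELECT DISTINCT record_id FROM word_index WHERE word LIKE ?",
            (term + "%",),  # trailing wildcard only
        )
        ids = {row[0] for row in rows}
        result = ids if result is None else result & ids
    return sorted(result or [])

conn = sqlite3.connect(":memory:")
build_word_index(conn, {1: "This is an example message", 2: "Another sample"})
print(search(conn, "exam"))   # record 1 has a word starting with "exam"
print(search(conn, "xampl"))  # nothing starts with "xampl", so no match
```

As with full-text indexes, this finds word prefixes but not arbitrary substrings such as "xampl".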
'%EXAMPLE_QUERY%' is a very, very bad idea. I am going to give you some suggestions:
A. Avoid wildcards at the start of LIKE patterns; use 'EXAMPLE_QUERY%' instead.
B. Create keywords so that you can easily use MATCH.
If you want to stick with using MySQL, you should use FULLTEXT indexes. Full-text indexes index the words in a block of text. You can then search on whole words (boolean mode also allows a trailing * to match word prefixes) and return the results in order of relevance. So you can find the word "example" within a block of text, but you still can't search efficiently on "xampl" to find "example".
MySQL's full text search is not great, but it is functional.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
select * from emp where ename like '%e';
returns the rows whose ename ends with the letter e.
select * from emp where ename like 'A%';
returns the rows whose ename begins with the letter A.
select * from emp where ename like '_a%';
returns the rows whose ename has a as its second letter.
I have a query. For example, the name column contains "Rodrigue Dattatray Desilva".
I want to write a query such that if I search for 'gtl' and those letters match anywhere in the string, the row is returned.
I know that in PHP I can apply a pattern like '%g%t%l%'.
But I want to know the MySQL way.
Note: I can search for anything; the above is just an example.
EDIT:
create table Test(id integer, title varchar(100));
insert into Test(id, title) values(1, "Rodrigue Dattatray Desilva");
select * from Test where title like '%g%t%l%';
Consider the above case, where "gtl" is the string I am trying to find in the title; the search string can be anything.
The letters of "gtl" all appear in the current title in that order, but not next to each other.
The easy answer is that you need an extra wildcard:
select * from Test where title like '%g%t%l%';
The query you posted does not have a wild card after the 'l', so would only match if the phrase ended with 'l'.
The more complicated answer is that you can also use regular expressions, which give you more power over the search.
The even more complicated answer is that performance of these string matching queries tends to be poor - the wild cards mean that indexes are usually ineffective. If you have a large number of rows in your table, full-text searching is much faster.
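Since the search string can be anything, the '%g%t%l%' pattern has to be built from user input. A rough sketch in Python, with SQLite standing in for MySQL; subsequence_pattern and search_titles are hypothetical helper names. Any % or _ typed by the user is escaped so it is treated literally.

```python
import sqlite3

def subsequence_pattern(text, escape="\\"):
    """Build a LIKE pattern matching the characters of text in order, gaps allowed."""
    parts = []
    for ch in text:
        if ch in ("%", "_", escape):
            ch = escape + ch  # treat wildcard characters from the user literally
        parts.append(ch)
    return "%" + "%".join(parts) + "%"

def search_titles(conn, text):
    """Return ids of rows whose title contains the characters of text in order."""
    rows = conn.execute(
        "SELECT id FROM Test WHERE title LIKE ? ESCAPE '\\' ORDER BY id",
        (subsequence_pattern(text),),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (id INTEGER, title VARCHAR(100))")
conn.execute("INSERT INTO Test (id, title) VALUES (1, 'Rodrigue Dattatray Desilva')")

print(subsequence_pattern("gtl"))  # %g%t%l%
print(search_titles(conn, "gtl"))  # the sample row matches
```

In MySQL the backslash is already the default LIKE escape character, so the ESCAPE clause can probably be dropped there. The performance caveat above still applies: the pattern starts with %, so expect a full scan.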
You can do the same in MySQL too.
You can use the LIKE keyword in MySQL.
% - The percent sign represents zero, one, or multiple characters
_ - The underscore represents a single character
I want to query a table as follows:
I have a field called "category" and my input match contains N separate words. I want the query to match all rows that contain all N words, but in any order.
For example if the field category contains "hello good morning world", my input query can contain "hello morning" or "good" or "world hello" and all are matches to the query.
How do I formulate such an SQL expression?
Also it would be good if the query can be made case insensitive.
If you are using MySQL you can use the boolean fulltext search feature to achieve this. You can put a + in front of each term and then only results with all the terms, in any order, will be returned. You will need to make sure the column containing the category field has a fulltext index specified on it for this to work. Other database engines probably have similar features. So for example you might do something like the following assuming there were a fulltext index over the category column...
SELECT * FROM myTable WHERE MATCH (category) AGAINST ('+term1 +term2 +term3' IN BOOLEAN MODE);
I would avoid using the LIKE operator as others have suggested. You would have to deal with the headache of mixed upper/lower case, and on a large database a % at the front of a LIKE search term causes a full table scan instead of an index lookup, which is horrible for performance.
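For the boolean full-text approach, the '+term1 +term2' string can be assembled from the user's input. A small sketch in Python; boolean_all_terms is a made-up helper name. It keeps only word characters, so boolean-mode operators typed by the user (+, -, *, quotes, parentheses) are dropped.

```python
import re

def boolean_all_terms(user_input):
    """Turn free text into a boolean-mode expression that requires every word."""
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + word for word in words)

print(boolean_all_terms("hello morning"))  # +hello +morning
```

The result would be passed as a bound parameter in place of the literal inside AGAINST (... IN BOOLEAN MODE); the placeholder syntax depends on the driver. Very short words and stopwords may not be indexed, depending on server configuration.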
I'm not writing the loop that will build this query for you. This will get the job done, but it will be pretty inefficient.
SELECT * FROM myTable
WHERE (
UPPER(category) LIKE '%HELLO%' AND
UPPER(category) LIKE '%GOOD%' AND
UPPER(category) LIKE '%MORNING%' AND
UPPER(category) LIKE '%WORLD%'
);
You could also research using REGEXes with SQL.
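The loop that builds such a query might look roughly like this. It is a sketch in Python against SQLite; all_words_query is a hypothetical name, and the ? placeholder style is SQLite's (MySQL drivers often use %s instead). The search words go in as bound parameters rather than being pasted into the SQL.

```python
import sqlite3

def all_words_query(table, column, words):
    """Build a parameterized query requiring every word, in any order."""
    if not words:
        raise ValueError("at least one search word is required")
    # table and column are interpolated directly, so they must come from
    # trusted code, never from user input.
    conditions = " AND ".join(f"UPPER({column}) LIKE ?" for _ in words)
    params = ["%" + word.upper() + "%" for word in words]
    return f"SELECT * FROM {table} WHERE {conditions}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (category TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [("hello good morning world",), ("good night world",)],
)

sql, params = all_words_query("myTable", "category", "world hello".split())
rows = conn.execute(sql, params).fetchall()
print(rows)  # only the row that contains both words
```

It is still one leading-wildcard LIKE per word, so it stays as inefficient as described above.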
I have a PHP interface with a keyword search, working off a MySQL database that has a Keywords field.
The keywords field is a varchar with all the words formatted as shown below:
the, there, theyre, their, thermal etc...
If I want to return only the exact word 'the' from the search, how would this be achieved?
I have tried using 'the%' and '%the' in the PHP, but neither returns all of the rows in which the keyword appears.
Is there a better (more accurate) way to go about this?
Thanks
If you want to select the rows that have exactly the keyword the:
SELECT * FROM table WHERE keyword='the'
If you want to select the rows that have the keyword the anywhere in them:
SELECT * FROM table WHERE keyword LIKE '%the%'
If you want to select the rows that start with the keyword the:
SELECT * FROM table WHERE keyword LIKE 'the%'
If you want to select the rows that end with the keyword the:
SELECT * FROM table WHERE keyword LIKE '%the'
Try this
SELECT * FROM tablename
WHERE fieldname REGEXP '[[:<:]]test[[:>:]]'
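[[:<:]] and [[:>:]] are markers for word boundaries. (MySQL 8.0.4 and later use the ICU regular expression library, which does not support these markers; use \b there instead, with the backslash doubled inside a MySQL string literal.)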
MySQL Regular Expressions
If you also search for the commas, you can be sure you are getting the whole word.
where keywordField like '%, the, %'
or keywordField like '%, the'
or keywordField like 'the, %'
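A variation on the comma idea, sketched in Python with SQLite: wrap the stored value in delimiters first, so a single pattern covers the first, middle, last and only keyword. The items table and rows_with_keyword are assumptions for the demo. SQLite concatenates with ||, where MySQL would use CONCAT(', ', keywordField, ', ').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, keywordField TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    (1, "the, there, theyre"),
    (2, "there, their, thermal"),
    (3, "thermal, the"),
    (4, "the"),
])

def rows_with_keyword(conn, word):
    """Return ids of rows whose comma-separated list contains word as a whole item."""
    rows = conn.execute(
        "SELECT id FROM items "
        "WHERE ', ' || keywordField || ', ' LIKE '%, ' || ? || ', %' "
        "ORDER BY id",
        (word,),
    )
    return [row[0] for row in rows]

print(rows_with_keyword(conn, "the"))  # rows 1, 3 and 4, but not row 2
```

A % or _ inside the search word would still act as a wildcard here, and the pattern starts with %, so no index is used.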
Maybe I didn't understand the question properly, but if you want all the rows where 'the' appears, a LIKE '%word%' should work.
If the DB of words is HUGE, MySQL may fail to retrieve some of the words. That can be solved in 2 ways:
1- Get a DB that supports bigger sizes (not many people would choose this one, though). For example, SQL Server has a CONTAINS function that works better than LIKE '%word%'.
2- Use an external search tool that uses inverted index search. I used Sphinx for a project and it works quite well. This is better if you rarely UPDATE the rows of the data you want to search, which should be the case.
Sphinx, for example, generates a file from your MySQL table and uses this file to answer the search (it's very fast). This file has to be re-indexed every time you do an insert or update on the table, which makes it a much better solution if you rarely update or insert rows.
It looks like you have a one-to-many relationship going on within a column. It might be better to create a separate table for keywords, with a row for each keyword and a foreign key to whatever it is you're searching on.
Doing LIKE '%???%' is generally a bad idea because the DB can't make use of an index, so it will scan the whole table. Whether this matters will depend on the size of the data you're working with, but it's worth considering up front. The single best way to help DB performance is in the initial table design. This can be tricky to change later.
I'm looking for a way to compare database values against a given query using MySQL
(as opposed to searching for the query in the db).
I will elaborate:
I have a table that holds comma-separated keywords and a result for each block of keywords,
for example:
col 1                  col 2
Mercedes,BMW,Subaru    car
Marlboro,Winston       cigarette
Today I'm taking the user query (for example, Marlboroligt).
As you can see, if I search for the value 'Marlboroligt' in the db I will get no results.
So instead I want to search for 'Marlboro' from the db inside the query and return 'cigarette'.
Hope my explanation is sufficient and that this is actually possible :-).
Normalization will help you the most. Otherwise, search in comma delimited fields is discussed here:
Select all where field contains string separated by comma
...and to search 'Marlboroligt' instead of 'Marlboro Light', you can try looking into the LEVENSHTEIN function, or maybe the Soundex encoding (which looks like too little bang for too large a buck, but then again, maybe...).
I see the following possible solutions:
1. Set up a keyword search engine like Sphinx and use it to search keywords in your db.
2. Normalize your db, so that col1 contains only a single keyword.
3. Use LIKE patterns:
select col2 from mytable where col1 like "%Marlboro%"
LIKE slows down your application and can have substring-related issues.
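Once the table is normalized to one keyword per row, the comparison can be turned around so that the user's text is the value and the stored keyword is the pattern. A sketch in Python with SQLite; the keywords table, its column names and results_for are assumptions. In MySQL the pattern would be built with CONCAT('%', keyword, '%').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (keyword TEXT, result TEXT)")
conn.executemany("INSERT INTO keywords VALUES (?, ?)", [
    ("Mercedes", "car"), ("BMW", "car"), ("Subaru", "car"),
    ("Marlboro", "cigarette"), ("Winston", "cigarette"),
])

def results_for(conn, user_query):
    """Return the results of all keywords that occur inside user_query."""
    rows = conn.execute(
        "SELECT DISTINCT result FROM keywords "
        "WHERE ? LIKE '%' || keyword || '%' "
        "ORDER BY result",
        (user_query,),
    )
    return [row[0] for row in rows]

print(results_for(conn, "Marlboroligt"))  # ['cigarette']
```

This checks every keyword row against the query, and the substring issue cuts the other way here: a short keyword will also match inside unrelated longer words.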
I am using PHP and MySQL.
I have an application in which the user enters any text, and I want to find related data in the database without using a LIKE clause in my MySQL query.
Is there any possible way to search for these strings in the database,
or any approach in MySQL to do this?
Thanks in advance.
You can also check out the MATCH clause.
You can use REGEXP: when the user enters a single word, put WHERE field REGEXP '.*TEXT.*' in your query. Regex is cool because you can allow the user to enter a regular expression in the search field.
If you don't want to use LIKE, and don't give a reason why (it seems fine for everyone else), then here is a solution that gets you around it. (But it might not be the best real-world option...)
Whenever anything is added to the database that you want to be searched, take each word and break it into every possible combination of 1 or more consecutive letters.
E.g. for stack:
s, t, a, c, k, st, ta, ac, ck, sta, tac, ack, stac, tack, stack
Insert each of these into a table with an identifier that links to the original data.
Then you can match any search query against this list of words exactly (for full and partial matches). If your user is searching for multiple keywords, you split them in the front end and search for each, looking for matches to the same identifier.
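A quick sketch of that scheme in Python, with a plain dict standing in for the lookup table; all_substrings, build_index and search are made-up names. Each piece maps to the set of identifiers it came from, and a multi-keyword search intersects those sets.

```python
def all_substrings(word):
    """Return every run of 1 or more consecutive letters, shortest first."""
    return [
        word[start:start + length]
        for length in range(1, len(word) + 1)
        for start in range(len(word) - length + 1)
    ]

def build_index(documents):
    """Map each substring of each word to the ids of the documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            for piece in all_substrings(word):
                index.setdefault(piece, set()).add(doc_id)
    return index

def search(index, query):
    """Return ids matching every keyword in query by exact lookup, no LIKE needed."""
    terms = query.lower().split()
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))

print(all_substrings("stack"))
# ['s', 't', 'a', 'c', 'k', 'st', 'ta', 'ac', 'ck',
#  'sta', 'tac', 'ack', 'stac', 'tack', 'stack']
```

A word of n letters produces n(n+1)/2 pieces (15 for "stack"), so the lookup table grows quickly with long words.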