What method should be used for searching strings in mysql data? - php

I've got a mysql dataset that contains 86 million rows.
I need to have a relatively fast search through this data.
The data I'll be searching through is all strings.
I also need to do partial matches.
Now, if I have 'foobar' and search for '%oob%' I know it'll be really slow - it has to look at every row to see if there is a match.
What methods can be used to speed up queries like this?

I don't think fulltext search will allow for partial matches (I could be wrong on this).
You might take a look at Sphinx Search, a full-text and partial-match search engine. You can easily import your MySQL data into it and then use simple PHP queries to search the data. It is far more efficient than using MySQL for the query.

What you seek is Full Text Search. This would enable you to do partial matches quickly and searches on multiple columns quickly.
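As a sketch of what that looks like (table and column names here are illustrative, not from the question):

```sql
-- Add a FULLTEXT index on the column(s) to be searched
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);

-- Natural-language full-text search
SELECT id, title
FROM articles
WHERE MATCH(body) AGAINST('foobar');

-- Boolean mode allows a trailing wildcard, which gives prefix matching
SELECT id, title
FROM articles
WHERE MATCH(body) AGAINST('foob*' IN BOOLEAN MODE);
```

Note that the boolean-mode wildcard only covers prefixes; a true infix match like '%oob%' is still not served by the fulltext index.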

Related

MongoDB search - autocomplete

We are currently running our app on MySQL and are planning to move to MongoDB. We have moved some parts already, but we're having issues with MongoRegex performance.
We have an autocomplete search box that joins 6 tables (indexed and non-indexed fields) and returns results super fast on MySQL. The same thing on MongoDB performs really slowly: it takes about 2.3 seconds on just one collection, so the user has to wait a long time. The connection time is 0.064 secs; the query time is 2.36 seconds. I did a bit of Googling and couldn't find a definitive answer. Everyone says MongoRegex is slow. If that's true, how are other companies overcoming this problem?
What is the best way to improve autocomplete performance / experience when running it on MongoDB?
First of all, you will have to design your query carefully: select properly indexed fields and shape the query around them. Also, if you are using a regex, make sure you write it in a way that forces the query to use an indexed field; something like /^prefix/ will do. [See this link: http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use ]
I have seen many implementations using MongoDB range queries, but I'm not sure that's the best approach, since instantaneous results are a key requirement.
Apart from that, I have seen someone recommend prefix trees, which effectively store each prefix in a field and then store all the words starting with that particular prefix in the next field as an array. This solution sounds convincing and fast, since the prefix fields are supposed to be indexed, but you will also have to think about the storage cost.
It is hard to tell, as we don't have the search query, but it is worth mentioning that, after taking a look at the documentation, it appears MongoDB is able to use an index for a regex search only when the pattern is anchored and begins with a constant string:
http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use
So, to quote the doc:
A regular expression is a “prefix expression” if it starts with a caret (^) or a left anchor (\A), followed by a string of simple symbols. For example, the regex /^abc.*/ will be optimized by matching only against the values from the index that start with abc.
But /abc.*/ or /^.*abc/ will not use the index.

CI & MySQL; using LIKE on a delimited string

Maybe LIKE isn't the proper solution for what I'm attempting to do, but what I'm looking to achieve is to take a string retrieved from the database (i.e. tags [ tag1,tag2,tag3,tag4 ]), then compare each value (delimited by the comma) and return like rows accordingly.
<?php
$string = 'tag1,tag2,tag3,tag4';
$tags = explode(',', $string);

$recommend = array();
foreach ($tags as $tag)
{
    $this->db->limit(2);
    $this->db->like('Tags', $tag);
    $result = $this->db->get('table');
    $recommend[] = $result;
}
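For reference, each iteration of that loop makes CodeIgniter generate roughly the following SQL (like() wraps the term in % on both sides by default, which is what defeats the index):

```sql
SELECT * FROM table WHERE Tags LIKE '%tag1%' LIMIT 2;
```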
This is the best I could understand how to do what I'm looking for, but I'm quite confused on the LIKE method in general of CI's Active Record Class.
Is performing searches on delimited string in a database even really entirely wise?
Thanks for the input and ideas, you two. MySQL goes a bit over my head, but after reading a bit about what's possible with relational databases, I'm going to attempt to learn this process so I can build my database more efficiently.
My database is going to be a catalog of video games and files, so obviously it's going to reach an insane number of entries, and attempting to look through any sort of string for possible returned rows would be by far the dumbest thing I could do in the long run.
Again, thanks for the information and ideas.
If you have a properly normalized database, tags should be split out into a separate table, as this is a one-to-many (or maybe even a many-to-many) relationship.
However, the easiest way to achieve this is to also place a comma before the first tag, and after the last, and simply search for:
SELECT * FROM table WHERE Tags LIKE "%,coleslaw,%";
But yes, this is bad, and can be slow if your table grows as you will not be able to utilize indexes.
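A minimal sketch of the normalized approach (table and column names are illustrative):

```sql
-- One row per tag, linked to the item it belongs to
CREATE TABLE tags (
    item_id INT NOT NULL,
    tag     VARCHAR(64) NOT NULL,
    PRIMARY KEY (item_id, tag),
    KEY idx_tag (tag)
);

-- An exact tag lookup can now use the idx_tag index
-- instead of scanning a delimited string
SELECT i.*
FROM items i
JOIN tags t ON t.item_id = i.id
WHERE t.tag = 'coleslaw';
```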
There is a FIELD command in MySQL that you might be able to take advantage of. You can read up on it here.
But in general, performing searches on delimited strings, or substrings in general, in a database will kill performance, as you lose any indexing when matching on the substring. If you have a thousand rows, MySQL will have to scan through all one thousand rows before discovering that there is no match. This applies to both LIKE and SUBSTRING appearing in a WHERE clause. Many database implementations provide a BEGINS WITH operator that can still take advantage of indexing, but I'm not aware of one in MySQL.
Consider instead breaking the string up during insertion and storing each word separately in a different table. If you find that you do this a lot, consider an engine built for text searching, such as Lucene. You can use PHP with Lucene through Solr. While I throw that out there, it may very well be complete overkill for your project.
I know this probably isn't what you want to hear, but your gut is correct on this one.

Search engine for website

I'm trying to build a search engine for a website. It's mostly a collection of HTML/CSS pages with some PHP, and that's all there is; all of my content is on the pages.
From what I understand, to be able to do this I would need to have the content in a database. Am I correct?
If so I was considering doing as such, creating a MySQL table with four columns "Keywords" "Titles" "Content" and "Link".
Keywords - will hold a word that, if it appears in the query, marks this row as the most likely result.
Titles - after searching Keywords, searching the titles produces the most relevant results
Content - should be a last resort for finding something, as it will be messier I believe
Link - is just the link that belongs to the particular row.
I will be implementing it with PHP and MySQL, and it will be tiresome to put all the content, titles, etc. into a DB. Is this a good method, or should I be looking at something else?
Thanks.
---------------EDIT-------------------
Lucene seems like a good option; however, even after reading the Getting Started guide and looking around a bit on the web, I can't understand how it works. Can someone point me somewhere that explains this in a very, very basic manner? Especially taking into consideration that I do not know how to compile anything.
Thank you.
Building a search engine from scratch is painful. It is an interesting task, indeed, so if it is for learning, then do it!
However, if you just need a good search function for your web site, please use something that others have done for you. Apache Lucene is one option.
Sphinxsearch is an open-source full-text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind.
Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with Sphinx pretty much as a database server.
I'm assuming your pages are static HTML. You could do two things at once and move the content of the pages into the DB, so that the pages are generated on the fly by reading their content from the DB.
Anyway, I think your strategy is OK at least for a basic search engine. Also have a look into MySQL fulltext search.
MySQL fulltext search will be the easiest to set up, but it will be a lot slower than Sphinxsearch. Even Lucene is slower than Sphinx. So if speed is a criterion, I would suggest taking time out to learn and implement Sphinx.
In one of his presentations, Andrew Aksyonoff (creator of Sphinx) presented the following benchmarking results. Approximately 3.5 million records with around 5 GB of text were used for the purpose.
                          MySQL   Lucene   Sphinx
Indexing time, min         1627      176       84
Index size, MB             3011     6328     2850
Match all, ms/q             286       30       22
Match phrase, ms/q         3692       29       21
Match bool top-20, ms/q      24       29       13
Apart from basic search, there are many features that make Sphinx a better solution for searching. These features include multi-valued attributes, tokenizing settings, wordforms, HTML processing, geosearching, ranking, and many others.
Zend Lucene is a pure PHP implementation of search which is quite useful.
Another search option is Solr, which is based on Lucene but does a lot of the heavy lifting for you in order to produce more Google-like results. This is probably your easiest option, besides using MySQL's MyISAM fulltext search capabilities.

best way to search data-base table for user input keywords?

I want to build a product-search engine.
I was thinking of using Google Site Search, but that really searches Google's index of your site, and I do not want that. I want to search a specific table (all the fields, even ones the user never sees) in my database for given keywords.
But I want this search to be as robust as possible. I was wondering if there is something already out there I could use? If not, what's the best way to go about making it myself?
You can try using Sphinx full-text search for MySQL.
Here's also a tutorial from IBM using PHP.
I'd focus on MySQL Full-Text search first. Take a look at these links:
http://dev.mysql.com/doc/refman/4.1/en/fulltext-search.html
http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html
Here is a snippet from the first link:
Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a literal string, not a variable or a column name. There are three types of full-text searches:
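A minimal example of that syntax (table and column names are illustrative):

```sql
-- Requires a FULLTEXT index on (name, description)
SELECT *
FROM products
WHERE MATCH(name, description)
      AGAINST('+widget -blue' IN BOOLEAN MODE);
```

The boolean-mode operators (+ for required terms, - for excluded terms) give the query some of the robustness the question asks for.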
As far as stuff that's already out there, take a look at these :
Search all tables (for SQL Server, but you could probably adapt it to MySQL)
Another search all tables (for SQL Server, but you could probably adapt it to MySQL)
Search all varchar columns in database
MySQL Full-Text Search
Using MySQL Full-Text Search
SELECT * FROM table WHERE value REGEXP 'searchterm'
Allows you to use many familiar search tricks such as +, "", etc
This is a native function of MySQL. No need to go to a new language or plugin, which might be faster but is also extra time for maintenance, troubleshooting, etc.
It may be a little slower than some crazy C++-based mashup, but users don't generally notice a difference of milliseconds.
One thing you might also want to look into (if you're not going to use Sphinx) is stemming your keywords. It makes keyword matching a bit easier (stemming 'cheese' and 'cheesy' produces the same stemmed word), which makes your keyword matching a bit more flexible.

Article search engine in php

I am using Sphinx as a search engine on my website. It's working perfectly, and I have no complaints about it. The only thing it lacks is that it does not allow me to search articles with a query longer than 15 words. I know that in reality people don't use more than 3-4 words; I want to use it for finding duplicate content.
I was wondering if there is any alternative solution to Sphinx. I want to cope with duplicate content.
My main articles table is InnoDB, but I am also caching articles into a MyISAM table for full-text searching. However, when I search an article it takes ages to perform one search. It's not a query problem; I think MySQL's fulltext searching facility is lacking.
Thanks
Jason
Apache Solr is an alternative. It's based on Apache's Lucene project.
You might want to check Lucene as well.
And since you're using MySQL, check its full-text searching: MySQL Full Text Searching
Check Zend_Search_Lucene as well: http://framework.zend.com/manual/en/zend.search.lucene.html
Though it's slower than sphinx.
Perhaps not helpful, but could you simply add a unique index to the MySQL field to prevent insertion of duplicates?
I have not come across any query length limitations in the Sphinx version I'm using (0.9.9), but maybe I have not tried hard enough.
