I have a site (built in php) where I have roughly a million records in a mysql table. There is a very complex "advanced" search which allows the data to be searched, sorted, and ordered in hundreds if not thousands of various ways.
Unfortunately, mysql search isn't that good, and is extremely slow. Average search takes 5 seconds currently, and the only way I can make the site function is by caching all the searches for a week (there are over 1.1 million cache files just for searches currently). I have "ghetto fuzzy search" which I implement via the soundex() function.
I wanted to see what I can do about replacing the mysql based search with something a bit faster, and something that would return accurate results. I also need the output to be totally skinnable, as the results page isn't just text, but pictures, and complex css.
I looked at sphinx, but there is no fuzzy search there, which I'd very much like to have.
SQL isn't designed for the kinds of queries you are performing, certainly not at scale. My recommendation would be as follows:
Index all your tables into an Apache Solr index. Solr is a FOSS search server built on Lucene and can typically query an index in milliseconds. It supports pretty much any type of query you need: fuzzy, wildcard, proximity, etc.
Since Solr offers a RESTful (HTTP) API, it should integrate easily into whatever platform you are on.
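As a rough illustration of what a fuzzy query over that HTTP API could look like from PHP, here is a minimal sketch that only builds the request URL. The base URL, the core name "items" and the field name "title" are assumptions made up for the example, and the `term~N` edit-distance syntax is the one used by recent Lucene/Solr query parsers (older releases used a similarity value instead).

```php
<?php
// Build the URL of a Solr "select" request that runs a fuzzy search for one term.
// $baseUrl, $field and the default values are illustrative assumptions.
function buildSolrFuzzyUrl(string $baseUrl, string $field, string $term, int $maxEdits = 2, int $rows = 10): string
{
    // Keep only letters and digits so user input cannot inject query syntax.
    $clean = preg_replace('/[^\p{L}\p{N}]+/u', '', $term);

    // "title:serch~1" means: match terms within 1 edit of "serch" in field "title".
    $params = [
        'q'    => $field . ':' . $clean . '~' . $maxEdits,
        'rows' => $rows,
        'wt'   => 'json',
    ];

    return rtrim($baseUrl, '/') . '/select?' . http_build_query($params);
}
```

The resulting URL can be fetched with any HTTP client (file_get_contents, cURL) and the JSON response rendered with whatever templates the results page already uses.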
I want code that finds similar posts in PHP/MySQL without causing load on the server.
I tried
MATCH (post) AGAINST ('string string')
but it caused a lot of load on the server, to the point that it stopped my server.
I have over 4,125,274 posts in my database.
Please help me.
While a fulltext index will help, it will still be really slow if you want to load similar items many times. We have an implementation with about 7 million posts under a fulltext index, and a search can take up to a minute if we rely on MySQL alone.
A good alternative is a search server like Sphinx (http://sphinxsearch.com/), which builds its own index and cache and is much, much faster.
It is simple and efficient and is used by many big places like Urban Dictionary, Craigslist, Mozilla, etc.
If you want to do it with MySQL queries only, and you don't want to run the same search many times, try caching the returned IDs in memcached.
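The ID caching idea could look roughly like the sketch below. It assumes a cache object with get() and set() methods shaped like PHP's Memcached class; $runSearch is a hypothetical callable standing in for the real MySQL query, and ArrayCache is only an in-memory stand-in so the example runs without a memcached server.

```php
<?php
// Return the cached post IDs for a query, running the real search only on a miss.
// $cache: any object with get($key) and set($key, $value, $ttl), e.g. a Memcached instance.
// $runSearch: hypothetical callable that executes the MySQL query and returns an array of IDs.
function cachedSearchIds($cache, string $query, callable $runSearch, int $ttl = 3600): array
{
    // Hash the normalized query so the key is short and contains no spaces.
    $key = 'search_ids_' . md5(strtolower(trim($query)));

    $ids = $cache->get($key);
    if ($ids === false) {              // Memcached::get() returns false on a miss
        $ids = $runSearch($query);
        $cache->set($key, $ids, $ttl);
    }
    return $ids;
}

// In-memory stand-in for Memcached, used here only to make the sketch runnable.
class ArrayCache
{
    private $items = [];

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value, $ttl = 0)
    {
        $this->items[$key] = $value;   // the TTL is ignored by this stand-in
        return true;
    }
}
```

Caching only the IDs keeps the cached values small; the full rows can then be loaded by primary key, which MySQL handles quickly.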
Supposing you already have a fulltext index on post and it doesn't help, you should consider putting a dedicated search engine in front of your posts, such as Lucene (not necessarily the PHP implementation, though).
I'm making a social website with lots of different sections, like blogs, galleries, multimedia, etc. Now the time has come to implement the search functionality. The customer refused to use Google search and insisted on a custom one, where results are shown for each section individually.
For example, if user enters 'art', the result should be displayed like this:
3 found in blogs
1 ...
2 ...
3 ...
2 found in galleries
1 ...
2 ...
None found in multimedia
I'm planning to use MySQL fulltext search for this. So, the question is: How do I make such search, so it won't kill the server if very many records match the query? I don't really see how to implement paging in this case.
I would highly recommend NOT using MySQL for full text search; it is slow both in index creation and in performing searches.
Take a look at Sphinx or Lucene, both of which are significantly faster than MySQL and which bind quite readily to PHP applications.
You won't kill a MySQL server with such a thing. Even if your app is huge (we are talking about thousands of queries/sec here), you will just have to set up a replica of your MySQL server dedicated to search. You may want to build a cache of "popular keyword results" to speed things up a bit, but an appliance like a Google Mini is still the best for that ...
If you can run a Java servlet container (like Tomcat or Jetty), then I recommend Solr (http://lucene.apache.org/solr/). It sits on top of Lucene and is very powerful. Solr was started at CNET and is used by big sites like Netflix and Zappos. Stack Overflow uses a .NET implementation of Lucene. I'm not familiar with Sphinx, so I can't tell you how it compares to Solr.
If you use Solr, look into faceting. This allows you to perform a search and then get a count of how many matching documents were in "blogs", "galleries", "multimedia", etc.
Here is a PHP client to interface with Solr (http://code.google.com/p/solr-php-client/).
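To make the faceting idea concrete, here is a small sketch that turns the facet part of a Solr JSON response into per-section counts. It assumes each document carries a field named "section" (a made-up name) and that the request included facet=true&facet.field=section; with Solr's default JSON layout the facet values come back as a flat list of alternating value and count.

```php
<?php
// Convert Solr's flat facet list ["blogs", 3, "galleries", 2, ...]
// into an associative array ["blogs" => 3, "galleries" => 2, ...].
// The field name passed in ("section" in the example) is an assumption.
function facetCounts(string $json, string $field): array
{
    $data = json_decode($json, true);
    $flat = $data['facet_counts']['facet_fields'][$field] ?? [];

    $counts = [];
    // Walk the list two entries at a time: value, then its count.
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $counts[$flat[$i]] = (int) $flat[$i + 1];
    }
    return $counts;
}
```

With these counts you can print "3 found in blogs" style headings from a single request, then run a second query filtered to one section when the user pages through it.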
Maybe a better decision is to use Sphinx.
I've done this before on some sites I created. What I have done is run one query against each module to find the results. What you want to do is run a mysql query, and then fetch rows in a while loop rather than using a fetch all. This will make sure you don't over consume memory.
for example:
while ($row = mysqli_fetch_assoc($result)) { echo $row['item_name']; } // mysqli replaces the old mysql_* functions, which were removed in PHP 7
You will most likely find that MySQL can handle much larger searches than you think.
Pagination is best done with a paging class, like the one from CodeIgniter or similar.
Are you using a web framework?
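Whatever paging class you pick, it mostly boils down to computing a LIMIT and OFFSET per section. A minimal sketch of that arithmetic, with function and key names invented for the example:

```php
<?php
// Compute the LIMIT/OFFSET window for one page of results.
// Out-of-range page numbers are clamped to the first or last page.
function pageWindow(int $page, int $perPage, int $totalRows): array
{
    $perPage    = max(1, $perPage);
    $totalPages = max(1, (int) ceil($totalRows / $perPage));
    $page       = min(max(1, $page), $totalPages);

    return [
        'limit'  => $perPage,
        'offset' => ($page - 1) * $perPage,
        'page'   => $page,
        'pages'  => $totalPages,
    ];
}
```

The limit and offset values would go into a query such as SELECT ... LIMIT ? OFFSET ?, with the total row count coming from a separate COUNT(*) query for each section.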
Yes, Sphinx and Lucene are both good; they are significantly faster than MySQL and bind quite readily to PHP applications.
I'm trying to build a search engine for a website. It's mostly a collection of HTML/CSS pages with some PHP. That's all there is; all of my content is on the pages.
From what I understand, to be able to do this I would need to have the content in a database, am I correct?
If so, I was considering creating a MySQL table with four columns: "Keywords", "Titles", "Content" and "Link".
Keywords - holds words that, if they appear in the query, mark this row as the most likely result.
Titles - searched after Keywords, to produce the most relevant results.
Content - should be a last resort for finding something, as it will be messier, I believe.
Link - is just the link that belongs to the particular row.
I will be implementing it with PHP and MySQL, and it will be tiresome to put all the content, titles etc into a db. Is this a good method or should I be looking at something else?
Thanks.
---------------EDIT-------------------
Lucene seems like a good option; however, even after reading the Getting Started guide and looking around a bit on the web I can't understand how it works. Can someone point me somewhere that explains this in a very, very basic manner? Especially taking into consideration that I do not know how to compile anything.
Thank you.
Building a search engine from scratch is painful. It is an interesting task, indeed, so if it is for learning, then do it!
However, if you just need a good search function for your web site, please use something that others have done for you. Apache Lucene is one option.
Sphinxsearch is an open-source full-text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind.
Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with Sphinx pretty much as a database server.
I'm assuming your pages are static HTML. You can do two things at once and move the content of the pages into the DB, so that the pages are generated on the fly by reading their content from the DB.
Anyway, I think your strategy is OK at least for a basic search engine. Also have a look into MySQL fulltext search.
MySQL fulltext search will be the easiest to set up, but it will be a lot slower than Sphinx. Even Lucene is slower than Sphinx. So if speed is a criterion, I would suggest taking the time to learn and implement Sphinx.
In one of his presentations, Andrew Aksyonoff (creator of Sphinx) presented the following benchmarking results. Approximately 3.5 million records with around 5 GB of text were used for the purpose.
Metric                    MySQL   Lucene   Sphinx
Indexing time, min         1627      176       84
Index size, MB             3011     6328     2850
Match all, ms/q             286       30       22
Match phrase, ms/q         3692       29       21
Match bool top-20, ms/q      24       29       13
Apart from basic search, there are many features that make Sphinx a better solution for searching. These features include multi-valued attributes, tokenizing settings, wordforms, HTML processing, geosearching, ranking, and many others.
Zend Lucene is a pure PHP implementation of search which is quite useful.
Another search option is Solr, which is based on Lucene but does a lot of the heavy lifting for you in order to produce more Google-like results. This is probably your easiest option, besides using MySQL MyISAM fulltext search capabilities.
I have a location search website for a city, we started out with collecting data for all possible categories in the city like Schools, Colleges, Departmental Stores etc and stored their information in a separate table, as each entry had different details apart from their name, address and phone number.
We had to integrate search into the website so people could find information, so we built an index table in which we stored the categories, the related keywords for each category, and the table that must be fetched when that category is searched for. Later on we added searching on name and address as well, by adding another master table that collects those fields from all the tables in one place. Now my questions are the following:
The application design is improper, and we have written queries like select * from master where name like "%$input%" all over. Since our database is MySQL with PHP on the server side, is there any suggestion for improving the design of the system?
People want more features, like splitting the keywords and ranking them according to relevance. Is there any ready-made framework that runs search on a database?
I tried using Full Text Search in MySQL and it seems effective to me; is that enough?
Correct me if I am wrong: I had a look into Lucene and Google Custom Search; don't they work by crawling existing webpages and building their own index? I have a collection of tables in a MySQL database to which I have to apply searching. What options do I have?
To address your points:
Using %input% is very bad. That will cause a full table scan every query. Under any amount of load or on even a remotely large dataset your DB server will choke.
An RDBMS alone is not a good solution for this. You are looking in the right place by seeking a separate solution for search. Something which can communicate well with your RDBMS is good; something that runs inside an RDBMS won't do what you need.
Full Text Search in MySQL is workable for very basic keyword searches, nothing more. The scope of usefulness is extremely limited - you need a highly predictable usage model to leverage the built-in searching. It is called "search", but it's not really search the way most people think of it. It does not compare to the quality of search results we have come to expect from Google and Bing. In that sense of the word "search", it is something else - like Notepad vs Word. They are both things to type in, but that's about it.
As far as separate systems for handling search go, Lucene is very good. Lucene works however you want it to work, essentially. You can interact with it programmatically to insert indexable documents. Likewise, a Google Appliance (not Google Custom Search) can be given direct meta feeds which expose whatever you want to be indexed, such as data directly from a database.
Take a look at sphinx: http://www.sphinxsearch.com/
Per their site:
How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.
It's quite popular with a lot of people in the Rails community right now, and they all rave about how awesome it is :)
OK, I have a MySQL database that looks something like this:
ID - an int and the unique ID of the record
Title - The name of the item
Description - The items description
I want to search both title and description for keywords. Currently I'm using:
SELECT * FROM `item` WHERE title LIKE '%key%'
This works, as there’s not much in the database. However, searching for “this key” doesn’t find “this that key”. I want to improve the search engine of the site, and maybe even add some kind of ranking system to it (but that’s a long time away).
So, to the question: I’ve heard about something called “full text search”. It is (as far as I can tell) a staple of database design, but being a newbie to this subject I know nothing about it, so…
1) Do you think it would be useful?
And an additional question…
2) What can I read about database design / search engine design that will point me in the right direction.
If it’s of relevance, the site is currently written in straight PHP (i.e. without a framework), though the thought of converting it to Ruby on Rails has crossed my mind.
update
Thanks all, I'll go for Fulltext search.
And for anyone finding this later, I found a good tutorial on fulltext search as well.
The problem with the '%keyword%' type search is that there is no way to efficiently search on it in a regular table, even if you create an index on that column. Think about how you would look that string up in the phone book. There is actually no way to optimize it - you have to scan the entire phone book - and that is what MySQL does, a full table scan.
If you change that search to 'keyword%' and use an index, you can get very fast searching. It sounds like this is not what you want, though.
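If a prefix search does fit your case, the pattern is best built in code so that user input cannot smuggle in its own wildcards. A minimal sketch, assuming MySQL's default LIKE escape character (backslash); the function name is made up for the example:

```php
<?php
// Build a LIKE pattern that matches values starting with the user's input.
// Backslash, % and _ are escaped so they are treated as literal characters;
// only the trailing % added here acts as a wildcard, which lets MySQL use
// an ordinary index on the column.
function likePrefixPattern(string $input): string
{
    return addcslashes($input, '\\%_') . '%';
}
```

The pattern would then be bound as a parameter of a prepared statement, for example WHERE title LIKE ?, rather than concatenated into the SQL string.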
So with that in mind, I have used fulltext indexing/searching quite a bit, and here are a few pros and cons:
Pros
Very fast
Returns results sorted by relevance (by default, although you can use any sorting)
Stop words can be used.
Cons
Only works with MyISAM tables (in MySQL versions before 5.6; from 5.6 onward InnoDB also supports FULLTEXT indexes)
Words that are too short are ignored (default minimum is 4 letters)
Requires different SQL in where clause, so you will need to modify existing queries.
Does not match partial strings (for example, 'word' does not match 'keyword', only 'word')
Here is some good documentation on full-text searching.
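To give an idea of the "different SQL" involved, here is a hedged sketch that turns free-text input into a boolean-mode search string in which every word is required. It assumes a FULLTEXT index on (title, description) of a table named item, and it drops words shorter than the minimum length because MySQL would ignore them anyway; the function name and defaults are made up for the example.

```php
<?php
// Turn "this that, search" into "+this +that +search" for use with
// MATCH ... AGAINST (? IN BOOLEAN MODE). A leading + makes a word required.
// $minLen mirrors MySQL's minimum indexed word length (4 by default for MyISAM).
function booleanModeQuery(string $input, int $minLen = 4): string
{
    // Split on anything that is not a letter or digit, which also strips
    // boolean-mode operators such as + - * " from the user's input.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $input, -1, PREG_SPLIT_NO_EMPTY);

    $terms = [];
    foreach ($words as $word) {
        if (strlen($word) >= $minLen) {   // byte length; adequate for ASCII input
            $terms[] = '+' . $word;
        }
    }
    return implode(' ', $terms);
}
```

The result would be bound into a query along the lines of SELECT id, title FROM item WHERE MATCH (title, description) AGAINST (? IN BOOLEAN MODE). Unlike LIKE '%this key%', the words only have to appear somewhere in the indexed columns, not next to each other.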
Another option is to use a searching system such as Sphinx. It can be extremely fast and flexible. It is optimized for searching and integrates well with MySQL.
You might also consider Zend_Lucene. It's slightly easier to integrate than Sphinx, because it is pure PHP.
I would guess that MySQL fulltext is sufficient for your needs, but it's worth noting that the built-in support doesn't scale very well. For average-size documents it starts to become unusable at table sizes as small as a few hundred thousand rows. If you think this might become a problem further on, you should probably look into Sphinx right away. It's becoming the de facto standard for MySQL users, even though I personally prefer to implement my own solution using Java Lucene. :)
Also, I'd like to mention that full text search is fundamentally different from the standard LIKE '%keyword%' search. Unlike the LIKE search, full text indexing allows you to search for several keywords that don't have to appear right next to each other. Standard search engines such as Google are full text search engines, for example.