I am building a system whose database operations involve millions of records. I am using Zend Framework in all parts of my project. I want to use a search indexing technique; do you have any advice on this? Which technique should I use?
Thanks in advance
Zend Lucene is absolutely unsuitable for "millions of records".
Try Sphinx: http://sphinxsearch.com/docs/manual-1.10.html.
It has many useful features, including clustering across many servers, smart, customizable result ranking, and much more. And it is really fast.
PHP API docs: http://www.php.net/manual/en/book.sphinx.php
There is a C version of the PHP API: http://pecl.php.net/package/sphinx
You absolutely don't want to use Zend Framework's Lucene implementation for that many records. Lucene is a great idea, just not in its pure-PHP version.
Check out Solr and ElasticSearch, two Lucene-based search services that may fit your needs well. ElasticSearch is incredibly usable right out of the box with effectively zero configuration.
I'm developing a site that could be compared with a tube site (like YouTube). I'm in the design phase and am trying to figure out what search method to go with.
I'm using SilverStripe framework which has modules for Sphinx, Solr, and Lucene so they are obviously interesting. Another option is to simply query the database (MySQL) and not use any search engine.
What would you do? And why?
Any input is appreciated! Thanks in advance!
"simply query the database (MySQL) and not use any search engine"
I assume you want to use MyISAM's full-text search capabilities? This is possible: SilverStripe's default configuration (at least up to version 2.4) uses MyISAM rather than InnoDB. However, this is only recommended for simple, small tasks that are not performance hungry, and I assume that's not what you want.
Dedicated search services are more powerful, both in speed and in features.
For a general overview, take a look at "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?", for example.
With the details you've given, any of the five should get your job done, but you might give that some more consideration.
However, I would also consider which search services already have SilverStripe modules, how well they fit your requirements, and how much you "like" them. Unless you want to write a module for ElasticSearch, for example. That would be pretty cool, but I'm not sure it's really worth the effort.
Personally, I'd probably go with https://code.google.com/p/lucene-silverstripe-plugin/ as it's easy to set up and seems to be working well (haven't tried it myself, but I have only heard good things from others about it).
I have a PHP web site with data stored in a MySQL database (approximately 50,000 articles).
I want to improve the results of the full text search functionality and stop using just a simple LIKE query.
I found Zend_Search_Lucene from Zend Framework, which seems to be a great tool.
Do you think Zend_Search_Lucene is a good choice in my case?
After indexing all my articles with Lucene, do I need to keep the data in MySQL, or is Zend_Search_Lucene enough to hold all the data?
Thanks in advance,
I would first investigate whether MySQL's native full-text searching meets your needs before jumping to a Lucene-based solution. It is a major improvement over LIKE statements, without the additional implementation work required for Lucene.
Zend_Search_Lucene is a pure-PHP implementation of Lucene and can therefore be pretty slow when used with large datasets. I would skip it and look at implementing Apache Solr. There is a PECL extension for it, which is documented here.
I have used MySQL's full-text search on over 200,000 docs with a good amount of data. My search times are around 0.5 to 2 seconds on popular terms, with a very rare 5 or 6 second response every so often. I update some data each day, so long-term caching doesn't work well, but if I could cache searches I could be looking at 0.2 seconds or lower after caching.
I am testing moving over to Zend Lucene and so far the same searches come in under 1.5 seconds for the most used terms.
All of the above is on a dedicated server with 2 GB of RAM and a Core 2 Duo.
I am no expert, but for 50,000 articles I agree with Treffynnon: check out full-text searching instead of using LIKE. If you do move to a new version of Zend Lucene, I believe the indexes are compatible with the Java version, so it may make a good gateway if you add more articles down the road and need more speed.
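To illustrate the caching idea mentioned above, here is a minimal sketch of a short-lived search cache. It is written in Python only to keep it self-contained; the `SearchCache` class, its `ttl_seconds` parameter and the `search_fn` callback are hypothetical names, not part of MySQL or Zend. The time-to-live is there because the data changes daily, so cached results should expire rather than live forever.

```python
import time


class SearchCache:
    """Cache search results for a limited time (hypothetical helper)."""

    def __init__(self, search_fn, ttl_seconds=300, clock=time.monotonic):
        self._search_fn = search_fn   # the slow search, e.g. a full-text query
        self._ttl = ttl_seconds       # how long a cached result stays valid
        self._clock = clock           # injectable so the expiry can be tested
        self._entries = {}            # normalized query -> (stored_at, results)

    def search(self, query):
        # Normalize case and whitespace so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]           # still fresh: skip the slow search
        results = self._search_fn(query)
        self._entries[key] = (now, results)
        return results
```

In a PHP application the same pattern would typically sit in front of the query, with the cache held in something like APC or memcached; that choice is an assumption here, not something the posts above prescribe.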
I am creating a social site, and for search I want to try Solr or Lucene, as I need very in-depth searches. The platform is PHP (CodeIgniter) and MySQL. However, my PHP developers have zero experience outside of PHP/MySQL. So before I make them implement this I need to know:
1) How easy or how much time would it normally take to setup and get it implemented?
2) Is there coding involved or is it ready out of the box? (I know there will be some to link it with my system objects.)
3) Which one to use out of the two?
For your use, I would suggest Solr. To use Lucene, you will need in-depth Java knowledge, whereas with Solr you don't necessarily need this.
Solr will be ready out of the box, but you will need to do some configuration to "describe" your search index. You need to configure it so that it understands what your documents look like, what fields within that document to search on, how to search them, etc. This does have a learning curve. However, it's not overly difficult. The time this takes is greatly affected by how complex you want your searches to be.
For simple searches, I would think a developer should be able to insert documents and perform searches within a week of starting with Solr. Depending on how in depth your searches are, a developer could spend weeks or months learning and fiddling to tweak things. However, the bulk of the work should be doable within a few weeks of concentrated effort.
For what it's worth, the wiki and mailing lists for Solr are great resources, and the developers themselves are very responsive.
EDIT: The coding involved with Solr would be on the PHP side. You need to write something that puts your data into the XML format Solr needs to insert documents into its index, as all of this is done via XML over HTTP.
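As a rough illustration of that XML step, the sketch below builds a Solr `<add>` update message from plain dictionaries. It is in Python rather than PHP purely to keep the example self-contained; the function name `build_solr_add_xml` and the field names (`id`, `title`, `tags`) are made up for the example, and the fields you actually send must match whatever your Solr schema defines.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(docs):
    """Serialize a list of dicts into a Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multi-valued field is sent as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


xml_body = build_solr_add_xml(
    [{"id": 1, "title": "Hello & welcome", "tags": ["a", "b"]}]
)
```

The resulting string would then be sent in an HTTP POST to Solr's update handler, followed by a commit. The exact URL depends on your installation, so it is left out here.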
Can anyone help me with a good list of PHP site search engines? I am thinking of implementing Google Site Search, but I would rather not pay for that, and I would rather have as much control over it as I can.
Read through Roll your own Search Engine with Zend_Lucene.
The article is rather old, though, so have a look at the ZF Reference Guide section on Zend_Lucene too. Searching for Zend Lucene on Google should yield plenty of useful results as well.
Sphinx is pretty good, but it isn't written in PHP. It has got PHP libraries to interface with it though. You could also have a look at Zend_Search_Lucene from Zend Framework. Both of these make search indexes so you can do fast searches.
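To show why a search index makes lookups fast, here is a toy inverted index. It is not how Sphinx or Zend_Search_Lucene are implemented internally (both add ranking, stemming, on-disk storage and much more); it only illustrates the core idea of mapping each term to the documents that contain it, so a query touches a few posting sets instead of scanning every row. All names in it are invented for the example, and Python is used only to keep it self-contained.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Sphinx is a fast search engine",
    2: "Lucene is a search library",
    3: "MySQL stores the articles",
}
index = build_index(docs)
```

With this sample data, `search(index, "search engine")` matches only document 1, while `search(index, "search")` matches documents 1 and 2.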
You can try the Zend Lucene implementation:
http://framework.zend.com/manual/en/zend.search.lucene.html
http://devzone.zend.com/article/91
You don't have to pay for Google Site Search, and there's only a small chance that more control will mean better-quality results.
If your site is very specific, you will need to write your own search code.
Sphinx is one of the best open source search engines. It has an excellent PHP API and a very good community and forum. The PHP API ships in the tar/zip file that you download, and Sphinx can easily be layered on top of your database. It has great vertical search capabilities. It's pretty simple to implement; try it out.
Here is a new PHP search engine script that can be added to any website. It is built with PHP 5.4+, MySQL, and Ajax.
https://sourceforge.net/projects/site-search-engine-php-ajax/
It automatically crawls and indexes the site's pages, similar to Sphider.
It can use PDO or MySQLi to connect to the MySQL database.
I want to implement a powerful search engine for my e-commerce application. I'm using PHP, with MySQL as the database. Can anyone guide me on how to proceed? Is the FULL TEXT feature of MySQL good for a large volume of data?
Thanks!
IMHO, the MySQL full-text engine is a really poor choice.
Firstly, the number of parameters available to tweak the search is almost zero.
Secondly, in my experience, it doesn't scale.
You might consider using
Sphinx
Lucene
Lucene is said to be the industry-standard project. It also has Solr, if you want search to run as a separate server.
They are far more advanced and perform better.
This should get you started; however, you will have to modify or expand on the idea.
For the second part of your question, have a look at:
Pros & Cons of Full Text Search
Recently, for an app handling a huge amount of data, we gave up both MySQL FULL TEXT and Lucene and switched to PostgreSQL, which has a much more powerful native full-text engine. At least, that is what the results of our investigation said.
Take a look at Zend_Lucene from Zend Framework, and at a new feature for MySQL full-text search here