I have a PHP website with data stored in a MySQL database (approximately 50,000 articles).
I want to improve the results of the full-text search functionality and stop using just a simple LIKE query.
I found Zend_Search_Lucene, from the Zend Framework, which seems to be a great tool.
Do you think Zend_Search_Lucene is a good choice in my case?
After indexing all my articles with Lucene, do I need to keep the data in MySQL, or is Zend_Search_Lucene enough to keep all the data?
Thanks in advance,
I would investigate whether MySQL's native Full-Text Searching meets your needs before jumping to a Lucene-based solution. It is a major improvement over LIKE statements, without the additional implementation work that Lucene requires.
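To make the suggestion concrete, here is a minimal sketch of what MySQL's native full-text search looks like from application code. It is written in Python for brevity and assumes a hypothetical `articles` table with `id`, `title` and `body` columns and a DB-API driver that uses `%s` placeholders (such as PyMySQL); in PHP the same statements would go through a PDO prepared statement. The function only builds the SQL and its parameters, so it runs without a database.

```python
# Hypothetical schema: articles(id, title, body). Adapt the names to your own tables.

# One-time DDL: add a FULLTEXT index over the searchable columns.
# (InnoDB supports FULLTEXT from MySQL 5.6; MyISAM supported it earlier.)
CREATE_INDEX_SQL = "ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body)"

# The MATCH() column list should name exactly the columns of one FULLTEXT index.
SEARCH_SQL = (
    "SELECT id, title, "
    "MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score "
    "FROM articles "
    "WHERE MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE) "
    "ORDER BY score DESC "
    "LIMIT %s"
)


def build_search(terms: str, limit: int = 20) -> tuple[str, tuple]:
    """Return the SQL and parameters for a driver that uses %s placeholders."""
    # The search string is passed as a bound parameter, never interpolated
    # into the SQL text, so user input cannot alter the query.
    return SEARCH_SQL, (terms, terms, limit)


sql, params = build_search("lucene php", limit=10)
print(params)  # ('lucene php', 'lucene php', 10)
```

Natural language mode returns rows ranked by relevance; MySQL also offers `IN BOOLEAN MODE` for operators such as `+word` and `-word`.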
Zend_Search_Lucene is a pure-PHP implementation of Lucene and can therefore be pretty slow with large datasets. I would skip it and look at implementing Apache Solr instead. There is a PECL extension for it, which is documented in the PHP manual.
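A Solr search is ultimately an HTTP request to a core's `select` handler, which the PECL extension wraps for you. The sketch below (Python, standard library only) just builds such a request URL; the host, the default port 8983 and the core name `articles` are assumptions to adapt to your own install.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust the host, port and core name for your install.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def solr_select_url(query: str, rows: int = 10, start: int = 0) -> str:
    """Build a URL for Solr's standard /select request handler."""
    params = {
        "q": query,      # the user's search terms
        "rows": rows,    # page size
        "start": start,  # offset, for paging through results
        "wt": "json",    # ask for a JSON response
    }
    # urlencode percent-encodes the values and turns spaces into '+'.
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


print(solr_select_url("full text search", rows=5))
# http://localhost:8983/solr/articles/select?q=full+text+search&rows=5&start=0&wt=json
```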
I have used MySQL's full-text search on over 200,000 documents with a good amount of data, and my search times are around 0.5 to 2 seconds for popular terms, with a very rare 5- or 6-second response every so often. I update some data each day, so long-term caching doesn't work well, but if I could cache searches I could be looking at 0.2 seconds or lower for cached queries.
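The caching idea above can be sketched as a small time-to-live cache in front of whatever search function you already have. This is a hypothetical helper written in Python, not part of MySQL or Zend; the TTL value and the choice to clear everything after the daily data update are assumptions that match the situation described.

```python
import time
from typing import Callable


class SearchCache:
    """Cache search results per normalized query for a fixed time-to-live (seconds)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, list]] = {}

    def get_or_search(self, query: str, search: Callable[[str], list]) -> list:
        key = query.strip().lower()  # "PHP" and "php " share one entry
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached result
        results = search(key)  # miss or expired: run the real search
        self._entries[key] = (now, results)
        return results

    def clear(self) -> None:
        """Drop every cached result, e.g. right after the daily data update."""
        self._entries.clear()


calls: list[str] = []


def fake_search(q: str) -> list:
    calls.append(q)
    return [f"result for {q}"]


cache = SearchCache(ttl=3600)
cache.get_or_search("PHP", fake_search)
cache.get_or_search("php ", fake_search)  # served from the cache
print(len(calls))  # 1
```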
I am testing a move to Zend Lucene, and so far the same searches come in under 1.5 seconds for the most-used terms.
All of the above is on a dedicated server with 2 GB of RAM and a Core 2 Duo.
I am no expert, but for 50,000 articles I agree with Treffynnon: check out full-text searching instead of using LIKE. If you do move to Zend Lucene, I believe its indexes are compatible with the Java version, so it may make a good gateway if, down the road, you add more articles and need more speed.
Related
The task is to implement text search over MySQL data in my project (PHP/Zend Framework 2 + MySQL). The issue is that the text fields are not big at all: they are mostly VARCHAR fields or joined fields like city names, company names, and so on, about 5 to 10 fields per entity.
So for now I have decided to use Lucene (the Zend Framework 2 module, Zend Search), but will technologies like Lucene or Sphinx be effective for small VARCHAR fields?
Thank you.
Sure, Lucene or Sphinx can work with any varchar columns that contain text.* They don't have to be huge.
Any fulltext indexing solution is hundreds or thousands of times better than using LIKE '%word%'!
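The reason for the gap is structural: a pattern with a leading wildcard such as `LIKE '%word%'` cannot use an ordinary B-tree index, so MySQL has to examine every row, while a full-text engine builds an inverted index (a map from each word to the documents containing it) once and then answers each query with a lookup. The toy Python sketch below contrasts the two on three hard-coded documents; real engines add ranking, stemming and on-disk storage, none of which is modeled here.

```python
import re
from collections import defaultdict

docs = {
    1: "Zend Search Lucene is a pure PHP port",
    2: "MySQL full-text search beats LIKE queries",
    3: "Sphinx and Solr are dedicated search servers",
}


def like_scan(word: str) -> list[int]:
    """Roughly what LIKE '%word%' does: inspect the text of every row."""
    return [doc_id for doc_id, text in docs.items() if word.lower() in text.lower()]


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


# Build the inverted index once: word -> set of ids of documents containing it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for token in tokenize(text):
        index[token].add(doc_id)


def index_lookup(word: str) -> list[int]:
    """A single dictionary lookup, however many documents there are."""
    return sorted(index.get(word.lower(), set()))


print(like_scan("search"), index_lookup("search"))  # [1, 2, 3] [1, 2, 3]
```

The two also differ in meaning: `like_scan("ike")` matches "LIKE" as a substring, while the index only matches whole words.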
You might be interested in my presentation, Fulltext Search Throwdown.
You can also watch a recording of me delivering that presentation as a webinar: http://www.percona.com/webinars/2012-08-22-full-text-search-throwdown (it's free but requires registration).
* Lucene and Sphinx can do some things with numeric columns too.
PS: I was the project lead on Zend Framework 1.0. Zend_Search_Lucene was an interesting experiment circa 2007, but it's way outdated relative to Apache Lucene/Solr, and it is orders of magnitude slower than the Java implementation. I wouldn't bother with it.
I am writing a website that indexes large amounts of data into databases (about 800 tables per database), and the website lets you search the databases for various items. Should I use something like Lucene, or just write my own search algorithm? I am using PHP and MySQL. Although I could filter my SELECT queries and write a search algorithm myself, I wanted to know whether I should use Lucene, since I am just indexing data that's already in a database. Please also suggest anything else that might help. I forgot to mention that even though I have 800 tables, they are pretty small.
Lucene is a mature, tested, open source library.
I would definitely say: use it as much as possible. It will probably be better, and take less time, than implementing your own library.
If there is some functionality that Lucene does not provide, you can always create your own variation of Lucene to handle it.
Do not underestimate the importance of the community when using products such as Lucene: help is almost always available in Lucene's forums [and SO], and the library is constantly tested and maintained because of its large number of users!
Without seeing your data, this question is very hard to answer; however, I can say from personal experience that writing a search of any kind quickly becomes very complex. You have to worry about weighting the various columns you are searching, and search in SQL is almost never as fast as search in a dedicated search engine. At work we are switching from an in-house, SQL-based search to Sphinx Search for our product catalog for this very reason.
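To illustrate the column-weighting problem mentioned above, here is a deliberately naive Python sketch that scores records by counting query terms per column and multiplying by a per-column weight. The weights and field names are hypothetical; dedicated engines such as Sphinx let you configure per-field weights and use far better relevance formulas than raw counts, which is part of why hand-rolled SQL search gets complicated.

```python
import re

# Hypothetical weights: a hit in the title counts five times as much as a hit in the body.
WEIGHTS = {"title": 5.0, "body": 1.0}


def score(record: dict[str, str], terms: list[str]) -> float:
    """Sum weight * (occurrences of each term) over all weighted columns."""
    total = 0.0
    for column, weight in WEIGHTS.items():
        words = re.findall(r"\w+", record.get(column, "").lower())
        for term in terms:
            total += weight * words.count(term.lower())
    return total


def search(records: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    terms = re.findall(r"\w+", query.lower())
    scored = [(score(r, terms), r) for r in records]
    # Best score first; records that match nothing are dropped.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [r for s, r in ranked if s > 0]


products = [
    {"title": "Red chair", "body": "A sturdy wooden chair."},
    {"title": "Oak table", "body": "Pairs well with any chair or bench."},
    {"title": "Lamp", "body": "Bright desk lamp."},
]
print([p["title"] for p in search(products, "chair")])  # ['Red chair', 'Oak table']
```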
So I have a simple classifieds site in PHP/MySQL, to which I need to add search capability.
I can't use MySQL's FULLTEXT feature, as I need 3-character words to be part of the index. What are my options?
I am looking at Zend Search Lucene, and while it would do the job, I am wondering if there is something better. For example, to implement Zend Search Lucene I would have to write 3 methods: one to add to the index when a post is created, one to delete from the index when a post is deleted, and one to build the index from existing posts.
Is there something more automatic, or should I just stick with Lucene?
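The 3-character problem comes from the minimum word length that full-text indexers apply: for MyISAM tables, MySQL's `ft_min_word_len` defaults to 4, so words like "car" are never indexed. If you control the server, that variable can be lowered, which requires a restart and rebuilding the FULLTEXT indexes. The Python sketch below is a simplified tokenizer (it ignores stopword lists, for example) that shows what a minimum length does to a typical classified ad.

```python
import re


def index_tokens(text: str, min_len: int) -> list[str]:
    """Tokenize roughly like a full-text indexer, dropping words shorter than min_len."""
    return [w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len]


ad = "Red car for sale, low km, new tyres"
print(index_tokens(ad, min_len=4))  # ['sale', 'tyres']
print(index_tokens(ad, min_len=3))  # ['red', 'car', 'for', 'sale', 'low', 'new', 'tyres']
```

With a minimum of 4, a search for "car" or "red" can never match this ad, no matter how the query is written.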
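The three methods listed above are usually small. The sketch below shows their shape with a hypothetical in-memory `PostIndex` class in Python standing in for a real index; with Zend Search Lucene or any other engine, the bodies would call that engine's API instead, and the first two would be called from wherever posts are saved and deleted.

```python
class PostIndex:
    """Hypothetical in-memory stand-in for a search index keyed by post id."""

    def __init__(self) -> None:
        self._docs: dict[int, str] = {}

    def add(self, post_id: int, text: str) -> None:
        """Hook 1: call when a post is created. Re-adding an id replaces it."""
        self._docs[post_id] = text.lower()

    def delete(self, post_id: int) -> None:
        """Hook 2: call when a post is deleted. Unknown ids are ignored."""
        self._docs.pop(post_id, None)

    def rebuild(self, posts: dict[int, str]) -> None:
        """Hook 3: throw the index away and re-add every existing post."""
        self._docs.clear()
        for post_id, text in posts.items():
            self.add(post_id, text)

    def search(self, word: str) -> list[int]:
        """Whole-word match on whitespace-separated words, ids in ascending order."""
        return sorted(pid for pid, text in self._docs.items() if word.lower() in text.split())


idx = PostIndex()
idx.rebuild({1: "Red car", 2: "Blue bike"})
idx.add(3, "Car parts")
idx.delete(1)
print(idx.search("car"))  # [3]
```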
Thanks
Having used Zend_Search_Lucene in a rather big project, I can say that it's not well suited for that. Small to medium-sized projects should be fine, but if the data set is too large you'll run into problems with a pure-PHP implementation of Lucene: mostly, it becomes rather slow, since it can't keep things in memory, and PHP will run out of memory for large result sets.
I'd recommend a standalone search server like Solr, which will give you more flexibility in the long run.
I'm developing a site that could be compared with a tube site (like YouTube). I'm in the design phase and am trying to figure out what search method to go with.
I'm using SilverStripe framework which has modules for Sphinx, Solr, and Lucene so they are obviously interesting. Another option is to simply query the database (MySQL) and not use any search engine.
What would you do? And why?
Any input is appreciated! Thanks in advance!
"simply query the database (MySQL) and not use any search engine"
I assume you want to use MyISAM's full-text search capabilities? That's possible, since SilverStripe's default configuration is currently (at least up to version 2.4) set to MyISAM rather than InnoDB. However, it is only recommended for simple, small tasks that are not performance-hungry, and I assume that's not what you want.
Dedicated search services are more powerful, in terms of both speed and features.
For a general overview, take a look at the question "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?", for example.
With the details you've given, any of the five should get the job done, but you might want to give the choice some more thought.
However, I would also take into account which search services already have SilverStripe modules, how well those modules fit your requirements, and how much you "like" them. That is, unless you want to write a module for ElasticSearch, for example, which would be pretty cool, but I'm not sure it's really worth the effort.
Personally, I'd probably go with https://code.google.com/p/lucene-silverstripe-plugin/, as it's easy to set up and seems to work well (I haven't tried it myself, but I have only heard good things about it from others).
I am building a system with database operations on millions of records. I am using Zend Framework in every part of my project. I want to use a search indexing technique; do you have any advice on this? Which technique should I use?
Thanks in advance
Zend Lucene is absolutely unsuitable for "millions of records".
Try Sphinx (http://sphinxsearch.com/docs/manual-1.10.html).
It has many useful features, including clustering across many servers, smart and customizable result ranking, and much more. And it is really fast.
PHP API docs: http://www.php.net/manual/en/book.sphinx.php
There is also a C version of the PHP API (a PECL extension): http://pecl.php.net/package/sphinx
You absolutely don't want to use Zend Framework's Lucene implementation for that many records. Lucene is a great idea, just not a pure-PHP version.
Check out Solr and ElasticSearch, two Lucene-based search services that may fit your needs well. ElasticSearch is incredibly usable right out of the box with effectively zero configuration.
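As an example of that out-of-the-box usability, an Elasticsearch full-text query is just a JSON document sent to an index's `_search` endpoint (for instance `http://localhost:9200/records/_search`, where the index name `records` is hypothetical and 9200 is the default HTTP port). The Python sketch below only builds the request body using a `match` query, so it runs without a cluster; the field name is also an assumption.

```python
import json


def match_query(field: str, text: str, size: int = 10) -> str:
    """Build a JSON body for Elasticsearch's _search endpoint using a `match` query."""
    body = {
        "query": {"match": {field: text}},  # analyzed full-text match on one field
        "size": size,  # maximum number of hits to return
    }
    return json.dumps(body)


print(match_query("title", "tube videos"))
# {"query": {"match": {"title": "tube videos"}}, "size": 10}
```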