Solr / lucene search - how easy to use - which one? - php

I am creating a social site and want to try Solr or Lucene for search, as I need very in-depth searches. The platform is PHP (CodeIgniter) and MySQL. However, my PHP developers have no experience outside of PHP/MySQL. So before I have them implement this, I need to know:
1) How easy is it, or how much time would it normally take, to set up and implement?
2) Is there coding involved, or is it ready out of the box? (I know there will be some coding to link it with my system objects.)
3) Which of the two should I use?

For your use, I would suggest Solr. To use Lucene, you will need in-depth Java knowledge, whereas with Solr you don't necessarily need it.
Solr will be ready out of the box, but you will need to do some configuration to "describe" your search index. You need to configure it so that it understands what your documents look like, what fields within that document to search on, how to search them, etc. This does have a learning curve. However, it's not overly difficult. The time this takes is greatly affected by how complex you want your searches to be.
For simple searches, I would think a developer should be able to insert documents and perform searches within a week of starting with Solr. Depending on how in depth your searches are, a developer could spend weeks or months learning and fiddling to tweak things. However, the bulk of the work should be doable within a few weeks of concentrated effort.
For what it's worth, the wiki and mailing lists for Solr are great resources, and the developers themselves are very responsive.
EDIT: The coding involved with Solr would be on the PHP side. You need to write something that puts your data into the XML format Solr needs in order to insert documents into its index, as all of this is done via XML over HTTP.
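To give a feel for that XML format, here is a minimal sketch of building a Solr `<add>` message. It is written in Python for brevity; the PHP side would do the equivalent. The function name and the field names (`id`, `username`, `interests`) are made up for illustration and would have to match whatever fields your Solr schema defines. The resulting string is what you would POST to Solr's update handler, followed by a commit.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(docs):
    """Build a Solr <add> message from a list of dicts.

    Each dict maps a field name to a single value or a list of values
    (a list becomes one <field> element per value).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical user profile; field names are illustrative only.
xml_payload = build_solr_add_xml([
    {"id": 42, "username": "alice", "interests": ["hiking", "R&B"]},
])
print(xml_payload)
```

Building the message with an XML library rather than string concatenation means special characters in user data are escaped for you.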

Related

php MySQL implementing search feature

I'm having a hard time implementing a search feature for a web-based system I'm working on. I first used MySQL LIKE with %wildcards%, but it did not find what I wanted to display. Then I came across full-text index search, which searches very well but has an issue with displaying multiple tables joined by foreign key, for which I don't know any workaround. Then I came across MySQL with Sphinx.
May I ask for advice on the best way or technologies to implement a search feature over complex database tables?
Check out the Apache Solr search server (see the official Apache Solr website).
This technology will solve all your search-related problems.
I guess the general answer here is you want a 'search index' - an index specifically for running searches. A repository that has all the required data to answer queries.
An RDBMS (like MySQL) is very good at normalizing data, i.e. storing it in a compact, easy-to-update format with minimal duplication. That's great for storage, but queries suffer, as they have to do much more work to 'join' all the required data back together.
For searching, though, a denormalized structure may be best (bigger, but easier and therefore quicker to search).
There are many ways of doing that.
A materialized view, as noted in your other thread (php mysql full text search multiple table joined by id), keeps it all in MySQL.
Using an external application. There are many examples: Lucene (Solr and ElasticSearch are built on it), SphinxSearch, and many more.
These generally work in a similar way: setting up a dedicated copy of the data to make queries easier.
Use an external provider. There are many 'search as a service' systems (basically wrappers around the software mentioned above).
Building your own! It's possible to build a system yourself using just normal MySQL tables. An implementation of an inverted index will probably be the easiest.
Which you use is down to personal preference (e.g., an external app is more work to set up, but is more powerful overall).
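For the build-your-own option, the idea of an inverted index can be sketched in a few lines. This is an in-memory Python illustration with made-up sample documents, not a MySQL implementation; in MySQL the same structure would typically be a table of (term, document id) rows.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data.
docs = {
    1: "Red running shoes",
    2: "Blue running jacket",
    3: "Red wool jacket",
}
index = build_index(docs)
print(search_all(index, "red jacket"))
```

The lookup cost depends on how many documents contain the query terms rather than on the total number of rows, which is why this structure beats scanning with LIKE.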

Loose searching approach

I want to make a searching option for my site, and for fun I decided I should at least try to make it myself (If I fail, there's always Google Custom Search).
The problem is, I don't even know how to approach this monster! Here are the requirements:
Not all keywords will be required in the search (a search for "Big happy world" would also match "Big world", "happy world", etc.)
Consideration of common spelling mistakes (from a database, via edit distance, or from a predefined list of common mistakes, e.g. "rather then" => "rather than").
Search in both content and titles of posts, with an emphasis on titles.
Don't suck
I've searched my old pal Google for it, but the only reasonable things I found were academic-level papers on the subject (English isn't my native language; I'm good, but not that good =( ).
So in short: does anyone know of a good place to start, a tutorial, an article, an example?
Thanks in advance.
There are several options you could try:
Apache Lucene (A PHP based implementation exists in the Zend Framework)
ElasticSearch (provides a REST-like API on top of Lucene)
Xapian
Sphinx
Probably a bunch of others too.
If you want to create your own search engine, Apache Lucene is a mature open source library that can take care of a big part of the functionality for you.
Using Lucene, you first index your information [using an IndexWriter]. This is done offline, to create the index.
On search, you use an IndexSearcher to find documents that match your query.
If you want some theoretical knowledge on "how it works", you should read more on information retrieval. A good place to start is Stanford's "Introduction to Information Retrieval".
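To make the requirements from the question concrete, here is a rough Python sketch of "loose" scoring: every query term is optional, a term may be off by one edit, and title hits count more than content hits. The weights, the one-edit threshold, the function names, and the sample posts are all assumptions chosen for illustration; a library such as Lucene does this far more efficiently and with better ranking.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def term_matches(term, words, max_edits=1):
    """True if any word is within max_edits of the query term."""
    return any(edit_distance(term, w) <= max_edits for w in words)


def score(query, title, content, title_weight=3.0):
    """Sum per-term hits; no single term is required."""
    title_words = set(tokenize(title))
    content_words = set(tokenize(content))
    total = 0.0
    for term in tokenize(query):
        if term_matches(term, title_words):
            total += title_weight
        if term_matches(term, content_words):
            total += 1.0
    return total


def search(query, posts):
    """Return ids of posts with a non-zero score, best first."""
    scored = [(score(query, p["title"], p["content"]), p["id"]) for p in posts]
    return [pid for s, pid in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


# Hypothetical sample posts.
posts = [
    {"id": 1, "title": "Big happy world", "content": "A story about a big world."},
    {"id": 2, "title": "Cooking pasta", "content": "It makes a happy dinner in a big pot."},
    {"id": 3, "title": "Gardening tips", "content": "Soil and water."},
]
print(search("big hapy world", posts))
```

A fixed edit threshold is too generous for very short words (one edit turns "in" into "is"), so a real implementation would usually scale the allowed distance with term length.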

My website searches for data in heavy databases. Should I use Lucene to search or write my own algorithm?

I am writing a website which indexes large amounts of data into databases (each with about 800 tables per database), and the website allows you to search the database for various items. Should I use something like lucene or just write my own search algorithm? I am using PHP and MySQL. Although I can filter my SELECT queries, and create a searching algorithm I just wanted to know if I should use Lucene because I am just indexing stuff in a database. Also please do suggest anything that might help me. Forgot to mention that even though I have 800 tables they would be pretty small in size.
Lucene is a mature, tested, open source library.
I would definitely say: try to use it as much as possible. It will probably be better and take less time than implementing your own library.
If there is a certain functionality that Lucene does not provide, you can always create your own variation of Lucene to take care of it.
Do not underestimate the importance of the community in using products such as Lucene: help is almost always available in Lucene's forums [and SO], and the library is constantly tested and maintained because of the large number of users!
Without seeing your data, answering this question is very hard. However, I can say from personal experience that writing a search of any kind quickly becomes very complex. You have to worry about weighting the various columns you are searching, and search in SQL is almost never as fast as search in a dedicated search engine. At work we are switching from an in-house SQL-based search to Sphinx Search to search our product catalog for this very reason.

Making MySQL databases readable for non-developers

This summer, I will be designing an e-commerce website and have chosen MySQL to organize the incredible amounts of data I will be receiving. The people I am designing for are great at making their products...but have absolutely no development or coding experience.
I have three months to make the site, and I don't begin until June. In the end, they would like an easy, readable, and preferably fashionable way to present this data. They also want to be able to manipulate it (sort by date, item, customer, etc.). They don't care if it's an Excel file, a secure webpage, or anything like that.
I know the basics of MySQL, but I am looking for ways to PRESENT the data in a way that is easy and accessible. I love to teach myself and do my own research, so my question is...what topics of interest in MySQL should I read into to learn how to present this data?
Choose any e-commerce CMS like Magento, osCommerce, or OpenCart. All these e-commerce solutions have many built-in reports that the business people would need.
They also offer many more options than plain reporting, and they cover most business objectives and business models, so as the business evolves it will be easier to update the website with little effort.
For a list of e-commerce solutions and comparisons, visit http://en.wikipedia.org/wiki/Comparison_of_shopping_cart_software
If you have a decent grasp of JavaScript and programming web via PHP or Java I would recommend Dojo DataGrid. It is fairly simple to implement if you use the basic grid and looks and performs great.
Don't use MySQL. Oracle is going to kill it (it is in their plans). Use MariaDB, a drop-in replacement for MySQL.
Look into using PHP/MySQL together with some fancy jQuery stuff like DataTables to present your data. A great article/tutorial on just how to do this can be found here ->
You should get away with knowing the basics of MySQL to rig something like that up to work...
If there are a lot of numeric parameters and enum-type stuff, try using jQuery UI to make it look nice with some sliders and fancy checkboxes, etc.
I've got a prototype of something I'm working on (slowly...) that utilizes all of the above if you want to see. here it is! It's for a shopping cart but you get the drift
Good Luck!
Assuming you are building the system yourself (and don't have an off-the-shelf option)...
· If they need lots of flexibility in manipulating the data, I'd run a cron job that exports reports as CSV files for them to open in Excel.
· If there are limited views that they are interested in, I'd run the report as a php script that renders an html table, and make it sortable using a jQuery widget.
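As a rough illustration of the CSV export idea, here is a short Python sketch (the same thing is easy to do in a PHP script run from cron). The function name, column names, and sample rows are hypothetical; in practice the rows would come from a MySQL query and the text would be written to a file the business users can open in Excel.

```python
import csv
import io


def export_report(rows, columns, sort_by):
    """Return CSV text for a list of row dicts, sorted by one column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r[sort_by]):
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical order rows standing in for a database query result.
orders = [
    {"date": "2012-06-03", "item": "Scarf", "customer": "Lee", "total": "19.50"},
    {"date": "2012-06-01", "item": "Candle, large", "customer": "Ortiz", "total": "12.00"},
]
report = export_report(orders, ["date", "item", "customer", "total"], sort_by="date")
print(report)
```

Using a CSV writer instead of joining strings by hand takes care of quoting values that contain commas, such as the item name in the second row.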

Choosing search engine for tube site? (SilverStripe specific or in general)

I'm developing a site that could be compared with a tube site (like YouTube). I'm in the design phase and am trying to figure out what search method to go with.
I'm using SilverStripe framework which has modules for Sphinx, Solr, and Lucene so they are obviously interesting. Another option is to simply query the database (MySQL) and not use any search engine.
What would you do? And why?
Any input is appreciated! Thanks in advance!
"simply query the database (MySQL) and not use any search engine"
I assume you want to use MyISAM's full-text search capabilities? This is possible, SilverStripe's default configuration is currently (at least until version 2.4) set to MyISAM and not InnoDB. However, this is only recommended for simple, small, and not performance hungry tasks - I assume that's not what you want.
More powerful (in terms of both speed and features) are dedicated search services.
For a general overview, take a look at, for example, "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?".
With the details you've given, any of the five should get your job done, but you might give that some more consideration.
However, I would also take into consideration which search services already have SilverStripe modules available, how well they fit your requirements, and how much you "like" them. Unless you'd want to write a module for ElasticSearch, for example. That would be pretty cool, but I'm not sure it's really worth the effort.
Personally, I'd probably go with https://code.google.com/p/lucene-silverstripe-plugin/ as it's easy to set up and seems to be working well (haven't tried it myself, but I have only heard good things from others about it).
