I need to write a small search engine with spiders and all this stuff. What do you recommend: ASP.NET or PHP?
And what sources should I read to get the knowledge?
Before you begin writing this monster of a project (by no means will it be small) I'd like to know why you need to write this engine... Is it for an internal project that can't be indexed by other search engines, or what?
If it's a search engine for a site of your own which you have full control of, it's better to index the information on the site as it's added, edited, and removed, to prevent having to use spiders.
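A minimal sketch of that index-on-write idea, in Python for brevity (the thread itself is about PHP and ASP.NET). The `SiteIndex` class and its method names are hypothetical, and the tokenizer is deliberately naive. The point is only that the index is updated by the same code path that saves, edits, or deletes a page, so no spider is needed.

```python
import re
from collections import defaultdict


class SiteIndex:
    """Tiny in-memory inverted index kept in sync with content changes."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of pages containing it
        self._terms_by_page = {}           # page id -> terms indexed for it

    @staticmethod
    def _tokenize(text):
        # Naive tokenizer (an assumption): lowercase alphanumeric runs.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remove(self, page_id):
        """Call when a page is deleted."""
        for term in self._terms_by_page.pop(page_id, set()):
            self._postings[term].discard(page_id)
            if not self._postings[term]:
                del self._postings[term]

    def save(self, page_id, text):
        """Call when a page is added or edited."""
        self.remove(page_id)  # drop stale terms from an earlier version
        terms = self._tokenize(text)
        self._terms_by_page[page_id] = terms
        for term in terms:
            self._postings[term].add(page_id)

    def search(self, query):
        """Return ids of pages containing every query term."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        # .get avoids creating empty entries in the defaultdict
        results = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*results)
```

In a real site, `save` and `remove` would be called from the code that writes to the database, and the index itself would live in a table or a dedicated search service rather than in memory.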
If it's for other websites, then the technology that engines such as Google, Yahoo, and Bing have to offer will always be better than what you can come up with in a couple of weeks. If it's something that they can index, then I'd suggest looking into their APIs (Bing has some pretty neat ones if you are okay with the results they provide) and using them for whatever crawling and querying you need.
If you really need to make your own engine, it's not going to be a small project.
If you don't want to write it yourself, I recommend Sphider.
Related
I have a MySQL database with two main tables that contain the data I need to index. I am looking for a search engine API that can index the data and return appropriate search results, as close as possible to Google quality. The application uses the keywords and creates pages based on the search results.
I have tried Solr but am not sure if that is the best one. Any other paid or open source alternatives you may have come across? The project is LAMP based.
Thanks,
Sameer
Solr/Lucene are definitely the de facto standard in the open source search world. I love Solr. No, you don't need to go for anything "paid" :). In my opinion, if you want to go for something else, try out the Sphinx search engine. It's absolutely amazing and integrates extremely well with LAMP. In fact, the PHP API that ships with it is really good, and you can get started with search using Sphinx in almost no time.
I am writing a website which indexes large amounts of data into databases (each with about 800 tables per database), and the website allows you to search the database for various items. Should I use something like Lucene or just write my own search algorithm? I am using PHP and MySQL. Although I can filter my SELECT queries and create a searching algorithm, I just wanted to know if I should use Lucene, because I am just indexing stuff in a database. Also, please do suggest anything that might help me. I forgot to mention that even though I have 800 tables, they would be pretty small in size.
Lucene is a mature, tested, open source library.
I would definitely say: try to use it as much as possible. It will probably be better and take less time than implementing your own library.
If there is a certain functionality that Lucene does not provide, you can always create your own variation of Lucene to take care of it.
Do not underestimate the importance of the community in using products such as Lucene: help is almost always available in Lucene's forums [and SO], and the library is constantly tested and maintained because of the large number of users!
Without seeing your data, answering this question is very hard. However, I can say from personal experience that writing a search of any kind quickly becomes very complex. You have to worry about weighting the various columns you are searching, and search in SQL is almost never as fast as search in a dedicated search engine. At work we are switching from an in-house SQL-based search to Sphinx Search to search our product catalog for this very reason.
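To illustrate the column-weighting problem mentioned above, here is a rough Python sketch (Python purely for illustration; the thread is about PHP and MySQL). The field names, the weights, and the `score` and `search` functions are all made up, and real engines such as Sphinx or Lucene use far more refined ranking than this simple term count.

```python
import re

# Hypothetical weights: a match in the title counts more than one in the description.
FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query):
    """Sum, over fields, weight * number of query-term occurrences."""
    query_terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(record.get(field, ""))
        total += weight * sum(1 for word in words if word in query_terms)
    return total


def search(records, query):
    """Return matching records, best score first."""
    scored = [(score(record, query), record) for record in records]
    scored = [(s, record) for s, record in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

Even this toy version has to decide on weights, tokenization, and tie-breaking, which hints at why a hand-rolled SQL search grows complicated quickly.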
I'm developing a site that could be compared with a tube site (like YouTube). I'm in the design phase and am trying to figure out what search method to go with.
I'm using the SilverStripe framework, which has modules for Sphinx, Solr, and Lucene, so they are obviously interesting. Another option is to simply query the database (MySQL) and not use any search engine.
What would you do? And why?
Any input is appreciated! Thanks in advance!
"simply query the database (MySQL) and not use any search engine"
I assume you want to use MyISAM's full-text search capabilities? This is possible, since SilverStripe's default configuration is currently (at least until version 2.4) set to MyISAM and not InnoDB. However, this is only recommended for simple, small tasks that are not performance-hungry, and I assume that's not what you want.
Dedicated search services are more powerful, in terms of both speed and features.
For a general overview, take a look at, for example, "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?".
With the details you've given, any of the five should get your job done, but you might give that some more consideration.
However, I would also take into consideration which search services already have SilverStripe modules available, how well they fit your requirements, and how much you "like" them. Unless you want to write a module for ElasticSearch, for example. That would be pretty cool, but I'm not sure it's really worth the effort.
Personally, I'd probably go with https://code.google.com/p/lucene-silverstripe-plugin/ as it's easy to set up and seems to be working well (haven't tried it myself, but I have only heard good things from others about it).
I am creating a social site and want to try Solr or Lucene for search, as I need very in-depth searches. The platform is PHP (CodeIgniter) and MySQL. However, my PHP developers have zero experience outside of PHP/MySQL. So before I make them implement this, I need to know:
1) How easy or how much time would it normally take to setup and get it implemented?
2) Is there coding involved or is it ready out of the box? (I know there will be some to link it with my system objects.)
3) Which one to use out of the two?
For your use, I would suggest Solr. To use Lucene, you will need in-depth Java knowledge, whereas with Solr you don't necessarily need this.
Solr will be ready out of the box, but you will need to do some configuration to "describe" your search index. You need to configure it so that it understands what your documents look like, what fields within that document to search on, how to search them, etc. This does have a learning curve. However, it's not overly difficult. The time this takes is greatly affected by how complex you want your searches to be.
For simple searches, I would think a developer should be able to insert documents and perform searches within a week of starting with Solr. Depending on how in depth your searches are, a developer could spend weeks or months learning and fiddling to tweak things. However, the bulk of the work should be doable within a few weeks of concentrated effort.
For what it's worth, the wiki and mailing lists for Solr are great resources. AND the developers themselves are very responsive.
EDIT: The coding involved with Solr would be on the PHP side. You need to write something to put your data into the XML format that Solr needs in order to insert documents into its index, as all of this is done via XML over HTTP.
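As a sketch of what that glue code produces, the snippet below builds an `<add>` update message in Solr's XML format. It is written in Python for brevity; the same shape could be produced from PHP. The function name `build_solr_add_xml` and the field names (`id`, `name`, `bio`) are hypothetical, and the fields would have to match your own schema. Actually sending the message (an HTTP POST to your Solr instance's update handler, followed by a commit) is left out because the URL depends on your installation.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(documents):
    """Build a Solr XML <add> message from a list of dicts.

    Each dict maps a field name to a value, or to a list of values
    for a multi-valued field.
    """
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", {"name": name})
                field.text = str(item)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical row, e.g. fetched from MySQL.
xml_body = build_solr_add_xml(
    [{"id": 42, "name": "Ann & Bob", "bio": "Likes <b>search</b>"}]
)
```

Building the message with an XML library rather than string concatenation means special characters in user data are escaped automatically.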
Can anyone help me with a good list of PHP site search engines? I am thinking of implementing Google Site Search, but I would rather not pay for that, and I would rather have as much control over it as I can.
Read through Roll your own Search Engine with Zend_Lucene.
The article is rather old though, so have a look at the ZF Reference Guide about Zend_Lucene too. Searching for Zend Lucene on Google should yield plenty of useful results as well.
Sphinx is pretty good, but it isn't written in PHP. It has got PHP libraries to interface with it though. You could also have a look at Zend_Search_Lucene from Zend Framework. Both of these make search indexes so you can do fast searches.
You can try the Zend Lucene implementation:
http://framework.zend.com/manual/en/zend.search.lucene.html
http://devzone.zend.com/article/91
You don't have to pay for Google Site Search, and there's only a small chance that more control means better quality results.
If your site is very specific, you need to write your own code for search.
Sphinx is one of the best open source search engines. It has an excellent PHP API, and a very good community and forum too. The PHP API for Sphinx comes bundled in the tar/zip file that you download, and it can easily be set up on top of your database. It has great vertical search capabilities. It's pretty simple to implement, so try it out.
Here is a new PHP search engine script that can be implemented in any website. It is made with PHP 5.4+, MySQL, and Ajax.
https://sourceforge.net/projects/site-search-engine-php-ajax/
It automatically crawls and indexes the site's pages, similar to Sphider.
It can use PDO or MySQLi for connecting to the MySQL database.
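For readers wondering what "crawls the site pages" involves, here is a generic breadth-first crawl sketch in Python. It does not show how this script or Sphider work internally, which the thread does not describe. The `crawl` function, the `LinkExtractor` class, and the caller-supplied `fetch` function are all hypothetical names, and real crawlers also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns {url: html}.

    `fetch` is a caller-supplied function: url -> HTML string, or None
    if the page could not be retrieved.
    """
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute = urljoin(url, href).split("#")[0]
            # Stay on the starting site and never queue a URL twice.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

The returned pages would then be handed to whatever indexer you use; passing `fetch` in as a parameter keeps the crawl logic testable without network access.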