I have a MySQL database with two main tables that contain the data I need to index. I am looking for a search engine API that can index this data and return appropriate search results (as close as possible to Google quality). The application uses the keywords and creates pages based on the search results.
I have tried Solr but am not sure whether it is the best option. Are there any other paid or open source alternatives you have come across? The project is LAMP-based.
Thanks,
Sameer
Solr/Lucene are definitely the de facto standard in the open source search world. I love Solr. No, you don't need to go for anything "paid" :). In my opinion, if you want to try something else, try the Sphinx search engine: it's absolutely amazing and integrates extremely well with LAMP. In fact, the PHP API that ships with it is really good, and you can get started with search using Sphinx in almost no time.
Related
I want to add a search option to my site, and for fun I decided I should at least try to build it myself (if I fail, there's always Google Custom Search).
The problem is, I don't even know how to approach this monster! Here are the requirements:
Not all keywords will be required in the search (if someone searches for "Big happy world", it should also search for "Big world", "happy world", etc.).
Handling of common spelling mistakes (from a database, via edit distance, or via a predefined list of common mistakes such as "rather then" => "rather than").
Search in both the content and the titles of posts, with an emphasis on titles.
Don't suck
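As a rough illustration of these three requirements, here is a minimal Python sketch (not any particular library's method): every query word is optional, a misspelled word is mapped to the nearest known word by Levenshtein edit distance, and a word found in a title counts more than one found in the body. The `posts` structure, the `title_weight` of 3 and the edit-distance threshold of 1 are arbitrary assumptions; a predefined list of common mistakes could simply be a dictionary consulted before the edit-distance fallback.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def correct(word: str, vocabulary: set[str], max_dist: int = 1) -> str:
    """Map a query word to the closest known word if it is within max_dist edits."""
    if word in vocabulary:
        return word
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v), default=word)
    return best if edit_distance(word, best) <= max_dist else word


def search(query, posts, title_weight=3):
    """Score each post by its matched query words; a title hit counts title_weight times.

    No word is mandatory: a post matching only some of the words still gets a score.
    Returns (score, title) pairs, best first.
    """
    vocabulary = {w for p in posts for w in (p["title"] + " " + p["body"]).lower().split()}
    words = [correct(w, vocabulary) for w in query.lower().split()]
    results = []
    for post in posts:
        title = set(post["title"].lower().split())
        body = set(post["body"].lower().split())
        score = sum(title_weight * (w in title) + (w in body) for w in words)
        if score > 0:
            results.append((score, post["title"]))
    results.sort(key=lambda r: (-r[0], r[1]))
    return results
```

The brute-force `min` over the whole vocabulary is fine for a small site; a larger one would precompute candidate corrections instead.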
I've searched my old pal Google for it, but the only reasonable things I found were academic-level papers on the subject (English isn't my native language; I'm good, but not that good =( ).
So in short: does anyone know of a good place to start, a tutorial, an article, an example?
Thanks in advance.
There are several options you could try:
Apache Lucene (a PHP-based implementation exists in Zend Framework)
ElasticSearch (provides a REST-like API on top of Lucene)
Xapian
Sphinx
Probably a bunch of others too.
If you want to create your own search engine, Apache Lucene is a mature open source library that can take care of a big part of the functionality for you.
Using Lucene, you first index your information (using an IndexWriter). This is done offline, to create the index.
At search time, you use an IndexSearcher to find documents that match your query.
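The index-then-search split can be sketched in a few lines of Python. This is not Lucene's API (IndexWriter and IndexSearcher are Lucene's Java classes); it is a toy inverted index, assuming whitespace tokenization and simple AND matching, that mimics the same two phases.

```python
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Indexing phase (run offline): map each term to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search phase: ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

Because the expensive work happens in `build_index`, each query only looks up and intersects a few sets.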
If you want some theoretical knowledge of "how it works", you should read more about information retrieval. A good place to start is Stanford's Introduction to Information Retrieval.
I need to design a search form and the code behind it.
I'm not very familiar with searches.
My table has the following structure:
- Table_ads
site_name
ad_type
uri
pagetitle
name_ad
general_description_ad
age
country_ad
location_ad
zone_ad
Initially my idea was to do a Google-like search, with a single text box and a search button, but I think this will be difficult. The other option is to build a search by fields (traditional search).
What do you think? What type of search should I build?
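For the field-by-field option, one common pattern is to build the WHERE clause only from the fields the user actually filled in, binding the values as parameters. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (in PHP the same idea applies with PDO prepared statements); the column subset, the sample rows and the use of LIKE for every field are assumptions.

```python
import sqlite3

# Hypothetical whitelist of Table_ads columns exposed by the search form.
# Column names cannot be bound as parameters, so only these are interpolated.
SEARCHABLE = {"ad_type", "country_ad", "location_ad", "name_ad"}


def demo_connection():
    """In-memory stand-in for the MySQL table, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_ads (name_ad TEXT, ad_type TEXT, country_ad TEXT, location_ad TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_ads VALUES (?, ?, ?, ?)",
        [
            ("Blue bike", "sale", "Portugal", "Lisbon"),
            ("Red car", "sale", "Spain", "Madrid"),
            ("Flat to rent", "rent", "Portugal", "Porto"),
        ],
    )
    return conn


def field_search(conn, filters):
    """Build a WHERE clause from only the form fields the user filled in."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in SEARCHABLE or not value:
            continue  # skip unknown columns and empty fields
        clauses.append(f"{column} LIKE ?")
        params.append(f"%{value}%")
    sql = "SELECT name_ad FROM table_ads"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name_ad"
    return [row[0] for row in conn.execute(sql, params)]
```

Empty fields are simply ignored, so the same form works whether the user fills in one field or all of them.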
Best Regards,
PS: Sorry for my English.
For a "Google-like" search, it's best to use a full-text search (FTS) solution.
PostgreSQL 8.3 and newer has a built-in FTS engine, and it will let you do all querying in SQL. Example:
SELECT uri FROM ads WHERE fts @@ plainto_tsquery('cereal');
See the documentation (http://www.postgresql.org/docs/current/static/textsearch.html) and come back if you have more questions :-)
However, in-database FTS is often several times slower than a dedicated FTS engine.
So if you need better performance, you will have to build an index outside of the database.
Here I would recommend Sphinx (http://sphinxsearch.com/docs/current.html), which integrates smoothly with PHP. You feed it documents (preferably in the form of its special XML docset) and update the index on demand or with a scheduler. Then you do the searching directly from PHP, without touching the database.
HTH.
Can anyone help me with a good list of PHP site search engines? I am thinking of implementing Google Site Search, but I would rather not pay for it, and I would rather have as much control over it as I can.
Read through "Roll your own Search Engine with Zend_Lucene".
The article is rather old, though, so have a look at the ZF Reference Guide on Zend_Lucene too. Searching Google for Zend Lucene should also yield plenty of useful results.
Sphinx is pretty good, but it isn't written in PHP. It does have PHP libraries to interface with it, though. You could also have a look at Zend_Search_Lucene from Zend Framework. Both of these build search indexes so you can do fast searches.
You can try the Zend Lucene implementation:
http://framework.zend.com/manual/en/zend.search.lucene.html
http://devzone.zend.com/article/91
You don't have to pay for Google Site Search, and there's only a small chance that more control means better-quality results.
If your site is very specific, you need to write your own search code.
Sphinx is one of the best open source search engines. It has an excellent PHP API and a very good community and forum too. The PHP API for Sphinx is included in the tar/zip file you download, and it can easily be put on top of your database. It has great vertical search capabilities. It's pretty simple to implement; try it out.
Here is a new PHP search engine script that can be added to any website; it is made with PHP 5.4+, MySQL, and Ajax.
https://sourceforge.net/projects/site-search-engine-php-ajax/
It automatically crawls and indexes the site's pages, similar to Sphider.
It can use PDO or MySQLi to connect to the MySQL database.
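The crawling step that such scripts (and Sphider) perform can be sketched as a breadth-first walk over links that stays on one host. This Python version is not taken from either project; to keep it self-contained, the page fetcher is passed in as a function (a hypothetical `fetch(url)` returning HTML or None), so a dictionary of pages can stand in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to start_url's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is seen once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The returned pages would then be fed to an indexer; a real crawler would also respect robots.txt and throttle its requests.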
I need to write a small search engine with spiders and all that stuff. What do you recommend, ASP.NET or PHP?
And what sources should I read to get the knowledge?
Before you begin writing this monster of a project (by no means will it be small), I'd like to know why you need to write this engine. Is it for an internal project that can't be indexed by other search engines, or what?
If it's a search engine for a site of your own which you have full control of, it's better to index the information on the site as it's added, edited, and removed, to prevent having to use spiders.
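A minimal sketch of that approach, assuming a simple in-memory inverted index (on a real site it would live in the database or a search server): the site's save and delete handlers would call the hypothetical `upsert` and `remove` methods, so no spider is ever needed.

```python
from collections import defaultdict


class IncrementalIndex:
    """Inverted index kept in sync with the site's content as it changes."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.doc_terms = {}               # doc id -> terms currently indexed for it

    def remove(self, doc_id):
        """Call when a post is deleted: drop the doc from every posting list."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]  # keep the index free of empty entries

    def upsert(self, doc_id, text):
        """Call when a post is added or edited: drop old terms, index the new ones."""
        self.remove(doc_id)
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))
```

Keeping `doc_terms` alongside the postings is what makes edits and deletions cheap: the index knows exactly which entries to undo.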
If it's for other websites, then the technology that engines such as Google, Yahoo, and Bing have to offer will always be better than what you can come up with in a couple of weeks. If it's something that they can index, then I'd suggest looking into their APIs (Bing has some pretty neat ones if you are okay with the results they provide) and using them for whatever crawling and querying you require.
If you really need to make your own engine, it's not going to be a small project.
If you don't want to write it yourself, I recommend Sphider.
I want to add a search feature to a website that would allow users to search the whole website.
The site has around 20 tables, and I want the search to cover all 20 tables.
Can anyone point me to what sort of MySQL queries I need to build?
First of all, what about adding Google Custom Search to your site?
The hard way: you should probably run a query against each of your tables (using LIKE on text columns, or full-text indexing if your database software supports it) and LIMIT each result to X (e.g. ten) rows. In your code, rank these results somehow and display the X best ones.
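A sketch of the hard way, using Python's sqlite3 as a stand-in for MySQL: one `LIKE ... LIMIT` query per table, with the candidates merged and ranked in code. The table names, the text column searched in each table and the occurrence-count score are all hypothetical; full-text indexes would replace the `LIKE` where available.

```python
import heapq
import sqlite3

# Hypothetical tables and the text column searched in each. The names are
# whitelisted here because table/column names cannot be bound as parameters.
SEARCH_TARGETS = {"articles": "body", "products": "description", "faq": "answer"}


def demo_connection():
    """In-memory stand-in for the site's MySQL tables, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    rows = {
        "articles": [(1, "PHP search tips: search fast"), (2, "Cooking")],
        "products": [(1, "A search engine appliance")],
        "faq": [(1, "How do I search? Use the search box to search.")],
    }
    for table, column in SEARCH_TARGETS.items():
        conn.execute(f"CREATE TABLE {table} (id INTEGER, {column} TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows[table])
    return conn


def search_all(conn, term, per_table=10, top=10):
    """Run one LIKE ... LIMIT query per table, then rank the merged rows in code."""
    candidates = []
    for table, column in SEARCH_TARGETS.items():
        rows = conn.execute(
            f"SELECT id, {column} FROM {table} WHERE {column} LIKE ? LIMIT ?",
            (f"%{term}%", per_table),
        )
        for row_id, text in rows:
            # Naive relevance score: number of occurrences of the term.
            score = text.lower().count(term.lower())
            candidates.append((score, table, row_id))
    return heapq.nlargest(top, candidates)
```

Each result is a `(score, table, id)` tuple, so the page can link back to the right table's record.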
You could also try a UNION of multiple queries, but then the resulting rows all have to have the same structure (if I remember correctly).
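Here is a minimal UNION sketch, again with sqlite3 standing in for MySQL and hypothetical `articles` and `products` tables of different shapes. Each SELECT branch is aliased into the same three columns (source, id, text), which is what lets the branches be combined: SQL requires the same number of columns in every branch, and the result takes its column names from the first SELECT.

```python
import sqlite3


def demo_connection():
    """In-memory stand-in with two hypothetical tables of different shapes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER, title TEXT, body TEXT)")
    conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                     [(1, "Search basics", "Intro text"), (2, "Cooking", "Recipes")])
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                     [(1, "Search appliance", 99.0), (2, "Toaster", 20.0)])
    return conn


def union_search(conn, term):
    """One query; every SELECT branch returns the same three columns."""
    sql = """
        SELECT 'articles' AS source, id, title AS text FROM articles WHERE title LIKE :q
        UNION ALL
        SELECT 'products', id, name FROM products WHERE name LIKE :q
        ORDER BY source, id
    """
    return conn.execute(sql, {"q": f"%{term}%"}).fetchall()
```

UNION ALL is used instead of UNION because rows from different tables can never be true duplicates, so the de-duplication step would be wasted work.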
Search engines. My Comp Sci degree thesis. First of all, you have to ask yourself: what type of search do you want to offer the user? If the user will clearly know what they are looking for, for example on a product-based website, then you should provide a search engine based on metadata. For example, users will be searching for a specific product or product type. This is generally quite easy to provide.
The other type is the familiar web search engine such as Google, which targets a completely different market. The typical user doesn't know exactly what they are looking for; they just know that they are looking for something to do with aeroplanes, for example. Google then has to figure out which results are most likely to match that and be the most relevant.
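One classic textbook way to estimate relevance is TF-IDF weighting: a term counts more when it appears often in a document but rarely across the collection. The Python sketch below is only that textbook idea (assuming whitespace tokenization and idf = log(N / df)), not a description of how Google ranks results.

```python
import math
from collections import Counter


def tfidf_rank(docs, query):
    """Rank doc ids by the sum over query terms of tf(t, d) * log(N / df(t)).

    docs maps doc id -> text; documents with a zero score are left out.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for words in tokenized.values() for term in set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        # Terms absent from the whole collection (df == 0) are skipped.
        score = sum(tf[t] * math.log(n / df[t]) for t in query.lower().split() if df[t])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A term that appears in every document gets idf = log(1) = 0, so very common words stop influencing the ranking on their own.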
I know Google has an incredibly complex and optimised system, but from memory, if you want to go this way you need to create something called an inverted index file. Then you need to start thinking about a thesaurus: if the user types in "cat", you should also provide results that contain the word "feline". You also need word trees (word variants): because the user typed in "cat", a result containing "cats" will also be relevant.
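The thesaurus and word-variant ideas amount to query expansion. In this Python sketch the thesaurus is a tiny hypothetical dictionary and `stem` only strips a trailing plural "s"; a real system would use a proper stemmer (e.g. Porter's algorithm) and a much larger synonym list.

```python
# Hypothetical hand-made thesaurus; expansion here is one-way (cat -> feline only).
THESAURUS = {"cat": {"feline"}, "plane": {"aeroplane", "aircraft"}}


def stem(word):
    """Very naive stemmer: strip a trailing plural 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def expand_query(query):
    """Stems of the query words plus the stems of their thesaurus synonyms."""
    terms = set()
    for word in query.lower().split():
        base = stem(word)
        terms.add(base)
        for synonym in THESAURUS.get(base, ()):
            terms.add(stem(synonym))
    return terms


def matches(text, query):
    """True if any stemmed word of the text is in the expanded query."""
    doc_stems = {stem(w) for w in text.lower().split()}
    return bool(doc_stems & expand_query(query))
```

Documents must be stemmed the same way as the query (as `matches` does), otherwise "cats" in a document would never meet "cat" from the query.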
I am pretty sure that if you are providing a search engine for your website, it will most likely be a metadata search engine, in which case you can roll your own solution. If not, and you are looking for the second type, then why not use Google's services? They provide a custom search that will work within your own website.
Use Sphinx, or Lucene if you're using ZF (Zend Framework).
1. Set a FULLTEXT index on the fields with the content and use the full-text search MySQL provides: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
or
2. Have a look at the Lucene search that Zend Framework provides: http://framework.zend.com/manual/en/zend.search.lucene.html
Have you tried looking at Lucene? It's one of the best search modules available today. I would strongly suggest you give it a shot.