I'm creating a site that allows users to submit quotes. How would I go about creating a (relatively simple?) search that returns the most relevant quotes?
For example, if the search term was "turkey" then I'd return quotes where the word "turkey" appears twice before quotes where it only appears once.
(I would add a few other rules to help filter out irrelevant results, but my main concern is that.)
Everyone is suggesting MySQL fulltext search, but you should be aware of a HUGE caveat. In MySQL versions before 5.6, the fulltext search engine is only available for the MyISAM engine (not InnoDB, which is the most commonly used engine due to its referential integrity and ACID compliance).
So you have a few options:
1. The simplest approach is outlined by Particle Tree. You can actually get ranked searches from pure SQL (no fulltext, no nothing). The SQL query below searches a table and ranks results by the number of occurrences of each search string in the search fields:
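SELECT
    p.id,
    -- occurrences = (characters removed by REPLACE) / (length of the search word)
    SUM(((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'term', ''))) / 4) +    -- 'term' is 4 characters
        ((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'search', ''))) / 6))   -- 'search' is 6 characters
    AS Occurrences
FROM
    posts AS p
GROUP BY
    p.id
ORDER BY
    Occurrences DESC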
(I edited their example to provide a bit more clarity.)
Variations on the above SQL query, such as adding WHERE clauses (WHERE p.body LIKE '%whatever%you%want'), will probably get you exactly what you need.
2. You can alter your database schema to support full text. To keep InnoDB's referential integrity, ACID compliance, and speed without having to install plugins like Sphinx Fulltext Search Engine for MySQL, what is often done is to split the quote data into its own table. Basically you would have a table Quotes that is an InnoDB table which, rather than holding your TEXT field "data", holds a reference "quote_data_id" that points to the ID on a Quote_Data table, which is a MyISAM table. You can do your fulltext search on the MyISAM table, join the IDs returned with your InnoDB tables, and voila, you have your results.
3. Install Sphinx. Good luck with this one.
Given what you described, I would HIGHLY recommend you take the 1st approach I presented, since you have a simple database-driven site. The 1st solution is simple and gets the job done quickly. Lucene will be a bitch to set up, especially if you want to integrate it with the database, as Lucene is designed mainly to index files, not databases. Google custom site search just makes your site lose tons of reputation (makes you look amateurish and hacked), and MySQL fulltext will most likely cause you to alter your database schema.
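The occurrence-counting idea from the 1st approach can be tried out without a MySQL server. What follows is only a minimal sketch using Python's built-in sqlite3 module as a stand-in; the `posts` table and `body` column come from the query above, while the function name `rank_by_occurrences` and the sample rows are made up for illustration. It assumes a non-empty search term.

```python
import sqlite3

def rank_by_occurrences(conn, term):
    """Return (id, occurrences) rows for posts containing term, best match first."""
    sql = """
        SELECT p.id,
               -- characters removed by REPLACE, divided by the term length
               (LENGTH(p.body) - LENGTH(REPLACE(LOWER(p.body), LOWER(:term), '')))
                   / LENGTH(:term) AS occurrences
        FROM posts AS p
        WHERE p.body LIKE '%' || :term || '%'
        ORDER BY occurrences DESC, p.id
    """
    # The term is bound as a parameter, never pasted into the SQL string.
    return conn.execute(sql, {"term": term}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (body) VALUES (?)", [
    ("roast turkey with cold turkey sandwiches",),  # hypothetical sample data
    ("one turkey only",),
    ("nothing relevant here",),
])
print(rank_by_occurrences(conn, "turkey"))
```

Note that this counts substrings, so a search for "turkey" also counts "turkeys"; whether that is desirable depends on your site.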
Use Google Custom Site Search. I've heard they know a thing or two about searching.
Stackoverflow plans to use the Lucene search engine. There is a PHP port of this written for the Zend Framework but can be downloaded as a separate entity without needing all the ZF bloat. This is called Zend_Search_Lucene, documentation for which can be found here.
Your sql for that will look something like this (where you're trying to find quotes with 'turkey' in it):
SELECT * FROM Quotes
WHERE the_quote LIKE '%turkey%';
From there you can figure out what to do with whatever it spits out at you.
Be careful to properly handle cases where a malicious user might inject malicious SQL into your database, especially if you're planning on putting this on the www. If you're doing this for fun though, I guess it's just about what you want to learn.
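One common way to guard against injection is to bind the user's text as a query parameter instead of building the SQL string by hand. Here is a minimal sketch with Python's built-in sqlite3 module; the `Quotes` table and `the_quote` column are taken from the query above, while `find_quotes` and the sample rows are hypothetical names used only for illustration.

```python
import sqlite3

def find_quotes(conn, user_input):
    """Return (id, the_quote) rows whose text contains user_input literally."""
    # Escape LIKE wildcards so '%' and '_' typed by the user match themselves.
    escaped = (user_input.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_"))
    sql = "SELECT id, the_quote FROM Quotes WHERE the_quote LIKE ? ESCAPE '\\' ORDER BY id"
    # The '?' placeholder keeps the input out of the SQL text entirely.
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quotes (id INTEGER PRIMARY KEY, the_quote TEXT)")
conn.executemany("INSERT INTO Quotes (the_quote) VALUES (?)", [
    ("Talk turkey to me",),      # hypothetical sample data
    ("A penny saved",),
    ("Give 100% effort",),
])
print(find_quotes(conn, "turkey"))
print(find_quotes(conn, "'; DROP TABLE Quotes; --"))  # treated as plain text
```

The second call simply finds nothing, and the table is still there afterwards.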
If you're new to databases and sql, I recommend sqlite over mysql. Much easier to set up and work with, as in no set up. It'll get you around the potential headaches of having to install and set up mysql for the first time.
I'd go with Full Text Search, look at it here: http://hockinson.com/fulltext-search-of-mysql-database-table.html
If you want to write your own, take a look at phpBB's implementation. They have two tables, the first is a unique list of all the words that appear in entries, and the second is a many-to-many reference between the words and the entries. You could then do a group and count to sort the entries in the manner you're looking for.
It's a lot more work than implementing a third-party search engine (or full text search), but it will allow you greater control over the results.
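The two-table layout described above can be sketched roughly as follows, using Python's built-in sqlite3 module. This is not phpBB's actual schema: the table names (`quotes`, `words`, `quote_words`), the tokenizer, and the choice to store one mapping row per occurrence are all assumptions made for the example.

```python
import re
import sqlite3

def build_index(conn, quotes):
    """Create the word list and the word-to-quote mapping, then fill them."""
    conn.executescript("""
        CREATE TABLE quotes (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE quote_words (quote_id INTEGER, word_id INTEGER);
    """)
    for body in quotes:
        quote_id = conn.execute(
            "INSERT INTO quotes (body) VALUES (?)", (body,)).lastrowid
        # Naive tokenizer: lowercase runs of letters, digits and apostrophes.
        for word in re.findall(r"[a-z0-9']+", body.lower()):
            conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
            word_id = conn.execute(
                "SELECT id FROM words WHERE word = ?", (word,)).fetchone()[0]
            # One row per occurrence, so COUNT(*) gives the occurrence count.
            conn.execute("INSERT INTO quote_words VALUES (?, ?)", (quote_id, word_id))

def search(conn, term):
    """Return (quote_id, hits) rows, most occurrences first."""
    sql = """
        SELECT qw.quote_id, COUNT(*) AS hits
        FROM words AS w
        JOIN quote_words AS qw ON qw.word_id = w.id
        WHERE w.word = ?
        GROUP BY qw.quote_id
        ORDER BY hits DESC, qw.quote_id
    """
    return conn.execute(sql, (term.lower(),)).fetchall()

conn = sqlite3.connect(":memory:")
build_index(conn, ["Turkey, turkey, turkey!", "A turkey", "Just ham"])
print(search(conn, "turkey"))
```

Unlike the LIKE approach, this matches whole words only, and the lookup is on an indexed `word` column rather than a scan of every quote.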
As an alternative to Sphinx and Lucene, a relatively simple search engine can be created using the Xapian library.
+ Supports many advanced search features (such as relevancy ranking)
+ Fast
- You would need to learn the API to create your interface
- Requires a php extension to be installed
Note also that Xapian stores its data in a separate index to mysql.
You might also be interested in Forage which is a wrapper for Solr, Xapian and Lucene.
The Xapian people also created the Omega search engine which is a frontend to Xapian, and can be called via cgi.
Here's a much simpler and easier to operate open source alternative to Solr / Lucene:
http://github.com/typesense/typesense
Google Custom Site Search is great if you don't query it much (I think you get 1k queries/day for free) or if you're willing to pay.
MySQL's fulltext search is also a great resource (as has been mentioned previously).
Yahoo's BOSS is an intriguing project -- I'm going to give it a shot during my next search project.
And, finally, Lucene is a great resource if you need more power than fulltext, but want to tweak your own search engine. http://lucene.apache.org
I came across the Zoom Search Engine a few days ago and think this might be the simplest search engine I have ever used.
The Windows based tool creates a database of the site, then it also asks you what language (PHP, ASP.NET, JavaScript, etc), you want to use. I picked PHP and it built the PHP code for me. All, I had to do then was upload the files to the server and (optionally) customize the template and site search was working.
This is free to small sites, and the only con I can find is that the spider tool (database builder) has to run on Windows.
Related
I'm having a hard time implementing a search feature for a web-based system I'm working on. I first used MySQL LIKE with %wildcards%, but it doesn't find what I want to display. Then I came upon full-text index search, which searches very well but has an issue with displaying multiple tables joined by foreign key, for which I don't know any workarounds. Then I came across MySQL with Sphinx.
May I ask for advice on the best way/technologies to implement a search feature over complex database tables?
Check Apache Solr search server
Apache Solr official website
this technology will solve all your searching related problems
I guess the general answer here is you want a 'search index' - an index specifically for running searches. A repository that has all the required data to answer queries.
An RDBMS (like MySQL) is very good for normalizing data, setting data up in a compact and easy-to-update format (i.e. minimising duplication) - that's great for storage. But queries suffer as they have to do much more work to 'join' all the required data back.
... but for searching, a denormalized structure may be best (bigger, but easier and therefore quicker to 'search').
There are many ways of doing that.
A materialized view as noted in your other thread php mysql full text search multiple table joined by id - keeps it all in mysql.
Using an external application. There are many examples: Lucene (variants include Solr and ElasticSearch), SphinxSearch, and many more.
These generally work in a similar way - setting up a dedicated copy of the data to make queries easier.
Use an external provider. There are many 'search as a service' systems (basically wrappers around the software mentioned in previous posts).
Building your own! It's possible to build a system yourself using just normal MySQL tables. Basically, an implementation of an inverted index will probably be the easiest.
Which you use is down to personal preference (eg, an external app is more work to setup, but overall is more powerful)
I'm implementing a search through a db that involves a text search inside names in addition to crossing with additional filters. Sphinx seems like a better tool than MySQL full-text search to solve the text search feature, but I'm not sure if it will enable to do the cross selects in addition to the text match field. Does it have such an option? Will MySQL Full-text be more suitable?
It doesn't support joins as such, because Sphinx lives outside MySQL - it's a separate system.
But the Sphinx index itself can be built with joins, so when creating the index, you join all the required tables to put all the relevant data into the index.
Basically you precompute an index that 'does everything', and just filter as required. (Well, technically it sometimes requires multiple indexes, but only in advanced cases.)
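To make the "precompute with joins, then filter" idea concrete, here is a rough sketch that uses a flat table in Python's built-in sqlite3 module as a stand-in for the index. It does not use Sphinx or its API; the `people`, `cities` and `search_index` tables, the sample rows, and the `search_names` function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    INSERT INTO cities VALUES (1, 'Lisbon', 'PT'), (2, 'Porto', 'PT'), (3, 'Madrid', 'ES');
    INSERT INTO people VALUES (1, 'Ana Silva', 1), (2, 'Rui Silva', 3), (3, 'Ana Costa', 2);

    -- The "index": one flat row per searchable item, built once with the joins.
    CREATE TABLE search_index AS
        SELECT p.id AS person_id, p.name, c.city, c.country
        FROM people AS p
        JOIN cities AS c ON c.id = p.city_id;
""")

def search_names(conn, text, country):
    """Text match on the name plus an attribute filter, with no join at query time."""
    sql = """
        SELECT person_id, name
        FROM search_index
        WHERE name LIKE '%' || ? || '%' AND country = ?
        ORDER BY person_id
    """
    return conn.execute(sql, (text, country)).fetchall()

print(search_names(conn, "silva", "PT"))
```

The trade-off is that the flat copy has to be rebuilt or updated whenever the source tables change, which is the same job a Sphinx reindex performs.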
Personally, I would suggest that Sphinx is well worth the investment. It will be a bit of work to get it running, but it will pay off both in performance and flexibility. You will be able to run queries you can't even imagine right now. (Can you tell I am a really avid fan of Sphinx?)
I am writing a website which indexes large amounts of data into databases (each with about 800 tables per database), and the website allows you to search the database for various items. Should I use something like lucene or just write my own search algorithm? I am using PHP and MySQL. Although I can filter my SELECT queries, and create a searching algorithm I just wanted to know if I should use Lucene because I am just indexing stuff in a database. Also please do suggest anything that might help me. Forgot to mention that even though I have 800 tables they would be pretty small in size.
Lucene is a mature, tested, open source library.
I would definitely say: try to use it as much as possible. It will probably be better and consume less time than implementing your own library.
If there is a certain functionality that lucene does not provide - you can always create your own variation of lucene to take care of it.
Do not underestimate the importance of the community when using products such as Lucene: help is almost always available in Lucene's forums [and SO], and the library is constantly tested and maintained because of the large number of users!
Without seeing your data answering this question is very hard, however I can say from personal experience that writing a search of any kind quickly becomes very complex. You have to worry about weighting the various columns you are searching, and search in SQL is almost never as fast as search in a dedicated search engine. At work we are switching from an in house SQL based search to Sphinx Search to search our product catalog because of this very reason.
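To give a feel for what "weighting the various columns" involves, here is a small pure-Python sketch. The column names (`title`, `description`), the weights, and both function names are invented for the example and are not taken from any real catalog or from Sphinx.

```python
def weighted_score(row, term, weights):
    """Sum term occurrences per column, each multiplied by that column's weight."""
    term = term.lower()
    return sum(weight * (row.get(column) or "").lower().count(term)
               for column, weight in weights.items())

def rank(rows, term, weights):
    """Return matching rows, highest score first; rows scoring zero are dropped."""
    scored = [(weighted_score(row, term, weights), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for score, row in scored if score > 0]

# Hypothetical weights: a hit in the title counts five times a hit in the description.
WEIGHTS = {"title": 5.0, "description": 1.0}

rows = [
    {"id": 1, "title": "Holiday ideas", "description": "turkey, turkey and more turkey"},
    {"id": 2, "title": "Turkey recipes", "description": "How to roast a bird"},
    {"id": 3, "title": "Ham glaze", "description": None},
]
print([row["id"] for row in rank(rows, "turkey", WEIGHTS)])
```

Even this toy version already forces decisions (how much is a title hit worth, how to handle missing values), which is part of why hand-rolled search grows complex quickly.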
I need to design a search form and the code behind it.
I'm not very familiar with searches.
My table has the following structure:
- Table_ads
site_name
ad_type
uri
pagetitle
name_ad
general_description_ad
age
country_ad
location_ad
zone_ad
Initially my idea was to do a search like Google, with a single text box and a search button, but I think this will be difficult. The other option is to build a search by fields (traditional search).
What do you think about this subject? What type of search should I do?
Best Regards,
PS: Sorry my English.
For "google-like" search it's best to use Full-Text Search (FTS) solution.
PostgreSQL 8.3 and newer has a built-in FTS engine, and it will let you do all querying in SQL. Example:
SELECT uri FROM ads WHERE fts @@ plainto_tsquery('cereal');
See documentation -> http://www.postgresql.org/docs/current/static/textsearch.html and come back if you have more questions :-)
However, in-database FTS is several times slower than dedicated FTS.
So if you need better performance, you will have to build an index outside of the database.
Here I would recommend Sphinx -> http://sphinxsearch.com/docs/current.html, Sphinx integrates smoothly with PHP. You feed it with documents (preferably, in form of special XML docset) and update the index on demand or with some scheduler. Then you do searching directly from PHP (not touching the database).
HTH.
I have asked several questions about Zend and its search functions.
Now after further reading I have noticed that it requires FULL-TEXT indexes in the MySQL fields.
My webhosting provider doesn't allow me to change anything in the my.ini (my.cnf) file, which holds information about minimum length of word to search full-text indexes and more.
So I can't use FULL-TEXT if there is no other way of setting configuration than changing in that file.
Examples of changes are the ft_min_word_len which is by default 4 I think.
I have a table with around 400,000 records, and I need a good search function. It's classifieds btw.
There has to be a way, I just don't know it, so I thought maybe you guys would know.
In the first question I asked regarding Zend I also mentioned I don't have FULLTEXT support, but people suggested Zend anyway.
Can somebody please give me a good explanation of what I should do in my situation?
NOTE: My website is PHP based!
PS: 'LIKE' won't suffice in the searches I need to make. It must be pretty advanced. If you need details about what it should consist of, check my previous Q: Which third party search engine (free) should I use?
Thanks
UPDATE: In two articles, it says Zend "does full-text searches". What do they mean by that? I believe they mean I require full-text indexes!?
Zend_Search in no way requires any full-text searching to be enabled on any database. In fact, Zend_Search is totally independent of any database, as it is an implementation of the Lucene search engine totally in PHP. You should therefore be able to customize it however you wish.
Full text searching is simply the method it uses. So it does do full text searches, but doesn't use your database settings (or your database at all).
EDIT
In response to the third comment, Yes, it is in effect a database, but I wouldn't use it as a replacement to a 'true' database as it doesn't have the fields and data integrity support. You can use the UnStored field type so that it only indexes the records, but doesn't store the actual text, so that you can use it in combination with a relational database.
Are you sure that Zend_Search_Lucene requires a fulltext index on your data? I don't see why it would -- even though I have never used it.
This component allows you to do "fulltext searches", but that doesn't mean it uses any fulltext index from the database: it can implement its own fulltext mechanism.
(And, as a matter of fact, it does.)
Still, with a database that big (you said you have several hundreds of thousands of records), I would probably change hosting service, getting one that allows me to do whatever I want with my server, including changing the configuration of MySQL, and installing other software, like Solr or Sphinx.
Maybe it'll cost a bit more (but a dedicated server is not that costly either), but, at least, you'll be able to do what you need with your server...
Using full-text indexes is a bad idea anyway: they're not very good for making a useful search, they only work with MyISAM (in MySQL versions before 5.6), and they don't scale to big data very well.
Lucene does not use them, nor does any sensible mysql-based app (Sadly bugs.mysql.com does)