I am building a forum from scratch in PHP. I have reused most of phpBB's database structure.
But now I am thinking about the search functionality. What is a good design for searching all posts really fast? I guess there must be some better way than just LIKE '%query_string%' in MySQL :)
Maybe split every post into words, use the words as keys in a hash table, and store as the value a comma-separated list of all the posts each word appears in? Deleting a post then takes a little more work, but I think that approach is better.
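The word-to-posts idea described above is usually called an inverted index. Here is a minimal in-memory sketch in Python, only to illustrate the data structure. The class and method names are made up for this example, and a real forum would keep the mapping in a database table or hand it to a search engine rather than hold it in memory. It uses sets of post IDs instead of a comma-separated string, and keeps a reverse map so that deleting a post stays cheap.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each word to the set of post IDs that contain it (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> {post_id, ...}
        self._words_by_post = {}           # post_id -> {word, ...}, used for deletion

    @staticmethod
    def tokenize(text):
        # Lowercase and split on non-word characters; no stemming or stop words.
        return set(re.findall(r"\w+", text.lower()))

    def add_post(self, post_id, text):
        self.remove_post(post_id)  # re-indexing an edited post replaces the old entry
        words = self.tokenize(text)
        self._words_by_post[post_id] = words
        for word in words:
            self._postings[word].add(post_id)

    def remove_post(self, post_id):
        for word in self._words_by_post.pop(post_id, set()):
            ids = self._postings[word]
            ids.discard(post_id)
            if not ids:
                del self._postings[word]

    def search(self, query):
        """Return the IDs of posts containing every word of the query, in any order."""
        words = self.tokenize(query)
        if not words:
            return set()
        return set.intersection(*(self._postings.get(w, set()) for w in words))


index = InvertedIndex()
index.add_post(1, "How to search a forum")
index.add_post(2, "Forum rules")
print(index.search("forum search"))
```

Because the lookup intersects sets of IDs, word order and intervening words in the post do not matter.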
To start with I guess I can use the simple solution, but I don't want to have to change the code when the forum grows bigger.
Thanks for any ideas or if you can point me to the right direction!
Zend Lucene is a powerful way to add search to a PHP site.
Here's an article about how to do exactly that: Roll Your Own Search Engine with Zend_Search_Lucene
The best option for me today is Sphinx. It can be used with PHP, Rails, and Perl, and so far it has worked like a charm for me. You can check out a PHP solution. Craigslist, for example, uses it.
Don't reinvent the wheel. Have a look at Lucene. There is also a port for PHP:
Zend Lucene
Lucene does the parsing and indexing for you and the queries are fast as lightning.
Most forum users will want more than just a string-search. They might not know the exact phrase they need and when they search for "forum search" they would be delighted to find a result for "How to search a forum", which contains the relevant terms but in a different order and separated by other words.
They may also need some fuzzy searching if they don't know the spelling of what they need. They might search for "sequal" and want "sql".
All of this points towards a more complex solution than your LIKE search.
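As a rough illustration of fuzzy matching, here is a sketch in Python that suggests known words within a small Levenshtein (edit) distance of what the user typed. The function names, the vocabulary list, and the cutoff of 2 are assumptions made for this example, not part of any library. Note that "sequal" and "sql" are three edits apart, so a small cutoff would not connect them; a case like that needs a phonetic or synonym-based approach, or a looser cutoff.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits, closest first."""
    word = word.lower()
    scored = [(edit_distance(word, known), known) for known in vocabulary]
    return [known for distance, known in sorted(scored) if distance <= max_distance]


print(suggest("serach", ["search", "forum", "sql"]))
```

Comparing against every known word is fine for a small vocabulary; dedicated search engines use smarter index structures for the same job.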
The most important pointer for now is that whatever you implement, you should make sure it is easy to switch it out in favour of something better later. Make sure your search is hot-swappable as you know you will want to change it later.
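One way to keep the search swappable is to have the forum code talk to a small interface and hide the actual engine behind it. A minimal sketch in Python follows; all names here (`SearchBackend`, `SubstringBackend`, `ForumSearch`) are hypothetical, and the first backend is just the naive substring match, standing in for the LIKE query.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """The only surface the forum code is allowed to depend on."""

    def index(self, post_id: int, text: str) -> None: ...
    def remove(self, post_id: int) -> None: ...
    def search(self, query: str) -> list[int]: ...


class SubstringBackend:
    """Naive first implementation, roughly what LIKE '%query%' does."""

    def __init__(self) -> None:
        self._posts: dict[int, str] = {}

    def index(self, post_id: int, text: str) -> None:
        self._posts[post_id] = text

    def remove(self, post_id: int) -> None:
        self._posts.pop(post_id, None)

    def search(self, query: str) -> list[int]:
        needle = query.lower()
        return sorted(pid for pid, text in self._posts.items()
                      if needle in text.lower())


class ForumSearch:
    """Forum-facing service; swapping engines means passing a different backend."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def post_saved(self, post_id: int, text: str) -> None:
        self._backend.index(post_id, text)

    def post_deleted(self, post_id: int) -> None:
        self._backend.remove(post_id)

    def find(self, query: str) -> list[int]:
        return self._backend.search(query)


forum = ForumSearch(SubstringBackend())
forum.post_saved(1, "How to search a forum")
print(forum.find("search a"))
```

A later Lucene- or Sphinx-backed class would only need to provide the same three methods, and the rest of the forum code would stay untouched.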
Related
We're building an application that has multiple different entities that are pretty simple with name & description and some specific stuff.
Now we want to add tags, for the purpose of adding extra search keywords. (There will also be a tag cloud somewhere, but that's easy)
I've been reading up on different ways to do a proper search: solutions like Lucene, Elasticsearch, MySQL full-text MATCH ... AGAINST, and more.
Does anyone have any experience to share on the best solution for an application like this?
Should I put the tags in the same table in a string/array field? Use a separate table? I've also found DoctrineExtensions-Taggable, which seems pretty decent and would make it easy to build a tag cloud too.
For a proper search over name, description & tags, what's the best solution? Lucene? Elasticsearch? MySQL? So far I think the FOSElasticaBundle looks the most mature, but I'm not sure how to add search on tags there. (see 1)
Thanks for the advice!
"What's the best..." is usually not a good way to ask here and you will either get very opinionated answers or none at all.
So here is my opinionated answer:
If you have a small data set you can surely get by with MySQL and MATCH ... AGAINST (you need to teach Doctrine how to use it!). It's the most straightforward thing to do because you already have Doctrine and just need to teach it a little bit.
So yeah, Gedmo Taggable and a bit of your own Doctrine Extension and you have your Queries.
If your Searchable Database will get larger you will want to switch to a proper search engine like Elastic or Solr or whatever else.
What you will need there is usually called "Faceted Search" (or in ElasticSearch they are called "Aggregations" nowadays).
You can find more information about faceted search on Wikipedia, for example.
Yes, a proper Search Engine is cooler and faster and flashy, but if you work on it on a schedule and for the first time, it might not be the best solution.
I want to make a searching option for my site, and for fun I decided I should at least try to make it myself (If I fail, there's always Google Custom Search).
The problem is, I don't even know how to approach this monster! Here are the requirements:
Not all keywords will be required in the search (Should one search for "Big happy world", it would also search for "Big world" "happy world" etc)
Handling of common spelling mistakes (from a database, via edit distance, or from a predefined list of common mistakes, e.g. "rather then" => "rather than").
Search in both content and titles of posts, with an emphasis on titles.
Don't suck
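To make two of these requirements concrete (keywords are optional, and titles count for more than content), here is a small ranking sketch in Python. The weights and function names are arbitrary assumptions for illustration; real engines use more refined scoring, such as term frequency weighting.

```python
import re

TITLE_WEIGHT = 3.0  # assumed value: a title hit counts three times a body hit
BODY_WEIGHT = 1.0


def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))


def score(query, title, body):
    """Sum a weight for every query word found; no single word is required."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    total = 0.0
    for term in tokenize(query):
        if term in title_words:
            total += TITLE_WEIGHT
        if term in body_words:
            total += BODY_WEIGHT
    return total


def rank(query, posts):
    """posts is a list of (post_id, title, body); returns matching IDs, best first."""
    scored = [(score(query, title, body), post_id) for post_id, title, body in posts]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [post_id for points, post_id in ordered if points > 0]


posts = [
    (1, "Big world", "nothing relevant here"),
    (2, "Misc", "a big happy world"),
    (3, "Cats", "meow"),
]
print(rank("Big happy world", posts))
```

Post 1 matches only two of the three words but ranks first because both hits are in the title; post 3 matches nothing and is dropped.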
I've searched my old pal Google for it, but the only reasonable things I found were academic-level papers on the subject (English isn't my native language; I'm good, but not that good =( ).
So in short: does anyone know of a good place to start, a tutorial, an article, an example?
Thanks in advance.
There are several options you could try:
Apache Lucene (A PHP based implementation exists in the Zend Framework)
ElasticSearch (provides a REST-like API on top of Lucene)
Xapian
Sphinx
Probably a bunch of others too.
If you want to create your own search engine, Apache Lucene is a mature open-source library that can take care of a big part of the functionality for you.
Using Lucene, you first index your information [using an IndexWriter]. This is done offline, to create the index.
On search, you use an IndexSearcher to find documents that match your query.
If you want some theoretical knowledge on "how it works", you should read more on information retrieval. A good place to start is Stanford's Introduction to Information Retrieval.
Sorry for the long title, but I couldn't think of a good way to put it. I'm currently working on a large web app project, and one of the main features is the detailed search. Without saying too much about the project, it is used to find business-related deals. The search function is currently spread over 3 pages and offers pretty much every option you'd want if you were in the industry...
The problem I've got now is that there are a lot of fields, so when it comes to searching for matches in the DB I don't really know the best way forward. I don't think a standard MySQL LIKE is going to cut it here. I also need to be able to figure out how good a fit (match) each result is and then display that in the results (search result 1 is a 90% fit, etc.).
Does anyone know the best way to tackle this? I know there are external search engines out there, but I don't know enough about them to make any sort of logical choice...
Thanks !
Finding relevance in search is a complex topic that deals with many parameters. The MySQL match() search itself is pretty complex as you can see here. Perhaps you could use this score itself as your measure. You can customize this to some extent.
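If a raw relevance score is used as the measure, one simple way to show it as a "percent fit" is to scale every score against the best one in the result set. The sketch below is in Python; the function name and the idea of scaling against the top hit are assumptions for illustration, not something MySQL does for you.

```python
def to_percent_fit(scores):
    """Scale raw relevance scores so the best result shows as 100%."""
    if not scores:
        return {}
    best = max(scores.values())
    if best <= 0:
        return {result_id: 0 for result_id in scores}
    return {result_id: round(100 * max(raw, 0.0) / best)
            for result_id, raw in scores.items()}


print(to_percent_fit({"deal-a": 4.0, "deal-b": 2.0, "deal-c": 0.0}))
```

The catch is that the top hit always reads 100%, even for a weak query, so the figure only says how results compare with each other.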
Another option, as you mentioned, is to use an external search engine, something along the lines of Solr. It meets all the requirements you listed. It's fast, scalable, and offers customization options to improve "relevance" for your specific needs.
I want to implement a powerful search engine for my e-commerce application. I'm using PHP, with MySQL as the database. Can anyone guide me on how to proceed? Is the FULLTEXT feature of MySQL good for a large volume of data?
Thanks!
IMHO, the MySQL Full text engine is a really poor choice.
Firstly, the number of parameters to tweak the search is almost 0.
Secondly, in my experience, it doesn't scale.
You might consider using
Sphinx
Lucene
Lucene is said to be the industry standard. There is also Solr, built on Lucene, if you want search to run as a separate server.
They are far more advanced and perform better.
This should get you started, however you will have to modify or expand on the idea.
For the second part of your question, have a look at:
Pros & Cons of Full Text Search
Recently, for an app handling a huge amount of data, we gave up on both MySQL FULLTEXT and Lucene and switched to PostgreSQL, which has a much more powerful native full-text engine. At least, that is what the results of our investigations said.
Take a look at Zend_Lucene from Zend Framework, and at a new feature for MySQL full-text search here
I have a site that lists movies. Naturally people make spelling mistakes when searching for movies, and of course there is the fact that some movies have apostrophes, use letters to spell out numbers in the title, etc.
How do I get my search script to overlook these errors? Probably need something that's a little more intelligent than WHERE mov_title LIKE '%keyword%'.
It was suggested that I use a fulltext search engine, but all of those things look really complicated, and I feel that building them into my application will be like hell on earth. If I do have to use one, what's the least invasive one, that will be most painless to implement into existing code?
I think you'll have to implement an external fulltext search engine. MySQL just isn't good at fulltext search. I'd say you should give Lucene a go (tutorials). Zend Framework has an API that plugs into Lucene, making it easier to learn and utilize.
Presuming that you use MySQL - MySQL has no in-built functionality that is capable of doing this.
This means you will have to implement a full-text search yourself, or use a third party full text search tool.
If you implement it yourself, you should look into the metaphone or double metaphone algorithms (I'd recommend them over soundex, which is not nearly as good at this type of task) to store phonetic representations of all your words. However, building your own full-text search is no task for the faint-hearted. Don't attempt it if you don't consider yourself a database wizard.
If you want a third party tool, Lucene is the way to go. It is ported into tons of different languages/platforms including PHP - you don't have to use Java.
I've used neither PHP nor MySQL, but an alternative to full-text search might be Soundex searches.
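For reference, here is a sketch of the classic American Soundex coding in Python. The function name is chosen for this example. The idea is to store the code next to each title and compare codes instead of raw strings, so that similar-sounding spellings land on the same key.

```python
def soundex(word):
    """Classic four-character Soundex code, e.g. a letter followed by three digits."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for letter in letters:
            codes[letter] = digit

    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = codes.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code is kept
    return (result + "000")[:4]


print(soundex("Matrix") == soundex("Matriks"))
```

Soundex only helps with misspellings that sound alike and keep the first letter; it does nothing for apostrophes or numbers spelled out as words, which need separate normalization.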