Approaches to making a custom site search - PHP

I'm making a social website with lots of different sections: blogs, galleries, multimedia, etc. Now the time has come to implement the search functionality. The customer refused to use Google search and insisted on a custom one, where results are shown for each section individually.
For example, if user enters 'art', the result should be displayed like this:
3 found in blogs
1 ...
2 ...
3 ...
2 found in galleries
1 ...
2 ...
None found in multimedia
I'm planning to use MySQL fulltext search for this. So, the question is: how do I build such a search so that it won't kill the server when very many records match the query? I don't really see how to implement paging in this case.
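For reference, a minimal sketch of what such a per-section fulltext search with paging could look like (table names, columns, and credentials here are hypothetical; it assumes a FULLTEXT index exists on each table, e.g. ALTER TABLE blogs ADD FULLTEXT(title, body)):

```php
<?php
// Sketch: per-section fulltext search with counts and paging.
// The section names come from a fixed whitelist, so interpolating
// them into the SQL is safe; user input only ever goes through bind_param.
$mysqli  = new mysqli('localhost', 'user', 'pass', 'site');
$term    = $_GET['q'] ?? '';
$page    = max(1, (int)($_GET['page'] ?? 1));
$perPage = 10;

foreach (['blogs', 'galleries', 'multimedia'] as $section) {
    // One cheap COUNT query per section for the "N found in ..." headers.
    $stmt = $mysqli->prepare(
        "SELECT COUNT(*) FROM {$section} WHERE MATCH(title, body) AGAINST (?)"
    );
    $stmt->bind_param('s', $term);
    $stmt->execute();
    $total = $stmt->get_result()->fetch_row()[0];
    echo $total ? "{$total} found in {$section}\n" : "None found in {$section}\n";

    if ($total) {
        // LIMIT/OFFSET keeps each page small no matter how many rows match.
        $stmt = $mysqli->prepare(
            "SELECT id, title FROM {$section}
             WHERE MATCH(title, body) AGAINST (?)
             LIMIT ? OFFSET ?"
        );
        $offset = ($page - 1) * $perPage;
        $stmt->bind_param('sii', $term, $perPage, $offset);
        $stmt->execute();
        foreach ($stmt->get_result() as $row) {
            echo "  {$row['title']}\n";
        }
    }
}
```

The COUNT queries stay fast because MySQL only counts index matches; the server never has to materialize every matching row just to render one page.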

I would highly recommend NOT using MySQL for full text search, it is slow both in index creation and in performing searches.
Take a look at Sphinx or Lucene, both of which are significantly faster than MySQL and which bind quite readily to PHP applications.

You won't kill a MySQL server with such a thing. Even if your app is huge (we are talking thousands of queries/sec here), you will just have to set up a replica of your MySQL server dedicated to search. You may want to build a cache of "popular keyword results" to speed things up a bit, but an appliance like a Google Mini is still the best for that...

If you can run a Java servlet container (like Tomcat or Jetty), then I recommend Solr (http://lucene.apache.org/solr/). It sits on top of Lucene and is very powerful. Solr was started at CNET and is used by big sites like Netflix and Zappos. Stack Overflow uses a .NET implementation of Lucene. I'm not familiar with Sphinx, so I can't tell you how it compares to Solr.
If you use Solr, look into faceting. This allows you to perform a search and then get a count of how many documents were in "blogs", "galleries", "multimedia", etc.
Here is a PHP client to interface with Solr (http://code.google.com/p/solr-php-client/).
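As a rough illustration of the faceting idea with that client (the Solr location and the field names 'section' and 'title' are assumptions; they depend entirely on your schema):

```php
<?php
// Hypothetical sketch using the solr-php-client library linked above.
require_once 'Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('localhost', 8983, '/solr/');

$results = $solr->search('art', 0, 10, [
    'facet'       => 'true',
    'facet.field' => 'section',   // e.g. blogs / galleries / multimedia
]);

// Facet counts give you "3 found in blogs, 2 found in galleries"
// from a single request, before you render any documents.
foreach ($results->facet_counts->facet_fields->section as $section => $count) {
    echo "{$count} found in {$section}\n";
}
foreach ($results->response->docs as $doc) {
    echo "  {$doc->title}\n";
}
```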

Maybe the better choice is to use Sphinx.

I've done this before on some sites I created. What I did was run one query against each module to find the results. What you want to do is run a MySQL query and then fetch rows in a while loop rather than fetching them all at once. This will make sure you don't over-consume memory.
for example (using mysqli, since the old mysql_* functions are deprecated):
// fetch one row at a time instead of loading the whole result set
while ($row = mysqli_fetch_assoc($result)) {
    echo $row['item_name'];
}
You will most likely find that MySQL can handle much larger searches than you think.
Pagination is best done with a paging class, like the one from CodeIgniter or similar.
Are you using a web framework?

Yes, Sphinx or Lucene: both are good, significantly faster than MySQL, and bind quite readily to PHP applications.

Related

Auto complete movie names

I currently have the jQuery autocomplete implemented for searching movie names. It starts at 2 characters, with a 150 ms delay between requests.
Behind it are PHP and a MySQL DB that do a LIKE '%term%' search to return the results.
I find this is pretty slow and database intensive.
I tried using Mysql's full text search, but didn't have much luck - perhaps I wasn't using the right match type.
Can someone suggest tweaks to the MySQL fulltext search, or should I go straight to an indexing solution like Lucene or Sphinx? And do they work well on partial matches with only 1-3 characters?
You are limited by the speed at which your database returns the results. I can suggest the following to speed things up.
Don't make a new connection to MySQL from PHP for every request. Enable database connection pooling; it improves performance quite a lot. I don't know how to do connection pooling in PHP. This might help.
If possible, cache the results in PHP so that you don't hit the database every time.
Use an external service to serve data for autocomplete. Look at Autocomplete as a Service. This relieves you of writing a backend for autocomplete and produces faster results.
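One more thing that often helps before reaching for an external engine: a leading wildcard ('%term%') defeats any index, while a prefix match ('term%') can use a normal BTREE index on the column. A sketch, assuming a movies(name) table with an index on name (all names here are illustrative):

```php
<?php
// Sketch: index-friendly prefix search for autocomplete.
// LIKE 'ter%' can use an index on `name`; LIKE '%ter%' cannot.
$pdo  = new PDO('mysql:host=localhost;dbname=movies_db', 'user', 'pass');
$term = $_GET['term'] ?? '';

if (strlen($term) >= 2) {                 // enforce the 2-character minimum
    $stmt = $pdo->prepare(
        'SELECT name FROM movies WHERE name LIKE ? ORDER BY name LIMIT 10'
    );
    $stmt->execute([$term . '%']);        // prefix match only
    echo json_encode($stmt->fetchAll(PDO::FETCH_COLUMN));
}
```

The trade-off is that this only matches from the start of the title; if mid-word matches matter, that is where Sphinx or Lucene earn their keep.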

Best solution for custom live search task

I'm going to add simple live search to website (tips while entering text in input box).
Main task:
39k plain-text lines to search in (~500 characters per line, 4 MB total size)
1k online users can be typing something in the input box simultaneously
In some cases 2k-3k results can match a user request
I'm worried about the following questions:
Database VS textfile?
Are there any general rules or best practices related to my task aimed for decreasing db/server memory load? (caching/indexing/etc)
Are Sphinx/Solr appropriate for such a task?
Any links/advice will be extremely helpful.
Thanks
P.S. Maybe this is the best solution: PHP to search within a txt file and echo the whole line?
Put your data in a database (SQLite should do just fine, but you can also use a more heavy-duty RDBMS like MySQL or Postgres), and put an index on the column or columns that will be searched.
Only do the absolute minimum, which means that you should not use a framework, an ORM, etc. They will just slow down your code.
Create a PHP file, grab the search text and do a SELECT query using a native PHP driver, such as SQLite, MySQLi, PDO or similar.
Also, think about how the search box will work. You can prevent many requests if you e.g. put a minimum character limit (it does not make sense to search only for one or two characters), put a short delay between sending requests (so that you do not send requests that are never used), and so on.
Whether or not to use an extension such as Solr depends on your circumstances. If you have a lot of data, and a lot of requests, then maybe you should look into it. But if the problem can be solved using a simple solution then you should probably try it out before making it more complicated.
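A minimal sketch of the approach described above: one flat PHP file, PDO with SQLite, no framework. The table layout (lines(id, text)) and file names are assumptions for illustration:

```php
<?php
// Sketch: bare-bones live-search endpoint with PDO + SQLite.
$pdo = new PDO('sqlite:' . __DIR__ . '/search.db');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$q = trim($_GET['q'] ?? '');
if (strlen($q) < 3) {          // minimum character limit: skip tiny queries
    echo json_encode([]);
    exit;
}

$stmt = $pdo->prepare('SELECT text FROM lines WHERE text LIKE ? LIMIT 20');
$stmt->execute(['%' . $q . '%']);   // cap results so 2k-3k matches never ship
echo json_encode($stmt->fetchAll(PDO::FETCH_COLUMN));
```

The LIMIT matters as much as the index: even when thousands of lines match, the browser only ever receives a handful of suggestions.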
I have implemented 'live search' many times, always using AJAX to query the database (MySQL), and I haven't had or observed any speed or load issues yet.
Anyway, I saw an implementation using Solr but cannot say whether it was quicker or consumed fewer resources.
It completely depends on the hardware the server will run on, IMO. As I wrote somewhere, I have seen a server with a very slow filesystem, so implementing live search by reading and parsing txt files (or using Solr) could be slower than querying the database. On the other hand, you can be hosted on poor shared webhosting with a slow DB connection (which gets even slower with more concurrent connections), so that won't be the best solution either.
My suggestion: use MySQL with AJAX (look at this jQuery plugin or this article), set proper indexes on the searched columns, and if this turns out to be slow you can still move to a txt file.
In the past, I have used Zend Search Lucene with great success.
It is a general-purpose text search engine written entirely in PHP 5. It manages the indexing of your sources and is quite fast (in my experience). It supports many query types, search fields, and search ranking.
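A rough sketch of what indexing and querying look like with it (the index path and field names are illustrative; it assumes Zend Framework 1 is on the include path):

```php
<?php
// Sketch: building and querying a Zend_Search_Lucene index.
require_once 'Zend/Search/Lucene.php';

// Build the index once (e.g. from a cron job):
$index = Zend_Search_Lucene::create('/tmp/my_index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'My first post'));
// UnStored: indexed for searching, but not kept in the index itself
$doc->addField(Zend_Search_Lucene_Field::UnStored('body', 'Full text to index...'));
$index->addDocument($doc);
$index->commit();

// Query it at search time:
$index = Zend_Search_Lucene::open('/tmp/my_index');
foreach ($index->find('first') as $hit) {   // $hit->score holds the ranking
    echo $hit->title, ' (', $hit->score, ")\n";
}
```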

best way to make similar posts php/mysql

I want code that finds similar posts in PHP/MySQL without causing heavy load on the server.
I tried
MATCH (post) AGAINST ('string string')
but it caused a lot of load on the server, so much that it stopped my server.
I have over 4,125,274 posts in my database.
Please help me.
While a fulltext index will help, it will still be really slow if you want to load similar items many times. We have an implementation with about 7 million post records with fulltext, and a search can take up to a minute if we rely only on MySQL.
A good alternative is a search server like Sphinx (http://sphinxsearch.com/), which does its own indexing and caching and is much, much faster.
It is simple and efficient and is used by many big sites like Urban Dictionary, Craigslist, Mozilla, etc.
If you want to do it with only MySQL queries, and you don't want to run the same search many times, try caching the returned IDs in Memcached.
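A sketch of that caching idea (key names, TTL, table layout, and credentials are illustrative):

```php
<?php
// Sketch: cache the IDs returned by a fulltext query in Memcached,
// so each distinct "similar posts" search hits MySQL only once per hour.
$memcached = new Memcached();
$memcached->addServer('localhost', 11211);

$query = 'string string';
$key   = 'similar:' . md5($query);

$ids = $memcached->get($key);
if ($ids === false) {                       // cache miss: hit MySQL once
    $mysqli = new mysqli('localhost', 'user', 'pass', 'site');
    $stmt = $mysqli->prepare(
        'SELECT id FROM posts WHERE MATCH(post) AGAINST (?) LIMIT 10'
    );
    $stmt->bind_param('s', $query);
    $stmt->execute();
    $ids = array_column($stmt->get_result()->fetch_all(MYSQLI_ASSOC), 'id');
    $memcached->set($key, $ids, 3600);      // serve from cache for an hour
}
// $ids now holds the related post IDs, cached or fresh.
```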
Supposing you already have a fulltext index on post and this doesn't help, you should consider incorporating a dedicated search engine for your posts, such as Lucene (not necessarily the PHP implementation, though).

Want to replace fulltext, and %foo% search with something better (and faster)

I have a site (built in php) where I have roughly a million records in a mysql table. There is a very complex "advanced" search which allows the data to be searched, sorted, and ordered in hundreds if not thousands of various ways.
Unfortunately, MySQL search isn't that good, and it is extremely slow. The average search currently takes 5 seconds, and the only way I can make the site function is by caching all the searches for a week (there are over 1.1 million cache files just for searches currently). I have a "ghetto fuzzy search" which I implement via the soundex() function.
I wanted to see what I can do about replacing the mysql based search with something a bit faster, and something that would return accurate results. I also need the output to be totally skinnable, as the results page isn't just text, but pictures, and complex css.
I looked at Sphinx, but there is no fuzzy search there, which I'd very much like to have.
SQL isn't designed for the kinds of queries you are performing, certainly not at scale. My recommendation would be as follows:
Index all your tables into an Apache Solr index. Solr is a FOSS search server built on Lucene that can query the index in milliseconds. It supports pretty much any type of query you need: fuzzy, wildcard, proximity, etc.
Since Solr offers a RESTful (HTTP) API, it should easily integrate into whatever platform you are on.
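Because the API is just HTTP, you can query it with nothing but core PHP, which also means the results page is fully yours to skin. A sketch (the host, core name, and fields are assumptions):

```php
<?php
// Sketch: querying Solr's /select endpoint directly over HTTP.
$params = http_build_query([
    'q'    => 'titel~',       // trailing '~' = Lucene fuzzy match (finds "title")
    'wt'   => 'json',
    'rows' => 10,
]);
$raw  = file_get_contents("http://localhost:8983/solr/mycore/select?$params");
$json = json_decode($raw, true);

foreach ($json['response']['docs'] as $doc) {
    // Render each hit with your own HTML/CSS; Solr only returns data,
    // so the presentation layer stays entirely under your control.
    echo $doc['id'], "\n";
}
```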

Search engine for website

I'm trying to build a search engine for a website. It's mostly a collection of HTML/CSS pages with some PHP, and that's all there is: all of my content is on the pages.
From what I understand, to be able to do this I would need to have the content in a database. Am I correct?
If so, I was considering creating a MySQL table with four columns: "Keywords", "Titles", "Content" and "Link".
Keywords - holds a word that, if it appears in the query, marks this row as the most likely result.
Titles - after searching Keywords, searching the titles produces the next most relevant results.
Content - should be a last resort for finding something, as it will be messier, I believe.
Link - is just the link that belongs to the particular row.
I will be implementing it with PHP and MySQL, and it will be tiresome to put all the content, titles, etc. into a DB. Is this a good method, or should I be looking at something else?
Thanks.
---------------EDIT-------------------
Lucene seems like a good option. However, even after reading the Getting Started guide and looking around a bit on the web, I can't understand how it works. Can someone point me somewhere that explains this in a very, very basic manner, especially considering that I do not know how to compile anything?
Thank you.
Building a search engine from scratch is painful. It is an interesting task, indeed, so if it is for learning, then do it!
However, if you just need a good search function for your web site, please use something that others have done for you. Apache Lucene is one option.
Sphinxsearch is an open-source full-text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind.
Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with Sphinx pretty much as a database server.
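That last point means Sphinx can be queried over its MySQL-protocol listener (SphinxQL) with plain mysqli. A sketch, assuming Sphinx's default SphinxQL port 9306 and a hypothetical index named lines_idx:

```php
<?php
// Sketch: querying Sphinx via SphinxQL using the ordinary mysqli client.
// No credentials or database name are needed for the SphinxQL listener.
$sphinx = new mysqli('127.0.0.1', '', '', '', 9306);

$q = $sphinx->real_escape_string($_GET['q'] ?? '');
$result = $sphinx->query(
    "SELECT id FROM lines_idx WHERE MATCH('$q') LIMIT 20"
);
foreach ($result as $row) {
    // Sphinx returns matching document IDs; fetch the actual rows
    // from your real database afterwards.
    echo $row['id'], "\n";
}
```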
I'm assuming your pages are static HTML. You could do two things at once and move the content of the pages into the DB, so that the pages are generated on the fly by reading their content from the DB.
Anyway, I think your strategy is OK, at least for a basic search engine. Also have a look at MySQL fulltext search.
MySQL fulltext search will be the easiest to set up, but it will be a lot slower than Sphinxsearch. Even Lucene is slower than Sphinx. So if speed is a criterion, I would suggest taking the time to learn and implement Sphinx.
In one of his presentations, Andrew Aksyonoff (creator of Sphinx) presented the following benchmarking results. Approximately 3.5 million records with around 5 GB of text were used for the purpose.

                          MySQL   Lucene   Sphinx
Indexing time, min         1627      176       84
Index size, MB             3011     6328     2850
Match all, ms/q             286       30       22
Match phrase, ms/q         3692       29       21
Match bool top-20, ms/q      24       29       13

Apart from basic search, there are many features that make Sphinx a better solution for searching. These features include multi-valued attributes, tokenizing settings, wordforms, HTML processing, geosearching, ranking, and many others.
Zend Lucene is a pure PHP implementation of search which is quite useful.
Another search option is Solr, which is based on Lucene but does a lot of the heavy lifting for you in order to produce more Google-like results. This is probably your easiest option, besides using MySQL's MyISAM fulltext search capabilities.