Optimize my search engine - php

I am trying to optimize my search engine. Right now, I am running a strcmp between the search words the user entered and the keywords stored in the database. I am trying to come up with a way so that the more of the user's search words match an entry's keywords, the higher that entry appears in the search results.
For example, if the user searches for "red apple painting" and I have two entries for that item with the following keywords 1. "old apple painting green" 2. "apple painting red new york", I would like the second entry to come up first in the search results because all of the user's search words were found in the keywords stored in the db.
Any help on how I can achieve this?

Take a look at full text search.
You may also want to consider an external text search engine such as Lucene or Sphinx.

You need to create an index of words. The index would contain the word id, doc id, number of hits, and position of hits. The searcher will then be able to give results like the ones you want. There are free indexing tools available. But if you want to develop your own, then follow the original paper by Google's founders:
http://infolab.stanford.edu/~backrub/google.html
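
As a rough illustration of the index described above, here is a minimal sketch that maps each word to the documents and positions where it occurs, then ranks documents by how many distinct search words they contain. It is written in Python only for brevity; the function names `build_index` and `search` and the sample data layout are assumptions made for this example, not part of any particular tool.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [positions]} (a tiny inverted index)."""
    index = defaultdict(dict)
    for doc_id, keywords in docs.items():
        for pos, word in enumerate(keywords.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search(index, query):
    """Rank documents by how many distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, {}):
            scores[doc_id] += 1
    # Most matched words first; ties broken by doc id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# The two entries from the question.
docs = {
    1: "old apple painting green",
    2: "apple painting red new york",
}
index = build_index(docs)
results = search(index, "red apple painting")
print(results)  # entry 2 matches three words, entry 1 matches two
```

The stored positions are not used by this simple score, but they are what a phrase or proximity ranking would build on.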

Find relevant keywords with good search traffic potential.
Create and optimize pages for both search engines and users.
Make sure your site is accessible to both bots and humans.
Build relevant links from other reputable websites.

Related

Smart search for Mysql on php in mamp

I have a MySQL table containing lots of song names.
I want to create a smart search, so that I can find a specific song by name.
It should return smart search results like Google does
(not as perfect as Google).
I need tolerance for spelling mistakes and ranking according to relevance.
Is there any easy way to do this?

Match a word with similar words using Solr?

I want to search for threads in my MySQL database with Solr.
But I want it to search not just for the exact words in the thread, but for similar words too.
E.g. if a thread title is "dog for sale" and the user searches for "dogs", the title should be in the results.
Also, if a user searches for "mac os x", threads containing "snow leopard" should appear.
I would also like the ability to link words the application thinks are related, e.g. house and apartment.
How is this kind of logic done?
I know that with Solr you can look up words in a dictionary file you create/add, so Solr will look for "dogs" and see what related words there are (e.g. "dog").
But where do you find such a dictionary?
I have no idea about this kind of implementation.
Please point me in the right direction.
Thanks
I think you'll have to build such a dictionary yourself, since it's very application-specific. "House" and "Apartment" might be similar terms for your application but very distant in another application.
Once you have this dictionary you can use it through the SynonymFilterFactory.
Matching "dog" when the user searches for "dogs" is managed by the stemmer and doesn't require any dictionary.
You could use the synonyms.txt file and create your own dictionary.
Another option for you could be fuzzy search.
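
To make the two ideas above concrete outside of Solr, here is a small sketch of synonym expansion from a hand-built dictionary plus a crude fuzzy lookup. Everything in it is an assumption for illustration: the synonym groups, the vocabulary list, and the helper names `expand` and `correct`. The fuzzy step uses Python's standard `difflib` similarity ratio as a stand-in; it is not how Solr implements fuzzy queries.

```python
import difflib

# Hypothetical, application-specific synonym groups (you would maintain these).
SYNONYM_GROUPS = [
    {"house", "apartment"},
    {"dog", "puppy"},
]

# Hypothetical vocabulary of words known to occur in the indexed threads.
VOCABULARY = ["house", "apartment", "dog", "puppy", "sale"]

def expand(term):
    """Return the term together with every synonym from its group."""
    term = term.lower()
    expanded = {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            expanded |= group
    return expanded

def correct(term, cutoff=0.8):
    """Map a possibly misspelled term to the closest known word, if any."""
    term = term.lower()
    matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else term

print(expand("House"))
print(correct("apartmnt"))
```

A search front end could run `correct` on each word first and then search for every word returned by `expand`.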

I've been recommended Zend Search Lucene (PHP) for search functionality, but is it better than MySQL full text?

I've been playing around with MySQL's full text search and I'm not sure if it is the best.
Let's say I search for "how do I book an appointment?" It gives me the correct results from the database, such as "how can I book appointments?" I think that is because "book" is an uncommon word.
What if the user searches for "how can I schedule an appointment?" I find it will not retrieve any records.
I think it is because there are so many records with "appointment" in them.
So am I to understand that a user can only get information on how to book an appointment if they use "book" in their question?
Lastly, should I be cleaning out words like "how", "I", etc. in a MySQL full text search? Should I be stemming words as well?
Also take a look at Solr.

Which third party search engine (free) should I use?

As the title says, I need a search engine... for MySQL searching.
My website is PHP based.
I was going with Sphinx but my hosting company doesn't support full-text indexes!
So I need a search engine that can be used without full-text!
It should be pretty powerful, and must include at least the functions below:
When searching for 'bmw 520', only matches where these two words appear in exactly this order are returned, not matches for only 'bmw' or only '520'.
When searching for 'bmw 330ci', results as above will be returned, but WITH AND WITHOUT the ci extension. There are a number of extensions in car models, as you all know (i, ci, si, fi etc).
I want the 'minus sign' to 'exclude' all results containing the word after the sign, e.g. 'bmw -330' will return all 'bmw' results without the '330' ones. (A NOT instead of the minus sign is also ok.)
All special accented characters like 'é' are converted to their simple values, in this case 'e'.
A list of words to ignore completely in the search.
Thanks guys!
The Zend_Lucene search component works fairly well. I am not sure how it would cope with your second requirement; however, if you customized the tokenizer you should be able to do it by treating a change from letters to numbers as a new word.
The one I am really not sure about is the first requirement. Given how the text is indexed, order becomes irrelevant in the search, so you may not be able to do it without heavy editing of Lucene, writing a filter (using Lucene to pull the matches, then checking the order), or writing your own solution. All of these will slow the search down and add load to your server.
There is also Solr, but I have never used it and don't know anything about it. Sphinx was another one, but I see you have already ruled that out.
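
The tokenizer and post-filter ideas in this answer can be sketched independently of Lucene. The example below is an illustration under stated assumptions, not Zend_Lucene code: it folds accents, splits tokens wherever letters change to digits, drops the model suffixes listed in the question when they follow a number, honours a leading minus sign, and then checks that the remaining query words occur in order. All function names and the `SUFFIXES` set are hypothetical.

```python
import re
import unicodedata

# Model suffixes taken from the question; extend as needed (assumption).
SUFFIXES = {"i", "ci", "si", "fi"}

def fold_accents(text):
    """Convert accented characters such as 'é' to their base letter."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def tokenize(text):
    """Split into runs of letters or digits, so 'bmw330ci' gives 3 tokens."""
    return re.findall(r"[a-z]+|[0-9]+", fold_accents(text).lower())

def normalize(text):
    """Tokenize and drop a known suffix that directly follows a number."""
    tokens = tokenize(text)
    return [
        tok for i, tok in enumerate(tokens)
        if not (tok in SUFFIXES and i > 0 and tokens[i - 1].isdigit())
    ]

def contains_in_order(doc_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run inside doc_tokens."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

def matches(headline, query):
    """Apply exclusion ('-word') and ordered-phrase matching to one headline."""
    required, excluded = [], []
    for part in query.split():
        if part.startswith("-") and len(part) > 1:
            excluded.extend(normalize(part[1:]))
        else:
            required.extend(normalize(part))
    doc_tokens = normalize(headline)
    if any(tok in doc_tokens for tok in excluded):
        return False
    return contains_in_order(doc_tokens, required)

print(matches("BMW 330ci, low mileage", "bmw 330"))
print(matches("BMW 330ci", "bmw 520"))
```

As the answer notes, running a check like this over candidate matches adds work per query, so it is best applied only to the rows an index has already narrowed down.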
Xapian is very good (very comprehensive) if you have the time for the initial setup.
It functions as you would expect a search engine to work: you tell the indexer what bits of information to index under what namespace/table/object (Page, Profile, Products etc), then issue a query for your users based on keywords. It also supports Google-style tags, e.g. "profile:Mark icecream" would search my profile for the word icecream. I seem to remember it supporting ranges too, for data you specify as numeric.
It can be used in local mode, which can offer spelling suggestions ("Did you mean?"), or in remote mode, which many sites can index to and query from.
What really saved me one time was the ability to attach transient, non-searchable data to an indexed item, e.g. attaching the DB id to all data indexed for that record. That is very good for then going and getting the whole record from the DB when your matches come back from Xapian.
I have used a couple of search engines on my site over its lifetime, but in the next rebuild I'm planning to move to Google Site Search.
There are several reasons for this:
Users are very familiar with the Google style of search result listings which improves usability and hence click-through rates
The Google engine is very good at guessing when to use the page description and when to use a fragment of the page (it is also very good at getting relevant fragments compared to some other engines)
It's used by thousands of very popular websites
Google is the most popular search engine around so you know their technology is both reliable and accurate
Google Site Search begins at $100 per annum for 1000 pages or less (and a limit on queries)
or you can use the free Google Custom Search Engine (but this has much less customizability)

PHP 'smart' search engine to search Mysql tables advice

I am creating a search engine for my php based website. I need to search a mysql table.
Thing is, the search engine must be pretty 'smart', so that users can easily find their items (it's a classifieds website).
I have currently set up a FULLTEXT search with this piece of code:
MATCH (headline) AGAINST ($querystring)
But this isn't enough...
For instance, let's say the field headline contains something like Bmw 330ci.
If I search for 330, I won't get any results. The ending ('ci') is just one of many endings in car models which must be taken into account when searching the table.
Or what if the headline field is bmw330? Also no results, because it only matches full words.
Or what if the headline is bmw 330 and I search for bmw 520? With FULLTEXT I will still get the bmw 330 as a result, even though I searched for bmw 520... Not good!
How should I solve this problem?
When it comes to fulltext search, people who want free solutions often tend to use either Sphinx or Solr.
I've not used either of them, but I've read several times that they are great, and easy to use from/with PHP and MySQL.
Don't reinvent the wheel: inverted-index search engines are already out there, free of charge, open source, easy and powerful. They have everything you need for this kind of search requirement.
Depending on your context, you can choose between a search library like Apache Lucene or a search platform like Apache Solr or Elasticsearch.
All of them have great documentation and are widely used. That greatly reduces the learning curve, even if you have never worked with fulltext search before.
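
If you stay with MySQL FULLTEXT for now, part of the problem in the question comes from the default natural-language mode, which matches rows containing any of the words. MySQL's boolean mode supports a leading `+` (word must be present), a leading `-` (word must be absent) and a trailing `*` (prefix match), so a query can be rewritten before it is sent. The sketch below only builds that query string; the helper name `to_boolean_query`, the table name `ads` and the `%s` placeholder style are assumptions for illustration.

```python
import re

def to_boolean_query(user_query):
    """Turn 'bmw 330 -diesel' into '+bmw* +330* -diesel' for boolean mode."""
    terms = []
    for part in user_query.split():
        negate = part.startswith("-")
        # Strip operator and punctuation characters supplied by the user.
        word = re.sub(r"\W", "", part)
        if not word:
            continue
        terms.append(f"-{word}" if negate else f"+{word}*")
    return " ".join(terms)

# Hypothetical table name; pass the built string as a bound parameter.
SQL = (
    "SELECT headline FROM ads "
    "WHERE MATCH (headline) AGAINST (%s IN BOOLEAN MODE)"
)

print(to_boolean_query("bmw 330"))
print(to_boolean_query("bmw -330"))
```

With this, a search for "bmw 520" no longer returns a "bmw 330" headline, and "330" also matches "330ci" through the prefix operator. It does not help with a headline like "bmw330", because the wildcard only works at the end of a word, and short words may be skipped entirely depending on the server's minimum token length settings (`ft_min_word_len` for MyISAM, `innodb_ft_min_token_size` for InnoDB).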
