Making search facility - php

I want to make a search facility in my website.I'm using php..
What criteria should be taken for searching.
For ex: if someone searches
How to make soap
I can use many approaches for the search like finding database entries having exactly the same search string
or finding the database entries in the order of search keywords(ie . entry with search string "How" +"Soap" will have less preference than entry having search string "how soap make")...
So what is the algorithm generally used for searching.?
Also what is meant by full text search?

This is kind of a big subject for a simple answer, but I think what you mean is how to run complex fulltext searches on MySQL. In other words, this is really a MySQL question, not a PHP one.
Basically, you need to:
1. Create a fulltext index on a text field in your database.
2. Run queries on that database field using MySQL's fulltext syntax.
The basic syntax for querying a fulltext indexed table in MySQL is:
SELECT * FROM table
WHERE MATCH (fulltextfield)
AGAINST ('my search phrase');
There's a lot more to it than that, but the MySQL documentation is the place to go: http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
If you want to do really advanced fulltext searches, a good recommendation is Sphinx, but that's probably way more advanced than you need.

Related

faster way for Search in multiple databases

I am working on big eCommerce shopping website. I have around 40 databases. i want to create search page which show 18 result after searching by title in all databases.
(SELECT id_no,offers,image,title,mrp,store from db1.table1 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
UNION ALL (SELECT id_no,offers,image,title,mrp,store from db3.table3 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
UNION ALL (SELECT id_no,offers,image,title,mrp,store from db2.table2 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
LIMIT 18
currently i am using the above query its working fine for 4 or more character keyword search like laptop nokia etc but takes 10-15 sec for processes but for query with keyword less than 3 characters it takes 30-40sec or i end up with 500 internal server error. Is there any optimized way for searching in multiple databases. I generated two index primary and full text index with title
Currently my search page is in php i am ready to code in python or any
other language if i gets good speed
You can use the sphixmachine:http://sphinxsearch.com/. This is powerfull search for database. IMHO Sphinx this best decision
for search in your site.
FULLTEXT is not configured (by default) for searching for words less than three characters in length. You can configure that to handle shorter words by setting a ...min_token_size parameter. Read this. https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html You can only do this if you control the MySQL server. It won't be possible on shared hosting. Try this.
FULLTEXT is designed to produce more false-positive matches than false-negative matches. It's generally most useful for populating dropdown picklists like the ones under the location field of a browser. That is, it requires some human interaction to choose the correct record. To expect FULLTEXT to be able to do absolutely correct searches is probably a bad idea.
You simply cannot use AND column LIKE '%whatever%' if you want any reasonable performance at all. You must get rid of that. You might be able to rewrite your python program to do something different when the search term is one or two letters, and thereby avoid many, but not all, LIKE '%a%' and LIKE '%ab%' operations. If you go this route, create ordinary indexes on your title columns. Whatever you do, don't combine the FULLTEXT and LIKE searches in a single query.
If this were my project I'd consider using a special table with columns like this to hold all the short words from the title column in every row of each table.
id_pk INT autoincrement
id_no INT
word VARCHAR(3)
Then you can use a query like this to look up short words
SELECT a.id_no,offers,image,title,mrp,store
FROM db1.table1 a
JOIN db1.table1_shortwords s ON a.id_no = s.id_no
WHERE s.word = '$searchkey'
To do this, you will have to preprocess the title columns of your other tables to populate the shortwords tables, and put an index on the word column. This will be fast, but it will require a special-purpose program to do the preprocessing.
Having to search multiple tables with your UNION ALL operation is a performance problem. You will be able to improve performance dramatically by redesigning your schema so you need search only one table.
Having to search databases on different server machines is a performance problem. You may be able to rig up your python program to search them in parallel: that is, to somehow use separate tasks to search each one, then aggregate the results. Each of those separate search tasks requires its own connection to the data base, so this is not a cheap or simple solution.
If this system faces the public web, you will have to redesign it sooner or later, because it will never perform well enough as it is now. (Sorry to be the bearer of bad news.) Many system designers like to avoid redesigning systems after they become enormous. So, if I were you I would get the redesign done.
If your focus is on searching, then bend the schema to facilitate searching rather than the other way around.
Collect all the strings to search for in a single table. Whereas a UNION of 40 tables does work, it will be ~40 times as slow as having the strings collected together.
Use FULLTEXT when the words are long enough, use some other technique when they are not. (This addresses your 3-char problem; see also the Answer discussing innodb_ft_min_token_size. You are using InnoDB, correct?)
Use + and boolean mode to say that a word is mandatory: MATCH(col) AGAINST("+term" IN BOOLEAN MODE)
Do not add on a LIKE clause unless there is a good reason.

PHP/SQL Check if a similar entry already exists in the database

I have a search on my website and im trying to show a list of the most popular search terms on my site it sort of works but it isn't matching the strings close enough.
This is what i'm currently using:
$sql = "SELECT * FROM db WHERE r_name LIKE '%".$searchname."%' OR r_number like '%".$searchname."%'
However if a user searches for say Game Name and another searches for Game Name (Reviews) it will add 2 entries into my database, Is there a way to do a similarity test before entering the entry ?
yes, there is a way but it requires you to fiddle with it a little. there is a search called
SOUNDEX, which will match things that are very close. It might not be a perfect solution to your question, but it is definitely something that might get you started in the right direction.
SELECT * FROM db WHERE SOUNDEX( db.r_name ) LIKE SOUNDEX( '{$searchname}' );
I believe that if you have an entry 'lowercasexd' , and do soundex like('what is lowercasexd?'), it will find the entry that's associated with 'lowercasexd'.
Be aware that this type of search take a little while to run compare to '=' searches on indexed databases(on my database it does about 5-6k entries per second) so it is NOT recommended for anything big. If you want a near-perfect solution, I suggest you read about google's search mechanism, and look up some search engine source code if your project is significant enough.

Matching a user entered title to a category - large INNODB database

I have a large INNODB database with over 2 million products on it. The 'products' table has the following fields: id,title,description,category.
There is also a MyISAM table called 'category' that contains a list of all categories used on the website. This has the following fields: id,name,keywords,parentid.
My question is more about the logic rather than code, but what I am trying to achieve is as follows:
When a user lists a new product on the site, as they are typing the description it should try to work out what category to put the product in (with good accuracy).
I tried this initially by using MySQL MATCH() to match the entered title against a list of keywords in the category table, but this was far from accurate.
A better idea seems to be to match the user entered title against titles for products already in the database, grouping them by the category they are in and then sorting them by the largest group. However, on an INNODB database I obviously can't use fulltext, and with 2mill items I think it would be pretty slow anyway?
How would you do it - I guess it would need to be a similar way to how stackoverflow displays similar questions?
A fulltext index on 2 million records is a valid option, if you are running on a decent server. The inital indexing will take a while, that's for sure, but searches should be reasonably fast, MySQL can take it.
InnoDB supports fulltext indexes as of v5.6.4. You should consider upgrading.
If upgrading is not an option, please see this previous answer of mine where I suggest a workaround.
For your use case, you may want to take a look at the WITH QUERY EXPANSION option:
It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”

MySql select question

I have a search engine. I have to select some data from a table when a user types in the search keywords.
I want to find an alternative to 'LIKE':
SELECT id,text FROM example WHERE text LIKE '$search'
Because the text column has usually loads of words in it and the search term always contains a few words, I don't get accurate results.
Is there any other way of doing this?
It's called full-text indexing, but currently it's not supported in InnoDB, only in MyISAM. Alternative is to use third-party indexing, like Lucene, Solr (which provides web service access on top on Lucene), Sphinx...
If your table is MyISAM, you can enable full text searching:
ALTER TABLE table ADD FULLTEXT idx_text (`text`);
You could take a look at the MySQL Fulltext mechanism, it provides natural language searches in a fairly easy to use way.
Usually a search engine is using a tree data structure for fast lookups and some sort of a graph to weight the search result. Maybe you want to look into a trie data structure and space-filling-curve. The latter is useful if you want to compare 2 documents. For example if you sort and count all the words you can do a heat map.

How can I search for multiple terms in multiple table columns?

I have a table that lists people and all their contact info. I want for users to be able to perform an intelligent search on the table by simply typing in some stuff and getting back results where each term they entered matches at least one of the columns in the table. To start I have made a query like
SELECT * FROM contacts WHERE
firstname LIKE '%Bob%'
OR lastname LIKE '%Bob%'
OR phone LIKE '%Bob%' OR
...
But now I realize that that will completely fail on something as simple as 'Bob Jenkins' because it is not smart enough to search for the first an last name separately. What I need to do is split up the the search terms and search for them individually and then intersect the results from each term somehow. At least that seems like the solution to me. But what is the best way to go about it?
I have heard about fulltext and MATCH()...AGAINST() but that sounds like a rather fuzzy search and I don't know how much work it is to set up. I would like precise yes or no results with reasonable performance. The search needs to be done on about 20 columns by 120,000 rows. Hopefully users wouldn't type in more than two or three terms.
Oh sorry, I forgot to mention I am using MySQL (and PHP).
I just figured out fulltext search and it is a cool option to consider (is there a way to adjust how strict it is? LIMIT would just chop of the results regardless of how well it matched). But this requires a fulltext index and my website is using a view and you can't index a view right? So...
I would suggest using MATCH / AGAINST. Full-text searches are more advanced searches, more like Google's, less elementary.
It can match across multiple tables and rank them to how many matches they have.
Otherwise, if the word is there at all, esp. across multiple tables, you have no ranking. You can do ranking server-side, but that is going to take more programming/time.
Depending on what database you're using, the ability to do cross columns can become more or less difficult. You probably don't want to do 20 JOINs as that will be a very slow query.
There are also engines such as Sphinx and Lucene dedicated to do these types of searches.
BOOLEAN MODE
SELECT * FROM contacts WHERE
MATCH(firstname,lastname,email,webpage,country,city,street...)
AGAINST('+bob +jenkins' IN BOOLEAN MODE)
Boolean mode is very powerful. It might even fulfil all my needs. I will have to do some testing. By placing + in front of the search terms those terms become required. (The row must match 'bob' AND 'jenkins' instead of 'bob' OR 'jenkins'). This mode even works on non-indexed columns, and thus I can use it on a view although it will be slower (that is what I need to test). One final problem I had was that it wasn't matching partial search terms, so 'bob' wouldn't find 'bobby' for example. The usual % wildcard doesn't work, instead you use an asterisk *.

Categories