How do I get this lightning fast search? - php

I just came across this site: http://www.hittaplagget.se. If you enter the search word moo, the autosuggest pops up immediately.
But if you go to my site, http://storelocator.no, and use the same search phrase (in "Search for brand" field), it takes a lot longer for autosuggest to suggest anything.
I know that we can only guess at what type of technology they are using, but hopefully someone here can make a better educated guess than I can.
In my solution I only do a SELECT ... WHERE column LIKE 'moo%' and return the results.
I have not yet indexed my table as there are only 7000 rows in it. But I'm thinking of indexing my tables using Lucene.
Can anyone suggest what I need to do in order to get equally fast autosuggest?

You must add an index on the column holding your search terms, even at 7000 rows. Otherwise, the database searches through the whole list every time. See http://dev.mysql.com/doc/refman/5.0/en/create-index.html.

Lucene is a full text search index and may or may not be what you're looking for. Lucene would find any occurrence of "moo" in the entire indexed column (e.g. Mootastic and Fantasticmoo) and does not necessarily speed up your search although it's faster than a where x like '%moo%' type of search.
As others have already pointed out, a regular index (probably even a unique one?) is what you want if you're performing "starts with" type searches.

You will need to table-scan the table, so I suggest:
Don't put any rows in the table you don't need - for example, "inactive" records - keep them in a different table
Don't put any columns in the table you don't need
You can achieve this by having a special "Search table" which just contains the rows/columns you're interested in, and updating it from the "Master table".
Table-scanning a 7000 row table should be extremely efficient if the rows are small; I understand from your problem domain that this will be the case.
But as others have pointed out - don't send the 7000 rows to the client-side when it doesn't need them.
A conventional index can optimise a LIKE 'someprefix%' into a range-scan, so it is probably helpful having one. If you want to search for the string in any part of the entry, it is going to be a table-scan (which should not be slow on such a tiny table!)
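Why a prefix search can use an index while a "contains" search cannot is easier to see in a small sketch. The following is written in Python rather than PHP/MySQL purely for illustration; the sorted list stands in for an index, and the brand names and the `prefix_range` helper are made up for the example.

```python
import bisect

def prefix_range(sorted_terms, prefix, limit=10):
    """Return up to `limit` terms starting with `prefix` from a sorted list."""
    # First position whose term is >= prefix.
    lo = bisect.bisect_left(sorted_terms, prefix)
    # prefix + the highest code point sorts after practically every
    # string that starts with prefix, so it marks the end of the range.
    hi = bisect.bisect_left(sorted_terms, prefix + "\U0010ffff")
    return sorted_terms[lo:min(hi, lo + limit)]

# Hypothetical brand names, lower-cased so the comparison is case-insensitive.
brands = sorted(b.lower() for b in [
    "Moods of Norway", "Mexx", "Moomin", "Nike", "Moncler", "Adidas",
])

print(prefix_range(brands, "moo"))
```

Two binary searches locate the whole block of matches, which is roughly what a range-scan on an index does. A search for 'moo' anywhere in the string has no such contiguous block to jump to, so every entry must be checked.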

Related

Search Methods For Showing Results Similar to Item Entered

I am building a data driven website (using PHP and MySQL) for some farmers around the community (well, was - you will see in a second). They wanted to be able to list their products and have people search those products and their names come up along with a link to a page detailing all of their produce.
While I knew it would be a long list, I thought, "Since every search into a mysql database is picky - including case sensitive - I'll just make a list in alphabetical order and people can choose from a dropdown box what they want to search for. No problem."
Well, now it's a problem. He has expanded the parameters of the site. He now wants to include hand made and home made products. Needless to say we went from a few dozen to hundreds of potential products and now a dropdown list is no longer feasible. But if I use a text field for visitors to search the site, unless they type it with no spelling errors and use the same case, they won't get accurate results from the search.
Can anyone recommend another method? I am aware of the "LIKE" search, but it doesn't really solve my problem - especially since it could create false positives in the search. Any help would be appreciated, thanks!
Well, the question is somewhat vague, since you are talking about multiple search parameters.
It's more a design choice. For example, consider the following:
For items such as "homemade" or "handmade", perhaps underneath the input field, you should have a checkbox where people can add additional flags to the search
say, search by name "John Smith" and check off "handmade" and "homemade"
The "handmade" and "homemade" fields in the database will always be the same (either on / off)
So, an sql query might be like this:
SELECT * FROM products WHERE farmername LIKE '%$search%' AND handmade = 'handmade'
Or, when inputting the data, if handmade is checked, insert an integer into the handmade, where 1 would mean handmade was checked, 0 handmade was not checked
so your query would then go AND handmade = 1 (or 0) for not handmade
These are just some ideas to get you going, but this is more a design decision than a database decision: how do you create your tables to use the flags?
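A minimal sketch of the flag idea, assuming the integer variant (1 = handmade, 0 = not). It uses Python with an in-memory SQLite database only so that it runs on its own; the table contents and the `search_products` helper are invented for the example. The search text is passed as a bound parameter instead of being pasted into the SQL string, which avoids the injection risk of interpolating $search directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT, farmername TEXT, handmade INTEGER)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    ("Goat cheese", "John Smith", 1),   # hypothetical sample rows
    ("Carrots", "John Smith", 0),
    ("Wool scarf", "Mary Jones", 1),
])

def search_products(conn, farmer, handmade_only=False):
    """Find product names by farmer name, optionally only handmade ones."""
    sql = "SELECT name FROM products WHERE farmername LIKE ?"
    params = ["%" + farmer + "%"]       # user input is bound, not concatenated
    if handmade_only:
        sql += " AND handmade = 1"      # the checkbox maps to a fixed condition
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]

print(search_products(conn, "smith"))
print(search_products(conn, "john smith", handmade_only=True))
```

In PHP the same shape would use a prepared statement (PDO or mysqli) with a placeholder for the search text.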
I would use two tables:
one with all possible search terms
one with synonyms of any terms that were applicable (EG "handmade" and "homemade" )
Use AJAX to search for values from the first table as characters are entered in a text box. Return a list of possible terms using:
select term from search_table where term like '%<input string>%'
Only start returning values when you have fewer than 10 hits or so (i.e. don't populate when they enter 2 letters). Then when a particular term is entered, search in the second table for synonyms and include those with the search. In the results page, indicate that you included the synonyms and maybe put an 'X' by each to opt to re-search with those excluded.
Note that LIKE is case-insensitive in MySQL when the column uses a case-insensitive collation, which is the default.
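The two-table approach can be sketched roughly as follows. This is Python with plain lists and dicts standing in for the two tables; the terms, the synonym mapping and the function names are assumptions made for the example.

```python
def suggest(terms, typed, max_hits=10):
    """Return matching terms, but only once the list is short enough to show."""
    needle = typed.lower()
    hits = [t for t in terms if needle in t.lower()]   # like '%typed%'
    return hits if 0 < len(hits) < max_hits else []

def expand(term, synonyms):
    """Return the chosen term followed by its synonyms, if any."""
    return [term] + sorted(synonyms.get(term, []))

# Stand-ins for the search-term table and the synonym table.
terms = ["handmade", "homemade", "honey", "ham", "hay"]
synonyms = {"handmade": {"homemade"}, "homemade": {"handmade"}}

print(suggest(terms, "h", max_hits=3))      # too many hits, nothing shown yet
print(suggest(terms, "made", max_hits=3))
print(expand("handmade", synonyms))
```

The `max_hits` cut-off is set to 3 here only because the sample list is tiny; the answer suggests about 10.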

Autocomplete concept

I'm programming a search engine for my website in PHP, SQL and jQuery. I have experience in adding autocomplete with existing data in the database (i.e. searching article titles). But what if I want to use the most common search queries that users type, something similar to what Google has, without having so many users to contribute to the creation of the data (most common queries)? Is there some kind of open-source SQL table with autocomplete data in it or something similar?
As of now use the static data that you have for auto complete.
Create another table in your database to store the actual user queries. The schema of the table can be <queryID, query, count>, where count is incremented each time the same query is supplied by another user (a kind of rank). Build an n-gram index over the queries (so that you can also autocomplete something like "Manchester United" when a person just types "United", i.e. not only from the start of the string) and simply return the top N after sorting by count.
The above table will gradually keep on improving as and when your user base starts increasing.
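A rough sketch of the query-log idea, in Python with in-memory structures in place of the table. Note the simplification: it indexes whole words rather than character n-grams, which is enough to complete "United" to "Manchester United". The class and method names are invented for the example.

```python
from collections import Counter, defaultdict

class QueryLog:
    """In-memory stand-in for the <queryID, query, count> table."""

    def __init__(self):
        self.counts = Counter()            # query -> times it was searched
        self.by_word = defaultdict(set)    # word -> queries containing it

    def record(self, query):
        q = " ".join(query.lower().split())   # normalise case and spacing
        if not q:
            return
        self.counts[q] += 1
        for word in q.split():
            self.by_word[word].add(q)

    def suggest(self, typed, n=5):
        prefix = typed.lower().strip()
        if not prefix:
            return []
        candidates = set()
        for word, queries in self.by_word.items():
            if word.startswith(prefix):       # match any word, not just the first
                candidates |= queries
        # Most frequent first; ties broken alphabetically.
        return sorted(candidates, key=lambda q: (-self.counts[q], q))[:n]

log = QueryLog()
for q in ["Manchester United"] * 3 + ["manchester city"] * 2 + ["United Airlines"]:
    log.record(q)

print(log.suggest("unit"))
print(log.suggest("man"))
```

Scanning every indexed word is fine for a small log; a sorted word list or a database index on the word column would be the next step once it grows.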
One more thing: the algorithm for accomplishing your task is pretty simple. The real challenge lies in returning the data to be displayed within a fraction of a second. So when your query database/store grows, you can use a search engine like Solr/Sphinx to search for you, which will be pretty fast at returning the results to be rendered.
You can use the Lucene search engine for this functionality. Refer to this link,
or you may also take a look at Lucene Solr Autocomplete...
Google has thousands of entries (and keeps gaining more), arranged according to day, time, geolocation, language and so on, and the set grows with what users enter. Whenever a user types a word, the system checks the table of words most used for that location, day and time and, if there is no answer, falls back to general words. So you should categorize every word entered by users, or build a general word-relation table in your database that references the most suitable answer for each search.
Yesterday I stumbled on something that answered my question. Google draws autocomplete suggestions from this XML file, so it is wise to use it if you have too few users to create your own database of keywords:
http://google.com/complete/search?q=[keyword]&output=toolbar
Just replacing [keyword] with some word will give suggestions about that word; then the task is just to parse the returned XML and format the output to suit your needs.

MYSQL search database for similar results

Essentially what I want to do is search a number of MYSQL databases and return results where a certain field is more than 50% similar to another record in the databases.
What am I trying to achieve?
I have a number of writers who add content to a network of websites that I own, I need a tool that will tell me if any of the pages they have written are too similar to any of the pages currently published on the network. This could run on post/update or as a cron... either way would work for me.
I've tried making something with PHP, drawing the records from the database and using the function similar_text(), which gives a percentage similarity between two strings. This however is not a workable solution, as you have to compare every entry against every other entry, and I worked out with microtime that it would take around 80 hours to completely search all of the entries!
Wondering if it's even possible!?
Thanks!
What you are probably looking for is SOUNDEX. It is the only sound-based search in MySQL. If you have A LOT of data to compare, you're probably going to need to pregenerate the soundex and compare the soundex columns, or use it live like this:
SELECT * FROM data AS t1 LEFT JOIN data AS t2 ON SOUNDEX(t1.fieldtoanalyze) = SOUNDEX(t2.fieldtoanalyze)
Note that you can also use the
t1.fieldtoanalyze SOUNDS LIKE t2.fieldtoanalyze
syntax.
Finally, you can save the SOUNDEX to a column. Just create a column and:
UPDATE data SET fieldsoundex = SOUNDEX(fieldtoanalyze)
and then compare live with pregenerated values
More on Soundex
Soundex is a function that analyzes the composition of a word, but in a very crude way. It is very useful for comparisons of "Color" vs "Colour" and "Armor" vs "Armour", but it can also sometimes dish out weird results with long words, because the standard Soundex code of a word is a letter plus a three-digit code (MySQL's SOUNDEX() can return a longer string). Sadly, there is only so much you can do with these combinations.
Note that there is no Levenshtein or Metaphone implementation in MySQL yet, but Levenshtein would probably have been the best for your case.
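For reference, the Levenshtein (edit) distance mentioned above can be sketched in a few lines. This is a plain Python version of the textbook dynamic-programming algorithm, not something MySQL provides; the `similarity` helper that turns the distance into a 0..1 score is an assumption added for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row as short as possible
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale the distance to 1.0 (identical) .. 0.0 (nothing in common)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"))
print(levenshtein("color", "colour"))
```

Each comparison costs time proportional to the product of the two lengths, so this does not remove the all-pairs problem by itself; some cheaper pre-filter is still needed for long pages.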
Anything is possible.
Without knowing your criteria for similar, it's difficult to offer a specific solution. However, my suggestion would be to pre-build a similarity table using a function such as similar_text(). Use this as your index table when searching by term.
You'll take an initial hit to build such an index. However, you can manage it easier as new records are added.
Thanks for your answers guys, for anyone looking for a solution to a problem similar to this I used the SOUNDEX function to pull out entries that had a similar title then compared them with the similar_text() function. Not quite a complete database comparison, but near as I could get it!
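The two-stage approach described here (a cheap phonetic key to pick candidates, then an expensive comparison only within each group) might look roughly like this. It is a Python sketch under several assumptions: `soundex` implements the common American Soundex variant, which can differ from MySQL's SOUNDEX() output; `difflib.SequenceMatcher` stands in for PHP's similar_text(); and the sample pages are invented.

```python
import difflib
from collections import defaultdict
from itertools import combinations

_CODES = {c: d for d, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items() for c in letters}

def soundex(text):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        return ""
    digits, prev = [], _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "HW":                # H and W do not separate equal codes
            prev = code
    return (letters[0] + "".join(digits) + "000")[:4]

def near_duplicates(pages, threshold=0.5):
    """pages: list of (title, body). Compare bodies only within a title bucket."""
    buckets = defaultdict(list)
    for title, body in pages:
        buckets[soundex(title)].append((title, body))
    flagged = []
    for group in buckets.values():
        for (t1, b1), (t2, b2) in combinations(group, 2):
            ratio = difflib.SequenceMatcher(None, b1, b2).ratio()
            if ratio > threshold:
                flagged.append((t1, t2, round(ratio, 2)))
    return flagged

pages = [
    ("Color guide", "How to pick a paint color for your room"),
    ("Colour guide", "How to pick a paint colour for your room"),
    ("Armor care", "Cleaning and storing plate armor"),
]
print(near_duplicates(pages))
```

As noted, this is not a complete comparison: two near-identical pages with differently sounding titles land in different buckets and are never compared.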

building a search for a particular table

I have a search text field which searches a particular column in a table, but the results are not as I expected.
When the user tries to search for something like "hello world how", he will not find a result, as the query is LIKE '%hello world how%'. The table row contains the string "hello world".
How do I do a proper search using php and mysql, and what if I need to do a search on multiple tables/ all columns in a table? Which is the best way to do it?
One bad way to do this would be to split the user's search text on whitespace. "hello world how" would become "hello%world%how". However, this still would require the word "how" to be in there, after "hello world", and would not guarantee that "hello" and "world" are near each other.
Further, even if you've put an index on the column being searched, a LIKE clause that starts with a wildcard character (%) cannot use that index in MySQL. This means that every search would be a full table scan. That can get pretty slow.
One solution for you might be MySQL fulltext search. It's pretty flexible. However, before MySQL 5.6 it only works with MyISAM tables (InnoDB supports FULLTEXT indexes from 5.6 onward). You probably want to use InnoDB for your data, because MyISAM does not support foreign keys or transactions. On older versions, you can create separate, dedicated tables that use MyISAM and fulltext indexing, just for searching purposes.
Sphinx is another option available to you. It's a third party search system that can be attached to MySQL and PHP.
All of these answers, however, are focused around searching one column at a time. If you need to search entire rows, that becomes a little more interesting. You might want to consider a "document"-based search system, like ElasticSearch.
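To make the "hello world how" case concrete, here is a small sketch of matching on individual words and ranking rows by how many of the typed words they contain, instead of requiring the whole phrase. It is plain Python over a list of strings, not a MySQL feature; the `search` helper and the sample rows are assumptions for the example.

```python
def search(rows, text):
    """Rank rows by how many of the typed words they contain."""
    words = set(text.lower().split())
    scored = []
    for row in rows:
        haystack = set(row.lower().split())
        score = len(words & haystack)    # number of typed words found in the row
        if score:
            scored.append((score, row))
    # Best match first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [row for _, row in scored]

rows = ["hello world", "hello there", "goodbye"]
print(search(rows, "hello world how"))
```

The row "hello world" is found even though "how" is missing, which is the behaviour the phrase-based LIKE query cannot give. Fulltext search, Sphinx and ElasticSearch do this kind of per-word scoring for you, with proper indexing.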

Is Full Text search the answer?

OK I have a mySQL Database that looks something like this
ID - an int and the unique ID of the record
Title - The name of the item
Description - The items description
I want to search both title and description for keywords; currently I'm using:
SELECT * FROM `item` WHERE title LIKE '%key%'
This works, as there's not much in the database. However, searching for "this key" doesn't find "this that key", so I want to improve the search engine of the site, and maybe even add some kind of ranking system to it (but that's a long time away).
So to the question: I've heard about something called "Full text search". It is (as far as I can tell) a staple of database design, but being a newbie to this subject I know nothing about it, so…
1) Do you think it would be useful?
And an additional question…
2) What can I read about database design / search engine design that will point me in the right direction.
If it's of relevance, the site is currently written in straight PHP (i.e. without a framework), though the thought of converting it to Ruby on Rails has crossed my mind.
update
Thanks all, I'll go for Fulltext search.
And for any one finding this later, I found a good tutorial on fulltext search as well.
The problem with the '%keyword%' type search is that there is no way to efficiently search on it in a regular table, even if you create an index on that column. Think about how you would look that string up in the phone book. There is actually no way to optimize it - you have to scan the entire phone book - and that is what MySQL does, a full table scan.
If you change that search to 'keyword%' and use an index, you can get very fast searching. It sounds like this is not what you want, though.
So with that in mind, I have used fulltext indexing/searching quite a bit, and here are a few pros and cons:
Pros
Very fast
Returns results sorted by relevance (by default, although you can use any sorting)
Stop words can be used.
Cons
Only works with MyISAM tables (before MySQL 5.6; later versions also support InnoDB)
Words that are too short are ignored (default minimum is 4 letters)
Requires different SQL in where clause, so you will need to modify existing queries.
Does not match partial strings (for example, 'word' does not match 'keyword', only 'word')
Here is some good documentation on full-text searching.
Another option is to use a searching system such as Sphinx. It can be extremely fast and flexible. It is optimized for searching and integrates well with MySQL.
You might also consider Zend_Lucene. It's slightly easier to integrate than Sphinx, because it is pure PHP.
I would guess that MySQL fulltext is sufficient for your needs, but it's worth noting that the built-in support doesn't scale very well. For average size documents it starts to become unusable for table sizes as small as a few hundred thousand rows. If you think that this might become a problem further on, you should probably look into Sphinx now. It's becoming the de facto standard for MySQL users, even though I personally prefer to implement my own solution using Java Lucene. :)
Also, I'd like to mention that full text search is fundamentally different from the standard LIKE '%keyword%' search. Unlike the LIKE search, full text indexing allows you to search for several keywords that don't have to appear right next to each other. Standard search engines such as Google are full text search engines, for example.
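The difference can be illustrated with a toy inverted index, the data structure full text engines are built around. This is a Python sketch only: it ignores relevance ranking, the stop-word list and the minimum word length of 3 are arbitrary choices for the example, and real engines (MySQL fulltext, Sphinx, Lucene) are far more elaborate.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of"}   # illustrative, not MySQL's list

def tokenize(text, min_len=3):
    """Lower-case words, dropping stop words and words that are too short."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if len(w) >= min_len and w not in STOP_WORDS]

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {1: "this that key", 2: "a rusty lock", 3: "keyword research"}
index = build_index(docs)
print(search(index, "this key"))   # words need not be adjacent
print(search(index, "word"))       # whole words only: 'keyword' is not matched
```

Each query word is a direct lookup rather than a scan, which is why this stays fast as the table grows, and also why partial strings inside a word are not found.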
