MySQL SELECT question - PHP

I have a search engine. I have to select some data from a table when a user types in the search keywords.
I want to find an alternative to 'LIKE':
SELECT id,text FROM example WHERE text LIKE '$search'
Because the text column usually contains many words and the search term always contains only a few words, I don't get accurate results.
Is there any other way of doing this?

It's called full-text indexing. Older MySQL versions supported it only in MyISAM, but InnoDB has supported FULLTEXT indexes since MySQL 5.6. An alternative is to use third-party indexing, such as Lucene, Solr (which provides web-service access on top of Lucene), or Sphinx.

If your table is MyISAM (or InnoDB on MySQL 5.6 or later), you can enable full-text searching by adding a FULLTEXT index:
ALTER TABLE example ADD FULLTEXT idx_text (`text`);

You could take a look at MySQL's full-text search mechanism. It provides natural-language searches in a fairly easy-to-use way.

A search engine usually uses a tree data structure for fast lookups and some sort of graph to weight the search results. You might want to look into the trie data structure and space-filling curves. The latter are useful if you want to compare two documents: for example, if you sort and count all the words, you can build a heat map.
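To make the trie suggestion concrete, here is a minimal Python sketch. It is not how any particular search engine implements its index. It stores each word of a few hypothetical documents in a character trie. To find every document containing a word with a given prefix, you walk down the prefix and then collect the document ids in the subtree below it. The sample documents and the names Trie, prefix_search and build_trie are illustrative assumptions.

```python
class TrieNode:
    """One character position in the trie."""

    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.doc_ids = set()  # documents containing the word that ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, doc_id):
        """Add word (lower-cased) and remember which document it came from."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def prefix_search(self, prefix):
        """Return ids of documents containing any word that starts with prefix."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        # Every word below this node shares the prefix: collect their documents.
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.doc_ids
            stack.extend(current.children.values())
        return found


def build_trie(docs):
    """Index every whitespace-separated word of every document."""
    trie = Trie()
    for doc_id, text in docs.items():
        for word in text.split():
            trie.insert(word, doc_id)
    return trie


docs = {1: "How to make soap", 2: "Soap opera reviews", 3: "Making bread at home"}
trie = build_trie(docs)
print(sorted(trie.prefix_search("soa")))  # [1, 2]
print(sorted(trie.prefix_search("mak")))  # [1, 3]
```

A lookup costs one step per character of the prefix, no matter how many documents are indexed. Only collecting the matching subtree depends on the data.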

Related

Faster way to search in multiple databases

I am working on a big eCommerce shopping website. I have around 40 databases. I want to create a search page that shows 18 results after searching by title in all databases.
(SELECT id_no,offers,image,title,mrp,store from db1.table1 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
UNION ALL (SELECT id_no,offers,image,title,mrp,store from db3.table3 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
UNION ALL (SELECT id_no,offers,image,title,mrp,store from db2.table2 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
LIMIT 18
Currently I am using the above query. It works fine for keyword searches of 4 or more characters, like laptop or nokia, but takes 10-15 seconds to process. For keywords shorter than 3 characters, it takes 30-40 seconds or I end up with a 500 Internal Server Error. Is there an optimized way to search in multiple databases? I created two indexes: the primary key and a FULLTEXT index on title.
Currently my search page is in PHP. I am ready to code in Python or any other language if I get good speed.
You can use the Sphinx search engine: http://sphinxsearch.com/. It is a powerful search engine for databases. IMHO, Sphinx is the best choice for search on your site.
By default, FULLTEXT is not configured to search for words shorter than three characters. You can configure it to handle shorter words by setting a ...min_token_size parameter. Read https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html for details. After changing it, you have to restart the server and rebuild your FULLTEXT indexes. You can only do this if you control the MySQL server, so it won't be possible on shared hosting.
FULLTEXT is designed to produce more false-positive matches than false-negative matches. It's generally most useful for populating dropdown picklists like the ones under the location field of a browser. That is, it requires some human interaction to choose the correct record. Expecting FULLTEXT to return perfectly precise results is probably a bad idea.
You simply cannot use AND column LIKE '%whatever%' if you want any reasonable performance at all. You must get rid of that. You might be able to rewrite your python program to do something different when the search term is one or two letters, and thereby avoid many, but not all, LIKE '%a%' and LIKE '%ab%' operations. If you go this route, create ordinary indexes on your title columns. Whatever you do, don't combine the FULLTEXT and LIKE searches in a single query.
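The idea of doing something different for one- and two-letter terms can be sketched as a small routing function. This is a hypothetical Python helper with these assumptions:

- The minimum token size is 3, which is InnoDB's default innodb_ft_min_token_size.
- The driver uses %s placeholders, as PyMySQL and mysql-connector-python do.
- The table name is a trusted constant and never comes from user input.

Long terms go to a FULLTEXT query with no LIKE attached. Short terms fall back to a prefix pattern LIKE 'term%'. Unlike '%term%', a prefix pattern can use an ordinary index on title, but it only finds titles that start with the term.

```python
MIN_TOKEN_SIZE = 3  # assumption: InnoDB's default innodb_ft_min_token_size
COLUMNS = "id_no, offers, image, title, mrp, store"


def escape_like(text):
    """Escape LIKE wildcards so user input matches literally (backslash is MySQL's default escape)."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_search_query(term, table="db1.table1", min_token_size=MIN_TOKEN_SIZE):
    """Return (sql, params) for one table, choosing FULLTEXT or an indexable prefix LIKE.

    table must be a trusted constant; only term comes from the user.
    """
    term = term.strip()
    if len(term) >= min_token_size:
        # Long enough for the FULLTEXT index: no LIKE clause alongside it.
        sql = f"SELECT {COLUMNS} FROM {table} WHERE MATCH(title) AGAINST (%s) LIMIT 18"
        return sql, (term,)
    # Too short for FULLTEXT: a prefix pattern (no leading %) can use an
    # ordinary index on title, but only finds titles that start with term.
    sql = f"SELECT {COLUMNS} FROM {table} WHERE title LIKE %s LIMIT 18"
    return sql, (escape_like(term) + "%",)


print(build_search_query("laptop")[1])  # ('laptop',)
print(build_search_query("tv")[1])      # ('tv%',)
```

With either driver, the pair would be used as cursor.execute(sql, params).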
If this were my project, I'd consider using a special table with columns like these to hold all the short words from the title column of every row in each table:
id_pk INT AUTO_INCREMENT PRIMARY KEY
id_no INT
word VARCHAR(3)
Then you can use a query like this to look up short words
SELECT a.id_no,offers,image,title,mrp,store
FROM db1.table1 a
JOIN db1.table1_shortwords s ON a.id_no = s.id_no
WHERE s.word = '$searchkey'
To do this, you will have to preprocess the title columns of your other tables to populate the shortwords tables, and put an index on the word column. This will be fast, but it will require a special-purpose program to do the preprocessing.
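The preprocessing step for the short-words table could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. In MySQL, the id_pk column would be INT AUTO_INCREMENT. These parts are assumptions:

- the sample rows
- the index name idx_shortwords_word
- keeping words of up to 3 characters, to match word VARCHAR(3)

```python
import re
import sqlite3

SHORT_WORD_MAX = 3  # assumption: store words that fit in word VARCHAR(3)


def short_words(title, max_len=SHORT_WORD_MAX):
    """Distinct lower-cased words of title that are at most max_len characters long."""
    return sorted({w for w in re.findall(r"\w+", title.lower()) if len(w) <= max_len})


def build_shortwords(conn):
    """Create and populate table1_shortwords from table1.title, then index word."""
    conn.execute(
        "CREATE TABLE table1_shortwords ("
        " id_pk INTEGER PRIMARY KEY AUTOINCREMENT,"  # INT AUTO_INCREMENT in MySQL
        " id_no INTEGER NOT NULL,"
        " word VARCHAR(3) NOT NULL)"
    )
    for id_no, title in conn.execute("SELECT id_no, title FROM table1").fetchall():
        conn.executemany(
            "INSERT INTO table1_shortwords (id_no, word) VALUES (?, ?)",
            [(id_no, w) for w in short_words(title)],
        )
    conn.execute("CREATE INDEX idx_shortwords_word ON table1_shortwords (word)")


# Hypothetical sample data standing in for db1.table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id_no INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "HP 15 laptop"), (2, "LG TV 43 inch"), (3, "Nokia 3310")],
)
build_shortwords(conn)

hits = conn.execute(
    "SELECT a.id_no, a.title FROM table1 a"
    " JOIN table1_shortwords s ON a.id_no = s.id_no"
    " WHERE s.word = ?",
    ("tv",),
).fetchall()
print(hits)  # [(2, 'LG TV 43 inch')]
```

The lookup is an equality test on an indexed column, which is why it stays fast even though '%tv%' would not be.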
Having to search multiple tables with your UNION ALL operation is a performance problem. You will be able to improve performance dramatically by redesigning your schema so you need search only one table.
Having to search databases on different server machines is a performance problem. You may be able to rig up your python program to search them in parallel: that is, to use separate tasks to search each one, then aggregate the results. Each of those separate search tasks requires its own connection to the database, so this is not a cheap or simple solution.
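The parallel approach can be sketched with Python's concurrent.futures. This is a minimal illustration, not production code. Each searcher is a hypothetical callable that would open its own connection to one database and run the search query. Here the searchers are replaced by in-memory fakes, so the merging logic can be run on its own. Threads are a reasonable fit because the work is mostly waiting on the network, not on the CPU.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(searchers, term, limit=18):
    """Run every searcher(term) concurrently and merge the rows, up to limit.

    Each searcher is expected to open its own database connection, because a
    DB-API connection is usually not safe to share between threads.
    """
    if not searchers:
        return []
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = [pool.submit(searcher, term) for searcher in searchers]
        rows = []
        for future in futures:  # collect in a fixed per-database order
            rows.extend(future.result())
    return rows[:limit]


def make_fake_searcher(db_name, titles):
    """Hypothetical stand-in for 'connect to db_name and run the search query'."""
    def search(term):
        return [(db_name, t) for t in titles if term.lower() in t.lower()]
    return search


searchers = [
    make_fake_searcher("db1", ["Dell laptop", "Laptop bag"]),
    make_fake_searcher("db2", ["Nokia phone"]),
    make_fake_searcher("db3", ["HP laptop"]),
]
print(search_all(searchers, "laptop"))
# [('db1', 'Dell laptop'), ('db1', 'Laptop bag'), ('db3', 'HP laptop')]
```

With 40 databases, this opens 40 connections per search. Capping max_workers would limit that, at the cost of some latency.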
If this system faces the public web, you will have to redesign it sooner or later, because it will never perform well enough as it is now. (Sorry to be the bearer of bad news.) Many system designers prefer not to redesign systems after they become enormous, so if I were you, I would get the redesign done now.
If your focus is on searching, then bend the schema to facilitate searching rather than the other way around.
Collect all the strings to search for in a single table. Whereas a UNION of 40 tables does work, it will be ~40 times as slow as having the strings collected together.
Use FULLTEXT when the words are long enough, use some other technique when they are not. (This addresses your 3-char problem; see also the Answer discussing innodb_ft_min_token_size. You are using InnoDB, correct?)
Use + and boolean mode to say that a word is mandatory: MATCH(col) AGAINST("+term" IN BOOLEAN MODE)
Do not add on a LIKE clause unless there is a good reason.
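Following the advice to mark words as mandatory with + and to handle short words separately, a user's input could be turned into a boolean-mode search string as sketched below. It is a hypothetical Python helper. It assumes a minimum token size of 3. It replaces the characters MySQL treats as boolean-mode operators (+ - > < ( ) ~ * " @) with spaces, so that stray operators typed by the user act as plain separators. Words that are too short for FULLTEXT are returned separately for whatever other technique you use.

```python
import re

# Characters that act as operators in MySQL boolean-mode full-text searches.
BOOLEAN_OPERATORS = re.compile(r'[+\-><()~*"@]')


def boolean_mode_terms(user_input, min_token_size=3):
    """Return (against_string, short_words).

    against_string requires every word that is long enough for FULLTEXT
    (e.g. '+dell +laptop'); short_words lists the words that need another technique.
    min_token_size=3 is an assumption matching InnoDB's default.
    """
    words = BOOLEAN_OPERATORS.sub(" ", user_input).split()
    long_words = [w for w in words if len(w) >= min_token_size]
    short_words = [w for w in words if len(w) < min_token_size]
    return " ".join("+" + w for w in long_words), short_words


print(boolean_mode_terms("dell laptop 15 inch"))
# ('+dell +laptop +inch', ['15'])
```

Pass the first value as the bound parameter of MATCH(title) AGAINST (%s IN BOOLEAN MODE). If it is empty, skip the FULLTEXT query entirely.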

Better performance searching SQL table with 170,000 rows

I have an SQL table with 170,000 rows. Each row has a string column approximately 600 characters long.
I want to list all the rows that contain the searched keyword.
Using LIKE '% keyword %' takes about 1000 ms. My app is built in Laravel using Eloquent.
Do you have any ideas about what would be best for performance? I need options for case-sensitive/insensitive and accent-sensitive/insensitive searching, and for matching either an exact phrase or multiple words in any order. When I tried TNTSearch, the performance was excellent, but it offered fewer options.
I also tried creating an index and using MATCH ... AGAINST in my query, but that has its own limitations.
Define FULLTEXT indexes on the columns which you want to be able to search to greatly increase search speed. Only works on MyISAM and InnoDB tables though.
Google it or have a look here
LIKE queries that begin with a wildcard cannot take advantage of indexes, so performance will continue to degrade as your table grows.
I would recommend one of the following options for improving performance:
You can use Laravel Scout
Laravel Scout provides a simple, driver based solution for adding full-text search to your Eloquent models.
Out of the box Scout supports Algolia, but there are other drivers available as well, including TNTSearch
https://github.com/teamtnt/laravel-scout-tntsearch-driver
You can use a fulltext index to improve search performance.
Eloquent does not support fulltext search out of the box, but there are a few third party packages that add support.
Ex:
https://github.com/jarektkaczyk/eloquence-base
https://github.com/swisnl/laravel-fulltext
I'd recommend using SOLR in combination with Solarium: https://solarium.readthedocs.io/en/latest/

How to find 'similar' records in a MySQL table based on 'title' and 'description' columns?

I have a MySQL table storing some user generated content. For each piece of content, I have a title (VARCHAR 255) and a description (TEXT) column.
When a user is viewing a record, I want to find other records that are 'similar' to it, based on the title/description being similar.
What's the best way to go about doing this? I'm using PHP and MySQL.
My initial ideas are:
1) Either to strip out common words from the title and description to be left with 'unique' keywords, and then find other records which share those keywords.
E.g in the sentence: "Bob woke up at 5 am and went to school", the keywords would be: "Bob, woke, 5, went, school". Then if there's another record whose title talks about 'bob' and 'school', they would be considered 'similar'.
2) Or to use MySQL's full text search, though I don't know if this would be any good for something like this?
Which method would be better out of the two, or is there another method which is even better?
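Idea 1 above (strip common words and compare what is left) can be sketched in a few lines of Python. The stop-word list here is a tiny hypothetical one, chosen to reproduce the example sentence. MySQL's own full-text stop-word list is much longer.

```python
import re

# Tiny hypothetical stop-word list, just enough for the example sentence.
STOP_WORDS = {"a", "am", "an", "and", "at", "in", "of", "on", "the", "to", "up"}


def keywords(text):
    """Lower-cased words of text, minus stop words, in order, without repeats."""
    words = re.findall(r"\w+", text.lower())
    return list(dict.fromkeys(w for w in words if w not in STOP_WORDS))


def shared_keywords(text_a, text_b):
    """Keywords two records have in common; more shared keywords = more similar."""
    return set(keywords(text_a)) & set(keywords(text_b))


print(keywords("Bob woke up at 5 am and went to school"))
# ['bob', 'woke', '5', 'went', 'school']
print(sorted(shared_keywords("Bob woke up at 5 am and went to school",
                             "Bob skipped school today")))
# ['bob', 'school']
```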
I'll keep this short (it could be way too long)...
I would not select the keywords 'manually' or modify your original data.
MySQL supports full-text search with the MyISAM engine (and with InnoDB since MySQL 5.6). A full description of the options available when querying the DB is available here. Depending on the engine and querying method, the query can automatically get rid of common stopwords and of words that are too common in the data set (for example, words present in more than 50% of the rows). Query expansion is also available, and you should choose the query type based on your needs.
Consider also using a separate engine like Lucene. With Lucene you will probably get more functionality and better indexing/searching. You can automatically get rid of common words (they get a low score and do not influence the search) and use techniques such as stemming. There is a bit of a learning curve, but I'd definitely look into it.
EDIT:
The MySQL 'full-text natural language search' returns the most similar rows (and their relevance score) and is not a boolean matching search.
You would start by defining what similar means to you and how you want to score the similarity between two different documents.
Using that algorithm, you can process all your documents and build a table of similarity scores.
Depending on the complexity of your scoring algorithm and size of data set, this may not be something you would run realtime, but instead batch it through something like Hadoop.
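One possible definition of "similar", shown here only as an example, is the cosine similarity of two documents' word counts. The sketch below computes it for every pair of documents in one batch. It returns rows that could be stored in a similarity table, for instance a hypothetical table with columns doc_a, doc_b and score. Because it compares every pair, the work grows quadratically with the number of documents. That is why a large data set would be processed offline rather than in real time.

```python
import math
import re
from collections import Counter
from itertools import combinations


def word_counts(text):
    """Bag of lower-cased words for one document."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 when either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_table(docs, threshold=0.0):
    """Score every pair of documents; keep (id_a, id_b, score) rows above threshold."""
    counts = {doc_id: word_counts(text) for doc_id, text in docs.items()}
    rows = []
    for id_a, id_b in combinations(sorted(counts), 2):
        score = cosine(counts[id_a], counts[id_b])
        if score > threshold:
            rows.append((id_a, id_b, round(score, 3)))
    return rows


docs = {1: "Cheap red shoes", 2: "Red shoes sale", 3: "Garden tools"}
print(similarity_table(docs))  # [(1, 2, 0.667)]
```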
I have done something like this: I replace all of the spaces in the string with % and then use LIKE in the WHERE clause. Here is my code. It is written for MSSQL, but you can adapt it to MySQL with minor changes. For example, in MySQL you would build the pattern with CONCAT('%', REPLACE(text, ' ', '%'), '%') instead of +. Hope it helps.
CREATE FUNCTION [dbo].[fss_MakeTextSearchable] (@text NVARCHAR(MAX)) RETURNS NVARCHAR(MAX)
-- replaces spaces with wildcard characters to return more matches in a LIKE condition
-- for example:
-- @text = 'my file' will return '%my%file%'
-- 'my project files' LIKE '%my%file%' evaluates to true
AS
BEGIN
DECLARE @searchableText NVARCHAR(MAX)
SELECT @searchableText = '%' + REPLACE(@text, ' ', '%') + '%'
RETURN @searchableText
END
Then use the function like this:
DECLARE @searchString NVARCHAR(MAX)
SELECT @searchString = dbo.fss_MakeTextSearchable(@String)
Then in your query:
SELECT * FROM [Table] WHERE title LIKE @searchString
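To test the pattern logic outside the database, here is a Python sketch of the same transformation plus a tiny LIKE evaluator. The evaluator is a simplification written for illustration. It understands only the % and _ wildcards and no escape characters. It compares case-insensitively, which is an assumption: that matches MySQL's behaviour under the usual case-insensitive collations but not under binary ones. The example also shows the main limitation of the approach: the words must appear in the same order as in the search string.

```python
import re


def make_text_searchable(text):
    """Same transformation as fss_MakeTextSearchable: 'my file' -> '%my%file%'."""
    return "%" + text.replace(" ", "%") + "%"


def like_matches(value, pattern):
    """Toy LIKE evaluator: % = any run of characters, _ = exactly one character.

    Case-insensitive, as under MySQL's usual *_ci collations (an assumption);
    escape characters are not supported.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


pattern = make_text_searchable("my file")
print(pattern)                                        # %my%file%
print(like_matches("my project files", pattern))      # True
print(like_matches("files for my project", pattern))  # False: order matters
```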

What's the most efficient way to search multiple MySQL tables for a large quantity of terms?

Taking a PHP array of terms with variable length (i.e. it could be 50 terms, it could be 400), what's the most efficient way of searching my database for each of these terms?
The search I'm trying to do is quite straightforward. For each term, I'd like to do:
SELECT id, post_title FROM wp_posts WHERE post_title LIKE '%term%'
Obviously I can run a foreach in PHP and run multiple MySQL queries, but I'd imagine this to be hugely inefficient.
The code I've most recently tried involves multiple OR statements, but with ~100ish terms it appears to run very slowly.
I have no idea if something like this would work?
SELECT id, post_title FROM wp_posts WHERE post_title LIKE %term1%, %term2%, %term3%, %term4%, [...]
Can I use a more efficient SQL statement, or should I be looking at this in a different way?
Stock MySQL could handle this kind of search using
MATCH (post_title) AGAINST ('term1 term2 term3 term4')
To do this search, you will need to add a FULLTEXT index to the table:
ALTER TABLE wp_posts ADD FULLTEXT INDEX ft_key1(post_title);
This would be much faster than LIKE '%term%'. Note that full-text indexes were originally supported only on MyISAM tables; InnoDB has supported them since MySQL 5.6.
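With a FULLTEXT index in place, a variable-length list of terms (the PHP array in the question) can be collapsed into a single query. That replaces one query per term, or one OR per term. Below is a minimal Python sketch, assuming a DB-API driver with %s placeholders. It drops blank and duplicate terms and passes the rest as one bound parameter, so no term is pasted into the SQL text.

```python
def build_match_query(terms):
    """One natural-language MATCH ... AGAINST query for a whole list of terms.

    Returns (sql, params) for a DB-API driver using %s placeholders; the terms
    are sent as a single bound parameter, never pasted into the SQL text.
    """
    unique = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    if not unique:
        raise ValueError("no search terms given")
    sql = ("SELECT id, post_title FROM wp_posts "
           "WHERE MATCH (post_title) AGAINST (%s)")
    return sql, (" ".join(unique),)


sql, params = build_match_query(["term1", "term2", " term1 ", "", "term3"])
print(params)  # ('term1 term2 term3',)
```

In natural-language mode, MySQL returns the rows sorted by relevance when MATCH is used in the WHERE clause. Records matching more of the terms therefore tend to come first.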
However, as your data grows, the speed of MySQL's built-in search might become an issue. In that case, I would suggest using an external search engine such as Solr or Sphinx.
If you decide to switch to Sphinx, you may want to take a look at this guide: http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/
Create an additional column where you simply concatenate the values of all the columns you want to query during the search. This way you can use a single SELECT query with a LIKE clause to search through all of them at once.
Note that this is not what MySQL calls full-text search. A LIKE '%term%' on the combined column still scans every row. A FULLTEXT index, in contrast, can be defined over several columns at once.
With MySQL, the most efficient way is to set up full-text searching: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
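I don't know whether it's any more efficient than the other suggestions, but you could also use FIND_IN_SET. Be aware that it compares post_title with each comma-separated item exactly, without wildcards, so it only finds titles that are equal to one of the terms:
SELECT id, post_title FROM wp_posts WHERE FIND_IN_SET(post_title, 'term1,term2,term3,term4') <> 0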

Making a search facility

I want to make a search facility for my website. I'm using PHP.
What criteria should be used for searching?
For example, if someone searches for:
How to make soap
I could use many approaches, such as finding database entries that contain exactly the same search string,
or ranking the database entries by which search keywords they contain (i.e. an entry containing only "How" + "Soap" would rank lower than an entry containing "how soap make").
So what algorithm is generally used for searching?
Also what is meant by full text search?
This is kind of a big subject for a simple answer, but I think what you mean is how to run complex fulltext searches on MySQL. In other words, this is really a MySQL question, not a PHP one.
Basically, you need to:
1. Create a fulltext index on a text field in your database.
2. Run queries on that database field using MySQL's fulltext syntax.
The basic syntax for querying a fulltext indexed table in MySQL is:
SELECT * FROM table
WHERE MATCH (fulltextfield)
AGAINST ('my search phrase');
There's a lot more to it than that, but the MySQL documentation is the place to go: http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
If you want to do really advanced fulltext searches, a good recommendation is Sphinx, but that's probably way more advanced than you need.
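To give a feel for what "full-text search" means: a full-text index is essentially an inverted index that maps each word to the records containing it. InnoDB documents its FULLTEXT indexes as having an inverted-index design. The Python sketch below is a toy version, not MySQL's implementation. It ranks records by how many distinct search words they contain. For the query How to make soap, that already puts "How to make soap at home" above "Soap opera guide". Real engines also weight words, for example by how rare they are.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased words of text."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many distinct query words they contain (ties by id)."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {1: "How to make soap at home", 2: "Soap opera guide", 3: "How to make candles"}
index = build_inverted_index(docs)
print(search(index, "How to make soap"))  # [1, 3, 2]
```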
