For some customers I am working on a project which is using a MySQL database.
I need to implement a search functionality which should be able to search in the database the devices with all the features selected. I am using CodeIgniter. The problem is the structure of the table.
I've found out that the table contains 2 columns: ID_D (the id of the device) and ID_F (the id of the feature). Basically the table doesn't contain any primary key (that's why I cannot execute any join at all).
So, it's also possible that a device id can appear in 10 rows for each feature it has. When I execute the search, I have a list of the features ID and I should be able to read only the devices with all the features selected.
if (isset($feature_array)) {
foreach($feature_array as $key => $row) {
$this->db->where('f_id',$row['f_id']);
}
}
Naturally, something like that won't work. Any ideas?
I think this might solve your problem, in case the feature ids are unique and do not contain spaces. In case they do contain spaces you should select a separator, that is not in the range of the feature ids.
Basically the code uses the mySQL group_concat function to concatenate all feature ids and matches the created string by all searched features. This finds all devices, that support at least the given set of features.
CI itself does not support that fucntion, so it is added by a workarround in the select method.
Also this might be a bit load intensive if it is called on a large table. Maybe someone else got a faster solution?
$this->db->select(['ID_D', 'GROUP_CONCAT(ID_F SEPARATOR \' \']) as id_f_concatenated']);
foreach($feature_array as $row) {
$this->db->where("id_f_concatenated REGEXP '{$row['id_f']} | {$row['id_f']}|^{$row['id_f']}$'");
}
$this->db->group_by('ID_D');
$result = $this->db-get('tablename');
EDIT:
Changed the LIKE to an REGEXP to make sure, that the feature id 112 is not matched by the id 12, without making the expression to complicated.
I'm building search engine for jobs and I've used match against but I have problem when I search with 'php', 'hr', 'IT'.
there are no results because I can't change ft_min_word_len variable on shared hosting , but my client tells me that he know others companies use same queries with match against on shared hosting and it works !, so what is the solution now ?
Thanks
#Shadow pointed out that ft_min_word_len is MySQL server-wide setting. Some shared hosting companies will say yes when you ask for a change to that variable, and others will say no. The worst ones will say huh? Avoid those companies.
So your first task is to find out which companies say yes. Can you use one of those companies? If so, go do it.
If that doesn't work, you could consider moving your application to a virtual machine provider like Amazon Web Services, and installing your own MySQL server instance. It's a little more trouble, but you can control things.
If that doesn't work, and you must be able to search for short words, you'll need to populate your own short word table and search it with non-FULLTEXT SQL filtering.
To explain this in detail is beyond the scope of a SO answer. But here's the outline.
create a shortwords table containing two columns, item_id and word. item_id is a FK to another table containing your text items. word is a VARCHAR(6) column. The primary key of this table can be a composite of both columns, that is, (word, item_id).
populate that table by writing a php program to process each bit of text in your text items. It must explode the text strings into words, then INSERT each short word into your shortwords table with the item_id. value.
When you search for short words do this sort of query.
SELECT item.text
FROM item
JOIN shortwords ON item.item_id = shortwords.item_id
WHERE shortwords.word = 'php'
OR shortwords.word = 'it'
OR shortwords.word = 'hr'
Look, this is an outline of the solution to your short word problem. It is going to take quite a bit of programming, by you, to get this working. Please don't ask SO to do this programming for you.
In a database I work with, there are a few million rows of customers. To search this database, we use a match against Boolean expression. All was well and good, until we expanded into an Asian market, and customers are popping up with the name 'In'. Our search algorithm can't find this customer by name, and I'm assuming that it's because it's an InnoDB reserved word. I don't want to convert my query to a LIKE statement because that would reduce performance by a factor of five. Is there a way to find that name in a full text search?
The query in production is very long, but the portion that's not functioning as needed is:
SELECT
`customer`.`name`
FROM
`customer`
WHERE
MATCH(`customer`.`name`) AGAINST("+IN*+KYU*+YANG*" IN BOOLEAN MODE);
Oh, and the innodb_ft_min_token_size variable is set to 1 because our customers "need" to be able to search by middle initial.
It isn't a reserved word, but it is in the stopword list. You can override this with ft_stopword_file, to give your own list of stopwords. 2 possible problems with these are: (1) on altering it, you need to rebuild your fulltext index (2) it's a global variable: you can't alter it on a session / location / language-used basis, so if you really need all the words & are using a lot of different languages in one database, providing an empty one is almost the only way to go, which can hurt a bit for uses where you would like a stopword list to be used.
I'm currently developing a search functionality for a website. Users search for other users by name. I'm having some trouble getting good results for users that have accents on their name.
I have a FULLTEXT index on the name column and the table's collation is utf8_general_ci.
Currently if somebody registers for the site, and has a name with accents (for example: Alberto Andrés), the name is stored in the DB as shown in the following image:
So if I perform the following query SELECT * MATCH(name) AGAINST('alberto andres') I get lots of results with better match scores like 'Alberto', 'Andres', 'Andrés' and finally with a low match score the record the user is probably looking for 'Alberto Andrés'.
What could I do to take into account the way accented records are currently stored in the DB?
Thanks!
It looks to me like the surname of el Señor Andrés is actually stored correctly. The rendering you showed us is the way some non-UTF apps mangle UTF8 text.
You might try this modification of your query if you don't yet have a whole bunch of records in your table. Fulltext (non-boolean) mode works weirdly on small data sets.
SELECT *
FROM TABLE
WHERE MATCH(name) AGAINST('alberto andres' IN BOOLEAN MODE)
You also might try
SELECT *
FROM TABLE
WHERE MATCH(name) AGAINST(CONVERT('alberto andres' USING utf8))
just to make sure your match string is in the same character set as your MySQL columns.
I'm building a rather large "search" engine for our company intranet, it has 1miljon plus entries
it's running on a rather fast server and yet it takes up to 1 min for some search queries.
This is how the table looks
I tried create an index for it, but it seems as if i'm missing something, this is how the show index is showing
and this is the query itself, it is the ordering that slows the query mostly but even a query without the sorting is somewhat slow.
SELECT SQL_CALC_FOUND_ROWS *
FROM `businessunit`
INNER JOIN `businessunit-postaddress` ON `businessunit`.`Id` = `businessunit-postaddress`.`BusinessUnit`
WHERE `businessunit`.`Name` LIKE 'tanto%'
ORDER BY `businessunit`.`Premium` DESC ,
CASE WHEN `businessunit`.`Name` = 'tanto'
THEN 0
WHEN `businessunit`.`Name` LIKE 'tanto %'
THEN 1
WHEN `businessunit`.`Name` LIKE 'tanto%'
THEN 2
ELSE 3
END , `businessunit`.`Name`
LIMIT 0 , 30
any help is very much appreciated
Edit:
What's choking this query 99% is ordering by relevance with the wildcharacter %
When i Do an explain it says using where; using fsort
You should try sphinx search solution which is full-text search engine will give you very good performance along with lots of options to set relevancy.
Click here for more details.
Seems like the index doesn't cover Premium, yet that is the first ORDER BY argument.
Use EXPLAIN your query here to figure out the query plan and change your index to remove any table scans as explained in http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
MySQL is good for storing data but not great when it comes down to fast text based search.
Apart from Sphinx which has been already suggested I recommend two fantastic search engines:
Solr with http://pecl.php.net/package/solr - very popular search engine. Used on massive services like NetFlix.
Elastic Search - relatively new software but with very active community and lots of respect
Both solution are based on the same library Apache Lucene
If the "ORDER BY" is really the bottleneck, the straight-forward solution would be to remove the "ORDER BY" logic from your query, and re-implement the sorting directly in your application's code using C# sorting. Unfortunately, this means you'd also have to move your pagination into your application, since you'd need to obtain the complete result set before you can sort & paginate it. I'm just mentioning this because no-one else so far appears to have thought of it.
Frankly (like others have pointed out), the query you showed at the top should not need full-text indexing. A single suffix wildcard (e.g., LIKE 'ABC%') should be very effective as long as a BTREE (and not a HASH) index is available on the column in question.
And, personally, I have no aversion to even double-wildcard (e.g., LIKE '%ABC%"), which of course can never make use of indexes, as long as a full table scan is cheap. Probably 250,000 rows is the point where I'll start to seriously consider full-text indexing. 100,000 is definitely no problem.
I always make sure my SELECT's are dirty-reads, though (no transactionality applied to the select).
It's dirty once it gets to the user's eyeballs in any case!
Most of the search engine oriended sites are use FULL-TEXT-SEARCH.
It will be very faster compare to select and LIKE...
I have added one example and some links ...
I think it will be useful for you...
In this full text search have some conditions also...
STEP:1
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
);
STEP:2
INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','DBMS stands for DataBase ...'),
('How To Use MySQL Well','After you went through a ...'),
('Optimizing MySQL','In this tutorial we will show ...'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL vs. YourSQL','In the following database comparison ...'),
('MySQL Security','When configured properly, MySQL ...');
STEP:3
Natural Language Full-Text Searches:
SELECT * FROM articles
WHERE MATCH (title,body) AGAINST ('database');
Boolean Full-Text Searches
SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
Go through this links
viralpatel.net,devzone.zend.com,sqlmag.com,colorado.edu,en.wikipedia.org
It's so strange query :)
Let's try to understand what it does.
The results are less than 30 rows from the table "businessunit" with some conditions.
The first condition is a foreign key of the "businessunit-postaddress" table.
Please check if you have an index on the column businessunit-postaddress.BusinessUnit.
The second one is a filter for returning rows only with businessunit.Name begining with 'tanto'.
If I didn't make a mistake you have a very complex index 'Business' consists of 11 fields!
And field 'Name' is not the first field in this index.
So this index is useless when you run "like tanto%"'s query.
I have strong doubt about necessity of this index at all.
By the way it demands quite big resources for its maintaining and slow down edit operations with this table.
You have to make an index with the only field 'Name'.
After filtering the query is sorting results and do it in some strange way too.
At first it sorts by field businessunit.Premium - it's normal.
However next statements with CASE are useless too.
That's why.
The zero are assigned to Name = 'tanto' (exactly).
The next rows with the one are rows with space after 'tanto' - these will be after 'tanto' in any cases (except special symbols) cause space is lower than any letter.
The next rows with the two are rows with some letters after 'tanto' (include space!). These rows will be in this order too by definition.
And the three is "reserved" for "other" rows but you won't get "other" rows - remeber about [WHERE businessunit.Name LIKE 'tanto%'] condition.
So this part of ORDER BY is meaningless.
And at the end of ORDER BY there is businessunit.Name again...
My advice: you need rebuild the query from scratch keeping in mind what you want to get.
Anyway I guess you can use
SELECT SQL_CALC_FOUND_ROWS *
FROM `businessunit`
INNER JOIN `businessunit-postaddress` ON `businessunit`.`Id` = `businessunit-postaddress`.`BusinessUnit`
WHERE `businessunit`.`Name` LIKE 'tanto%'
ORDER BY `businessunit`.`Premium` DESC,
`businessunit`.`Name`
LIMIT 0 , 30
Don't forget about an index on field businessunit-postaddress.BusinessUnit!
And I have strong assumption about field Premium.
I guess it is designed for storing binary data (yes/no).
So an ordinary (BTREE) index doesn't match.
You have to use bitmap index.
P.S. I'm not sure that you really need to use SQL_CALC_FOUND_ROWS
MySQL: Pagination - SQL_CALC_FOUND_ROWS vs COUNT()-Query
Its either full-text(http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html) or the pattern matching (http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html) from php and mysql side.
From experience and theory:
Advantages of full-text -
1) Results are very relevant and de-limit characters like spacing in the search query does not hinder the search.
Disadvantages of full-text -
1) There are stopwords that are used as restrictions by webhosters to prevent excess load of data.(E.g. search results containing the word 'one' or 'moz' are not displayed. And this can be avoided if you're running your own server by keeping no stopwords.
2) If I type 'ree' it only displays words containing exactly 'ree' not 'three' or 'reed'.
Advantages of pattern matching -
1) It does not have any stopwords as in full-text and if you search for 'ree', it displays any word containing 'ree' like 'reed' or 'three' unlike fulltext where only the exact word is retreived.
Disadvantages of pattern matching-
1) If delimiters like spaces are used in your search words and if these spaces are not there in the results, because each word is separate from any delimiters, then it returns no result.
If the argument of LIKE doesn't begin with a wildchard character, like in your example, LIKE operator should be able to take advantage of indexes.
In this case, LIKE operator should perform better than LOCATE or LEFT, so I suspect that changing the condition like this could make things worse, but I still think it's worth trying (who knows?):
WHERE LOCATE('tanto', `businessunit`.`Name`)=1
or:
WHERE LEFT(`businessunit`.`Name`,5)='tanto'
I would also change your order by clause:
ORDER BY
`businessunit`.`Premium` DESC ,
CASE WHEN `businessunit`.`Name` LIKE 'tanto %' THEN 1
WHEN `businessunit`.`Name` = 'tanto' THEN 0
ELSE 2 END,
`businessunit`.`Name`
Name has to be LIKE 'tanto%' already, so you can skip a condition (CASE will never return value 3). Of course, make sure that Premium field is indexed.
Hope this helps.
I think you need to collect the keys only, sort them, then join last
SELECT A.*,B.* FROM
(
SELECT * FROM (
SELECT id BusinessUnit,Premium
CASE
WHEN Name = 'tanto' THEN 0
WHEN Name LIKE 'tanto %' THEN 1
WHEN Name LIKE 'tanto%' THEN 2
ELSE 3
END SortOrder
FROM businessunit Name LIKE 'tanto%'
) AA ORDER BY Premium,SortOrder LIMIT 0,30
) A LEFT JOIN `businessunit-postaddress` B USING (BusinessUnit);
This will still generate a filesort.
You may want to consider preloading the needed keys in a separate table you can index.
CREATE TABLE BusinessKeys
(
id int not null auto_increment,
BusinessUnit int not null,
Premium int not null,
SortOrder int not null,
PRIMARY KEY (id),
KEY OrderIndex (Premuim,SortOrder,BusinessUnit)
);
Populate all keys that match
INSERT INTO BusinessKeys (BusinessUnit,Premuim,SortOrder)
SELECT id,Premium
CASE
WHEN Name = 'tanto' THEN 0
WHEN Name LIKE 'tanto %' THEN 1
WHEN Name LIKE 'tanto%' THEN 2
ELSE 3
END
FROM businessunit Name LIKE 'tanto%';
Then, to paginate, run LIMIT on the BusinessKeys only
SELECT A.*,B.*
FROM
(
SELECT FROM BusinessKeys
ORDER BY Premium,SortOrder
LIMIT 0,30
) BK
LEFT JOIN businessunit A ON BK.BusinessUnit = A.id
LEFT JOIN `businessunit-postaddress` B ON A.BusinessUnit = B.BusinessUnit
;
CAVEAT : I use LEFT JOIN instead of INNER JOIN because LEFT JOIN preserves the order of the keys from the left side of the query.
I've read the answer to use Sphinx to optimize the search. But regarding my experience I would advise a different solution. We used Sphinx for some years and had a few nasty problems with segmentation faults and corrupted indice. Perhaps Sphinx isn't as buggy as a few years before, but for a year now we are very happy with a different solution:
http://www.elasticsearch.org/
The great benefits:
Scalability - you can simply add another server with nearly zero configuration. If you know mysql replication, you'll love this feature
Speed - Even under heavy load you get good results in much less than a second
Easy to learn - Only by knowing HTTP and JSON you can use it. If you are a Web-Developer, you feel like home
Easy to install - it is useable without touching the configuration. You just need simple Java (no Tomcat or whatever) and a Firewall to block direct access from the public
Good Javascript integration - even a phpMyAdmin-like Tool is a simple HTML-Page using Javascript: https://github.com/mobz/elasticsearch-head
Good PHP Integration with https://github.com/ruflin/Elastica
Good community support
Good documentation (it is not eye friendly, but it covers nearly every function!)
If you need an additional storage solution, you can easily combine the search engine with http://couchdb.apache.org/