Is there a common way of dealing with tags in databases?
I'm thinking of using tinytext with pipes.
I think adding another table and using IDs might make it more complicated for little gain.
What's your preferred way of doing this?
And what is the right way to query a table for results matching a single tag or multiple tags?

Implement a simple N:N relation between items and tags: a junction table with two foreign keys.
-Fkey itemId
-Fkey tagId
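A minimal sketch of the N:N layout described above, written in Python with the standard sqlite3 module so it runs without a MySQL server. The table names (items, tags, item_tags), the sample rows and the helper items_with_all_tags are assumptions made for illustration; only the two foreign keys come from the answer. It also shows one common way to answer the "single or multiple tags" part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: one row per (item, tag) pair.
CREATE TABLE item_tags (
    item_id INTEGER NOT NULL REFERENCES items(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (item_id, tag_id)
);
-- Second index so lookups that start from a tag are also indexed.
CREATE INDEX item_tags_by_tag ON item_tags (tag_id, item_id);
""")
conn.executemany("INSERT INTO items (id, title) VALUES (?, ?)",
                 [(1, "Intro to PHP"), (2, "MySQL indexing"), (3, "PHP with MySQL")])
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 [(1, "php"), (2, "mysql")])
conn.executemany("INSERT INTO item_tags (item_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 1), (3, 2)])


def items_with_all_tags(conn, tag_names):
    """Return ids of items that carry every tag in tag_names."""
    names = sorted(set(tag_names))
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        f"""
        SELECT it.item_id
        FROM item_tags AS it
        JOIN tags AS t ON t.id = it.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY it.item_id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY it.item_id
        """,
        (*names, len(names)),
    ).fetchall()
    return [row[0] for row in rows]


print(items_with_all_tags(conn, ["php"]))           # items tagged php
print(items_with_all_tags(conn, ["php", "mysql"]))  # items tagged with both
```

The HAVING clause is what turns "any of these tags" into "all of these tags"; dropping it gives the OR variant.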

I'll spread a little heresy here.
Big boys, including this site, use denormalised schemas for tags for scalability reasons: comma-, pipe- or space-delimited tags are stored in a text field on each row, with a separate table holding the tags and their counts. Upon inserting or updating an item, just check which tags were added or dropped and update the counts accordingly (explode the old and new tag strings into arrays and run array_diff() on them).
Now you have a cheap way to display a tag cloud with counts by a simple SELECT * FROM tags, no fancy queries. To find items tagged with a given name just do LIKE '%TAG%'; this will work well for a small-traffic website (say less than 100k page views per day) and small data sets (again, say less than 100k records). Above that you could use Fulltext Search to speed things up, and ultimately a proper search engine like Lucene or Sphinx.
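The count bookkeeping described above can be sketched roughly as follows. This is Python with sqlite3 rather than the answer's PHP, and set difference plays the role of array_diff(). The tags table layout (id, tag_name, tag_count) is the one given later in this answer. The function names are hypothetical, and deleting tags whose count drops to zero is an assumption, not something the answer specifies.

```python
import sqlite3


def split_tags(tag_string):
    """Split a space-delimited tag string into a set, ignoring empty pieces."""
    return set((tag_string or "").split())


def update_tag_counts(conn, new_tags_str, old_tags_str):
    """Adjust per-tag counters after an item's tag string changes."""
    new_tags, old_tags = split_tags(new_tags_str), split_tags(old_tags_str)
    for tag in sorted(new_tags - old_tags):      # tags that were added
        updated = conn.execute(
            "UPDATE tags SET tag_count = tag_count + 1 WHERE tag_name = ?", (tag,)
        ).rowcount
        if updated == 0:                         # first use of this tag
            conn.execute(
                "INSERT INTO tags (tag_name, tag_count) VALUES (?, 1)", (tag,)
            )
    for tag in sorted(old_tags - new_tags):      # tags that were dropped
        conn.execute(
            "UPDATE tags SET tag_count = tag_count - 1 WHERE tag_name = ?", (tag,)
        )
    # Assumption: unused tags are removed rather than kept with a zero count.
    conn.execute("DELETE FROM tags WHERE tag_count <= 0")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_name TEXT UNIQUE, tag_count INTEGER)"
)
update_tag_counts(conn, "php mysql", None)           # insert a new item
update_tag_counts(conn, "php sqlite", "php mysql")   # update that item
tag_cloud = dict(conn.execute("SELECT tag_name, tag_count FROM tags"))
print(tag_cloud)
```

Unchanged tags are never touched, so an edit that only reorders the tag string costs no writes.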
Finding related tags, like here on SO, is easy too (Kohana specific code, LIKE based, MySQL specific):
$tags = array('foo', 'bar');

private function get_related_tags( $tags )
{
    ## Get db entries with specific tags and build array with counts
    ## is it cached already? ------------------------------------------------
    $this->cache = Cache::instance();
    $tags = array_filter( array_flip(array_flip($tags)) ); // drop duplicates and empty values
    $cache_name = implode('', $tags);
    $cache = $this->cache->get( $cache_name );
    if( $cache )
        return $cache;

    ## not cached, fire up ---------------------------------------------------
    $db = Database::instance();

    ## count tagged items ----------------------------------------------------
    // build like string
    $like = array();
    foreach( $tags as $tag )
        $like[] = "tags LIKE '%$tag%'";
    $like = implode(' AND ', $like);
    // get counts
    $count = $db->query("SELECT count(id) AS count FROM `articles` WHERE $like")->current()->count;

    ## check what tags are related ------------------------------------------
    $offset = 0;
    $step = 300;
    $related_tags = array();
    while( $offset < $count )
    {
        $assets = $db->query("SELECT tags FROM `articles` WHERE $like ORDER BY id ASC LIMIT $step OFFSET $offset");
        foreach( $assets as $asset )
        {
            // tags
            $input = explode( ' ', trim($asset->tags) );
            foreach( $input as $k => $v )
            {
                if( $v == '' ){
                    // do nothing, shouldn't be here anyway
                }
                elseif( array_key_exists($v, $related_tags) ){
                    $related_tags[$v]++;
                }
                else{
                    $related_tags[$v] = 1;
                }
            }
        }
        $offset += $step;
    }
    // remove already displayed from list
    foreach( $tags as $tag )
        unset( $related_tags[$tag] );
    // set cache
    $this->cache->set( $cache_name, array($related_tags, $count), 'related_tags_counts', 0);
    return array($related_tags, $count);
}
This is not really cheap, so I keep the counts cached for a given set of tags until I make changes to tags in the articles table.
This setup is not perfect by any means, but it certainly has some advantages. The schema is simple, getting a tag cloud is straightforward, and articles come back along with their tags in one simple query (i.e. without subqueries). The main disadvantage I see is the inability to rename or drop a tag system-wide without amending every single row where it occurs, but hey, how often do you do that anyway?
Currently I'm using this setup for a few projects of mine and it works like a dream, but I must admit these are not high-traffic websites (hence I get away with LIKE). Next year I will be able to test it with a busy site, but I'm pretty sure it will do. Normalization nazis will vote me down perhaps, but I just love the simplicity of it and I'm happy to trade off CPU cycles for that.
Actually I was going to post this tag system a while ago on SO and ask experts what they think of it so feel free to leave comments.
Traditionally, sorry for my English, I believe it's funny =)
Since you've provided your requirements in the comments, I think this setup is perfect for you. I've posted the full Tag Model in a pastie here, with methods to handle counts; it is Kohana specific, but if you know CodeIgniter you'll feel at home. Just use it this way:
table TAGS: id, tag_name, tag_count
// insert new item/article
$tag_model->update_tags( $tags_str, null );
// update existing item
$tag_model->update_tags( $new_tags_str, $old_tags_str ); // $old_tags as stored in db
// delete item, you'll have to get item from db before deletion
$tag_model->update_tags( null, $old_tags_str );
I've amended the code as Markdown had mangled it; also, the queries are MySQL flavour, not SQLite.

Yes, don't shy away from normalization and having each tag in its own record. This will ultimately be the most flexible approach and, with the correct indexing, the fastest.


Optimization of search function MySQL or PHP wise

After running a few tests I realized that my search method does not perform very well if some words of the query are short (2~3 letters).
The way I built the search is by running a MySQL query for every word in the string the visitor entered and then filtering the results from each word to see whether every word returned that result. Once a result has been returned for all words, it's a match and I'll show that result to the visitor.
But I was wondering if that's an effective way to do it. Is there a better way that keeps the same functionality?
Currently my code spends about 0.7 s making MySQL queries, and the rest of the work takes under 0.1 s.
Normally I would not care much about my search taking 0.7 s, but I'd like to create a "LiveSearch", and it is critical that it loads faster than that.
Here is my code
public static function Search($Query){
$Querys = explode(' ',$Query);
foreach($Querys as $Query)
$MatchingRow = \Database\dbCon::$dbCon -> prepare("
`product_products` as pp
' ',
(SELECT `Name` FROM `product_brands` WHERE `Id` = pp.BrandId),
' ',
' ',
IF(`isFreeShipping` = 1 OR `isFreeShippingOnOrder` = 1, ' FreeShipping', '')
LIKE :Title;
$MatchingRow -> bindValue(':Title','%'.$Query.'%');
$MatchingRow -> execute();
foreach($MatchingRow -> fetchAll(\PDO::FETCH_ASSOC) as $QueryInfo)
$Matchings[$Query][$QueryInfo['Id']] = $QueryInfo['Id'];
}catch(\PDOException $e){
echo 'Error MySQL: '.$e->getMessage();
$TmpMatch = $Matchings;
$Matches = array_shift(array_values($TmpMatch));
foreach($TmpMatch as $Query)
$Matches = array_intersect($Matches,$Query);
foreach($Matches as $Match){
$Products[] = new Product($Match);
return $Products;
As others have already suggested, fulltext search is your friend.
The logic should go more or less like this.
Add a text column to your "product_products" table called, say, "FtSearch"
Write a small script that will run only once, in which you write, for each existing product, the text that has to be searched for into the "FtSearch" column. This is, of course, the text you compose in your query (id + brand name + title and so forth, including the FreeShipping part). You can probably do this with a single SQL statement, but I have no mysql at hand and can't provide you the code for that... you might be able to find it out by yourself.
Create a fulltext index on the "FtSearch" column (doing this after having populated the FtSearch column saves you a little execution time).
Now you have to add the code necessary to ensure that every time any of the fields involved in the search string is inserted/updated, you insert/update the search string as well. Pay attention here, since this includes not only the "Title", "ModelNumber" and "FreeShipping" of the "product_product", but as well the "Name" of the "product_brand". This means that if the name of a product_brand is updated, you will have to regenerate all search strings of all products having that brand. This might be a little slow, depending on how many products are bound to that brand. However I assume it does not happen too often that a brand changes its name, and if it does, it certainly happens in some sort of administration interface where such things are usually acceptable.
At this point you can query the table using the MATCH construct, which is way, way faster than you could ever get with your current approach.
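The "single SQL statement" for populating the search column, and the refresh needed when a brand is renamed, might look roughly like this. It is a sketch in Python with sqlite3 so that it runs self-contained. The column names Title and ModelNumber are taken from the answer's text, while the sample rows and the helper rebuild_search_text are assumptions. SQLite concatenates with ||, where MySQL would use CONCAT_WS or CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_brands (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE product_products (
    Id INTEGER PRIMARY KEY, BrandId INTEGER, Title TEXT, ModelNumber TEXT,
    isFreeShipping INTEGER, isFreeShippingOnOrder INTEGER, FtSearch TEXT
);
INSERT INTO product_brands VALUES (1, 'Acme');
INSERT INTO product_products VALUES (10, 1, 'Cordless Drill', 'CD-200', 1, 0, NULL);
INSERT INTO product_products VALUES (11, 1, 'Hammer', 'H-1', 0, 0, NULL);
""")

# Builds the same text the original query assembled on every search.
REBUILD_SQL = """
UPDATE product_products
SET FtSearch =
    Id || ' ' ||
    COALESCE((SELECT Name FROM product_brands
              WHERE product_brands.Id = product_products.BrandId), '') || ' ' ||
    COALESCE(Title, '') || ' ' || COALESCE(ModelNumber, '') ||
    CASE WHEN isFreeShipping = 1 OR isFreeShippingOnOrder = 1
         THEN ' FreeShipping' ELSE '' END
"""


def rebuild_search_text(conn, brand_id=None):
    """Regenerate FtSearch for all products, or only for one brand's products."""
    if brand_id is None:
        conn.execute(REBUILD_SQL)
    else:
        conn.execute(REBUILD_SQL + " WHERE BrandId = ?", (brand_id,))


rebuild_search_text(conn)                      # one-off initial population
conn.execute("UPDATE product_brands SET Name = 'Acme Tools' WHERE Id = 1")
rebuild_search_text(conn, brand_id=1)          # brand renamed: refresh its products
rows = conn.execute(
    "SELECT Id, FtSearch FROM product_products ORDER BY Id"
).fetchall()
print(rows)
```

The fulltext index itself and the MATCH ... AGAINST query are MySQL features and are not reproduced here; this only covers keeping the searched column in sync.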

Is it worth saving keyword <-> link relations in a "hashtable"-like structure in MySQL?

I'm working on a PHP + MySQL application which will crawl an HDD/shared drive and index all files and directories into a database, to provide "fulltext" search over them. So far I'm doing well, but I'm stuck on the question of whether I chose a good way to store the data in the database.
In the picture below, you can see part of my database schema. The idea is that I save a domain (which represents the part of the disk I want to index); then there are links (which represent files and folders, with content, file path, etc.); and then I have a table to store unique keywords, which I find in file/folder names or content.
And finally, I have 16 linkkeyword tables to store relations between links and keywords. I have 16 of them because I thought it might be good to make something like a hashtable, since I'm expecting a high number of link <-> keyword relations (so far, for 15k links and 400k keywords, I have about 2.5 million linkkeyword records). So, to avoid storing so much data in one table (and later searching over it), I thought this hashtable could be faster. It works like this: when I want to search for a word, I compute its md5 and look at the first character of the md5, which tells me which linkkeyword table to use. So there are only about 150~200k records in each linkkeyword table (against 2.5 million).
So I'm curious whether this approach is of any use, or whether it would be better to store all linkkeyword information in a single table and let MySQL take care of it (and up to how many link <-> keyword relations would that work?).
So far this was a great solution for me, but I crashed hard when I tried to implement regular-expression search, so that the user can use e.g. "tem*", which can result in temp, temporary, temple etc. Normally, when searching for a word, I compute its md5 hash and then know which linkkeyword table to look in. But for a regular expression I need to get all keywords from the keyword table that match the regular expression and then process them one by one.
I'm also attaching part of the code for the normal keyword search:
private function searchKeywords($selectedDomains) {
$searchValues = $this->searchValue;
$this->resultData = array();
foreach (explode(" ", $searchValues) as $keywordName) {
$keywordName = strtolower($keywordName);
$keywordMd5 = md5($keywordName);
$selection = $this->database->table('link');
$results = $selection->where('domain.id', $selectedDomains)->where('domain.searchable = ?', '1')->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName)
->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword');
foreach ($results as $result) {
$keyExists = array_key_exists($result->linkId, $this->resultData);
if ($keyExists) {
} else {
$domain = $result->ref('domain');
$linkClass = new search\linkClass($result, $domain);
$this->resultData[$result->linkId] = $linkClass;
And the regular expression search function:
private function searchRegexp($selectedDomains) {
//get stored search value
$searchValues = $this->searchValue;
//replace astering and exclamation mark (counted as characters for regular expression) and replace them by their mysql equivalent
$searchValues = str_replace("*", "%", $searchValues);
$searchValues = str_replace("!", "_", $searchValues);
// empty result array to prevent previous results to interfere
$this->resultData = array();
//searched phrase can be multiple keywords, so split it by space and get results for each keyword
foreach (explode(" ", $searchValues) as $keywordName) {
//set default link result weight to -1 (default value)
$weight = -1;
//select all keywords, which match searched keyword (or its regular expression)
$keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName);
foreach ($keywords as $keyword) {
//count keyword md5 sum to determine which table should be use to match it links
$md5 = md5($keyword->keyword);
//get all link ids from linkkeyword relation table
$keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1');
//loop found links
foreach ($keywordJoinLink as $link) {
//store link weight, for later result sort
$weight = $link->weight;
//get link ID
$linkId = $link->linkId;
//check if link already exists in results, to prevent duplicity
$keyExists = array_key_exists($linkId, $this->resultData);
//if link already exists in result set, just update its weight and insert matching keyword for later keyword tag specification
if ($keyExists) {
//if link isnt in result yet, insert it
} else {
//get link reference
$linkData = $link->ref('link', 'linkId');
//get information about domain, to which link belongs (location, flagPath,...)
$domainData = $linkData->ref('domain', 'domainId');
//if is domain searchable and was selected before search, add link to result set. Otherwise ignore it
if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) {
//create new link instance
$linkClass = new search\linkClass($linkData, $domainData);
//insert matching keyword to links keyword set
//set links weight
//insert link into result set
$this->resultData[$linkId] = $linkClass;
Your question is mostly one of opinion, so you may want to include the criteria that allow us to answer "worth it" more objectively.
It appears you've re-invented the concept of database sharding (though without distributing your data across multiple servers).
I assume you are trying to optimize search time; if that's the case, I'd suggest that 2.5 million records on modern hardware is not a particularly big performance challenge, as long as your queries can use an index. If you can't use an index (e.g. because you're doing a regular expression search), sharding will probably not help at all.
My general recommendation with database performance tuning is to start with the simplest possible relational solution, keep tuning that until it breaks your performance goals, then add more hardware, and only once you've done that should you go for "exotic" solutions like sharding.
This doesn't mean using prayer as a strategy. For performance-critical applications, I typically build a test database where I can experiment with solutions. In your case, I'd build a database with your schema without the "sharding" tables, and then populate it with test data (either write your own population routines, or use a tool like DBMonster). Typically, I'd go for at least double the size I expect in production. You can then run and tune queries to prove, one way or another, whether your schema is good enough. It sounds like a lot of work, but it's much less work than your sharding solution is likely to bring along.
There are (as @danFromGermany comments) solutions that are optimized for text search, and you could use MySQL's fulltext search features rather than regular expressions.
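As a rough illustration of the "simplest possible relational solution" with synthetic test data, here is a sketch with a single linkkeyword table instead of 16, written in Python with sqlite3. The column name keywordId, the sample words, the generated relations and the helper find_links are assumptions for illustration; linkId and weight follow the question's code. Exact and wildcard searches go through the same query, so no md5 routing is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
-- One relation table; the composite primary key doubles as the lookup index.
CREATE TABLE linkkeyword (
    keywordId INTEGER NOT NULL,
    linkId    INTEGER NOT NULL,
    weight    INTEGER NOT NULL,
    PRIMARY KEY (keywordId, linkId)
);
""")

words = ["temp", "temple", "temporary", "table", "index"]
conn.executemany("INSERT INTO keyword (id, keyword) VALUES (?, ?)",
                 list(enumerate(words, start=1)))
# Synthetic relations: keyword k is attached to every link whose id divides by k.
relations = [(kid, link, 1 + (kid + link) % 5)
             for kid in range(1, len(words) + 1)
             for link in range(1, 2001) if link % kid == 0]
conn.executemany("INSERT INTO linkkeyword VALUES (?, ?, ?)", relations)


def find_links(conn, pattern):
    """Return (linkId, total weight) for keywords matching a '*' wildcard pattern."""
    like = pattern.lower().replace("*", "%")
    return conn.execute(
        """
        SELECT lk.linkId, SUM(lk.weight) AS total_weight
        FROM keyword AS k
        JOIN linkkeyword AS lk ON lk.keywordId = k.id
        WHERE k.keyword LIKE ?
        GROUP BY lk.linkId
        ORDER BY total_weight DESC, lk.linkId
        """,
        (like,),
    ).fetchall()


exact = find_links(conn, "table")    # plain keyword
prefix = find_links(conn, "tem*")    # wildcard: temp, temple, temporary
print(len(exact), len(prefix))
```

Scaling the ranges up to production-like sizes and timing find_links is one way to run the experiment suggested above before committing to a sharded layout.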

MySQL PHP check a LIKE statement on more than one column at once

I have a query I came up with for a search bar. I know I can use OR to split the statement to check different columns:
$query = mysql_query('SELECT * FROM PRODUCTS WHERE cats LIKE "%'.$s.'%" OR sub_cats LIKE "%'.$s.'%"');
But if I want to check more than two or three columns, is there a way of doing something like this to speed things up a bit:
$query_string = explode('&&=', $_GET['q']);
$db_str = 'SELECT * FROM products WHERE name, cats, sub_cats, desc, brand ';
for ($x = 0; $x < count($query_string); $x++) {
    $enc = mysql_real_escape_string($query_string[$x]);
    if ($x == (count($query_string) - 1)) {
        $db_str .= 'LIKE "%'.$enc.'%"';
    } else {
        $db_str .= 'LIKE "%'.$enc.'%" ';
    }
}
$query = mysql_query($db_str);
I normally used POST for search bars but I fancy giving GET a go it looks more user friendly to me.
I don't know if this would speed things up (you would have to test on your data). However, the following is an alternative with only one comparison:
WHERE concat_ws('|', name, cats, sub_cats, `desc`, brand) LIKE '%$enc%'
Because your like has a wildcard at the beginning, the query does not use regular indexes. So this version should not slow things down much, if at all.
The answer to your performance problem, though, may be a full text index.
Your code will not work like that, even ignoring the deprecated API. You would need to do something like this:
$searchFields = ['name', 'cats', 'sub_cats', 'desc', 'brand'];
// $x is the index of the term being searched, as in the question's loop
$searchTerm = mysql_real_escape_string($query_string[$x]);
$filters = implode(' OR ', array_map(function($field) use ($searchTerm) {
    // backticks are required because `desc` is a reserved word in MySQL
    return "`$field` LIKE '%{$searchTerm}%'";
}, $searchFields));
$query = mysql_query('SELECT * FROM products WHERE '.$filters);
It's far from optimal though, and will quickly become slow as database and userbase grow. It is however exactly how for example phpMyAdmin implements its database-wide searches. If your database is larger than a few records, or is likely to attract more than a handful of searches simultaneously, you should implement a smarter solution like full text search or implement an external search engine like Apache SOLR, Amazon CloudSearch and the like - relational databases like MySQL on their own just aren't good at messy searching nor were they ever meant to be.
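For comparison, the same multi-column LIKE filter can be built with bound parameters instead of string interpolation. This is a sketch in Python with sqlite3, not the answer's code. The helper build_search_query, the sample rows, the '!' escape character and the rule that every term must match at least one column are all assumptions made for illustration.

```python
import sqlite3

SEARCH_FIELDS = ("name", "cats", "sub_cats", "desc", "brand")


def build_search_query(terms, fields=SEARCH_FIELDS):
    """Build SQL and parameters: every term must appear in at least one field."""
    clauses, params = [], []
    for term in terms:
        # Escape LIKE wildcards so user input is matched literally.
        escaped = term.replace("!", "!!").replace("%", "!%").replace("_", "!_")
        # Backticks quote identifiers in MySQL (and SQLite); desc is reserved.
        ors = " OR ".join(f"`{field}` LIKE ? ESCAPE '!'" for field in fields)
        clauses.append(f"({ors})")
        params.extend([f"%{escaped}%"] * len(fields))
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT id FROM products WHERE {where} ORDER BY id", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, cats TEXT, "
             "sub_cats TEXT, `desc` TEXT, brand TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Trail shoe", "shoes", "running", "light and grippy", "Acme"),
    (2, "Rain jacket", "clothing", "outdoor", "100% waterproof", "Acme"),
    (3, "Road shoe", "shoes", "running", "cushioned", "Bolt"),
])
sql, params = build_search_query(["shoe", "acme"])
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)
```

Only the fixed column list is interpolated into the SQL text; the search terms always travel as parameters. The leading-wildcard performance caveat from the answer applies here unchanged.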
As for this remark:
I normally used POST for search bars but I fancy giving GET a go it
looks more user friendly to me.
This isn't really a good basis for choosing between the two. As a rule:
GET is for repeatable requests, that the user can F5 on as much as they want.
POST is for non-repeatable requests, that are usually considered harmful if (overly) repeated.
If your name is Google and you have a gazillion servers all over the world, use GET. If you already know your search has bad performance, stick with POST and don't tempt the lions.

mysql | PHP | Join within own table

I don't know if I am doing this right or wrong, so please don't judge me...
What I am trying to do is this: if a record belongs to a parent, it will have the parent's ID associated with it. Let me show you my table schema below.
I have two columns:
ItemCategoryID &
ItemParentCategoryID
Suppose the record with ItemCategoryID = 4 belongs to ItemCategoryID = 2; then the column ItemParentCategoryID on ID 4 will hold that parent's ItemCategoryID.
I mean a loop within its own table...
But the problem is how to run the SELECT query :P
I mean, show all the parents and the children belonging to each parent.
This is often a lazy design choice. Ideally you want a table for these relations and/or a set number of depths. If a parent_id's parent can have its own parent_id, this means a potentially infinite depth.
MySQL isn't a big fan of infinite nesting depths, but PHP doesn't mind. Either run multiple queries in a loop, as in Nil'z's answer, or consider fetching all rows and sorting them out in arrays in PHP. The latter solution is nice if you pretty much always fetch all rows, which makes filtering in MySQL unnecessary.
Lastly, consider whether there is a better approach to this in your database structure. Don't be afraid to use more than one table for this.
This can become a serious performance thief in the future. An uncontrolled number of MySQL queries each time the page loads can easily get out of hand.
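The "fetch all rows once and sort them out in arrays" option can be sketched like this, in Python rather than PHP. The helper build_category_tree and the sample rows are hypothetical, and treating a NULL or unknown parent as a root is an assumption.

```python
def build_category_tree(rows):
    """Turn flat (id, parent_id, name) rows into nested dicts of any depth."""
    nodes = {cat_id: {"id": cat_id, "name": name, "subs": []}
             for cat_id, _, name in rows}
    roots = []
    for cat_id, parent_id, _ in rows:
        parent = nodes.get(parent_id)
        if parent is None:
            # No parent, or a parent id that is not in the result set.
            roots.append(nodes[cat_id])
        else:
            parent["subs"].append(nodes[cat_id])
    return roots


# Rows as they might come back from one SELECT over the whole table.
rows = [
    (1, None, "Electronics"),
    (2, 1, "Phones"),
    (3, 1, "Laptops"),
    (4, 2, "Accessories"),
]
tree = build_category_tree(rows)
print(tree)
```

This costs one query and two passes over the rows regardless of nesting depth, which is why it suits the case where all categories are needed anyway.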
Try this:
function all_categories(){
    $data = array();
    $first = $this->db->select('itemParentCategoryId')->group_by('itemParentCategoryId')->get('table')->result_array();
    if( isset( $first ) && is_array( $first ) && count( $first ) > 0 ){
        foreach( $first as $key => $each ){
            $second = $this->db->select('itemCategoryId, categoryName')->where_in('itemParentCategoryId', $each['itemParentCategoryId'])->get('table')->result_array();
            $data[$key]['itemParentCategoryId'] = $each['itemParentCategoryId'];
            $data[$key]['subs'] = $second;
        }
    }
    print_r( $data );
}
I don't think you want to (or can) do this in your query, since the nesting can go a long way.
You should write a getChilds function that calls itself when you retrieve a category. This way you can nest more than 2 levels.
function getCategory($categoryId)
{
    // Retrieve the category
    // Get childs
    $childs = $this->getCategorysByParent($categoryId);
}
function getCategorysByParent($parentId)
{
    // Get category
    // Get childs again.
}
MySQL did not support recursive queries before version 8.0, which added recursive common table expressions (WITH RECURSIVE). On older versions it is possible to emulate recursive queries through recursive calls to a stored procedure, but this is hackish and sub-optimal.
There are other ways to organise your data; these structures allow very efficient querying.
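On engines that do offer recursive common table expressions (SQLite, and MySQL from version 8.0), the whole hierarchy can be read in one statement. The sketch below uses Python with sqlite3. The table name categories and the sample rows are assumptions, while the column names follow the question. SQLite's || concatenation would be CONCAT() in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    ItemCategoryID INTEGER PRIMARY KEY,
    ItemParentCategoryID INTEGER,
    CategoryName TEXT
);
INSERT INTO categories VALUES (1, NULL, 'Electronics'), (2, 1, 'Phones'),
                              (3, 1, 'Laptops'), (4, 2, 'Accessories');
""")

TREE_SQL = """
WITH RECURSIVE tree (id, name, depth, path) AS (
    -- Anchor: top-level categories have no parent.
    SELECT ItemCategoryID, CategoryName, 0, CategoryName
    FROM categories WHERE ItemParentCategoryID IS NULL
    UNION ALL
    -- Recursive step: attach each child to the row of its parent.
    SELECT c.ItemCategoryID, c.CategoryName, tree.depth + 1,
           tree.path || ' > ' || c.CategoryName
    FROM categories AS c
    JOIN tree ON c.ItemParentCategoryID = tree.id
)
SELECT id, depth, path FROM tree ORDER BY path
"""
rows = conn.execute(TREE_SQL).fetchall()
for cat_id, depth, path in rows:
    print("  " * depth + path)
```

Ordering by the accumulated path keeps every child directly under its parent in the output.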
This question comes up so often I can't even be bothered to complain about your inability to use Google or SO search, or to offer a wordy explanation.
Here - use this library I made: http://codebyjeff.com/blog/2012/10/nested-data-with-mahana-hierarchy-library so you don't bring down your database

MySQL/PHP Search Efficiency

I'm trying to create a small search for my site. I've tried using full-text index search, but I could never get it to work. Here is what I've come up with:
if(isset($_GET['search'])) {
    $search = str_replace('-', ' ', $_GET['search']);
    $result = array();
    $titles = mysql_query("SELECT title FROM Entries WHERE title LIKE '%$search%'");
    while($row = mysql_fetch_assoc($titles)) {
        $result[] = $row['title'];
    }
    $tags = mysql_query("SELECT title FROM Entries WHERE tags LIKE '%$search%'");
    while($row = mysql_fetch_assoc($tags)) {
        $result[] = $row['title'];
    }
    $text = mysql_query("SELECT title FROM Entries WHERE entry LIKE '%$search%'");
    while($row = mysql_fetch_assoc($text)) {
        $result[] = $row['title'];
    }
    $result = array_unique($result);
}
So basically, it searches through all the titles, body-text, and tags of all the entries in the DB. This works decently well, but I'm just wondering how efficient would it be? This would only be for a small blog, too. Either way I'm just wondering if this could be made any more efficient.
There's no way to make LIKE '%pattern%' queries efficient. Once you get a nontrivial amount of data, using those wildcard queries performs hundreds or thousands of times slower than using a fulltext indexing solution.
You should look at the presentation I did for MySQL University:
Here's how to get it to work:
First make sure your table uses the MyISAM storage engine. MySQL FULLTEXT indexes support only MyISAM tables. (edit 11/1/2012: MySQL 5.6 is introducing a FULLTEXT index type for InnoDB tables.)
Create a fulltext index.
CREATE FULLTEXT INDEX searchindex ON Entries(title, tags, entry);
Search it!
$search = mysql_real_escape_string($search);
$titles = mysql_query("SELECT title FROM Entries
    WHERE MATCH(title, tags, entry) AGAINST('$search')");
while($row = mysql_fetch_assoc($titles)) {
    $result[] = $row['title'];
}
Note that the columns you name in the MATCH clause must be the same columns in the same order as those you declared in the fulltext index definition. Otherwise it won't work.
I've tried using full-text index search, but I could never get it to work... I'm just wondering if this could be made any more efficient.
This is exactly like saying, "I couldn't figure out how to use this chainsaw, so I decided to cut down this redwood tree with a pocketknife. How can I make that work as well as the chainsaw?"
Regarding your comment about searching for words that match more than 50% of the rows.
The MySQL manual says this:
Users who need to bypass the 50% limitation can use the boolean search mode; see Section 11.8.2, “Boolean Full-Text Searches”.
And this:
The 50% threshold for natural language
searches is determined by the
particular weighting scheme chosen. To
disable it, look for the following
line in storage/myisam/ftdefs.h:
Change that line to this:
Then recompile MySQL. There is no need
to rebuild the indexes in this case.
Also, you might be searching for stopwords. These are words that are ignored by the fulltext search because they're too common. Words like "the" and so on. See http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html
Using LIKE is NOT fulltext.
You need to use ... WHERE MATCH(column) AGAINST('the query') in order to access a fulltext search.
MySQL Full-text search works -- I would look into it and debug it rather than trying to do this. Doing 3 separate MySQL queries will not be anywhere near as efficient.
If you want to make the LIKE approach more efficient, you could combine the three LIKE conditions into one query with OR between them.
