Avoiding duplicate entries in an mySQL table with non unique columns - php

I am working on a CMS system (largely as a learning exercise) for a private website. Atm I have three tables: one for articles, one for tags and a joining table so that each article can have multiple tags.
The table I am having issues with consists of three columns -
article_tags: id (auto_increment), article_id, tag_id
My problem stems from the fact that an article can appear any number of times, and a tag can also appear any number of times, however a given combination of the two should only appear once - that is, each article should only have one reference to any single tag. Currently it is possible to INSERT "duplicate" rows where the id is different, but the combination of article_id and tag_id are the same:
id , article_id, tag_id
1 1 1
2 1 2
3 2 1
4 1 1 <- this is wrong
I could check in PHP code for a record that contains this combination, but I'd prefer to do it in sql if possible (if it is not, or it is undesirable then I will do it using PHP). Due to the id being different and the inability to set unique columns things like INSERT IGNORE and ON DUPLICATE do not work.
I'm quite new to mySQL so if I'm doing something silly please point me in the right direction.
Thanks

You should review your table definition.
You can (from best to worst):
Add a composite primary key on (article_id and tag_id) and remove auto_increment (previous primary key)
Add an index (UNIQUE) on (article_id and tag_id) and keep your auto_increment primary key
Select distinct in php: SELECT DISTINCT(article_id, tag_id) FROM
... without changing anything in your table
Right now, your table is defined as something like this:
CREATE TABLE IF NOT EXISTS `article_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The best solution (option 1) would be to remove your current (auto_increment) primary key and add a primary key (composite) on columns article_id and tag_id:
CREATE TABLE IF NOT EXISTS `article_tags` (
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
But (option 2) if you absolutely want to keep your auto_increment primary key, add an index (unique) on your columns:
CREATE TABLE IF NOT EXISTS `article_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`article_id` int(11) NOT NULL,
`tag_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `article_id` (`article_id`,`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Anyway, if you don't want to change your table definition, you could always use DISTINCT in your php query:
SELECT DISTINCT(article_id, tag_id) FROM article_tags

Such many-to-many relationship tables, sometimes called join tables, often have just two columns, and have a primary key that's a composite of the two.
article_id
tag_id
pk = (article_id, tag_id)
If you change the definition of that table you will definitively solve that problem.
How should you order the columns in composite keys? It depends on how your application will look up items in the join table. If you'll always start with the article_id and look up the tag_id, then you put the article_id first in the key. The DBMS can random-access values for the first column in the key, but has to scan the index to find values in second (or subsequent) columns in the key.
You may want to create a second index on the table, (tag_id, article_id). This will allow fast lookups based on the tag_id. You may ask, "why bother to put both columns in the index?" That's to make the index into a covering index. In a covering index, the desired value can be retrieved directly from the index. For example, with a covering index,
SELECT article_id FROM article_tag WHERE tag_id = 12345
(or a JOIN that uses similar lookup logic) only needs to access the index on the disk drive to get the result. If you don't have a covering index, the query needs to jump from the index to the data table, which is an extra step.
Join tables typically have very short rows (a couple of integers) so the duplicated data for a couple of covering indexes (the primary key and the extra one) isn't a big disk-space hog.

Related

How to properly index attributes in wordpress database

I am extending a product sales plugin and am trying to understand how wordpress handles database relations. I am building tables on activation using dbDelta. An example of a table schema would be:
$table_schema = [
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`people_id` bigint(20) DEFAULT NULL,
`order_id` bigint(20) DEFAULT NULL,
`order_status` varchar(11) DEFAULT NULL,
`order_date` datetime DEFAULT NULL,
`order_total` decimal(13,2) DEFAULT NULL,
`accounting` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `people_id` (`people_id`),
KEY `order_id` (`order_id`)
) $collate;",
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_order_product` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) DEFAULT NULL,
`product_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `product_id` (`product_id`)
) $collate;"
];
I see that id in each table is the PRIMARY KEY but what does declaring the other KEYs actually do? I have read that wordpress uses MyISAM which doesn't actually build foreign key connections. While these tables may point to other tables already existing, in this example does declaring KEY order_id (order_id) create a variable of sorts called order_id that any other table can use to reference? Is this code specifically connecting one tables attributes to another tables attributes (it doesn't appear to be)? After these tables are built, I can inspect them in phpMyAdmin and see that there are indexes assigned but no foreign key constraints. How does this code create tables that point one table at another to build relations?
KEY `foo_bar` (`order_id`)
"KEY" is the same as "INDEX". It specifies that a separate data structure is maintained for the efficient access of the table via the column order_id.
foo_bar is the name of the index. It has no special meaning, and has very few uses. For example, DROP KEY foo_bar; is the way to get rid of the index.
In MyISAM, a "FOREIGN KEY" allowed, but ignored. In InnoDB, it does two things:
Create an index if one is not already provided
Provide a constraint. The default effectively "complain if the other table does not already have the value referenced".
Having an index is important for performance. The index above make this
SELECT ... WHERE order_id = 1234 ...
run in milliseconds, even if there are billions of rows in the table. Without the index, the query would take minutes or hours.
A PRIMARY KEY is a UNIQUE key, which is an INDEX.
UNIQUE(widget) says that only one row can have a particular value of `widget in the table.
PRIMARY KEY(id) says that each row is uniquely identified by the column id. InnoDB really wants each table to have a PK.
"id" is a convention (not a requirement) for the name of the PK. It is also INT AUTO_INCREMENT by convention. You may or may not actually ever touch id.
Tables can be related to each other in 3 main ways:
1:1 -- They share the same unique key. This is rarely useful; you may as well have a single table.
1:many -- An "order" has several "items" in it (one-order : many-items). This is usually handled by order_id being a column in the items table.
many:many -- students_classes -- each student is in many classes; each class has many students. This is implemented via a mapping table that has (usually) only two columns: student_id and class_id (no id is needed) and PRIMARY KEY(student_id, class_id) and INDEX(class_id, student_id). Those two indexes make it efficient to go from a known student to their classes, and vice versa.
Another convention for the PK of a table is to include the table name. (It is clutter to do that for other columns, such as order_status.) I was assuming this convention for student_id and class_id.
But now I am confused by your plugin_orders -- it has both id and order_id. If that table describes "orders", then I would expect order_id to be the PK instead of id.
And, if order_product is a list of all the "products" in each "order", then I would expect you to have the 1:many pattern.
What indexes to have?
PRIMARY KEY to uniquely identify each row -- either id or some column (or combination of columns) that are unique.
Other columns, as needed, for the SELECTs, UPDATEs, and DELETEs that you have. Do not blindly add indexes before having some clues of the queries that might need them.
Indexes sometimes help in sorting:
SELECT ... ORDER BY last_name, first_name;
together with
INDEX(last_name, first_name)
Indexes provide performance; FKs provide integrity checks. Neither is "required"; both are "desirable".
MyISAM is ancient; you should change to InnoDB.
Then do something like
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE ...
In this example, the Optimizer will perform the query something like this:
Look at the WHERE to see which table is best filtered by the conditions there. Declare that to be the first table work with.
Scan through the first table, using an index if practical.
For each row in the first table, reach into the second table.
Reaching into the second table would probably be done via INDEX(order_id) on the second table. This would make the JOIN fast and efficient.
Both tables have INDEX(order_id), but that is not relevant.
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE o.people_id = 123 -- note
Pick o as the first table due to filtering on people_id
use op INDEX(people_id) to rapidly find the o rows that are relevant.
etc (op is the second table)
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE op.product_id = 9887 -- changed again
Pick op as the first table due to filtering on product_id
use o INDEX(people_id) to rapidly find the op rows that are relevant.
etc (o is the second table this time)

Insert data from PHP to MySQL ONLY if multiples values don't exist in multiple columns [duplicate]

I have a table:
table votes (
id,
user,
email,
address,
primary key(id),
);
Now I want to make the columns user, email, address unique (together).
How do I do this in MySql?
Of course the example is just... an example. So please don't worry about the semantics.
To add a unique constraint, you need to use two components:
ALTER TABLE - to change the table schema and,
ADD UNIQUE - to add the unique constraint.
You then can define your new unique key with the format 'name'('column1', 'column2'...)
So for your particular issue, you could use this command:
ALTER TABLE `votes` ADD UNIQUE `unique_index`(`user`, `email`, `address`);
I have a MySQL table:
CREATE TABLE `content_html` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_box_elements` int(11) DEFAULT NULL,
`id_router` int(11) DEFAULT NULL,
`content` mediumtext COLLATE utf8_czech_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `my_uniq_id` (`id_box_elements`,`id_router`)
);
and the UNIQUE KEY works just as expected, it allows multiple NULL rows of id_box_elements and id_router.
I am running MySQL 5.1.42, so probably there was some update on the issue discussed above. Fortunately it works and hopefully it will stay that way.
Multi column unique indexes do not work in MySQL if you have a NULL value in row as MySQL treats NULL as a unique value and at least currently has no logic to work around it in multi-column indexes. Yes the behavior is insane, because it limits a lot of legitimate applications of multi-column indexes, but it is what it is... As of yet, it is a bug that has been stamped with "will not fix" on the MySQL bug-track...
Have you tried this ?
UNIQUE KEY `thekey` (`user`,`email`,`address`)
This works for mysql version 5.5.32
ALTER TABLE `tablename` ADD UNIQUE (`column1` ,`column2`);
MySql 5 or higher behaves like this (I've just tested):
you can define unique constraints involving nullable columns. Say you define a constraint unique (A, B) where A is not nullable but B is
when evaluating such a constraint you can have (A, null) as many times you want (same A value!)
you can only have one (A, not null B) pair
Example:
PRODUCT_NAME, PRODUCT_VERSION
'glass', null
'glass', null
'wine', 1
Now if you try to insert ('wine' 1) again it will report a constraint violation
Hope this helps
You can add multiple-column unique indexes via phpMyAdmin. (I tested in version 4.0.4)
Navigate to the structure page for your target table. Add a unique index to one of the columns. Expand the Indexes list on the bottom of the structure page to see the unique index you just added. Click the edit icon, and in the following dialog you can add additional columns to that unique index.
this tutorial works for me
ALTER TABLE table_name
ADD CONSTRAINT constraint_name UNIQUE (column1, column2, ... column_n);
https://www.mysqltutorial.org/mysql-unique-constraint/
I do it like this:
CREATE UNIQUE INDEX index_name ON TableName (Column1, Column2, Column3);
My convention for a unique index_name is TableName_Column1_Column2_Column3_uindex.
If You are creating table in mysql then use following :
create table package_template_mapping (
mapping_id int(10) not null auto_increment ,
template_id int(10) NOT NULL ,
package_id int(10) NOT NULL ,
remark varchar(100),
primary key (mapping_id) ,
UNIQUE KEY template_fun_id (template_id , package_id)
);
For adding unique index following are required:
1) table_name
2) index_name
3) columns on which you want to add index
ALTER TABLE `tablename`
ADD UNIQUE index-name
(`column1` ,`column2`,`column3`,...,`columnN`);
In your case we can create unique index as follows:
ALTER TABLE `votes`ADD
UNIQUE <votesuniqueindex>;(`user` ,`email`,`address`);
If you want to avoid duplicates in future. Create another column say id2.
UPDATE tablename SET id2 = id;
Now add the unique on two columns:
alter table tablename add unique index(columnname, id2);
First get rid of existing duplicates
delete a from votes as a, votes as b where a.id < b.id
and a.user <=> b.user and a.email <=> b.email
and a.address <=> b.address;
Then add the unique constraint
ALTER TABLE votes ADD UNIQUE unique_index(user, email, address);
Verify the constraint with
SHOW CREATE TABLE votes;
Note that user, email, address will be considered unique if any of them has null value in it.
For PostgreSQL...
It didn't work for me with index; it gave me an error, so I did this:
alter table table_name
add unique(column_name_1,column_name_2);
PostgreSQL gave unique index its own name. I guess you can change the name of index in the options for the table, if it is needed to be changed...

MySQL database row COUNT optimization

I have a MySQL (5.6.26) database with large ammount of data and I have problem with COUNT select on table join.
This query takes about 23 seconds to execute:
SELECT COUNT(0) FROM user
LEFT JOIN blog_user ON blog_user.id_user = user.id
WHERE email IS NOT NULL
AND blog_user.id_blog = 1
Table user is MyISAM and contains user data like id, email, name, etc...
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT '',
`hash` varchar(100) DEFAULT NULL,
`last_login` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`) USING BTREE,
UNIQUE KEY `email` (`email`) USING BTREE,
UNIQUE KEY `hash` (`hash`) USING BTREE,
FULLTEXT KEY `email_full_text` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=5728203 DEFAULT CHARSET=utf8
Table blog_user is InnoDB and contains only id, id_user and id_blog (user can have access to more than one blog). id is PRIMARY KEY and there are indexes on id_blog, id_user and id_blog-id_user.
CREATE TABLE `blog_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_blog` int(11) NOT NULL DEFAULT '0',
`id_user` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_blog_user` (`id_blog`,`id_user`) USING BTREE,
KEY `id_user` (`id_user`) USING BTREE,
KEY `id_blog` (`id_blog`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5250695 DEFAULT CHARSET=utf8
I deleted all other tables and there is no other connection to MySQL server (testing environment).
What I've found so far:
When I delete some columns from user table, duration of query is shorter (like 2 seconds per deleted column)
When I delete all columns from user table (except id and email), duration of query is 0.6 seconds.
When I change blog_user table also to MyISAM, duration of query is 46 seconds.
When I change user table to InnoDB, duration of query is 0.1 seconds.
The question is why is MyISAM so slow executing the command?
First, some comments on your query (after fixing it up a bit):
SELECT COUNT(*)
FROM user u LEFT JOIN
blog_user bu
ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
Table aliases help make it easier to both write and to read a query. More importantly, You have a LEFT JOIN but your WHERE clause is turning it into an INNER JOIN. So, write it that way:
SELECT COUNT(*)
FROM user u INNER JOIN
blog_user bu
ON bu.id_user = u.id
WHERE u.email IS NOT NULL AND bu.id_blog = 1;
The difference is important because it affects choices that the optimizer can make.
Next, indexes will help this query. I am guessing that blog_user(id_blog, id_user) and user(id, email) are the best indexes.
The reason why the number of columns affects your original query is because it is doing a lot of I/O. The fewer columns then the fewer pages needed to store the records -- and the faster the query runs. Proper indexes should work better and more consistently.
To answer the real question (why is myisam slower than InnoDB), I can't give an authoritative answer.
But it is certainly related to one of the more important differences between the two storage engines : InnoDB does support foreign keys, and myisam doesn't. Foreign keys are important for joining tables.
I don't know if defining a foreign key constraint will improve speed further, but for sure, it will guarantee data consistency.
Another note : you observe that the time decreases as you delete columns. This indicates that the query requires a full table scan. This can be avoided by creating an index on the email column. user.id and blog.id_user hopefully already have an index, if they don't, this is an error. Columns that participate in a foreign key, explicit or not, always must have an index.
This is a long time after the event to be much use to the OP and all the foregoing suggestions for speeding up the query are entirely appropriate but I wonder why no one has remarked on the output of EXPLAIN. Specifically, why the index on email was chosen and how that relates to the definition for the email column in the user table.
The optimizer has selected an index on email column, presumably because it's included in the where clause. key_len for this index is comparatively long and it's a reasonably large table given the auto_increment value so the memory requirements for this index would be considerably greater than if it had chosen the id column (4 bytes against 303 bytes). The email column is NULLABLE but has a default of the empty string so, unless the application explicitly sets a NULL, you are not going to find any NULLs in this column anyway. Neither will you find more than one record with the default given the UNIQUE constraint. The column DEFAULT and UNIQUE constraint appear to be completely at odds with each other.
Given the above, and the fact we only want the count in the query, I'd then wonder if the email part of the where clause serves any purpose other than slowing the query down as each value is compared to NULL. Without it the optimizer would probably pick the primary key and do a much better job. Better yet would be a query which ignored the user table entirely and took the count based on the covering index on blog_user that Gordon Linoff highlighted.
There's another indexing issues here worth mentioning:
On the user table
UNIQUE KEY `id` (`id`) USING BTREE,
is redundant since id is the PRIMARY KEY and therefore UNIQUE by definition.
To answer your last question,
The question is why is MyISAM so slow executing the command?
MyISAM is dependent on the speed of your hard drive,
INNODB once the data is read is at speed of RAM. 1st time query is run could be loading data, second and later will avoid hard drive until aged out of RAM.

MySQL with InnoDB revision control rows

I would like to have a way of controlling/tracking revisions of rows. I am trying to find the best solution for this problem.
The first thing that comes to mind is to have a table with a id to identify the row and and id for the revision number. The combined ids would be the primary key. so example data might look like this:
1, 0, "original post"
1, 1, "modified post"
1, 2, "modified again post"
How can I create a table with this behavior? or is there a better solution to do this?
I like InnoDB since it supports transactions, foreign keys and full text in MySQL 5.6+.
I know its possible to "force" this behavior by how I insert the data but I'm wondering if there is a way to have the table do this automatically.
Consider table structure:
TABLE posts
post_id INT AUTO_INCREMENT PK
cur_rev_id INT FK(revisions.rev_id)
TABLE revisions
rev_id INT AUTO_INCREMENT PK
orig_post INT FK(posts.post_id)
post_text VARCHAR
Where the posts table tracks non-versioned information about the post and its current revision, and revisions tracks each version of the post text with a link back to the parent post. Because of the circular FK constraints you'd need to enclose new post insertions in a transaction.
With this you should be able to easily add, remove, track, roll back, and preview revisions to your posts.
Edit:
Yeah, enclosing in a transaction won't exactly help since the keys are set to AUTO_INCREMENT, so you need to dip back in to PHP with LAST_INSERT_ID() and some temporarily NULL indexes.
CREATE TABLE `posts` (
`post_id` INT(10) NOT NULL AUTO_INCREMENT,
`cur_rev_id` INT(10) NULL DEFAULT NULL,
`post_title` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`post_id`),
INDEX `FK_posts_revisions` (`cur_rev_id`),
) ENGINE=InnoDB
CREATE TABLE `revisions` (
`rev_id` INT(10) NOT NULL AUTO_INCREMENT,
`orig_post` INT(10) NULL DEFAULT NULL,
`post_text` VARCHAR(32000) NULL DEFAULT NULL,
PRIMARY KEY (`rev_id`),
INDEX `FK_revisions_posts` (`orig_post`),
) ENGINE=InnoDB
ALTER TABLE `posts`
ADD CONSTRAINT `FK_posts_revisions` FOREIGN KEY (`cur_rev_id`) REFERENCES `revisions` (`rev_id`);
ALTER TABLE `revisions`
ADD CONSTRAINT `FK_revisions_posts` FOREIGN KEY (`orig_post`) REFERENCES `posts` (`post_id`);
Then:
$db_engine->query("INSERT INTO posts (cur_rev_id, post_title) VALUES (NULL, 'My post Title!')");
$post_id = $db_engine->last_insert_id();
$db_engine->query("INSERT INTO revisions (orig_post, post_text) VALUES($post_id, 'yadda yadda')");
$rev_id = $db_engine->last_insert_id();
$db_engine->query("UPDATE posts SET cur_rev_id = $rev_id WHERE post_id = $post_id");
If I've understood you correctly and the table doesn't receive large numbers of updates/deletes then you could look at setting a trigger such as:
DELIMITER $$
CREATE TRIGGER t_table_update BEFORE UPDATE ON table_name
FOR EACH ROW
INSERT INTO table_name_revisions (item_id, data, timestamp)
VALUES(OLD.id, OLD.data, NOW());
END$$
DELIMITER ;
See trigger syntax for more information

What would be the most effective way to insert tags into a table

I have the following tables;
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(11) NOT NULL auto_increment,
`tag_text` varchar(255) NOT NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
CREATE TABLE IF NOT EXISTS `users` (
`user_id` int(11) NOT NULL auto_increment,
`user_display_name` varchar(128) default NULL,
PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;
CREATE TABLE IF NOT EXISTS `user_post_tag` (
`upt_id` int(11) NOT NULL auto_increment,
`upt_user_id` int(11) NOT NULL,
`upt_post_id` int(11) NOT NULL,
`upt_tag_id` int(11) NOT NULL,
PRIMARY KEY (`upt_id`),
KEY `upt_user_id` (`upt_user_id`),
KEY `upt_post_id` (`upt_post_id`),
KEY `upt_tag_id` (`upt_tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
CREATE TABLE IF NOT EXISTS `view_post` (
`post_id` int(11)
,`post_url` varchar(255)
,`post_text` text
,`post_title` varchar(255)
,`post_date` datetime
,`user_id` int(11)
,`user_display_name` varchar(128)
);
The idea is that I would like to use the most effective way to save tags, for a post and users. Simply once I add a post I pass few tags along that post and user. Later I would like to be able to count tabs for each user and post. Something very similar to Stack Overflow.
I suppose that the 'tag_text' should be unique? Is if effective that I run a function each time I submit a new post to go through the 'tags' table to check if a tag already exists, and if yes, return its 'tag_id' so I can insert it into 'user_post_tag' table.
Is this maybe a bad approach to tackle this kind of issue.
All suggestions are welcome.
Yes, what you are doing is the best way to do it. You created an n to m relationship, as a post can have multiple tags and the same tag can be on multiple posts. You do not want to store the tag name for each of the posts, so you store the id.
But, you should -NOT- have this redudancy of storing multiple times the same tag_id for the same user. It will hit hard your server if the users have multiple tags and you have to execute SELECT count(...) for each of these tags. Do you understand what I'm talking about here? Because right now, how would get how many times the user A has the tag B? You'd have to do SELECT count(*) FROM user_post_tag INNER JOIN tags ON (...) WHERE user_id=A and tag_id=B.
My suggestion is to split user_post_tag into two tables:
user_tags, to count how many times the user has this tag, primary key would be user_id and tag_id and you'd have a count field, which you would just update with count=count+1 everytime this user makes a new post with the tag. This way, you can simply do SELECT tag_text, count FROM user_tags INNER JOIN tags ON (...) WHERE user_id=A to select all tags (with number of times used) of a given user. You're using a fully indexed query. You're not asking MySQL to go over the table, look for a bunch of rows and count them, you're telling to MySQL, go this row at this table and at the other table, join them and give it to me, fast!
post_tags, to store the tags a certain post have, primary key would be post_id and tag_id, no additional fields needed.
I suppose that the 'tag_text' should
be unique? Is if effective that I run
a function each time I submit a new
post to go through the 'tags' table to
check if a tag already exists, and if
yes, return its 'tag_id' so I can
insert it into 'user_post_tag' table.
Yes, it should be unique. It's way better to check if a tag exists before inserting and inserting if it doesn't than having redundancy and having to do SELECT ... count(*) to know how much times the tag has been used. It will be much mess less frequent post creation than post selection, so if you have to pick between being query intensive on insertion and selection, certainly pick insertion.
By the way, if you'd like to have a count of how many posts have the same tag, like in stack overflow, you'd need another table, with primary key tag_id, and then, like on user_tags, you increment the count field everytime a post gets a certain tag.
Hmmm, if your tags are all unique, then you don't need tag_id and tag_text in the tags table. Just use tag_text and make it the primary key. Then look at REPLACE INTO (http://dev.mysql.com/doc/refman/5.0/en/replace.html) to handle new tags.
Associating tags with users or posts? user_tags table and post_tags table. no auto-increment values just a compound key with user_id and tag_text or post_id and tag_text. I don't know if you're looking at the user_post_tags table for a performance increase over joining a post_tags table with posts and users. Still, "replace into" should be your friend here too.

Categories