I cleaned up the question a bit because it was getting long and hard to read.
Everything runs on my localhost.
The query takes 755.15 ms in the Symfony profiler when selecting from the job table, which contains 15,000 rows (6,650 of which match the WHERE conditions).
The company table contains 1,000 rows.
The geo__name table contains approximately 84,300 rows and is not causing any problems, so I suspect the problem lies in the database structure.
The structure of these two tables is as follows:
The job table:
CREATE TABLE `job` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`company_id` int(11) NOT NULL,
`activity_sector_id` int(11) DEFAULT NULL,
`status` int(11) NOT NULL,
`active` datetime NOT NULL,
`contract_type_id` int(11) NOT NULL,
`salary_type_id` int(11) NOT NULL,
`workday_id` int(11) NOT NULL,
`geoname_id` int(11) NOT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`minimum_experience` int(11) DEFAULT NULL,
`min_salary` decimal(7,2) DEFAULT NULL,
`max_salary` decimal(7,2) DEFAULT NULL,
`zip_code` int(11) DEFAULT NULL,
`vacancies` int(11) DEFAULT NULL,
`show_salary` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`,`active`,`status`) USING BTREE,
CONSTRAINT `FK_FBD8E0F823F5422B` FOREIGN KEY (`geoname_id`) REFERENCES `geo__name` (`id`),
CONSTRAINT `FK_FBD8E0F8398DEFD0` FOREIGN KEY (`activity_sector_id`) REFERENCES `activity_sector` (`id`),
CONSTRAINT `FK_FBD8E0F85248165F` FOREIGN KEY (`salary_type_id`) REFERENCES `job_salary_type` (`id`),
CONSTRAINT `FK_FBD8E0F8979B1AD6` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`),
CONSTRAINT `FK_FBD8E0F8AB01D695` FOREIGN KEY (`workday_id`) REFERENCES `workday` (`id`),
CONSTRAINT `FK_FBD8E0F8CD1DF15B` FOREIGN KEY (`contract_type_id`) REFERENCES `job_contract_type` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The company table:
CREATE TABLE `company` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`logo` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`website` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`phone` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`cifnif` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` int(11) NOT NULL,
`subscription_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_4FBF094FA76ED395` (`user_id`),
KEY `IDX_4FBF094F9A1887DC` (`subscription_id`),
KEY `name` (`name`(191)),
CONSTRAINT `FK_4FBF094F9A1887DC` FOREIGN KEY (`subscription_id`) REFERENCES `subscription` (`id`),
CONSTRAINT `FK_4FBF094FA76ED395` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The query is the following:
SELECT
j0_.id AS id_0,
j0_.status AS status_1,
j0_.title AS title_2,
j0_.min_salary AS min_salary_3,
j0_.max_salary AS max_salary_4,
c1_.id AS id_5,
c1_.name AS name_6,
c1_.logo AS logo_7,
a2_.id AS id_8,
a2_.name AS name_9,
g3_.id AS id_10,
g3_.name AS name_11,
j4_.id AS id_12,
j4_.name AS name_13,
j5_.id AS id_14,
j5_.name AS name_15,
w6_.id AS id_16,
w6_.name AS name_17
FROM
job j0_
INNER JOIN company c1_ ON j0_.company_id = c1_.id
INNER JOIN activity_sector a2_ ON j0_.activity_sector_id = a2_.id
INNER JOIN geo__name g3_ ON j0_.geoname_id = g3_.id
INNER JOIN job_salary_type j4_ ON j0_.salary_type_id = j4_.id
INNER JOIN job_contract_type j5_ ON j0_.contract_type_id = j5_.id
INNER JOIN workday w6_ ON j0_.workday_id = w6_.id
WHERE
j0_.active >= CURRENT_TIMESTAMP
AND j0_.status = 1
ORDER BY
j0_.created_at DESC
Running the query above gives these timings:
In MySQL Workbench: 0.578 sec (duration) / 0.016 sec (fetch)
In the Symfony profiler: 755.15 ms
My question: is this query duration reasonable? If not, how can I speed the query up? It seems far too slow.
(Screenshots not reproduced here: the Symfony debug toolbar; the selected fields, showing that only the data actually needed is fetched; the EXPLAIN output; the timeline.)
The MySQL server can't handle the load being placed on it. This could be due to resource contention, to the server not having been tuned appropriately, or to a problem with your hard drive.
First, I would start improving performance by adding the MySQL keyword STRAIGHT_JOIN, which tells MySQL to join the tables in the order I have listed them instead of working out the join order itself. With your dataset being so small and already taking half a second, I don't know how much it will help, but on larger datasets I have seen it improve performance SIGNIFICANTLY.
Next, you appear to be fetching lookup descriptions through the PK/FK relationships. Without seeing the indexes on those tables, I would suggest covering indexes that contain both the key and the description, so the join can read the data straight from the index pages it already uses for the JOIN, instead of going from the index page to the actual data page to fetch the description and then continuing.
Last, the index on your job table, (created_at, active, status), might perform better as (status, active, created_at).
With your existing index, think of it this way: each day of data goes into its own box. Within each day's box, the records are sorted by the active timestamp (even if simplified to the active date), THEN by status.
So for each CREATED day, you open a box. Inside it are secondary boxes, one for each active timestamp (say, per day). Only inside each active box can you see which records have status = 1. So you open every active box, check for status = 1, close the created-day box, move on to the next created-day box, and repeat. Consider how labor-intensive it is to open every day's box and every active box within it.
Now consider the suggested index, which starts with status. You have a very small number of boxes, one per status. Open only the box for status = 1; those are the only records you want, and you can ignore all the others. Inside that box, the records are sorted by the active timestamp, so you can jump directly to the first one at or after the current timestamp. From that record to the end of the box, you have every qualifying record. Done. Because the index also contains created_at, MySQL has that value on hand for the descending sort, although with a range condition on active it may still need a filesort for the ORDER BY.
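The box analogy can be made concrete with a small Python sketch. It models each index as a list of tuples sorted by the index columns (the rows are made-up sample data, not the poster's) and compares walking every entry in (created_at, active, status) order with a binary-search seek into (status, active, created_at) order. It only models the WHERE filtering; the ORDER BY created_at DESC would still be a separate sort step.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Tuple

# (id, status, active, created_at): hypothetical sample rows, not real data
Row = Tuple[int, int, datetime, datetime]


def make_rows(n: int, now: datetime) -> List[Row]:
    rows = []
    for i in range(1, n + 1):
        status = i % 3                                # statuses 0, 1, 2
        active = now + timedelta(days=(i % 7) - 3)    # some expired, some still active
        created_at = now - timedelta(days=i % 30)
        rows.append((i, status, active, created_at))
    return rows


def scan_created_first(rows: List[Row], now: datetime) -> Tuple[List[int], int]:
    """Index ordered by (created_at, active, status): matching rows are scattered
    across every created_at 'box', so every entry has to be visited."""
    index = sorted(rows, key=lambda r: (r[3], r[2], r[1], r[0]))
    hits = [r[0] for r in index if r[1] == 1 and r[2] >= now]
    return hits, len(index)


def seek_status_first(rows: List[Row], now: datetime) -> Tuple[List[int], int]:
    """Index ordered by (status, active, created_at): matching rows form one
    contiguous run, located with two binary searches."""
    index = sorted(rows, key=lambda r: (r[1], r[2], r[3], r[0]))
    keys = [(r[1], r[2]) for r in index]
    start = bisect_left(keys, (1, now))            # first status=1 entry with active >= now
    end = bisect_left(keys, (2, datetime.min))     # first entry past the status=1 'box'
    hits = [r[0] for r in index[start:end]]
    return hits, end - start


if __name__ == "__main__":
    now = datetime(2020, 1, 1)
    rows = make_rows(15000, now)
    scanned, visited_scan = scan_created_first(rows, now)
    sought, visited_seek = seek_status_first(rows, now)
    print(sorted(scanned) == sorted(sought), visited_scan, visited_seek)
```

Both functions find the same rows; the difference is that the status-first order only touches the qualifying entries.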
To make sure the other lookup tables have covering indexes (if they don't already), I suggest the following:
Table               Covering index
company             (id, name, logo)
activity_sector     (id, name)
geo__name           (id, name)
job_salary_type     (id, name)
job_contract_type   (id, name)
workday             (id, name)
And the MySQL keyword:
SELECT STRAIGHT_JOIN (rest of query...)
There are several reasons why Symfony can be slow.
1. Server fault
First, it could be the server's fault: server performance may be slowing down your query time.
2. Data size and deferred rendering
Then there is the data size. On one of my projects, the query returns about 50 MB of data (currently about 20k rows).
Parsing 50 MB into HTML can take some time, mostly because of loops.
Still, there are solutions for this, such as deferred rendering.
Deferred rendering is quite simple: instead of parsing the data in your Twig template,
you send all the data to a JavaScript variable and use JavaScript to parse and render it once the DOM is loaded.
3. Query optimisation
As I wrote in a comment, you can check the following question, in which I explained why custom queries are important:
Are Doctrine relations affecting application performance?
In that question, you will read that order matters... In fact, it's the most important thing.
While static data in your database is usually inserted in the right order,
that is rarely the case for dynamic data (data provided by users during the website's life).
That is why using ORDER BY in your query will often speed up page rendering,
as Doctrine won't run extra queries on its own.
As an example, one of my sites displays about 700 entries on its index page.
First, here is the query count when using findAll():
It shows 254 queries (253 duplicates) in 144 ms, plus 39 ms of render time.
Next, using the second parameter of findBy() (the ORDER BY), I get this result:
Much better: only 1 query, in 8 ms, with about the same render time.
But here, I don't use any fields from the associations.
As soon as I do, Doctrine will run extra queries, and the query count and time will skyrocket.
In the end, it will be back to something like findAll().
And finally, here is the custom query:
With this custom query, the query time went from 8 ms to 38 ms.
But unlike the previous query, I get much more data in the result,
which prevents Doctrine from running extra queries.
Again, ORDER BY matters in this query: without it, the count skyrockets back to 84 queries.
4. Partials
When you write a custom query, you can load partial objects instead of the full data.
As you said in your question, the description field seems to slow down your loading speed;
with partials, you can avoid loading some fields from the table, which speeds up the query.
First, instead of your usual syntax, this is how you create the query builder:
$em=$this->getEntityManager();
$qb=$em->createQueryBuilder();
Just in case, I prefer to keep $em as a separate variable (for example, if I want to fetch some other repository).
Then you can start your partial select. Careful: the first select can't include any association fields:
$qb->select("partial job.{id, status, title, minimum_experience, min_salary, max_salary, zip_code, vacancies}")
->from(Job::class, "job");
Then you can add your associations:
$qb->addSelect("company")
->join("job.company", "company");
Or even add a partial association in case you don't need all the data of the association:
$qb->addSelect("partial activitySector.{id}")
->join("job.activitySector", "activitySector");
$qb->addSelect("partial job.{id, company_id, activity_sector_id, status, active, contract_type_id, salary_type_id, workday_id, geoname_id, title, minimum_experience, min_salary, max_salary, zip_code, vacancies, show_salary}");
5. Caches
You could also use various caches, such as Zend OPcache for PHP; you will find some advice about it in this question: Why Symfony3 so slow?
There is also Varnish, an HTTP reverse-proxy cache.
That rounds up about everything I can share to reduce your loading time.
I hope it proves useful and that you are able to solve your problem.
There are too many keys; try to minimize the number of keys.
There are numerous 'getting started' tutorials out there on how to implement zfc-user and zfc-rbac in Zend Framework 2. The GitHub pages for zfc-user and zfc-rbac (https://github.com/ZF-Commons) are clear, and the implementation is indeed pretty easy (as many of the tutorials state). I also found the SQL schemas needed for both zfc-user and zfc-rbac (/vendor/zf-commons/zfc-[user/rbac]/data/).
Creating a user in the database is easy, since zfc-user already sets this up for you (http://example.com/user). Everything is fine so far. Now I want to populate the roles, but it's not clear to me how to populate the rbac tables correctly. The lack of information about this surprises me, since zfc-rbac is a popular module for Zend Framework.
I understand the principle of Role Based Access Control, and populating the permissions table and the table linking permissions to roles is clear; it's the role table that's not clear to me. I understand that a role can have a parent role, but it's not clear how to populate the table with a parent role, since a foreign key constraint requires 'parent_role_id' to be an existing 'role_id'.
Below is the SQL for the role table (this is the SQL provided by zfc-rbac):
CREATE TABLE `rbac_role` (
`role_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_role_id` int(11) unsigned NOT NULL,
`role_name` varchar(32) NULL,
PRIMARY KEY (`role_id`),
KEY `parent_role_id` (`parent_role_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
ALTER TABLE `rbac_role`
ADD CONSTRAINT `rbac_role_ibfk_1` FOREIGN KEY (`parent_role_id`) REFERENCES `rbac_role` (`role_id`);
With the foreign key in place, isn't adding a parent role impossible? This is what I tried:
INSERT INTO `rbac_role` (parent_role_id, role_name) VALUES (NULL, 'admin');
Basically, my question (and I feel very stupid for asking this) is: what does an insert for a parent role look like? And if the insert statement above is in fact correct, do I always need to remove the foreign key before inserting a parent role?
Change your create table to the following:
CREATE TABLE `rbac_role` (
`role_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_role_id` int(11) unsigned NULL,
`role_name` varchar(32) NULL,
PRIMARY KEY (`role_id`),
KEY `parent_role_id` (`parent_role_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Notice that parent_role_id is NULL instead of NOT NULL. If parent_role_id is NOT NULL, every row must have a parent, but since the foreign key references the same table, there is no way to insert the first (root) row!
FYI: this issue has been fixed. Version 0.2.0 of zfc-rbac allows a NULL value for parent_role_id.
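A quick way to check the fix is a sketch with Python's sqlite3 module standing in for MySQL (SQLite uses INTEGER PRIMARY KEY AUTOINCREMENT instead of int(11) unsigned AUTO_INCREMENT, and only enforces foreign keys after PRAGMA foreign_keys = ON). It creates the corrected table with a nullable parent_role_id, inserts a root role with NULL, then a child role pointing at it; add_role and the role names are hypothetical.

```python
import sqlite3
from typing import Optional


def make_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.execute("""
        CREATE TABLE rbac_role (
            role_id INTEGER PRIMARY KEY AUTOINCREMENT,
            parent_role_id INTEGER REFERENCES rbac_role (role_id),  -- nullable: no NOT NULL
            role_name VARCHAR(32)
        )
    """)
    return conn


def add_role(conn: sqlite3.Connection, name: str, parent_id: Optional[int] = None) -> int:
    """Insert a role; parent_id=None creates a root role (parent_role_id IS NULL)."""
    cur = conn.execute(
        "INSERT INTO rbac_role (parent_role_id, role_name) VALUES (?, ?)",
        (parent_id, name))
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = make_conn()
    admin = add_role(conn, "admin")            # root role: no parent
    editor = add_role(conn, "editor", admin)   # child of admin
    try:
        add_role(conn, "ghost", 999)           # no role 999, so the FK rejects it
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
    print(conn.execute("SELECT role_id, parent_role_id, role_name FROM rbac_role").fetchall())
```

The foreign key stays in place the whole time: NULL simply means "no parent", while any non-NULL value must still point at an existing role.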
I have been going back and forth on this issue for the last year or so, changing what I am doing and trying different things. The issue is the schema: I still want to be able to order nicely in player/clan ladders, but if we add a stat later, adding it shouldn't lock our table while every row is changed, as it would with one stat per column.
I see two options for how to do this, but neither seems right. One is one stat per column. There would be 4 tables: user_stat_summary (for basic stats shown on ladders), user_stat_beast (teams are human vs beast), user_stat_human, and user_stat_overall. The stats shown everywhere cover the last 30 days. A cron job will find dated stats by querying the matches that happened more than 30 days ago, subtracting those stats from the 3 main tables, and adding them to the overall one. Matches will have blobs for the stats each player got in that match. The issue I see here is that once we have a lot of rows, we can't easily add more stats when, say, the game changes a little. What I was thinking of was an extra_stats blob column on each table; any new stats stored there simply wouldn't be sortable on the ladders.
The other option is an EAV model, which I have been playing around with but can't seem to get right. I would get many more rows per query and then group them by user; ordering would mostly work, but I couldn't get LIMIT right for pagination, since an unknown number of rows was generally selected.
What I am considering is the EAV model plus a table that stores ranks per stat, which could be used for ordering. The EAV tables are currently as follows:
CREATE TABLE `user_stat` (
`user_id` int(10) unsigned NOT NULL,
`stat_id` varchar(50) NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`stat_id`),
CONSTRAINT `user` FOREIGN KEY (`user_id`) REFERENCES `xf_user` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_human_stat` (
`user_id` int(10) unsigned NOT NULL,
`stat_id` varchar(50) NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`stat_id`),
CONSTRAINT `human_user` FOREIGN KEY (`user_id`) REFERENCES `xf_user` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_beast_stat` (
`user_id` int(10) unsigned NOT NULL,
`stat_id` varchar(50) NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`stat_id`),
CONSTRAINT `beast_user` FOREIGN KEY (`user_id`) REFERENCES `xf_user` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_stat_overall` (
`user_id` int(10) unsigned NOT NULL,
`human` blob NOT NULL,
`beast` blob NOT NULL,
`total` blob NOT NULL,
PRIMARY KEY (`user_id`),
CONSTRAINT `user_overall` FOREIGN KEY (`user_id`) REFERENCES `xf_user` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So I was thinking of adding a user_stat_rank table with user_id, stat_id, and rank. Then, to get the first page of the ladder ordered by the 'kills' stat, I could get the user_ids ordered by rank where stat_id is 'kills', and then make a second query to fetch all of those users' stats.
After writing all this out, it seems like it would work, but I might be missing something. I also understand this question is all over the place, so if you would like me to add details anywhere, just say so.
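The two-step ladder query described above can be sketched as follows, with Python's sqlite3 standing in for MySQL. It assumes a user_stat table as in the question plus the proposed rank table, whose column is renamed stat_rank here (a hypothetical name) because RANK is a reserved word in MySQL 8.0. Ranks are rebuilt in one pass (for example, from the cron job), the page of user_ids comes from the rank table with LIMIT/OFFSET, and a second query fetches those users' stats.

```python
import sqlite3
from typing import Dict, List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE user_stat (
            user_id INTEGER NOT NULL,
            stat_id VARCHAR(50) NOT NULL,
            value INTEGER NOT NULL,
            PRIMARY KEY (user_id, stat_id)
        );
        CREATE TABLE user_stat_rank (
            user_id INTEGER NOT NULL,
            stat_id VARCHAR(50) NOT NULL,
            stat_rank INTEGER NOT NULL,
            PRIMARY KEY (user_id, stat_id)
        );
        CREATE INDEX rank_by_stat ON user_stat_rank (stat_id, stat_rank);
    """)


def refresh_ranks(conn: sqlite3.Connection) -> None:
    """Rebuild every rank: 1 + number of users with a strictly higher value (ties share a rank)."""
    with conn:
        conn.execute("DELETE FROM user_stat_rank")
        conn.execute("""
            INSERT INTO user_stat_rank (user_id, stat_id, stat_rank)
            SELECT s1.user_id, s1.stat_id,
                   1 + (SELECT COUNT(*) FROM user_stat s2
                        WHERE s2.stat_id = s1.stat_id AND s2.value > s1.value)
            FROM user_stat s1
        """)


def ladder_page(conn: sqlite3.Connection, stat_id: str,
                page: int, per_page: int) -> List[Tuple[int, Dict[str, int]]]:
    offset = (page - 1) * per_page
    # Query 1: exactly one row per user, so LIMIT/OFFSET paginate correctly
    user_ids = [row[0] for row in conn.execute(
        "SELECT user_id FROM user_stat_rank WHERE stat_id = ? "
        "ORDER BY stat_rank, user_id LIMIT ? OFFSET ?",
        (stat_id, per_page, offset))]
    if not user_ids:
        return []
    # Query 2: all stats, but only for the users on this page
    placeholders = ", ".join("?" for _ in user_ids)
    stats: Dict[int, Dict[str, int]] = {uid: {} for uid in user_ids}
    for uid, sid, value in conn.execute(
            f"SELECT user_id, stat_id, value FROM user_stat WHERE user_id IN ({placeholders})",
            user_ids):
        stats[uid][sid] = value
    return [(uid, stats[uid]) for uid in user_ids]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    sample = {1: (10, 3), 2: (50, 7), 3: (30, 1), 4: (40, 2), 5: (5, 9)}  # user: (kills, deaths)
    with conn:
        conn.executemany("INSERT INTO user_stat VALUES (?, ?, ?)",
                         [(u, s, v) for u, (k, d) in sample.items()
                          for s, v in (("kills", k), ("deaths", d))])
    refresh_ranks(conn)
    print(ladder_page(conn, "kills", page=1, per_page=2))
```

Because the first query returns one row per user, the pagination problem described for the plain EAV approach does not arise; the trade-off is that ranks are only as fresh as the last refresh.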
For the sake of manageability, I would stick to adding a column for every stat. In the long run, this will probably be the easiest to manage without getting painted into a corner by the limitations that, for instance, the EAV model would impose on you.
If you're worried about the stats table growing too large, you could implement some form of partitioning in which you regularly move data older than 4 weeks into one or more history tables. The history tables can be indexed heavily, since they won't need constant updating.
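The move-to-history idea can be sketched as a copy-then-delete run inside one transaction, so a row is never in both tables or in neither. This uses Python's sqlite3 in place of MySQL, with hypothetical match_stat and match_stat_history tables and ISO-8601 date strings (which sort correctly as text). It is plain table archiving, not MySQL's native PARTITION BY feature.

```python
import sqlite3
from datetime import date, timedelta


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical per-match stat rows; the history table has the same columns
    for table in ("match_stat", "match_stat_history"):
        conn.execute(f"""
            CREATE TABLE {table} (
                user_id INTEGER NOT NULL,
                stat_id VARCHAR(50) NOT NULL,
                value INTEGER NOT NULL,
                played_on TEXT NOT NULL
            )""")
    # The history table is rarely written, so it can afford extra indexes
    conn.execute("CREATE INDEX history_by_user ON match_stat_history (user_id, stat_id, played_on)")


def archive_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows played before cutoff into the history table; returns the number moved."""
    with conn:  # commits both statements together, or rolls both back on error
        conn.execute("INSERT INTO match_stat_history "
                     "SELECT user_id, stat_id, value, played_on FROM match_stat WHERE played_on < ?",
                     (cutoff,))
        moved = conn.execute("DELETE FROM match_stat WHERE played_on < ?", (cutoff,)).rowcount
    return moved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    today = date(2020, 3, 1)
    with conn:
        conn.executemany("INSERT INTO match_stat VALUES (?, ?, ?, ?)", [
            (1, "kills", 12, (today - timedelta(days=40)).isoformat()),
            (1, "kills", 7, (today - timedelta(days=31)).isoformat()),
            (1, "kills", 9, (today - timedelta(days=2)).isoformat()),
        ])
    cutoff = (today - timedelta(days=28)).isoformat()  # "older than 4 weeks"
    print(archive_older_than(conn, cutoff))  # 2
```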
I am in the process of writing a web-based quiz application using PHP and MySQL. I don't want to bore you with the details, so here's what (I think) you need to know.
Questions are all multiple choice and can be stored in a simple table with a few columns:
ID: The question number (primary index)
Category: The category this question falls under (e.g. animals, vegetables, minerals)
Text: The question stem (e.g. What is 1+1?)
Answer1: A possible answer (e.g. 2)
Answer2: A possible answer (e.g. 3)
Answer3: A possible answer (e.g. 4)
CorrectAnswer: The correct answer to the question (either 1, 2 or 3 (in this case 1))
Users can sign up by creating a username and password, and then attempt questions from categories.
The problem is that the questions I'm writing are designed to be attempted more than once. However, users need to be given detailed feedback on their progress. The FIRST attempt at a question matters, and contributes to a user's overall 'questions answered first time' score. I therefore need to keep track of how many times a question has been attempted.
Since the application is designed to be flexible, I would like to support many hundreds of users attempting many thousands of questions. Integrating this information into the user table or the questions table therefore seems impossible. The way I would like to approach this problem is to create a new table for each user when they sign up, with the following columns:
Table Name: A user's individual table (e.g. TableForUser51204)
QuestionID: The ID of a question that the user has attempted.
CorrectFirstTime: A boolean value stating whether or not the question was answered correctly the first time.
Correct: The number of times the question has been answered correctly.
Incorrect: The number of times the question has been answered incorrectly.
So what I would like to ask is whether organising the database this way is wise. Is there a better approach than creating a new table for each user? How much would this hinder performance with, say, 500 users and 2000 questions?
Thanks.
You don't want to be creating a new table per user. Instead, modify your database structure.
Normally, you'd have a table for questions, a table for options (with maybe a boolean column to indicate if it's the correct answer), a users table, and a join table on users and options to store users' responses. A sample schema:
CREATE TABLE `options` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`question_id` int(10) unsigned NOT NULL,
`text` varchar(255) NOT NULL,
`correct` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `question_id` (`question_id`)
) ENGINE=InnoDB;
CREATE TABLE `options_users` (
`option_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`created` timestamp NOT NULL,
KEY `option_id` (`option_id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB;
CREATE TABLE `questions` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`question` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`,`question`)
) ENGINE=InnoDB;
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(60) NOT NULL,
`password` char(40) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB;
ALTER TABLE `options`
ADD CONSTRAINT `options_ibfk_1` FOREIGN KEY (`question_id`) REFERENCES `questions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `options_users`
ADD CONSTRAINT `options_users_ibfk_2` FOREIGN KEY (`option_id`) REFERENCES `options` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
ADD CONSTRAINT `options_users_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
This links options to questions, and users' responses to options. I've also added a created column to the options_users table so you can see when a user answered the question and track their progress over time.
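With this schema, the per-user columns from the question (CorrectFirstTime, Correct, Incorrect) can be computed by a query instead of stored in a table per user. Below is a sketch using Python's sqlite3 as a stand-in for MySQL, with a trimmed-down version of the schema above (foreign keys omitted); the progress function and sample data are hypothetical. It assumes the first attempt at a question is the response with the earliest created value.

```python
import sqlite3
from typing import List, Tuple

PROGRESS_SQL = """
SELECT o.question_id,
       (SELECT o2.correct                   -- was the earliest response correct?
          FROM options_users ou2
          JOIN options o2 ON o2.id = ou2.option_id
         WHERE ou2.user_id = :uid AND o2.question_id = o.question_id
         ORDER BY ou2.created, ou2.rowid
         LIMIT 1)            AS correct_first_time,
       SUM(o.correct)        AS times_correct,
       SUM(1 - o.correct)    AS times_incorrect
  FROM options_users ou
  JOIN options o ON o.id = ou.option_id
 WHERE ou.user_id = :uid
 GROUP BY o.question_id
 ORDER BY o.question_id
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE questions (id INTEGER PRIMARY KEY, question VARCHAR(255) NOT NULL);
        CREATE TABLE options (
            id INTEGER PRIMARY KEY,
            question_id INTEGER NOT NULL,
            text VARCHAR(255) NOT NULL,
            correct TINYINT(1) NOT NULL
        );
        CREATE TABLE options_users (
            option_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            created TIMESTAMP NOT NULL
        );
    """)


def load_sample(conn: sqlite3.Connection) -> None:
    with conn:
        conn.executemany("INSERT INTO questions VALUES (?, ?)",
                         [(1, "What is 1+1?"), (2, "What is 2+2?")])
        conn.executemany("INSERT INTO options VALUES (?, ?, ?, ?)",
                         [(1, 1, "2", 1), (2, 1, "3", 0), (3, 2, "3", 0), (4, 2, "4", 1)])
        conn.executemany("INSERT INTO options_users VALUES (?, ?, ?)", [
            (2, 1, "2024-01-01 10:00:00"),  # user 1, question 1: wrong first...
            (1, 1, "2024-01-02 10:00:00"),  # ...then right
            (4, 1, "2024-01-01 11:00:00"),  # user 1, question 2: right first time
            (4, 1, "2024-01-03 09:00:00"),
            (3, 2, "2024-01-01 12:00:00"),  # another user, not counted for user 1
        ])


def progress(conn: sqlite3.Connection, user_id: int) -> List[Tuple[int, int, int, int]]:
    """(question_id, correct_first_time, times_correct, times_incorrect) per attempted question."""
    return conn.execute(PROGRESS_SQL, {"uid": user_id}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_sample(conn)
    print(progress(conn, 1))  # [(1, 0, 1, 1), (2, 1, 2, 0)]
```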
I have the following tables:
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(11) NOT NULL auto_increment,
`tag_text` varchar(255) NOT NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
CREATE TABLE IF NOT EXISTS `users` (
`user_id` int(11) NOT NULL auto_increment,
`user_display_name` varchar(128) default NULL,
PRIMARY KEY (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;
CREATE TABLE IF NOT EXISTS `user_post_tag` (
`upt_id` int(11) NOT NULL auto_increment,
`upt_user_id` int(11) NOT NULL,
`upt_post_id` int(11) NOT NULL,
`upt_tag_id` int(11) NOT NULL,
PRIMARY KEY (`upt_id`),
KEY `upt_user_id` (`upt_user_id`),
KEY `upt_post_id` (`upt_post_id`),
KEY `upt_tag_id` (`upt_tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=9 ;
CREATE TABLE IF NOT EXISTS `view_post` (
`post_id` int(11)
,`post_url` varchar(255)
,`post_text` text
,`post_title` varchar(255)
,`post_date` datetime
,`user_id` int(11)
,`user_display_name` varchar(128)
);
The idea is that I want the most efficient way to save tags for posts and users. When I add a post, I pass a few tags along with that post and user. Later I would like to be able to count tags for each user and post, very similar to Stack Overflow.
I suppose 'tag_text' should be unique? Is it efficient to run a function every time I submit a new post that goes through the 'tags' table to check whether a tag already exists and, if so, returns its 'tag_id' so I can insert it into the 'user_post_tag' table?
Is this a bad approach to this kind of problem?
All suggestions are welcome.
Yes, what you are doing is the best way to do it. You created an n-to-m relationship, since a post can have multiple tags and the same tag can be on multiple posts. You do not want to store the tag name for each post, so you store the id.
But you should NOT have the redundancy of storing the same tag_id multiple times for the same user. It will hit your server hard if users have many tags and you have to execute SELECT count(...) for each of them. Do you see what I'm getting at? Right now, how would you find out how many times user A has used tag B? You'd have to do SELECT count(*) FROM user_post_tag INNER JOIN tags ON (...) WHERE user_id=A and tag_id=B.
My suggestion is to split user_post_tag into two tables:
user_tags, to count how many times the user has used each tag: the primary key would be (user_id, tag_id), and you'd have a count field that you simply update with count=count+1 every time this user makes a new post with that tag. This way, you can simply do SELECT tag_text, count FROM user_tags INNER JOIN tags ON (...) WHERE user_id=A to select all tags (with the number of times each was used) for a given user. That's a fully indexed query: you're not asking MySQL to go through the table, find a bunch of rows, and count them; you're telling MySQL to go to this row in this table and that row in the other table, join them, and hand them over, fast!
post_tags, to store the tags a given post has: the primary key would be (post_id, tag_id), with no additional fields needed.
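The two-table split can be sketched like this, with Python's sqlite3 standing in for MySQL. The tags table keeps tag_text UNIQUE, and INSERT OR IGNORE followed by an UPDATE is how this SQLite sketch does the count = count + 1 upsert (in MySQL, INSERT ... ON DUPLICATE KEY UPDATE count = count + 1 does it in one statement). add_post, tag_id_for, and user_tag_counts are hypothetical helper names.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE tags (
            tag_id INTEGER PRIMARY KEY,
            tag_text VARCHAR(255) NOT NULL UNIQUE
        );
        CREATE TABLE post_tags (
            post_id INTEGER NOT NULL,
            tag_id INTEGER NOT NULL REFERENCES tags (tag_id),
            PRIMARY KEY (post_id, tag_id)
        );
        CREATE TABLE user_tags (
            user_id INTEGER NOT NULL,
            tag_id INTEGER NOT NULL REFERENCES tags (tag_id),
            count INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (user_id, tag_id)
        );
    """)


def tag_id_for(conn: sqlite3.Connection, text: str) -> int:
    """Return the id of an existing tag, creating the tag first if needed."""
    conn.execute("INSERT OR IGNORE INTO tags (tag_text) VALUES (?)", (text,))
    return conn.execute("SELECT tag_id FROM tags WHERE tag_text = ?", (text,)).fetchone()[0]


def add_post(conn: sqlite3.Connection, user_id: int, post_id: int,
             tag_texts: Iterable[str]) -> None:
    with conn:  # one transaction per post
        for text in set(tag_texts):
            tag_id = tag_id_for(conn, text)
            conn.execute("INSERT INTO post_tags (post_id, tag_id) VALUES (?, ?)", (post_id, tag_id))
            # Upsert the per-user counter: create it at 0 if missing, then increment
            conn.execute("INSERT OR IGNORE INTO user_tags (user_id, tag_id, count) VALUES (?, ?, 0)",
                         (user_id, tag_id))
            conn.execute("UPDATE user_tags SET count = count + 1 WHERE user_id = ? AND tag_id = ?",
                         (user_id, tag_id))


def user_tag_counts(conn: sqlite3.Connection, user_id: int) -> List[Tuple[str, int]]:
    # Reads one user's counters via the (user_id, tag_id) primary key; nothing is counted here
    return conn.execute("""
        SELECT t.tag_text, ut.count
        FROM user_tags ut INNER JOIN tags t ON t.tag_id = ut.tag_id
        WHERE ut.user_id = ?
        ORDER BY t.tag_text""", (user_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_post(conn, user_id=1, post_id=100, tag_texts=["mysql", "php"])
    add_post(conn, user_id=1, post_id=101, tag_texts=["mysql"])
    print(user_tag_counts(conn, 1))  # [('mysql', 2), ('php', 1)]
```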
"I suppose that the 'tag_text' should be unique? Is it effective that I run a function each time I submit a new post to go through the 'tags' table to check if a tag already exists, and if yes, return its 'tag_id' so I can insert it into 'user_post_tag' table."
Yes, it should be unique. It's much better to check whether a tag exists before inserting it (and insert it only if it doesn't) than to have redundancy and run SELECT ... count(*) to find out how many times the tag has been used. Post creation will be much less frequent than post selection, so if you have to choose between being query-intensive on insertion or on selection, definitely choose insertion.
By the way, if you'd like a count of how many posts have the same tag, as on Stack Overflow, you'd need another table with primary key tag_id; then, as with user_tags, you increment the count field every time a post gets that tag.
Hmmm, if your tags are all unique, then you don't need both tag_id and tag_text in the tags table. Just use tag_text and make it the primary key. Then look at REPLACE INTO (http://dev.mysql.com/doc/refman/5.0/en/replace.html) to handle new tags.
Associating tags with users or posts? Use a user_tags table and a post_tags table, with no auto-increment values, just a compound key of user_id and tag_text, or post_id and tag_text. I don't know whether you're looking at the user_post_tags table for a performance gain over joining a post_tags table with posts and users. Either way, REPLACE INTO should be your friend here too.
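The tag_text-as-primary-key variant could look like the following sketch (Python's sqlite3 standing in for MySQL; tag_post and posts_per_tag are hypothetical names). It swaps REPLACE INTO for INSERT OR IGNORE, SQLite's counterpart of MySQL's INSERT IGNORE: per the MySQL manual, REPLACE deletes the existing row before inserting the new one, whereas IGNORE simply leaves an existing tag alone. Per-tag counts are computed with GROUP BY over the compound-key table.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE tags (tag_text VARCHAR(255) PRIMARY KEY);
        CREATE TABLE post_tags (
            post_id INTEGER NOT NULL,
            tag_text VARCHAR(255) NOT NULL REFERENCES tags (tag_text),
            PRIMARY KEY (post_id, tag_text)
        );
        CREATE TABLE user_tags (
            user_id INTEGER NOT NULL,
            tag_text VARCHAR(255) NOT NULL REFERENCES tags (tag_text),
            PRIMARY KEY (user_id, tag_text)
        );
    """)


def tag_post(conn: sqlite3.Connection, user_id: int, post_id: int,
             tag_texts: Iterable[str]) -> None:
    with conn:
        for text in tag_texts:
            # Each statement is a no-op if the row already exists
            conn.execute("INSERT OR IGNORE INTO tags (tag_text) VALUES (?)", (text,))
            conn.execute("INSERT OR IGNORE INTO post_tags (post_id, tag_text) VALUES (?, ?)",
                         (post_id, text))
            conn.execute("INSERT OR IGNORE INTO user_tags (user_id, tag_text) VALUES (?, ?)",
                         (user_id, text))


def posts_per_tag(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    return conn.execute(
        "SELECT tag_text, COUNT(*) FROM post_tags GROUP BY tag_text ORDER BY tag_text").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    tag_post(conn, 1, 100, ["mysql", "php"])
    tag_post(conn, 1, 101, ["mysql"])
    tag_post(conn, 2, 102, ["mysql", "mysql"])  # duplicate tag on one post is ignored
    print(posts_per_tag(conn))  # [('mysql', 3), ('php', 1)]
```

This variant keeps the tables minimal, but per-user usage counts then have to be computed with a join or GROUP BY rather than read from a stored counter.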