I have two tables in MySQL. When I insert or delete values in the first table, I want the values to be duplicated in table2 to keep the two tables "aligned".
table1:
id - username
1 - test_user
table2:
Same id as table1 and username as table1 (on insert/delete)
I want to keep the data between the tables aligned without doing multiple queries. I've read about triggers, but I'm not sure if that's the correct road; I am a beginner.
I said two tables, but I will need to do this across multiple tables.
You can use MySQL triggers. That way the data in the second table gets inserted/updated/deleted automatically.
MySQL: Using Triggers
When you INSERT new records, given that you don't want to do two inserts for some reason, using a trigger to insert into the second table will work. For UPDATE and DELETE you might want to look at the CASCADE option with foreign keys. If all you are doing is keeping the data consistent between tables, that's exactly what cascade is for.
When you create table2 you just add a foreign key like this:
FOREIGN KEY (id, username)
REFERENCES table1(id, username) ON UPDATE CASCADE ON DELETE CASCADE
Then whenever you alter table1 the changes will automatically get pushed through to table2.
A couple of prerequisites for this to work:
You have to use a storage engine that supports foreign keys, something like InnoDB and not MyISAM
You need to have an index on (id, username) in table1; the foreign key needs to match a key in the parent table (see the sketch below)
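Putting those pieces together, here's a minimal sketch of what the two tables could look like. The column types and the index name are assumptions, since the question only shows id and username:

CREATE TABLE table1 (
    id INT NOT NULL,
    username VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_id_username (id, username)  -- the key the foreign key will reference
) ENGINE=InnoDB;

CREATE TABLE table2 (
    id INT NOT NULL,
    username VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (id, username)
        REFERENCES table1 (id, username)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;

With this in place, deleting or updating a row in table1 is automatically reflected in table2; inserts still need a trigger (or a second INSERT).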
You should read the doc page for foreign keys. There are a couple other ways you can tweak them, and you should figure out what works best for your purposes.
You can certainly put triggers on your table1 to make parallel changes to your other tables as your application changes table1.
See here for the documentation: http://dev.mysql.com/doc/refman/5.0/en/trigger-syntax.html
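For example, here is a minimal sketch of a pair of triggers covering the insert/delete case from the question. The trigger names are my own; adjust the columns to your schema:

DELIMITER //
CREATE TRIGGER table1_after_insert
AFTER INSERT ON table1
FOR EACH ROW
    INSERT INTO table2 (id, username) VALUES (NEW.id, NEW.username)//

CREATE TRIGGER table1_after_delete
AFTER DELETE ON table1
FOR EACH ROW
    DELETE FROM table2 WHERE id = OLD.id//
DELIMITER ;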
But, you should think over your design. It will take multiple queries to do your inserts and updates; they'll just be done "behind your back" on the server. They'll still take time. Triggers can really slow things down.
Also, triggers are a little bit fragile. If you add a column to a table, you'll have to rework your triggers. Triggers are a pain to keep in a source-control system and a huge pain to test, so using them will make your application more troublesome to maintain.
Could you think of another approach to handling this need for duplication? Could you, for example, use a view or a join to present the data you need to your application program without actually duplicating tables and the rows in them? If you figure out how to do that you'll be much happier in the long run.
CREATE VIEW table2 AS
SELECT *
FROM table1;
will produce a "fake" table2 with the contents of table1.
Or if you're hoping to view only the test users in a second table, a view can do that for you too, for example:
CREATE VIEW table3 AS
SELECT *
FROM table1
WHERE usertype = 'test_user' ;
If you're using duplicate tables for "backup," that's a bad way to make sure your information is safe. Instead, you need to back up your MySQL server instance.
Formal relational database design principles teach us not to duplicate data, but instead to use views and joins to structure the data the way applications need to see it.
Related
Let's say you have got two tables like the following in a MySQL database:
TABLE people:
primary key: PERSON_ID,
NAME,
SURNAME, etc.
TABLE addresses:
primary key: ADDRESS_ID,
foreign key: PERSON_ID,
addressLine1, etc.
If you manage the creation of rows (in both tables) and the retrieval of data through PHP, do you still need to create a physical relationship in the database? If yes, why?
Yes. One concrete reason is faster retrieval of rows when you join the tables: creating a foreign key constraint automatically creates an index on the column.
So the Address table's schema should look like this (assuming the People table's primary key is PERSON_ID):
CREATE TABLE Address
(
Address_ID INT,
Person_ID INT,
......,
CONSTRAINT tb_pk PRIMARY KEY (Address_ID),
CONSTRAINT tb_fk FOREIGN KEY (Person_ID)
REFERENCES People(Person_ID)
)
Strictly speaking, you don't need to use FKs: careful indexing and well-written queries might seem sufficient. However, FKs, and certainly FK constraints, are very useful when it comes to securing data consistency (avoiding orphaned data, for example).
Suppose you wrote your application, everything is tested, and it works like a charm. Great, but who's to say that you'll be around every time something has to be changed? Are you going to maintain the code by yourself, or is it likely that someone else will end up doing a quick fix/tweak or implementing another feature down the road? In reality, you're never going to be the only one writing and maintaining the code, and even if you are, you're almost certainly going to encounter bugs as time passes. Foreign keys inform both your co-workers and you that data in tbl1 depends on data in tbl2 and vice versa. Just like comments, this makes the application easier to maintain.
Bugs are easier to detect. Imagine a method that deletes a record from tbl1 but forgets to update tbl2 to reflect the change. When this happens, the data is corrupted, but the query that caused it won't raise an error: the SQL is syntactically correct and the action it performs is the desired action. These kinds of bugs can remain hidden for quite some time, and by the time they're spotted, god knows how much data has been corrupted...
Lastly, and this is an argument that is used all too often: what if the connection to the DB is lost mid-way through a series of update/delete queries? I haven't actually seen this happen, but I don't know of anybody who doesn't write code to protect against just such a scenario: you might have edited tbl2, but the connection was lost before the query for tbl1 was sent. Again, we end up with corrupted data. FK constraints enable you to cascade certain actions to guard against this. Delete from tbl1 with an ON DELETE CASCADE rule, and you can rest assured that the related records are deleted from tbl2. In the same situation, ON DELETE RESTRICT can be a fairly useful rule, too.
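As a sketch of those two rules (the table and column names here are made up, following the tbl1/tbl2 shorthand above):

-- Deleting a tbl1 row now removes its tbl2 children in the same statement,
-- so a dropped connection can't leave orphans behind.
ALTER TABLE tbl2
    ADD CONSTRAINT fk_tbl2_tbl1 FOREIGN KEY (tbl1_id)
    REFERENCES tbl1 (id)
    ON DELETE CASCADE;

-- Alternatively, ON DELETE RESTRICT refuses to delete a tbl1 row
-- while tbl2 rows still reference it.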
Note that FK's aren't the ultimate answer to life, the universe and everything (that's 42 - as we all know), but they are a vital part of true relational database-designs.
Referential integrity is an article that you should read and comprehend.
There are two ways:
-The first is to handle everything on the coding end: your application manages what happens when a record is deleted or updated. But when you use a foreign key, you are enforcing the relation, and the DB won't allow you to delete records that a foreign key constraint still references; that is exactly what you want when the related records must not be deleted, and situations like that do occur.
-The second way is to manage things on the DB side. If you have 1-to-many or many-to-many relations in the database, foreign keys will be very useful. They also have some good referential actions - RESTRICT, CASCADE, SET NULL, NO ACTION - that can do some of the work for you (see the sketch below).
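For instance, a sketch of ON DELETE SET NULL against the people/addresses tables from the question; the constraint name is my own, and the column has to be nullable for SET NULL to work:

ALTER TABLE addresses
    MODIFY PERSON_ID INT NULL,
    ADD CONSTRAINT fk_addr_person FOREIGN KEY (PERSON_ID)
        REFERENCES people (PERSON_ID)
        ON DELETE SET NULL;  -- deleting a person keeps the addresses, clears the link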
Quick question.
In my user database I have 5 separate tables all containing different information. 4 tables are connected by foreign key to the primary key of the first table.
I want to trigger row inserts on the other 4 tables when I do an insert on the first (primary) table. I thought ON UPDATE CASCADE would do this for me, but after trying it I realised it did not... I know, the clue is in the name: ON UPDATE!
I also tried multiple triggers for the same event on the same table, but found this was not possible either.
What I am planning on doing is putting a trigger on the first to INSERT on the second and then putting a trigger on the second to insert on the third......etc
Would just like to know if this is a wise thing to do or not or if I am missing a better and simpler way of doing this.
Any help/advice much appreciated.
Based on the given information, it "feels" as if there might be a flaw in the database design if each of the child tables requires a row for every single row in the parent table. There is a reason that "ON INSERT CASCADE" does not exist; it is typically not considered meaningful.
The first thought that comes to mind is that the child tables should actually be part of the parent table; it sounds as if there is a one-to-one relationship. It still may make sense to have separate tables from an organizational standpoint (and size of records), but it is something to think about.
If there is not a one-to-one relationship, then the ability to add meaningful data beyond default values to the child tables would imply there might be a bit more normalization of data required. If the only values to be added are NULLs, then one could maybe argue that there is no real point in having the record because a LEFT JOIN could produce the same results without that record.
Having said all that, if it is required, I would think that it would be better to have a single trigger on the parent table add all the records to the child tables rather than chain them in several triggers. That way the logic would be contained in a single location.
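A minimal sketch of that single-trigger approach, with hypothetical table names since the actual schema isn't given:

DELIMITER //
CREATE TRIGGER parent_after_insert
AFTER INSERT ON parent
FOR EACH ROW
BEGIN
    -- one trigger adds the matching row to all four child tables
    INSERT INTO child1 (parent_id) VALUES (NEW.id);
    INSERT INTO child2 (parent_id) VALUES (NEW.id);
    INSERT INTO child3 (parent_id) VALUES (NEW.id);
    INSERT INTO child4 (parent_id) VALUES (NEW.id);
END//
DELIMITER ;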
Not knowing your structure (the information you need in each of these tables is pertinent to answering correctly), I can only guess that a trigger might not be the way to do this. If your tables have fields beyond what is in table1 and those fields don't have default values, how will you get values for them in the trigger? Personally, I would use a stored proc: insert into table1, get the id value back from the insert, then insert into the other tables with the additional information needed, and put it all in a transaction so that if one insert fails, all are rolled back.
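A sketch of that stored-proc-plus-transaction idea; all table, column, and parameter names are hypothetical, and the RESIGNAL in the handler needs MySQL 5.5 or later:

DELIMITER //
CREATE PROCEDURE insert_with_children(IN p_name VARCHAR(50), IN p_extra VARCHAR(50))
BEGIN
    -- if any insert fails, undo all of them and re-raise the error
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;
    INSERT INTO parent (name) VALUES (p_name);
    SET @new_id = LAST_INSERT_ID();  -- per-connection, safe under concurrency
    INSERT INTO child1 (parent_id, extra) VALUES (@new_id, p_extra);
    INSERT INTO child2 (parent_id) VALUES (@new_id);
    COMMIT;
END//
DELIMITER ;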
What is the purpose of a secondary key? Say I have a table that logs all check-ins (similar to Foursquare), with columns id, user_id, location_id, post, time, and there can be millions of rows. Many people have said to use secondary keys to speed up queries.
Why does this work? And should both user_id and location_id be secondary keys?
I'm using MySQL btw...
Edit: There will be a page that lists/calculates all the check-ins for a particular user, and another page that lists all the users who have checked in to a particular location
MySQL queries
Type 1
SELECT location_id FROM checkin WHERE user_id = 1234
SELECT user_id FROM checkin WHERE location_id = 4321
Type 2
SELECT COUNT(location_id) as num_users FROM checkin
SELECT COUNT(user_id) as num_checkins FROM checkin
The key (also called an index) is for speeding up queries. If you want to see all check-ins for a given user, you need a key on the user_id field. If you want to see all check-ins for a given location, you need an index on the location_id field. You can read more in the MySQL documentation.
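In practice that means adding two secondary keys, assuming the table is called checkin as in the queries above (the index names are my own):

ALTER TABLE checkin
    ADD KEY idx_user (user_id),          -- serves: WHERE user_id = ...
    ADD KEY idx_location (location_id);  -- serves: WHERE location_id = ...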
I want to comment on your question and your examples.
Let me just suggest strongly that, since you are using MySQL, you make sure your tables use the InnoDB engine; there are many reasons for this that you can research on your own.
One important feature of InnoDB is that you have referential integrity. What does that mean? In your checkin table, you have a foreign key of user_id which is the primary key of the user table. With referential integrity, MySQL will not let you insert a row with a user_id that doesn't exist in the user table. Using MyISAM, you can. That alone should be enough to make you want to use the innodb engine.
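To make that concrete, here's a sketch assuming a foreign key from checkin.user_id to the user table, and that no user 9999 exists:

INSERT INTO checkin (user_id, location_id, post, time)
VALUES (9999, 1, 'hello', NOW());
-- InnoDB with the FK in place: fails with error 1452
--   ("a foreign key constraint fails"), so no orphan row is created.
-- MyISAM: the FOREIGN KEY clause is parsed but ignored, and the
--   orphan row is silently accepted.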
To your question about keys/indexes, essentially when a table is defined and a key is declared for a column or some combination of columns, mysql will create an index.
Indexes are essential for performance as a table grows with the insert of rows.
All relational databases, and document databases too, depend on an implementation of B-tree indexing. What B-trees are very good at is finding an item (or determining that it isn't there) in a predictable number of lookups. So when people talk about the performance of a relational database, the essential building block is the B-tree index, which is created via KEY clauses or with ALTER TABLE or CREATE INDEX statements.
To understand why this is, imagine that your user table was simply a text file, with one line per row, perhaps separated by commas. As you add a row, a new line in the text file gets added at the bottom.
Eventually you get to the point that you have 10,000 lines in the file.
Now you want to find out if you entered a line for one particular person with the last name of Smith. How can you find that out?
Without any sort of ordering of the file, or a separate index, you have only one option: start at the first line and scan through every line in the table looking for a match. Even if you find a Smith, that might not be the only 'Smith' in the table, so you have to read the entire file from top to bottom every time you do this search.
Obviously as the table grows the performance of searching gets worse and worse.
In relational database parlance, this is known as a "table scan". The database has to start at the first row and scan through reading every row until it gets to the end.
Without indexes, relational databases still work, but they are highly dependent on IO performance.
With a Btree index, the rows you want to find are found in the index first. The indexes have a pointer directly to the data you want, so the table no longer needs to be scanned, but instead the individual data pages required are read. This is how a database can maintain adequate performance even when there are millions or 10's or 100's of millions of rows.
To really start to gain insight into how mysql works, you need to get familiar with EXPLAIN EXTENDED ... and start looking at the explain plans for queries. Simple ones like those you've provided will have simple plans that show you how many rows are being examined to get a result and whether or not they are using one or more indexes.
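For example, a sketch using the first query from the question and the hypothetical idx_user index from above:

EXPLAIN EXTENDED
SELECT location_id FROM checkin WHERE user_id = 1234;
-- With a key on user_id, the plan shows type: ref, key: idx_user and a
-- small rows estimate; without one, it shows type: ALL, i.e. a table scan.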
For your summary queries, indexes are not helpful because you are doing a COUNT(). The table will need to be scanned when you have no other criteria constraining the search.
I did notice what looks like a mistake in your summary queries. Just based on your labels, I would think that these are the right queries to get what you would want given your column alias names.
SELECT COUNT(DISTINCT user_id) as num_users FROM checkin
SELECT COUNT(*) as num_checkins FROM checkin
This is yet another reason to use InnoDB, which when properly configured has a data cache (innodb buffer pool) similar to other rdbms's like oracle and sql server. MyISAM doesn't cache data at all, so if you are repeatedly querying the same sorts of queries that might require a lot of IO, MySQL will have to do all that data reading work over and over, whereas with InnoDB, that data could very well be sitting in cache memory and have the result returned without having to go back and read from storage.
Primary vs Secondary
There really is no such distinction internally. A primary key is special because it allows the database to find one single row. Primary keys must be unique, and to reflect that, the associated B-tree index is unique, which simply means that it will not allow two entries with the same key data to exist in the index.
Whether or not an index is unique is an excellent tool that allows you to maintain the consistency of your database in many other cases. Let's say you have an 'employee' table with the SS_Number column to store social security #. It makes sense to have an index on that column if you want the system to support finding an employee by SS number. Without an index, you will tablescan. But you also want to have that index be unique, so that once an employee with a SS# is inserted, there is no way the database will let you enter a duplicate employee with the same SS#.
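As a sketch (the table and column names follow the SS# example above; the index name is my own):

CREATE UNIQUE INDEX idx_ssn ON employee (SS_Number);
-- Lookups by SS# no longer tablescan, and inserting a second employee
-- with an existing SS_Number now fails with a duplicate-key error (1062).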
But to demystify this for you: when you define your tables and declare keys, these indexes are simply created for you and used automagically in most cases.
It's when you aren't dealing with keys (primary or foreign), as in the example of usernames, first & last names, SS#'s, etc., that you need to know how to create an index yourself, because you are searching (using WHERE clause criteria) on one or more columns that aren't keys.
I've got an application in PHP & MySQL where users write to and read from a particular table. One of the write modes is a batch, doing only one query with multiple values. The table has an ID which auto-increments.
The idea is that for each row in the table that is inserted, a copy is inserted in a separate table, as a history log, including the ID that was generated.
The problem is that multiple users can do this at once, and I need to be sure that the IDs loaded are the correct ones.
Can I be sure that if I do for example:
INSERT INTO table1 (id, username) VALUES (NULL,'test1'),(NULL,'test2')
that the ids generated are sequential?
How can I get the Id's that were just loaded, and be sure that those are the ones that were just loaded?
I've thought of LOCK TABLES, but the users shouldn't notice this.
Hope I made myself clear...
Building an application that requires generated IDs to be sequential usually means you're taking a wrong approach - what happens when you have to delete a value some day, are you going to re-sequence the entire table? Much better to just let the values fall as they may, using a primary key to prevent duplication.
Based on the current implementation of MyISAM and InnoDB, yes. However, this is not guaranteed to be so in the future, so I would not rely on it.
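If you do rely on it, a common pattern is to read back the first generated id and the row count right after the insert. This sketch assumes InnoDB's "consecutive" auto-increment lock mode, where a simple multi-row insert gets one contiguous block of ids:

INSERT INTO table1 (username) VALUES ('test1'), ('test2');
SELECT LAST_INSERT_ID() AS first_id,  -- first id generated by the INSERT
       ROW_COUNT()      AS n_rows;    -- rows inserted by that statement
-- The ids used are first_id .. first_id + n_rows - 1. Both functions are
-- per-connection, so other users' concurrent inserts can't interfere.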
I am trying to find the fastest way to insert data into a table (data from a select)
I always clear the table:
TRUNCATE TABLE table;
Then I do this to insert the data:
INSERT INTO table(id,total) (SELECT id, COUNT(id) AS Total FROM table2 GROUP BY id);
Someone told me I shouldn't do this.
He said this would be much faster:
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey)) SELECT id, count(id) AS total FROM table2 GROUP BY id
Any ideas on this one?
I think my solution is cleaner, because I don't have to check for the table.
This will be ran in a cron job a few times a day
EDIT: I wasn't clear. The TRUNCATE is always run. It's just a matter of the fastest way to insert all the data.
I also think your solution is cleaner, plus the solution by "someone" looks to me to have some problems:
it does not actually delete old data that may be in the table
create table...select will create table columns with types based on what the select returns. That means changes in the table structure of table2 will propagate to table. That may or may not be what you want. It at least introduces an implicit coupling, which I find to be a bad idea.
As for performance, I see no reason why one should be faster than the other. So the usual advice applies: Choose the cleanest, most maintainable solution, test it, only optimize if performance is a problem :-).
Your solution would be my choice; the performance loss (if any, and I'm not sure there is one, since you don't drop/create the table and re-compute column types) is negligible, and IMHO it is outweighed by cleanliness.
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey))
SELECT id, count(id) AS total
FROM table2
GROUP BY id
This will not delete old values from the table.
If that's what you want, it will be faster indeed.
Perhaps something has been lost in the translation between your Someone and yourself. One possibility s/he might have been referring to is DROP/SELECT INTO vs TRUNCATE/INSERT.
I have heard that the latter is faster as it is minimally logged (but then again, what's the eventual cost of the DROP here?). I have no hard stats to back this up.
I agree with "sleske"s suggestion in asking you test it and optimize the solution yourself. DIY!
Every self-respecting DB will give you the opportunity to roll back your transaction.
1. Rolling back your INSERT INTO... requires the DB to keep track of every row inserted into the table
2. Rolling back the CREATE TABLE... is super easy for the DB - Simply get rid of the table.
Now, if you were designing & coding the DB, which would be faster? 1 or 2?
"someone"s suggestion DOES have merit especially if you are using Oracle.
I'm sure that any time difference is indistinguishable, but yours is IMHO preferable because it's one SQL statement rather than two; any change in your INSERT statement doesn't require more work on the other statement; and yours doesn't require the host to validate that your INSERT matches the fields in the table.
From the manual: Beginning with MySQL 5.1.32, TRUNCATE is treated for purposes of binary logging and replication as DROP TABLE followed by CREATE TABLE — that is, as DDL rather than DML. This is due to the fact that, when using InnoDB and other transactional storage engines where the transaction isolation level does not allow for statement-based logging (READ COMMITTED or READ UNCOMMITTED), the statement was not logged and replicated when using STATEMENT or MIXED logging mode.
You can simplify your insert to:
INSERT INTO table
( SELECT id, COUNT(id) FROM table2 GROUP BY id );