Question to all Yii2 normalization geeks out there.
Where is the best place to set non-normalized columns in Yii2?
For example, I have the models Customer, Branch, CashRegister, and Transaction.
In a perfect world, with a perfectly normalized database, our Transaction model would have only the cashregister_id, the CashRegister would store branch_id, and the Branch would store customer_id. However, for performance reasons, we sometimes find ourselves obliged to keep a non-normalized Transaction model containing the following:
cashregister_id
branch_id
customer_id
When creating a transaction, I want to store all 3 values. Setting
$transaction->branch_id = $transaction->cashRegister->branch_id;
$transaction->customer_id = $transaction->cashRegister->branch->customer_id;
however in the controller does not feel correct.
One solution would be to do this in afterSave() in the Transaction model and make those columns read-only. This seems better, but still not perfect.
I wanted to know what is the best practice or where is the best place to set those duplicate columns, to make sure that the data integrity is maintained?
The following is a DB-only solution.
I assume your relations are:
A customer has many branches
A branch has many cashregisters
A cashregister has many transactions
The corresponding schema could be:
create table customers (
customer_id int auto_increment,
customer_data text,
primary key (customer_id)
);
create table branches (
branch_id int auto_increment,
customer_id int not null,
branch_data text,
primary key (branch_id),
index (customer_id),
foreign key (customer_id) references customers(customer_id)
);
create table cashregisters (
cashregister_id int auto_increment,
branch_id int not null,
cashregister_data text,
primary key (cashregister_id),
index (branch_id),
foreign key (branch_id) references branches(branch_id)
);
create table transactions (
transaction_id int auto_increment,
cashregister_id int not null,
transaction_data text,
primary key (transaction_id),
index (cashregister_id),
foreign key (cashregister_id) references cashregisters(cashregister_id)
);
(Note: This should be part of your question - so we wouldn't need to guess.)
If you want to include redundant columns (branch_id and customer_id) in the transactions table, you should make them part of the foreign key. But first you will need to include a customer_id column in the cashregisters table and also make it part of the foreign key.
The extended schema would be:
create table customers (
customer_id int auto_increment,
customer_data text,
primary key (customer_id)
);
create table branches (
branch_id int auto_increment,
customer_id int not null,
branch_data text,
primary key (branch_id),
index (customer_id, branch_id),
foreign key (customer_id) references customers(customer_id)
);
create table cashregisters (
cashregister_id int auto_increment,
branch_id int not null,
customer_id int not null,
cashregister_data text,
primary key (cashregister_id),
index (customer_id, branch_id, cashregister_id),
foreign key (customer_id, branch_id)
references branches(customer_id, branch_id)
);
create table transactions (
transaction_id int auto_increment,
cashregister_id int not null,
branch_id int not null,
customer_id int not null,
transaction_data text,
primary key (transaction_id),
index (customer_id, branch_id, cashregister_id),
foreign key (customer_id, branch_id, cashregister_id)
references cashregisters(customer_id, branch_id, cashregister_id)
);
Notes:
Any foreign key constraint needs an index in the child (referencing) and the parent (referenced) table, which can support the constraint check. The given column order in the keys allows us to define the schema with only one index per table.
A foreign key should always reference a unique key in the parent table. However, in this example the combination of referenced columns is (at least) implicitly unique, because it contains the primary key. In almost any other RDBMS you would need to define the indices in the "middle" tables (branches and cashregisters) as UNIQUE. This is not necessary in MySQL (InnoDB), which as a non-standard extension accepts a non-unique index as the target of a foreign key. Note that MySQL 8.4 and later reject such keys by default; there you would either declare these indices UNIQUE or set restrict_fk_on_non_standard_key to OFF.
The composite foreign keys will take care of the data integrity/consistency. Example: If you have a branch entry with branch_id = 2 and customer_id = 1, you won't be able to insert a cashregister with branch_id = 2 and customer_id = 3, because this would violate the foreign key constraint.
You will probably need more indices for your queries. Most probably you will need cashregisters(branch_id) and transactions(cashregister_id). With these indices you might not even need to change your ORM relation code (though AFAIK Yii supports composite foreign keys).
You can define relations like "customer has many transactions". Previously you would need to use "has many through", involving two middle/bridge tables. This will save you two joins in many cases.
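To make the notes on composite keys concrete, here is a small sketch against the extended schema above. The sample rows and ids are made up for illustration. It assumes a MySQL/InnoDB server that accepts the schema as written, and that no trigger rewrites customer_id on insert.

```sql
-- Hypothetical sample data for the extended schema above.
insert into customers (customer_id, customer_data) values
  (1, 'customer 1'),
  (3, 'customer 3');
insert into branches (branch_id, customer_id, branch_data) values
  (2, 1, 'branch 2 of customer 1');

-- Consistent pair (customer 1, branch 2): accepted.
insert into cashregisters (branch_id, customer_id, cashregister_data)
values (2, 1, 'cashregister 1');

-- Inconsistent pair (customer 3, branch 2): rejected with a foreign key
-- error, because branches has no row with customer_id = 3 and branch_id = 2.
insert into cashregisters (branch_id, customer_id, cashregister_data)
values (2, 3, 'cashregister 2');

-- "Customer has many transactions" without joining the middle tables.
select t.transaction_id, t.transaction_data
from transactions t
where t.customer_id = 1;
```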
If you want the redundant data to be maintained by the database, you can use the following triggers:
create trigger cashregisters_before_insert
before insert on cashregisters for each row
set new.customer_id = (
select b.customer_id
from branches b
where b.branch_id = new.branch_id
)
;
delimiter $$
create trigger transactions_before_insert
before insert on transactions for each row
begin
declare new_customer_id, new_branch_id int;
select c.customer_id, c.branch_id into new_customer_id, new_branch_id
from cashregisters c
where c.cashregister_id = new.cashregister_id;
set new.customer_id = new_customer_id;
set new.branch_id = new_branch_id;
end $$
delimiter ;
Now you can insert new entries without defining the redundant values:
insert into cashregisters (branch_id, cashregister_data) values
(2, 'cashregister 1'),
(1, 'cashregister 2');
insert into transactions (cashregister_id, transaction_data) values
(2, 'transaction 1'),
(1, 'transaction 2');
See demo: https://www.db-fiddle.com/f/fE7kVxiTcZBX3gfA81nJzE/0
If your business logic allows the relations to be updated, you should extend your foreign keys with ON UPDATE CASCADE. This will propagate the changes through the relation chain down to the transactions table.
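As a sketch of what that could look like, the two lower tables of the extended schema can declare their foreign keys with ON UPDATE CASCADE. Only the constraint clauses differ from the schema above; the ids in the final update are made up.

```sql
create table cashregisters (
  cashregister_id int auto_increment,
  branch_id int not null,
  customer_id int not null,
  cashregister_data text,
  primary key (cashregister_id),
  index (customer_id, branch_id, cashregister_id),
  foreign key (customer_id, branch_id)
    references branches(customer_id, branch_id)
    on update cascade
);

create table transactions (
  transaction_id int auto_increment,
  cashregister_id int not null,
  branch_id int not null,
  customer_id int not null,
  transaction_data text,
  primary key (transaction_id),
  index (customer_id, branch_id, cashregister_id),
  foreign key (customer_id, branch_id, cashregister_id)
    references cashregisters(customer_id, branch_id, cashregister_id)
    on update cascade
);

-- Hypothetical example: move branch 2 to customer 3 (customer 3 must exist).
-- The new customer_id is cascaded to the branch's cashregisters and from
-- there to their transactions.
update branches set customer_id = 3 where branch_id = 2;
```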
I had a similar problem once, and using afterSave() or beforeSave() looked like a great solution at the beginning, but it eventually resulted in hard-to-maintain spaghetti code. I ended up creating a separate component for managing such relations. Something like:
class TransactionsManager extends Component {
public function createTransaction(TransactionInfo $info, CashRegister $register) {
// magic
}
}
Then you're not creating or updating the Transaction model directly; you're always using this component, which encapsulates all the logic. ActiveRecord then works more like a data representation and does not contain any advanced business logic. In some cases it looks more complicated than $model->load($data) && $model->save(), but it is much easier to maintain when you have all the logic in one place and you don't need to debug chains of save() calls (one model runs save() of a different model in afterSave(), which runs save() of yet another model in afterSave(), and so on).
Related
I am extending a product sales plugin and am trying to understand how WordPress handles database relations. I am building tables on activation using dbDelta. An example of a table schema would be:
$table_schema = [
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`people_id` bigint(20) DEFAULT NULL,
`order_id` bigint(20) DEFAULT NULL,
`order_status` varchar(11) DEFAULT NULL,
`order_date` datetime DEFAULT NULL,
`order_total` decimal(13,2) DEFAULT NULL,
`accounting` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `people_id` (`people_id`),
KEY `order_id` (`order_id`)
) $collate;",
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_order_product` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) DEFAULT NULL,
`product_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `product_id` (`product_id`)
) $collate;"
];
I see that id in each table is the PRIMARY KEY, but what does declaring the other KEYs actually do? I have read that WordPress uses MyISAM, which doesn't actually build foreign key connections. While these tables may point to other tables that already exist, does declaring KEY order_id (order_id) in this example create a variable of sorts called order_id that any other table can use as a reference? Is this code specifically connecting one table's attributes to another table's attributes (it doesn't appear to be)? After these tables are built, I can inspect them in phpMyAdmin and see that there are indexes assigned but no foreign key constraints. How does this code create tables that point one table at another to build relations?
KEY `foo_bar` (`order_id`)
"KEY" is the same as "INDEX". It specifies that a separate data structure is maintained for the efficient access of the table via the column order_id.
foo_bar is the name of the index. It has no special meaning, and has very few uses. For example, DROP KEY foo_bar; is the way to get rid of the index.
In MyISAM, a "FOREIGN KEY" is allowed, but ignored. In InnoDB, it does two things:
Create an index if one is not already provided
Provide a constraint. The default is effectively "complain if the other table does not already have the value referenced".
Having an index is important for performance. The index above makes this
SELECT ... WHERE order_id = 1234 ...
run in milliseconds, even if there are billions of rows in the table. Without the index, the query would take minutes or hours.
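One way to check whether a query can actually use the index is EXPLAIN. This is only a sketch: it assumes the default WordPress table prefix wp_, which may differ on your installation.

```sql
-- List the indexes that exist on the table.
SHOW INDEX FROM wp_plugin_orders;

-- Ask the optimizer how it would run the lookup. When the index is used,
-- the `key` column of the output names it (here: order_id) instead of NULL.
EXPLAIN SELECT * FROM wp_plugin_orders WHERE order_id = 1234;
```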
A PRIMARY KEY is a UNIQUE key, which is an INDEX.
UNIQUE(widget) says that only one row can have a particular value of widget in the table.
PRIMARY KEY(id) says that each row is uniquely identified by the column id. InnoDB really wants each table to have a PK.
"id" is a convention (not a requirement) for the name of the PK. It is also INT AUTO_INCREMENT by convention. You may or may not actually ever touch id.
Tables can be related to each other in 3 main ways:
1:1 -- They share the same unique key. This is rarely useful; you may as well have a single table.
1:many -- An "order" has several "items" in it (one-order : many-items). This is usually handled by order_id being a column in the items table.
many:many -- students_classes -- each student is in many classes; each class has many students. This is implemented via a mapping table that has (usually) only two columns: student_id and class_id (no id is needed) and PRIMARY KEY(student_id, class_id) and INDEX(class_id, student_id). Those two indexes make it efficient to go from a known student to their classes, and vice versa.
Another convention for the PK of a table is to include the table name. (It is clutter to do that for other columns, such as order_status.) I was assuming this convention for student_id and class_id.
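A minimal sketch of the many:many mapping table described above. The table name and the column types are assumptions; the two keys are the ones named in the text.

```sql
CREATE TABLE students_classes (
  student_id INT UNSIGNED NOT NULL,
  class_id   INT UNSIGNED NOT NULL,
  PRIMARY KEY (student_id, class_id),  -- student -> their classes
  INDEX (class_id, student_id)         -- class -> its students
) ENGINE=InnoDB;

-- Served by the PRIMARY KEY:
SELECT class_id FROM students_classes WHERE student_id = 42;

-- Served by the secondary index:
SELECT student_id FROM students_classes WHERE class_id = 7;
```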
But now I am confused by your plugin_orders -- it has both id and order_id. If that table describes "orders", then I would expect order_id to be the PK instead of id.
And, if order_product is a list of all the "products" in each "order", then I would expect you to have the 1:many pattern.
What indexes to have?
PRIMARY KEY to uniquely identify each row -- either id or some column (or combination of columns) that are unique.
Other columns, as needed, for the SELECTs, UPDATEs, and DELETEs that you have. Do not blindly add indexes before having some clues of the queries that might need them.
Indexes sometimes help in sorting:
SELECT ... ORDER BY last_name, first_name;
together with
INDEX(last_name, first_name)
Indexes provide performance; FKs provide integrity checks. Neither is "required"; both are "desirable".
MyISAM is ancient; you should change to InnoDB.
Then do something like
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE ...
In this example, the Optimizer will perform the query something like this:
Look at the WHERE to see which table is best filtered by the conditions there. Declare that to be the first table to work with.
Scan through the first table, using an index if practical.
For each row in the first table, reach into the second table.
Reaching into the second table would probably be done via INDEX(order_id) on the second table. This would make the JOIN fast and efficient.
Both tables have INDEX(order_id), but that is not relevant.
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE o.people_id = 123 -- note
Pick o as the first table due to filtering on people_id
Use o's INDEX(people_id) to rapidly find the o rows that are relevant.
etc (op is the second table)
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE op.product_id = 9887 -- changed again
Pick op as the first table due to filtering on product_id
Use op's INDEX(product_id) to rapidly find the op rows that are relevant.
etc (o is the second table this time)
Let's imagine a simple real-world customer-loan scenario, where a loan cannot exist without a customer; logically, the relationship should therefore be a many-to-one identifying relationship with the following structure:
CREATE TABLE `customer` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`name` VARCHAR(50)
) ENGINE = InnoDB;
CREATE TABLE `loan` (
`id` INT NOT NULL AUTO_INCREMENT,
`customer_id` INT NOT NULL,
`amount` FLOAT,
`currency` VARCHAR(10),
PRIMARY KEY (`id`, `customer_id`),
CONSTRAINT `identifying_fk` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE = InnoDB;
On the other hand, the same logic can technically be implemented as a many-to-one non-identifying mandatory relationship with the following structure:
CREATE TABLE `customer` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`name` VARCHAR(50)
) ENGINE = InnoDB;
CREATE TABLE `loan` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
`customer_id` INT NOT NULL,
`amount` FLOAT,
`currency` VARCHAR(10),
CONSTRAINT `non-identifying_fk` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE = InnoDB;
Question: What are the advantages and disadvantages of using an identifying relationship over a non-identifying relationship, or vice versa? Are there any technical reasons for choosing one over the other?
NB. One disadvantage of using an identifying relationship is the composite PRIMARY KEY, which is generally difficult to maintain.
For example, the PHP Doctrine ORM does not support operating on such a composite key, where one id is auto-generated and the second key (the foreign key) is the identifier of the parent entity.
If you have an auto_increment column, then that should be the primary key. In general, I avoid composite primary keys. They just introduce scope for error in foreign key definitions and join conditions. You also point out the limitation when using other tools.
I would expect this question for an n-m relationship. That is one case where there is a good argument for a composite primary key. However, in your case, loans have only one customer, so the second method seems more "correct".
Meanwhile I read about the difference between identifying relationships and non-identifying relationships.
In your example, you have a many to one relationship. As such, the loans do not qualify for an identifying relationship, because the customer id is not sufficient to identify a loan. Thus the relationship is non-identifying.
If each customer can have only one loan, there would be a one to one relationship between the loans and the customers. The customer id would be sufficient to identify a loan, thus we have an identifying relationship. In this case, it would be a good choice to set the customer_id column of the loans table as a primary key.
Identifying relationships are also used with the link table in a many to many relationship.
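For the one-loan-per-customer case described above, the loan table could look roughly like this. It is a sketch that reuses the column names from the question; the constraint name is made up.

```sql
CREATE TABLE `loan` (
  `customer_id` INT NOT NULL,
  `amount` FLOAT,
  `currency` VARCHAR(10),
  -- The parent's id alone identifies the loan, so it is the primary key.
  PRIMARY KEY (`customer_id`),
  CONSTRAINT `loan_customer_fk` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE = InnoDB;
```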
I have an InnoDB MySQL database with a table that needs to be able to connect to one of 26 other tables via a foreign key. Each record will only connect to one of these 26 at a time. The table will probably consist of no more than 10,000 records. Is there an alternative way to do this?
-- -----------------------------------------------------
-- Table `db_mydb`.`tb_job`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `db_mydb`.`tb_job` (
`job_id` INT(11) NOT NULL AUTO_INCREMENT ,
// Removed 26 other fields that the table requires
`job_foreignkey_a_id` INT(11) NULL DEFAULT NULL ,
`job_foreignkey_b_id` INT(11) NULL DEFAULT NULL ,
`job_foreignkey_c_id` INT(11) NULL DEFAULT NULL ,
// Removed the other 23 foreign keys fields that are the same
PRIMARY KEY (`job_id`) ,
CONSTRAINT `fka_tb_job_tb`
FOREIGN KEY (`job_foreignkey_a_id` )
REFERENCES `db_mydb`.`tb_foreignkey_a` (`foreignkey_a_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fkb_tb_job_tb`
FOREIGN KEY (`job_foreignkey_b_id` )
REFERENCES `db_mydb`.`tb_foreignkey_b` (`foreignkey_b_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fkc_tb_job_tb`
FOREIGN KEY (`job_foreignkey_c_id` )
REFERENCES `db_mydb`.`tb_foreignkey_c` (`foreignkey_c_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
// Removed the other 23 foreign keys constraints that are the same
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
CREATE INDEX `fka_tb_job_tb` ON `db_mydb`.`tb_job` (`job_foreignkey_a_id` ASC) ;
CREATE INDEX `fkb_tb_job_tb` ON `db_mydb`.`tb_job` (`job_foreignkey_b_id` ASC) ;
CREATE INDEX `fkc_tb_job_tb` ON `db_mydb`.`tb_job` (`job_foreignkey_c_id` ASC) ;
// Removed the other 23 foreign keys indexes that are the same
This is the problem of generic foreign keys, which MySQL and friends tend not to support. There are two ways you can do this.
The first, as you have done, is nullable foreign keys, one for every type.
The other, as in Django's Content Types, is to have a join table, each row having a row id and a field that specifies the table to look up on. Your code then has to formulate the SQL query depending on the contents of the field. It works well, but has limitations.
The downside of the first one is bloat, but it brings you the upsides of normal FKs, i.e. referential integrity and SQL joins etc, both of which are very valuable. You can't get those with the second method.
It depends on whether you want to maintain the foreign key constraint. You can have one table that references one of the other tables by a key plus a table type. The problem is that you will lose the foreign key constraint. Of course, if you could create a function-based constraint, that would work for you, but function-based constraints are not available in MySQL. Alternatively, you can enforce the relationship using a trigger.
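As an illustration of the trigger idea, applied to the nullable-column layout from the question: the sketch below rejects a row that sets more than one of the reference columns. Only the three columns shown in the question are checked (the other 23 would be added the same way), the trigger name is made up, and a matching BEFORE UPDATE trigger would be needed as well.

```sql
use db_mydb;

delimiter $$
create trigger tb_job_before_insert
before insert on tb_job for each row
begin
  -- In MySQL a comparison evaluates to 1 (true) or 0 (false),
  -- so the sum counts how many references are set.
  if (new.job_foreignkey_a_id is not null)
   + (new.job_foreignkey_b_id is not null)
   + (new.job_foreignkey_c_id is not null) > 1 then
    signal sqlstate '45000'
      set message_text = 'a job may reference only one parent table';
  end if;
end $$
delimiter ;
```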
Yes, you can do that. These two StackOverflow answers illustrate the underlying principles in a slightly different context.
Same data from different entities in Database - Best Practice - Phone numbers example
Different user types / objects own content in same table - how?
Using MySQL, you'll need to replace critical CHECK() constraints with foreign key references. This doesn't work in the most general case for MySQL, but it does work in this particular application.
If this isn't enough information to get you going, leave me a comment, and I'll try to expand this answer a little more.
OK, so I'm a newbie here at SQL.
I'm setting up my tables, and I'm getting confused about indexes, keys, and foreign keys.
I have a users table, and a projects table.
I want to use the users (id) to attach a project to a user.
This is what I have so far:
DROP TABLE IF EXISTS projects;
CREATE TABLE projects (
id int(8) unsigned NOT NULL,
user_id int(8),
name varchar(120) NOT NULL,
description varchar(300),
created_at date,
updated_at date,
PRIMARY KEY (id),
KEY users_id (user_id)
) ENGINE=InnoDB;
ALTER TABLE projects (
ADD CONSTRAINT user_projects,
FOREIGN KEY (user_id) REFERENCES users(id),
ON DELETE CASCADE
)
So what I'm getting lost on is: what are the differences between a key, an index, a constraint and a foreign key?
I've been looking online and can't find a newbie explanation for it.
PS. I'm using phpactiverecord and have the relationships set up in the models
user-> has_many('projects');
projects -> belongs_to('user');
Not sure if that has anything to do with it, but thought i'd throw it in there..
Thanks.
EDIT:
I thought it could possibly have something to do with Navicat, so I went into WampServer -> phpMyAdmin and ran this...
DROP TABLE IF EXISTS projects;
CREATE TABLE projects (
id int(8) unsigned NOT NULL,
user_id int(8) NOT NULL,
name varchar(120) NOT NULL,
description varchar(300),
created_at date,
updated_at date,
PRIMARY KEY (id),
KEY users_id (user_id),
FOREIGN KEY (user_id) REFERENCES users(id)
) ENGINE=InnoDB;
Still nothing... :(
Expanding on Shamil's answers:
INDEX is similar to the index at the back of a book. It provides a simplified look-up for the data in that column so that searches on it are faster. Fun details: both MyISAM and InnoDB store ordinary indexes as B-trees; hash indexes are the default only in the MEMORY engine. A B-tree breaks the data down into sorted, nested groups, so the search depth grows only logarithmically with the table size, and both single-key lookups and range lookups are efficient. A hash index is faster for single-key equality lookups but cannot serve range queries (try to remember the Big O of hashtables and balanced trees).
UNIQUE INDEX is an index in which each row must have a unique value for that column or group of columns. This is useful for preventing duplication, e.g. for an email column in a users table where you want only one account per email address. An important note: in MySQL, an INSERT ... ON DUPLICATE KEY UPDATE statement will execute the update if it finds a match on any unique index, even if it's not your primary key. This is a pitfall to be aware of when using such statements on tables with unique indexes. You may wind up unintentionally overwriting records! Another note about unique indexes in MySQL: per the ANSI SQL-92 standard, NULL values are not considered equal to each other, which means you can have multiple NULL values in a nullable unique-indexed column. Although it's a standard, some other RDBMSes differ in their implementation of this.
PRIMARY KEY is a UNIQUE INDEX that is the identifier for any given row in the table. As such, it must not be null, and in InnoDB it is stored as the clustered index. Clustered means that the rows themselves are stored in primary key order. This makes searches on the primary key significantly faster than on any other index type (in InnoDB, only the PK can be the clustered index). Note that clustering also causes concerns with INSERT statements if your PK is not AUTO_INCREMENTed, as MySQL will have to shift data around on disk if you insert a new row whose PK sorts lower than existing ones. This could hamper your DB performance. So unless you're certain you know what you're doing, always use an auto-incremented value for your PK in MySQL.
FOREIGN KEY is a reference to a column in another table. It enforces referential integrity, which means that you cannot create an entry in a column which has a foreign key to another table if the entered value does not exist in the referenced table. In MySQL, the constraint by itself does not improve search performance, although InnoDB creates an index on the referencing column if one does not already exist. It also requires that both tables in the key definition use the InnoDB engine, and that the referencing and referenced columns have the same data type and, for string columns, the same character set and collation.
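The ON DUPLICATE KEY UPDATE pitfall mentioned above can be reproduced with a small sketch. The accounts table and its rows are hypothetical and exist only for this illustration.

```sql
CREATE TABLE accounts (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(120) NOT NULL,
  name VARCHAR(120),
  UNIQUE KEY (email)
) ENGINE=InnoDB;

INSERT INTO accounts (email, name) VALUES ('a@example.com', 'Alice');

-- No id is given, so the primary key cannot collide. The duplicate is
-- detected on the unique email index instead: no new row is added and
-- the existing row's name is overwritten.
INSERT INTO accounts (email, name) VALUES ('a@example.com', 'Bob')
ON DUPLICATE KEY UPDATE name = 'Bob';

SELECT id, email, name FROM accounts;
```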
KEY is just another word for INDEX.
A UNIQUE index means that all values within that index must be unique, and not the same as any other within that index. An example would be an Id column in a table.
A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL, i.e, all values in the index must be set. Ideally, each table should have (and can have) one primary key only.
A FOREIGN KEY is a referential constraint between two tables. This column/index must have the same type and length as the referred column within the referred table. An example of a FOREIGN KEY is a userId, between a user-login table and a users table. Note that it usually points to a PRIMARY KEY in the referred table.
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
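For reference, the ALTER TABLE statement in the question is not valid MySQL syntax: ALTER TABLE takes no surrounding parentheses, and the parts of a single constraint are not separated by commas. A sketch of the intended statement follows. It assumes that users is an InnoDB table and that users.id is declared as a signed int like projects.user_id; if users.id is unsigned, user_id has to be changed to match first.

```sql
ALTER TABLE projects
  ADD CONSTRAINT user_projects
  FOREIGN KEY (user_id) REFERENCES users (id)
  ON DELETE CASCADE;

-- Confirm that the constraint was created.
SHOW CREATE TABLE projects;
```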
I'm trying to create some tables in a mysql db to handle customers, assign them to groups and give customers within these groups unique promotion codes/coupons.
there are 3 parent(?) tables - customers, groups, promotions
then I have table - customerGroups to assign each customer_id to many group_id's
also I have - customerPromotions to assign each customer_id to many promotion_id's
I know I need to use cascade on delete and update so that when I delete a customer, promotion or group the data is also removed from the child tables. I put together some php to create the tables easily http://pastebin.com/gxhW1PGL
I've been trying to read up on cascade, foreign key references but I think I learn better by trying to do things then learning why they work. Can anyone please give me their input on what I should do to these tables to have them function correctly.
I would like to have the database and tables set up correctly before I start with queries or anything further so any advice would be great.
You seem to want just a little guidance. So I'll try to be brief.
$sql = "CREATE TABLE customerGroups (
customer_id int(11) NOT NULL,
group_id int(11) NOT NULL,
PRIMARY KEY (customer_id, group_id),
CONSTRAINT customers_customergroups_fk
FOREIGN KEY (customer_id)
REFERENCES customers (customer_id)
ON DELETE CASCADE,
CONSTRAINT groups_customergroups_fk
FOREIGN KEY (group_id)
REFERENCES `groups` (group_id)
ON DELETE CASCADE
)ENGINE = INNODB;";
You only need id numbers when identity is hard to nail down. When you're dealing with people, identity is hard to nail down. There are lots of people named "John Smith".
But you're dealing with two things that have already been identified. (And identified with id numbers, of all things.)
Cascading deletes make sense. It's relatively rare to cascade updates on id numbers; they're presumed to never change. (The main reason Oracle DBAs insist that primary keys must always be ID numbers, and that they must never change, is that Oracle can't cascade updates.) If, later, some id numbers need to change for whatever reason, you can alter the table to include ON UPDATE CASCADE.
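If that day comes, the change could look roughly like this for the customer foreign key defined above. MySQL cannot modify an existing foreign key in place, so the sketch drops the constraint and adds it again.

```sql
ALTER TABLE customerGroups
  DROP FOREIGN KEY customers_customergroups_fk;

ALTER TABLE customerGroups
  ADD CONSTRAINT customers_customergroups_fk
    FOREIGN KEY (customer_id)
    REFERENCES customers (customer_id)
    ON DELETE CASCADE
    ON UPDATE CASCADE;
```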
$sql = "CREATE TABLE `groups`
(
group_id int(11) NOT NULL AUTO_INCREMENT,
group_title varchar(50) NOT NULL UNIQUE,
group_desc varchar(140),
PRIMARY KEY (group_id)
)ENGINE = INNODB;";
Note the additional unique constraint on group_title. You don't want to allow anything like this (below) in your database.
group_id group_title
--
1 First group
2 First group
3 First group
...
9384 First group
You'll want to carry those kinds of changes through all your tables. (Except, perhaps, your table of customers.)