I am currently working on a project that inserts a lot of data into some tables. To ensure that my system stays fast enough, I want to split my huge table into smaller tables, each holding one month's data. I have an idea of how it will work, but I still need some more information.
The primary keys of my tables must be continuous, so I thought of an architecture that would look like this:
CREATE TABLE `foo` (
`id` bigint(11) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);
CREATE TABLE `foo012014` (
`id` bigint(11) unsigned NOT NULL,
`description` varchar(255),
PRIMARY KEY (`id`)
);
CREATE TABLE `foo022014` (
`id` bigint(11) unsigned NOT NULL,
`description` varchar(255),
PRIMARY KEY (`id`)
);
On every insertion, the PHP page will check whether a table already exists for the month and, if not, will create it.
The thing is, how do I bind the "foo" child tables' primary keys to the "foo" mother table? Also, is this design bad practice, or is it good?
It's not good practice, and it makes your queries more difficult.
With just the id you already have an index, which gives you efficient access to your data.
If your queries are also nicely written and organized, the time to execute a query in your database will be relatively small with 1 million rows or 20.
Solutions
First
For better maintenance I recommend the following:
Add a new field to your table foo: created datetime DEFAULT CURRENT_TIMESTAMP (this works in MySQL 5.6+; for older versions, either set it manually on every insert or change the type to timestamp)
Then just use this field to group your data based on datetime values, like 2014-01-24 13:18.
It's easy to select and manipulate.
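For example, a minimal sketch of both uses, borrowing the `description` column from the tables above (the dates are illustrative):
-- All rows for January 2014:
SELECT id, description
FROM foo
WHERE created >= '2014-01-01' AND created < '2014-02-01';
-- Row counts per month:
SELECT YEAR(created) AS yr, MONTH(created) AS mon, COUNT(*) AS rows_in_month
FROM foo
GROUP BY YEAR(created), MONTH(created);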
Second
Create an external table with month and year, like this:
drop table if exists foo_periods;
create table foo_periods (
id int not null auto_increment primary key,
month smallint(4) not null,
year smallint(4) not null,
created datetime,
modified datetime,
active boolean not null default 1,
index foo_periods_month (month),
index foo_periods_year (year)
);
You can change month from smallint to varchar if that feels better to you.
Then just create a FK, and done!
ALTER TABLE foo
ADD COLUMN foo_period_id int not null;
ALTER TABLE foo
ADD CONSTRAINT foo_foo_period_id
FOREIGN KEY (foo_period_id)
REFERENCES foo_periods (id);
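A hedged usage sketch, assuming one row per month in foo_periods (the values here are illustrative):
-- Register January 2014 once:
INSERT INTO foo_periods (month, year, created) VALUES (1, 2014, NOW());
-- Tag new foo rows with that period:
INSERT INTO foo (foo_period_id) VALUES (LAST_INSERT_ID());
-- Fetch all rows for a given month through the FK:
SELECT f.*
FROM foo f
JOIN foo_periods p ON p.id = f.foo_period_id
WHERE p.month = 1 AND p.year = 2014;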
References
If you want to read more about fragmentation / optimization in MySQL, this is a great post.
Related
First of all, I'm from Spain, so I'm sorry if I make some mistakes writing. I have two problems, and it will be better if I give context before. I am not even a junior, still learning to code, and I thought it would be a good project to create a web page where you can add ingredients, foods with those ingredients, etc. So I decided to start learning PHP and SQL. Now I'm trying to create a database, starting with some ingredients and two kinds of rice. My 1st problem is that I don't know if I need to create a database for that. The second and main one is that I don't have any idea how to get this working as I want.
See, first of all I created the table for ingredients:
CREATE TABLE ingredientes(
id int(255) auto_increment not null,
ingrediente varchar(255) not null,
CONSTRAINT pk_ingredientes PRIMARY KEY(id) )ENGINE=InnoDb;
Sorry 'cause it's in Spanish :/, but nothing too hard to understand.
So I added some ingredients.
Here's the pic showing them.
After that I created two tables and added ingredients to them.
CREATE TABLE arroz_con_pollo(
id int(255) auto_increment not null,
ingrediente int(255) not null,
CONSTRAINT pk_arroz_con_pollo PRIMARY KEY(id),
CONSTRAINT fk_pollo_ingredientes FOREIGN KEY(ingrediente) REFERENCES ingredientes(id) )ENGINE=InnoDb;
CREATE TABLE arroz_cubana(
id int(255) auto_increment not null,
ingrediente int(255) not null,
CONSTRAINT pk_arroz_cubana PRIMARY KEY(id),
CONSTRAINT fk_cubana_ingredientes FOREIGN KEY(ingrediente) REFERENCES ingredientes(id))ENGINE=InnoDb;
Here's the picture showing the IDs.
So now I've spent a lot of time researching and found out that I can show the names by using this command:
SELECT a.id, i.ingrediente
FROM ingredientes i, arroz_cubana a
WHERE i.id = a.ingrediente;
And got something like this:
At this point, everything is more or less working. My issue came when I wanted to create a database that keeps all the names (arroz con pollo, arroz cubana...) in a single table named 'rices', to be able to choose a name and automatically have the ingredients there, without any complication for the user. But I literally have no idea how. I've been coding for hours without any victory on that, and I haven't seen anything similar on the web, so if someone could tell me how to fix that issue, or how to realize that idea of a web page to keep ingredients and foods, I'd be very grateful.
Your data structure is messed up. SQL is not designed to have a separate table for each ingredient. Instead, you want two other tables.
The first is for dishes:
CREATE TABLE dishes (
dish_id int not null auto_increment primary key,
name varchar(255)
);
You would then insert appropriate rows into this:
INSERT INTO dishes (name)
VALUES ('arroz_con_pollo');
Then you have another table for the ingredients:
CREATE TABLE dishes_ingredients (
dish_ingredient_id int auto_increment primary key,
dish_id int not null,
ingredient_id int not null,
CONSTRAINT fk_dish_ingredients_dish FOREIGN KEY(dish_id) REFERENCES dishes(dish_id),
CONSTRAINT fk_dish_ingredients_ingredient FOREIGN KEY(ingredient_id) REFERENCES ingredientes(id)
);
Voila! New dishes are just rows in a table, so you can get the names using a SELECT.
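For instance, a minimal sketch of listing a dish's ingredients by name, joining through the mapping table (assuming the original ingredientes table alongside the two tables above):
SELECT d.name AS dish, i.ingrediente AS ingredient
FROM dishes d
JOIN dishes_ingredients di ON di.dish_id = d.dish_id
JOIN ingredientes i ON i.id = di.ingredient_id
WHERE d.name = 'arroz_con_pollo';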
Notes on structure:
int(255) really makes no sense. Just use int. The number in parentheses is a display width for the value when printing it, and 255 is a ridiculous width.
I am a fan of naming primary keys with the table name. That way, the primary key and foreign key typically have the same name.
You should not have a table per dish. Create one table "dish", that includes a column "name". Each row represents a dish. Then create a supporting table where you list the (multiple) ingredients for each dish. Look around for a tutorial on databases, this topic is too large to explain in a stackoverflow question (or several).
And so you do not need to be able to list the table names, the way you were considering. (Which is not something SQL supports directly; different databases provide non-standard ways to do it, but as explained you do not actually need such a feature.)
I am extending a product sales plugin and am trying to understand how WordPress handles database relations. I am building tables on activation using dbDelta. An example of a table schema would be:
$table_schema = [
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`people_id` bigint(20) DEFAULT NULL,
`order_id` bigint(20) DEFAULT NULL,
`order_status` varchar(11) DEFAULT NULL,
`order_date` datetime DEFAULT NULL,
`order_total` decimal(13,2) DEFAULT NULL,
`accounting` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `people_id` (`people_id`),
KEY `order_id` (`order_id`)
) $collate;",
"CREATE TABLE IF NOT EXISTS `{$wpdb->prefix}plugin_order_product` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) DEFAULT NULL,
`product_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `product_id` (`product_id`)
) $collate;"
];
I see that id in each table is the PRIMARY KEY, but what does declaring the other KEYs actually do? I have read that WordPress uses MyISAM, which doesn't actually enforce foreign key connections. While these tables may point to other tables that already exist, in this example does declaring KEY order_id (order_id) create a variable of sorts called order_id that any other table can use as a reference? Is this code specifically connecting one table's attributes to another table's attributes (it doesn't appear to be)? After these tables are built, I can inspect them in phpMyAdmin and see that there are indexes assigned but no foreign key constraints. How does this code create tables that point one table at another to build relations?
KEY `foo_bar` (`order_id`)
"KEY" is the same as "INDEX". It specifies that a separate data structure is maintained for the efficient access of the table via the column order_id.
foo_bar is the name of the index. It has no special meaning, and has very few uses. For example, DROP KEY foo_bar; is the way to get rid of the index.
In MyISAM, a "FOREIGN KEY" is allowed, but ignored. In InnoDB, it does two things:
Create an index if one is not already provided
Provide a constraint. The default effectively says "complain if the other table does not already have the value being referenced".
Having an index is important for performance. The index above makes this
SELECT ... WHERE order_id = 1234 ...
run in milliseconds, even if there are billions of rows in the table. Without the index, the query would take minutes or hours.
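A minimal sketch of verifying this, assuming the default wp_ table prefix (yours may differ):
EXPLAIN SELECT * FROM wp_plugin_orders WHERE order_id = 1234;
-- key = order_id and type = ref in the output mean the lookup uses the index;
-- type = ALL would mean a full table scan.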
A PRIMARY KEY is a UNIQUE key, which is an INDEX.
UNIQUE(widget) says that only one row can have a particular value of widget in the table.
PRIMARY KEY(id) says that each row is uniquely identified by the column id. InnoDB really wants each table to have a PK.
"id" is a convention (not a requirement) for the name of the PK. It is also INT AUTO_INCREMENT by convention. You may or may not actually ever touch id.
Tables can be related to each other in 3 main ways:
1:1 -- They share the same unique key. This is rarely useful; you may as well have a single table.
1:many -- An "order" has several "items" in it (one-order : many-items). This is usually handled by order_id being a column in the items table.
many:many -- students_classes -- each student is in many classes; each class has many students. This is implemented via a mapping table that has (usually) only two columns: student_id and class_id (no id is needed) and PRIMARY KEY(student_id, class_id) and INDEX(class_id, student_id). Those two indexes make it efficient to go from a known student to their classes, and vice versa.
Another convention for the PK of a table is to include the table name. (It is clutter to do that for other columns, such as order_status.) I was assuming this convention for student_id and class_id.
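A minimal sketch of that many:many mapping table, using the student/class names assumed above:
CREATE TABLE students_classes (
student_id INT UNSIGNED NOT NULL,
class_id INT UNSIGNED NOT NULL,
PRIMARY KEY (student_id, class_id), -- known student -> their classes
INDEX (class_id, student_id)        -- known class -> its students
) ENGINE=InnoDB;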
But now I am confused by your plugin_orders -- it has both id and order_id. If that table describes "orders", then I would expect order_id to be the PK instead of id.
And, if order_product is a list of all the "products" in each "order", then I would expect you to have the 1:many pattern.
What indexes to have?
PRIMARY KEY to uniquely identify each row -- either id or some column (or combination of columns) that are unique.
Other columns, as needed, for the SELECTs, UPDATEs, and DELETEs that you have. Do not blindly add indexes before having some clues of the queries that might need them.
Indexes sometimes help in sorting:
SELECT ... ORDER BY last_name, first_name;
together with
INDEX(last_name, first_name)
Indexes provide performance; FKs provide integrity checks. Neither is "required"; both are "desirable".
MyISAM is ancient; you should change to InnoDB.
Then do something like
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE ...
In this example, the Optimizer will perform the query something like this:
Look at the WHERE to see which table is best filtered by the conditions there. Declare that to be the first table to work with.
Scan through the first table, using an index if practical.
For each row in the first table, reach into the second table.
Reaching into the second table would probably be done via INDEX(order_id) on the second table. This would make the JOIN fast and efficient.
Both tables have INDEX(order_id), but that is not relevant.
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE o.people_id = 123 -- note
Pick o as the first table due to the filtering on people_id
Use o's INDEX(people_id) to rapidly find the o rows that are relevant.
etc. (op is the second table, reached via its INDEX(order_id))
Next example:
SELECT ...
FROM plugin_orders AS o
JOIN plugin_order_product AS op
ON o.order_id = op.order_id
WHERE op.product_id = 9887 -- changed again
Pick op as the first table due to the filtering on product_id
Use op's INDEX(product_id) to rapidly find the op rows that are relevant.
etc. (o is the second table this time, reached via its INDEX(order_id))
I had originally wanted my alarmID value to be the primary key and to be auto-incremented, but I have decided to make my Title value the primary key instead.
How can I manually auto increment alarmID so that every time I insert values, the alarmID value gets incremented by exactly 1?
I want a way to keep track of entries by when they were inserted, so they can be displayed chronologically later on.
Here is how I have my php code.
$sql = "CREATE TABLE IF NOT EXISTS alarms (
alarmID INT NOT NULL,
PRIMARY KEY (Title),
Title CHAR(30) NOT NULL,
Description TEXT,
DT DATETIME
)";
Something like this should work: you still get a unique indexed title and the auto-increment alarmID, and it's much simpler than using a MySQL function / proc.
CREATE TABLE IF NOT EXISTS alarms (
alarmID MEDIUMINT NOT NULL AUTO_INCREMENT,
Title CHAR(30) NOT NULL,
Description TEXT,
DT DATETIME,
PRIMARY KEY (alarmID),
UNIQUE KEY title (Title)
);
Your best bet here would be to make alarmID an auto incrementing primary key and, if Title has to be unique, place a unique constraint on it.
Manually computing the new incremented ID could in fact lead to issues if multiple users use your system, since two concurrent inserts could compute the same ID, so it is better to leave the job to the DBMS itself.
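For instance, a minimal usage sketch against the table above (the row values are illustrative):
INSERT INTO alarms (Title, Description, DT)
VALUES ('Wake up', 'Weekday alarm', NOW());
-- The DBMS assigned alarmID automatically; fetch it if needed:
SELECT LAST_INSERT_ID();
-- Chronological display, as asked:
SELECT alarmID, Title, DT FROM alarms ORDER BY alarmID;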
I have come up with a total of three different, equally viable methods for saving data for a graph.
The graph in question is "player's score in various categories over time". Categories include "buildings", "items", "quest completion", "achievements" and so on.
Method 1:
CREATE TABLE `graphdata` (
`userid` INT UNSIGNED NOT NULL,
`date` DATE NOT NULL,
`category` ENUM('buildings','items',...) NOT NULL,
`score` FLOAT UNSIGNED NOT NULL,
PRIMARY KEY (`userid`, `date`, `category`),
INDEX `userid` (`userid`),
INDEX `date` (`date`)
) ENGINE=InnoDB
This table contains one row for each user/date/category combination. To show a user's data, select by userid. Old entries are cleared out by:
DELETE FROM `graphdata` WHERE `date` < DATE_ADD(NOW(),INTERVAL -1 WEEK)
Method 2:
CREATE TABLE `graphdata` (
`userid` INT UNSIGNED NOT NULL,
`buildings-1day` FLOAT UNSIGNED NOT NULL,
`buildings-2day` FLOAT UNSIGNED NOT NULL,
... (and so on for each category, up to `buildings-7day`)
PRIMARY KEY (`userid`)
)
Selecting by user id is faster due to being a primary key. Every day scores are shifted down the fields, as in:
... SET `buildings-3day`=`buildings-2day`, `buildings-2day`=`buildings-1day`...
Entries are not deleted (unless a user deletes their account). Rows can be added/updated with an INSERT...ON DUPLICATE KEY UPDATE query.
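A hedged sketch of that daily maintenance, trimmed to just the buildings columns (the real statements would list every category):
-- Once per day, shift every row's scores down one slot:
UPDATE graphdata
SET `buildings-3day` = `buildings-2day`,
    `buildings-2day` = `buildings-1day`;
-- Then upsert today's score for one user:
INSERT INTO graphdata (userid, `buildings-1day`, `buildings-2day`, `buildings-3day`)
VALUES (42, 17.5, 0, 0)
ON DUPLICATE KEY UPDATE `buildings-1day` = VALUES(`buildings-1day`);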
Method 3:
Use one file for each user, containing a JSON-encoded array of their score data. Since the data is being fetched by an AJAX JSON call anyway, this means the file can be fetched statically (and even cached until the following midnight) without any stress on the server. Every day the server runs through each file, shift()s the oldest score off each array and push()es the new one on the end.
Personally I think Method 3 is by far the best, however I've heard bad things about using files instead of databases - for instance if I wanted to be able to rank users by their scores in different categories, this solution would be very bad.
Out of the two database solutions, I've implemented Method 2 on one of my older projects, and that seems to work quite well. Method 1 seems "better" in that it makes better use of relational databases and all that stuff, but I'm a little concerned in that it will contain (number of users) * (number of categories) * 7 rows, which could turn out to be a big number.
Is there anything I'm missing that could help me make a final decision on which method to use? 1, 2, 3 or none of the above?
If you're going to use a relational db, method 1 is much better than method 2. It's normalized, so it's easy to maintain and search. I'd change the date field to a timestamp and call it added_on (or something that's not a reserved word like 'date' is). And I'd add an auto_increment primary key score_id so that user_id/date/category doesn't have to be unique. That way, if a user managed to increment his building score twice in the same second, both would still be recorded.
The second method requires you to update all the records every day. The first method only does inserts, no updates, so each record is only written to once.
... SET buildings-3day=buildings-2day, buildings-2day=buildings-1day...
You really want to update every single record in the table every day until the end of time?!
Selecting by user id is faster due to being a primary key
Since user_id is the first field in your Method 1 primary key, it will be similarly fast for lookups. As first field in a regular index (which is what I've suggested above), it will still be very fast.
The idea with a relational db is that each row represents a single instance/action/occurrence. So when a user does something to affect his score, do an INSERT that records what he did. You can always create a summary from data like this. But you can't get this kind of data from a summary.
Secondly, you seem unduly concerned about getting rid of old data. Why? Your select queries would have a date range on them that would exclude old data automatically. And if you're concerned about performance, you can partition your tables based on row age or set up a cron job to delete old records periodically.
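For instance, a hedged sketch of the partitioning option against the Method 1 table (the ranges are illustrative; this works because `date` is already part of that table's primary key):
ALTER TABLE graphdata
PARTITION BY RANGE (TO_DAYS(`date`)) (
    PARTITION p_w1 VALUES LESS THAN (TO_DAYS('2014-01-08')),
    PARTITION p_w2 VALUES LESS THAN (TO_DAYS('2014-01-15')),
    PARTITION p_max VALUES LESS THAN MAXVALUE
);
-- Dropping a whole partition is far cheaper than a DELETE:
ALTER TABLE graphdata DROP PARTITION p_w1;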
ETA: Regarding JSON stored in files
This seems to me to combine the drawbacks of Method 2 (difficult to search, every file must be updated every day) with the additional drawbacks of file access. File accesses are expensive. File writes are even more so. If you really want to store summary data, I'd run a query only when the data is requested and I'd store the results in a summary table by user_id. The table could hold a JSON string:
CREATE TABLE score_summaries(
user_id INT unsigned NOT NULL PRIMARY KEY,
gen_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
json_data TEXT NOT NULL -- note: TEXT columns can't take a DEFAULT in (older) MySQL
);
For example:
Bob (user_id=7) logs into the game for the first time. He's on his profile page which displays his weekly stats. These queries ran:
SELECT json_data FROM score_summaries
WHERE user_id=7
AND gen_date > DATE_SUB(CURDATE(), INTERVAL 1 DAY);
//returns nothing so generate summary record
SELECT DATE(added_on), category, SUM(score)
FROM scores
WHERE user_id=7 AND added_on < CURDATE() AND added_on >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
GROUP BY DATE(added_on), category; //never include today's data, encode as json with php
INSERT INTO score_summaries(user_id, json_data)
VALUES(7, '$json') //from PHP, in this case $json == NULL
ON DUPLICATE KEY UPDATE json_data=VALUES(json_data)
//use $json for presentation too
Today's scores are generated as needed and not stored in the summary. If Bob views his scores again today, the historical ones can come from the summary table or could be stored in a session after the first request. If Bob doesn't visit for a week, no summary needs to be generated.
Method 1 seems like a clear winner to me. If you are concerned about the size of the single table (graphdata) being too big, you could reduce it by creating:
CREATE TABLE `graphdata` (
`graphDataId` INT UNSIGNED NOT NULL,
`categoryId` INT NOT NULL,
`score` FLOAT UNSIGNED NOT NULL,
PRIMARY KEY (`graphDataId`)
) ENGINE=InnoDB
then create two more tables, because you obviously need to have info connecting graphDataId with userId:
CREATE TABLE `graphDataUser` (
`graphDataId` INT UNSIGNED NOT NULL,
`userId` INT NOT NULL
) ENGINE=InnoDB
and the graphDataId / date connection:
CREATE TABLE `graphDataDate` (
`graphDataId` INT UNSIGNED NOT NULL,
`graphDataDate` DATE NOT NULL
) ENGINE=InnoDB
I think you don't really need to worry about the number of rows a table contains, because the database does a good job handling large numbers of rows. Your job is only to get the data formatted in a way that is easily retrieved, no matter what task the data is retrieved for. Following that advice should pay off in the long run.
I have the following schema with the following attributes:
USER(TABLE_NAME)
USER_ID|USERNAME|PASSWORD|TOPIC_NAME|FLAG1|FLAG2
I have 2 questions basically:
How can I make the attribute USER_ID a primary key whose value automatically increments each time I insert a row into the database? It shouldn't be under my control.
How can I retrieve a record from the database based on the latest time at which it was updated? (For example, if I updated a record at 2pm and the same record at 3pm, and I retrieve now at 4pm, I should get the record that was updated at 3pm, i.e. the latest updated one.)
Please help.
I'm assuming that question one is in the context of MySQL. You can use the ALTER TABLE statement to mark a field as PRIMARY KEY and to mark it AUTO_INCREMENT:
ALTER TABLE User
ADD PRIMARY KEY (USER_ID);
ALTER TABLE User
MODIFY COLUMN USER_ID INT(4) AUTO_INCREMENT; -- of course, set the type appropriately
For the second question I'm not sure I understand correctly so I'm just going to go ahead and give you some basic information before giving an answer that may confuse you.
When you update the same record multiple times, only the most recent update is persisted. Basically, once you update a record, its previous values are not kept. So, if you update a record at 2pm and then update the same record at 3pm, when you query for the record you will automatically receive the most recent values.
Now, if by updating you mean you would insert new values for the same USER_ID multiple times and want to retrieve the most recent, then you would need to use a field in the table to store a timestamp of when each record is created/updated. Then you can query for the most recent value based on the timestamp.
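A minimal sketch of that approach; the updated_at column name is just an assumption:
ALTER TABLE USER
ADD COLUMN updated_at TIMESTAMP NOT NULL
DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
-- Most recently updated record for a given user:
SELECT * FROM USER
WHERE USER_ID = 7
ORDER BY updated_at DESC
LIMIT 1;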
I assume you're talking about Oracle since you tagged it as Oracle. You also tagged the question as MySQL where the approach will be different.
You can make the USER_ID column a primary key
ALTER TABLE <<table_name>>
ADD CONSTRAINT pk_user_id PRIMARY KEY( user_id );
If you want the value to increment automatically, you'd need to create a sequence
CREATE SEQUENCE user_id_seq
START WITH 1
INCREMENT BY 1
CACHE 20;
and then create a trigger on the table that uses the sequence
CREATE OR REPLACE TRIGGER trg_assign_user_id
BEFORE INSERT ON <<table name>>
FOR EACH ROW
BEGIN
:new.user_id := user_id_seq.nextval;
END;
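With the trigger in place, a hedged usage sketch (the table name users is hypothetical, standing in for the <<table name>> placeholder above):
INSERT INTO users (username, password)
VALUES ('alice', 'secret');
-- The trigger filled user_id from the sequence; inspect it:
SELECT user_id_seq.currval FROM dual;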
As for your second question, I'm not sure that I understand. If you update a row and then commit that change, all subsequent queries are going to read the updated data (barring exceptionally unlikely cases where you've set a serializable transaction isolation level and you've got transactions that run for multiple hours and you're running the query in that transaction). You don't need to do anything to see the current data.
(Answer based on MySQL; conceptually similar answer if using Oracle, but the SQL will probably be different.)
If USER_ID was not defined as a primary key or automatically incrementing at the time of table creation, then you can use:
ALTER TABLE tablename MODIFY USER_ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT;
To issue queries based on record dates, you have to have a field defined to hold date-related datatypes. The date and time of record modifications would be something you would manage (e.g. add/change) based on the way in which you are accessing the records (some PHP-related way? It's unclear what scripts you have in play, based on your question.) Once you have dates in your records, you can ORDER BY the date field in your SELECT query.
Check this out:
For your AUTO_INCREMENT, it's a question already asked here.
For your PRIMARY KEY use this
ALTER TABLE USER ADD PRIMARY KEY (USER_ID)
Can you provide more information? If the value gets updated, you definitely do NOT have your old value that you entered at 2pm present in the DB, so querying for it will be fine.
You can use something like this:
CREATE TABLE IF NOT EXISTS user (
USER_ID int(8) unsigned NOT NULL AUTO_INCREMENT,
username varchar(25) NOT NULL,
password varchar(25) NOT NULL,
topic_name varchar(100) NOT NULL,
flag1 smallint(1) NOT NULL DEFAULT 0,
flag2 smallint(1) NOT NULL DEFAULT 0,
update_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (USER_ID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
For selection use query:
SELECT * from user ORDER BY update_time DESC