I am building a website and I have a little knowledge of PHP and SQL. I run into many problems when it comes to many-to-many relationships in a database.
I have products that are specific to each material, e.g. the same product cannot exist for two materials.
The materials are: leather, simil-leather, cloth, PVC.
The fields of use are the fields in which a material can be used: sport, leisure, work.
The problem is that a material can be used in many fields and a field has many associated materials, so it's N:M. Likewise, a product can be used in many fields and a field can be used for many products, so that is N:M too.
For example, leather can be used in work and sport, and cloth in work, sport and office. A product can be used in some or all fields of application, and vice versa.
To achieve this, is architecture A or B better?
(A product always has the same application fields as its material; there cannot be a product belonging to a material that has an application field the material doesn't have.)
A) http://i60.tinypic.com/27zdk4k.jpg
B) http://i57.tinypic.com/2mhc03o.jpg
If I understand correctly, a many-to-many relationship works with a "mid" table between the two.
So, when it comes to inserting data and values into my database, what I have is:
MATERIAL
1 leather
2 cloth

MATERIAL_APPL_FIELD
1 1
1 2
2 2

APPLICATION_FIELD
1 nautic
2 leisure
In this way, leather has two application fields. But how can I fill the mid table in a smart way?
Also, when I want to delete something, which architecture is better, and should I delete from all the tables?
Here's another good solution:
create table materials (material_id tinyint unsigned not null auto_increment primary key, material varchar(60));
create table uses (use_id tinyint unsigned not null auto_increment primary key, `use` varchar(60));
create table applications (application_id int unsigned not null auto_increment primary key, material tinyint unsigned, `use` tinyint unsigned, active tinyint unsigned default 1);
insert into applications (material, `use`) values (?,?);
The question marks would represent the ID of the material and the ID of the use.
It might be better not to actually delete rows from your table. Instead you may wish to inactivate them. Try this:
update applications set active = 0 where application_id = ?;
update applications set active = 0 where material = 1 and `use` = 2;
update applications set active = 0 where material = (select material_id from materials where material = 'leather') and `use` = (select use_id from uses where `use` = 'sport');
If I understand your question correctly, the solution below should work. Keep in mind that there may be multiple correct solutions to your problem, and that I did not actually test this code, so there may be typos.
create table applications (application_id int unsigned not null auto_increment primary key, material enum('leather', 'simil-leather', 'cloth', 'PVC'), `use` enum('sport','leisure','work'));
insert into applications (material, `use`) values ('leather','work');
insert into applications (material, `use`) values ('leather','leisure');
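As a runnable sketch of the junction-table-plus-soft-delete idea above (using Python's sqlite3 in place of MySQL, so the TINYINT/ENUM types become plain INTEGER/TEXT; all data is made up for illustration):

```python
import sqlite3

# In-memory database mirroring the materials/uses/applications schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE materials (material_id INTEGER PRIMARY KEY, material TEXT);
CREATE TABLE uses      (use_id      INTEGER PRIMARY KEY, use TEXT);
CREATE TABLE applications (
    application_id INTEGER PRIMARY KEY,
    material INTEGER REFERENCES materials(material_id),
    use      INTEGER REFERENCES uses(use_id),
    active   INTEGER DEFAULT 1
);
""")
conn.executemany("INSERT INTO materials (material) VALUES (?)",
                 [("leather",), ("cloth",)])
conn.executemany("INSERT INTO uses (use) VALUES (?)",
                 [("sport",), ("work",)])
# Link leather to both uses, cloth to work only.
conn.executemany("INSERT INTO applications (material, use) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])

# Soft-delete the leather/sport link by name, as in the answer above.
conn.execute("""
UPDATE applications SET active = 0
WHERE material = (SELECT material_id FROM materials WHERE material = 'leather')
  AND use      = (SELECT use_id      FROM uses      WHERE use = 'sport')
""")

# Only the still-active links remain visible.
active = conn.execute("""
SELECT m.material, u.use
FROM applications a
JOIN materials m ON m.material_id = a.material
JOIN uses u ON u.use_id = a.use
WHERE a.active = 1
ORDER BY m.material, u.use
""").fetchall()
print(active)  # [('cloth', 'work'), ('leather', 'work')]
```

Filling the mid table "in a smart way" then just means inserting one (material_id, use_id) pair per link, looking the IDs up by name when you only have the names.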
Related
I have two tables in MySQL. Table Person has the following columns:
id
name
fruits
The fruits column may hold null or an array of strings like ('apple', 'orange', 'banana'), or ('strawberry'), etc. The second table is Table Fruit and has the following three columns:
fruit_name | color  | price
-----------+--------+------
apple      | red    | 2
orange     | orange | 3
So how should I design the fruits column in the first table so that it can hold an array of strings that take values from the fruit_name column in the second table? Since there is no array data type in MySQL, how should I do it?
The proper way to do this is to use multiple tables and JOIN them in your queries.
For example:
CREATE TABLE person (
`id` INT NOT NULL PRIMARY KEY,
`name` VARCHAR(50)
);
CREATE TABLE fruits (
`fruit_name` VARCHAR(20) NOT NULL PRIMARY KEY,
`color` VARCHAR(20),
`price` INT
);
CREATE TABLE person_fruit (
`person_id` INT NOT NULL,
`fruit_name` VARCHAR(20) NOT NULL,
PRIMARY KEY(`person_id`, `fruit_name`)
);
The person_fruit table contains one row for each fruit a person is associated with and effectively links the person and fruits tables together, e.g.:
1 | "banana"
1 | "apple"
1 | "orange"
2 | "strawberry"
2 | "banana"
2 | "apple"
When you want to retrieve a person and all of their fruit you can do something like this:
SELECT p.*, f.*
FROM person p
INNER JOIN person_fruit pf
ON pf.person_id = p.id
INNER JOIN fruits f
ON f.fruit_name = pf.fruit_name
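A runnable sketch of this three-table layout and join, using Python's sqlite3 in place of MySQL (table and column names follow the answer; the data is made up):

```python
import sqlite3

# Build the person / fruits / person_fruit schema in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fruits (fruit_name TEXT PRIMARY KEY, color TEXT, price INTEGER);
CREATE TABLE person_fruit (
    person_id INTEGER NOT NULL,
    fruit_name TEXT NOT NULL,
    PRIMARY KEY (person_id, fruit_name)
);
""")
conn.execute("INSERT INTO person VALUES (1, 'Alice')")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)",
                 [("apple", "red", 2), ("orange", "orange", 3)])
conn.executemany("INSERT INTO person_fruit VALUES (?, ?)",
                 [(1, "apple"), (1, "orange")])

# One row per (person, fruit) pair, exactly as the JOIN above produces.
rows = conn.execute("""
SELECT p.name, f.fruit_name, f.price
FROM person p
JOIN person_fruit pf ON pf.person_id = p.id
JOIN fruits f ON f.fruit_name = pf.fruit_name
ORDER BY f.fruit_name
""").fetchall()
print(rows)  # [('Alice', 'apple', 2), ('Alice', 'orange', 3)]
```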
The reason there are no arrays in SQL is that most people don't really need them. Relational databases (SQL is exactly that) work using relations, and most of the time it is best if you assign one row of a table to each "bit of information". For example, where you may think "I'd like a list of stuff here", instead make a new table, linking the row in one table with the row in another table.[1] That way, you can represent M:N relationships. Another advantage is that those links will not clutter the row containing the linked item. And the database can index those rows. Arrays typically aren't indexed.
If you don't need relational databases, you can use e.g. a key-value store.
Read about database normalization, please. The golden rule is "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key.". An array does too much. It has multiple facts and it stores the order (which is not related to the relation itself). And the performance is poor (see above).
Imagine that you have a person table and you have a table with phone calls by people. Now you could make each person row have a list of his phone calls. But every person has many other relationships to many other things. Does that mean my person table should contain an array for every single thing he is connected to? No, that is not an attribute of the person itself.
[1]: It is okay if the linking table only has two columns (the primary keys from each table)! If the relationship itself has additional attributes though, they should be represented in this table as columns.
MySQL 5.7 now provides a JSON data type. This new datatype provides a convenient new way to store complex data: lists, dictionaries, etc.
That said, arrays don't map well to databases, which is why object-relational mappers can be quite complex. Historically, people have stored lists/arrays in MySQL by creating a table that describes them and adding each value as its own record. The table may have only 2 or 3 columns, or it may contain many more. How you store this type of data really depends on the characteristics of the data.
For example, does the list contain a static or dynamic number of entries? Will the list stay small, or is it expected to grow to millions of records? Will there be lots of reads on this table? Lots of writes? Lots of updates? These are all factors that need to be considered when deciding how to store collections of data.
Also, key/value data stores and document stores such as Cassandra, MongoDB, Redis etc. provide a good solution as well. Just be aware of where the data is actually being stored (whether on disk or in memory). Not all of your data needs to be in the same database. Some data does not map well to a relational database and you may have reasons for storing it elsewhere, or you may want to use an in-memory key:value database as a hot cache for data stored on disk somewhere, or as ephemeral storage for things like sessions.
A sidenote to consider, you can store arrays in Postgres.
In MySQL, use the JSON type.
Contra the answers above, the SQL standard has included array types for almost twenty years; they are useful, even if MySQL has not implemented them.
In your example, however, you'll likely want to create three tables: person and fruit, then person_fruit to join them.
DROP TABLE IF EXISTS person_fruit;
DROP TABLE IF EXISTS person;
DROP TABLE IF EXISTS fruit;
CREATE TABLE person (
person_id INT NOT NULL AUTO_INCREMENT,
person_name VARCHAR(1000) NOT NULL,
PRIMARY KEY (person_id)
);
CREATE TABLE fruit (
fruit_id INT NOT NULL AUTO_INCREMENT,
fruit_name VARCHAR(1000) NOT NULL,
fruit_color VARCHAR(1000) NOT NULL,
fruit_price INT NOT NULL,
PRIMARY KEY (fruit_id)
);
CREATE TABLE person_fruit (
pf_id INT NOT NULL AUTO_INCREMENT,
pf_person INT NOT NULL,
pf_fruit INT NOT NULL,
PRIMARY KEY (pf_id),
FOREIGN KEY (pf_person) REFERENCES person (person_id),
FOREIGN KEY (pf_fruit) REFERENCES fruit (fruit_id)
);
INSERT INTO person (person_name)
VALUES
('John'),
('Mary'),
('John'); -- again
INSERT INTO fruit (fruit_name, fruit_color, fruit_price)
VALUES
('apple', 'red', 1),
('orange', 'orange', 2),
('pineapple', 'yellow', 3);
INSERT INTO person_fruit (pf_person, pf_fruit)
VALUES
(1, 1),
(1, 2),
(2, 2),
(2, 3),
(3, 1),
(3, 2),
(3, 3);
If you wish to associate the person with an array of fruits, you can do so with a view:
DROP VIEW IF EXISTS person_fruit_summary;
CREATE VIEW person_fruit_summary AS
SELECT
person_id AS pfs_person_id,
max(person_name) AS pfs_person_name,
cast(concat('[', group_concat(json_quote(fruit_name) ORDER BY fruit_name SEPARATOR ','), ']') as json) AS pfs_fruit_name_array
FROM
person
INNER JOIN person_fruit
ON person.person_id = person_fruit.pf_person
INNER JOIN fruit
ON person_fruit.pf_fruit = fruit.fruit_id
GROUP BY
person_id;
The view shows the following data:
+---------------+-----------------+----------------------------------+
| pfs_person_id | pfs_person_name | pfs_fruit_name_array |
+---------------+-----------------+----------------------------------+
| 1 | John | ["apple", "orange"] |
| 2 | Mary | ["orange", "pineapple"] |
| 3 | John | ["apple", "orange", "pineapple"] |
+---------------+-----------------+----------------------------------+
In MySQL 5.7.22 and later, you'll want to use JSON_ARRAYAGG, rather than hack the array together from a string.
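The same "array per person" result can also be sketched with a plain GROUP_CONCAT and a client-side split; here sqlite3 stands in for MySQL, and since GROUP_CONCAT's ordering is not guaranteed the list is sorted after splitting (names and data are illustrative):

```python
import sqlite3

# Minimal junction table: which person is linked to which fruit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_fruit (person_id INTEGER, fruit_name TEXT)")
conn.executemany("INSERT INTO person_fruit VALUES (?, ?)",
                 [(1, "apple"), (1, "orange"), (2, "orange"), (2, "pineapple")])

# Aggregate each person's fruits into one comma-separated string per row.
raw = conn.execute("""
SELECT person_id, group_concat(fruit_name)
FROM person_fruit
GROUP BY person_id
""").fetchall()

# group_concat order is unspecified, so sort after splitting.
arrays = {pid: sorted(joined.split(",")) for pid, joined in raw}
print(arrays)  # {1: ['apple', 'orange'], 2: ['orange', 'pineapple']}
```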
Use database field type BLOB to store arrays.
Ref: http://us.php.net/manual/en/function.serialize.php
Return Values
Returns a string containing a byte-stream representation of value that
can be stored anywhere.
Note that this is a binary string which may include null bytes, and
needs to be stored and handled as such. For example, serialize()
output should generally be stored in a BLOB field in a database,
rather than a CHAR or TEXT field.
You can store your array using GROUP_CONCAT, like this:
INSERT INTO Table1 (fruits)
SELECT GROUP_CONCAT(fruit_name)
FROM table2
WHERE ... -- your clause here
Here's an example in a fiddle.
I am unable to figure out an efficient way to establish relationships between tables. I want to have a database of books, authors, publishers and the users that sign-up and have their bookshelves (Read, Currently Reading, Want to Read (or Plan to Read)). I want the users to be able to select which books they've read, want to read or are currently reading.
P.S. I am aware of PKs and FKs in database table relations.
Edit: maybe this is a better way of doing it:
Then I shall use "Status" = (Read, Plan to Read and Currently Reading). Please tell me if this is good and efficient!
You'll need a N:M link between books and authors, since a book might have multiple authors and each author might have written more than one book. In a RDBMS that means you'll need a written_by table.
The link between books and publishers however is different. Any given book can only have one publisher (unless in your system different editions of a book are considered the same book). So all you need here is a publisher_id foreign key in books
Lastly, and most importantly you're looking at the readers / users. And their relation to books. Naturally, this is also a N:M relation. I sure hope that people read more than one book (we all know what happens if you only ever read one...) and surely a book is read by more than one person. That calls for a book_users connection table. The real question here is, how to design it. There are three basic designs.
Separate tables by type of relation (as outlined by @just_somebody). Advantages: you only have INSERTs and DELETEs, never UPDATEs. While this looks kind of neat, and somewhat helps with query optimization, most of the time it serves no actual purpose other than showing off a big database chart.
One table with a status indicator (as outlined by @Hardcoded). Advantages: you only have one table. Disadvantages: you'll have INSERTs, UPDATEs and DELETEs - something an RDBMS can easily handle, but which has its flaws for various reasons (more on that later). Also, a single status field implies that one reader can have only one connection to the book at any time, meaning they could only be in the plan_to_read, is_reading or has_read status at any point in time, and it assumes an order in which this happens. If that person ever plans to read the book again, or pauses, then rereads from the beginning etc., such a simple series of status indicators can easily fail, because all of a sudden that person is_reading now, but also has_read the thing. For most applications this is still a reasonable approach, and there are usually ways to design status fields so they are mutually exclusive.
A log. You INSERT every status as a new row in a table - the same combination of book and reader will appear more than once. You INSERT the first row with plan_to_read, and a timestamp. Another one with is_reading. Then another one with has_read. Advantages: You will only ever have to INSERT rows, and you get a neat chronology of things that happened. Disadvantages: Cross table joins now have to deal with a lot more data (and be more complex) than in the simpler approaches above.
You may ask yourself, why is there the emphasis on whether you INSERT, UPDATE or DELETE in what scenario? In short, whenever you run an UPDATE or DELETE statement you are very likely to in fact lose data. At that point you need to stop in your design process and think "What is it I am losing here?" In this case, you lose the chronologic order of events. If what users are doing with their books is the center of your application, you might very well want to gather as much data as you can. Even if it doesn't matter right now, that is the type of data which might allow you to do "magic" later on. You could find out how fast somebody is reading, how many attempts they need to finish a book, etc. All that without asking the user for any extra input.
So, my final answer is actually a question:
Would it be helpful to tell someone how many books they read last year?
Edit
Since it might not be clear what a log would look like, and how it would function, here's an example of such a table:
CREATE TABLE users_reading_log (
user_id INT,
book_id INT,
status ENUM('plans_to_read', 'is_reading', 'has_read'),
ts TIMESTAMP DEFAULT NOW()
)
Now, instead of updating the "user_read" table in your designed schema whenever the status of a book changes you now INSERT that same data in the log which now fills with a chronology of information:
INSERT INTO users_reading_log SET
user_id=1,
book_id=1,
status='plans_to_read';
When that person actually starts reading, you do another insert:
INSERT INTO users_reading_log SET
user_id=1,
book_id=1,
status='is_reading';
and so on. Now you have a database of "events" and since the timestamp column automatically fills itself, you can now tell what happened when. Please note that this system does not ensure that only one 'is_reading' for a specific user-book pair exists. Somebody might stop reading and later continue. Your joins will have to account for that.
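A sketch of how to query the current status out of such a log (sqlite3 standing in for MySQL, and a plain integer sequence standing in for the auto-filled TIMESTAMP; data is made up): the latest row per (user, book) pair is the one with the greatest ts.

```python
import sqlite3

# Append-only reading log: one row per status change.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users_reading_log (
    user_id INTEGER, book_id INTEGER, status TEXT, ts INTEGER
)""")
events = [(1, 1, 'plans_to_read', 1),
          (1, 1, 'is_reading',    2),
          (1, 1, 'has_read',      3),
          (1, 2, 'plans_to_read', 4)]
conn.executemany("INSERT INTO users_reading_log VALUES (?, ?, ?, ?)", events)

# Greatest-ts-per-group join: each pair's most recent row is its current status.
latest = conn.execute("""
SELECT l.user_id, l.book_id, l.status
FROM users_reading_log l
JOIN (SELECT user_id, book_id, MAX(ts) AS ts
      FROM users_reading_log
      GROUP BY user_id, book_id) m
  ON m.user_id = l.user_id AND m.book_id = l.book_id AND m.ts = l.ts
ORDER BY l.book_id
""").fetchall()
print(latest)  # [(1, 1, 'has_read'), (1, 2, 'plans_to_read')]
```

The full history stays available for the "magic" mentioned above (reading speed, number of attempts, books per year), at the cost of a more complex join.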
A database table is a mathematical relation, in other words a predicate and a set of tuples ("rows") for which that predicate is true. That means each "row" in a "table" is a (true) proposition.
This may all look scary, but the basic principles are really simple and worth knowing and applying rigorously: you'll better know what you're doing.
Relations are simple if you start small, with the binary relation. For example, there's a binary relation > (greater than) on the set of all integers which "contains" all ordered pairs of integers x, y for which the predicate x > y holds true. Note: you would not want to materialize this specific relation as a database table. :)
You want Books, Authors, Publishers and Users with their bookshelves (Read, Currently Reading, Want to Read). What are the predicates in that? "User U has read book B", "user U is reading book B", "user U wants to read book B" would be some of them; "book B has ISBN# I, title T, author A" would be another, but some books have multiple authors. In that case, you'll do well to split it out into a separate predicate: "book B was written by author A".
CREATE TABLE book (
id INT NOT NULL PRIMARY KEY
);
CREATE TABLE author (
id INT NOT NULL PRIMARY KEY
, name TEXT NOT NULL
);
CREATE TABLE written_by (
book INT NOT NULL REFERENCES book (id)
, author INT NOT NULL REFERENCES author (id)
);
CREATE TABLE reader (
id INT NOT NULL PRIMARY KEY
);
CREATE TABLE has_read (
reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
CREATE TABLE is_reading (
reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
CREATE TABLE plans_reading (
reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
etc etc.
Edit: see C. J. Date's Introduction to Database Systems.
If I were you, I'd use a schema much like the following:
TABLE user
-- Stores user's basic info.
( user_id INTEGER PRIMARY KEY
, username VARCHAR(50) NOT NULL
, password VARCHAR(50) NOT NULL
, ...
, ...
, ...
);
TABLE author
-- Stores author's basic info
( author_id INTEGER PRIMARY KEY
, author_name VARCHAR(50)
, date_of_birth DATE
, ...
, ...
, ...
);
TABLE publisher
-- Stores publisher's basic info
( publisher_id INTEGER PRIMARY KEY
, publisher_name VARCHAR(50)
, ...
, ...
, ...
);
TABLE book
-- Stores book info
( book_id INTEGER PRIMARY KEY
, title VARCHAR(50) NOT NULL
, author_id INTEGER NOT NULL
, publisher_id INTEGER NOT NULL
, published_dt DATE
, ...
, ...
, ...
, FOREIGN KEY (author_id) REFERENCES author(author_id)
, FOREIGN KEY (publisher_id) REFERENCES publisher(publisher_id)
);
TABLE common_lookup
-- This table stores common values that are used in various select lists.
-- The first three values are going to be
-- a - Read
-- b - Currently reading
-- c - Want to read
( element_id INTEGER PRIMARY KEY
, element_value VARCHAR(2000) NOT NULL
);
TABLE user_books
-- This table contains which user has read / is reading / want to read which book
-- There is a many-to-many relationship between users and books.
-- One user may read many books and one single book can be read by many users.
-- Hence we use this table to maintain that information.
( user_id INTEGER NOT NULL
, book_id INTEGER NOT NULL
, status_id INTEGER NOT NULL
, ...
, ...
, ...
, FOREIGN KEY (user_id) REFERENCES user(user_id)
, FOREIGN KEY (book_id) REFERENCES book(book_id)
, FOREIGN KEY (status_id) REFERENCES common_lookup(element_id)
);
TABLE audit_entry_log
-- This is an audit entry log table where you can track changes and log them here.
( audit_entry_log_id INTEGER PRIMARY KEY
, audit_entry_type VARCHAR(10) NOT NULL
-- Stores the entry type or DML event - INSERT, UPDATE or DELETE.
, table_name VARCHAR(30)
-- Stores the name of the table which got changed
, column_name VARCHAR(30)
-- Stores the name of the column which was changed
, primary_key INTEGER
-- Stores the PK column value of the row which was changed.
-- This is to uniquely identify the row which has been changed.
, ts TIMESTAMP
-- Timestamp when the change was made.
, old_number DECIMAL(36, 2)
-- If the changed field was a number, the old value should be stored here.
-- If it's an INSERT event, this would be null.
, new_number DECIMAL(36, 2)
-- If the changed field was a number, the new value should be stored here.
-- If it's a DELETE statement, this would be null.
, old_text VARCHAR(2000)
-- Similar to old_number but for a text/varchar field.
, new_text VARCHAR(2000)
-- Similar to new_number but for a text/varchar field.
, old_date DATE
-- Similar to old_number but for a date field.
, new_date DATE
-- Similar to new_number but for a date field.
, ...
, ... -- Any other data types you wish to include.
, ...
);
I would then create triggers on a few tables that would track changes and enter data in the audit_entry_log table.
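As a minimal sketch of that trigger idea (sqlite3 standing in for MySQL, keeping only a couple of the audit columns; table names follow the schema above):

```python
import sqlite3

# user_books plus a stripped-down audit_entry_log, wired together by a trigger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_books (user_id INTEGER, book_id INTEGER, status_id INTEGER);
CREATE TABLE audit_entry_log (
    audit_entry_type TEXT,
    table_name TEXT,
    old_number NUMERIC,
    new_number NUMERIC
);
CREATE TRIGGER user_books_audit AFTER UPDATE ON user_books
BEGIN
    INSERT INTO audit_entry_log (audit_entry_type, table_name, old_number, new_number)
    VALUES ('UPDATE', 'user_books', OLD.status_id, NEW.status_id);
END;
""")
conn.execute("INSERT INTO user_books VALUES (1, 1, 1)")
conn.execute("UPDATE user_books SET status_id = 2 WHERE user_id = 1 AND book_id = 1")

# The trigger recorded the status change without any application code.
entry = conn.execute("SELECT * FROM audit_entry_log").fetchone()
print(entry)  # ('UPDATE', 'user_books', 1, 2)
```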
First of all, create 4 tables for books, authors, publishers and users. Then:
Create a table books_authors that relates the books and authors tables.
Create a table books_publishers that relates the books and publishers tables.
Create a table books_users that relates the books and users tables; in this table, also use a flag to record whether the user has Read, is Currently Reading or Wants to Read (or Plans to Read) each book.
This is just a sketch; try it out.
I would have a Books table containing: title, author, publisher, isbn. A Book_Statuses table containing an id (PK) and a status (Read, Reading, etc.). A third table for user_books, in which there would be an fk_book_id related to the Books table and an fk_status_id linked to the Book_Statuses table.
All this together gives you an easily accessible data structure.
This is assuming I understand your question and that you want to have tables for authors, publishers and books. I'd need clarification on your needs.
Your answer is the best way to do this. For example, suppose you have books and categories tables and a book can fit more than one category. The best way to keep this data is to create a third table to hold the book-category relations; otherwise you have to create a column for every category:
ID  name      comedy  adventure  etc
5   BookName  yes     no         no
That is the worst thing you can do, believe me. Your solution is the best way to do it.
Also, don't neglect PKs and FKs in database table relations. If you use them well, it will be faster and safer than doing their work manually.
Weirdly, I have done a lot of development with MySQL and never encountered some of the things I have encountered today.
So, I have a user_items table
ID | name
---------
1 | test
I then have an item_data table
ID | item | added | info
-------------------------
1 | test | 12345 | important info
2 | test | 23456 | more recent important info
I then have an emails table
ID | added | email
------------------
1  | 12345 | old@b.com
2  | 23456 | a@b.com
3  | 23456 | b@c.com
and an emails_verified table
ID | email
----------
1  | a@b.com
Now I appreciate the setup of these tables may not be efficient etc, but this cannot be changed, and is a lot more complex than it may seem.
What i want to do is as follows. I want to be able to search through a users items and display the associated info, as well as any emails associated, as well as displaying if the email has been verified.
user_items.name = item_data.item
item_data.added = emails.added
emails.email = emails_verified.email
So for user item 1, test. I want to be able to return its ID, its name, the most recent information, the most recent emails, and their verification status.
So I would like to return:
ID => 1
name => test
information => more recent important info
emails => array('0' => array('email' => 'a@b.com', 'verified' => 'YES'), '1' => array('email' => 'b@c.com', 'verified' => 'NO'))
Now I could do this with multiple queries with relative ease. My research, however, suggests that this is significantly more resource/time costly than using one (albeit very complex) MySQL query with loads of join statements.
The reason using one query would also would be useful (I believe) is because I can then add search functionality with relative ease - adding to the query complex where statements.
To further complicate matters, I am using CodeIgniter. I cannot be too picky :) so any non-CI answers would still be very useful.
The code I have got thus far is as follows. It is, however, very much "I'm not too sure what I'm doing" territory.
function test_search()
{
$this->load->database();
$this->db->select('user_items.*,item_data.*');
$this->db->select('GROUP_CONCAT( emails.email SEPARATOR "," ) AS emails', FALSE);
$this->db->select('GROUP_CONCAT( IF(emailed.email,"YES","NO") SEPARATOR "," ) AS emailed', FALSE);
$this->db->where('user_items.name','test');
$this->db->join('item_data','user_items.name = item_data.name','LEFT');
$this->db->join('emails','item_data.added = emails.added','LEFT');
$this->db->join('emailed','emails.email = emailed.email','LEFT');
$this->db->group_by('user_items.name');
$res = $this->db->get('user_items');
print_r($res->result_array());
}
Any help with this would be very much appreciated.
This is really complex sql - is this really the best way to achieve this functionality?
Thanks
UPDATE
Following on from Cryode's excellent answer.
The only thing wrong with it is that it only returns one email. By using GROUP_CONCAT, however, I have been able to get all emails and all email_verified statuses into a string which I can then explode with PHP.
To clarify is the subquery,
SELECT item, MAX(added) AS added
FROM item_data
GROUP BY item
essentially creating a temporary table?
Similar to that outlined here
Surely the subquery is necessary to make sure you only get one row from item_data - the most recent one?
And finally to answer the notes about the poorly designed database.
The database was designed this way as item_data is changed regularly but we want to keep historical records.
The emails are part of the item data but because there can be any number of emails, and we wanted them to be searchable we opted for a seperate table. Otherwise the emails would have to be serialized within the item_data table.
The emails_verified table is seperate as an email can be associated with more than one item.
Given that, although (clearly) complicated for querying it still seems a suitable setup..?
Thanks
FINAL UPDATE
Cryode's answer is a really useful answer relating to database architecture in general.
Having conceptualised this a little more: if we store the version ID in user_items, we don't need the subquery.
Because none of the data between versions is necessarily consistent, we will scrap his proposed items table (for this case).
We can then get the correct version from the item_data table.
We can also get the items_versions_emails rows based on the version ID, and from those get the respective emails from our emails table.
I.e., it works perfectly.
The downside of this is that when I add new version data in item_data I have to update the user_items table with the new version that has been inserted.
This is fine, but simply as a generalised point: which is quicker?
I assume the reason such a setup has been suggested is that it is quicker: an extra update each time new data is added is worth it to save potentially hundreds of subqueries when lots of rows are being displayed, especially given that we display the data more than we update it.
Just for future reference, when designing database architecture, does anyone have any links/general guidance on what is quicker and why, so that we can all build better-optimised databases?
Thanks again to Cryode !!
Using your database structure, this is what I came up with:
SELECT ui.name, id.added, id.info, emails.email,
CASE WHEN ev.id IS NULL THEN 'NO' ELSE 'YES' END AS email_verified
FROM user_items AS ui
JOIN item_data AS id ON id.item = ui.name
JOIN (
SELECT item, MAX(added) AS added
FROM item_data
GROUP BY item
) AS id_b ON id_b.item = id.item AND id_b.added = id.added
JOIN emails ON emails.added = id.added
LEFT JOIN emails_verified AS ev ON ev.email = emails.email
But as others have pointed out, the database is poorly designed. This query will not perform well on a table with a lot of data, since there is no aggregate function suited to this purpose. I understand that in certain situations you have little to no control over database design, but if you want to actually create the best situation, you should make it clear to whoever can control it that it can be improved.
One of the biggest optimizations that could be made is to add the current item_data ID to the user_items table. That way the subquery to pull that wouldn't be necessary (since right now we're essentially joining item_data twice).
Converting this to CI's query builder is kind of a pain in the ass because of the sub query. Assuming you're only working with MySQL DBs, just stick with $this->db->query().
Added from your edit:
This query returns one email per row, it does not group them together. I left the CONCAT stuff out because it's one more thing that slows down your query -- your PHP can put the emails together afterwards much faster.
Yes, the subquery is that part -- a query within a query (pretty self-explanatory name :wink:). I wouldn't call it creating a temporary table, because that's something you can actually do. More like retrieving a subset of the information in the table, and using it kind of like a WHERE clause. The subquery is what finds the most recent row in your item_data table, since we have to figure it out ourselves (again, proper database design would eliminate this).
When we say you can optimize your database design, it doesn't mean you can't have it set up in a similar way. You made it sound like the DB could not be altered at all. You have the right idea as far as the overall scheme, you're just implementing it poorly.
Database Design
Here's how I would lay this out. Note that without knowing the whole extent of your project, this may need modification. May also not be 100% the best optimized on the planet -- I'm open for suggestions for improvement. Your mileage may vary.
User Items
CREATE TABLE `users_items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Defines the relationship between a base item and a user.
Items
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`item_name` varchar(50) NOT NULL DEFAULT '',
`created_on` datetime NOT NULL,
`current_version` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Your items table should have all of your items' base information -- things that will not change on a per-revision basis. Notice the current_version column -- this is where you'll store the ID from the versions table, indicating which is most recent (so we don't have to figure it out ourselves).
Items Versions (history)
CREATE TABLE `items_versions` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`item_id` int(10) unsigned NOT NULL,
`added` datetime NOT NULL,
`info` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here is where you'd store the history of an item -- each update would create a new row here. Note that the item_id column is what ties this row to a particular base item.
Emails
CREATE TABLE `emails` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(100) NOT NULL DEFAULT '',
`verified` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Since emails can be shared between multiple products, we'll end up using what's called a many-to-many relationship. Emails can be tied to multiple products, and a product can be tied to multiple emails. Here we defined our emails, and include a verified column for whether it has been verified or not.
Item Emails
CREATE TABLE `items_versions_emails` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`version_id` int(11) NOT NULL,
`email_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Assuming the emails are tied to an item version and not the base item, this is the structure you want. Unfortunately, if you have a ton of versions and never change the email(s), this will result in a lot of repeated data. So there's room for optimization here. If you tie emails to the base item, you'll have less repeated data, but you'll lose the history. So there's options for this. But the goal is to show how to set up DB relationships, not be 100% perfect.
That should give you a good start on how to better lay out your DB structure.
Another Update
Regarding speed, inserting a new item version and then updating the related item row with the new version ID will give you much better performance than requiring a subquery to pull the latest update. You'll notice in the solution for your original structure, the item_info table is being joined twice -- once to join the most recent rows, and again to grab the rest of the data from that recent row (because of the way GROUP BY works, we can't get it in a single join). If we have the recent version ID stored already, we don't need the first join at all, which will improve your speed dramatically (along with proper indexing, but that's another lesson).
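That write pattern is just two statements: insert the version, then point the base item at it. A sketch using sqlite3 (column names follow the schema in this answer; `add_version` is a hypothetical helper, not something from the original code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    current_version INTEGER
);
CREATE TABLE items_versions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    item_id INTEGER NOT NULL,
    info TEXT
);
""")
c.execute("INSERT INTO items (name) VALUES ('widget')")
item_id = c.lastrowid

def add_version(item_id, info):
    # Insert the new version, then point the base item at it --
    # reads never need a subquery to find the latest revision.
    c.execute("INSERT INTO items_versions (item_id, info) VALUES (?, ?)",
              (item_id, info))
    c.execute("UPDATE items SET current_version = ? WHERE id = ?",
              (c.lastrowid, item_id))
    conn.commit()

add_version(item_id, "first draft")
add_version(item_id, "second draft")

# Latest info comes back with a single join, no GROUP BY needed
row = c.execute("""
    SELECT v.info FROM items i
    JOIN items_versions v ON v.id = i.current_version
    WHERE i.id = ?
""", (item_id,)).fetchone()
print(row[0])  # second draft
```

In production you'd want both statements inside one transaction so a crash can't leave `current_version` pointing at a version that was never committed.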
I wouldn't recommend ditching the base items table, but that's really up to you and your application's needs. Without a base item, there's no real way to track the history of that particular item. There's nothing in the versions that shows a common ancestor/history, assuming you're removing the item_id column.
The table
I have a table that contains prices for roughly 1,000,000 articles. Each article has a unique ID number, but the table contains prices from multiple stores; if two stores carry the same article, that ID will not be unique within the table.
Table Structure
table articles
id INT
price INT
store VARCHAR(40)
Daily use
Besides user queries against the ID number, I need to run daily updates where data from CSV files inserts/updates each article in the table. The chosen procedure is to try to select an article and then perform either an insert or an update.
Question
With this in mind, which key should I choose?
Here are some solutions that I've been considering:
FULLTEXT index of the fields isbn and store
Add a field with a value generated by isbn and store that is set as PRIMARY key
One table per store and use isbn as PRIMARY key
Use a compound primary key consisting of the store ID and the article ID - that'll give you a unique primary key for each item on a per-store basis and you don't need a separate field for it (assuming the store id and article id are already in the table).
Ideally you should have 3 tables... something like:
article
--------------------------------------------
id | isbn | ... etc ...
store
--------------------------------------------
id | description | ... etc ...
pricelist
--------------------------------------------
article_id | store_id | price | ... etc ...
With the PRIMARY KEY for pricelist being a compound key made up of article_id and store_id.
EDIT : (updated to incorporate an answer from the comment)
Even on a million rows the UPDATE should be OK (for a certain definition of OK, it might still take a little while with 1 million+ rows) since the article_id and store_id comprise the PRIMARY KEY - they'll both be indexed.
You'll just need to write your query so that it's along the lines of:
UPDATE pricelist SET price = {$fNewPrice}
WHERE article_id = {$iArticleId}
AND store_id = '{$sStoreId}'
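For the daily CSV load, the compound key also lets you replace the select-then-insert/update dance with a single upsert. In MySQL that's INSERT ... ON DUPLICATE KEY UPDATE; the sketch below uses sqlite3 and its equivalent ON CONFLICT clause (column names follow the pricelist schema above; `upsert_price` is an illustrative helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""
CREATE TABLE pricelist (
    article_id INTEGER NOT NULL,
    store_id   INTEGER NOT NULL,
    price      INTEGER,
    PRIMARY KEY (article_id, store_id)
)
""")

def upsert_price(article_id, store_id, price):
    # One statement instead of select-then-insert/update; the compound
    # primary key is what makes the conflict detection work.
    c.execute("""
        INSERT INTO pricelist (article_id, store_id, price)
        VALUES (?, ?, ?)
        ON CONFLICT(article_id, store_id) DO UPDATE SET price = excluded.price
    """, (article_id, store_id, price))

upsert_price(1, 10, 999)   # insert
upsert_price(1, 10, 899)   # same key -> updates the existing row
upsert_price(1, 11, 950)   # different store -> second row

rows = c.execute(
    "SELECT store_id, price FROM pricelist WHERE article_id = 1 ORDER BY store_id"
).fetchall()
print(rows)  # [(10, 899), (11, 950)]
```

Either way, the lookup is a point query against the primary key, so it stays fast even at a million-plus rows.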
Though you may want to consider converting the PRIMARY KEY in the store table (store.id - and therefore also pricelist.store_id in the pricelist table) to either an unsigned INT or something like CHAR(30).
Whilst VARCHAR is more efficient when it comes to disk space it has a couple of drawbacks:
1: MySQL isn't too keen on updating VARCHAR values and it can make the indexes bloat a bit so you may need to occasionally run OPTIMIZE TABLE on it (I've found this on an order_header table before).
2: Any (MyISAM) table with non-fixed length fields (such as VARCHAR) will have to have a DYNAMIC row format which is slightly less efficient when it comes to querying it - there's more information about that on this SO post: MySQL Row Format: Difference between fixed and dynamic?
Your indexes should be aligned with your queries. Certainly there should be a primary key on the articles table using STORE and ID - but the order in which they are declared will affect performance - depending on the data in the related tables and the queries applied. Indeed the simplest solution might be PRIMARY KEY(STORE, ID) and UNIQUE KEY(ID, STORE) along with foreign key constraints on the two fields.
i.e. since it makes NO SENSE to call this table 'articles', I'll use the same schema as CD001:
CREATE TABLE pricelist (
  id INT NOT NULL,
  price INT,
  store VARCHAR(40) NOT NULL,
  PRIMARY KEY (store, id),
  UNIQUE KEY rlookup (id, store),
  CONSTRAINT fk_article FOREIGN KEY (id) REFERENCES articles (id),
  CONSTRAINT fk_store FOREIGN KEY (store) REFERENCES store (name)
);
Which also entails having a primary key on store using name.
The difference between checking a key based on a single column and one based on two columns is negligible, and normalising your database properly will save you a LOT of pain.
Which would be the best approach for a table design that keeps statuses for a product:
Product Accepted
Product Declined
Product Finished (Shipped)
These are my ideas:
an int field for "accepted", another for "declined" and another for "finished" (with values 0 or 1)
a datetime for each "accepted", "declined" and "finished" (by default the field will be NULL)
just a field "status" with values 1, 2 or 3.
If the status can only be one of those three at any time, then there should only be one field. So, your third idea would be correct. Having three fields when only one is necessary would be a waste of storage space, so should be avoided.
The best solution is to simply have a status field that holds which one of the three statuses the product is at. This will also make adding or removing a status easy since you will only have to modify the code to insert a different value, not the database itself.
If you do need the date at which the status was last changed to also be stored, you could have a field called last_status_change which is a datetime and you update to the current time whenever you change the status.
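That combination (one status column plus a timestamp of the last change) can be sketched in a few lines; here's a runnable sqlite3 version, with the status codes (1=accepted, 2=declined, 3=finished) assumed from the question:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    status INTEGER NOT NULL DEFAULT 1,   -- 1=accepted, 2=declined, 3=finished
    last_status_change TEXT
)
""")
c.execute("INSERT INTO products (name) VALUES ('widget')")

def set_status(product_id, status):
    # Update the status and stamp when it changed, in one statement
    c.execute("""
        UPDATE products
        SET status = ?, last_status_change = ?
        WHERE id = ?
    """, (status, datetime.now(timezone.utc).isoformat(), product_id))

set_status(1, 3)  # mark the product as finished
row = c.execute(
    "SELECT status, last_status_change FROM products WHERE id = 1"
).fetchone()
print(row[0])  # 3
```

Note this only remembers the most recent change; if you need the date of every transition, the many-to-many layout below is the better fit.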
Another variation of your 3rd option is an ENUM, but an INT is generally a better idea; for performance you might even use TINYINT (fewer bytes).
You should then have a product_status (id, name) lookup table with 3 records.
It depends on how detailed you want future reporting to be. Your options 1 and 3 are effectively the same, so throw out option 1 in favor of the better option 3.
Now you're left with two options, and to pick one you need to answer the following:
Do I want to know when each status was set, or just if?
Personally, I would choose option 2 just in case you want to add any detailed reporting capability in the future. The DATETIME fields will take more space than a single INT, but in most cases this isn't a problem.
Edit:
Alternatively, you can do something like this:
CREATE TABLE `status` (
`id` INT NOT NULL AUTO_INCREMENT,
`title` VARCHAR(64),
PRIMARY KEY (`id`)
);
CREATE TABLE `product_status` (
`product_id` INT NOT NULL,
`status_id` INT NOT NULL,
`date` DATETIME,
PRIMARY KEY(`product_id`, `status_id`)
);
That way, you have the flexibility of all different kinds of status fields, and by using the many-to-many table you gain the ability to see when each status was set. Plus, if you're using InnoDB tables you can set up foreign keys.
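Recording a status change is then just one insert into the join table. A minimal sqlite3 sketch of the schema above (the status titles and the sample product ID are made up; `set_status` is an illustrative helper):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE status (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT
);
CREATE TABLE product_status (
    product_id INTEGER NOT NULL,
    status_id  INTEGER NOT NULL,
    date       TEXT,
    PRIMARY KEY (product_id, status_id)
);
""")
c.executemany("INSERT INTO status (title) VALUES (?)",
              [("accepted",), ("declined",), ("finished",)])

def set_status(product_id, status_id):
    # One row per (product, status): the date records when it was set
    c.execute(
        "INSERT INTO product_status (product_id, status_id, date) VALUES (?, ?, ?)",
        (product_id, status_id, datetime.now(timezone.utc).isoformat()),
    )

set_status(42, 1)  # product 42 accepted
set_status(42, 3)  # ...and later finished

rows = c.execute("""
    SELECT s.title FROM status s
    JOIN product_status ps ON s.id = ps.status_id
    WHERE ps.product_id = 42
    ORDER BY ps.status_id
""").fetchall()
print([r[0] for r in rows])  # ['accepted', 'finished']
```

The compound primary key on (product_id, status_id) means a product can hold each status at most once while still keeping a dated record of every status it has passed through.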
To query a product:
SELECT s.title, ps.date
FROM status s
LEFT JOIN product_status ps ON (s.id = ps.status_id)
WHERE (ps.product_id = MYPID)