I have two tables in MySQL. Table Person has the following columns:
id
name
fruits
The fruits column may hold null or an array of strings like ('apple', 'orange', 'banana'), or ('strawberry'), etc. The second table is Table Fruit and has the following three columns:
fruit_name | color  | price
-----------+--------+------
apple      | red    | 2
orange     | orange | 3
So how should I design the fruits column in the first table so that it can hold an array of strings that take values from the fruit_name column in the second table? Since there is no array data type in MySQL, how should I do it?
The proper way to do this is to use multiple tables and JOIN them in your queries.
For example:
CREATE TABLE person (
`id` INT NOT NULL PRIMARY KEY,
`name` VARCHAR(50)
);
CREATE TABLE fruits (
`fruit_name` VARCHAR(20) NOT NULL PRIMARY KEY,
`color` VARCHAR(20),
`price` INT
);
CREATE TABLE person_fruit (
`person_id` INT NOT NULL,
`fruit_name` VARCHAR(20) NOT NULL,
PRIMARY KEY(`person_id`, `fruit_name`)
);
The person_fruit table contains one row for each fruit a person is associated with and effectively links the person and fruits tables together. For example:
1 | "banana"
1 | "apple"
1 | "orange"
2 | "straberry"
2 | "banana"
2 | "apple"
When you want to retrieve a person and all of their fruit you can do something like this:
SELECT p.*, f.*
FROM person p
INNER JOIN person_fruit pf
ON pf.person_id = p.id
INNER JOIN fruits f
ON f.fruit_name = pf.fruit_name;
The reason there are no arrays in SQL is that most people don't really need them. Relational databases (and SQL is exactly that) work using relations, and most of the time it is best to assign one row of a table to each "bit of information". Where you might think "I'd like a list of stuff here", instead make a new table, linking the row in one table with the row in another table.[1] That way you can represent M:N relationships. Another advantage is that those links will not clutter the row containing the linked item, and the database can index those rows; arrays typically aren't indexed.
If you don't need relational databases, you can use e.g. a key-value store.
Read about database normalization. The golden rule is "[every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." An array does too much: it holds multiple facts, and it stores an order (which is not part of the relation itself). And the performance is poor (see above).
Imagine that you have a person table and a table with phone calls made by people. You could make each person row hold a list of their phone calls, but every person has many other relationships to many other things. Does that mean the person table should contain an array for every single thing a person is connected to? No, those are not attributes of the person itself.
[1]: It is okay if the linking table only has two columns (the primary keys from each table)! If the relationship itself has additional attributes though, they should be represented in this table as columns.
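For instance, a sketch of the person_fruit table from the first answer, extended with a relationship attribute (the quantity column is purely hypothetical):

CREATE TABLE person_fruit (
    `person_id` INT NOT NULL,
    `fruit_name` VARCHAR(20) NOT NULL,
    `quantity` INT NOT NULL DEFAULT 1, -- hypothetical attribute of the relationship itself
    PRIMARY KEY(`person_id`, `fruit_name`)
);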
MySQL 5.7 now provides a JSON data type, a convenient way to store complex data: lists, dictionaries, and so on.
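A minimal sketch of that approach, reusing the question's tables (names are illustrative; JSON_CONTAINS requires 5.7+):

CREATE TABLE person (
    `id` INT NOT NULL PRIMARY KEY,
    `name` VARCHAR(50),
    `fruits` JSON
);

INSERT INTO person VALUES (1, 'John', '["apple", "orange"]');

-- membership test against the stored array
SELECT name FROM person WHERE JSON_CONTAINS(fruits, '"apple"');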
That said, arrays don't map well to relational databases, which is why object-relational mappers can be quite complex. Historically people have stored lists/arrays in MySQL by creating a table that describes them and adding each value as its own record. The table may have only 2 or 3 columns, or it may contain many more. How you store this type of data really depends on the characteristics of the data.
For example, does the list contain a static or dynamic number of entries? Will the list stay small, or is it expected to grow to millions of records? Will there be lots of reads on this table? Lots of writes? Lots of updates? These are all factors that need to be considered when deciding how to store collections of data.
Also, key/value and document stores such as Cassandra, MongoDB, Redis, etc. provide a good solution as well. Just be aware of where the data is actually being stored (on disk or in memory). Not all of your data needs to be in the same database. Some data does not map well to a relational database, and you may have reasons for storing it elsewhere; or you may want an in-memory key/value database as a hot cache for data stored on disk somewhere, or as ephemeral storage for things like sessions.
As a side note, you can store arrays natively in Postgres.
In MySQL, use the JSON type.
Contra the answers above, the SQL standard has included array types for almost twenty years; they are useful, even if MySQL has not implemented them.
In your example, however, you'll likely want to create three tables: person and fruit, then person_fruit to join them.
DROP TABLE IF EXISTS person_fruit;
DROP TABLE IF EXISTS person;
DROP TABLE IF EXISTS fruit;
CREATE TABLE person (
person_id INT NOT NULL AUTO_INCREMENT,
person_name VARCHAR(1000) NOT NULL,
PRIMARY KEY (person_id)
);
CREATE TABLE fruit (
fruit_id INT NOT NULL AUTO_INCREMENT,
fruit_name VARCHAR(1000) NOT NULL,
fruit_color VARCHAR(1000) NOT NULL,
fruit_price INT NOT NULL,
PRIMARY KEY (fruit_id)
);
CREATE TABLE person_fruit (
pf_id INT NOT NULL AUTO_INCREMENT,
pf_person INT NOT NULL,
pf_fruit INT NOT NULL,
PRIMARY KEY (pf_id),
FOREIGN KEY (pf_person) REFERENCES person (person_id),
FOREIGN KEY (pf_fruit) REFERENCES fruit (fruit_id)
);
INSERT INTO person (person_name)
VALUES
('John'),
('Mary'),
('John'); -- again
INSERT INTO fruit (fruit_name, fruit_color, fruit_price)
VALUES
('apple', 'red', 1),
('orange', 'orange', 2),
('pineapple', 'yellow', 3);
INSERT INTO person_fruit (pf_person, pf_fruit)
VALUES
(1, 1),
(1, 2),
(2, 2),
(2, 3),
(3, 1),
(3, 2),
(3, 3);
If you wish to associate the person with an array of fruits, you can do so with a view:
DROP VIEW IF EXISTS person_fruit_summary;
CREATE VIEW person_fruit_summary AS
SELECT
person_id AS pfs_person_id,
max(person_name) AS pfs_person_name,
cast(concat('[', group_concat(json_quote(fruit_name) ORDER BY fruit_name SEPARATOR ','), ']') as json) AS pfs_fruit_name_array
FROM
person
INNER JOIN person_fruit
ON person.person_id = person_fruit.pf_person
INNER JOIN fruit
ON person_fruit.pf_fruit = fruit.fruit_id
GROUP BY
person_id;
The view shows the following data:
+---------------+-----------------+----------------------------------+
| pfs_person_id | pfs_person_name | pfs_fruit_name_array |
+---------------+-----------------+----------------------------------+
| 1 | John | ["apple", "orange"] |
| 2 | Mary | ["orange", "pineapple"] |
| 3 | John | ["apple", "orange", "pineapple"] |
+---------------+-----------------+----------------------------------+
In 5.7.22, you'll want to use JSON_ARRAYAGG, rather than hack the array together from a string.
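With it, the aggregation in the view above could read something like this (a sketch; note that JSON_ARRAYAGG does not guarantee element order):

SELECT
    person_id AS pfs_person_id,
    max(person_name) AS pfs_person_name,
    json_arrayagg(fruit_name) AS pfs_fruit_name_array
FROM person
INNER JOIN person_fruit ON person.person_id = person_fruit.pf_person
INNER JOIN fruit ON person_fruit.pf_fruit = fruit.fruit_id
GROUP BY person_id;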
Use database field type BLOB to store arrays.
Ref: http://us.php.net/manual/en/function.serialize.php
Return Values
Returns a string containing a byte-stream representation of value that
can be stored anywhere.
Note that this is a binary string which may include null bytes, and
needs to be stored and handled as such. For example, serialize()
output should generally be stored in a BLOB field in a database,
rather than a CHAR or TEXT field.
You can store your array using GROUP_CONCAT, like this:
INSERT INTO Table1 (fruits)
SELECT GROUP_CONCAT(fruit_name)
FROM Table2
WHERE ...; -- your clause here
Related
I have this kind of text as a "description", but some of its values are numeric and must be changed based on a ratio. I was wondering how to properly store that in the database.
"- Add 49 things in my 7 bags"
My initial idea was to do that :
+-------+------+---------------+------+---------+------------+-----------+
| part1 | num1 | part2 | num2 | part3 | rationum1 | rationum2 |
+-------+------+---------------+------+---------+------------+-----------+
| - Add | 49 | things in my | 7 | bags | 1.3 | 1.2 |
+-------+------+---------------+------+---------+------------+-----------+
It seems very inefficient, however. Plus, I want to add a tooltip to some parts. For example, "- Add" must have a tooltip linked to it, but I don't know how to apply a property to only one part of the table.
Any advice would be welcome!
EDIT : I'm using PHP to fetch data as JSON, and then I'm using JavaScript (React) for the display.
There is nothing wrong with your proposed table layout, and it's not inefficient either. MySQL is built for this; it can handle millions of rows of this kind of thing without breaking a sweat.
Do add an autoincrementing id value to each row, to use as a primary key. You may wish to consider adding a timestamp column too.
Define your num1 and num2 columns as int or, if you need fractional values, as double. (JavaScript treats all numbers as double.)
Define your fractional columns as double.
Define your textual columns as varchar(250) or some such thing, and add a textual column for your tooltip's text.
And, you're done.
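Putting that together, a possible sketch of the DDL (the table name and the tooltip placement are assumptions):

CREATE TABLE description_line (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    part1 VARCHAR(250),
    num1 INT,
    part2 VARCHAR(250),
    num2 INT,
    part3 VARCHAR(250),
    rationum1 DOUBLE,
    rationum2 DOUBLE,
    tooltip VARCHAR(250), -- tooltip text for the part that carries one
    PRIMARY KEY (id)
);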
But when I look at your example, Add 49 things in my 7 bags, I see more meaning than just a phrase:
a verb: Add.
a source_count: 49
a source_description: things.
a preposition: in
a possessive: my
a target_count: 7
a target_description: bags
Does your system also need to say Steal 5 grenades from Joe's 2 ammo cases or some such thing (I'm assuming you are making some kind of game)?
If so, you may want a more elaborate set of table layouts taking into account the parts of the phrase. Then your query can use appropriate JOIN operations.
Perhaps normalize it.
For example, put the descriptions with placeholders in another table, together with the tooltip.
Then put a foreign key in the table with the items.
Example code:
DROP TABLE IF EXISTS tst_stuff;
DROP TABLE IF EXISTS tst_stufftodo;
CREATE TABLE tst_stufftodo (id int primary key auto_increment, description varchar(100), tooltip varchar(1000));
CREATE TABLE tst_stuff (
    id int primary key auto_increment,
    name varchar(100),
    num1 int not null default 0,
    num2 int not null default 0,
    rationum1 decimal(4,1) not null default 0,
    rationum2 decimal(4,1) not null default 0,
    std_id int,
    FOREIGN KEY (std_id) REFERENCES tst_stufftodo(id)
);
INSERT INTO tst_stufftodo (description, tooltip)
VALUES
('Add &num1& things in my &num2& &name&', 'Add the stuff');
INSERT INTO tst_stuff (name, num1, num2, rationum1, rationum2, std_id) VALUES
('bags', 49, 7, 1.2, 1.3, 1),
('socks', 1000000, 2, 0.5, 0.6, 1);
select s.id, replace(replace(replace(std.description,'&name&',s.name), '&num1&',s.num1), '&num2&',s.num2) as description
from tst_stuff s
join tst_stufftodo std on std.id = s.std_id;
Result:
id description
1 Add 49 things in my 7 bags
2 Add 1000000 things in my 2 socks
But it's probably better to do the replacement of the placeholders in the PHP presentation layer.
I'm not sure how to store or insert this data. I am using PHP and MySQL.
Let's say we're trying to keep track of people who enter marathons (like jogging or whatever). So far, I have a Person table that holds all my person information. Each person happens to be associated with a unique varchar(40) key. There is a table for the marathon information (Marathon). I receive the person data in a CSV that has about 130,000 rows and import that into the database.
So now the question is: how do I deal with that association between Person and Marathon? For each marathon, I get a huge list of participants (by that unique varchar key) that I need to import. If I go the foreign key route, it seems like the insert would be very heavy, and looking up the appropriate foreign key for each person cumbersome. I'm not even sure how I would write that insert... I guess it would look like this:
insert into person_marathon
select p.person_id, m.marathon_id
from ( select 'person_a' as p_name, 'marathon_a' as m_name union
select 'person_b' as p_name, 'marathon_a' as m_name )
as imported_marathon_person_list
join person p
on p.person_name = imported_marathon_person_list.p_name
join marathon m
on m.marathon_name = imported_marathon_person_list.m_name
There are not a lot of marathons to deal with at one time. There are a lot of people, though.
--> Should I even give the person an ID and require all the foreign keys? Or just use the unique varchar(40) as the true table key? But then I would have to join tables on a varchar, and that's bad. A marathon can have anywhere from 1k to 30k participants.
--> Or, I could select the person info and the marathon info from the database and join it with the marathon_person data in PHP before I send it over to MySQL.
--> Or, I guess, maybe make a temporary table, then join in the DB, then insert (through PHP)? It's already been strongly suggested that I never use temporary tables (this is a work thing, and this isn't my database).
Edit: I am not sure on what schema to use because I'm not sure if I should be using foreign keys or not (purpose of this whole post is to answer that question) but the basic design would be something like...
create table person (
    person_id int unsigned auto_increment,
    person_key varchar(40) not null,
    primary key (person_id),
    constraint uc_person_key unique (person_key)
);
create table marathon (
    marathon_id int unsigned auto_increment,
    marathon_name varchar(60) not null,
    primary key (marathon_id)
);
create table person_marathon (
    person_marathon_id int unsigned auto_increment,
    person_id int unsigned,
    marathon_id int unsigned,
    primary key (person_marathon_id),
    constraint uc_person_marathon unique (person_id, marathon_id),
    foreign key (person_id) references person (person_id),
    foreign key (marathon_id) references marathon (marathon_id)
);
I'm going to repeat the actual question really quick.... If I choose to use a foreign key for person, how do I import all the person_marathon data with the person_id in an efficient way? The insert statement I included above is my best guess....
The person data comes in a CSV of about 130,000 rows so that is a straight import into the person table. The person data comes with a unique varchar(40) for each person.
The person_marathon data comes in a CSV for each marathon, as a list of 1,000 to 30,000 unique varchar(40)'s that represent each person who participated in that marathon.
Summary: I am using PHP. So what is the best way to write the insert/import of the person_marathon data if I am using foreign keys? Would I have to do it like the insert statement above or is there a better way?
This is a many-to-many relationship: one person can enter many marathons, and one marathon can be entered by many persons. You need an additional table in your data model to track this relation, for example:
CREATE TABLE persons_marathons (
    personID INT NOT NULL,
    marathonID INT NOT NULL,
    PRIMARY KEY (personID, marathonID),
    FOREIGN KEY (personID) REFERENCES Persons (P_Id),
    FOREIGN KEY (marathonID) REFERENCES Marathons (M_Id)
);
This table uses foreign key constraints. A foreign key constraint prevents inserting bad data (for example, you cannot insert a row with personID = 123 when there is no such id in the Persons table). It also prevents deletes that would destroy the link between tables (for example, you cannot delete person X while a record with that personID exists in the persons_marathons table).
If this table contains the following rows:
personID | MarathonID
----------+-----------
2 | 3
3 | 3
2 | 8
3 | 8
it means that persons 2 and 3 both entered marathons 3 and 8.
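As for importing the per-marathon CSV efficiently, one sketch (the staging-table name and CSV layout are assumptions) is to load the keys into a staging table and then insert-select with joins, much like the statement in the question:

CREATE TABLE staging_participants (person_key VARCHAR(40) NOT NULL);

LOAD DATA LOCAL INFILE 'marathon_a.csv' -- adjust FIELDS/LINES options to your file
INTO TABLE staging_participants;

INSERT INTO persons_marathons (personID, marathonID)
SELECT p.person_id, m.marathon_id
FROM staging_participants s
JOIN person p ON p.person_key = s.person_key
JOIN marathon m ON m.marathon_name = 'Marathon A';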
In MySQL, is it possible to have a column in two different tables that auto-increments? For example, table1 has a column 'secondaryid' and table2 also has a column 'secondaryid'. Is it possible to have table1.secondaryid and table2.secondaryid draw from the same pool of values? Like table1.secondaryid could hold the values 1, 2, 4, 6, 7, 8, etc. and table2.secondaryid could hold the values 3, 5, 9, 10?

The reason for this is twofold: 1) the two tables will be referenced in a separate table of 'likes' (similar to users liking a page on Facebook), and 2) the data in table2 is a subset of table1 using a primary key. So the information housed in table2 is dependent on table1, as they are the topics of different categories (categories being table1 and topics being table2).

Is it possible to do something like what is described above, or is there some other structural workaround that I'm not aware of?
It seems you want to differentiate categories and topics in two separate tables, but have the ids of both of them referenced in another table, likes, to facilitate users liking either a category or a topic.
What you can do is create a super-entity table with subtypes categories and topics. The auto-incremented key would be generated in the super-entity table and inserted into only one of the two subtype tables (based on whether it's a category or a topic).
The subtype tables reference this super-entity via the auto-incremented field in a 1:1 relationship.
This way, you can simply link the super-entity table to the likes table just based on one column (which can represent either a category or a topic), and no id in the subtype tables will be present in both.
Here is a simplified example of how you can model this out:
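In DDL form, a hedged sketch (table and column names are illustrative):

CREATE TABLE superentity (
    id INT NOT NULL AUTO_INCREMENT,
    title VARCHAR(100), -- common fields of the subtypes can live here
    url VARCHAR(255),
    PRIMARY KEY (id)
);

CREATE TABLE category (
    id INT NOT NULL, -- same value as superentity.id (1:1)
    PRIMARY KEY (id),
    FOREIGN KEY (id) REFERENCES superentity (id)
);

CREATE TABLE topic (
    id INT NOT NULL,          -- same value as superentity.id (1:1)
    category_id INT NOT NULL, -- a topic belongs to a category
    PRIMARY KEY (id),
    FOREIGN KEY (id) REFERENCES superentity (id),
    FOREIGN KEY (category_id) REFERENCES category (id)
);

CREATE TABLE likes (
    user_id INT NOT NULL,        -- would reference your users table
    superentity_id INT NOT NULL, -- may point at a category or a topic
    PRIMARY KEY (user_id, superentity_id),
    FOREIGN KEY (superentity_id) REFERENCES superentity (id)
);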
This model would allow you to maintain the relationship between categories and topics, while having both entities generalized in the superentity table.
Another advantage to this model is you can abstract out common fields in the subtype tables into the superentity table. Say for example that categories and topics both contained the fields title and url: you could put these fields in the superentity table because they are common attributes of its subtypes. Only put fields which are specific to the subtype tables IN the subtype tables.
If you just want the IDs in the two tables to be different, you can initially set table2's AUTO_INCREMENT to some big number:
ALTER TABLE `table2` AUTO_INCREMENT=1000000000;
You can't have an auto_increment value shared between tables, but you can make it appear that it is:
set @@auto_increment_increment = 2; -- change auto-increment to increase by 2
create table evens (
id int auto_increment primary key
);
alter table evens auto_increment = 0;
create table odds (
id int auto_increment primary key
);
alter table odds auto_increment = 1;
The downside to this is that you're changing a global setting, so ALL auto_inc fields will now be growing by 2 instead of 1.
It sounds like you want a MySQL equivalent of sequences, which can be found in DBMSs like PostgreSQL. There are a few known recipes for this, most of which involve creating table(s) that track the name of the sequence and an integer field that keeps its current value. This approach lets you query the table that contains the sequence and use the value on one or more tables, if necessary.
There's a post here that has an interesting approach on this problem. I have also seen this approach used in the DB PEAR module that's now obsolete.
You need to set the other table's AUTO_INCREMENT value manually, either from the client or inside MySQL via SQL:
ALTER TABLE users AUTO_INCREMENT = 3
So after inserting into table1, read back the last auto-increment value and set the other table's AUTO_INCREMENT from it.
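A sketch of that flow (MySQL won't take a variable directly in ALTER TABLE, so the statement is built through a prepared statement; the insert's columns are whatever table1 actually has):

INSERT INTO table1 () VALUES ();
SET @next = LAST_INSERT_ID() + 1;
SET @sql = CONCAT('ALTER TABLE table2 AUTO_INCREMENT = ', @next);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;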
I'm confused by your question. If table 2 is a subset of table 1, why would you have it share the primary key values? Do you mean that the categories are split between table 1 and table 2?
If so, I would question the design choice of putting them into separate tables. It sounds like you have one of two different situations. The first is that you have a "category" entity that comes in two flavors. In this case, you should have a single category table, perhaps with a type column that specifies the type of category.
The second is that your users can "like" things of different kinds. In this case, the "user likes" table should have a separate foreign key for each type of object. You could pull off a trick using a composite foreign key, where you have the type of the object followed by a regular numeric id. So the likes table would have "type" and "id" columns. The person table would have one column filled with "PERSON" and another with the numeric id, and the join would say "on a.type = b.type and a.id = b.id". (Or the "type" part could be implicit in the choice of table.)
You could do it with triggers:
-- see http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_last-insert-id
CREATE TABLE sequence (id INT NOT NULL);
INSERT INTO sequence VALUES (0);
CREATE TABLE table1 (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    secondaryid INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (id)
);
CREATE TABLE table2 (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    secondaryid INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (id)
);
DROP TRIGGER IF EXISTS table1_before_insert;
DROP TRIGGER IF EXISTS table2_before_insert;
DELIMITER //
CREATE TRIGGER table1_before_insert
BEFORE INSERT ON table1
FOR EACH ROW
BEGIN
    -- atomically bump the shared counter and read it back
    UPDATE sequence SET id = LAST_INSERT_ID(id + 1);
    SET NEW.secondaryid = LAST_INSERT_ID();
END;
//
CREATE TRIGGER table2_before_insert
BEFORE INSERT ON table2
FOR EACH ROW
BEGIN
    UPDATE sequence SET id = LAST_INSERT_ID(id + 1);
    SET NEW.secondaryid = LAST_INSERT_ID();
END;
//
DELIMITER ;
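With the triggers in place, inserts into either table draw from the shared counter (a quick usage sketch):

INSERT INTO table1 () VALUES (); -- secondaryid becomes 1
INSERT INTO table2 () VALUES (); -- secondaryid becomes 2
INSERT INTO table1 () VALUES (); -- secondaryid becomes 3

SELECT id, secondaryid FROM table1;
SELECT id, secondaryid FROM table2;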
The table
I have a table that contains prices for some 1,000,000 articles. Each article has a unique ID number, but the table contains prices from multiple stores. So if two stores carry the same article, that ID will not be unique within the table.
Table Structure
table articles
    id INT
    price INT
    store VARCHAR(40)
Daily use
Apart from user queries by ID number, I need to run daily updates where data from CSV files inserts or updates each article in the table. The chosen procedure is to try to select an article and then perform either an insert or an update.
Question
With this in mind, which key should I choose?
Here are some solutions that I've been considering:
FULLTEXT index of the fields isbn and store
Add a field with a value generated by isbn and store that is set as PRIMARY key
One table per store and use isbn as PRIMARY key
Use a compound primary key consisting of the store ID and the article ID - that'll give you a unique primary key for each item on a per-store basis and you don't need a separate field for it (assuming the store id and article id are already in the table).
Ideally you should have 3 tables... something like:
article
--------------------------------------------
id | isbn | ... etc ...
store
--------------------------------------------
id | description | ... etc ...
pricelist
--------------------------------------------
article_id | store_id | price | ... etc ...
With the PRIMARY KEY for pricelist being a compound key made up of article_id and store_id.
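In DDL, that could look something like this (a sketch; assumes the article and store tables above use INT surrogate keys named id):

CREATE TABLE pricelist (
    article_id INT NOT NULL,
    store_id INT NOT NULL,
    price INT NOT NULL,
    PRIMARY KEY (article_id, store_id),
    FOREIGN KEY (article_id) REFERENCES article (id),
    FOREIGN KEY (store_id) REFERENCES store (id)
);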
EDIT : (updated to incorporate an answer from the comment)
Even on a million rows the UPDATE should be OK (for a certain definition of OK; it might still take a little while with 1 million+ rows), since article_id and store_id comprise the PRIMARY KEY - they'll both be indexed.
You'll just need to write your query so that it's along the lines of:
UPDATE pricelist SET price = {$fNewPrice}
WHERE article_id = {$iArticleId}
AND store_id = '{$sStoreId}';
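Since (article_id, store_id) is the PRIMARY KEY, the daily select-then-insert-or-update pass can also collapse into a single statement (a sketch):

INSERT INTO pricelist (article_id, store_id, price)
VALUES ({$iArticleId}, '{$sStoreId}', {$fNewPrice})
ON DUPLICATE KEY UPDATE price = VALUES(price);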
Though you may want to consider converting the PRIMARY KEY in the store table (store.id - and therefore also pricelist.store_id in the pricelist table) to either an unsigned INT or something like CHAR(30).
Whilst VARCHAR is more efficient when it comes to disk space it has a couple of drawbacks:
1: MySQL isn't too keen on updating VARCHAR values, and it can make the indexes bloat a bit, so you may need to run OPTIMIZE TABLE on it occasionally (I've found this on an order_header table before).
2: Any (MyISAM) table with non-fixed length fields (such as VARCHAR) will have to have a DYNAMIC row format which is slightly less efficient when it comes to querying it - there's more information about that on this SO post: MySQL Row Format: Difference between fixed and dynamic?
Your indexes should be aligned with your queries. Certainly there should be a primary key on the articles table using STORE and ID - but the order in which they are declared will affect performance - depending on the data in the related tables and the queries applied. Indeed the simplest solution might be PRIMARY KEY(STORE, ID) and UNIQUE KEY(ID, STORE) along with foreign key constraints on the two fields.
i.e. since it makes NO SENSE to call this table 'articles', I'll use the same schema as CD001:
CREATE TABLE pricelist (
    id INT NOT NULL,
    price INT,
    store VARCHAR(40) NOT NULL,
    PRIMARY KEY (store, id),
    UNIQUE KEY rlookup (id, store),
    FOREIGN KEY (id) REFERENCES articles (id),
    FOREIGN KEY (store) REFERENCES store (name)
);
Which also entails having a primary key on store using name.
The difference between checking a key based on a single column and one based on two columns is negligible, and normalising your database properly will save you a LOT of pain.
I need an elegant way to store dynamic arrays (basically spreadsheets without all the functionality) of various sizes (both x and y), mostly being used as ENUMs, lists, lookup data, price sheets, that sort of thing. Multi-lingual support would be a great bonus. Speed is of the essence.
Here's an example of a typical "sheet" ;
| 1 | 2 | 3 | 4
-------------------------------
model A | 2$ | 5$ | 8$ | 10$
model B | 3$ | 6$ | 9$ | 12$
model C | 4$ | 8$ | 10$ | 13$
So, to get info, I would do ;
$price = this_thing_im_after ( '3', 'model B' ) ;
echo $price ; // Prints '9$'
I'm in the PHP5 and Zend Framework world, but thoughts on design and SQL are just as dandy, even suggestions from the outside world: libs, extensions, etc., as I don't want to reinvent too much of the wheel. I need the backend stuff the most, and I'll write a GUI for dynamic sheets later. Thoughts, ideas, pointers?
Just an edit to point out that I'd prefer not to serialize and blob the data, as I would like to query the indices and sheets, perhaps even the data (or its type, for those who support such; now that would be awesome!) if I'm in a crazy mood. But again, this is not a deal-breaker for me; if someone has a nice library or class for serializing quickly in and out of a database with some simple querying, I'm all happy.
Other than serializing the whole thing into a blob field, you'll probably end up with a key/value table where your key is the row and col fields:
CREATE TABLE sheet (
    sheet_id INT NOT NULL,
    name VARCHAR(32),
    `rows` INT, -- stores max dimension if needed
    cols INT,   -- stores max dimension if needed
    PRIMARY KEY (sheet_id)
);

CREATE TABLE cells (
    cell_id INT NOT NULL AUTO_INCREMENT, -- surrogate key for ease of updates
    sheet_id INT NOT NULL,               -- foreign key to sheet table
    `row` INT NOT NULL,
    col INT NOT NULL,
    value VARCHAR(255),                  -- or TEXT (with a prefix index) depending on need
    PRIMARY KEY (cell_id),               -- for updates
    UNIQUE INDEX (sheet_id, `row`, col), -- for lookup
    INDEX (value)                        -- for search
);

CREATE TABLE row_labels (
    sheet_id INT NOT NULL,
    `row` INT NOT NULL,
    label VARCHAR(32),
    PRIMARY KEY (sheet_id, `row`)
);

CREATE TABLE col_labels (
    sheet_id INT NOT NULL,
    col INT NOT NULL,
    label VARCHAR(32),
    PRIMARY KEY (sheet_id, col)
);
This allows you to slice the data nicely:
-- Slice rows 4..20, cols 3..5
SELECT `row`, col, value FROM cells
WHERE sheet_id = :sheet
AND `row` BETWEEN 4 AND 20
AND col BETWEEN 3 AND 5
ORDER BY `row`, col;
while ($A = fetch()) {
    $cell[$A['row']][$A['col']] = $A['value']; // or unserialize($A['value']);
}
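And the lookup from the question, by row and column label, becomes a join through the label tables (a sketch):

-- price of ('model B', column '3')
SELECT c.value
FROM cells c
JOIN row_labels rl ON rl.sheet_id = c.sheet_id AND rl.row = c.row
JOIN col_labels cl ON cl.sheet_id = c.sheet_id AND cl.col = c.col
WHERE c.sheet_id = :sheet
AND rl.label = 'model B'
AND cl.label = '3';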
Is there any need to fetch only part of a spreadsheet, or query by contained data?
If you just want to store and retrieve the whole thing, I would just use an unambiguous textual representation of the array (e.g. serialize()) and store it as TEXT.