I am wondering what the best solution is to store relations between two tables in MySQL.
I have the following structure:
Table: categories
id | name | etc...
_______________________________
1 | Graphic cards | ...
2 | Processors | ...
3 | Hard Drives | ...
Table: properties_of_categories
id | name
_____________________
1 | Capacity
2 | GPU Speed
3 | Memory size
4 | Clock rate
5 | Cache
Now I need to connect them, and the question is: what is the better, more efficient and lighter solution? This matters because there may be hundreds of categories and thousands of properties assigned to them.
Should I just create another table with a structure like
categoryId | propertyId
Or perhaps add another column to the categories table and store the properties in a text field like 1,7,19,23?
Or maybe create JSON files named, for example, 7.json with content like
{1,7,19,23}
As this question pertains to the relational world, I would suggest adding another table to store the many-to-many relationship between Category and Property.
You could also use a JSON column to store many values in one of the tables.
The JSON datatype was introduced in MySQL 5.7 and comes with various features for JSON data retrieval and updates. However, if you are using an older version, you would need to manage it with a string column and some cumbersome string-manipulation queries.
The required structure depends on the relationship type: one-to-many, many-to-one, or many-to-many (M2M).
For a one-to-many, a foreign key (FK) on the 'many' side relates many items to the 'one' side. The reverse is correct for many-to-one.
For many-to-many (M2M) you need an intermediate relational (or junction) table, exactly as you suggest. This allows you to "reuse" both categories and properties in any combination. However, it means slightly more SQL: queries require 2 JOINs.
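As a sketch, assuming a junction table named category_property with the columns you proposed (the table name is hypothetical), the 2-JOIN query to list a category's properties could look like this:
-- Hypothetical junction table: category_property(categoryId, propertyId)
SELECT c.name AS category, p.name AS property
FROM categories c
JOIN category_property cp ON cp.categoryId = c.id
JOIN properties_of_categories p ON p.id = cp.propertyId
WHERE c.name = 'Graphic cards';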
If you are looking for performance, then using FKs to primary keys (PKs) is very efficient and the queries stay simple. Using JSON would presumably require you to parse it in PHP and construct second queries on the fly, which would multiply your coding and testing work, data transfer and CPU overhead, and limit scalability.
In your case I'm guessing that both "graphics cards" and "hard drives" could have e.g. "memory size" plus other properties, so you would need a M2M relational table as you suggest.
As long as your keys are indexed (which PKs are), your JOIN to this relational table will be very quick and efficient.
If you use CONSTRAINTs on your relations, then you ensure data integrity: you cannot delete a category to which a property is still "attached". This is a good feature in the long run.
Hundreds and thousands of records is a tiny amount for MySQL. You would use this technique even with millions of records. So there's no worry about size.
RDBMS databases are designed specifically to do this, so I would recommend using the native features rather than trying to do it yourself in JSON. (Unless I'm missing some new JSON MySQL feature! *)
* Since posting this, I indeed stumbled across a new JSON MySQL feature. It seems, from a quick read, you could implement all sorts of new structures and relations using JSON and virtual column keys, possibly removing the need for junction tables. This will probably blur the line between MySQL as an RDBMS and NoSQL.
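For illustration only, a rough sketch of how that might look; the column and index names here are made up, and this needs MySQL 5.7+:
-- Hypothetical: store property ids as a JSON array on categories and
-- index one extracted value through a virtual generated column.
ALTER TABLE categories
    ADD COLUMN property_ids JSON,
    ADD COLUMN first_property_id INT
        GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(property_ids, '$[0]'))) VIRTUAL,
    ADD INDEX idx_first_property (first_property_id);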
The first solution is better when it comes to relational databases. You should create a table that pairs each category with multiple properties (a many-to-many relationship).
You could structure the table like so:
CREATE TABLE categories_properties_match(
categoryId INTEGER NOT NULL,
propertyId INTEGER NOT NULL,
PRIMARY KEY(categoryId, propertyId),
FOREIGN KEY(categoryId) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY(propertyId) REFERENCES properties_of_categories(id) ON UPDATE CASCADE ON DELETE CASCADE
);
The primary key ensures that there will be no duplicate entries, that is, entries that match one category to the same property twice.
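For example, pairing category 1 (Graphic cards) with property 2 (GPU Speed):
INSERT INTO categories_properties_match (categoryId, propertyId) VALUES (1, 2);
-- Running the same INSERT again fails with a duplicate-key error,
-- because (categoryId, propertyId) is the primary key.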
I have two different tables and I am not sure of the best way to get them out of first normal form and into second normal form. The first table holds the user information while the second holds the products associated with the account. If I do it this way, I know it is only in 1NF and that the foreign key User_ID will be repeated many times in Table 2. See the tables below.
Table 1
|User_ID (primary)| Name | Address | Email | Username | Password |
Table 2
| Product_ID (Primary Key) | User_ID (Foreign Key) |
Is this a better way to make table two, in which the user ID is not repeated? I have thought about having a separate table in the database for each user, but from all of the other questions I read on Stack Overflow, this is not a good idea.
The constraints I am working with are 1-1000 users, and Table 2 will have approximately 1-1000 rows per user. Is there a better way to create this set of tables?
I don't see 2NF violated. It states:
a table is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table.
quoted from Wikipedia article "Second normal form", 2016-11-26
Table 2 has only one candidate key, the primary key, and it consists of only one column. So there is no proper subset of a candidate key, and 2NF can't be violated unless 1NF is not fulfilled.
You say you want "to make table two in which the user ID is not repeated",
so why don't you do this:
Table 1
|User_ID (primary)| Name | Address | Email | Username | Password | Product_ID ( Foreign Key nullable)|
Table 2
| Product_ID (Primary Key)|
There's nothing wrong with a value appearing many times. Redundancy arises when two queries that aren't syntactically equivalent always both return the same value. Only uncontrolled redundancy is bad. Normalization controls some redundancy by replacing a table by smaller ones that join to it.
Normalization decomposes a table independently of other tables. (We define the normal form of a database as the lowest normal form that all of its tables are in.) Foreign keys have nothing to do with violating normal forms.
Learn what it means for a table to be in a given normal form. You will need to learn a definition. And the definitions of the terms it uses. And the definitions of the terms they use. Etc. A table is in 2NF when every non-prime column has a functional dependency that is full on every candidate key. Also learn the algorithm for decomposing a table into components that are in a given normal form. Assuming that these tables can hold more than one row, so that {} is not a candidate key, both these tables are in 2NF.
A table in 2NF is also in 1NF. So you don't want "to get it out of the first normal form".
2NF is unimportant. When dealing with functional dependencies, what matters is BCNF, which decomposes as much as possible but requires certain higher-cost constraints, and 3NF, which doesn't decompose as much as possible but requires certain lower-cost constraints.
So, I'm building a multi-tenant Laravel SaaS web app, and am a little stuck when it comes to the database design. I have been searching around trying to find a solution, but I really can't decide which one to go with. I really hope some of you with more experience and knowledge than me can offer some advice. Sorry about the long post, but I hope you'll hang in.
Problem:
In the app my users will be importing data from an external database of their own (with a known schema).
E.g., I will be importing products with relations to categories. The easiest way would just be to import the external product_id as the new primary key of the product.
BUT as the users' product_ids will probably conflict, I will have to assign each product a new primary key, while still keeping the external product_id for reference when syncing back to the external db.
E.g., the external product_id will become ext_product_id and I will assign a new product_id as a primary key.
As of now I can think of 3 ways to do this:
Solution 1 - Single database with new primary keys:
So if I import a list of products and categories, I will have to save each external product_id as ext_product_id and assign a new primary key to the product. I will then have to query the category whose ext_category_id equals the product's ext_category_id and then create a new relation between the new primary key product_id and the primary key category_id.
These looping queries take forever when importing several thousand rows, and some tables have 4 different relations, which means a lot of "ext_" columns to keep track of and sync.
Solution 2 - composite primary key:
As each user will have no reference to an external database, I could create composite keys consisting of the tenant_id and e.g. the external product_id. This would allow me to just batch-insert the external data with a key prefix consisting of the tenant. This way the relations should work "out of the box".
But Laravel doesn't support this feature as far as I understand? Any ideas?
Solution 3 - multiple databases:
Creating a separate database for each tenant would probably be the best solution performance- and sanity-wise (to begin with), as I would just be able to copy/batch-insert the external database, and the relations would work right away.
But I'm really worried about the scalability of this design: how many databases would I realistically be able to manage? Say I have 1,000 or even 10,000 customers?
What if I want to add a column in an update - would I be able to perform some kind of loop migration across all databases?
I really hope that some of you can help me move on with this, as I am stuck and have no experience with solutions 2 and 3.
Thanks in advance!
I would personally go for Solution 2, as that is probably the safest.
Solution 1 should be ruled out, since you don't want to confuse the users of your application by modifying their data.
Solution 3 would probably be a pain to maintain and is more likely to fail (on the back end of the application), plus you will lose all track of whose database is whose.
As for Solution 2, that seems to me like the ideal one:
I don't know what you are using (phpMyAdmin or another tool), but basically what you want is to have 2 columns:
table
id (PK, AI) | original_id (PK)
and then just the rest of your table.
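In MySQL DDL that could look something like this (a sketch; the table name, column types and exact key layout are assumptions on my part):
-- Sketch of the suggestion above.
CREATE TABLE products (
    id INT NOT NULL AUTO_INCREMENT,
    original_id INT NOT NULL,   -- the id from the user's external database
    -- ...the rest of your columns...
    PRIMARY KEY (id, original_id)
);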
This way you will have your own auto-increment (AI) key and you won't get any conflicts from your users, since the combination of your auto_increment and that of the user is ALWAYS going to be unique.
for example:
user1:
id = 1 | original_id = 1
user2:
id = 2 | original_id = 1
This still works because the combination is unique.
Another pro of using this composite UID is that you can still use your own id to perform queries or actions on the desired rows etc...
Hope this helps
There are many things to consider when choosing an architecture, but from what you've described, I suggest you use Solution 3 because:
as you've very well pointed out, it's the best solution performance wise (especially if you end up with a lot of customers) and you won't need to handle the overhead of having large amounts of entries for all customers in one table
you have a clear database structure where only the necessary relations are present, no extra fuss to track different customers
As far as maintaining and updating the database structure goes, you can create Laravel Commands to automate running migrations for multiple databases. You can have a look at this answer to get an idea of how you could do that (although that situation is a little different from what you'll be needing, it offers some insight). Anything else that needs to be handled in batch can also be automated via Laravel commands or other scripts, so the number of databases should not hinder maintenance.
A more modern way of doing this is to use UUIDs as primary keys. If you also keep a source_uuid, import_time etc. in the table when you import data, you can bookkeep every import (and export).
It might be hard to convince all parties to use UUIDs - but that is the best way to go.
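A sketch of what such a table could look like (the column names here are just examples, not from the original post):
-- Sketch: UUID primary key plus import bookkeeping columns.
CREATE TABLE products (
    product_id     CHAR(36) NOT NULL PRIMARY KEY,  -- e.g. filled with UUID()
    ext_product_id INT      NOT NULL,              -- id in the tenant's own database
    source_uuid    CHAR(36) NOT NULL,              -- identifies the import source
    import_time    DATETIME NOT NULL
);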
I am developing a MySQL db for a user list, and I am trying to determine the most efficient way to design it.
My issue is that there are 3 types of users: "general", "normal", and "super". General and normal users differ only in the values of certain columns, so the schema to store them is identical. However, super users have at least 4 extra columns of info that need to be stored.
In addition, each user needs a unique user_id for reference from other parts of the site.
So, I can keep all 3 users in the same table, but then I would have a lot of NULL values stored for the general and normal user rows.
Or, I can split the users into 2 tables: general/normal and super. This would get rid of the abundance of NULLs, but would require a lot more work to keep track of the user_ids and ensure they are unique, as I would have to handle that in my PHP instead of just using a SERIAL column as in the single-table solution above.
Which solution is more efficient in terms of memory usage and performance?
Or is there another, better solution I am not seeing?
Thanks!
If each user needs a unique id, then you have the answer to your question: you want one users table with a UserId column. Often, that column would be an auto-incremented integer primary key column - a good approach for the implementation.
What to do about the other columns? This depends on a number of different factors, which are not well explained in your question.
You can store all the columns in the same table. In fact, you could then implement views so you see only users of one type. However, if a lot of the extra columns are fixed-width (such as numbers), then space is still allocated for them. Whether or not this is an issue is simply a question of the nature of the columns and the relative numbers of the different user types.
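For instance, assuming a discriminator column such as user_type on the users table (an assumption here), a per-type view might look like:
-- Assumes a user_type column distinguishing the three types.
CREATE VIEW super_users AS
SELECT *
FROM users
WHERE user_type = 'super';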
You can also store the extra columns for each type in its own table. This would have a foreign key relationship to the original table, using the UserId. If both of these keys are primary keys, then the joins should be very fast.
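A sketch of that layout, with made-up names for the extra super-user columns:
-- Assumes the users table above, with primary key UserId.
CREATE TABLE super_user_info (
    UserId INT PRIMARY KEY,        -- both primary key and foreign key
    extra_col1 VARCHAR(255),       -- hypothetical extra columns
    extra_col2 VARCHAR(255),
    FOREIGN KEY (UserId) REFERENCES users(UserId)
);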
There are more exotic possibilities as well. If the columns do not need to be indexed, then MySQL 5.7 has support for JSON, so they could all go into one column. Some databases (particularly column-oriented ones) allow "vertical partitioning", where different columns in a single table are stored in separate allocation units. MySQL does not (yet) support vertical partitioning.
Why not build an extra table, but only for the extra columns you need for super users? So: 2 tables, one with all the users and one with the super users' extra info.
If you want to have this type of schema, try to create a relation
like:
tb_user > user_id, user_type_id (int)
tb_user_type > user_type_id (int), type_name
This way you will have just 2 tables, and if the type is not set you can assign a default value to the user.
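In DDL form, that could be sketched as follows (the types and the default value are assumptions):
CREATE TABLE tb_user_type (
    user_type_id INT PRIMARY KEY,
    type_name    VARCHAR(50) NOT NULL
);

CREATE TABLE tb_user (
    user_id      INT PRIMARY KEY,
    user_type_id INT NOT NULL DEFAULT 1,  -- fall back to a default type
    FOREIGN KEY (user_type_id) REFERENCES tb_user_type(user_type_id)
);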
I have a system which has (for the sake of a simple example) tables consisting of Customers, Vendors, Products, Sales, etc. Total 24 tables at present.
I want to add the ability to have multiple notes for each record in these tables. E.g., Each Customers record could have 0 to many notes, each Vendor record could have 0 to many notes, etc.
My thinking is that I would like to have one "Notes" table, indexed by a Noteid and Date/time stamp. The actual note data would be a varchar(255).
I am looking for a creative way to bi-directionally tie the source tables to the Notes table. The idea of having 24 foreign key type cross reference tables or 24 Notes tables doesn't really grab me.
Programming is being done in PHP with Apache server. Database is mysql/InnoDB.
Open to creative ideas.
Thanks
Ralph
I would suggest a table like this:
note_id : int, autoincrement, primary key
type_id : int, the id of the row in Customers, Vendors, Products etc.
type : varchar, a code indicating the type, like Vendors, VENDORS or just V
note : varchar, the actual note
CREATE TABLE IF NOT EXISTS `notes` (
`note_id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`type` varchar(20) CHARACTER SET utf8 NOT NULL,
`note` varchar(255) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`note_id`)
)
With a setup like that you can have multiple notes for each type, like Vendors, and also hold notes for multiple types.
data sample
note_id type_id type note
--------------------------------------------------------------------
1 45 Vendors a note
2 45 Vendors another note
3 3 Customers a note for customer #3
4 67 Products a note for product #67
SQL sample
select note from notes where type="Vendors" and type_id=45
To reduce table size, I would prefer aliases for the types, like V, P, C and so on.
Don't do a "universal" table, e.g.
id, source_table, source_record_id, note_text
It might sound good in theory, but you can NOT join this table against your others without writing dynamic SQL.
It's far better to simply add a dedicated notes field to every table. This eliminates any need for dynamic SQL, and the extra space usage will be minimal if you use varchar/text fields, since those aren't stored in-table anyway.
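In practice that is just one extra column per table, e.g.:
ALTER TABLE Customers ADD COLUMN notes TEXT;
ALTER TABLE Vendors   ADD COLUMN notes TEXT;
-- ...and likewise for the remaining tables.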
I've done a structure like this before where I used a format like this:
id (int)
target_type (enum/varchar)
target_id (int)
note (text)
Each data element then just has to query for its own type, so for your customer object you would query for notes attached to it like this:
SELECT * FROM notes where target_type='customer' AND target_id=$this->id
You can also link target_type to the actual class, so that you write to the database using get_class($this) to fill out the target type; in which case a single function inside the Note class could take in any other object type you have.
In my opinion, there isn't a clean solution for this.
option 1: Master entity table
Every (relevant) row of every (relevant) table has a master entry inside a table (let's call it entities_tbl). The id of each derived table isn't an auto-increment but a foreign key referencing the master table.
Now you can easily link the notes table with the master entity id.
PRO: It's an object-oriented idea, like a base "Object" class which is the parent of every other class. Also, each entity has a unique id across the database.
CON: It's a mess. Every entity id is scattered among (at least) two tables. You'd need JOINs every single time, and the master entity table will be HUGE (it will contain as many rows as all the child tables combined).
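A sketch of option 1, using the entities_tbl name from above (the child table and its columns are invented here):
-- The master table in which every entity row is registered first.
CREATE TABLE entities_tbl (
    entity_id INT AUTO_INCREMENT PRIMARY KEY
);

-- A child table reuses the master id instead of its own auto-increment.
CREATE TABLE customers (
    entity_id INT PRIMARY KEY,
    name VARCHAR(255),
    FOREIGN KEY (entity_id) REFERENCES entities_tbl(entity_id)
);

-- Notes can then reference any entity with a real foreign key.
CREATE TABLE notes (
    note_id   INT AUTO_INCREMENT PRIMARY KEY,
    entity_id INT NOT NULL,
    note      VARCHAR(255),
    FOREIGN KEY (entity_id) REFERENCES entities_tbl(entity_id)
);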
option 2: meta-attributes
Inside the notes table, the primary key would contain an autoincrement, the entity_id and the item_table_name. This way you can easily extract the notes of any entity from any table.
PRO: Easy to write, easy to populate
CON: It needs meta-values to extract real values. There are no foreign keys to guarantee referential integrity, the joins are messy and sloppy, and table names end up as WHERE conditions.
option 3: database denormalization
(Sigh, I never thought I would ever give this suggestion.)
Add a column to each table where you need notes, and store the notes as JSON-encoded strings. (This means denormalizing the database, because you will introduce non-atomic values.)
PRO: Easy and fast to write, uses some form of standard even for future database users, and the notes are centralized and easily accessible from each entity.
CON: The database isn't normalized, and searching and comparing notes is poor.
I have a similar issue to the one described in How to design a product table for many kinds of product where each product has many parameters.
I am convinced to use RDF now, mainly because of one of the comments made by Bill Karwin in the answer to the above issue.
But I already have a database in MySQL and the code is in PHP.
1) So what RDF database should I use?
2) Do I combine the approaches? Meaning I keep class table inheritance in the MySQL database and put just the weird product attributes in RDF? I don't think I should move everything to an RDF database, since it is only the products and the wide array of possible attributes and values that are giving me the problem.
3) What PHP resources or articles should I look at that will help me with this?
4) More articles or resources that help me better understand RDF in the context of the above challenge - building something that will better hold all sorts of products' attributes and values - will be greatly appreciated. I tend to work better when I have a conceptual understanding of what is going on.
Do bear in mind I am a complete novice at this, and my knowledge of programming and databases is average at best.
OK, one of the main benefits of RDF is global identity, so if you use surrogate ids in your RDBMS schema, you could assign the surrogate ids from a single sequence. This will make certain RDF extensions to your schema easier.
So in your case you would use a single sequence for the ids of products and other entities in your schema (maybe users, etc.).
You should probably keep essential fields in normalized RDBMS tables, for example a products table with the fields that have a cardinality of 0-1.
The others you can keep in an additional table,
e.g. like this:
create table product (
    product_id int primary key
    -- core data here
);
create table product_meta (
    product_id int,
    property_id int,
    value varchar(255)
);
create table properties (
    property_id int primary key,
    namespace varchar(255),
    local_name varchar(255)
);
If you also want reference data in dynamic form, you can add the following meta table:
create table product_meta_reference (
    product_id int,
    property_id int,
    reference int
);
Here reference refers via the surrogate id to another entity in your schema.
And if you want to provide metadata for another table, let's say user, you can add the following table:
create table user_meta (
    user_id int,
    property_id int,
    value varchar(255)
);
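For example, a sketch of reading all dynamic attributes of one product from these tables:
select pr.namespace, pr.local_name, pm.value
from product_meta pm
join properties pr on pr.property_id = pm.property_id
where pm.product_id = 1;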
I put this as a different answer because it is a more specific suggestion than my first answer.
1 & 3) As you're using PHP and MySQL you're best bet would be either ARC 2 (although the website states this is a preview release this is the release you want) or RAP both of which allow for database based storage allowing you to store your data in MySql
ARC 2 seems to be more popular and well maintained in my experience
2) Depends how much of your existing code base would have to change if you move everything to RDF and what kinds of queries you do with the existing data in your database. Some things may be quicker using RDF and SPARQL while some may be slower.
I haven't used RDF with PHP, but in general, if you use two persistence technologies in one project, the result is probably more complex and risky than using one alone.
If you stay with the RDBMS approach, you can make it more RDF-like by introducing the following attributes:
use a single sequence for all surrogate ids; this way you get unique identifiers, which is a requirement for RDF
use base tables for mandatory properties, and extension tables with subject + property + value columns for additional data
You don't have to use an RDF engine to keep your data in an RDF-mappable form.
Compared to EAV, RDF is a more expressive paradigm for dynamic data modeling.