So, I'm building a multi-tenant Laravel SaaS web app, and I'm a little stuck on the database design. I've been searching around for a solution, but I really can't decide which one to go with. I hope some of you with more experience and knowledge can offer some advice. Sorry about the long post, but I hope you'll hang in there.
Problem:
In the app my users will be importing data from an external database of their own (with a known schema).
E.g.: I will be importing products with relations to categories. The easiest way would be to simply reuse the external product_id as the new primary key of the product.
BUT as different users' product_ids will probably conflict, I will have to assign each product a new primary key, while still keeping the external product_id for reference when syncing back to the external DB.
E.g.: the external product_id will be stored as ext_product_id, and I will assign a new product_id as the primary key.
As of now I can think of three ways to do this:
Solution 1 - Single database with new primary keys:
So if I import a list of products and categories, I will have to save each external product_id as ext_product_id and assign a new primary key to the product. I will then have to look up the category whose ext_category_id matches the product's ext_category_id, and create the relation using the new primary keys product_id and category_id.
These looping queries take forever when importing several thousand rows, and some tables have four different relations, which means a lot of "ext_" columns to keep track of and sync.
Solution 2 - composite primary key:
As each user's data only references their own external database, I could create composite keys consisting of the tenant_id and, e.g., the external product_id. This would allow me to just batch insert the external data with a key prefix consisting of the tenant. This way the relations should work "out of the box".
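For reference, roughly what I have in mind for this solution (a minimal sketch; the table and column names are just examples):

    CREATE TABLE categories (
        tenant_id   INT UNSIGNED NOT NULL,
        category_id INT UNSIGNED NOT NULL,   -- the external category_id, imported as-is
        name        VARCHAR(255),
        PRIMARY KEY (tenant_id, category_id)
    );

    CREATE TABLE products (
        tenant_id   INT UNSIGNED NOT NULL,
        product_id  INT UNSIGNED NOT NULL,   -- the external product_id, imported as-is
        category_id INT UNSIGNED NOT NULL,   -- the external category_id, still valid within the tenant
        name        VARCHAR(255),
        PRIMARY KEY (tenant_id, product_id),
        FOREIGN KEY (tenant_id, category_id) REFERENCES categories (tenant_id, category_id)
    );

The external relations would keep working after a straight batch insert, since every external id is only required to be unique within its tenant.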
But Laravel doesn't support composite primary keys as far as I understand? Any ideas?
Solution 3 - multiple databases:
Creating a separate database for each tenant would probably be the best solution performance- and sanity-wise (to begin with), as I would just be able to copy/batch insert the external database, and the relations would work right away.
But I'm really worried about the scalability of this design: how many databases would I realistically be able to manage? Say I have 1,000 or even 10,000 customers?
What if I want to add a column in an update - would I be able to perform some kind of looped migration across all the databases?
I really hope that some of you can help me move on with this, as I am stuck and have no experience with solutions 2 and 3.
Thanks in advance!
I would personally go for Solution 2 as that is probably the safest.
Solution 1 should be ruled out since you don't want to confuse the users of your application by modifying their data.
Solution 3 would probably be a pain to maintain and is more likely to fail on the back end of the application, plus you will lose track of whose database is whose.
As for solution 2, that seems to me like the ideal one:
I don't know what you are using (phpMyAdmin or another tool), but basically what you want to do is have two columns:
table
    id (PK, AI) | original_id (PK)
and then just the rest of your columns.
This way you will have your own auto-increment (AI) key, and you won't get any conflicts from your users, since the combination of your auto_increment value and the user's original_id is ALWAYS going to be unique.
For example:
user1:
id = 1 | original_id = 1
user2:
id = 2 | original_id = 1
This still works because the combination is unique.
Another pro of using this composite UID is that you can still use your own id to perform queries or actions on the desired rows etc...
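A minimal MySQL sketch of what that could look like (the products table and the extra columns are just examples, not something from your schema):

    CREATE TABLE products (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
        original_id INT UNSIGNED NOT NULL,   -- the id coming from the user's own database
        name        VARCHAR(255),
        PRIMARY KEY (id, original_id)
    );

Since id is the leading column of the primary key, you can still query rows by id alone while keeping original_id around for syncing back.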
Hope this helps
There are many things to consider when choosing an architecture, but from what you've described, I suggest you use Solution 3 because:
as you've very well pointed out, it's the best solution performance-wise (especially if you end up with a lot of customers), and you won't need to handle the overhead of having huge numbers of entries for all customers in one table
you have a clear database structure where only the necessary relations are present, no extra fuss to track different customers
As far as maintaining and updating the database structure goes, you can create Laravel commands to automate running migrations for multiple databases. You can have a look at this answer to get an idea of how you could do that (although that situation is a little different from what you'll need, it offers some insight). Anything else that needs to be handled in batch can also be automated via Laravel commands or other scripts, so the number of databases should not hinder maintenance.
A more modern way of doing this is to use UUIDs as primary keys. If you also add columns such as source_uuid and import_time to the table when you import data, you can keep track of every import (and export).
It might be hard to convince all parties to use UUIDs - but that is the best way to go.
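A minimal sketch of what such a table could look like (only source_uuid and import_time come from the answer above; the other columns are assumptions for illustration):

    CREATE TABLE products (
        id             CHAR(36) NOT NULL PRIMARY KEY,  -- UUID, generated by the app or with MySQL's UUID()
        tenant_id      INT UNSIGNED NOT NULL,
        ext_product_id INT UNSIGNED,                    -- the original id from the external database
        source_uuid    CHAR(36),                        -- identifies the import batch / source system
        import_time    DATETIME,
        name           VARCHAR(255)
    );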
/gh
Related
So I have just set up a database which holds only one table with the following fields:
key_value: holds 6 digit code for a key
redeemed: boolean for if the key is redeemed
redeemed_by: who redeemed it
redeemed_date: when it was redeemed
software_name: name of the software the key relates to
I basically start with an empty database, and when someone purchases through PayPal, they get their own key, which is added to the database. After this they open an app that lets them enter their code, which is then looked up in the database and marked as redeemed so it can't be used again - this results in both redeemed and unredeemed codes being in one table.
If I were to reach a good few thousand purchases, would this cause the database to slow down significantly, or maybe crash? What if it were a bigger number, say 10,000?
What exactly would be a good solution for this? Even if I had a separate table of redeemed keys, I would still have to look in that table to see whether a key was redeemed.
Thanks for any answer, I am still learning databases and SQL!
I think your design is sound. You might want to add indexes based on what queries you will be running. key_value sounds like a good primary key which would also serve as an index for updating redeemed.
As noted by Marc B, the hardware is your only likely consideration for performance.
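A minimal sketch of that table (the table name license_keys is just a placeholder), with key_value as the primary key and the redeem step as a single indexed update:

    CREATE TABLE license_keys (
        key_value     CHAR(6) NOT NULL PRIMARY KEY,
        redeemed      TINYINT(1) NOT NULL DEFAULT 0,
        redeemed_by   VARCHAR(100) NULL,
        redeemed_date DATETIME NULL,
        software_name VARCHAR(100) NOT NULL
    );

    -- Redeeming a key only touches one row, found via the primary key:
    UPDATE license_keys
    SET redeemed = 1, redeemed_by = 'some_user', redeemed_date = NOW()
    WHERE key_value = 'ABC123' AND redeemed = 0;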
I would use two tables for this: one for what you have spec'd out, and another as an archive table, with a job that migrates redeemed/expired records over on a regular basis.
Reasoning: The primary purpose of the table is for the benefit of redemptions, not for use as an archive. Over time, as more and more redeemed records are found in the table, the performance for lookups of unredeemed records starts getting worse and worse because of all the "deadwood" in the table. (Do you think eBay houses all active and completed auctions in one table?)
If you still absolutely need a "one-table" solution, you can easily create a view that merges the two tables.
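For example, assuming an active table and an archive table with the same columns (the names are just placeholders), the merged view could look like:

    CREATE VIEW all_keys AS
        SELECT * FROM license_keys
        UNION ALL
        SELECT * FROM license_keys_archive;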
Also, if you set up a proper primary key, performance will not degrade quickly (for a while), since the key eliminates the table scans you are alluding to as record volumes grow.
What do you think is the best approach for a PHP and SQL based web application that will be used by a number of people?
For example, say we have a table called "sales" and a user wants to access his sales. Should the table contain a foreign key referencing the user_id, or would it be better to make a separate table for each user?
Any other implementations and opinions are also welcome!
In my opinion the best approach would be to use two tables and refer to the user via a foreign key. Make sure to use indexes as well. MySQL applies various optimizations to WHERE clauses on a PRIMARY KEY or a UNIQUE index [1], so you will be fine when working with a considerable number of records (e.g. handling 100,000 records won't be an issue if you have capable hardware for the database instance and the configuration is tuned accordingly).
Make sure to tune the database for your system to increase performance as well. It's better to do in-house testing to make sure the system meets your expectations in the long run.
[1] http://dev.mysql.com/doc/refman/5.7/en/where-optimizations.html
By sales I take it you mean the user's invoices or billing history. You would want to separate them out into two tables using foreign keys. You could have a users table and an invoices table, with the user id as the foreign key in your invoices table.
Whenever a user wanted to view their invoices, you would select rows from the invoices table where the user id matches that user's id.
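A minimal sketch of that layout (any column beyond user_id is just an assumption for illustration):

    CREATE TABLE users (
        user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name    VARCHAR(100) NOT NULL
    );

    CREATE TABLE invoices (
        invoice_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id    INT UNSIGNED NOT NULL,
        amount     DECIMAL(10,2),
        created_at DATETIME,
        FOREIGN KEY (user_id) REFERENCES users (user_id)
    );

    -- A user's sales are then a simple indexed lookup:
    SELECT * FROM invoices WHERE user_id = 42;

The foreign key gives you an index on invoices.user_id in InnoDB, so the per-user query stays fast as the table grows.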
I'm not a database expert, so I'm not sure how to ask this question briefly and succinctly. I am trying to copy data with the following characteristics: many of the tables with data being copied contain references to other tables with data being copied; i.e., a patient might attend a class where their weight is recorded, so I need to copy both the class attendance row as well as the weight value stored in another table, which is referenced by the class attendance row. There are other, even more complex, examples in this database, but it seems that I need to perform some kind of recursive copy of these inter-referenced items so I can maintain the cross-references in the copied data.
So, is there any kind of standard approach to this problem? If there isn't a direct answer, could someone share the terminology of what I'm trying to do so that I can look it up on my own? I'm certain this problem has been tackled many times before, but I don't know how to find the solution. I understand the basic concepts of JOINs and FKs, but this solution seems to require a way to copy the rows from various tables while also going back and updating the cross-references (in some cases, these are FKs, and in other cases, they are not; I'm stuck with the schema as it is).
PS: If it's such an obvious solution, why won't anyone just provide it or characterize it below so we can move on? Most of humanity is capable of asking the occasional dumb question, and this may very well be one of mine, but I'm seriously stuck on this one and would appreciate some assistance.
To illustrate the issue with a small part of the schema: an edclass_session_labs row references both a patient and a patient_lab_weight row.
When we copy a patient's data record, we need to 1) create a new row in patient; 2) create a corresponding new row in edclass_session_labs; 3) create a new row in patient_lab_weight; and (here's what I see as the tricky part) 4) also update the reference in edclass_session_labs to the new row in patient_lab_weight. What I'm looking for is a way to do this programmatically and algorithmically. I'm sure problems like this have been tackled before, so that's why I'm asking for advice here.
I didn't fully understand what you mean by "copy patient data", so there are two options:
1) If you want to "copy" the data to a report, you need to link many tables with related information, so you have to study the concept of JOINs and FOREIGN KEYs. This is what we do when we need to convert relational data into a flat table that can be easily read by non-IT people.
2) If you need to copy specific data from database tables to other database tables, you also have to study FOREIGN KEYs and table relationship. You need to understand how table rows relate to rows on other tables (one to many, many to one, many to many), so you can create INSERT statements based on SELECTs that will filter the exact data you need.
This is very general, but I think it's sufficient to point you in the right direction.
EDIT:
Since the issue is related to creating a merged structure of patient data, let's say we have patient 1 and patient 2. They are duplicates of the same person, and need to be merged. I would do this, in this order:
a) Create a patient 3; this one will be the target of our merge. Simply copy each field from patient 1 or 2 to this new record.
b) Create as many new records as needed in the table "patient_lab_weight". For example, if patient 1 has 2 records there and patient 2 has 4 records, you will have to create 6 records, which are copies of the records related to patients 1 and 2, but with patient_id = 3. After creating each record here, obtain the auto_increment value generated for the field "patient_lab_weight_id", and insert a new record in "edclass_session_labs" with patient_id = 3 and "patient_lab_weight_id" = the obtained ID. Do that for each insert on "patient_lab_weight".
c) after all that, disable patients 1 and 2 in your application.
If you use this approach, you will slowly build up your new structure, linked in a consistent way.
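To illustrate step b) with the tables named above (only patient_id and patient_lab_weight_id come from the question; the other column names are guesses), each per-record copy could look like:

    -- Copy one of patient 1's weight records over to the merged patient 3 ...
    INSERT INTO patient_lab_weight (patient_id, weight, recorded_at)
    SELECT 3, weight, recorded_at
    FROM patient_lab_weight
    WHERE patient_lab_weight_id = 101;   -- one original record at a time

    -- ... then recreate the class-attendance row, pointing it at the new copy
    INSERT INTO edclass_session_labs (patient_id, patient_lab_weight_id, session_id)
    SELECT 3, LAST_INSERT_ID(), session_id
    FROM edclass_session_labs
    WHERE patient_lab_weight_id = 101;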
I am creating a database for keeping track of water usage per person for a city in South Florida.
There are around 40000 users, each one uploading daily readouts.
I was thinking of ways to set up the database, and it would seem easier to give each user a separate table. This should make fetching the data easier because the server will not have to sort through a table with tens of millions of entries.
Is my logic flawed?
Is there any way to index table names?
Are there any other ways of setting up the DB to both raise the speed and keep the layout simple enough?
-Thank you,
Jared
p.s.
The essential data for the readouts are:
-locationID (table name in my idea)
-Reading
-ReadDate
-ReadTime
P.P.S. During this conversation, I uploaded 5k tables and the server froze. ~.O
Thanks for your help, y'all.
Setting up thousands of tables is not a good idea. You should maintain one table and put all entries in that table. MySQL can handle a surprisingly large amount of data. The biggest issue that you will encounter is the number of queries you can handle at a time, not the size of the database. For columns that will hold numbers use INT with the UNSIGNED attribute, and for columns that will hold text use VARCHAR of an appropriate size (or TEXT if the text is large).
Handling users
If you need to identify records with users, set up another table that might look something like this:

    CREATE TABLE users (
        user_id INT(10) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name    VARCHAR(100) NOT NULL
    );

When you need to link a record to a user, just reference the user's user_id. For the readout records I would set up the SQL something like:

    CREATE TABLE readings (
        id        INT(10) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        u_id      INT(10) UNSIGNED NOT NULL,
        reading   INT,        -- I'm not sure what your reading looks like: if it's a number use INT, if it's text use VARCHAR
        read_time TIMESTAMP
    );

You can also consolidate the date and time of the reading into a single TIMESTAMP, as read_time does above.
Do NOT create a separate table for each user.
Keep indexes on the columns that identify a user and any other common constraints, such as date.
Think about how you want to query the data at the end. How on earth would you sum up the data from ALL users for a single day?
If you are worried about the primary key, I would suggest a composite key of LocationID and Date.
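A minimal sketch of that single readings table (assuming one reading per location per day; the names are placeholders):

    CREATE TABLE readings (
        location_id INT UNSIGNED NOT NULL,
        read_date   DATE NOT NULL,
        read_time   TIME NOT NULL,
        reading     INT UNSIGNED NOT NULL,
        PRIMARY KEY (location_id, read_date)
    );

    -- Summing the usage of ALL users for a single day is then straightforward:
    SELECT SUM(reading) FROM readings WHERE read_date = '2015-06-01';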
Edit: Lastly, (and I do mean this in a nice way) but if you are asking these sorts of questions about database design, are you sure that you are qualified for this project? It seems like you might be in over your head. Sometimes it is better to know your limitations and let a project pass by, rather than implement it in a way that creates too much work for you and folks aren't satisfied with the results. Again, I am not saying don't, I am just saying have you asked yourself if you can do this to the level they are expecting. It seems like a large amount of users constantly using it. I guess I am saying that learning certain things while at the same time delivering a project to thousands of users may be an exceptionally high pressure environment.
Generally speaking, tables should represent sets of things. In your example it's easy to identify the sets you have: users and readouts. The theoretically best structure would be those two tables, where each readout entry has a reference to the id of the user.
MySQL can handle very large amounts of data, so your best bet is to just try the users-readouts structure and see how it performs. Alternatively you may want to look into a document-based NoSQL database such as MongoDB or CouchDB, since storing readout reports as individual documents could be a good choice as well.
If you create a summary table that contains the monthly total per user, surely that would be the primary usage of the system, right?
Every month, you crunch the numbers and store the totals in a second table. You can prune the log table on a rolling 12-month period; i.e., the old data can be stuffed in the corner to keep the indexes smaller, since you'll only need to access it when the city is accused of fraud.
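A rough sketch of that monthly crunch (the summary table and its columns are assumptions, reusing the readings table sketched above):

    CREATE TABLE monthly_usage (
        location_id INT UNSIGNED NOT NULL,
        month       DATE NOT NULL,            -- first day of the month
        total_usage INT UNSIGNED NOT NULL,
        PRIMARY KEY (location_id, month)
    );

    -- Run once a month to roll the raw log up into totals:
    INSERT INTO monthly_usage (location_id, month, total_usage)
    SELECT location_id, '2015-05-01', SUM(reading)
    FROM readings
    WHERE read_date >= '2015-05-01' AND read_date < '2015-06-01'
    GROUP BY location_id;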
So exactly how you store the daily readouts isn't such a big concern that you need to be freaking out about it. Giving each user their own table is not the proper solution. If you have tons and tons of data, then you might want to consider sharding via something like MongoDB.
I am trying to create a site where users can register and create a profile, so I am using two MySQL tables within one database, e.g. users and user_profile.
The users table has an auto increment primary key called user_id.
The user_profile table has the same primary key, called user_id; however, it is not auto-increment.
*see note for why I have multiple tables.
When a user signs up, data from the registration form is inserted into users, then the last_insert_id() is input into the user_id field of the user_profile table. I use transactions to ensure this always happens.
My question is, is this bad practice?
Should I have a unique auto increment primary key for the user_profile table, even though one user can only ever have one profile?
Maybe there are other downsides to creating a database like this?
I'd appreciate if anyone can explain why this is a problem or if it's fine, I'd like to make sure my database is as efficient as possible.
Note: I am using separate tables for users and user_profile because user_profile contains fields that are potentially null, and it will also be queried much more often than the users table, since its data is displayed on a public profile.
Maybe this is also bad practice and they should be lumped in one table?
I find this a good approach; I'd give bonus points if you use a foreign key relation, and preferably cascade when deleting the user from the users table.
As to separating the core user data into one table and the optional profile data into another - good job. There is nothing more annoying than a 50-field monster of a row with 90% empty values.
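A minimal sketch of the profile table with that cascading foreign key (the profile columns are just examples):

    CREATE TABLE user_profile (
        user_id INT UNSIGNED NOT NULL PRIMARY KEY,
        bio     TEXT NULL,
        website VARCHAR(255) NULL,
        FOREIGN KEY (user_id) REFERENCES users (user_id)
            ON DELETE CASCADE
    );

Deleting a row from users then automatically removes the matching profile row, so the two tables can't drift out of sync.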
It is generally frowned upon, but as long as you can provide the reasoning for the 1 to 1 relationship I'm sure it is fine.
I have used them when I have hundreds of columns (and it would be more logical to split them out into separate tables)
or I need a thinner table to speed up full scans.
In your case I would use a single table and create a couple of views.
see: http://dev.mysql.com/doc/refman/5.0/en/create-view.html
In general, a single-table approach is more logical, quicker, simpler, and uses less space.
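A minimal sketch of the single-table-plus-views idea (the column split is just an example):

    CREATE TABLE users (
        user_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        email    VARCHAR(255) NOT NULL,
        password VARCHAR(255) NOT NULL,
        bio      TEXT NULL,
        website  VARCHAR(255) NULL
    );

    -- Authentication code reads one view, public profile pages read the other:
    CREATE VIEW user_auth    AS SELECT user_id, email, password FROM users;
    CREATE VIEW user_profile AS SELECT user_id, bio, website FROM users;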
I don't think it's a bad practice. Sometimes it's quite useful, especially if you want one class to deal with authentication and not load all the profile data. You can then modify how your authentication works, build web services, and so on, with little concern for maintaining the profile data structures, which are likely to change as your project evolves.
This is very good practice.
It's right at the core of writing good, modular, normalised relational database structures.