I have a database that contains a large number of tables which could be divided across multiple databases. The connection between the tables would be, for example:
DB1: users (contains the field 'client_id')
DB2: customers (contains all the tables and relationships)
The two DBs are therefore connected via the 'client_id' field in the users table in DB1, and the 'id' field in the customers table in DB2.
Additionally, I also have a third DB that is connected in a similar way to the second DB.
Is this good practice? I have read that it can create performance problems, but keeping everything in a single DB doesn't seem ideal either.
Do you have any ideas or suggestions? Can this approach work?
In MySQL, databases (aka schemas) are just subdirectories under the datadir. Tables in different schemas on the same MySQL Server instance share the same resources (storage, RAM, and CPU). You can make relationships between tables in different schemas.
Assuming they are on the same MySQL Server instance, there is no performance implication with keeping tables together in one schema versus separating them into multiple schemas.
Using schemas is mostly personal preference. It can make certain tasks more convenient, such as granting privileges, backing up and restoring, or using replication filters. These have no direct effect on query performance.
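For example, a cross-schema foreign key, join, and per-schema grant look like this (the schema, table, and account names below are made up to mirror the question):

CREATE DATABASE IF NOT EXISTS db1;
CREATE DATABASE IF NOT EXISTS db2;

-- Referenced table in the second schema
CREATE TABLE db2.customers (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- Table in the first schema with a foreign key pointing across schemas
CREATE TABLE db1.users (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    client_id INT UNSIGNED NOT NULL,
    CONSTRAINT fk_users_customer
        FOREIGN KEY (client_id) REFERENCES db2.customers (id)
);

-- Joins work across schemas exactly as they do within one schema
SELECT u.id, c.name
FROM db1.users AS u
JOIN db2.customers AS c ON c.id = u.client_id;

-- Privileges (like backups and replication filters) can target one schema at a time
CREATE USER 'reporting_user'@'%' IDENTIFIED BY 'reporting_password';
GRANT SELECT ON db2.* TO 'reporting_user'@'%';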
Related
I am designing a common web-based CRM application for a huge number of tenants.
Users (tenants) sign up online to use the application. Initially there will not be many users, but that could grow considerably in the future.
I want to use a single shared MySQL database. Creating a separate database for each tenant is not feasible because of the chosen scenario and future functionality integrations. The application will be written in PHP.
But how should I address data scalability issues:
What if the number of rows exceeds what a single table can handle? How do I address this issue?
If I use an auto-increment BIGINT primary key, for example on the 'contacts' table, what will happen once the largest BIGINT value is reached?
Is it best practice to use foreign key constraints on really huge tables? How will using (or not using) them affect the performance of the application?
Will MySQL be a good fit for this kind of application?
What is Zoho CRM's database technique for multi-tenancy?
MySQL is pretty good at scaling up, even with enormous tables. Basically you can just put your database on larger and more powerful servers to handle the demand. In my experience it's usually limited by RAM.
Once that technique starts getting dicey you can scale out by creating read replicas of the database. Basically these are read-only copies of your master database that are continuously synchronized with the master. In your application, use two different database connections. The first connection is to a read replica and is used for all SELECT statements. The other connection is to your master and is used for all INSERT, UPDATE, and DELETE statements. Since many applications do more SELECTs than anything else, and there is very little limit on how many read replicas you can create, this will greatly expand your potential scale.
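As a rough sketch of the replica side (the hostname and credentials are placeholders, and recent MySQL releases rename these statements to CHANGE REPLICATION SOURCE TO / START REPLICA, so adjust for your version):

-- On the replica, after loading an initial copy of the master's data:
CHANGE MASTER TO
    MASTER_HOST = 'db-primary.example.com',   -- placeholder hostname
    MASTER_USER = 'repl',                     -- placeholder replication account
    MASTER_PASSWORD = 'repl_password',
    MASTER_AUTO_POSITION = 1;                 -- assumes GTID-based replication
START SLAVE;

-- Keep the replica read-only; the application points its SELECT connection here
-- and keeps sending INSERT/UPDATE/DELETE over its connection to the master
SET GLOBAL read_only = ON;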
In MySQL I tend to use a single database for all tenants and segment the data by using different database usernames for each tenant. Through a tenant_id column and views that filter by the tenant_id, I can ensure that tenants don't have any access to other tenants' data. I wrote a blog post on how I was able to convert a single-tenant application to multi-tenant in a weekend: https://opensource.io/it/mysql-multi-tenant/
Having a single database and single codebase for all tenants is much easier to maintain than multiple databases or schemas.
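A minimal sketch of that pattern (the database, table, view, and account names here are made up; the linked post describes the approach in full):

CREATE DATABASE IF NOT EXISTS crm;

-- One shared table holds every tenant's rows, keyed by tenant_id
CREATE TABLE crm.contacts (
    id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    tenant_id INT UNSIGNED NOT NULL,
    name      VARCHAR(100) NOT NULL,
    INDEX idx_contacts_tenant (tenant_id)
);

-- A per-tenant view exposes only that tenant's rows
CREATE VIEW crm.tenant_42_contacts AS
    SELECT id, name
    FROM crm.contacts
    WHERE tenant_id = 42;

-- Each tenant connects with its own database account that is granted the view only;
-- the view runs with the definer's rights, so the account never touches the base table
CREATE USER 'tenant_42'@'%' IDENTIFIED BY 'tenant_42_password';
GRANT SELECT ON crm.tenant_42_contacts TO 'tenant_42'@'%';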
How can I connect two tables from different databases using PostgreSQL? My first database is called "payments_details" and my second one is called "insurance". Also, I want to display and highlight the IDs they don't have in common using PHP; is that possible?
Databases are isolated from each other; you cannot access data from different databases with one SQL statement. That is not a bug, but a design feature.
There are three ways to achieve what you want:
Don't put the data in different databases, but in different schemas in one database. It is a common mistake for people who are more experienced with MySQL to split up data that belong to one application in multiple databases and then try to join them. This is because the term database in MySQL is roughly equivalent to what in (standard) SQL is called a schema.
If you cannot do the above, e.g. because the data really belong to different applications, you can use the PostgreSQL foreign data wrapper (sketched after the third option below). This enables you to access tables from a different database (or even on a different machine) as if they were local tables. You'll have to write your statements more carefully, because complicated queries can sometimes be inefficient if large amounts of data have to be transferred between the databases.
You can use dblink, which is an older and less comfortable interface than foreign data wrappers, but can allow you to do things that you could not do otherwise, like call a remote function.
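For the second option, here is a rough postgres_fdw sketch, run from the "payments_details" database and pointing at the "insurance" database. The server name, credentials, and the payments/policies table names are placeholders; the final query is the anti-join that lists IDs the two sides don't have in common, and the last two statements show the dblink equivalent from the third option.

-- In the payments_details database
CREATE EXTENSION postgres_fdw;

CREATE SERVER insurance_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', dbname 'insurance');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER insurance_srv
    OPTIONS (user 'app_user', password 'app_password');  -- placeholder credentials

-- Make the remote tables visible under a local schema
CREATE SCHEMA insurance_remote;
IMPORT FOREIGN SCHEMA public
    FROM SERVER insurance_srv
    INTO insurance_remote;

-- Now one statement can join local and remote tables,
-- e.g. IDs present locally but missing on the insurance side
SELECT p.id
FROM payments AS p
LEFT JOIN insurance_remote.policies AS i ON i.id = p.id
WHERE i.id IS NULL;

-- dblink alternative: each remote query is wrapped in a function call
CREATE EXTENSION dblink;
SELECT * FROM dblink('dbname=insurance', 'SELECT id FROM policies')
    AS t(id integer);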
I am taking a PHP MySQL app which was built for one customer and deploying it to be used by multiple customers. Each customer account will have many users (30-200), each user can have several classes, each class has several metrics, each metric contains several observations. Several means about 2-8.
Originally I was planning to have one instance of the application code which would connect to the appropriate table set for that customer based on a table prefix. But I am now considering using only one set of tables for all my customer accounts. This would simplify the application design, which would be best in the long run. My question is whether I would be taxing the database server by combining all the customer data into the same tables. Most queries will be SELECTs, but due to the nature of the schema there can be quite a few JOINs required. Most INSERT or UPDATE queries touch just one row in one table, and possibly one or two bridge entity tables at most.
I know this is one of those "it depends" questions but I am hoping to get a little guidance regarding how slow/fast MySQL is with what I am trying to do.
Here is an example of the longest JOIN query I would be doing.
SELECT $m_measure_table_name.*,
       $m_metric_table_name.metric_name, $m_metric_table_name.metric_descrip, $m_metric_table_name.metric_id,
       $c_class_table_name.class_size, $c_class_table_name.class_id, $c_class_table_name.class_field, $c_class_table_name.class_number, $c_class_table_name.class_section,
       $lo_table_name.*, $lc_table_name.*,
       $user_table_name.user_name, $user_table_name.user_id,
       $department_table_name.*
FROM $m_measure_table_name
LEFT JOIN $m_metric_table_name ON $m_measure_table_name.measure_metric_id=$m_metric_table_name.metric_id
LEFT JOIN $c_class_table_name ON $m_metric_table_name.metric_class_id=$c_class_table_name.class_id
LEFT JOIN $lo_table_name ON $m_metric_table_name.metric_lo_id=$lo_table_name.lo_id
LEFT JOIN $lc_table_name ON $lo_table_name.lo_lc_id=$lc_table_name.lc_id
LEFT JOIN $class_user_table_name ON $c_class_table_name.class_id=$class_user_table_name.cu_class_id
LEFT JOIN $user_table_name ON $user_table_name.user_id=$class_user_table_name.cu_user_id
LEFT JOIN $department_class_table_name ON $c_class_table_name.class_id=$department_class_table_name.dc_class_id
LEFT JOIN $department_table_name ON $department_class_table_name.dc_department_id=$department_table_name.department_id
WHERE $c_class_table_name.class_semester=:class_semester AND $c_class_table_name.class_year=:class_year
AND $department_table_name.department_id=:id
ORDER BY $department_table_name.department_name, $lc_table_name.lc_name, $lo_table_name.lo_id
Ultimately my question is whether doing long strings of JOINS like this on primary keys is taxing to the database. Also whether using one set of tables seems like the better approach to deployment.
This is too long for a comment.
SQL is designed to perform well on tables with millions of rows, assuming you have appropriate indexing and table partitioning. I wouldn't worry about data volume being an issue in this case.
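For illustration (the table and column names below are placeholders), "appropriate indexing and partitioning" in a shared-table design usually means composite indexes that lead with the customer column, and optionally partitioning by customer for very large tables:

-- Leading the index with customer_id keeps per-customer lookups and joins
-- from touching other customers' rows
CREATE TABLE measure (
    measure_id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id       INT UNSIGNED NOT NULL,
    measure_metric_id INT UNSIGNED NOT NULL,
    measure_value     DECIMAL(10,2),
    PRIMARY KEY (measure_id, customer_id),  -- the partitioning column must appear in every unique key
    INDEX idx_measure_customer_metric (customer_id, measure_metric_id)
)
-- Optional: spread a very large table across partitions by customer
PARTITION BY HASH (customer_id)
PARTITIONS 16;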
However, you may have an issue with security. You probably don't want different customers to see each other's data. Row-level security is a pain in SQL. Table-level is much easier.
Another approach is to create a separate database for each customer. In addition to the security advantages, this also allows you to move different customers to different servers to meet demand.
It does come at a cost. If you have common tables, then you need to replicate them or have a "common tables" database. And, when you update the code, you need to update all the databases. The latter may actually be an advantage as well: it allows you to roll features out to customers individually, instead of requiring everyone to upgrade at the same time.
EDIT: (about scaling one database)
Scaling should be fine for one database, in general. Databases scale; you just have to throw more hardware, essentially in a single server, at the problem. You will need the judicious use of indexes for performance, and possibly partitions if the data grows quite large. With multiple databases you can throw more "physical" servers at the problem; with one database you throw "bigger" servers at the problem. (Those are in double quotes because many servers nowadays are virtual anyway.)
As an example of the difference: if you have 100 clients, then you can back up the 100 databases at times convenient to them, all in parallel. And if the databases are on separate servers, the backups won't interfere with each other. With a single database, you back up once and it affects everyone at the same time. And the backup may take longer because you are not running separate jobs (backups can take advantage of parallelism).
Here's the situation:
I have a MySQL DB on a remote server. I need data from 4 of its tables. On occasion, the schema of these tables is changed (new fields are added, but not removed). At the moment, the tables have > 300,000 records.
This data needs to be imported into the localhost MySQL instance. These same 4 tables exist (with the same names), but the fields needed are a subset of the fields in the remote DB tables. The data in these local tables is considered read-only and is never written to. Everything needs to be run in a transaction so there is always some data in the local tables, even if it is a day old. The localhost tables are used by an active website, so this entire process needs to complete as quickly as possible to minimize downtime.
This process runs once per day.
The options as I see them:
Get a mysqldump of the structure/data of the remote tables and save to file. Drop the localhost tables, and run the dumped sql script. Then recreate the needed indexes on the 4 tables.
Truncate the localhost tables. Run SELECT queries on the remote db in PHP and retrieve only the fields needed instead of the entire row. Then loop through the results and create INSERT statements from this data.
My questions:
Performance wise, which is my best option?
Which one will complete the fastest?
Will either one put a heavier load on the server?
Would indexing the tables take the same amount of time in both options?
If there is no good reason for having the local DB be a subset of the remote, make the structure the same and enable database replication on the needed tables. Replication works by the master tracking all changes made, and managing each slave DB's pointer into the changes. Each slave says "give me all changes since the last request". For a sizeable database, this is far more efficient than either alternative you have listed. It comes with only modest cost.
As for schema changes, I think the alter information is logged by the master, so the slave(s) can replicate those as well. The mechanism definitely replicates drop table ... if exists and create table ... select, so alter logically should follow, but I have not tried it.
Here it is: confirmation that alter is properly replicated.
What is, if there is, a good way to keep a row in two separate databases (possibly on different machines) in sync?
For clarity: I have multiple MySQL databases that share a user table with the same schema. There is a "master" database that has its own unique schema but contains the user table, which holds all user records. Then there are multiple "slave" databases that for the most part share the same schema and also contain the user table (with the same schema), which stores a subset of the user records.
When an update is made to an instance of the user record in any database, I want that change propagated to all instances of that user record in all the databases it is in.
I'm using MySQL, PHP 5.3, and Doctrine 1.2.x as an ORM, running on Ubuntu VPS servers.
Don't try to do this using PHP or Doctrine: look to MySQL replication to keep the slave database tables in sync with the master.
It sounds like you're looking for MySQL replication and, specifically, the replicate-do-table configuration option to restrict the slave databases to only caring about specific table(s) from the master.
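A rough sketch of that filter, with placeholder database and table names. Classically it goes in the slave's my.cnf; MySQL 5.7 and later also accept it dynamically on the slave:

-- In the slave's my.cnf:
--   replicate-do-table = master_db.users
--
-- Or dynamically (MySQL 5.7+), while the SQL thread is stopped:
STOP SLAVE SQL_THREAD;
CHANGE REPLICATION FILTER REPLICATE_DO_TABLE = (master_db.users);
START SLAVE SQL_THREAD;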