I'm integrating punBB into my existing site. Is it better to continue using one DB and add the new tables into my current setup or is it better to maintain a second DB for the forum. I guess my main concern is performance and maintainability. It seems easier to maintain just one DB but are there performance gains or losses with a second DB?
Assuming punBB does what every other forum does and prefixes its table names (e.g. punbb_users, punbb_messages, etc.), and you're not going to have any table name collisions, one database should be fine.
As far as performance goes, that depends on how integrated your setup is. Opening a second connection to a separate database adds a little overhead, whereas reusing the same connection for the normal site queries and the forum is more efficient. One thing to consider is how big your site is and how much traffic it gets, since the cost of an extra connection is not negligible in every circumstance.
If it makes sense logically and everything is self-contained, I would probably go with one database. It also makes database backups/restores easier, since it's all one entity, but if you're on a hosted provider that may not matter as much to you.
It depends on taste. Some will prefer the same database with different table prefixes, something like punbb_, joomla_, et al., so that they can have a single connection object and keep everything in one place. This makes centralized login an easier task, since the applications can share one users table. However, if the applications don't share anything (other than being owned by the same person and perhaps living on the same server), I suggest you keep them separate. That will ease backups and make for good organization. But as I said, preference matters.
One note, though: if you share the database, be careful not to drop existing tables accidentally, so back everything up. I know it is probably already in your head, but just a reminder!
I'm going to try to make this as brief as possible while covering all points - I work as a PHP/MySQL developer currently. I have a mobile app idea with a friend and we're going to start developing it.
I'm not saying it's going to be fantastic, but if it catches on, we're going to have a LOT of data.
For example, we'd have "clients," for lack of a better term, who would each have anywhere from 100 to 250,000 "products" listed. Assuming the best, we could have hundreds of clients.
The client would edit data through a web interface; the mobile interface would just make calls to the web server and get back JSON (probably).
I'm a lowly CMS-developing kind of guy, so I'm not sure how to handle this. My question is more or less about performance; the most rows I've ever seen in a MySQL table was 340k, and it was already sort of slow (granted, it wasn't the best server either).
I just can't fathom a table with 40 million rows (and potential to continually grow) running well.
My plan was to have a "core" database that held the name of each "real" database, so when a user came in and tried to access a client's data, the application would go to the core database and figure out which database to get the information from.
I'm not concerned with data separation or data security (it's not private information).
Yes, it's possible, and my company does it. I'm certainly not going to say it's smart, though. We have a SaaS marketing automation system. Some clients' databases have 1 million+ records. We also maintain a second "common" database with a "fulfillment" table tracking emails, letters, phone calls, etc. with over 4 million records, plus numerous other very large shared tables. With proper indexing, optimizing, a dedicated DB-only server, and possibly clustering (which we haven't yet had to do), you can handle a LOT of data. In many cases, those who think MySQL can only handle a few hundred thousand records work on a competing product for a living. If you still doubt whether it's viable, consider that per MySQL's clustering metrics, an 8-server cluster can handle 2.5 million updates PER SECOND. Not too shabby at all.
The problem with using two databases is juggling multiple connections. Is it tough? No, not really. You create different objects and reference your connection classes based on which database you want. In our case, we hit the main database's company class to deduce the client DB name and then build the second connection based on that. But when you're juggling those connections back and forth, you can run into errors that require extra debugging. It's not just "Is my query valid?" but "Am I actually getting the correct database connection?" In our case, a dropped session can cause all sorts of PDO errors to fire because the system can no longer keep track of which client database to access. Plus, from a maintainability standpoint, it's a scary process trying to push table structure updates to 100 different live databases. Yes, it can be automated. But one slip-up and you've knocked a LOT of people down and made a ton of extra work for yourself. Now, calculate the extra development and testing required to juggle connections and push updates; that will be your measure of whether it's worthwhile.
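For illustration, here is a minimal sketch of that core-lookup pattern in PDO. The database name, the companies table, and the db_name column are hypothetical stand-ins, not the actual schema described above:

```php
<?php
// Sketch only: assumes a "core" database with a `companies` table whose
// `db_name` column records which client database belongs to which company.
$user = 'appuser';
$pass = 'secret';
$companyId = 42;

$core = new PDO('mysql:host=localhost;dbname=core', $user, $pass, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Deduce the client DB name from the main database.
$stmt = $core->prepare('SELECT db_name FROM companies WHERE id = ?');
$stmt->execute([$companyId]);
$dbName = $stmt->fetchColumn();

if ($dbName === false) {
    throw new RuntimeException("Unknown company: $companyId");
}

// Build the second connection based on that lookup.
$client = new PDO("mysql:host=localhost;dbname=$dbName", $user, $pass, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// From here on, every query must be explicit about its handle:
// $core for shared tables, $client for per-client tables. Mixing
// them up is exactly the debugging hazard described above.
```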
My recommendation? Find a host that allows you to put two machines on the same local network. We chose Linode, but who you use is irrelevant. Start out with your dedicated database server, plan ahead to do clustering when it's necessary. Keep all your content in one DB, index and optimize religiously. Finally, find a REALLY good DB guy and treat him well. With that much data, a great DBA would be a must.
If you have two databases on the same host, one called blog and one called forum, it seems like you can access both using only one database handle (in PDO)?
$dbh = new PDO("mysql:host=$dbHost;dbname=blog", $dbUser, $dbPassword);
This handle is for the database blog, although you can also perform operations on forum using $dbh if you qualify the names, e.g.
SELECT forum.tableName.fieldName FROM forum.tableName
My questions are:
Is the only reason you have to specify dbname in $dbh that it lets you omit the blog.tableName.fieldName prefix?
Since my website has two databases, would there be any pros or cons to using only one database handle rather than creating two handles (one for blog and one for forum)? Any performance difference?
Does creating a database handle consume any server resources?
It is usually good practice to use a database-specific user for any app you make; I would go as far as calling it a necessity. That is the reason for keeping the database name in the connection. (Hint for the reason behind this: what if someone somehow got your DBMS password for one database?)
I am not very good at this, but I do not think it's a good idea to keep two separate databases when one will do. In your case, you are not using master-slave replication or anything like that, so unless you have some physical limit you are trying to work around, merge them into one database (use prefixes for table names to avoid name collisions).
The reason for the previous point comes with this one. Keeping one user per database (some people even keep two, for strange and to some extent justifiable reasons) is a safety measure you should follow. With multiple users, you need multiple connections, which means every page load connects to the DBMS twice: simple math, 2x the load (yes, it eats resources; every single line of code does). To simplify, think of a man who walks to the grocery store for everything you ask for and brings back only one thing at a time. If you give him two different grocery stores, he will need twice the time and energy to do the same work.
Yes, you can omit it, or switch databases later by running USE databasename;.
Use one handle; it seems a waste to make double the connections.
Yes, hence (2).
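As a concrete sketch of the single-handle approach (the table names posts and topics are invented for the example; only the blog and forum database names come from the question):

```php
<?php
// One handle, two databases on the same MySQL server. Assumes the user
// has privileges on both `blog` and `forum`.
$dbh = new PDO('mysql:host=localhost;dbname=blog', 'dbUser', 'dbPassword');

// Unqualified table names resolve against the default database (blog):
$posts = $dbh->query('SELECT title FROM posts')->fetchAll();

// Fully qualified names reach the other database over the same connection:
$topics = $dbh->query('SELECT forum.topics.subject FROM forum.topics')->fetchAll();

// Or change the default database for everything that follows:
$dbh->exec('USE forum');
```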
I see programmers putting a lot of information into databases that could otherwise be put in a file that holds arrays. Instead of arrays, they'll use many tables of SQL which, I believe, is slower.
CitrusDB has a table in the database called "holiday". This table consists of just one date column, "holiday_date", which holds dates that are holidays. The idea is to let the user add holidays to the table. Citrus and the programmers I work with prefer to put all this information in tables because it is "standard".
I don't see why this would be true unless you are allowing the user, through a user interface, to add holidays. I have a feeling there's something I'm missing.
Sometimes you want to design in a bit of flexibility to a product. What if your product is released in a different country with different holidays? Just tweak the table and everything will work fine. If it's hard coded into the application, or worse, hard coded in many different places through the application, you could be in a world of pain trying to get it to work in the new locale.
By using tables, there is also a single way of accessing this information, which probably makes the program more consistent, and easier to maintain.
Sometimes efficiency/speed is not the only motivation for a design. Maintainability, flexibility, etc are very important factors.
The main advantage I have found of storing 'configuration' in a database, rather than in a property file or a file full of arrays, is that the database is usually centrally stored, whereas an application may be spread across a farm of several, or even hundreds, of servers.
I have implemented such a solution in a corporate environment, and being able to change configuration at a single point of access, knowing that it will immediately propagate to all servers without the concern of a deployment process, is very powerful, and something we have come to rely on quite heavily.
The actual dates of some holidays change every year. The flexibility to update them with a query or a script makes the database the easiest place to keep them. One could easily implement a script that updates the holidays each year for their country or region when they are stored in the database.
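For example, a yearly refresh might look roughly like this. The holiday table and its holiday_date column come from the question; the connection details and the particular holidays are just assumptions for the sketch:

```php
<?php
// Sketch: refresh next year's holidays in the `holiday` table.
$pdo = new PDO('mysql:host=localhost;dbname=citrusdb', 'user', 'pass');
$year = (int) date('Y') + 1; // load next year's dates ahead of time

$holidays = [
    "$year-01-01",                                                 // New Year's Day
    date('Y-m-d', strtotime("last monday of may $year")),          // Memorial Day (US)
    "$year-07-04",                                                 // Independence Day (US)
    date('Y-m-d', strtotime("fourth thursday of november $year")), // Thanksgiving (US)
];

// Make the script re-runnable: clear that year's rows first.
$pdo->prepare('DELETE FROM holiday WHERE YEAR(holiday_date) = ?')
    ->execute([$year]);

$insert = $pdo->prepare('INSERT INTO holiday (holiday_date) VALUES (?)');
foreach ($holidays as $date) {
    $insert->execute([$date]);
}
```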
Theoretically, databases are designed and tuned to provide faster access to data than doing a disk read from a file. In practice, for small to mid-sized applications this difference is minuscule. Best practices, however, are typically oriented at larger scale. By implementing best practices on your small application, you create one that is capable of scaling up.
There is also the consideration of the accessibility of the data relative to other parts of the project. Where is most of the data in a web-based application? In the database. So we try to keep ALL the data in the database, or as much as is feasible. That way, if in the future you decide you need to join the holiday dates against a list of events (for example), all the data is in a single place. This segmenting of disparate layers creates tiers within your application. When each tier is devoted exclusively to handling the roles within its domain (the database handles data, HTML handles presentation, etc.), it is again easier to change or scale your application.
Last, when designing an application, one must consider the "hit by a bus principle". So you, Developer 'A', put the holidays in a PHP file. You know they are there, and when you work on the code it doesn't create a problem. Then.... you get hit by a bus. You're out of commission. Developer 'B' comes along, and now your boss wants the holiday dates changed - we don't get President's Day off any more. Um. Johnny Next Guy has no idea about your PHP file, so he has to dig. In this example, it sounds a little trivial, maybe a little silly, but again, we always design with scalability in mind. Even if you KNOW it isn't going to scale up. These standards make it easier for other developers to pick up where you left off, should you ever leave off.
The answer lies in many realms. I used to write my own software to read and write my own flat-file database format. For small systems, with few fields, it may seem worth it. Once you learn SQL, you'll probably use it for even the smallest things.
Speed: file parsing is slow. String readers, character comparisons, and searches for character sequences all take time. SQL databases use files too, but they are read and then cached, both far more efficiently.
Updating: saving arrays back to a file requires you to read everything, rebuild it all, write it out, and close the file.
Options: SQL has many built-in features to do powerful things, from putting results in order to returning only rows x through y.
Security: you avoid the read/write file permission issues that flat files introduce.
Synchronization: say the same page is accessed twice at the same time. Two PHP processes will read from your flat file, process, and write at the same time; they will overwrite each other, resulting in data loss.
The number of features SQL provides, the ease of access, and the amount of code you don't have to write, among plenty of other things, are why hard-coded arrays aren't as good.
The answer is it depends on what kind of lists you are dealing with. It seems that here, your list consists of a small, fixed set of values.
For many valid reasons, database administrators like having value tables for enumerated values. It helps with data integrity and with ETL, to name two reasons you would want it.
At least in Java, for these kinds of short, fixed lists, I usually use enums. In PHP you can get much the same effect by emulating enums, for example with class constants.
The benefit of doing this is that the value is an in-memory lookup, but you still get the data integrity that DBAs care about.
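A minimal sketch of what emulated enums can look like in PHP (the OrderStatus name and its values are invented for illustration; PHP 8.1+ also has native enums):

```php
<?php
// Enum-style constants for a short, fixed list: in-memory lookups,
// plus a single place to validate values for data integrity.
final class OrderStatus
{
    const PENDING  = 'pending';
    const SHIPPED  = 'shipped';
    const CANCELED = 'canceled';

    /** Every valid value, for validating user input or ETL data. */
    public static function all()
    {
        return [self::PENDING, self::SHIPPED, self::CANCELED];
    }
}

$status = 'shipped';
if (!in_array($status, OrderStatus::all(), true)) {
    throw new InvalidArgumentException("Bad status: $status");
}
```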
If you need to find a single piece of information out of 10, reading a file versus querying a database may not give a serious advantage either way. Reading a single piece of data out of hundreds or thousands is where the database has a serious advantage: rather than loading a file of some size and reading all of its contents, taking time and memory, a query is quick and returns exactly what you asked for. It's similar when writing data: an insert into the database includes only what you are adding, while writing to a text file means reading the entire contents and writing them all back out again.
If you know you're dealing with a very small number of values, and you know that requirement will never change, put the data into files and read them. If you're not 100% sure about it, don't shoot yourself in the foot. Work with a database and you're probably going to be future-proof.
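For contrast, the file-of-arrays approach being weighed here is roughly the following (the file name and the dates are only examples):

```php
<?php
// holidays.php — a file that holds an array, as discussed above.
// Reasonable only for a tiny list that will genuinely never change.
return [
    '2025-01-01', // New Year's Day
    '2025-07-04', // Independence Day (US)
    '2025-12-25', // Christmas Day
];

// Elsewhere in the application:
// $holidays = include 'holidays.php';
// $isHoliday = in_array(date('Y-m-d'), $holidays, true);
```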
This is a big question. The short answer would be, never store 'data' in a file.
First, you have to deal with read/write file permission issues, which introduce security risk.
Second, you should always plan on an application growing. When the 'holiday' array becomes very large, or needs to be expanded to include holiday types, you're going to wish it was in the DB.
I can see other answers rolling in, so I'll leave it at that.
Generally, application data should be stored in some kind of proper data store, not in flat files.
Configuration/settings can be kept in a key-value store (such as Redis) and then accessed via a REST API.
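As a sketch of that idea, using the phpredis extension (the key name and the settings themselves are made up; a thin REST endpoint could sit in front of the read):

```php
<?php
// Centralized settings in Redis: write once, read from any app server.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Store a settings blob under a well-known key.
$redis->set('config:site', json_encode(['theme' => 'dark', 'per_page' => 25]));

// Any server reads the same, immediately current configuration.
$config = json_decode($redis->get('config:site'), true);
```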
I am writing a PHP application in ZF (Zend Framework). Customers will use it to sell their products to end customers. Customers will host the application on my server, or they could use their own; most of them will host it on my server.
I could design one database for all customers at once, so every customer would use the same database, with products etc. assigned to a particular customer, of course. Trivial.
I could use a separate database for every customer, so each database structure would be simpler. I would then probably use separate subdomains and maybe even separate file locations, but that is just a detail.
Which solution will have better performance, and how big will the difference be? Which one would you choose?
I would use a separate database for each customer. It makes backup and scaling easier. If you ever get a large customer that needs some custom changes to the schema, you can do it easily.
If one customer needs you to restore their data, with a single database it is trivial. On a shared db, much harder.
And if a large customer ever gets a lot of traffic, you can easily put them on another server with minimal changes.
If one site gets compromised, you don't have all of the data for everyone in one place; the damage is limited to just the site that was hacked.
I'd definitely recommend going with 1 db per customer if possible.
Personally, I would go with multiple databases - i.e. a database for each client.
As I understand it, each of your clients will be running their own instance of your application, so these instances should have their own databases.
If you go with a single database, you are creating a great potential security risk. One client compromising the login details to the DB server would automatically compromise the data of all your clients.
Also, a single security vulnerability (a SQL injection attack, say) could destroy the data of all clients, whereas with multiple DBs you would still have time to fix the security hole and release a patch before all the other sites were attacked.
You don't want to end up with an army of 1,000,000 mad clients instead of just one angry client.
Multiple databases also give you a greater possibility of load balancing (you can have the dbs spread across more servers).
Performance-wise, you basically start out with a 'sharding' approach. Because of this, a full sharding strategy later will be a piece of cake.
The downside is that you arguably pay some (hard-to-quantify) overhead for the duplication.
One pitfall is that you might not notice performance issues in major components as quickly, because the load is scattered across databases and may not show up on your radar. Load testing is the way to get ahead of this.
To some extent this is a question of personal opinion. There are pros and cons of both models.
Personally, and because of the "they could use their own" comment, I would go with a separate database per customer. This gives you:
The ability to move customer data around when necessary, for example moving a single customer onto a different server/setup depending on things like load.
If something goes wrong you only impact one customer and not everybody.
You can spread DB load across multiple DB servers if necessary.
If a customer comes to you with a specific requirement, you can more easily cater for it without impacting other customers.
From a performance perspective, to be honest, I don't think there is any real performance gain in either model. That said, this does of course depend on the structure of your DB and the hardware it runs on.
Don't choose the multiple-database solution if your needs can be met with one database. Multiple databases become a big burden in the long run, and your system will grow highly complicated and unmanageable as you scale.
Using proper relationships, you can go a long way:
A Client model can have many Products // why multiple databases?
Performance can be achieved either way; going with multiple DBs will NOT, by itself, help in that direction.
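To make the one-database point concrete, a relational one-to-many design is enough here. The table and column names below are illustrative only:

```php
<?php
// One database: a Client has many Products via a foreign key.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec('CREATE TABLE IF NOT EXISTS clients (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
)');

$pdo->exec('CREATE TABLE IF NOT EXISTS products (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    client_id INT UNSIGNED NOT NULL,
    name      VARCHAR(255) NOT NULL,
    INDEX (client_id),
    FOREIGN KEY (client_id) REFERENCES clients (id)
)');

// Per-client queries stay fast because of the client_id index:
$stmt = $pdo->prepare('SELECT name FROM products WHERE client_id = ?');
$stmt->execute([1]);
```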
I am currently in a debate with a coworker about the best practices concerning the database design of a PHP web application we're creating. The application is designed for businesses, and each company that signs up will have multiple users using the application.
My design methodology is to create a new database for every company that signs up. This way everything is sandboxed, modular, and small. My coworker's philosophy is to put everyone into one database. His argument is that if we have 1000+ companies sign up, we wind up with 1000+ databases to deal with. Not to mention the mess that doing Business Intelligence becomes.
For the sake of example, assume that the application is an order entry system. With separate databases, table size can remain manageable even if each company is doing 100+ orders a day. In a single-bucket application, tables can get very big very quickly.
Is there a best practice for this? I tried hunting around the web, but haven't had much success. Links, whitepapers, and presentations welcome.
Thanks in advance,
The1Rob
I talked to the database architect from wordpress.com, the hosting service for WordPress. He said that they started out with one database, hosting all customers together. The content of a single blog site really isn't that much, after all. It stands to reason that a single database is more manageable.
This worked well for them until they got hundreds of thousands of customers, when they realized they needed to scale out, running multiple physical servers and hosting a subset of their customers on each server. When they add a server, it is easy to migrate individual customers to the new server, but much harder to separate out the data belonging to an individual customer's blog within a single shared database.
As customers come and go, and some customers' blogs see high-volume activity while others go stale, rebalancing across multiple servers becomes an even more complex maintenance job. Monitoring size and activity per individual database is easier too.
Likewise, doing a backup or restore of a single database containing terabytes of data is a very different job from individual backups and restores of a few megabytes each. Consider: a customer calls and says their data got SNAFU'd due to some bad data entry, and could you please restore it from yesterday's backup? How would you restore one customer's data if all your customers share a single database?
Eventually they decided that splitting into a separate database per customer, though complex to manage, offered them greater flexibility and they re-architected their hosting service to this model.
So, while from a data modeling perspective it seems like the right thing to do to keep everything in a single database, some database administration tasks become easier as you pass a certain breakpoint of data volume.
I would never create a new database for each company. If you want a modular design, you can create it using tables properly connected by primary and foreign keys. This is what database normalization is about, and I'm sure it will help you out here. That is the method I would use.
I'd have to agree with your co-worker. Relational databases are designed to handle large amounts of data, and the numbers you're talking about (1000+ companies, multiple users per company, 100+ orders/day) are well within the expected bounds. Separate databases means:
multiple database connections in each script (memory and speed penalty)
maintenance is harder (DB systems generally do not provide tools for acting on databases as a group) so schema changes, backups, and similar tasks will be more difficult
harder to run queries on data from multiple companies
If your site becomes huge, you may eventually need to distribute your data across multiple servers. Deal with that when it happens. To start out that way for performance reasons sounds like premature optimization.
I haven't personally dealt with this situation, but I would think that if you want to do business intelligence, you should aggregate the data into an offline database that you can then run any analysis you want on.
Also, keeping them in separate databases makes it easier to partition across servers (which you will likely have to do if you have 1000+ customers) without resorting to messy replication technologies.
I had a similar question a while back and came to the conclusion that a single database is drastically more manageable. Right now, we have multiple databases (around 10), and it is already becoming a pain to manage, especially when we upgrade the code: we have to migrate every single database.
The upside is that the data is segregated cleanly. Due to the sensitivity of our data, this is a good thing, but it does make it quite a bit more difficult to keep up with.
The separate-database methodology has one very big advantage over the other:
+ You can break the system up into smaller groups, so this architecture scales much better.
+ You can set up standalone servers easily.
That depends on how likely your schemas are to change. If they ever have to change, will you be able to safely make those changes to 1000 separate databases? If a scalability problem is found with your design, how are you going to fix it for 1000 databases?
We run a SaaS (Software-as-a-Service) business with a large number of customers and have elected to keep all customers in the same database. Managing thousands of separate databases is an operational nightmare.
You do have to be very diligent creating your data model and the business objects / reporting queries that access them. One approach you may want to consider is to carry the company ID in every table and ensure that every WHERE clause includes the company ID for the currently logged-in user. If you use a data access layer, you can enforce that condition there.
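Here is a hedged sketch of enforcing that company-ID condition in a small data access layer. The class, table, and column names are invented for illustration, not taken from the answer above:

```php
<?php
// Every read goes through one method that always scopes by company_id,
// so no individual query can forget the WHERE condition.
class TenantQuery
{
    private $pdo;
    private $companyId;

    public function __construct(PDO $pdo, $companyId)
    {
        $this->pdo = $pdo;
        $this->companyId = $companyId; // the currently logged-in user's company
    }

    public function select($table, $where = '1 = 1', array $params = array())
    {
        $sql = "SELECT * FROM $table WHERE company_id = ? AND ($where)";
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute(array_merge(array($this->companyId), $params));
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Usage: only ever see the current company's orders.
$tenant = new TenantQuery($pdo, $companyId);
$orders = $tenant->select('orders', 'created_at > ?', array('2020-01-01'));
```

Note that $table and $where are trusted, developer-supplied strings here; only values go through placeholders.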
As you grow large, you can still partition horizontally by placing groups of companies on each physical server, e.g. the first 100 companies on Server A, the next 100 on Server B.