I have a MySQL hosting and capacity planning question. I would like to know the minimum hosting requirements to host a MySQL database of the type and size described below:
Background: I have a customer in the finance industry who has bought a bespoke software CMS platform written in PHP with a MySQL database.
Their current solution does not have any reports, and the software vendor who provided it only allows them to use some PHP pages to export the entire contents of tables which the customer then has to manually manipulate in Excel to obtain their business reporting.
The vendor will not allow them access to the live database to run Crystal Reports, saying that this is a risk to the database, and prefers that they purchase an expensive database replication solution; so the customer continues to perform tedious manual exports of entire tables every day.
The database: A custom, nine-month-old PHP solution sits on top of the database, and the customer has no access to it as it is hosted by their current vendor. There are 43 tables in total; the business data amounts to roughly 90MB, and a single large log table accounts for almost all of the rest of the database's size (see below).
The top four tables containing the business data are tiny:
34.62 MB
13.79 MB
8.46 MB
7.59 MB
The vast majority of the tables are simple look-up tables for data values and have only a few rows.
The largest table in the database, however, is a log table which is 1400MB in size. This table alone accounts for well over 90% of the total database size.
The question: Considering that the solution is (log table notwithstanding) very small, with only a few staff members making data entry via some simple PHP forms, is there a realistic problem with running Crystal Reports against such a database in production? Bear in mind that there are times during the day - the majority of the day, in fact - when this database is simply not being used at all: lunchtimes, for example, and out of hours.
The vendor maintains that there is a fundamental risk to the business to query live data and that running Crystal Reports against this database could cause it to "crash the live db and the business loses operations".
The customer is keen to have a live dashboard too, which could be driven by a very small SQL query aggregating some numbers from the small tables listed above.
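Something along these lines is all I have in mind (the table and column names here are invented purely for illustration):

    <?php
    // Illustrative only: table and column names are made up. The point is a
    // handful of cheap aggregate reads against the small business tables.
    $pdo = new PDO('mysql:host=localhost;dbname=cms', 'report_user', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    $sql = "SELECT COUNT(*)        AS open_cases,
                   SUM(amount)     AS total_value,
                   MAX(updated_at) AS last_activity
            FROM   cases
            WHERE  status = 'open'";

    $dashboard = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);
    // $dashboard now holds three numbers for the UI; on tables a few MB in
    // size a query like this returns in milliseconds.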
I usually work with SQL Server and Oracle, and I have absolutely no qualms about allowing a Crystal Report or running a view to populate a UI with some real-time data from the live database - especially a database this small; after all, what is the database for if one cannot SELECT from it now and again?
Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server", to replicate this MySQL database to a second, reporting database? In my experience, the need to do this only applies to sensitive or security-critical databases, or to databases with high transactional volumes.
System usage: The system is heavily reliant on scheduled CRON jobs every half hour. There may be 500 users per week each logging on and entering some data (but not much data - see table sizes above).
Any comments are warmly welcome.
Thanks for your time.
1) You need two $5 DigitalOcean servers.
2) "crash the live db and the business loses operations". Is absolutely false. They are idiots. What they are likely hiding is the poor structure of their database. They likely have 1 table architecture for all of their clients only separating from a client_id. Giving access to the table would give access to all of the client data which is why they force a giant replication solution so they can make sure you are only getting YOUR data.
3) Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server"? Yes, it is.
4) To replicate this MySQL database to a second, reporting database? Yes, this is good practice, as you can set up failover in the event that the worst happens. If you are really paranoid you can set up remote failover at different providers; seeing as how this is in the financial sector, I am pretty sure you want that.
5) In my experience, the need to do this only applies to sensitive, security-risk or high-transactional-volume databases? In my experience it is always good to have your data backed up, because sh*t happens in life and usually when you least expect it.
As for your real-time usage: assuming the database is structured properly with indexes and uses InnoDB, you should have minimal issues supporting 100 requests per second, so I think your 500-users-a-week load is nothing to worry about.
Like I mentioned, what you likely want is two servers at different providers, likely the cheapest instances you can get, since you don't need a huge amount of space or resources. You can set up DNS to make one the primary and one the replication slave, then in a disaster scenario change the DNS and make the other one the master.
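As a rough sketch of how the PHP side can ride out that switch (the hostnames, credentials and timeout are placeholders, not a definitive setup):

    <?php
    // Sketch only: the app tries the primary's DNS name first and falls back
    // to the replica's name if the primary is unreachable. In a disaster you
    // repoint the primary DNS name at the promoted replica.
    $hosts = ['db-primary.example.com', 'db-replica.example.com'];
    $pdo   = null;

    foreach ($hosts as $host) {
        try {
            $pdo = new PDO("mysql:host=$host;dbname=app", 'app_user', 'secret', [
                PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
                PDO::ATTR_TIMEOUT => 2,   // give up quickly on a dead host
            ]);
            break;                        // connected, stop trying
        } catch (PDOException $e) {
            // host unreachable, try the next one
        }
    }

    if ($pdo === null) {
        die('No database server reachable');
    }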
I hope this helps.
Related
I have a question regarding databases and performance, so let me explain the situation.
The application - to be built - has the following set-up:
A group, with users under that group.
Data / file locations (which are what is searched through); one group is estimated to easily reach one million "search" terms.
Now, groups can never look at each other's data, and users can only look at the data which belongs to their group.
The only thing they should have in common is some place to send error logs to (maybe not even necessary).
Now in this situation, would you create a new database per group, or always limit your search results with a query which takes someone's user-group-id into account?
Now my idea was to just create a new database per group, because then you do not need to limit your query every single time, and it keeps the number of rows to search through lower. But is that really necessary, or is a "where groupid = 1" fast enough, even on over a million records, that you would not notice a decrease in performance?
This is the regular multi-tenant SaaS architecture problem, which has been discussed at length, and the solution always varies according to your own situation. Here is one example of this discussion that I will just link to instead of copy-pasting, since all of it is worth a read: Multi-tenant PHP SaaS - Separate DB's for each client, or group them?
In addition to that I would like to add some more high level considerations:
Are there any legal requirements regarding the storage of your user's data? Some businesses operate in a regulatory environment where they are not allowed to store their data in a shared environment, quite common in the financial and medical industries.
Will you offer the same security (login method, data storage encryption), backup/restore service, geolocation redundancy and up-time guarantee to all users?
Are there any users who are willing to pay extra to have their data stored in a separate environment?
Are there any users who will potentially have requirements that are not compatible with the standard product you will be offering? If so, will you try to accommodate them? Note that occasionally some big customer comes along and offers a lot of cash for special treatment.
What is a separate environment? Is it a separate database, a separate virtual machine, a separate physical machine, a machine managed by the customer?
What parts of your application are part of each environment (hardware configuration, network config, database, source code, binaries, encryption certificates, etc.)?
Will there be some heavy users that may produce loads on your application that will negatively impact the performance for the smaller users?
If you go for all users in one environment, is there a possibility that you will in the future create a separate environment for some customer? If so, this will impact where you put shared data, e.g. configuration data like tax rates, exchange rate data, etc.
I hope this helps.
Performance isn't really your problem; maintenance and data security are. If you have a lot of databases, you will have more to maintain: not only backups but connection strings, patches, schema updates on release and so on. Multiple databases also suggest that you will have multiple PHP sites too. That will gradually get more expensive as the number of groups grows.
If you have one database then you need to ensure that every query contains the group id before it can run.
Database tables can be very, very large if you choose your indexes and constraints carefully. If you are performing joins against very large tables then it will be slow, but a simple lookup, where you have an index on the group column, should be fast enough.
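For example, something along these lines (the table, column and variable names are invented, and it assumes an open PDO connection in $pdo and a composite index leading with the group column):

    <?php
    // Illustrative names only. With an index such as
    //   CREATE INDEX idx_terms_group ON search_terms (group_id, term);
    // a lookup like this stays fast even with millions of rows in the table.
    $stmt = $pdo->prepare(
        'SELECT id, term, location
         FROM   search_terms
         WHERE  group_id = :group_id
           AND  term LIKE :term
         LIMIT  50'
    );
    $stmt->execute([
        ':group_id' => $currentUserGroupId,   // taken from the logged-in session
        ':term'     => $search . '%',         // a leading-prefix LIKE can still use the index
    ]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);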
If you were to share a single database, would you ever move a group out of it? If that's a possibility then split the databases now. If you are going to have one PHP site then I would recommend a single database with a group column.
You'll have to bear with me here for possibly getting some of the terminology slightly wrong as I wasn't even aware that this fell into the whole 'multi-tenant' 'software as a service' category, but here it does.
I've developed a membership system (in PHP) for a client. We're now looking at offering it as a completely hosted solution for our other clients, providing a subdomain (or even their own domain).
The options I seem to have on the table, as far as data storage goes are:
Option 1 - Store everything in 1 big database, and have a 'client_id' field on the tables that need it (there would be around 30 tables that it would apply to), and have a 'clients' table storing their main settings, details, etc and the domain to map to them. This then just sets a globally accessible variable containing their individual client id - I'd obviously have to modify every single query to check for the client_id column.
Option 2 - Have a master database holding the 'shared reference' tables and the 'clients' table. Then have 'blocks' of other databases, each containing, say, 10 clients. Each client would get their own database tables, prefixed with their client ID. This adds a little bit of security to protect against seeing other clients' data if something went really wrong.
Option 3 - Exactly the same as option 2, except you have 1 database for each and every client, completely isolating them from other clients, and theoretically providing a bit more protection that if 1 client's tables were hacked or otherwise damaged, it wouldn't affect anyone else. The biggest downside is that when deploying a new client, an entire database, user and password need setting up, etc. Could this possibly also cause a fair amount of overhead, or would it be pretty much the same as if you had everyone in one database?
A few points as well - some of these clients will have 5000+ 'customers' along with all the details for those customers - this is why option 1 may be a bit of an issue - if I've got 100 clients, that could equal over half a million rows in 1 table.
Am I correct in thinking Option 3 would be the best way to go in a situation where security of customer data (and payment information) is key? From recommendations I've had, a few people have said to go with option 1 because 'it's easier', however I really don't see it that way. I see it as a potential bottleneck down the line, as surely I can move clients around much more easily if they have their own database.
(FYI The system is PHP based with MySQL)
Option 3 is the most scalable. While at first it may seem more complicated, it can be completely automated and will save you headaches in the future. You can also scale more efficiently by having client databases on multiple servers for increased performance.
I agree with Ozzy - I did this for an online database product. We had one master database that basically had a glorified user table. Each customer had their own database. What was great about this is that I could move one customer's database from server A to server B easily [mysql], and could do so with command line tools in a pinch. Also, doing maintenance on large tables (dropping/adding indexes) can really screw up your application, especially if, say, adding an index locks the table [mysql]. It affects everyone. With, presumably, smaller databases you are more immune to this and have more options when you need to roll out schema-level changes. I just like the flexibility.
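In PHP, the routing layer for that can be very small; roughly like this (the table, column and credential names are invented for illustration):

    <?php
    // Sketch: the master database maps an account to the host and name of that
    // customer's own database. All names here are made up.
    $master = new PDO('mysql:host=db-master;dbname=master', 'app', 'secret');

    $stmt = $master->prepare(
        'SELECT db_host, db_name FROM customers WHERE customer_id = ?'
    );
    $stmt->execute([$customerId]);
    $target = $stmt->fetch(PDO::FETCH_ASSOC);

    // Second connection built from the lookup. Moving a customer to another
    // server then only means moving their database and updating db_host here.
    $client = new PDO(
        "mysql:host={$target['db_host']};dbname={$target['db_name']}",
        'app',
        'secret'
    );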
When, many years ago, I designed a platform for building SaaS applications in PHP, I opted for the third option: multi-tenant code and single-tenant databases.
In my experience, that is the most scalable option, but it also needs a set of scripts to propagate changes when updating code and DB schemas, enabling applications for a tenant, etc.
So a lot of my effort went into building a component-based, extensible engine to fully automate all those tasks and minimize system administration work. I strongly advise building such an architecture if you want to adopt the third option.
We are building a social website using PHP (Zend Framework), MySQL, server running Apache.
There is a requirement where, in a dashboard, the application will fetch data for different events (there are about 12 events) from which each user's dashboard will be updated. We expect the total number of users to be around 500k to 700k, with on average about 20% of users online at any one time (at peak we expect 50% of users to be online).
So the problem is that the event data, as per our current design, will be placed in a MySQL database. I think running a few hundred thousand queries concurrently on MySQL wouldn't be a good idea, even if we use Amazon RDS. So we are considering using DynamoDB (or Redis, or any other NoSQL option) alongside MySQL.
So the question is: would having data in both MySQL and a NoSQL database give our web application the scalability we need, or should we consider another solution?
Thanks.
You do not need to duplicate your data. One option is to use the ElastiCache service that Amazon provides to give yourself in-memory caching. This will get rid of your database calls and, in a sense, remove that bottleneck, but it can be very expensive. If you can sacrifice real-time updates then you can get away with just slowing down the requests or caching data locally for the user - say, cache the next N events on the browser if possible and display them instead of making another request to the servers.
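As a rough illustration of the cache-aside idea (ElastiCache speaks the memcached protocol, so the standard PHP Memcached extension works against it; the endpoint, key and table names below are placeholders, and an open PDO connection in $pdo is assumed):

    <?php
    // Cache-aside sketch: serve dashboard events from the cache and only touch
    // MySQL on a miss. Endpoint and names are placeholders.
    $cache = new Memcached();
    $cache->addServer('my-cache.example.cache.amazonaws.com', 11211);

    $key    = "dashboard:events:user:$userId";
    $events = $cache->get($key);

    if ($events === false) {                        // treat a miss as "not cached"
        $stmt = $pdo->prepare(
            'SELECT event_type, payload, created_at
             FROM   events
             WHERE  user_id = ?
             ORDER  BY created_at DESC
             LIMIT  12'
        );
        $stmt->execute([$userId]);
        $events = $stmt->fetchAll(PDO::FETCH_ASSOC);

        $cache->set($key, $events, 30);             // keep for 30 seconds
    }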
If it has to be real time, then look at ElastiCache and tweak how many nodes you require to handle your estimated amount of traffic. There is no point in duplicating your data. Keep it in a single DB if it makes sense to keep it there; i.e. if you have some relational information that you need and also a variable-schema system, then you can use both databases, but do not load-balance them against each other.
I would also start to think about the bottlenecks in your architecture and how well your application will/can scale in the event that you reach your estimated numbers.
I agree with #sean, there's no need to duplicate the database. Have you thought about something with auto-scalability, like Xeround? A solution like that can scale out automatically across several nodes when you have throughput peaks and later scale back in, so you don't have to commit to a larger, more expensive instance just because of seasonal peaks.
Additionally, if I understand correctly, no code changes are required for this auto-scalability. So, I’d say that unless you need to duplicate your data on both MySQL and NoSQL DB’s for reasons other than scalability-related issues, go for a single DB with auto-scaling.
I'm going to try to make this as brief as possible while covering all points - I work as a PHP/MySQL developer currently. I have a mobile app idea with a friend and we're going to start developing it.
I'm not saying it's going to be fantastic, but if it catches on, we're going to have a LOT of data.
For example, we'd have "clients," for lack of a better term, who would have anywhere from 100-250,000 "products" listed. Assuming the best, we could have hundreds of clients.
The client would edit data through a web interface, the mobile interface would just make calls to the web server and return JSON (probably).
I'm a lowly CMS-developing kinda guy, so I'm not sure how to handle this. My question is more or less about performance; the most rows I've ever seen in a MySQL table was 340k, and it was already sort of slow (granted, it wasn't the best server either).
I just can't fathom a table with 40 million rows (and potential to continually grow) running well.
My plan was to have a "core" database that held the name of the "real" database, so when a user came in and tried to access a client's data, the application would go to the core database and figure out which database to get the information from.
I'm not concerned with data separation or data security (it's not private information)
Yes, it's possible, and my company does it. I'm certainly not going to say it's smart, though. We have a SaaS marketing automation system. Some clients' databases have 1 million+ records. We deal with a second "common" database that has a "fulfillment" table tracking emails, letters, phone calls, etc. with over 4 million records, plus numerous other very large shared tables. With proper indexing, optimizing, maintaining a separate DB-only server, and possibly clustering (which we don't yet have to do) you can handle a LOT of data... in many cases, those who think it can only handle a few hundred thousand records work on a competing product for a living. If you still doubt whether it's valid, consider that per MySQL's clustering metrics, an 8-server cluster can handle 2.5 million updates PER SECOND. Not too shabby at all.
The problem with using two databases is juggling multiple connections. Is it tough? No, not really. You create different objects and reference your connection classes based on which database you want. In our case, we hit the main database's company class to deduce the client DB name and then build the second connection based on that. But when you're juggling those connections back and forth you can run into errors that require extra debugging. It's not just "Is my query valid?" but "Am I actually getting the correct database connection?" In our case, a dropped session can cause all sorts of PDO errors to fire because the system can no longer keep track of which client database to access. Plus, from a maintainability standpoint, it's a scary process trying to push table structure updates to 100 different live databases. Yes, it can be automated. But one slip-up and you've knocked a LOT of people down and made a ton of extra work for yourself. Now, calculate the extra development and testing required to juggle connections and push updates... that will be your measure of whether it's worthwhile.
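The automation for pushing a schema change to every client database can boil down to a loop like the one below; this is only a simplified sketch with invented table and credential names, not our actual deployment script:

    <?php
    // Simplified sketch: apply one DDL statement to every client database
    // listed in the master database. All names here are invented.
    $master  = new PDO('mysql:host=db-master;dbname=master', 'deploy', 'secret');
    $clients = $master->query('SELECT db_host, db_name FROM companies')
                      ->fetchAll(PDO::FETCH_ASSOC);

    $ddl = 'ALTER TABLE contacts ADD COLUMN last_emailed_at DATETIME NULL';

    foreach ($clients as $c) {
        try {
            $db = new PDO("mysql:host={$c['db_host']};dbname={$c['db_name']}",
                          'deploy', 'secret',
                          [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
            $db->exec($ddl);
            echo "OK   {$c['db_name']}\n";
        } catch (PDOException $e) {
            // One slip-up here is exactly the risk described above, so record
            // the failure and fix it before moving on.
            echo "FAIL {$c['db_name']}: {$e->getMessage()}\n";
        }
    }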
My recommendation? Find a host that allows you to put two machines on the same local network. We chose Linode, but who you use is irrelevant. Start out with your dedicated database server, plan ahead to do clustering when it's necessary. Keep all your content in one DB, index and optimize religiously. Finally, find a REALLY good DB guy and treat him well. With that much data, a great DBA would be a must.
I am creating an application that utilizes MySQL and PHP. My current web hosting provider has a MySQL database size limitation of 1 GB, but I am allowed to create many 1 GB databases. Even if I was able to find another web hosting provider that allowed larger databases, I wonder how data integrity and speed are affected by larger databases. Is it better to keep databases small in terms of disk size? In other words, what is the best-practice method of storing the same data (all text) from thousands of users? I am new to database design and planning. Eventually, I would imagine that a single database with data from thousands of users would grow to be inefficient, and that optimally the data should be distributed among smaller databases. Do I have this correct?
On a related note, how would my application know when to create another database (or switch to another database that was manually created)? For example, if I had one database that filled up with 1 GB of data, I would want my application to continue working without any service delays. How would I redirect the input of data from the first database to a second, newly created database?
Similarly, say a user joins the website in 2011 and creates 100 records of information, thousands of other users do the same, and then the 1 GB database becomes full. Later on, that original user adds an additional 100 records, which are created in another 1 GB database. How would my PHP code know which database to query for the 2 sets of 100 records? Would this be managed automatically in some way on the MySQL end? Would it need to be managed in the PHP code with IF/THEN/ELSE statements? Is this a service that some web hosting providers offer?
This is a very abstract question and I'm not sure the generic Stack Overflow is the right place for it.
In any case: what is the best-practice method of storing the data? How about in a file on disk? Keep in mind that a database is just a glorified file with fancy 'read' and 'write' commands.
Optimization is hard; you can only ever trade things: CPU for memory usage, read speed for write speed, bulk data storage for speed. (Or get a better hosting provider and make your databases as large as you want ;) )
To answer your second question: if you do go with your multiple-database approach, you will need to set up some system to 'migrate' users from one database to another if one gets full. If you reach 80% of 1 GB, start migrating users.
Detecting the size of a database is a tricky problem. You could, I suppose, look at the raw files on disk to see how big they are, but perhaps there are more clever ways.
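One of those 'more clever ways', if the host exposes it, is asking MySQL itself through information_schema. A minimal sketch, assuming an open PDO connection in $pdo and a placeholder database name:

    <?php
    // Sketch: ask MySQL for the on-disk size of one database and compare it
    // against the host's 1 GB cap. Requires read access to information_schema.
    $stmt = $pdo->prepare(
        'SELECT SUM(data_length + index_length) AS bytes
         FROM   information_schema.tables
         WHERE  table_schema = ?'
    );
    $stmt->execute(['app_shard_01']);            // placeholder database name
    $bytes = (int) $stmt->fetchColumn();

    $limit = 1024 * 1024 * 1024;                 // the 1 GB cap
    if ($bytes > 0.8 * $limit) {
        // Hit the 80% threshold mentioned above: start migrating users
        // (or at least new sign-ups) to the next database.
    }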
I would suggest that SQLite would be the best option in your case. It supports databases of up to 2 terabytes (2^41 bytes), and the best part is that it requires no server-side installation, so it is usable everywhere. All you need is a library to work with the SQLite database.
You can also choose your host without worrying about which databases and sizes they support.