Managing multiple MySQL 1 GB databases with data input from PHP application - php

I am creating an application that utilizes MySQL and PHP. My current web hosting provider has a MySQL database size limitation of 1 GB, but I am allowed to create many 1 GB databases. Even if I were able to find another web hosting provider that allowed larger databases, I wonder: how are data integrity and speed affected by larger databases? Is it better to keep databases small in terms of disk size? In other words, what is the best-practice method of storing the same data (all text) from thousands of users? I am new to database design and planning. Eventually, I would imagine that a single database with data from thousands of users would grow to be inefficient, and that ideally the data should be distributed among smaller databases. Do I have this correct?
On a related note, how would my application know when to create another table (or switch to another table that was manually created)? For example, if I had one database that filled up with 1 GB of data, I would want my application to continue working without any service delays. How would I redirect incoming data from a table in the full database to a table in a second, newly created database?
Similarly, suppose a user joins the website in 2011 and creates 100 records of information, thousands of other users do the same, and the 1 GB database fills up. Later on, that original user adds a further 100 records, which are created in another 1 GB database. How would my PHP code know which database to query for the two sets of 100 records? Would this be managed automatically in some way on the MySQL end? Would it need to be managed in the PHP code with IF/THEN/ELSE statements? Is this a service that some web hosting providers offer?

This is a very abstract question, and I'm not sure generic Stack Overflow is the right place to ask it.
In any case: what is the best-practice method of storing data? How about in a file on disk? Keep in mind that a database is just a glorified file with fancy 'read' and 'write' commands.
Optimization is hard; you can only ever trade things: CPU for memory usage, read speed for write speed, bulk data storage for speed. (Or get a better hosting provider and make your databases as large as you want ;) )
To answer your second question: if you do go with your multiple-database approach, you will need to set up some system to 'migrate' users from one database to another when one gets full. If you reach 80% of 1 GB, start migrating users.
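A minimal sketch of what the routing side of that could look like, assuming a small central "directory" database with a hypothetical user_shard_map table recording which database holds each user's rows (this also addresses your question about finding both sets of 100 records - if you always migrate a user's rows together, one lookup finds everything):

    <?php
    // Hypothetical sketch: a central "directory" database maps each user
    // to the shard database that holds their records.
    $directory = new PDO('mysql:host=localhost;dbname=directory', 'user', 'pass');

    function getDatabaseForUser(PDO $directory, $userId)
    {
        // Look up which database this user's data lives in.
        $stmt = $directory->prepare(
            'SELECT db_name FROM user_shard_map WHERE user_id = ?'
        );
        $stmt->execute([$userId]);
        $dbName = $stmt->fetchColumn();

        // Connect to that database (credentials assumed identical here).
        return new PDO("mysql:host=localhost;dbname=$dbName", 'user', 'pass');
    }

    // All reads and writes for this user now go to the right database,
    // and "migrating" a user means moving their rows and updating the map.
    $db = getDatabaseForUser($directory, 42);
    $rows = $db->query('SELECT * FROM records WHERE user_id = 42')->fetchAll();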
Detecting the size of a database is a tricky problem. You could, I suppose, look at the raw files on disk to see how big they are, but there are more clever ways.
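One such way: MySQL will report the sizes itself through information_schema. A minimal sketch:

    <?php
    // Ask MySQL how big each database is (data + indexes, in bytes).
    $pdo = new PDO('mysql:host=localhost', 'user', 'pass');

    $stmt = $pdo->query(
        'SELECT table_schema, SUM(data_length + index_length) AS bytes
         FROM information_schema.tables
         GROUP BY table_schema'
    );

    foreach ($stmt as $row) {
        // Flag any database at 80% of the 1 GB cap.
        if ($row['bytes'] > 0.8 * 1024 * 1024 * 1024) {
            echo $row['table_schema'] . " is nearly full - time to migrate\n";
        }
    }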

I would suggest that SQLite would be the best option in your case. It supports databases of up to 2 terabytes (2^41 bytes), and the best part is that it requires no server-side installation, so it is compatible everywhere. All you need is a library to work with the SQLite database.
You can also choose your host without looking at which databases and sizes they support.
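For example, with PHP's bundled PDO driver, an SQLite database is just a file path - no server involved (the path and schema here are illustrative):

    <?php
    // The whole 'database' is one file on disk; PDO opens or creates it.
    $db = new PDO('sqlite:/var/data/app.sqlite');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $db->exec('CREATE TABLE IF NOT EXISTS records (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        body    TEXT    NOT NULL
    )');

    $stmt = $db->prepare('INSERT INTO records (user_id, body) VALUES (?, ?)');
    $stmt->execute([42, 'some text']);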

Related

MySQL Database - Capacity planning dilemma, advice please

I have a MySQL hosting and capacity planning question. I would like to know the minimum hosting requirements to host a MySQL database of the type and size described below:
Background: I have a customer in the finance industry who has bought a bespoke software CMS platform written in PHP with a MySQL database.
Their current solution does not have any reports, and the software vendor who provided it only allows them to use some PHP pages to export the entire contents of tables which the customer then has to manually manipulate in Excel to obtain their business reporting.
The vendor will not allow them access to their live database in order to run Crystal Reports, saying that this is a risk to the database, and preferring them to purchase an expensive database replication solution; so the customer continues to perform tedious manual exports of entire tables every day.
The database: The customer has no access to the database, as it is hosted by their current vendor; a custom, nine-month-old PHP solution sits on top of it. There are 43 tables in total, of which one - a whopping big log table - uses up nearly all of the database's size. The business data itself amounts to roughly 90 MB.
The top four tables containing the business data are tiny:
34.62 MB
13.79 MB
8.46 MB
7.59 MB
The vast majority of the tables are simple look-up tables for data values and have only a few rows.
The largest table in the database, however, is a big-ass log table which is 1400MB in size. This table alone accounts for over 99.9% of the total database size.
The question: Considering that the solution is (log table notwithstanding) very small, with only a few staff members making data entry via some simple PHP forms, is there a realistic problem with running Crystal Reports against such a database in production? Bear in mind that there are times during the day - the majority of the day, in fact - when this database is simply not being used: lunchtimes, for example, and out of hours.
The vendor maintains that there is a fundamental risk to the business to query live data and that running Crystal Reports against this database could cause it to "crash the live db and the business loses operations".
The customer is keen to have a live dashboard too; which could be written with a very small SQL query to aggregate some numbers from those small tables listed above.
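For illustration, I am imagining nothing more complicated than this (the table and column names are hypothetical):

    <?php
    // Hypothetical dashboard snippet: a couple of aggregates over tiny tables.
    $pdo = new PDO('mysql:host=localhost;dbname=cms', 'reporting', 'pass');

    $totals = $pdo->query(
        'SELECT COUNT(*)    AS entries_today,
                SUM(amount) AS value_today
         FROM data_entries
         WHERE created_at >= CURDATE()'
    )->fetch(PDO::FETCH_ASSOC);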
I usually work with SQL Server and Oracle and I have absolutely no qualms about allowing a Crystal Report or running a view to populate a UI with some real time data from the live database - especially a database this small; after all what is the database for if one cannot SELECT from it now and again?
Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server", to replicate this MySQL database to a second, reporting database? In my experience, the need to do this only applies to sensitive or security-risk databases, or to databases with high transactional volumes.
System usage: The system is heavily reliant on scheduled CRON jobs every half hour. There may be 500 users per week each logging on and entering some data (but not much data - see table sizes above).
Any comments are warmly welcome.
Thanks for your time.
1) You need two $5 DigitalOcean servers.
2) "crash the live db and the business loses operations". Is absolutely false. They are idiots. What they are likely hiding is the poor structure of their database. They likely have 1 table architecture for all of their clients only separating from a client_id. Giving access to the table would give access to all of the client data which is why they force a giant replication solution so they can make sure you are only getting YOUR data.
3) Is it necessary, to avoid "hanging the server" and to "avoid querying when other operations are occurring on the server"? Yes it is.
4) To replicate this MySQL database to a second, reporting database? Yes, this is good practice, as you can set up failover in the event that the worst happens. If you are really paranoid you can set up remote failover with different companies; seeing as how this is in the financial sector, I am pretty sure you want that.
5) "In my experience, the need to do this only applies to sensitive, security-risk or databases with high transactional volumes." In my experience, it is always good to have your data backed up, because sh*t happens in life, and usually when you least expect it.
As for your real-time usage: assuming the database is structured properly, with indexes, and uses InnoDB, you should have minimal issues supporting 100 requests per second, so I think your 500-users-a-week load is nothing to worry about.
As I mentioned, what you likely want is two servers at different providers - likely the cheapest instances you can get, since you don't need a huge amount of space or resources. You can set up DNS to make one the primary and one the replication slave, then in a disaster scenario change the DNS and make the other one the master.
I hope this helps.

large blobs in mysql database are making my website slow

I have a website running under the Drupal 7 CMS with a MySQL database, and I'm facing a problem with the database: I have to store a lot of large texts as BLOB columns in 3 tables, and at the current time each of those 3 tables is about 10 GB in size.
I only run 'insert' and 'select' queries on those 3 tables.
Although my server has 16 GB of RAM, I believe the database is what makes my website so slow. What are your suggestions to solve this problem? How do large websites deal with huge-data problems?
I'm thinking of putting these 3 tables in another database, and also on another server?
The best solution will depend a lot on the nature of your site and exactly what you're looking for, so it's very difficult to give a concise answer here.
One common approach, for sites which aren't extremely latency-sensitive, is to actually store the textual/binary data in another service (e.g., Amazon's S3), and then only keep a key to that service stored in your database. Your application can then perform a database query, retrieve the key, and either send a request to the service directly (if you want to process the BLOB server-side) or instruct the client application to download the file from the service.
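A rough sketch of that pattern with the AWS SDK for PHP - the bucket, table, and column names are placeholders:

    <?php
    require 'vendor/autoload.php';

    use Aws\S3\S3Client;

    // Store the blob in S3; keep only its key in MySQL.
    $s3  = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $largeText = '...the large text being offloaded...';
    $key = 'texts/' . uniqid('', true);

    $s3->putObject([
        'Bucket' => 'my-app-blobs',   // placeholder bucket name
        'Key'    => $key,
        'Body'   => $largeText,
    ]);

    // The MySQL row stays tiny: just a pointer to the object.
    $pdo->prepare('INSERT INTO documents (s3_key) VALUES (?)')->execute([$key]);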

Best way to store chat messages and files

I would like to know what you think about storing chat messages in a database.
I need to be able to bind other stuff to them (like files, or contacts) and using a database is the best way I see for now.
The same question goes for files: because they can be bound to chat messages, I have to store them in the database too.
With thousands of messages and files I wonder about performance drops and database size.
What do you think considering I'm using PHP with MySQL/Doctrine?
I think it would be OK to store any textual information in the database (names, message history, etc.) provided that you structure your database properly. I have worked for big websites (many thousands of visits a day) and telecom companies that store information about their users (including their traffic statistics) in databases that have grown to hundreds of gigabytes, and the applications were working fine.
But regarding binary information like images and files, it would be better to store them on the file system and store only their paths in the database, because it is cheaper to read them off the disk than to tie up a database process reading a multi-megabyte file.
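A minimal sketch of that approach for an uploaded chat attachment (the directory, table, and column names are illustrative):

    <?php
    // Store the uploaded file on disk; store only its path in the database.
    $pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');

    $messageId  = 123;                 // the chat message this file belongs to
    $storageDir = '/var/app/uploads';  // kept outside the web root
    $path       = $storageDir . '/' . uniqid('', true) . '.bin';

    if (move_uploaded_file($_FILES['attachment']['tmp_name'], $path)) {
        // The row holds metadata and the path - not the file's bytes.
        $stmt = $pdo->prepare(
            'INSERT INTO attachments (message_id, original_name, path)
             VALUES (?, ?, ?)'
        );
        $stmt->execute([$messageId, $_FILES['attachment']['name'], $path]);
    }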
As I said, it is important that you do several things:
Structure your information properly - it is very important to properly design your database, dividing it into tables and tables into fields with your performance goals in mind, because this will form the basis for your application and queries. Get that wrong and your queries will be slow.
Make proper decisions on table engines pertinent to every table. This is an important step because it will greatly affect the performance of your queries. For example, MyISAM blocks read access to the table while it is being updated. That will be a problem for a web application like a social network or a news site, because in many situations your users will basically have to wait for an information update to be completed before they see a generated page.
Create proper indexes - very important for performance, especially for applications with rapidly growing big databases.
Measure the performance of your queries as data grows and look for ways to improve it - you will always find bottlenecks that have to be removed; this is an ongoing, non-stop process that every popular web application has to go through.
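For that last point, a quick and minimal way to measure in MySQL is to put EXPLAIN in front of a slow query and inspect the plan - a 'type' of ALL means a full table scan, which usually points to a missing index (the table name here is illustrative):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');

    // EXPLAIN shows how MySQL would execute the query without running it fully.
    $plan = $pdo->query(
        'EXPLAIN SELECT * FROM messages WHERE conversation_id = 42'
    )->fetchAll(PDO::FETCH_ASSOC);

    print_r($plan);   // inspect the 'type', 'key' and 'rows' columns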
I think a NoSQL database like CouchDB or MongoDB is an option. You can also store the files separately and link them via a known filename, but it depends on your system architecture.

Mysql replication - is it worth it?

I have an app that is polling data from a large number of data feeds. It processes thousands of records per day, and this number is ever increasing. The data is stored in MySQL.
I then have a website that utilises this data.
I'm trying to build my environment with the future in mind.
I thought of MySQL replication, so that the website can use its own database on a different server and not get bogged down by the thousands of write commands that are happening on the main database.
I am having difficulty getting this set up, despite MySQL reporting it's all working fine.
I then started to think - is there not a better way?
From what I understand, MySQL sends the write commands that run on the master over to the slave database.
Does this not mean that what I am trying to avoid is just happening anyway?
Does this mean that the slave database will suffer thousands of writes?
I am a one-man band, doing this venture with my own money, so I need to do this the cheapest way possible. I am getting a bit lost!
I have a dedicated server and a VPS, using PHP 5 and MySQL 5 in a LAMP stack.
I cannot begin to tell you how much I would appreciate some guidance!
If the slaves are a 1:1 clone of the master, then all writes to the master MUST be propagated down to the slaves. Otherwise replication would be useless.
Thousands of records per day is actually very small. Assuming the same processing time for each, and doing 5000 records a day, you'd have 86400/5000 = 17.28 seconds available per record. That's very minimal write overhead.
If you were doing millions of records a day, THEN you'd have a write bottleneck.
I would split this into three layers.
Data Feed layer. Data read from the feeds is preprocessed and posted into a queue. This layer has a temporary queue that also serves as temporary storage, a buffer that allows every data feed to post its data. I'd use a message queue system: it's fast and reliable.
Data Store layer. This layer reads from the queue, perhaps processes the data in some way, and stores it in the database.
Data Analysis layer. This is your "slave" database. It's a data warehouse. It periodically does ETL (extract, transform and load) data from the Data Store layer to this secondary database.
This layered approach allows you to isolate concerns (speed, reliability, security) and implementation details, and allows for future scalability.
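As a sketch of the first two layers - using a Redis list as the queue here, which is purely an assumption; any message queue system would do:

    <?php
    // Data Feed layer: preprocess a record and post it to the queue.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $record = ['feed_id' => 7, 'value' => 21.5];   // example feed record
    $redis->lPush('feed_queue', json_encode($record));

    // Data Store layer (a separate worker process): drain the queue into MySQL.
    $pdo    = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass');
    $insert = $pdo->prepare('INSERT INTO readings (feed_id, payload) VALUES (?, ?)');

    while (true) {
        // Block for up to 5 seconds waiting for the next queued record.
        $item = $redis->brPop(['feed_queue'], 5);
        if ($item) {
            $data = json_decode($item[1], true);
            $insert->execute([$data['feed_id'], $item[1]]);
        }
    }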
Replication is literally what the word suggests - replicating queries on another machine.
MySQL creates a log filled with the queries that were used to create the dataset on the original machine (the master) and sends it to the slave(s), which read the log and re-execute those queries.
Basically, what you want is to increase your write throughput. That's achievable through using different engines; for example, TokuDB is one of them (it isn't free, but you are allowed to store 50 GB of user data for free and use it).
What you want (for the moment) is a fast HDD subsystem more than a monolithic write-scalable storage system. InnoDB is capable of achieving a lot of queries per second on a properly configured machine with sufficient hardware. I am not sure about pricing, but an SSD and 4-8 gigs of RAM shouldn't be that expensive. As Marc B said - until you reach millions of records per day, you don't have to worry about scaling reads and writes through replication.
You say you have an app "polling" your data from data feeds. Does that mean you are doing full-text searches? I'm making an assumption here that you are batch-processing data feeds and then querying them. If that is the case, I'd offload all your full-text queries to something like Solr. It actually isn't too time-consuming to set up; depending on the size of your DB you can get away with running it on a fairly small VPS or on your dedicated server, and best of all, the difference in search speed is incredible. I've had full-text MySQL queries that would take 20 minutes to run be done in Solr in under a second.
Just make sure you wrap your Solr calls in a try/catch in the event your Solr instance goes down.
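Something along these lines - the Solr URL/core and the fallback table are assumptions (and the fallback needs a FULLTEXT index on the body column to work):

    <?php
    // Query Solr first; if it's down, fall back to a slower MySQL search.
    $q   = urlencode('search terms');
    $url = "http://localhost:8983/solr/feeds/select?q=$q&wt=json"; // assumed core

    try {
        $response = file_get_contents($url);
        if ($response === false) {
            throw new RuntimeException('Solr unreachable');
        }
        $json    = json_decode($response, true);
        $results = $json['response']['docs'];
    } catch (Exception $e) {
        // Degraded-but-working fallback straight from MySQL.
        $pdo  = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass');
        $stmt = $pdo->prepare(
            'SELECT * FROM records WHERE MATCH(body) AGAINST (?) LIMIT 20'
        );
        $stmt->execute(['search terms']);
        $results = $stmt->fetchAll(PDO::FETCH_ASSOC);
    }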

Why would you use two (or more) databases instead of one?

Many database libraries come set up for multiple database connections - but I've never actually known of a scripting application that needed to connect to two databases during its run (compiled, daemon-running languages are a different matter).
I understand having database slaves so that you can spread the load out - but usually on startup only one of them is chosen to handle that script's needs.
So why would a PHP or Ruby application need to connect to more than one database? Or rather, why would you split your data up among several databases?
The only thing I can think of is bad design from a slowly evolving system that started off in multiple separate parts.
Are you talking about different physical database servers or different databases in the "schema" sense?
Regarding physical servers: if you're using MySQL replication you might write to the master and always read from a slave. This helps split the load between the databases.
The simple answer is "scalability".
The ready availability of replication and clustering in a number of database products makes multiple database use a definite 'this must be possible'. Any decent ORM should know how to connect to multiple databases as required.
But even when the main application doesn't connect to more than one, there will often be other needs that do. Report generation, either scripted or ad-hoc, often involve queries that run for a long time. These are best run on database replicants dedicated (and configured) for these queries so they don't disrupt the main application.
Another good use is a type of scripted processing. Many apps will have a regular process that needs to rummage through a large part of the database. Whilst updates obviously have to go to the master, the big read queries can be run off a replicant.
Of course, the obvious need is simple performance. I oversaw a webapp and database that grew from surviving comfortably on one MySQL database on a 32-bit dual-core machine with 3 GB to needing two 8-core 64-bit servers with 8 GB. Once it reached this stage, it relied on the database handler directing traffic to both servers. We had a window of about 50 minutes in a day where it could survive on just one database.
I have a Ruby application that connects to multiple databases. One database contains user login credentials (which is shared between several other projects). Another database contains archived data that my application tracks and compares (that only my application accesses). Another database contains data regarding physical machine resources which my application uses to generate new data (these resources are used by several different applications). By splitting the data into multiple databases, different applications only access the data that they need to be accessing.
It is all too frequently the case that some of the data you need is stored in The Wrong Database. Sometimes it's personnel records in a PeopleSoft (Oracle) database. Maybe it's Enterprise CRM data on Informix. Or some departmental database stored in MS SQL Server. Whatever it is, it's in a different database, but you still need access (hopefully read-only).
Unless your primary database is magic-based, it isn't going to be able to provide you with remote table access for every other database out there. (Most will only provide remote access to other databases of the same type, eg: MySQL->MySQL.) When that all too frequent situation occurs, you'll have no other option but to have multiple database connections, and be glad that your framework supports it.
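In PHP this is mostly a matter of a second DSN, assuming the relevant PDO driver is installed - a sketch with a read-only connection to a departmental SQL Server alongside the primary MySQL database (names are placeholders):

    <?php
    // Two connections, two vendors - PDO only changes the DSN.
    $mysql = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $hr    = new PDO('sqlsrv:Server=hr-server;Database=hr', 'readonly', 'pass');

    // Read-only lookup from the "wrong" database...
    $stmt = $hr->prepare('SELECT full_name FROM employees WHERE emp_id = ?');
    $stmt->execute([1001]);
    $name = $stmt->fetchColumn();

    // ...used alongside the primary application database.
    $mysql->prepare('INSERT INTO audit_log (actor) VALUES (?)')->execute([$name]);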
I have a site that connects with two databases. One powers the website content (CMS DB) the other drives a web application that runs within the site (large amounts of non-CMS data) In fact, the latter uses replication.
I don't feel that's bad design. If one set of data has no relation to the other, then it makes sense even from a pure organization perspective to house it in a separate DB. Otherwise, people would just put all their tables in one DB.
For added security, I always create two accounts for every database: a read-only account (good for SELECT) and a read-write account (for SELECT, UPDATE, INSERT, DELETE and whatever else I might need). On some pages, I may need to use both accounts, thus I will consume two connections for only one database.
Well, reading from one and writing to another is a very common use case. It's easy and fun to write a data access layer that reads from one connection (reading from the slave), and writes to another (the master). A single script might make multiple reads before writing -- perhaps some lookups are necessary for validation, for instance.
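A bare-bones sketch of such a layer (the hostnames are placeholders):

    <?php
    // Minimal read/write-splitting layer: SELECTs go to the slave,
    // everything else goes to the master.
    class Db
    {
        private $master;
        private $slave;

        public function __construct()
        {
            $this->master = new PDO('mysql:host=master-db;dbname=app', 'user', 'pass');
            $this->slave  = new PDO('mysql:host=slave-db;dbname=app', 'user', 'pass');
        }

        public function read($sql, array $params = [])
        {
            $stmt = $this->slave->prepare($sql);
            $stmt->execute($params);
            return $stmt->fetchAll(PDO::FETCH_ASSOC);
        }

        public function write($sql, array $params = [])
        {
            return $this->master->prepare($sql)->execute($params);
        }
    }

    $db   = new Db();
    $user = $db->read('SELECT * FROM users WHERE id = ?', [42]);
    $db->write('UPDATE users SET last_seen = NOW() WHERE id = ?', [42]);

One caveat: replication lag means a read issued immediately after a write may not see the new row yet, so reads-after-writes sometimes need to go to the master as well.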
Scripting languages are also frequently used for integration. You might have two off-the-shelf codebases, both of which want to maintain their own database. Your integration code might want to talk to both of them.
You can usually design your way out of using more than one connection, but in general I don't see anything fundamentally wrong with using connections to more than one database.
Other reasons to have multiple databases: We have one application that everyone can access. We also have client databases that are very different from client to client. It is easier to maintain the application that all clients use (and which is maintained by a different team) if the client-specific data is separated out into its own databases. It is also easier to move a client to a new server when they become a large enterprise client, rather than staying on a server shared with many other smaller clients.
Further, there are types of data that are transactional and need to be in databases set to full recovery mode with full transaction logging. Other data is only populated from imports and does not need transaction logging, which might slow the system down as the log grows enough to handle a 10,000,000-record import. These are often split out into a separate database so they can be in simple recovery mode, as it is not necessary to recover that data from the transaction log if there is a problem; it can easily be recovered by re-running the import.
Then data is split out into data warehouses which are optimized for reporting, not transactions. Again, these reporting databases are usually separate databases (often on separate servers).
Then you have the databases for multiple different COTS applications (we have accounting databases, credit card transaction processing databases, HR databases, and our project management database). A particular website might need to access more than one of these or transfer information from one to the other. Believe me, vendors won't let you copy their database structure into one database to rule them all.
We have several hundred databases here on many different servers.
