I will be using MySQL with PHP as the server-side/database model for my mobile application on both iOS and Android.
There will be loads of syncing with the client side, at a rate we assume could reach 100,000 requests to the server per second, where each one is trying to read from or write to the database.
My worry is: CAN MySQL handle this? Does it have an internal, automatic mechanism that locks the table and prevents others from writing at the exact moment when something else is already writing to the DB, or should I take care of that myself?
I think you need at least a 64-core server (or a few servers) to handle 100k transactions per second with MySQL.
And about transactions: I think you have to learn a lot about databases, table locks and transactions before you start writing your very popular application :)
Related
We are developing an iOS/Android application which downloads large amounts of data from a server.
We're using JSON to transfer data between the server and client devices.
Recently the size of our data increased a lot (to about 30,000 records).
When fetching this data, the server request gets timed out and no data gets fetched.
Can anyone suggest the best method to achieve a fast transfer of data?
Is there any method to prepare data initially and download data later?
Is there any advantage to using multiple databases on the device (SQLite DBs) and performing parallel insertion into them?
Currently we are downloading/uploading only changed data (using UUID and time-stamp).
Is there any best approach to achieve this efficiently?
---- Edit -----
I think it's not only a problem of the number of MySQL records; at peak times multiple devices connect to the server to access data, so connections also end up waiting. We are using a high-performance server. I am mainly looking for a solution to handle this sync on the device. Is there any good method to simplify the sync or make it faster using multi-threading, multiple SQLite DBs, etc.? Or data compression, using views, or ...?
A good way to achieve this would probably be to download no data at all.
I guess you won't be showing these 30k rows to your client, so why download them in the first place?
It would probably be better to create an API on your server which would help the mobile devices communicate with the database, so the clients would only download the data they actually need/want.
Then, with a cache system on the mobile side, you could make sure that clients don't download the same thing every time and that content they have already seen is available offline.
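For example, such an endpoint could return only the rows changed since the client's last sync, building on the UUID/timestamp approach already mentioned. This is just a rough sketch; the table and column names (records, uuid, payload, updated_at), the page size and the timestamp handling are all assumptions.

```php
<?php
// Minimal delta-sync endpoint sketch: the client sends the timestamp of its last
// successful sync and only receives rows changed since then.
// Table and column names are assumptions, not your actual schema.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$since = isset($_GET['since']) ? $_GET['since'] : '1970-01-01 00:00:00';

$stmt = $pdo->prepare(
    'SELECT uuid, payload, updated_at
       FROM records
      WHERE updated_at > :since
      ORDER BY updated_at
      LIMIT 500'                      // small pages keep each response well under the timeout
);
$stmt->execute(['since' => $since]);

header('Content-Type: application/json');
echo json_encode([
    'rows'      => $stmt->fetchAll(PDO::FETCH_ASSOC),
    // the client stores this and sends it back as ?since=... on the next request
    'synced_at' => date('Y-m-d H:i:s'),
]);
```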
When fetching this data, the server request gets timed out and no data gets fetched.
Are you talking only about reads or writes, too?
If you are talking about write access as well: are the 30,000 records the result of a single insert/update? Are you using a transactional engine such as InnoDB? If so, are your queries wrapped in a single transaction? Having autocommit mode enabled can lead to massive performance issues:
Wrap several modifications into a single transaction to reduce the number of flush operations. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second (for a 10,000RPM disk), which constrains the number of commits to the same 167th of a second if the disk does not “fool” the operating system.
Source
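For illustration, wrapping the whole batch in one transaction with PDO could look like the sketch below; the table, columns and the JSON payload are assumptions.

```php
<?php
// Hedged sketch: wrap many inserts in one transaction so InnoDB flushes the log
// once per batch instead of once per row. Table/column names are made up.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// The client posts the batch as JSON (as in the question); decode it first
$rows = json_decode(file_get_contents('php://input'), true);

$stmt = $pdo->prepare('INSERT INTO records (uuid, payload) VALUES (?, ?)');

$pdo->beginTransaction();           // implicitly disables autocommit for this batch
try {
    foreach ($rows as $row) {
        $stmt->execute([$row['uuid'], $row['payload']]);
    }
    $pdo->commit();                 // a single flush for the whole batch
} catch (Exception $e) {
    $pdo->rollBack();               // nothing half-written on failure
    throw $e;
}
```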
Can anyone suggest the best method to achieve a fast transfer of data?
How complex is your query? Inner or outer joins, correlated or non-correlated subqueries, etc.? Use EXPLAIN to inspect its efficiency. Read about EXPLAIN.
Also, take a look at your table design: Have you made use of normalization? Are you indexing properly?
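As a quick illustration of checking a plan with EXPLAIN (the query below is only a placeholder, not your schema):

```php
<?php
// Prefix the slow query with EXPLAIN and look at the "type", "key" and "rows"
// columns. The query and table names here are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$plan = $pdo->query(
    'EXPLAIN SELECT r.* FROM records r JOIN devices d ON d.id = r.device_id WHERE d.uuid = "abc"'
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($plan as $row) {
    // type=ALL or key=NULL on a large table usually means a missing index
    printf("table=%s type=%s key=%s rows=%s\n",
        $row['table'], $row['type'], $row['key'], $row['rows']);
}
```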
Is there any method to prepare data initially and download data later?
What do you mean by that? Maybe temporary tables could do the trick.
But without knowing any details of your project, downloading 30,000 records to a mobile device at one time sounds weird to me. Probably your application/DB design needs to be reviewed.
Anyway, for any data that does not need to be updated/inserted directly into the database, use a local SQLite database on the mobile device. This is much faster, as SQLite is a file-based DB and the data doesn't need to be transferred over the net.
Replication
I have an app that is polling data from a large number of data feeds. It processes thousands of records per day, and this number is ever increasing. The data is stored in MySQL.
I then have a website that utilises this data.
I'm trying to build my environment with future in mind.
I thought of MySQL replication, so that the website can use its own database on a different server and not get bogged down by the thousands of write commands that are happening on the main database.
I am having difficulty getting this set up, despite MySQL reporting that it's all working fine.
I then started to think: is there not a better way?
From what I understand, MySQL sends the write commands from the master to the slave database.
Does this not mean that what I am trying to avoid is just happening anyway?
Does this mean that the slave database will suffer thousands of writes too?
I am a one-man band, doing this venture with my own money, so I need to do this the cheapest way. I am getting a bit lost!
I have a dedicated server and a VPS, using PHP 5 and MySQL 5 in a LAMP stack.
I cannot begin to tell you how much I would appreciate some guidance!
If the slaves are a 1:1 clone of the master, then all writes to the master MUST be propagated down to the slaves. Otherwise replication would be useless.
Thousands of records per day is actually very small. Assuming the same processing time for each, and doing 5000 records, you'd have 86400/5000 = 17.28 seconds per record. That's very minimal write overhead.
If you were doing millions of records a day, THEN you'd have a write bottleneck.
I would split this in three layers.
Data Feed layer. Data read from the feeds is preprocessed and posted into a queue. This layer has a temporary queue that also serves as temporary storage, a buffer that allows every data feed to post its data. I'd use a message queue system; it's fast and reliable.
Data Store layer. This layer reads from the queue, possibly processes the data it reads in some way, and stores it in the database.
Data Analysis layer. This is your "slave" database. It's a data warehouse. It periodically does ETL (extract, transform and load) runs that move data from the Data Store layer into this secondary database.
This layered approach allows you to isolate concerns (speed, reliability, security) and implementation details, and allows for future scalability.
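As a rough sketch of the Data Store layer, assuming beanstalkd as the message queue and the pda/pheanstalk client library (any queue system would do); the tube name, table and payload layout are invented for illustration:

```php
<?php
// Data Store layer sketch: pull feed items off a beanstalkd tube and write them
// to MySQL. Assumes the pda/pheanstalk client; tube, table and fields are invented.
use Pheanstalk\Pheanstalk;

require 'vendor/autoload.php';

$queue = Pheanstalk::create('127.0.0.1');     // in older Pheanstalk versions: new Pheanstalk('127.0.0.1')
$pdo   = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$insert = $pdo->prepare('INSERT INTO feed_items (feed_id, fetched_at, body) VALUES (?, ?, ?)');

$queue->watch('feed-items');

while (true) {
    $job  = $queue->reserve();                // blocks until a job is available
    $item = json_decode($job->getData(), true);

    $insert->execute([$item['feed_id'], $item['fetched_at'], $item['body']]);
    $queue->delete($job);                     // acknowledge only after the row is stored
}
```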
Replication is literally what the word suggests: replicating queries on another machine.
MySQL creates a log that's filled with queries that were used to create the dataset on the original machine (master) and sends it to the slave(s) that read the log and re-execute those queries.
Basically, what you want is to increase your write throughput. That's achievable through using different engines; for example, TokuDB is one of them (it isn't free, but you are allowed to store 50 GB of user data for free and use it).
What you want (for the moment) is a fast disk subsystem more than a monolithic write-scalable storage system. InnoDB is capable of achieving a lot of queries per second on a properly configured machine with sufficient hardware. I am not sure about pricing, but an SSD and 4-8 GB of RAM shouldn't be that expensive. As Marc B said, until you reach millions of records per day, you don't have to worry about scaling reads and writes through replication.
You say you have an app "polling" your data from data feeds. Does that mean you are doing full-text searches? I'm assuming here that you are batch-processing data feeds and then querying them. If that is the case, I'd offload all your full-text queries to something like Solr. It isn't too time-consuming to set up; depending on the size of your DB you can get away with running it on a fairly small VPS or on your dedicated server, and best of all, the difference in search speed is incredible. I've had full-text MySQL queries that would take 20 minutes to run be done in Solr in under a second.
Just make sure you use a try statement in case your Solr instance goes down.
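Something along these lines, for example; the Solr core name, field names and the MySQL FULLTEXT fallback are all assumptions:

```php
<?php
// Sketch of "try Solr, fall back to MySQL". Core name, fields and the fallback
// query are assumptions; the fallback requires a FULLTEXT index on `body`.
function searchItems(PDO $pdo, $term)
{
    $url = 'http://localhost:8983/solr/items/select?wt=json&q=' . urlencode('body:' . $term);

    try {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 2);          // fail fast if Solr is down
        $raw = curl_exec($ch);
        curl_close($ch);

        if ($raw === false) {
            throw new RuntimeException('Solr unreachable');
        }
        $response = json_decode($raw, true);
        if (!isset($response['response']['docs'])) {
            throw new RuntimeException('Unexpected Solr response');
        }
        return $response['response']['docs'];
    } catch (Exception $e) {
        // Degraded path: slower MySQL full-text search
        $stmt = $pdo->prepare('SELECT * FROM items WHERE MATCH(body) AGAINST(?) LIMIT 100');
        $stmt->execute([$term]);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}
```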
Well, this is the thing. Let's say that my future PHP CMS needs to serve 500k visitors daily, and I need to record them all in a MySQL database (referrer, IP address, time, etc.). That means I need to insert 300-500 rows per minute and update 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.
My question: is there any way to cache incoming hits locally first (and what is the best solution for that: APC, CSV...?) and periodically send them to the database, every 10 minutes for example? Is this a good solution, and what is the best practice for this situation?
500k daily is just 5-7 queries per second. If each request is served in 0.2 sec, then you will have almost 0 simultaneous queries, so there is nothing to worry about.
Even if you have 5 times more users, everything should still work fine.
You can just use INSERT DELAYED and tune your MySQL.
About tuning: http://www.day32.com/MySQL/ - there is a very useful script there (it changes nothing, it just shows you tips on how to optimize your settings).
You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)
Do not use files for this. The DB will handle locks much better than PHP. It's not so trivial to write effective mutexes, so let the DB (or memcache, APC) do this work.
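For illustration, a hit logged with INSERT DELAYED might look like the sketch below; the table and columns are assumptions, and note that DELAYED only works for MyISAM/MEMORY tables and was removed in later MySQL versions.

```php
<?php
// Logging a hit with INSERT DELAYED: the call returns as soon as the row is
// queued, and MySQL writes it when the table is free. Table/columns are made up.
$mysqli = new mysqli('localhost', 'user', 'pass', 'stats');

$ip  = $mysqli->real_escape_string($_SERVER['REMOTE_ADDR']);
$ref = $mysqli->real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');

$mysqli->query("INSERT DELAYED INTO hits (ip, referrer, hit_time) VALUES ('$ip', '$ref', NOW())");
```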
A frequently used solution:
You could implement a counter in memcached which you increment on each visit, and push an update to the database every 100 (or 1000) hits.
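A minimal sketch of that idea, assuming the Memcached extension; the key name and the threshold of 100 are arbitrary:

```php
<?php
// Count hits in memcached and flush to MySQL every 100 hits.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

// increment() fails if the key does not exist yet, hence the add() fallback
$hits = $memcached->increment('page_hits');
if ($hits === false) {
    $memcached->add('page_hits', 1);
    $hits = 1;
}

if ($hits >= 100) {
    $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $pdo->prepare('UPDATE counters SET hits = hits + ? WHERE page = ?')
        ->execute([$hits, 'home']);
    $memcached->set('page_hits', 0);   // reset; a small race here is acceptable for stats
}
```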
We do this by storing locally to CSV on each server, then having a cron job run every minute to push the entries into the database. This is to avoid needing a highly available MySQL database more than anything; the database should be able to cope with that volume of inserts without a problem.
Save them to a directory-based database (or flat file, it depends) somewhere, and at a certain time use PHP code to insert/update them into your MySQL database. Your PHP code can be executed periodically using cron, so check whether your server has cron so that you can set the schedule for that, say every 10 minutes.
Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/. Some code has already been written and is ready for you to use :)
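For example, the cron-driven import script could look roughly like this; the spool path and table layout are assumptions, and LOAD DATA LOCAL must be enabled on your server:

```php
<?php
// Cron-driven importer sketch (e.g. */10 * * * * php import_hits.php):
// rename the spool file so web requests keep appending to a fresh one,
// then bulk-load the renamed batch. Paths and table layout are assumptions.
$spool = '/var/spool/hits/hits.csv';
$batch = $spool . '.' . time();

if (!file_exists($spool)) {
    exit;                               // nothing to import this run
}
rename($spool, $batch);                 // atomic on the same filesystem

$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass', [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // required for LOAD DATA LOCAL
]);

// One bulk load is far cheaper than thousands of individual INSERTs
$pdo->exec("LOAD DATA LOCAL INFILE " . $pdo->quote($batch) . "
            INTO TABLE hits
            FIELDS TERMINATED BY ','
            (ip, referrer, hit_time)");

unlink($batch);
```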
One way would be to use the Apache access.log. You can get quite fine-grained logging by using the cronolog utility with Apache. cronolog will handle the storage of a very big number of rows in files, and can rotate them based on volume, day, year, etc. Using this utility will prevent your Apache from suffering from log writes.
Then, as others have said, use a cron-based job to analyse these logs and push whatever summarized or raw data you want into MySQL.
You might think of using a dedicated database (or even a dedicated database server) for write-intensive jobs, with specific settings. For example, you may not need InnoDB storage and could keep to simple MyISAM. And you could even think of another database storage engine (as mentioned by @Riccardo Galli).
If you absolutely HAVE to log directly to MySQL, consider using two databases. One optimized for quick inserts, which means no keys other than possibly an auto_increment primary key. And another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only to the read-only database on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
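A sketch of what that timed copy job might look like, assuming both databases live on the same MySQL server; the database, table and bookmark names are made up:

```php
<?php
// Move new rows from the insert-optimized database to the indexed reporting
// database. Names are assumptions; the bookmark table holds a single row.
$pdo = new PDO('mysql:host=localhost', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Remember how far the last run got
$lastId = (int) $pdo->query('SELECT last_copied_id FROM stats_read.copy_bookmark')->fetchColumn();

// Server-side copy: no data is pulled into PHP at all
$pdo->exec(
    "INSERT INTO stats_read.hits (id, ip, referrer, hit_time)
     SELECT id, ip, referrer, hit_time
       FROM stats_write.hits
      WHERE id > $lastId"
);

$pdo->exec("UPDATE stats_read.copy_bookmark
               SET last_copied_id = (SELECT IFNULL(MAX(id), $lastId) FROM stats_write.hits)");
```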
I have also previously seen a system which records the data into a flat file on the local disc on each web server (be careful to do only atomic appends if using multiple processes), and periodically writes it into the database asynchronously using a daemon process or cron job.
This appears to be the prevailing optimum solution; your web app remains available if the audit database is down, and users don't suffer poor performance if the database is slow for any reason.
The only thing I can say is: be sure that you have monitoring on these locally generated files. A build-up definitely indicates a problem, and your ops engineers might not otherwise notice it.
For a high number of write operations and this kind of data, you might find MongoDB or CouchDB more suitable.
Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.
We use MySQL Proxy to defer the execution of queries matching a certain signature.
This will require a custom Lua script; example scripts are here, and some tutorials are here.
The script will implement a Queue data structure for storage of query strings, and pattern matching to determine what queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the query queue is emptied as each query is sent to the server.
You can use a queue strategy using beanstalk or IronQ.
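For illustration, the producer side with beanstalkd could be as simple as the sketch below (assuming the pda/pheanstalk client; the tube name and payload are arbitrary), with a separate worker process reserving the jobs and performing the actual INSERTs in batches:

```php
<?php
// Producer-side sketch: instead of hitting MySQL on every page view, push the
// insert as a job onto beanstalkd and let a worker drain it later.
use Pheanstalk\Pheanstalk;

require 'vendor/autoload.php';

$queue = Pheanstalk::create('127.0.0.1');   // older Pheanstalk versions: new Pheanstalk('127.0.0.1')

$queue->useTube('hit-log');
$queue->put(json_encode([
    'ip'       => $_SERVER['REMOTE_ADDR'],
    'referrer' => isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
    'time'     => date('Y-m-d H:i:s'),
]));
```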
I'm writing a program that runs (24/7) on a Linux server and adds entries to a MySQL database.
The contents of the database are presented on a web interface with PHP and the user should be able to delete entries using the web interface.
Is it possible to access the database from multiple processes at the same time?
Yes, databases are designed for this purpose quite well. You'll want to keep a few things in mind in your designs:
Concurrency and race conditions on database writes.
Performance.
Separate database permissions for separate applications.
Unless you're doing something like accessing the DB through a singleton, the maximum number of simultaneous MySQL connections PHP will use is limited in your php.ini. I believe it defaults to 100.
Yes, multiple users can access the database at the same time.
You should however take care that the data stays consistent.
If you create/edit an entry with many small SQL statements and in the meantime someone uses the web interface, this may lead to some errors.
If you have a simple DB this should not be a problem; otherwise you should consider using transactions.
http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-transactions.html
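For example, a rough sketch of grouping the small statements that make up one logical entry into a single transaction, so the web interface never sees (or deletes) a half-written entry; the table names are assumptions:

```php
<?php
// Group the statements for one logical entry into a transaction: other
// connections see the entry only when it is complete. Table names are made up.
$pdo = new PDO('mysql:host=localhost;dbname=collector', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$pdo->beginTransaction();
try {
    $pdo->prepare('INSERT INTO entries (title, created_at) VALUES (?, NOW())')
        ->execute(['nightly run']);
    $entryId = $pdo->lastInsertId();

    $pdo->prepare('INSERT INTO entry_details (entry_id, detail) VALUES (?, ?)')
        ->execute([$entryId, 'some detail']);

    $pdo->commit();      // the web interface sees the entry only now, fully formed
} catch (Exception $e) {
    $pdo->rollBack();    // no orphaned half-entry left behind
    throw $e;
}
```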
Yes, and there will not be any problems when trying to delete records in the presence of that automated program which runs 24/7, provided you are using the InnoDB engine. This is because transactions are isolated from one another, so the database is left in a consistent state every time.
This answer, "How to implement the ACID model for a database", has many relevant points.
Read about the ACID properties of a database. A MySQL database with the InnoDB engine will take care of all these things for you, and you need not worry about them.
Many database libraries come set up for multiple database connections, but I've never actually known of a scripting application that needed to connect to two databases during its run. (Compiled, daemon-running languages are a different matter.)
I understand having database slaves so that you can spread the load out, but usually on startup only one of them is chosen to handle that script's needs.
So why would a PHP or Ruby application need to connect to more than one database? Or rather, why would you split your data up among several databases?
The only thing I can think of is bad design from a slowly evolving system that started off in multiple separate parts.
Are you talking about different physical database servers or different databases in the "schema" sense?
Regarding physical servers: if you're using MySQL replication you might write to a master and always read from a slave. This helps split the load between the databases.
The simple answer is "scalability".
The ready availability of replication and clustering in a number of database products makes multiple database use a definite 'this must be possible'. Any decent ORM should know how to connect to multiple databases as required.
But even when the main application doesn't connect to more than one, there will often be other needs that do. Report generation, either scripted or ad-hoc, often involves queries that run for a long time. These are best run on database replicas dedicated (and configured) for these queries so they don't disrupt the main application.
Another good use is a type of scripted processing. Many apps will have a regular process that needs to rummage through a large part of the database. Whilst updates obviously have to go to the master, the big read queries can be run off a replica.
Of course, the obvious need is simple performance. I oversaw a webapp and database that grew from surviving comfortably on one MySQL database on a 32-bit dual-core machine with 3 GB to needing two 8-core 64-bit servers with 8 GB. Once it reached this stage, it relied on the database handler directing traffic to both servers. We had a window of about 50 minutes in a day where it could survive on just one database.
I have a Ruby application that connects to multiple databases. One database contains user login credentials (which is shared between several other projects). Another database contains archived data that my application tracks and compares (that only my application accesses). Another database contains data regarding physical machine resources which my application uses to generate new data (these resources are used by several different applications). By splitting the data into multiple databases, different applications only access the data that they need to be accessing.
It is all too frequently the case that some of the data you need is stored in The Wrong Database. Sometimes it's personnel records in a PeopleSoft (Oracle) database. Maybe it's Enterprise CRM data on Informix. Or some departmental database stored in MS SQL Server. Whatever it is, it's in a different database, but you still need access (hopefully read-only).
Unless your primary database is magic-based, it isn't going to be able to provide you with remote table access for every other database out there. (Most will only provide remote access to other databases of the same type, e.g. MySQL->MySQL.) When that all-too-frequent situation occurs, you'll have no other option but to have multiple database connections, and be glad that your framework supports it.
I have a site that connects to two databases. One powers the website content (the CMS DB); the other drives a web application that runs within the site (large amounts of non-CMS data). In fact, the latter uses replication.
I don't feel that's bad design. If one set of data has no relation to the other, then it makes sense even from a pure organization perspective to house it in a separate DB. Otherwise, people would just put all their tables in one DB.
For added security, I always create two accounts for every database: a read-only account (good for SELECT) and a read-write account (for SELECT, UPDATE, INSERT, DELETE and whatever else I might need). On some pages, I may need to use both accounts, thus I will consume two connections for only one database.
Well, reading from one and writing to another is a very common use case. It's easy and fun to write a data access layer that reads from one connection (reading from the slave), and writes to another (the master). A single script might make multiple reads before writing -- perhaps some lookups are necessary for validation, for instance.
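A tiny sketch of such a layer, with hostnames and credentials as placeholders:

```php
<?php
// Minimal read/write split: SELECTs go to the slave, everything else to the master.
// Hostnames, credentials and the example queries are placeholders.
class Db
{
    private $master;
    private $slave;

    public function __construct()
    {
        $opts = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION];
        $this->master = new PDO('mysql:host=db-master;dbname=app', 'user', 'pass', $opts);
        $this->slave  = new PDO('mysql:host=db-slave;dbname=app',  'user', 'pass', $opts);
    }

    public function read($sql, array $params = [])
    {
        $stmt = $this->slave->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    public function write($sql, array $params = [])
    {
        $stmt = $this->master->prepare($sql);
        $stmt->execute($params);
        return $stmt->rowCount();
    }
}

// Validation lookups hit the slave; the final write goes to the master.
$db    = new Db();
$users = $db->read('SELECT id FROM users WHERE email = ?', ['a@example.com']);
$db->write('INSERT INTO logins (user_id, at) VALUES (?, NOW())', [$users[0]['id']]);
```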
Scripting languages are also frequently used for integration. You might have two off-the-shelf codebases, both of which want to maintain their own database. Your integration code might want to talk to both of them.
You can usually design your way out of using more than one connection, but in general I don't see anything fundamentally wrong with using connections to more than one database.
There are other reasons to have multiple databases. We have one application that everyone can access. We also have client databases that are very different from client to client. It is easier to maintain the application that all clients use (and which is maintained by a different team) if the client-specific data is separated out into its own databases. It is also easier to move a client to a new server when they become a large enterprise client, rather than one of the smaller clients who run on a server with many other clients.
Further, there are types of data that are transactional and need to be in databases that are set to full recovery mode with full transaction logging. Other data is only populated from imports and does not need transaction logging, which might slow down the system as the log grows large enough to handle a 10,000,000-record import. These are often split out into a separate database so they can be in simple recovery mode, as it is not necessary to recover data from the transaction log if there is a problem; it can easily be recovered by re-running the import.
Then data is split out into data warehouses which are optimized for reporting, not transactions. Again, these reporting databases are usually separate databases (often on separate servers).
Then you have the databases for multiple different COTS applications (we have accounting databases, credit card transaction processing databases, HR databases, and our project management database). A particular website might need to access more than one of these or transfer information from one to the other. Believe me, vendors won't let you copy their database structure into one database to rule them all.
We have several hundred databases here on many different servers.