We are developing an iOS/Android application which downloads large amounts of data from a server.
We're using JSON to transfer data between the server and client devices.
Recently the size of our data has increased significantly (to about 30,000 records).
When fetching this data, the server request times out and no data is fetched.
Can anyone suggest the best method to achieve a fast transfer of data?
Is there any method to prepare the data initially and download it later?
Is there any advantage to using multiple SQLite databases on the device and performing parallel inserts into them?
Currently we download/upload only changed data (identified by UUID and timestamp; sketched below).
Is there a best approach to do this efficiently?
---- Edit -----
I think it's not only a problem of the MySQL records; at peak times multiple devices connect to the server to access data, so connections also end up waiting. We are using a high-performance server. I am mainly looking for a solution to handle this sync on the device. Is there a good method to simplify the sync or make it faster, e.g. multithreading, multiple SQLite databases, data compression, or views?
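For reference, the delta approach described above (download only records changed since the last sync, identified by UUID and timestamp) might look roughly like this on the server side. This is a minimal PHP/PDO sketch; the `records` table, its columns, and the page size are assumptions, not your actual schema:

```php
<?php
// Minimal sketch of a delta-sync endpoint (table/column names are assumptions).
// The client sends the timestamp of its last successful sync and receives only
// rows that changed since then, instead of all 30,000 records.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$since = $_GET['since'] ?? '1970-01-01 00:00:00';

$stmt = $pdo->prepare(
    'SELECT uuid, payload, updated_at
       FROM records
      WHERE updated_at > :since
      ORDER BY updated_at
      LIMIT 1000'                        // cap one response so it stays small
);
$stmt->execute([':since' => $since]);

header('Content-Type: application/json');
echo json_encode([
    'rows'   => $stmt->fetchAll(PDO::FETCH_ASSOC),
    // the client stores this and sends it back as ?since=... on the next sync
    'cursor' => date('Y-m-d H:i:s'),
]);
```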
A good way to achieve this would probably be to download no data at all.
I guess you won't be showing all 30k records on the client, so why download them in the first place?
It would probably be better to create an API on your server that mediates between the mobile devices and the database, so the clients only download the data they actually need or want.
Then, with a cache system on the mobile side, you can make sure that clients don't download the same thing every time and that content they have already seen is available offline.
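As a rough illustration of such an API with client-side caching in mind, here is a hedged PHP sketch of a paginated endpoint that answers 304 Not Modified when the client's cached page is still current. Table, column, and parameter names are assumptions:

```php
<?php
// Sketch of an API endpoint that serves one page at a time and lets the client
// cache it: if nothing changed since the client's cached copy, answer 304 and
// send no body at all. Table/column names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_EMULATE_PREPARES => false,
]);
$page = max(1, (int)($_GET['page'] ?? 1));
$per  = 50;

// Cheap freshness check: the newest change wins.
$lastModified = $pdo->query('SELECT MAX(updated_at) FROM records')->fetchColumn();
$etag = '"' . md5($lastModified . $page) . '"';

if (($_SERVER['HTTP_IF_NONE_MATCH'] ?? '') === $etag) {
    http_response_code(304);            // client already has this page cached
    exit;
}

$stmt = $pdo->prepare('SELECT uuid, payload FROM records ORDER BY uuid LIMIT :lim OFFSET :off');
$stmt->bindValue(':lim', $per, PDO::PARAM_INT);
$stmt->bindValue(':off', ($page - 1) * $per, PDO::PARAM_INT);
$stmt->execute();

header('ETag: ' . $etag);
header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```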
When fetching this data, the server request gets timed out and no data gets fetched.
Are you talking only about reads, or about writes too?
If you are talking about write access as well: are the 30,000 records the result of a single insert/update? Are you using a transactional engine such as InnoDB? If so, are your queries wrapped in a single transaction? Having autocommit mode enabled can lead to massive performance issues:
Wrap several modifications into a single transaction to reduce the number of flush operations. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second (for a 10,000RPM disk), which constrains the number of commits to the same 167th of a second if the disk does not “fool” the operating system.
Source
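A minimal PHP/PDO sketch of what "wrap several modifications into a single transaction" could look like, assuming the rows arrive as decoded JSON; the table and column names are placeholders:

```php
<?php
// Sketch: inserting many rows inside one transaction instead of letting
// autocommit flush the log to disk after every single INSERT.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$rows = [/* ... thousands of decoded JSON records ... */];

$stmt = $pdo->prepare('INSERT INTO records (uuid, payload, updated_at) VALUES (?, ?, NOW())');

$pdo->beginTransaction();               // one commit -> one log flush
try {
    foreach ($rows as $row) {
        $stmt->execute([$row['uuid'], $row['payload']]);
    }
    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack();                   // all-or-nothing on failure
    throw $e;
}
```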
Can anyone suggest the best method to achieve a fast transfer of data?
How complex is your query? Inner or outer joins, correlated or non-correlated subqueries, etc.? Use EXPLAIN to inspect its efficiency, and read up on EXPLAIN in the MySQL manual.
Also, take a look at your table design: have you normalized it? Are you indexing properly?
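For illustration, one hedged way to run EXPLAIN from PHP and eyeball the result; the query, the `devices` join, and the parameter value are just examples, not your schema:

```php
<?php
// Sketch: prefixing a slow query with EXPLAIN to see which indexes (if any)
// MySQL actually uses. The query itself is only an example.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

$sql = 'SELECT r.uuid, r.payload
          FROM records r
          JOIN devices d ON d.id = r.device_id
         WHERE r.updated_at > ?';

$stmt = $pdo->prepare('EXPLAIN ' . $sql);
$stmt->execute(['2024-01-01 00:00:00']);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // "type: ALL" plus a large "rows" estimate usually means a missing index.
    printf("table=%s type=%s key=%s rows=%s\n",
        $row['table'], $row['type'], $row['key'] ?? 'NULL', $row['rows']);
}
```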
Is there any method to prepare data initially and download data later?
What do you mean by that? Maybe temporary tables could do the trick.
But without knowing any details of your project, downloading 30,000 records to a mobile device in one go sounds odd to me. Your application/DB design probably needs to be reviewed.
Anyway, for any data that does not need to be updated/inserted directly in the server database, use a local SQLite database on the device. This is much faster, as SQLite is a file-based DB and the data doesn't need to travel over the network.
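One possible reading of "prepare data initially, download later" is a snapshot table that a cron job rebuilds ahead of time, so the download endpoint never runs the expensive joins per request. A rough sketch, with every name assumed:

```php
<?php
// Sketch: a scheduled job refreshes a flat snapshot table in advance; the sync
// endpoint then reads from records_snapshot instead of the live tables.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$pdo->exec('CREATE TABLE IF NOT EXISTS records_snapshot LIKE records');

$pdo->beginTransaction();
$pdo->exec('DELETE FROM records_snapshot');
$pdo->exec('INSERT INTO records_snapshot SELECT * FROM records');  // or a join that flattens several tables
$pdo->commit();
```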
I've been thinking a lot about optimization lately. I'm developing an application that makes me think about where I should process data, balancing server load, memory, client load, speed, size, etc.
I want to understand better how experienced programmers optimize their code when thinking about processing. Take the following 3 options:
Do some processing at the database level, when fetching the data.
Process the data in PHP.
Pass the raw data to the client and process it with JavaScript.
Which would you prefer on which occasions, and why? Sorry for the broad question; I'd also be thankful if someone could recommend good reading sources on this.
The database is the heart of any application, so you should keep the load on it as light as possible. Here are some suggestions:
Fetch only the required fields from the database.
Two simple queries are better than a single complex query.
Get data from the database, process it with PHP, and then store this processed data in temporary storage (a cache such as Memcache, Couchbase, or Redis). This data should be set with an expiry time; the expiry time depends entirely on the type of data. Caching will reduce your database load to a great extent (see the sketch after this list).
Data is stored in normalized form. But if you know in advance that certain data is going to be requested, and producing it requires joins across many tables, then the processed data can be stored in advance in a separate table and served from there.
Send as little data as possible to the client side. A smaller HTML payload saves bandwidth and lets the browser render the page more quickly.
Load data on demand (using AJAX, lazy loading, etc.). For example, if an image is not visible on a page until the user clicks a tab, load that image only on the click.
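The caching suggestion above might look roughly like this in PHP with the phpredis extension; the key name, the 10-minute TTL, and buildReportFromDatabase() are placeholders for your own code:

```php
<?php
// Sketch: cache processed data with an expiry time so repeated requests don't
// hit MySQL every time.
function buildReportFromDatabase(): array
{
    // stand-in for the expensive "get data from DB, process with PHP" step
    return ['generated_at' => date('c'), 'rows' => []];
}

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key  = 'report:daily';
$json = $redis->get($key);

if ($json === false) {                      // cache miss
    $json = json_encode(buildReportFromDatabase());
    $redis->setex($key, 600, $json);        // expire after 10 minutes
}

header('Content-Type: application/json');
echo $json;
```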
Two thoughts: Computers should work, people should think. (IBM ad from the 1960s.)
"Premature optimization is the root of all evil (or at least most of it) in programming." --Donald Knuth
Unless you are, or are planning to become, Google or Amazon or Facebook, you should focus on functionality. "Make it work before you make it fast." If you are planning to grow to that size, do what they did: throw hardware at the problem. It is cheaper and more likely to be effective.
Edited to add: Since you control the processing power on the server, but probably not on the client, it is generally better to put intensive tasks on the server, especially if the clients are likely to be mobile devices. However, consider network latency, bandwidth requirements, and response time. If you can improve response time by processing on the client, then consider doing so. So, optimize the user experience, not the CPU cycles; you can buy more CPU cycles when you need them.
Finally, remember that the client cannot be trusted. For that reason, some things must be on the server.
So as a rule of thumb, process as much of the data in the database as possible. The cost of creating a new connection for a query is very high, so you want to limit it as much as possible. Even if you have to write some very ugly SQL, performing a JOIN will almost always be quicker than performing two SELECT statements.
PHP should really only be used to format and cache data. If you are performing a ton of data operations after every request, you are probably storing your data in a format that's not very practical. You want to cache anything that doesn't change often in an almost ready-to-serve state using something like Redis or APCu.
Finally, the client should never perform data operations on more than a few objects. You never know the client's resource availability, so always keep the client data lean. Perform pagination and sorting in the back-end for any data set larger than a few dozen items. An AJAX request from AngularJS is usually just as quick as sorting 100+ items on an iPad 2.
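As an illustration of back-end pagination and sorting, a hedged PHP/PDO sketch; the `products` table, its columns, and the page size are assumptions:

```php
<?php
// Sketch: paginate and sort in the back-end instead of shipping 100+ items
// to the client.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_EMULATE_PREPARES => false,
]);

// Never interpolate user input into ORDER BY directly: whitelist it.
$allowedSorts = ['name', 'created_at', 'price'];
$sort = in_array($_GET['sort'] ?? '', $allowedSorts, true) ? $_GET['sort'] : 'created_at';
$page = max(1, (int)($_GET['page'] ?? 1));
$per  = 25;

$stmt = $pdo->prepare("SELECT id, name, price, created_at
                         FROM products
                        ORDER BY $sort
                        LIMIT :lim OFFSET :off");
$stmt->bindValue(':lim', $per, PDO::PARAM_INT);
$stmt->bindValue(':off', ($page - 1) * $per, PDO::PARAM_INT);
$stmt->execute();

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```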
If you would like further details on any aspect of this answer please ask and I will do my best to provide examples or additional detail.
I will be using MySQL with PHP as the server-side/database model for my mobile application on both iOS and Android.
There will be a lot of syncing with the client side, at a rate we assume could reach 100,000 requests to the server per second, where each request is trying to write to or read from the server.
My worry is: can MySQL handle this? Does it have an internal automatic mechanism that locks the table and prevents others from writing at the exact moment something else is already writing to the DB, or should I take care of that myself?
I think you need at least a 64-core server (or a few servers) to handle 100k transactions per second with MySQL.
And about transactions: I think you have to learn a lot about databases, table locks, and transactions before you start writing your very popular application :)
Replication
I have an app that is polling data from a large number of data feeds. It processes thousands of records per day, and this number is ever increasing. The data is stored in MySQL.
I then have a website that utilises this data.
I'm trying to build my environment with future in mind.
I thought of MySQL replication, so that the website can use its own database on a different server and not get bogged down by the thousands of write commands happening on the main database.
I am having difficulty getting this set up, despite MySQL reporting that it's all working fine.
I then started to think: is there not a better way?
From what I understand, the master sends the write commands on to the slave database.
Does this not mean that what I am trying to avoid is just happening anyway?
Does this mean that the slave database will suffer the same thousands of writes?
I am a one-man band, doing this venture with my own money, so I need to do this the cheapest way. I am getting a bit lost!
I have a dedicated server and a VPS, using PHP 5 and MySQL 5 in a LAMP stack.
I cannot begin to tell you how much I would appreciate some guidance!
If the slaves are a 1:1 clone of the master, then all writes to the master MUST be propagated down to the slaves. Otherwise replication would be useless.
Thousands of records per day is actually very small. Assuming the same processing time for each, and doing 5,000 records a day, you'd have 86400/5000 = 17.28 seconds per record. That's very minimal write overhead.
If you were doing millions of records a day, THEN you'd have a write bottleneck.
I would split this into three layers.
Data Feed layer. Data read from the feeds is preprocessed and posted onto a queue. This layer has a temporary queue that also serves as temporary storage, a buffer that allows every data feed to post its data. I'd use a message queue system; it's fast and reliable.
Data Store layer. This layer reads from the queue, possibly processes the data further, and stores it in the database.
Data Analysis layer. This is your "slave" database, a data warehouse. It periodically does ETL (extract, transform, and load) from the Data Store layer into this secondary database.
This layered approach lets you isolate concerns (speed, reliability, security) from implementation details, and allows for future scalability.
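To make the first two layers concrete, here is a rough sketch that uses a Redis list as a stand-in for a real message queue (RabbitMQ, SQS, etc.). In practice the producer and the consumer would be separate processes, and all names and keys here are assumptions:

```php
<?php
// --- Data Feed layer: producer --------------------------------------------
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$item = ['feed' => 'example-feed', 'fetched_at' => time(), 'payload' => '...'];
$redis->lPush('feed:queue', json_encode($item));

// --- Data Store layer: consumer (run as a long-lived CLI worker) ----------
$pdo  = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO feed_items (feed, fetched_at, payload) VALUES (?, ?, ?)');

while (true) {
    $msg = $redis->brPop(['feed:queue'], 5);          // block up to 5s waiting for work
    if (!is_array($msg) || count($msg) < 2) {
        continue;                                     // timeout, loop again
    }
    $data = json_decode($msg[1], true);               // [0] = key, [1] = value
    $stmt->execute([$data['feed'], $data['fetched_at'], $data['payload']]);
}
```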
Replication is literally what the word suggests: replicating queries on another machine.
MySQL creates a log filled with the queries that were used to create the dataset on the original machine (the master) and sends it to the slave(s), which read the log and re-execute those queries.
Basically, what you want is to increase your write throughput. That's achievable by using different engines; for example, TokuDB is one of them (it isn't free, but you are allowed to store up to 50 GB of user data and use it at no cost).
What you want (for the moment) is a fast disk subsystem more than a monolithic write-scalable storage system. InnoDB is capable of handling a lot of queries per second on a properly configured machine with sufficient hardware. I'm not sure about pricing, but an SSD and 4-8 GB of RAM shouldn't be that expensive. As Marc B said: until you reach millions of records per day, you don't have to worry about scaling reads and writes through replication.
You say you have an app "polling" your data from data feeds. Does that mean you are doing full-text searches? I'm assuming here that you are batch-processing data feeds and then querying them. If that is the case, I'd offload all your full-text queries to something like Solr. It isn't too time-consuming to set up; depending on the size of your DB you can get away with running it on a fairly small VPS or on your dedicated server, and best of all, the difference in search speed is incredible. I've had full-text MySQL queries that would take 20 minutes be done in Solr in under a second.
Just make sure you wrap the call in a try statement in case your Solr instance goes down.
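A hedged sketch of that idea: query Solr over its HTTP JSON API inside a try block and fall back to a plain MySQL LIKE search if it is unreachable. The core name, URL, fields, and the fallback query are assumptions:

```php
<?php
function searchSolr(string $q): array
{
    $url = 'http://localhost:8983/solr/feeds/select?wt=json&q=' . urlencode($q);
    $raw = @file_get_contents($url);
    if ($raw === false) {
        throw new RuntimeException('Solr is unreachable');
    }
    $json = json_decode($raw, true);
    return $json['response']['docs'] ?? [];
}

function searchMysqlFallback(PDO $pdo, string $q): array
{
    // much slower, but keeps the site alive while Solr is down
    $stmt = $pdo->prepare('SELECT id, title FROM feed_items WHERE title LIKE ? LIMIT 50');
    $stmt->execute(['%' . $q . '%']);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');

try {
    $results = searchSolr($_GET['q'] ?? '');
} catch (RuntimeException $e) {
    $results = searchMysqlFallback($pdo, $_GET['q'] ?? '');
}

header('Content-Type: application/json');
echo json_encode($results);
```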
I have a web application that will support a large number of connections.
Each time a session is created, or a refresh is called, I run a service that collects data and stores it for viewing. Then, using PHP, I read this data back and display it to the user.
My question is: if I'm only reading and writing a single table with 5 columns and 50-100 rows (per user), would it be faster to store this information in flat file(s) and read from those?
You'll only know for sure by benchmarking it, but keep in mind that the developers of RDBMSes have already taken care of the optimizations needed to move data in and out of tables efficiently, and MySQL has a strong PHP API that supports transactional writes.
I would go for the database over flat files for sure, but benchmark your own situation.
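If you do benchmark it, something as crude as this PHP sketch is usually enough for a first impression; the file path, table name, and iteration count are assumptions:

```php
<?php
// Rough benchmark: time N reads from a flat JSON file vs. the same rows from
// MySQL. Only meaningful on your own hardware and data.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$n   = 1000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $rows = json_decode(file_get_contents('/tmp/user_42.json'), true);
}
printf("flat file: %.4f s\n", microtime(true) - $start);

$start = microtime(true);
$stmt  = $pdo->prepare('SELECT * FROM user_rows WHERE user_id = ?');
for ($i = 0; $i < $n; $i++) {
    $stmt->execute([42]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
}
printf("mysql:     %.4f s\n", microtime(true) - $start);
```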
Should I take the new data for my AJAX online-game world map (while dragging/scrolling) from my MySQL DB, or is it better to load the data from a generated (and frequently updated) XML file? (Frequently updated because of new players joining the game/world map.)
In other words:
Is MySQL capable of handling, say, a few thousand players scrolling a world map (and therefore requesting new data), or should I use an XML file?
Personally I hate XML.
For you it might be the right tool for the job, but I'm just going to answer the "is mysql capable of..." part of your question :-)
Yes
But it depends on your SQL skills.
How to speed things up?
Keep the MySQL server on the same machine as the webserver to avoid network traffic.
Use memory tables to avoid disk IO.
Know your way around SQL
MySQL in the default config is tuned for small tables and small memory sizes, which sounds like it fits your case, but experiment and measure to see which config works best.
Fewer selects/inserts/updates with more data per request are faster than more selects/inserts/updates with less data per request (see the batching sketch below).
Also note that if you don't cache the XML file in memory, you will hit file-locking issues on the XML file that slow things down.
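To illustrate the "more data per request" point from the list above, a hedged sketch of one multi-row INSERT replacing many single-row ones; the table and column names are assumptions:

```php
<?php
// Sketch: one multi-row INSERT for a batch of position updates instead of one
// INSERT per player.
$pdo = new PDO('mysql:host=localhost;dbname=game;charset=utf8mb4', 'user', 'pass');

$updates = [
    ['player_id' => 1, 'x' => 10, 'y' => 20],
    ['player_id' => 2, 'x' => 11, 'y' => 22],
    ['player_id' => 3, 'x' => 12, 'y' => 24],
];

$placeholders = implode(', ', array_fill(0, count($updates), '(?, ?, ?)'));
$params = [];
foreach ($updates as $u) {
    array_push($params, $u['player_id'], $u['x'], $u['y']);
}

$pdo->prepare("INSERT INTO positions (player_id, x, y) VALUES $placeholders
               ON DUPLICATE KEY UPDATE x = VALUES(x), y = VALUES(y)")
    ->execute($params);
```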
A database hit will almost always be more expensive than a file hit (due to crossing a network) - but the quickest option would be to keep an in-memory dataset/cache (be aware of memory consumption though).
I think MySQL fits your needs. You could also cluster your data when running low on system resources.
WebSockets could also be interesting for you. Maybe you should have a look at Node.js; with it you can handle new players joining easily (push the new players to the other players instead of pulling the data out of MySQL).
Is the Ajax response returned as XML or JSON? If the latter, then why bother messing about with XML?
If it were me, I'd maintain the data in the database with smart server-side caching (where you can invalidate cache items selectively).
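For example, selective invalidation could be as simple as one cache key per map tile, sketched here with APCu; the key scheme, the TTL, and loadTileFromMysql() are placeholders:

```php
<?php
// Sketch: each map tile gets its own cache key, so a change to one tile only
// evicts that key instead of the whole cache.
function loadTileFromMysql(PDO $pdo, int $x, int $y): array
{
    $stmt = $pdo->prepare('SELECT player_id, data FROM world_map WHERE tile_x = ? AND tile_y = ?');
    $stmt->execute([$x, $y]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

function getTile(PDO $pdo, int $x, int $y): array
{
    $key  = "tile:$x:$y";
    $tile = apcu_fetch($key, $hit);
    if (!$hit) {
        $tile = loadTileFromMysql($pdo, $x, $y);
        apcu_store($key, $tile, 30);          // short TTL as a safety net
    }
    return $tile;
}

// When a player joins or moves, invalidate only the affected tile:
function invalidateTile(int $x, int $y): void
{
    apcu_delete("tile:$x:$y");
}
```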