Data requests and performance - php

Should I fetch the new data for my AJAX online-game worldmap (while dragging/scrolling) from my MySQL DB, or is it better to load the data from a generated and frequently updated XML file (frequently updated because new players keep joining the game/worldmap)?
In other words:
Is MySQL capable of handling, say, a few thousand players scrolling a worldmap (and therefore requesting new data), or should I use an XML file?

Personally I hate XML.
For you it might be the right tool for the job, but I'm just going to answer the "is mysql capable of..." part of your question :-)
Yes
But it depends on your SQL skills.
How to speed things up?
Keep the MySQL server on the same machine as the webserver to avoid network traffic.
Use memory tables to avoid disk IO.
Know your way around SQL.
MySQL in the default config is tuned for small tables and small memory sizes; this sounds like it fits your case, but experiment and measure to see which config works best.
Fewer selects/inserts/updates with more data per request are faster than many requests with less data each (see the sketch after this list).
Also note that if you don't cache the XML file in memory you will hit lock issues on the XML file slowing things down.
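To make the "fewer, bigger requests" point concrete, here is a minimal sketch of a map endpoint that returns every tile in the dragged viewport with a single prepared SELECT instead of one query per tile. The `map_tiles` table, its columns, and the connection details are assumptions made up for illustration.

```php
<?php
// Sketch only: assumes a small `map_tiles` table (which could use ENGINE=MEMORY,
// as suggested above) with columns x, y, player_id, terrain.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=game;charset=utf8', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// One SELECT for the whole visible viewport instead of a query per tile.
$stmt = $pdo->prepare(
    'SELECT x, y, player_id, terrain
       FROM map_tiles
      WHERE x BETWEEN :x1 AND :x2
        AND y BETWEEN :y1 AND :y2'
);
$stmt->execute([
    ':x1' => (int)$_GET['x1'], ':x2' => (int)$_GET['x2'],
    ':y1' => (int)$_GET['y1'], ':y2' => (int)$_GET['y2'],
]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```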

A database hit will almost always be more expensive than a file hit (due to crossing a network) - but the quickest option would be to keep an in-memory dataset/cache (be aware of memory consumption though).

I think MySQL fits your needs. You could also cluster your data when running low on system resources.
WebSockets could also be interesting for you. Maybe you should have a look at Node.js; with it you can handle new user joins easily (push the new players to the other players instead of pulling the data out of MySQL).

Is the Ajax response returned as XML or JSON? If the latter, then why bother messing about with XML?
If it were me, I'd maintain the data in the database with smart server-side caching (where you can invalidate cache items selectively).
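As a rough illustration of "invalidate cache items selectively", here is a sketch using APCu: each worldmap chunk is cached under its own key, so a new player joining only invalidates the one chunk they appear in. The key scheme, chunk columns, and TTL are assumptions, not part of the question.

```php
<?php
// Hypothetical helper: one cache key per map chunk.
function chunkKey($cx, $cy) {
    return "map_chunk:$cx:$cy";
}

function getChunk(PDO $pdo, $cx, $cy) {
    $key = chunkKey($cx, $cy);
    $hit = apcu_fetch($key, $ok);
    if ($ok) {
        return $hit;                          // served from memory
    }
    $stmt = $pdo->prepare('SELECT x, y, player_id FROM map_tiles
                            WHERE chunk_x = :cx AND chunk_y = :cy');
    $stmt->execute([':cx' => $cx, ':cy' => $cy]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    apcu_store($key, $rows, 60);              // fallback TTL of 60 seconds
    return $rows;
}

// When a new player joins chunk (cx, cy), drop only that key:
function onPlayerJoin($cx, $cy) {
    apcu_delete(chunkKey($cx, $cy));
}
```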

Related

Optimization: Where to process data? Database, Server or Client?

I've been thinking a lot about optimization lately. I'm developing an application that makes me think about where I should process data, balancing server load, memory, client load, speed, size, etc.
I want to understand better how experienced programmers optimize their code when thinking about processing. Take the following 3 options:
Do some processing on the database level, when I'm getting the data.
Process the data on PHP
Pass the raw data to the client, and process with javascript.
Which would you guys prefer on which occasions and why? Sorry for the broad question, I'd also be thankful if someone could recommend me good reading sources on this.
The database is the heart of any application, so you should keep the load on it as light as possible. Here are some suggestions:
Get only the required fields from the database.
Two simple queries are better than a single complex query.
Get data from the database, process it with PHP, and then store the processed data in temporary storage (a cache such as Memcache, Couchbase, or Redis). Set this data with an expiry time; the right expiry time depends entirely on the type of data. Caching will reduce your database load to a great extent (a sketch follows this list).
Data is stored in normalized form, but if you know in advance that certain data will be requested and producing it requires joins across many tables, the processed result can be stored ahead of time in a separate table and served from there.
Send as little data as possible to the client side. A smaller HTML payload saves bandwidth and lets the browser render the page quickly.
Load data on demand (using AJAX, lazy loading, etc.). For example, if an image is not visible on a page until the user clicks a tab, load that image only upon the click.
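A minimal sketch of the cache-with-expiry idea from the caching suggestion above, using the Memcached extension; the key name, the five-minute TTL, and the leaderboard query are assumptions chosen purely for illustration.

```php
<?php
// Hypothetical example: serve a pre-computed result from cache,
// falling back to MySQL on a miss and re-caching with an expiry.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key  = 'leaderboard:top100';
$rows = $mc->get($key);

if ($rows === false) {                           // cache miss
    $pdo  = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'pass');
    $rows = $pdo->query('SELECT name, score FROM players
                          ORDER BY score DESC LIMIT 100')
                ->fetchAll(PDO::FETCH_ASSOC);
    $mc->set($key, $rows, 300);                  // expire after 5 minutes
}

echo json_encode($rows);
```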
Two thoughts: Computers should work, people should think. (IBM ad from the 1960s.)
"Premature optimization is the root of all evil (or at least most of it) in programming." --Donald Knuth
Unless you are, or are planning to become, Google or Amazon or Facebook, you should focus on functionality. "Make it work before you make it fast." If you are planning to grow to that size, do what they did: throw hardware at the problem. It is cheaper and more likely to be effective.
Edited to add: Since you control the processing power on the server, but probably not on the client, it is generally better to put intensive tasks on the server, especially if the clients are likely to be mobile devices. However, consider network latency, bandwidth requirements, and response time. If you can improve response time by processing on the client, then consider doing so. So, optimize the user experience, not the CPU cycles; you can buy more CPU cycles when you need them.
Finally, remember that the client cannot be trusted. For that reason, some things must be on the server.
So as a rule of thumb, process as much of the data in the database as possible. The cost of creating a new connection to query is very high, so you want to limit it as much as possible. Even if you have to write some very ugly SQL, performing a JOIN will almost always be quicker than performing 2 SELECT statements.
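For illustration, compare fetching a player and their items in two round trips versus one JOIN; the tables and the already-open `$pdo` connection are assumptions.

```php
<?php
// Two round trips: two network hops, two parses, two result sets.
$player = $pdo->query('SELECT id, name FROM players WHERE id = 42')->fetch();
$items  = $pdo->query('SELECT name FROM items WHERE player_id = 42')->fetchAll();

// One JOIN: usually faster, even if the SQL is uglier.
$rows = $pdo->query(
    'SELECT p.name AS player, i.name AS item
       FROM players p
       LEFT JOIN items i ON i.player_id = p.id
      WHERE p.id = 42'
)->fetchAll(PDO::FETCH_ASSOC);
```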
PHP should really only be used to format and cache data. If you are performing a ton of data operations after every request, you are probably storing your data in a format that's not very practical. You want to cache anything that does not change often, in an almost ready-to-serve state, using something like Redis or APCu.
Finally, the client should never be performing data operations on more than a few objects. You never know the client's resource availability, so always keep the client data lean. Perform pagination and sorting in the back-end on any data set larger than a few dozen items. An AJAX request (using AngularJS, for example) is usually just as quick as sorting 100+ items on an iPad 2.
If you would like further details on any aspect of this answer please ask and I will do my best to provide examples or additional detail.

Access and store large amounts of data from a MySQL server

We are developing an iOS/Android application which downloads large amounts of data from a server.
We're using JSON to transfer data between the server and client devices.
Recently the size of our data increased a lot (about 30000 records).
When fetching this data, the server request gets timed out and no data gets fetched.
Can anyone suggest the best method to achieve a fast transfer of data?
Is there any method to prepare data initially and download data later?
Is there any advantage to using multiple databases on the device (SQLite DBs) and performing parallel insertions into them?
Currently we are downloading/uploading only changed data (using UUID and time-stamp).
Is there any best approach to achieve this efficiently?
---- Edit -----
I think it's not only a problem of the number of MySQL records; at peak times multiple devices connect to the server to access data, so connections also end up waiting. We are using a high-performance server. I am mainly looking for a solution to handle this sync on the device. Is there any good method to simplify the sync or make it faster, e.g. multithreading, multiple SQLite DBs, data compression, views, ...?
A good way to achieve this would probably be to download no data at all.
I guess you won't be showing these 30k lines at your client, so why download them in the first place?
It would probably be better to create an API on your server which would help the mobile devices to communicate with the database so the clients would only download the data they actually need / want.
Then, with a cache system on the mobile side you could make sure that clients won't download the same thing every time, and that content they have already seen is available offline.
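A minimal sketch of such an endpoint, returning only the records changed since the client's last sync and only one page at a time; the script name, parameters, and `records` table are hypothetical.

```php
<?php
// Hypothetical sync endpoint: GET /sync.php?since=2015-06-01%2000:00:00&page=0
$pdo = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8', 'user', 'pass');

$since  = isset($_GET['since']) ? $_GET['since'] : '1970-01-01 00:00:00';
$page   = isset($_GET['page']) ? max(0, (int)$_GET['page']) : 0;
$limit  = 500;                                  // keep each response small
$offset = $page * $limit;

$stmt = $pdo->prepare(
    "SELECT uuid, payload, updated_at
       FROM records
      WHERE updated_at > :since
      ORDER BY updated_at
      LIMIT $limit OFFSET $offset"
);
$stmt->execute([':since' => $since]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```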
When fetching this data, the server request gets timed out and no data gets fetched.
Are you talking only about reads or writes, too?
If you are talking about write access as well: are the 30,000 records the result of a single insert/update? Are you using a transactional engine, e.g. InnoDB? If so, are your queries wrapped in a single transaction? Having autocommit mode enabled can lead to massive performance issues:
Wrap several modifications into a single transaction to reduce the number of flush operations. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second (for a 10,000RPM disk), which constrains the number of commits to the same 167th of a second if the disk does not “fool” the operating system.
Source
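A sketch of wrapping the bulk insert in one transaction with PDO, so InnoDB flushes the log once per batch instead of once per row; the `records` table, its columns, and the already-decoded `$rows` array are assumptions.

```php
<?php
// Assumes $pdo is an existing PDO connection with exceptions enabled
// and an InnoDB table records(uuid, payload).
$stmt = $pdo->prepare('INSERT INTO records (uuid, payload) VALUES (:uuid, :payload)');

$pdo->beginTransaction();
try {
    foreach ($rows as $row) {                 // $rows: the incoming records
        $stmt->execute([
            ':uuid'    => $row['uuid'],
            ':payload' => $row['payload'],
        ]);
    }
    $pdo->commit();                           // one log flush for the whole batch
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}
```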
Can anyone suggest the best method to achieve a fast transfer of data?
How complex is your query design? Inner or outer joins, correlated or non-correlated subqueries, etc.? Use EXPLAIN to inspect the efficiency. Read about EXPLAIN.
Also, take a look at your table design: Have you made use of normalization? Are you indexing properly?
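As a quick sketch of checking a query with EXPLAIN from PHP and adding an index when the plan shows a full scan; the column and index names are made up for the example.

```php
<?php
// Inspect the plan: look at the `key` and `rows` columns of the output.
$plan = $pdo->query('EXPLAIN SELECT uuid, payload FROM records
                      WHERE updated_at > "2015-06-01"')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);

// If `key` is NULL and `rows` is huge, an index on the filter column helps:
$pdo->exec('CREATE INDEX idx_records_updated_at ON records (updated_at)');
```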
Is there any method to prepare data initially and download data later?
How do you mean that? Maybe temporary tables could do the trick.
But without knowing any details of your project, downloading 30,000 records to a mobile device at one time sounds weird to me. Your application/DB design probably needs to be reviewed.
Anyway, for any data that does not need to be updated/inserted directly in the remote database, use a local SQLite database on the mobile. This is much faster, as SQLite is a file-based DB and the data doesn't need to be transferred over the net.

MySQL replication - is it worth it?

I have an app that is polling data from a large number of data feeds. It processes thousands of records per day, and this number is ever increasing. The data is stored in MySQL.
I then have a website that utilises this data.
I'm trying to build my environment with the future in mind.
I thought of MySQL replication, so that the website can use its own database on a different server and not get bogged down by the thousands of write commands that are happening on the main database.
I am having difficulty getting this set up, despite MySQL reporting it's all working fine.
I then started to think: is there not a better way?
From what I understand, MySQL sends the write commands from the master to the slave database.
Does this not mean that what I am trying to avoid is just happening anyway?
Does this mean that the slave database will suffer thousands of writes as well?
I am a one-man band, doing this venture with my own money, so I need to do this the cheapest way. I am getting a bit lost!
I have a dedicated server and a VPS, running PHP 5 and MySQL 5 in a LAMP stack.
I cannot begin to tell you how much I would appreciate some guidance!
If the slaves are a 1:1 clone of the master, then all writes to the master MUST be propagated down to the slaves. Otherwise replication would be useless.
Thousands of records per day is actually very small. Assuming the same processing time for each, and doing 5000 records, you'd have 86400/5000 = 17.28 seconds per record. That's very minimal write overhead.
If you were doing millions of records a day, THEN you'd have a write bottleneck.
I would split this in three layers.
Data Feed layer. Data read from the feeds is preprocessed and posted into a queue. This layer has a temporary queue that serves also as a temporary storage, a buffer to allow all data feed to post its data. I'd use a Message Queue System. It's fast and reliable.
Data Store layer. This layer reads from the queue, maybe processes someway the data read, and stores the data in the database.
Data Analysis layer. This is your "slave" database. It's a data warehouse. It periodically does ETL (extract, transform and load) data from the Data Store layer to this secondary database.
This layered approach lets you isolate concerns (speed, reliability, security) and implementation details, and allows for future scalability.
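As a rough illustration of the first two layers, here is a sketch of a worker that pops feed records off a Redis list (standing in for the message queue) and persists them to MySQL; the queue name, payload format, and table are all assumptions.

```php
<?php
// Hypothetical Data Store worker: drain the queue and write to MySQL.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$pdo  = new PDO('mysql:host=127.0.0.1;dbname=feeds', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO feed_records (feed_id, payload) VALUES (:feed, :payload)');

while (true) {
    $job = $redis->blPop(['feed_queue'], 5);    // block up to 5 s waiting for work
    if (!$job) {
        continue;                               // nothing queued yet, poll again
    }
    $record = json_decode($job[1], true);       // [0] = queue name, [1] = payload
    $stmt->execute([
        ':feed'    => $record['feed_id'],
        ':payload' => json_encode($record['data']),
    ]);
}
```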
Replication is literally what the word suggests - replicating queries on another machine.
MySQL creates a log that's filled with queries that were used to create the dataset on the original machine (master) and sends it to the slave(s) that read the log and re-execute those queries.
Basically, what you want is to increase your write throughput. That's achievable through using different engines; for example, TokuDB is one of them (it isn't free, but you are allowed to store 50 GB of user data for free and use it).
What you want (for the moment) is a fast HDD subsystem more than a monolithic write-scalable storage system. InnoDB is capable of achieving a lot of queries per second on a properly configured machine with sufficient hardware. I am not sure about pricing, but an SSD and 4-8 GB of RAM shouldn't be that expensive. As Marc B said - until you reach millions of records per day, you don't have to worry about scaling reads and writes through replication.
You say you have an app "polling" your data from data feeds. Does that mean you are doing full-text searches? I'm making an assumption here that you are batch-processing data feeds and then querying them. If that is the case, I'd offload all your full-text queries to something like Solr. It actually isn't too time-consuming to set up; depending on the size of your DB you can get away with running it on a fairly small VPS or on your dedicated server, and best of all, the difference in search speed is incredible. I've had full-text MySQL queries that would take 20 minutes to run finish in Solr in under a second.
Just make sure you use a try/catch in case your Solr instance goes down.
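A sketch of that try/catch idea: query Solr's JSON select endpoint over HTTP and fall back to a (much slower) MySQL LIKE search if Solr is unreachable; the core name, field, and fallback query are assumptions.

```php
<?php
// Hypothetical full-text search with a MySQL fallback if Solr is down.
function searchFeeds(PDO $pdo, $term)
{
    $url = 'http://127.0.0.1:8983/solr/feeds/select?wt=json&q=' .
           urlencode('body:' . $term);
    try {
        $raw = @file_get_contents($url);
        if ($raw === false) {
            throw new RuntimeException('Solr unreachable');
        }
        $json = json_decode($raw, true);
        return $json['response']['docs'];
    } catch (Exception $e) {
        // Fallback: slower, but search keeps working while Solr is down.
        $stmt = $pdo->prepare('SELECT id, title FROM feed_records
                                WHERE payload LIKE :term LIMIT 50');
        $stmt->execute([':term' => '%' . $term . '%']);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}
```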

What is more expensive for template reading: Database query or File reading?

My question is fairly simple; I need to read out some templates (in PHP) and send them to the client.
For this kind of data, specifically text/html and text/javascript: is it more expensive to read them out of a MySQL database or out of files?
Kind regards
Tom
inb4 security; I'm aware.
PS: I read other topics about similar questions but they either had to do with other kind of data, or haven't been answered.
Reading from a database is more expensive, no question.
Where do the flat files live? On the file system. In the best case, they've been recently accessed so the OS has cached the files in memory, and it's just a memory read to get them into your PHP program to send to the client. In the worst case, the OS has to copy the file from disc to memory before your program can use it.
Where does the data in a database live? On the file system. In the best case, they've been recently accessed so MySQL has that table in memory. However, your program can't get at that memory directly, it needs to first establish a connection with the server, send authentication data back and forth, send a query, MySQL has to parse and execute the query, then grab the row from memory and send it to your program. In the worst case, the OS has to copy from the database table's file on disk to memory before MySQL can get the row to send.
As you can see, the scenarios are almost exactly the same, except that using a database involves the additional overhead of connections and queries before getting the data out of memory or off disc.
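To make the comparison concrete, here is a sketch of both paths; the template path, table, and column names are invented for the example.

```php
<?php
// Path 1: flat file - one call, served straight from the OS page cache when warm.
$tpl = file_get_contents(__DIR__ . '/templates/header.html');

// Path 2: database - connect, authenticate, parse and execute a query
// before the same bytes arrive.
$pdo  = new PDO('mysql:host=127.0.0.1;dbname=site', 'user', 'pass');
$stmt = $pdo->prepare('SELECT body FROM templates WHERE name = :name');
$stmt->execute([':name' => 'header']);
$tpl = $stmt->fetchColumn();
```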
There are many factors that would affect how expensive both are.
I'll assume that since they are templates, they probably won't be changing often. If so, flat-file may be a better option. Anything write-heavy should be done in a database.
Reading a flat-file should be faster than reading data from the database.
Having them in the database usually makes it easier for multiple people to edit.
You might consider using memcache to store the templates after reading them, since reading from memory is always faster than reading from a db or flat-file.
It really doesn't make enough difference to worry you. What sort of volume are you working with? Will you have over a million page views a day? If not, I'd say pick whichever one is easiest for you to code with and maintain, and don't worry about the expense of the alternatives until it becomes a problem.
Specifically, if your templates are currently in file form I would leave them there, and if they are currently in DB form I'd leave them there.

XML vs. MySQL direct query performance

What, if any, is the performance overhead of using XML as the interface between a PHP application (A) and a MySQL database via another PHP application (B), rather than querying the database directly from PHP application (A)?
How much will this change between application (A) and the database being on the same server, and being on separate servers?
There are a number of variables here that would impact performance. Generally the database connection is faster than transmitting and parsing XML, but issues like network latency, message size, and data complexity will all affect how much faster.
On the other hand there are some good reasons to have only one program interacting with the database, like data integrity, that may make the overhead costs worth paying.
XML is a fairly heavy format, in that there is a lot of extra markup to convey specific data (i.e. the opening/closing tags). This processing is quite CPU-intensive, so for larger messages it can impact performance significantly. If the message sizes are small enough, the performance shouldn't be too bad; you just need to account for what is generating the XML and what is processing it.
In my opinion, MySQL will be faster, easier to develop, and easier to manage (storing / updating / deleting).
