Reduce MySQL queries by saving result to textfile - php

I have an app that posts data from Android to some MySQL tables through PHP at a 10-second interval. The same PHP file runs a lot of queries on some other tables in the same database, and the result is downloaded and processed in the app (with DownloadWebPageTask).
I usually have between 20 and 30 clients connected this way. Most of the data each client queries for is the same as for all the other clients. If 30 clients run the same query every 10 seconds, that is 180 queries per minute. In fact every client runs several queries, some of them in a loop (looping through the results of another query).
My question is: if I somehow produce a text file containing the same data, update this text file every x seconds, and let all the clients read this file instead of running the queries themselves - is that a better approach? Will it reduce server load?

In my opinion you should consider using memcache.
It will let you store your data in memory, which is even faster than files on disk or repeated MySQL queries.
What it will also do is reduce load on your database so you will be able to serve more users with the same server/database setup.
Memcache is very easy to use and there are lots of tutorials on the internet.
Here is one to get you started:
http://net.tutsplus.com/tutorials/php/faster-php-mysql-websites-in-minutes/
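As a rough illustration (host, key name, credentials and the query are placeholders, not part of the question), caching the shared result for the 10-second interval could look like this:

    <?php
    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);

    $key  = 'shared_client_data';               // the same key for all 20-30 clients
    $data = $memcached->get($key);

    if ($data === false) {                      // cache miss: hit MySQL only once
        $mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
        $result = $mysqli->query('SELECT id, value FROM some_table');
        $data   = $result->fetch_all(MYSQLI_ASSOC);   // fetch_all requires mysqlnd
        $memcached->set($key, $data, 10);       // expire after 10 seconds
    }

    echo json_encode($data);                    // every client reads the cached copy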

What you need is caching. You can either cache the data coming from your DB or cache the page itself. Below are a few links on how to do this in PHP:
http://www.theukwebdesigncompany.com/articles/php-caching.php
http://www.addedbytes.com/articles/for-beginners/output-caching-for-beginners/
And yes. This will reduce DB server load drastically.
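If you do prefer the text-file approach from the question, a minimal sketch could look like the following (file path, lifetime and query are placeholders; the temporary-file rename keeps clients from reading a half-written file):

    <?php
    $cacheFile = '/tmp/shared_data.json';
    $lifetime  = 10; // seconds

    if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $lifetime) {
        $mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
        $result = $mysqli->query('SELECT id, value FROM some_table');
        $rows   = $result->fetch_all(MYSQLI_ASSOC);

        file_put_contents($cacheFile . '.tmp', json_encode($rows));
        rename($cacheFile . '.tmp', $cacheFile);   // atomic swap on the same filesystem
    }

    echo file_get_contents($cacheFile);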

Related

Is it good to use memcached in fully dynamic and heavy database website?

I'm currently working on a project which is like an e-commerce site. There are hundreds of thousands of records in the database tables. I also have to use join operations on them to get data, as the project has a query builder for selecting criteria. It takes too much time to fetch the data, so I'm limiting the results to a certain number of records (e.g. 10) per page. Now I have come across the concept of memcached, and I thought of using it for my project so that the expensive fetch only happens once. But I still have some doubts.
Will too many cache files be a problem? I mean, there will be one cache file for each page of each module, so the count will reach approximately 10,000 cache files.
Let's assume the number of files is not a problem. But what about updating the cache using replace() when a row is added to or deleted from the middle of a table? The tables here are updated roughly every week.
So I'm in a dilemma: should I go for memcached or not? If anyone can advise and answer with an explanation, it will be appreciated.
If your website executes many of the same MySQL queries that frequently return the same data, then yes, there is probably some benefit to running memcached.
Problem:
"There are hundreds of thousands of records...It takes too much time to fetch the data".
This probably indicates a problem with your schema. Properly indexed, even when using JOINs, the queries should be able to execute quickly (< 0.1 seconds). Run EXPLAIN on the queries that are taking a long time and see if they can be improved.
Answer to Question 1
There won't be an issue with too many cache files. Memcached stores all cached information in memory (hence the name), so no disk files are used. Cached objects are stored in RAM and accessed directly from RAM.
Answer to Question 2
Not exactly sure what you are asking here, but if your application updates or deletes information from the database, then it is critical that the cache items affected by those updates and deletes are removed. If the application doesn't remove cached items affected by such operations, then the next time the data is queried, cached results which are no longer valid may be returned. Make sure any cached data either has an appropriate expiration time set, or is removed from the cache when the data in the database changes.
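For example, a minimal write-through invalidation sketch (the $memcached handle and the 'product_' key scheme are illustrative assumptions, not part of the original question):

    <?php
    // Delete the affected cache entry whenever the underlying row changes,
    // so the next read repopulates the cache with fresh data.
    function updateProduct(mysqli $db, Memcached $memcached, $id, $name)
    {
        $stmt = $db->prepare('UPDATE products SET name = ? WHERE id = ?');
        $stmt->bind_param('si', $name, $id);
        $stmt->execute();

        // Invalidate the cached copy of this record
        $memcached->delete('product_' . $id);
    }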
Hope that helps.
I would start not from Memcached but from figuring out what the bottleneck is. Your tables have roughly a million rows. I don't know the size of a row, but my educated guess is that it is less than 1 KB, based on the fact that a browser window accommodates the information from one record.
So there is probably about 1 GB of information in your database. Correct me if I'm wrong. If that's true, then the whole database should be able to be cached entirely in RAM by MySQL.
Once your database is entirely in RAM, then with proper indexes the cost of a query should be roughly linear in the size of the result set, which is only a few kilobytes because it fits in a browser window.
So my advice is: determine the size of the database and look at the output of the "top" command to see how much memory MySQL is consuming. Once you are sure your database sits entirely in memory, run EXPLAIN against your most popular queries and add indexes according to the results. Even if your database is bigger than the available RAM, I still recommend looking at the output of EXPLAIN, because it really helps a lot.
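A quick way to get the total data + index size per schema (the mysqli usage and credentials are placeholders; the information_schema query itself is standard MySQL):

    <?php
    // Compare this figure against available RAM and the InnoDB buffer pool size
    // reported by "top" or SHOW VARIABLES LIKE 'innodb_buffer_pool_size'.
    $mysqli = new mysqli('localhost', 'user', 'pass');
    $result = $mysqli->query(
        "SELECT table_schema,
                ROUND(SUM(data_length + index_length) / 1024 / 1024) AS size_mb
         FROM information_schema.tables
         GROUP BY table_schema"
    );
    while ($row = $result->fetch_assoc()) {
        echo $row['table_schema'] . ': ' . $row['size_mb'] . " MB\n";
    }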

Access and store large amount of data from mysql server

We are developing an iOS/Android application which downloads large amounts of data from a server.
We're using JSON to transfer data between the server and client devices.
Recently the size of our data increased a lot (to about 30,000 records).
When fetching this data, the server request times out and no data is fetched.
Can anyone suggest the best method to achieve a fast transfer of data?
Is there any method to prepare data initially and download data later?
Is there any advantage to using multiple databases on the device (SQLite DBs) and performing parallel inserts into them?
Currently we are downloading/uploading only changed data (using UUID and time-stamp).
What is the best approach to achieve this efficiently?
---- Edit -----
I think it's not only a problem of the number of MySQL records; at peak times multiple devices connect to the server to access data, so connections also end up waiting. We are using a high-performance server. I am mainly looking for a solution to handle this sync on the device. Is there any good method to simplify the sync or make it faster using multi-threading, multiple SQLite DBs, etc.? Or data compression, using views, or ...?
A good way to achieve this would probably be to download no data at all.
I guess you won't be showing these 30k rows on your client, so why download them in the first place?
It would probably be better to create an API on your server which would help the mobile devices communicate with the database, so the clients would only download the data they actually need / want.
Then, with a cache system on the mobile side, you can make sure that clients won't download the same thing every time and that content they have already seen is available offline.
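A minimal sketch of such an endpoint, returning only rows changed since the client's last sync and in pages (table, column and parameter names are assumptions for illustration):

    <?php
    // Clients pass the timestamp of their last successful sync plus a paging offset,
    // so each request returns a small delta instead of the full 30,000 records.
    $since  = isset($_GET['since'])  ? $_GET['since']       : '1970-01-01 00:00:00';
    $offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;
    $limit  = 500;

    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare(
        'SELECT uuid, payload, updated_at
         FROM records
         WHERE updated_at > :since
         ORDER BY updated_at
         LIMIT ' . $limit . ' OFFSET ' . $offset
    );
    $stmt->execute(array(':since' => $since));

    header('Content-Type: application/json');
    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));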
When fetching this data, the server request gets timed out and no data gets fetched.
Are you talking only about reads, or about writes too?
If you are talking about write access as well: are the 30,000 records the result of a single insert/update? Are you using a transactional engine such as InnoDB? If so, are your queries wrapped in a single transaction? Having autocommit mode enabled can lead to massive performance issues:
Wrap several modifications into a single transaction to reduce the number of flush operations. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second (for a 10,000RPM disk), which constrains the number of commits to the same 167th of a second if the disk does not “fool” the operating system.
Source
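In PHP that could look like the following sketch (PDO, with placeholder table and column names); the point is one commit, and therefore one log flush, per batch rather than per row:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare('INSERT INTO records (uuid, payload) VALUES (:uuid, :payload)');

    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {              // $rows = the incoming batch of records
            $stmt->execute(array(
                ':uuid'    => $row['uuid'],
                ':payload' => $row['payload'],
            ));
        }
        $pdo->commit();                        // a single flush for the whole batch
    } catch (Exception $e) {
        $pdo->rollBack();
        throw $e;
    }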
Can anyone suggest the best method to achieve a fast transfer of data?
How complex are your queries? Inner or outer joins, correlated or non-correlated subqueries, etc.? Use EXPLAIN to inspect their efficiency. Read about EXPLAIN.
Also, take a look at your table design: Have you made use of normalization? Are you indexing properly?
Is there any method to prepare data initially and download data later?
How do you mean that? Maybe temporary tables could do the trick.
But without knowing any details of your project, downloading 30,000 records to a mobile device at one time sounds odd to me. Your application/DB design probably needs to be reviewed.
Anyway, for any data that does not need to be updated/inserted directly into the central database, use a local SQLite database on the mobile device. This is much faster, as SQLite is a file-based DB and the data doesn't need to be transferred over the network.

Mysql replication - is it worth it?

I have an app that is polling data from a large number of data feeds. It processes thousands of records per day, and this number is ever increasing. The data is stored in MySQL.
I then have a website that utilises this data.
I'm trying to build my environment with the future in mind.
I thought of MySQL replication, so that the website can use its own database on a different server and not get bogged down by the thousands of write commands that are happening on the main database.
I am having difficulty getting this set up, despite MySQL reporting that it's all working fine.
I then started to think - is there not a better way?
From what I understand, MySQL sends the same write commands to the slave database as are executed on the master.
Does this not mean that what I am trying to avoid is just happening anyway?
Does this mean that the slave database will suffer thousands of writes too?
I am a one-man band, doing this venture with my own money, so I need to do this the cheapest way possible. I am getting a bit lost!
I have a dedicated server,
a VPS,
using PHP 5 and MySQL 5 in a LAMP stack.
I cannot begin to tell you how much I would appreciate some guidance!
If the slaves are a 1:1 clone of the master, then all writes to the master MUST be propagated down to the slaves. Otherwise replication would be useless.
Thousands of records per day is actually very small. Assuming the same processing time for each, and doing 5000 records, you'd have 86400/5000 = 17.28 seconds per record. That's very minimal write overhead.
If you were doing millions of records a day, THEN you'd have a write bottleneck.
I would split this into three layers.
Data Feed layer. Data read from the feeds is preprocessed and posted into a queue. This layer has a temporary queue that also serves as temporary storage, a buffer that allows every data feed to post its data. I'd use a message queue system. It's fast and reliable.
Data Store layer. This layer reads from the queue, maybe processes someway the data read, and stores the data in the database.
Data Analysis layer. This is your "slave" database. It's a data warehouse. It periodically does ETL (extract, transform and load) data from the Data Store layer to this secondary database.
This layered approach allows you to isolate concerns (speed, reliability, security) from implementation details, and allows for future scalability.
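A rough sketch of the first two layers, using a Redis list as a stand-in for the message queue (Redis, the key names and the table are my own assumptions; any queue system would do):

    <?php
    // Data Feed layer: preprocess and enqueue only - no database work here.
    // $record is one preprocessed row coming from a feed (assumed to exist).
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->rPush('feed_queue', json_encode($record));

    // Data Store layer: a separate worker drains the queue and writes to MySQL.
    $pdo  = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO feed_data (source, payload) VALUES (:source, :payload)');

    while (($job = $redis->lPop('feed_queue')) !== false) {
        $data = json_decode($job, true);
        $stmt->execute(array(':source' => $data['source'], ':payload' => $job));
    }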
Replication is literally what the word suggests - replicating queries on another machine.
MySQL creates a log filled with the queries that were used to create the dataset on the original machine (the master) and sends it to the slave(s), which read the log and re-execute those queries.
Basically, what you want is to increase your write throughput. That's achievable through using different engines; for example, TokuDB is one of them (it isn't free, but you are allowed to store up to 50 GB of user data and use it at no cost).
What you want (for the moment) is a fast HDD subsystem more than a monolithic write-scalable storage system. InnoDB is capable of handling a lot of queries per second on a properly configured machine with sufficient hardware. I am not sure about pricing, but an SSD and 4-8 GB of RAM shouldn't be that expensive. As Marc B said - until you reach millions of records per day, you don't have to worry about scaling reads and writes through replication.
You say you have an app "polling" your data from data feeds. Does that mean you are doing full-text searches? I'm making an assumption here that you are batch processing data feeds and then querying them. If that is the case, I'd offload all your full-text queries to something like Solr. It actually isn't too time-consuming to set up; depending on the size of your DB you can get away with running it on a fairly small VPS or on your dedicated server, and best of all, the difference in search speed is incredible. I've had full-text MySQL queries that would take 20 minutes to run complete in Solr in under a second.
Just make sure you wrap the Solr calls in a try/catch in case your Solr instance goes down.
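Something along these lines, querying Solr over its HTTP API and falling back to MySQL if it is unreachable (the core name, field name and the FULLTEXT fallback are assumptions; $pdo and $searchTerm are taken as given):

    <?php
    $q   = urlencode($searchTerm);
    $url = 'http://localhost:8983/solr/feeds/select?q=text:' . $q . '&wt=json&rows=20';

    try {
        $raw = file_get_contents($url);
        if ($raw === false) {
            throw new RuntimeException('Solr unreachable');
        }
        $json = json_decode($raw, true);
        $docs = $json['response']['docs'];
    } catch (Exception $e) {
        // Fall back to a (much slower) MySQL full-text search if Solr is down;
        // this assumes a FULLTEXT index on the description column.
        $stmt = $pdo->prepare(
            'SELECT id, description FROM feed_data
             WHERE MATCH(description) AGAINST (:q)'
        );
        $stmt->execute(array(':q' => $searchTerm));
        $docs = $stmt->fetchAll(PDO::FETCH_ASSOC);
    }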

Best practice to record large amount of hits into MySQL database

Well, this is the thing. Let's say that my future PHP CMS needs to handle 500k visitors daily, and I need to record them all in a MySQL database (referrer, IP address, time, etc.). That means I need to insert 300-500 rows per minute and update 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.
My question is: is there any way to locally cache incoming hits first (and what is the best solution for that - APC, CSV...?) and periodically send them to the database, every 10 minutes for example? Is this a good solution, and what is the best practice for this situation?
500k daily is just 5-7 queries per second. If each request is served in 0.2 seconds, you will have almost no simultaneous queries, so there is nothing to worry about.
Even if you have 5 times more users, it should all work fine.
You can just use INSERT DELAYED and tune your MySQL.
About tuning: http://www.day32.com/MySQL/ - there is a very useful script there (it changes nothing, it just shows you tips on how to optimize your settings).
You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)
Do not use files for this. The DB will handle locks much better than PHP. It's not so trivial to write effective mutexes, so let the DB (or memcache, APC) do this work.
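A minimal sketch of the INSERT DELAYED approach mentioned above (table and column names are placeholders; note that, as pointed out further down, only MyISAM supports DELAYED):

    <?php
    // The client gets control back immediately; MySQL queues the row in memory
    // and writes it when the table is not in use.
    $mysqli = new mysqli('localhost', 'user', 'pass', 'stats');

    $ip  = $mysqli->real_escape_string($_SERVER['REMOTE_ADDR']);
    $ref = $mysqli->real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');

    $mysqli->query("INSERT DELAYED INTO hits (ip, referrer, hit_time)
                    VALUES ('$ip', '$ref', NOW())");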
A frequently used solution:
You could implement a counter in memcached which you increment on each visit, and push an update to the database for every 100 (or 1,000) hits.
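A sketch of that pattern (the key name, the aggregate table and $pageId are illustrative assumptions):

    <?php
    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);

    $memcached->add('hit_counter', 0);           // create the key if it doesn't exist yet
    $count = $memcached->increment('hit_counter');

    if ($count !== false && $count % 100 === 0) {
        // Only every 100th hit touches MySQL
        $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
        $pdo->prepare('UPDATE page_stats SET hits = hits + 100 WHERE page_id = :id')
            ->execute(array(':id' => $pageId));
    }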
We do this by storing locally on each server to CSV, then having a minutely cron job to push the entries into the database. This is to avoid needing a highly available MySQL database more than anything - the database should be able to cope with that volume of inserts without a problem.
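A sketch of that setup, assuming one CSV per server and a LOAD DATA import in the cron job (paths, table and columns are placeholders):

    <?php
    // --- in the page request: append one line, nothing else ---
    $fp = fopen('/var/log/app/hits.csv', 'a');
    if (flock($fp, LOCK_EX)) {                   // keep appends atomic across processes
        fputcsv($fp, array(date('Y-m-d H:i:s'),
                           $_SERVER['REMOTE_ADDR'],
                           isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''));
        flock($fp, LOCK_UN);
    }
    fclose($fp);

    // --- in the minutely cron job: rotate the file, then bulk-load it ---
    // (LOCAL INFILE may need to be enabled on both the client and the server.)
    $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    rename('/var/log/app/hits.csv', '/var/log/app/hits.importing.csv');
    $pdo->exec("LOAD DATA LOCAL INFILE '/var/log/app/hits.importing.csv'
                INTO TABLE hits
                FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
                (hit_time, ip, referrer)");
    unlink('/var/log/app/hits.importing.csv');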
Save them to a directory-based database (or a flat file, it depends) somewhere, and at a certain time use a PHP script to insert/update them into your MySQL database. The PHP script can be executed periodically using cron, so check if your server has cron so that you can set the schedule for that, say every 10 minutes.
Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/. Some code has already been written for this and is ready for you to use :)
One way would be to use the Apache access.log. You can get quite fine-grained logging by using the cronolog utility with Apache. Cronolog will handle the storage of a very large number of rows in files, and can rotate them based on volume, day, year, etc. Using this utility will prevent your Apache from suffering from log writes.
Then, as said by others, use a cron-based job to analyse these logs and push whatever summarized or raw data you want into MySQL.
You may think of using a dedicated database (or even a dedicated database server) for write-intensive jobs, with specific settings. For example, you may not need InnoDB storage and can keep simple MyISAM tables. And you could even think of another database storage (as said by #Riccardo Galli).
If you absolutely HAVE to log directly to MySQL, consider using two databases. One optimized for quick inserts, which means no keys other than possibly an auto_increment primary key. And another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only to the read-only database on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
I have also previously seen a system which records the data into a flat file on the local disk of each web server (be careful to do only atomic appends if using multiple processes), and periodically writes it asynchronously into the database using a daemon process or cron job.
This appears to be the prevailing optimum solution: your web app remains available if the audit database is down, and users don't suffer poor performance if the database is slow for any reason.
The only thing I can say is: be sure you have monitoring on these locally generated files - a build-up definitely indicates a problem, and your ops engineers might not otherwise notice.
For a high number of write operations and this kind of data, you might find MongoDB or CouchDB more suitable.
Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.
We use MySQL Proxy to defer the execution of queries matching a certain signature.
This will require a custom Lua script; example scripts are here, and some tutorials are here.
The script will implement a Queue data structure for storage of query strings, and pattern matching to determine what queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the query queue is emptied as each query is sent to the server.
You can use a queue strategy using beanstalkd or IronMQ.

Mysql Query vs XML Data in php

We are building a website which would serve about 30k unique visitors a day.
Currently we use a simple mysql Connect > A Simple Query > mysql Close.
I'm afraid that with a dual-core server running 2 GB of RAM we would only be able
to handle about 1k MySQL connections tops. Is 1k a good estimate?
Would it be better to have a cron job output XML files and let our PHP files grab the data from them?
Typically XML will never be faster than MySQL for searching data (i.e. performing queries).
I don't know what kind of data you have, but XML will only be faster if you have a bunch of simple files and don't need to search, just load the files and format them.
If you need to search, then use MySQL.
MySQL does all sorts of optimizations. For example, it stores indexes (KEY columns) in a separate file, allowing for much faster searches.
I would suggest using Zend_Cache for caching MySQL query results for data that doesn't change frequently.
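A brief sketch of that, assuming Zend Framework 1's Zend_Cache with the File backend (lifetime, cache directory, key and the query are placeholders):

    <?php
    require_once 'Zend/Cache.php';

    $cache = Zend_Cache::factory('Core', 'File',
        array('lifetime' => 600, 'automatic_serialization' => true),
        array('cache_dir' => '/tmp/cache/')
    );

    // Serve the cached result if it exists; otherwise query MySQL once and store it.
    if (($rows = $cache->load('homepage_data')) === false) {
        $mysqli = new mysqli('localhost', 'user', 'pass', 'site');
        $result = $mysqli->query('SELECT id, title FROM articles ORDER BY id DESC LIMIT 20');
        $rows   = $result->fetch_all(MYSQLI_ASSOC);   // fetch_all requires mysqlnd
        $cache->save($rows, 'homepage_data');
    }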
