I've been looking at asynchronous database requests in PHP using mysqlnd. The code works correctly, but when I compare pulling data from one reasonably sized table against pulling the same data split across multiple tables with asynchronous requests, I'm not getting anything like the performance I would expect, although the results vary considerably with the hardware setup.
As I understand it, rather than:
x = a + b + c + d
I should be achieving:
x = max(a, b, c, d)
Where x is the total time taken and a to d are the times for the individual requests. What I am actually seeing is a fairly minor improvement on some setups, and on others worse performance, as if the requests weren't asynchronous at all. Any thoughts, or experiences from others who have worked with this and run into the same thing, are welcome.
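The two timing models can be written out as a tiny helper. This is only an illustration: `expected_times()` is a hypothetical name, and the sample timings are made up, not measurements from this question.

```php
<?php
// Expected wall-clock time for a set of requests under the two models:
// run one after another (sum) or fully in parallel (max).
// expected_times() is a hypothetical helper; inputs are per-request times in seconds.
function expected_times(array $requestTimes): array
{
    return array(
        'sequential' => array_sum($requestTimes), // x = a + b + c + d
        'parallel'   => max($requestTimes),       // x = max(a, b, c, d)
    );
}

// Illustrative timings only, not real measurements.
$t = expected_times(array(1.0, 2.0, 3.0, 4.0));
printf("sequential: %.1f s, ideal parallel: %.1f s\n", $t['sequential'], $t['parallel']);
```

With these made-up inputs it prints `sequential: 10.0 s, ideal parallel: 4.0 s`; a measured total close to the sequential figure means the requests were effectively serialized somewhere.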
EDIT: Measuring the timings: the queries are spread over 10 tables. Individually, none of the queries takes more than around 8 seconds, and the times of the individual requests (run non-asynchronously) add up to around 18 seconds.
Performing the same requests asynchronously, the total query time is also around 18 seconds. So clearly the requests are not being executed in parallel against the database.
EDIT: The code used is exactly as shown in the mysqli_poll() example in the PHP documentation:
<?php
$link1 = mysqli_connect();
$link1->query("SELECT 'test'", MYSQLI_ASYNC);
$all_links = array($link1);
$processed = 0;
do {
    $links = $errors = $reject = array();
    foreach ($all_links as $link) {
        $links[] = $errors[] = $reject[] = $link;
    }
    if (!mysqli_poll($links, $errors, $reject, 1)) {
        continue;
    }
    foreach ($links as $link) {
        if ($result = $link->reap_async_query()) {
            print_r($result->fetch_row());
            if (is_object($result))
                mysqli_free_result($result);
        } else die(sprintf("MySQLi Error: %s", mysqli_error($link)));
        $processed++;
    }
} while ($processed < count($all_links));
?>
I'll expand on my comments and try to explain why you won't gain any performance with your current setup.
Asynchronous, in your case, means that retrieving the data is asynchronous with respect to the rest of your code. The two moving parts, getting the data and working with the data, are separate and are executed one after another, but only when the data arrives.
This implies that you want to use the CPU to its fullest, so you don't invoke PHP code until the data is ready.
For that to work, you must take control of the PHP process and make it use one of the operating system's event interfaces (epoll on Linux, or IOCP on Windows). Since PHP is either embedded into a web server (mod_php) or runs as its own standalone FastCGI server (php-fpm), asynchronous data fetching is best exploited in a CLI PHP script, because it's quite difficult to use event interfaces otherwise.
However, let's focus on your problem and why your code isn't faster.
You assumed that you are CPU bound, and your solution was to retrieve the data in chunks and process it that way. That's great; however, since nothing you do yields faster execution, you are 100% I/O bound.
Retrieving data from a database forces the hard disk to seek. No matter how much you "chunk" that, if the disk is slow and the data is scattered around it, that part will be slow, and creating more workers that each deal with part of the data will just make the system slower and slower, since each worker has the same problem retrieving its data.
I'd conclude that your issue lies in a slow hard disk and a data set that is too big and possibly poorly structured for chunked retrieval. I suggest updating this question, or creating another one, focused on retrieving the data faster and more efficiently.
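One more thing worth checking alongside the disk: the manual example uses a single connection, and a MySQL connection can only have one query in flight at a time, so querying 10 tables in parallel needs 10 connections. Below is a sketch that extends the manual example to one connection per query. `run_parallel()` and the `$connect` factory are hypothetical names; it assumes mysqlnd (which `fetch_all()` relies on) and SELECT statements. Even with separate connections, all queries still read from the same disk, so if the server is I/O bound as described above, this will not buy much.

```php
<?php
// Run several SELECTs at the same time, one connection per query.
// $connect is a hypothetical factory returning a new mysqli connection.
function run_parallel(array $queries, callable $connect): array
{
    $links = array();
    foreach ($queries as $key => $sql) {
        $link = $connect();
        $link->query($sql, MYSQLI_ASYNC); // returns immediately
        $links[$key] = $link;
    }

    $results = array();
    $pending = $links;
    while ($pending) {
        $read = $error = $reject = array_values($pending);
        $ready = mysqli_poll($read, $error, $reject, 1);
        if ($ready === false || $error || $reject) {
            throw new RuntimeException('mysqli_poll reported a failed connection');
        }
        if ($ready === 0) {
            continue; // nothing finished within 1 second; poll again
        }
        foreach ($read as $link) {
            $key = array_search($link, $pending, true);
            $result = $link->reap_async_query();
            if (!$result instanceof mysqli_result) {
                throw new RuntimeException('Query failed: ' . $link->error);
            }
            $results[$key] = $result->fetch_all(MYSQLI_ASSOC);
            $result->free();
            unset($pending[$key]);
        }
    }

    foreach ($links as $link) {
        $link->close();
    }
    return $results; // same keys as $queries
}
```

Usage would look like `run_parallel(array('t1' => 'SELECT * FROM table1', 't2' => 'SELECT * FROM table2'), function () { return new mysqli($host, $user, $pass, $db); })`, where the table names and credentials are placeholders.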
Related
Which one is best to choose: server-side or client-side?
I have a PHP function something like this:
function insert($argument)
{
    // do some heavy MySQL work, such as sp_call,
    // that takes about 1.5 seconds
}
I have to call this function about 500 times.
for ($i = 1; $i <= 500; $i++)
{
    insert($argument);
}
I have two options:
a) call it in a loop in PHP (server-side) --> the server may time out
b) call it in a loop in JavaScript (AJAX) --> takes a long time.
Please suggest the best one, or a third option if there is one.
If I understand correctly, your server still needs to do all the work, so you can't use the client's computer to reduce the load on your server. So you have a choice between the following:
1. Let the client ask the server 500 times. This easily lets you show the progress to the client, giving them the satisfaction of knowing that something is happening, or
2. Let the server do everything, which skips the 500 extra round trips and the extra overhead of processing 500 requests.
I would probably go with 1 if it's important that the client doesn't give up early, or 2 if it's important that the job is done all the way through, as the client might stop the requests after 300.
EDIT: With regard to your comment, I would then suggest having a "start work" button on the client that tells the server to start the job. Your server then tells a background service (which can be written in PHP) to do the work, and that service can record its progress in a file, a database or similar. The client and the PHP server are then free to time out and log out without problems, and you can refresh the page to see whether the background work has completed, reading the status from the database or file. That way you minimize both time and dependencies.
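The progress-file part of this suggestion could look roughly like the sketch below. The function names, the file path and the JSON layout are hypothetical choices; a database row would work just as well.

```php
<?php
// Hypothetical helpers for the progress-file idea: the background worker calls
// write_progress() as it goes, and the page the client refreshes calls read_progress().
function write_progress(string $file, int $done, int $total): void
{
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode(array('done' => $done, 'total' => $total)));
    // Replace the file in one step (atomic on POSIX filesystems),
    // so a reader never sees a half-written file.
    rename($tmp, $file);
}

function read_progress(string $file): ?array
{
    clearstatcache(true, $file); // don't rely on a cached "file missing" result
    if (!is_file($file)) {
        return null; // the job has not reported anything yet
    }
    return json_decode(file_get_contents($file), true);
}
```

In the worker, the loop from the question would call `write_progress($file, $i, 500)` after each `insert($argument)`; the status page returns `read_progress($file)` to the browser.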
You have not given any context for what you are trying to achieve - of key importance here are performance and whether a set of values should be treated as a single transaction.
The further the loop is from the physical storage (not just the DBMS), the bigger the performance impact. For most web applications the biggest performance bottleneck is the network latency between the client and the web server: even if you are relatively close, say 50 milliseconds away, and have keepalives working properly, it will take a minimum of 25 seconds (500 × 50 ms) to carry out this operation for 500 data items.
For optimal performance you should send the data to the DBMS in as few DML statements as possible. You've mentioned MySQL, which supports multi-row inserts, and if you're using MySQLi you can also submit multiple DML statements in the same database call (although the latter only eliminates the chatter between PHP and the DBMS, while a single DML statement inserting multiple rows also reduces the chatter between the DBMS and the storage). Depending on the data structure and optimization, inserting hundreds of rows this way should take in the region of tens of milliseconds; both methods will be much, MUCH faster than having the loop run in the client, even if the latency were 0.
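A sketch of the multi-row insert idea: build one INSERT with a placeholder group per row and bind all values at once. The helper names are hypothetical, and table and column names must come from trusted code, never from user input. This does not apply directly if each row really has to go through a stored procedure call.

```php
<?php
// Build "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ..." for $rowCount rows.
function build_multi_insert(string $table, array $columns, int $rowCount): string
{
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    return sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, $rowCount, $rowPlaceholder))
    );
}

// Flatten the rows into one parameter list, in the same order as the placeholders.
function flatten_rows(array $rows): array
{
    return array_merge(...array_map('array_values', $rows));
}
```

With PDO, usage would be `$stmt = $pdo->prepare(build_multi_insert('items', array('a', 'b'), count($rows))); $stmt->execute(flatten_rows($rows));` (`items`, `a` and `b` are placeholders). For very large sets, send the rows in chunks, since a single statement is limited by the server's `max_allowed_packet`.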
The length of time the transaction is in progress determines the likelihood of the transaction failing, so the faster method will be thousands of times more reliable than the Ajax method.
As Krycke suggests, using the client to do some of the work will not save resources on your system: there is the additional overhead of the web server, the PHP instances and the DBMS connections. Although these are relatively small, they add up quickly. If you test both approaches you will find that having the loop in PHP or in the database results in significantly less effort, and therefore greater capacity, on your server.
I once had a script that ran for tens of minutes. My solution was to start the long request through AJAX with a 1-second timeout and check for the result with other AJAX requests. The user experience is better than waiting a long time for a response from PHP without AJAX.
$.ajax({
...
timeout: 1000
})
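Because the AJAX request gives up after 1 second, the PHP script it started has to keep running without a client. A minimal sketch of the top of such a script, using PHP's built-in `ignore_user_abort()` and `set_time_limit()`; the actual work is left as a placeholder comment.

```php
<?php
// Top of the long-running script that the 1-second AJAX call starts.
// Keep executing even after the browser has given up on the connection.
ignore_user_abort(true);
// Lift max_execution_time for this script (0 means no limit).
set_time_limit(0);

// The long-running work goes here; it should store its result somewhere
// (file or database) where the polling AJAX requests can find it.
```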
So finally, this is what I concluded:
a) Use AJAX if you want to be sure that it will complete. It is also user-friendly, as the user gets regular responses between AJAX calls.
b) Use a server-side script if you are almost sure that the server will not go down in between and you want less load on the client.
Now I am using a server-side script with a waiting-message window; the user waits for the success message, or else has to try again.
The probability that it succeeds on the first attempt is 90-95%.
I can't get into too many specifics, as this is a project for work, but anyway...
I'm in the process of writing a SOAP client in PHP that pushes all responses to a MySQL database. My main script makes an initial SOAP request that retrieves a large set of items (approximately 4000 at the moment, but the list is expected to grow into the hundreds of thousands at some point).
Once this list of 4000 items is returned, I use exec("/usr/bin/php path/to/my/historyScript.php &"), which sends a history request for each item. The web service API supports up to 30 requests/sec. Below is some pseudocode for what I am currently doing:
$count = 0;
foreach ($items as $item)
{
    if ($count == 30)
    {
        sleep(1); // Sleep for one second before calling the next 30 requests
        $count = 0;
    }
    exec('/usr/bin/php path/to/history/script.php &');
    $count++;
}
The problem I'm running into is that I don't know when the processes finish, and my development server is starting to crash. Since the data is expected to grow, I know this is a very poor solution to my problem.
Is there a better approach I should consider for a task like this? I just feel that this is more of a 'hack'.
I am not sure, but I feel that the reason your application crashes is that you are keeping a large set of data in a PHP variable. Look into this: depending on the amount of RAM, that much data can lead to a system crash. My suggestion is to limit the amount of data coming in from the external service per request, rather than the number of requests to the service.
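A different technique from the one suggested above, offered as a sketch: replace the one-process-per-item `exec()` calls with a single worker that makes the calls itself and paces them. `process_rate_limited()` is a hypothetical name, and `$request` stands for a hypothetical callback that performs one SOAP history call.

```php
<?php
// One worker process instead of one exec()'d process per item.
// Each batch of $perSecond calls starts at least one second after the
// previous batch started, mirroring the "30, then sleep" idea in the question.
function process_rate_limited(iterable $items, int $perSecond, callable $request): int
{
    $count = 0;
    $batchStart = microtime(true);
    foreach ($items as $item) {
        if ($count > 0 && $count % $perSecond === 0) {
            // Wait out whatever is left of the current one-second window.
            $elapsed = microtime(true) - $batchStart;
            if ($elapsed < 1.0) {
                usleep((int) ((1.0 - $elapsed) * 1000000));
            }
            $batchStart = microtime(true);
        }
        $request($item);
        $count++;
    }
    return $count; // number of requests made
}
```

Because everything runs in one process, you know the work is finished when the function returns, and passing a generator as `$items` keeps memory flat as the list grows.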
I am new to working with large amounts of data. I am wondering if there are any best practices when querying a database in batches or if anyone can give any advice.
I have a query that will pull out all the data, and PHP is used to write the data to an XML file. There can be anywhere between 10 and 500,000 rows of data, so I have written the script to pull the data out in batches of 50, write them to the file, then get the next 50 rows, append them to the file, and so on. Is this OK, or should I be doing something else? Could I increase the batch size, or should I decrease it, to make the script run faster?
Any advice would be much appreciated.
Yes, for huge results it is recommended to use batches (for performance and memory reasons).
Here is a benchmark and example code for running a query in batches
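One common way to batch, shown as a sketch rather than as the method from the missing link: "keyset" pagination (`WHERE id > :last_id ORDER BY id LIMIT :n`) instead of an ever-growing `OFFSET`, which MySQL has to scan past on every batch. It assumes a unique, ascending, positive integer `id` column; `rows_in_batches()` and `$fetchBatch` are hypothetical names, with `$fetchBatch` expected to run that query and return rows as associative arrays.

```php
<?php
// Yield rows one at a time while fetching them from the database in batches.
// $fetchBatch($afterId, $limit) is a hypothetical callback running e.g.
//   SELECT ... FROM items WHERE id > ? ORDER BY id LIMIT ?
function rows_in_batches(callable $fetchBatch, int $batchSize): Generator
{
    $lastId = 0;
    do {
        $rows = $fetchBatch($lastId, $batchSize);
        foreach ($rows as $row) {
            $lastId = $row['id']; // the next batch starts after this row
            yield $row;
        }
    } while (count($rows) === $batchSize); // a short batch means we're done
}
```

The XML writer then simply does `foreach (rows_in_batches($fetch, 500) as $row)`, and the batch size becomes a single number to tune.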
The best way to do this depends on a couple of different things. Most important is when and why you are creating this XML file.
If you are creating the XML file on demand, and a user is waiting for the file then you'll need to do some fine tuning and testing for performance.
If it's something that's created on a regular basis, maybe a nightly or hourly task, and then the XML file is requested after it's built (something like an RSS feed builder) then if what you have works I would recommend not messing with it.
As far as performance goes, different things can help. Put some simple timers into your scripts, play with the number of records per batch, and see if there are any performance differences.
$start = microtime(true);
// process batch
$end = microtime(true);
$runTimeMilliseconds = ($end - $start) * 1000; // microtime(true) returns seconds
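To compare batch sizes systematically, the timer can be wrapped in a small loop. A sketch: `time_batch_sizes()` is a hypothetical name, and `$runWithBatchSize` stands for whatever function runs the full export with a given batch size.

```php
<?php
// Time one full run per candidate batch size and return the timings in
// milliseconds, fastest first (the keys are the batch sizes).
function time_batch_sizes(array $sizes, callable $runWithBatchSize): array
{
    $timings = array();
    foreach ($sizes as $size) {
        $start = microtime(true);
        $runWithBatchSize($size);
        $timings[$size] = (microtime(true) - $start) * 1000; // seconds to ms
    }
    asort($timings); // sort by time, keeping the batch size as the key
    return $timings;
}
```

For example, `print_r(time_batch_sizes(array(50, 500, 5000), 'export_xml'));`, where `export_xml` is a placeholder for your own export function.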
If the issue is user feedback, you may consider using AJAX to kick off each batch and report progress to the user. If you give the user feedback, they'll usually be happy to wait longer than if they're just waiting on the page to refresh in whole.
Also, check your SQL query to make sure there are no hidden performance penalties there. EXPLAIN (http://dev.mysql.com/doc/refman/5.0/en/explain.html) can show you how MySQL goes about processing your queries.
At the extreme, I'd imagine the best performance could be achieved through parallel processing. I haven't worked with it in PHP, but here's the primary reference: http://www.php.net/manual/en/refs.fileprocess.process.php
Depending on your hosting environment, you could find the total number of records and split them among subprocesses, each building its own XML fragment, and then combine the fragments. So process 1 might handle records 0 to 99, process 2 records 100 to 199, and so on.
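The splitting step is simple arithmetic. A sketch with a hypothetical `split_ranges()` helper that divides the record count as evenly as possible, with the last range taking whatever is left:

```php
<?php
// Split $total records into at most $workers contiguous, inclusive ranges.
function split_ranges(int $total, int $workers): array
{
    if ($workers < 1) {
        throw new InvalidArgumentException('need at least one worker');
    }
    $size = (int) ceil($total / $workers);
    $ranges = array();
    for ($start = 0; $start < $total; $start += $size) {
        $ranges[] = array($start, min($start + $size, $total) - 1);
    }
    return $ranges;
}
```

`split_ranges(200, 2)` gives `[[0, 99], [100, 199]]`, matching the example above. Each pair could then be handed to a subprocess (for instance a hypothetical `build_fragment.php FIRST LAST`), and the fragments concatenated in range order.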
You would be surprised: ONE simple select-all without a limit is the fastest,
because it only queries the database once,
and everything else is processed locally.
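<?php
// $link is assumed to be an existing mysqli connection
$sql = "SELECT all_columns FROM table";
// set a very high memory limit ('1024M' is only an example value)
ini_set('memory_limit', '1024M');
// query without LIMIT; if you can avoid sorting, even better
$result = mysqli_query($link, $sql);
// iterate over the MySQL result and store each row in an array
$results = array();
while ($row = mysqli_fetch_assoc($result)) {
    $results[] = $row;
}
// free the MySQL result
mysqli_free_result($result);
// write the XML out once every thousand rows,
// because building the XML is what consumes MOST memory
$len = count($results);
for ($i = 0; $i < $len; ++$i)
{
    $arr = $results[$i];
    // do any XML preparation
    // don't forget that file writes are expensive too
    if ($i % 1000 == 0 && $i > 0)
    {
        // write to file
    }
}
// write whatever XML is left after the last full thousand
?>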
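If memory is the concern, an alternative to holding all rows in an array is to stream the XML to disk with PHP's built-in XMLWriter while iterating over the rows, flushing periodically. This is a sketch of that different technique, assuming each row is an associative array whose keys are valid XML element names; `write_rows_as_xml()` is a hypothetical name.

```php
<?php
// Stream rows into an XML file, flushing the writer's buffer every $flushEvery rows.
// $rows can be any iterable: an array, a generator, or a mysqli_result.
function write_rows_as_xml(string $file, iterable $rows, int $flushEvery = 1000): int
{
    $xml = new XMLWriter();
    $xml->openUri($file);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('rows');
    $n = 0;
    foreach ($rows as $row) {
        $xml->startElement('row');
        foreach ($row as $column => $value) {
            $xml->writeElement($column, (string) $value);
        }
        $xml->endElement(); // </row>
        if (++$n % $flushEvery === 0) {
            $xml->flush(); // write buffered XML to the file
        }
    }
    $xml->endElement(); // </rows>
    $xml->endDocument();
    $xml->flush();
    return $n; // number of rows written
}
```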
The best way to go about this is to schedule it as a cron job, which I think is the best solution for batch processing in PHP. Check this link for more info: Batch Processing in PHP. Hope this helps.
I'm attempting to make a PHP script that loads the current weather forecast. It uses a bit of XML pre-processing to digest the input, but it is accessed and reloaded quite often. The problem is that my current host (and yes, I do understand why) limits the amount of processing power a script can use.
Currently the script takes an entire process for every execution, which lasts around 3 seconds. I'm limited to 12 processes, yet I get quite a few pings.
My question to you guys is: what methods, if any, can I use to cache the output of a script so that it does not have to re-process something it already did 5 minutes ago? Since it is weather, I can tolerate a time difference of up to 2 hours.
I am quite familiar with PHP too, so don't worry xD.
~Thank you very much,
Jonny :D
You could run a cron job that generates the weather forecast data and then just display the whole thing from cache. You could use APC so that it is always loaded in memory (plus all the other added advantages).
The Zend Framework provides the Zend_Cache object with multiple backends (File, memcached, APC). Or you can roll your own with something like:
$cacheFile = "/path/to/cache/file";
$ttl = 60; // 60-second time to live
if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $ttl) {
    $data = getWeatherData(); // Go off and get the data
    file_put_contents($cacheFile, serialize($data));
} else {
    $data = unserialize(file_get_contents($cacheFile));
}
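The same roll-your-own cache can be wrapped in a reusable function that also writes the cache file atomically, so a visitor arriving mid-refresh never reads a half-written file. A sketch: `cache_remember()` is a hypothetical name, and the producer callback stands in for `getWeatherData()`.

```php
<?php
// Return cached data if the cache file is younger than $ttl seconds,
// otherwise call $produce(), store its result, and return it.
function cache_remember(string $cacheFile, int $ttl, callable $produce)
{
    clearstatcache(true, $cacheFile); // don't trust a stale cached mtime
    if (is_file($cacheFile) && time() - filemtime($cacheFile) <= $ttl) {
        return unserialize(file_get_contents($cacheFile));
    }
    $data = $produce();
    // Write to a temporary file first, then move it into place in one step.
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, serialize($data));
    rename($tmp, $cacheFile);
    return $data;
}
```

Given the 2-hour tolerance in the question, usage could be `$data = cache_remember('/path/to/cache/file', 7200, 'getWeatherData');`.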
Please post a code snippet so we can see what kind of processing you are doing. Consider using Xdebug to profile and optimize your code.
You can also use a benchmarking tool such as ab (ApacheBench) to see how many processes your server can handle.
There are several different caching mechanisms available, but without seeing what kind of processing you are doing it is hard to say...
3 seconds is an extremely long execution time. As already asked, some code would be nice, to see how you process the 'input' and what format that input is in.
A quick and dirty article about caching script output to a file can be found here:
http://codestips.com/?p=153
I have a scheduled task that runs a script on a regular basis (every hour). This script does some heavy interaction with the database and filesystem and regularly takes several minutes to run. The problem is that the server's CPU usage spikes while the script is running and slows down normal operations. Is there a way to throttle this process so that it takes longer but does not consume as many resources?
I've looked at different configuration options for PHP but there does not appear to be any that fit my needs.
Setting memory_limit in php.ini to something lower causes my data objects to overflow quite easily.
I've seen similar posts where people suggested using sleep() at certain points in the script but that does not prevent the script from spiking the server.
The optimal solution would be some way to tell the LAMP (in this case WAMP) stack to use at most 10% CPU. I'm not concerned at all about runtime and would prefer that it take longer if it means saving CPU cycles per second. My alternative solution would be to set up a different server with database replication so the cron job could go to town without slowing everything else down.
Environment: Windows Server 2k3, Apache 2.2.11, PHP 5.2.9, MySQL 5.1
I appreciate any insight to this situation.
EDIT: I appreciate all the answers, even the ones that are *nix-specific. It's still early enough in my situation to change the hosting environment. Hopefully this question will help others out regardless of the OS.
This is a tricky problem. If you're running the PHP script from the command line, you can set the process's scheduling priority to low (start /low php.exe myscript.php, I believe). If your PHP script itself is doing most of the processing that's eating your CPU, this might work. However, you said you are doing some heavy database and filesystem interaction, which this solution will not help with. There seems to be a MySQL hint, LOW_PRIORITY, for INSERT and UPDATE queries that may help you there, but I have not tried it.
You can set processes in Windows to a lower priority. I'm not sure how the process is being kicked off, but if you set it to a really low priority, whatever else wants CPU resources will get them first.
On UNIX (LAMP) I managed to solve the problem by checking the server load before continuing the loop:
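function get_server_load() {
    // Load averages are not available on Windows: return null there.
    // (Checking the first three letters avoids also matching "Darwin".)
    if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
        return null;
    }
    if (function_exists('sys_getloadavg')) {
        $load = sys_getloadavg(); // array(1 min, 5 min, 15 min), or false
        if ($load !== false) {
            return $load;
        }
    }
    if (file_exists('/proc/loadavg')) {
        $load = explode(' ', file_get_contents('/proc/loadavg'));
        return array((float) $load[0], (float) $load[1], (float) $load[2]);
    }
    if (function_exists('shell_exec')) {
        // uptime prints "... load average: 0.10, 0.20, 0.30" ("averages:" on some systems)
        $out = (string) shell_exec('uptime');
        if (preg_match('/load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/', $out, $m)) {
            return array((float) $m[1], (float) $m[2], (float) $m[3]);
        }
    }
    return null;
}

// $items stands for whatever the script loops over
foreach ($items as $item) {
    // wait until the 1-minute load average drops below 0.2 before doing the work
    while (($load = get_server_load()) !== null && $load[0] >= 0.2) {
        sleep(1);
    }
    // process $item here
}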
On Windows the function returns null, so the loop does not throttle at all there. On Linux it gives you back an array with the load averages of the last 1, 5 and 15 minutes.
Also, consider starting your scripts (if run from the CLI) with a lower priority (on Linux, use "nice").
You can also check other values before continuing the loop, such as the number of active Apache processes (you can parse the page 127.0.0.1/server-status?auto if you have enabled mod_status in httpd.conf), or the state of MySQL (active connections?).
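Parsing the mod_status page for the number of busy Apache workers could look like the sketch below. `busy_workers()` is a hypothetical name; it only parses text, so it can be tested without a running Apache.

```php
<?php
// Extract the BusyWorkers count from mod_status's machine-readable output
// (the page requested with "?auto"). Returns null if the line is missing.
function busy_workers(string $statusText): ?int
{
    if (preg_match('/^BusyWorkers:\s*(\d+)/m', $statusText, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

The text itself could be fetched with `file_get_contents('http://127.0.0.1/server-status?auto')` (this needs `allow_url_fopen`); the loop would then sleep while `busy_workers(...)` is above a threshold of your choosing.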
Can you alter your cron entry to launch your script using nice?
It's not a good idea to use the same server both for serving clients and for analysing data.
So if you are looking for a final solution, redesign your application a little and offload the data analysis from the front ends and the live database to another system dedicated to this task.
Even if you can successfully throttle the analyzer, it would use up precious resources that would otherwise be available to serve the users.
This might be a difficult change, but it may be worth refactoring your data structures into iterators. Also, if you have circular references in your code, provide a method like clearReferences() that unsets these objects. This problem is solved in PHP 5.3, by the way.
So if you have:
class Row
{
    protected $_table;
    public function __construct($table)
    {
        $this->_table = $table;
    }
}
class Table
{
    protected $_row;
    public function __construct()
    {
        $this->_row = new Row($this);
    }
}
Add a clearReferences() method to the Row class:
class Row
{
    public function clearReferences()
    {
        $this->_table = null;
    }
}
That's all I can come up with for the moment.
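The "iterators" suggestion can be sketched with a generator that builds one data object per row only when it is asked for, so just the current object has to be in memory as long as the caller does not collect them all. Note that generators need PHP 5.5+ (this version also uses the `iterable` type, PHP 7.1+), so on the question's PHP 5.2.9 a class implementing `Iterator` would be needed instead. `objects_from_rows()` is a hypothetical name.

```php
<?php
// Lazily turn rows into objects, one at a time.
// $rows can be any iterable, e.g. an array or a mysqli_result.
function objects_from_rows(iterable $rows, callable $makeObject): Generator
{
    foreach ($rows as $row) {
        yield $makeObject($row);
    }
}
```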
I have a bunch of scripts that I run from cron in a similar way using nice:
0 * * * * nice -n19 php myscript.php
This won't help the RAM utilization (only changing the way the script is written can do that), but it only uses the CPU that would otherwise be idle.
EDIT: I didn't see that the question involves a Windows environment, sorry... I'm leaving this in for any *nix users with the same problem.
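The same effect as `nice` can also be had from inside the script with PHP's `proc_nice()`, which raises the process's niceness by the given increment. A sketch; `lower_own_priority()` is a hypothetical wrapper, and the `function_exists()` check is there because `proc_nice()` only became available on Windows in PHP 7.2.

```php
<?php
// Lower this script's own CPU priority (a positive increment means "nicer").
function lower_own_priority(int $increment): bool
{
    if (!function_exists('proc_nice')) {
        return false;
    }
    return proc_nice($increment);
}
```

Calling `lower_own_priority(10);` at the top of the cron script is then roughly equivalent to launching it with `nice -n10`.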
Perhaps your script is simply trying to do too much all at once. Would it do less per run if it ran three times an hour?
Another solution might be to set up an additional server just for running this sort of 'backend' processing. This would be particularly effective if the script is not putting undue load on the database, just on the web server.
Yet another approach is to look at whether its work can be divided in a different direction. These sorts of scripts often have a few big SQL statements whose results are used to generate a whole lot of little SQL statements. If the latter could be put aside somewhere, they could be run against the database as a later step. Such an approach might also let you use an unbuffered query to fetch the pre-processing data, which could significantly cut down on the PHP code's memory consumption.
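An unbuffered query in mysqli is requested with `MYSQLI_USE_RESULT`. A sketch under the assumption that `$link` is an open mysqli connection; `each_row_unbuffered()` is a hypothetical name. While an unbuffered result is open, the same connection cannot run other queries ("Commands out of sync"), which fits well with setting the small statements aside for a later step or a second connection.

```php
<?php
// Stream rows from the server one at a time instead of copying the whole
// result set into PHP memory first. Returns the number of rows handled.
function each_row_unbuffered(mysqli $link, string $sql, callable $handle): int
{
    $result = $link->query($sql, MYSQLI_USE_RESULT);
    if ($result === false) {
        throw new RuntimeException($link->error);
    }
    $n = 0;
    while ($row = $result->fetch_assoc()) {
        $handle($row); // e.g. record the small follow-up statements for later
        $n++;
    }
    $result->free();
    return $n;
}
```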
If you have it (Apache) running as a service, you can change the priority settings in the Windows control center / Services. Your CPU usage will spike anyway, but other programs will be preferred by the scheduler.
Also try putting the database/server on a different hard disk than your applications.