I am interested to know if there is a better way to time operations. Relying on a timer, as microtime() or any other method that reads the OS time does, clearly will not give as good an estimate.
I am also interested to know, in general, what the precision of timing operations is nowadays on a Linux-based operating system, say Red Hat Linux.
Now, to be more specific: I recently wrote an algorithm and tried to measure how long it took. My PHP code worked like this:
$start = microtime(true);
$result = myTimeConsumingProcess();
$end = microtime(true);
$timeInMilliSec = 1000.0 * ($end - $start);
It turns out that sometimes the process took 0 ms, and on a different execution with precisely the same data it took 9.9xxx milliseconds.
The explanation for this, as far as I can imagine, is that time is measured using timer interrupts. If the execution starts and finishes before the next timer interrupt updates the OS time, we are bound to get 0 ms as the difference.
It used to be the case, from my early DOS days, that a timer interrupt fired roughly 18.2 times per second, i.e. about every 55 ms. That no longer seems to be the case; with better machines and upgraded OSes we are now running timers faster than that.
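For what it is worth, here is a minimal sketch of a higher-resolution measurement, assuming PHP 7.3 or newer where hrtime() is available; myTimeConsumingProcess() is the same hypothetical function as above:

<?php
// hrtime(true) returns a monotonic counter in nanoseconds, so it is not
// affected by wall-clock adjustments and offers the finest resolution PHP exposes.
$startNs = hrtime(true);
$result = myTimeConsumingProcess(); // the function being timed
$elapsedMs = (hrtime(true) - $startNs) / 1e6;
printf("Elapsed: %.6f ms\n", $elapsedMs);

For operations that finish faster than the clock can resolve, repeating the operation in a loop and dividing by the iteration count is still the more reliable approach.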
Reason
I've been building a system that pulls data from multiple JSON sources. The data being pulled is constantly changing, and I'm recording the changes to a SQL database via a PHP script. Nine times out of ten the data is different and therefore needs recording.
The JSON needs to be checked every single second. I've been successfully using a cron task every minute with a PHP function that loops 60 times, once per second.
The problem I'm now having is that the more JSON sources I want to check, the slower the PHP file runs, meaning the next cron gets triggered before the previous one has finished. It's all starting to feel way too unstable and hacky.
Question
Assuming the PHP script is already the most efficient it can be, what else can be done?
Should I be using multiple cron tasks?
Should something other than PHP be used?
Are cron tasks even suitable for this sort of problem?
Any experience, best practices or just plain old help will be very much appreciated.
Overview
I'm monitoring for active race sessions and recording each driver and then each lap a driver completes. Laps are recorded only once a driver crosses the start/finish line and I do not know when race sessions may or may not be active or when a driver crosses the line. Therefore I have been checking every second for new data to record.
Each venue where a race session may be active has a separate URL to receive JSON data from. The more venues I add to my system to monitor, the longer the script takes to run.
I currently have 19 venues and the script takes circa 12 seconds to complete. Since I'm running a cron job every minute and looping the script every second, I'm assuming I have at the very least 12 scripts running every second. It just doesn't seem like the most efficient way to do it to me. Of course, it worked a charm back when I was only checking a single venue.
There's a cycle to your operations. It looks like this (a rough sketch in code follows the list):
Start your process by reading the time with $starttime = time();.
Compute the next scheduled time by taking that time plus 60 seconds: $nexttime = $starttime + 60;.
Do the operations you must do (read a mess of JSON feeds).
Compute how long is left in the minute: $timeleft = $nexttime - time();.
Sleep until the next scheduled time: if ($timeleft > 0) sleep($timeleft);.
Set $starttime = $nexttime;.
Jump back to step 2.
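Stitched together, the cycle above might look like the following sketch; readAllJsonFeeds() is only a stand-in name for whatever work your script does each minute:

<?php
$starttime = time();               // step 1: read the time
while (true) {
    $nexttime = $starttime + 60;   // step 2: next scheduled time
    readAllJsonFeeds();            // step 3: do the work (stand-in name)
    $timeleft = $nexttime - time(); // step 4: time left in the minute
    if ($timeleft > 0) {
        sleep($timeleft);          // step 5: sleep until the next slot
    }
    $starttime = $nexttime;        // step 6: advance the schedule
}                                  // step 7: loop back to step 2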
Obviously, if $timeleft is ever negative, you're not keeping up with your measurements. If $timeleft is always negative, you will get further and further behind.
The use of cron every minute is probably wasteful, because it takes resources to fire up a new process and get it going. You probably want to make your process run forever, and use a shell script that monitors it and restarts it if it crashes.
This is all pretty obvious. What's not so obvious is that you should keep track of your individual $timeleft values for each minute over your cycle of measurements. If they vary daily, you should track for a whole day. If they vary weekly you should track for a week.
Then you should look at the worst (smallest) values of $timeleft. If your 95th percentile is less than about 15 seconds, you're running out of resources and you need to take action. You need a margin like 15 seconds so your system doesn't move into overload.
If your system has zero tolerance for late sampling of data, you should look at the single worst value of $timeleft, not the 95th percentile. You should give yourself a more generous margin than 15 seconds.
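Here is a sketch of that bookkeeping, assuming each minute's $timeleft has been appended to an array called $timeleftLog (a name used here purely for illustration):

<?php
sort($timeleftLog); // ascending, so the worst (smallest) values come first
$worst = $timeleftLog[0]; // single worst value
$p95index = (int) floor(0.05 * count($timeleftLog)); // the value that 95% of samples exceed
$p95 = $timeleftLog[$p95index];
if ($p95 < 15) {
    echo "95th percentile margin is {$p95}s: the system is close to overload.\n";
}
if ($worst < 0) {
    echo "Worst minute overran by " . (-$worst) . "s: samples were missed.\n";
}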
So-called hard real time systems allocate a time slot to each operation, and crash if the operation exceeds the time slot. In your case the time slot is 60 seconds and the operation is reading a certain number of feeds. Crashing is pretty drastic, but measuring is mandatory.
The simplest action to take is to start running multiple worker processes. Give some of your feeds to each process. PHP runs single-threaded, so multiple processes will probably help, at least until you get to three or four of them.
Then you will need to add another computer, and divide your feeds among worker processes on those multiple computers.
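As a rough sketch of what one such worker might look like, where worker.php, the feed-list files and recordChanges() are all names invented here for illustration; each worker would run the same 60-second cycle shown earlier, but only over its own share of venues:

<?php
// worker.php: handles one slice of the feeds.
// Launch several, each with its own list, e.g.:
//   php worker.php feeds_a.txt &
//   php worker.php feeds_b.txt &
$feedUrls = file($argv[1], FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($feedUrls as $url) {
    $json = file_get_contents($url); // only this worker's venues
    if ($json !== false) {
        recordChanges(json_decode($json, true)); // hypothetical: write the changes to SQL
    }
}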
A language environment that parses JSON faster than php does might help, but only if the time it takes to parse the JSON is more important than the time it takes to wait for it to arrive.
I'm using the DigitalOcean API to create droplets when my web application needs the extra resources. Because of the way DigitalOcean charges (min. one hour increments), I'd like to keep a created server on for an hour so it's available for new tasks after the initial task is completed.
I'm thinking about formatting my script this way:
<?php
createDroplet($dropletid);
$time = time();
// run resource heavy task
sleep($time + 3599);
deleteDroplet($dropletid);
Is this the best way to achieve this?
It doesn't look like a good idea, but the code is so simple, nothing can compete with that. You would need to make sure your script can run, at least, for that long.
Note that sleep() should not have $time as an argument. It sleeps for the given number of seconds. $time contains many, many seconds.
I am worried that the script might get interrupted, and you will never delete the droplet. Not much you can do about that, given this script.
Also, the sleep() itself might get interrupted, causing it to sleep much shorter than you want. A better sleep would be:
$remainingTime = 3590;
do {
    $remainingTime = sleep($remainingTime);
} while ($remainingTime > 0);
This will catch an interrupt of sleep(). Note that sleep() returns FALSE on error; FALSE is not greater than 0, so the loop still exits in that case. See the manual on sleep(): http://php.net/manual/en/function.sleep.php
Then there's the problem that you want to sleep for exactly 3599 seconds, so that you're only charged one hour. I wouldn't make it so close to one hour. You have to leave some time for DigitalOcean to execute stuff and log the time. I would start with 3590 seconds and make sure that always works.
Finally: what are the alternatives? Clearly this could be a cron job. How would that work? Suppose you execute a PHP script every minute, and you have a database entry that tells you which resource to allocate at a certain start time and deallocate at a certain expire time. Then that script could do this for you with an accuracy of about a minute, which should be enough. Even if the server crashes and restarts, as long as the database is intact and the script runs again, everything should go as planned. I know, this is far more work to implement, but it is the better way to do it.
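As a rough sketch of that cron-driven approach, assuming a hypothetical droplets table with droplet_id, expires_at (Unix timestamp) and deleted columns, and the deleteDroplet() function from the question:

<?php
// Run from cron every minute: delete droplets whose paid-for hour has expired.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT droplet_id FROM droplets WHERE expires_at <= :now AND deleted = 0');
$stmt->execute([':now' => time()]);
foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $dropletId) {
    deleteDroplet($dropletId); // the function from the question
    $pdo->prepare('UPDATE droplets SET deleted = 1 WHERE droplet_id = ?')->execute([$dropletId]);
}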
I have some PHP code that executes some server intensive code with a lot of calculations and loops. It typically takes 2 to 10 seconds for it to run. But as the code is written, it should always finish (no infinite loops) and it should never take more than 10 seconds. However, I randomly get these situations where it takes around 60 seconds for the page to load in my browser, yet the PHP microtime function is telling me it only took 2 seconds.
$start_time = microtime(TRUE);
// All my various loops and calculations
$end_time = microtime(TRUE);
$g = number_format(($end_time - $start_time), 4);
echo "<br>Total Time " . $g;
I run my own Ubuntu 12 instance, so I have complete control over server processes. I'm not a very experienced server admin though. My php.ini max_execution_time is set to 30, yet when I have this problem it takes well over 30 seconds for the page to finally appear in my browser (and then it tells me it ran in 2 seconds). What I'm wondering is whether this could simply be a network issue. If so, how do I know whether it is on my ISP's side or the data center's side? Could it also be something at the Apache level, like some buffer filling up and delaying the sending of the page?
I wanted to know if there is a way to prevent a PHP timeout from occurring once a certain part of the code has started being processed.
Let me explain:
I have a script whose execution takes way too long to even make use of
ini_set('max_execution_time', 0);
set_time_limit(0);
The code is built to allow it to time out and restart where it left off, but I have 2 lines of code that need to be executed together for that to work:
$email->save();
$this->_setDoneWebsiteId($user['id'], $websiteId);
Is there a way in PHP to tell it that it has to finish executing both of them even if the timeout is triggered?
I got an idea as I was writing this: I could use a timeout of 120 seconds, start a timer, and stop if there are fewer than 20 seconds left before the timeout. I just wanted to know if I was missing something.
Thank you for your inputs.
If your code is not synchronous and some task takes more than 100 seconds - you'll not be able to check the execution time.
I see only one option, and it is truly a HACK (be careful, and test it with php -f in the console so you are able to kill the process):
<?php
// Any preparations here
register_shutdown_function(function () {
    if (error_get_last()) { // there was a "maximum execution time exceeded" error
        // Call the rest of your system
        // But note: you have no stack, no context, no valid previous frame - nothing!
    }
});
One thing you could do is use the date/time functions to monitor your average execution time (it kind of builds up with each execution, assuming you have a loop).
If the average time is then longer than however much time you have left (you would be counting how much time has already been used against your maximum execution time), you would trigger a restart and let it pick up from where it left off.
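A small sketch of that idea, where $items and processItem() are placeholders for the real loop, with a 120-second budget and a 20-second safety margin assumed as in the question:

<?php
$limit = 120;  // seconds we allow ourselves
$margin = 20;  // safety margin before the hard timeout
$start = microtime(true);
$completed = 0;
foreach ($items as $item) {
    $elapsed = microtime(true) - $start;
    $average = $completed > 0 ? $elapsed / $completed : 0;
    // Not enough room for another iteration of average length? Stop and restart later.
    if ($elapsed + $average > $limit - $margin) {
        break;
    }
    processItem($item); // placeholder for the real work
    $completed++;
}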
However, if you are experiencing timeouts, you might want to look at ways to make your code more efficient.
No, you can't abort the timeout handler, but I'd say that 20 seconds is quite a lot of time if you're not parsing something huge. However, you can do the following:
Get the time of the execution start ($start = $_SERVER['REQUEST_TIME'] or just $start = microtime(true); at the beginning of your controller).
Assert that the execution time is less than 100 seconds before running $email->save(), or halt/skip the code if necessary. This is as easy as if (microtime(true) - $start < 100) { $email->save()... }. You would like more abstraction on this, however.
(If necessary) check the execution time after both methods have run and halt execution if it has timed out.
This will require time_limit to be set to a big value or even turned off, plus tedious work to prevent overly long execution. In most cases the timeout is your friend: it just tells you that your work is taking too much time and you should rethink your architecture and inner processes. If you're running out of 120 seconds, you'll probably want to hand that work off to a daemon.
Thank you for your input; as I thought, the timer solution is the best way.
What I ended up doing was the following. This is not the actual code, as it's too long to make a good answer, but it gives the general idea.
ini_set('max_execution_time', 180);
set_time_limit(180);
$startTime = date('U'); // gives me the current timestamp
while (true) {
    // gather all the data I need for the email

    // I know it's overkill to break the loop with 1 min remaining,
    // but I really don't want to take chances
    if (date('U') < ($startTime + 120)) {
        $email->save();
        $this->_setDoneWebsiteId($user['id'], $websiteId);
    } else {
        return false;
    }
}
I could not use the idea of measuring the average time of each cycle, as it varies too much.
I could have made the code more efficient, but the number of cycles is based on the number of users and websites in the framework. It will grow big enough to need multiple runs to complete anyway.
I'll have to do some research to understand register_shutdown_function, but I will look into it.
Again, thank you!
I've REALLY been wanting to test the speed of regexes etc., and php.net has this example:
$time_start = microtime(true);
// Sleep for a while
usleep(100); // Or anything for that matter..
$time_end = microtime(true);
$time = $time_end - $time_start;
echo "Did nothing in $time seconds\n";
EDIT: What I meant was to put a large loop of function calls in place of usleep(). It always shows a very random number that is always under 1. It shows nothing worth a benchmark!
Timers are like that. Wrap a loop around the function. Run it 10^6 times if you want to measure microseconds, 10^3 times if you want milliseconds. Don't expect to get more than 2 decimal digits of precision.
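For instance, here is a minimal sketch of that approach, using an arbitrary regex as the code under test since regexes were what the question wanted to benchmark:

<?php
$iterations = 1000000; // ~10^6 runs to measure microsecond-scale work
$time_start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    preg_match('/\d+/', 'abc 123 def'); // the code under test
}
$time_end = microtime(true);
$total = $time_end - $time_start;
printf("Total: %.4f s, average: %.3f microseconds per call\n", $total, ($total / $iterations) * 1e6);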
The usleep function needs the time in microseconds; you are specifying a very low value, so try increasing it:
usleep(2000000); // wait for two seconds
The usleep function is also known to be tricky on Windows systems.
Or you can simply use the sleep function, with the number of seconds specified as its parameter, e.g.:
sleep(10); // wait for 10 seconds
Cannot reproduce, and no one has a faster computer than me. 16 threads of execution, eight cores, i7 architecture. Your PHP build or OS must be suboptimal. This is running your code exactly.
`--> php tmp.php
Did nothing in 0.00011587142944336 seconds
.-(/tmp)---------------------------------------------------(pestilence#pandemic)-
This is with usleep(1000000)... (1 second)
`--> php tmp.php
Did nothing in 1.0000479221344 seconds
.-(/tmp)---------------------------------------------------(pestilence#pandemic)-
FYI, I was using PHP 5.3.0, not that it makes any difference (in this case) past 5.0.
Is your code taking less than a second to execute? If so, that'd explain your benchmarks. When I profile something for an article, I run my routine 10,000's of times... sometimes millions. Computers are fast these days.
As far as the "random" part goes, this is why I take several samples of large iterations and average them together. It doesn't hurt to throw the standard deviation in there too. Also, remember not to stream MP3s or play video while testing. :p