Simple PHP Script Makes Heavy Server Load - php

I've got a simple PHP script that, once run, makes it impossible for me to access any other page on the server.
The script is as simple as this:
for($league=11387; $league<=11407; $league++){
for($i=1; $i<9; $i++){
//gets the team object here from external resource
$team = $HT->getYouthTeam($HT->getTeam($HT->getLeague($league)->getTeam($i)->getTeamId())->getYouthTeamId());
if($team->getId() != 2286094){
$youthTeams[] = $team;
}
set_time_limit(10);
}
}
Obviously, I am supposed to get thousands of "teams" here (except the one with the ID 2286094), but once I run this script I cannot open any other page on the server until it finishes, and it takes a long time for the script to fetch the results into the $youthTeams array.
My intent was to make a progress bar that would show exactly (in %) how far along the script is, but I can't, because while this script runs the server does not serve any other pages (any other page shows as "loading" but never loads).
Also, an additional sub-question: once all of this data is fetched, would it be smart to insert it all into the MySQL database in one single query?
I really wanna learn more on this and want to get this finished so please help me out on this one.

Maybe you can identify which one of your lookups eats the most time by checking on the times?
$t0 = microtime(true);
$teamid = $HT->getLeague($league)->getTeam($i)->getTeamId();
echo "lookup teamid: " . (($t1 = microtime(true)) - $t0) . "<br>";
$youthteamid = $HT->getTeam($teamid)->getYouthTeamId();
echo "lookup youthteamid: " . (($t2 = microtime(true)) - $t1) . "<br>";
$youthteam = $HT->getYouthTeam($youthteamid);
echo "lookup youthteam: " . (($t3 = microtime(true)) - $t2) . "<br>total time: " . ($t3 - $t0) . "<br>";
// the ID can only be checked once the youth team has been fetched
if ($youthteam->getId() != 2286094) {
    $youthTeams[] = $youthteam;
}

Related

Updating page info using jQuery from a PHP script that performs an external connection

I have a PHP script that performs a connection to my other server using file_get_contents, and then retrieves and displays the data.
//authorize connection to the ext. server
$xml_data=file_get_contents("http://server.com/connectioncounts");
$doc = new DOMDocument();
$doc->loadXML($xml_data);
//variables to check for name / connection count
$wmsast = $doc->getElementsByTagName('Name');
$wmsasct = $wmsast->length;
//start the loop that fetches and displays each name
for ($sidx = 0; $sidx < $wmsasct; $sidx++) {
$strname = $wmsast->item($sidx)->getElementsByTagName("WhoIs")->item(0)->nodeValue;
$strctot = $wmsast->item($sidx)->getElementsByTagName("Sessions")->item(0)->nodeValue;
/**************************************
Display only one instance of their name.
strpos will check to see if the string contains a _ character
**************************************/
if (strpos($strname, '_') !== FALSE){
//null. ignoring any duplicates
}
else {
//Leftovers. This section contains the names that are only the BASE (no _jibberish, etc)
echo $sidx . " <b>Name: </b>" . $strname . " Sessions: " . $strctot . "<br />";
}//end display base check
}//end name loop
From the client side, I'm calling this script using jQuery load() and triggering it on mousemove().
$(document).mousemove(function(event){
$('.xmlData').load('./connectioncounts.php').fadeIn(1000);
});
And I've also experimented with setInterval, which works just as well:
var auto_refresh = setInterval(
function ()
{
$('.xmlData').load('./connectioncounts.php').fadeIn("slow");
}, 1000); //refresh, 1000 milli = 1 second
It all works and the contents appear in "real time", but I can already notice an effect on performance and it's just me using it.
I'm trying to come up with a better solution but falling short. The problem with what I have now is that each client would be forcing the script to initiate a new connection to the other server, so I need a solution that will consistently keep the information updated without involving the clients making a new connection directly.
One idea I had was to use a cron job that executes the script, and modify the PHP to log the contents. Then I could simply get the contents of that cache from the client side. This would mean that there is only one connection being made instead of forcing a new connection every time a client wants the data.
The only problem is that the cron would have to be run frequently, like every few seconds. I've read about people running cron this much before, but every instance I've come across isn't making an external connection each time as well.
Is there any option for me other than cron to achieve this or in your experience is that good enough?
How about this:
When the first client reads your data, you retrieve them from the remote server and cache them together with a timestamp.
When the next clients read the same data, you check how old the cached contents are, and only if they are older than 2 seconds (or whatever) do you access the remote server again.
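A minimal sketch of that timestamp check, assuming a plain file is acceptable as the cache. The helper name cached_fetch, the cache file path and the callback are all made up for illustration, and $fetch is assumed to return a string:

```php
<?php
// Hypothetical helper: return the cached data if it is fresh enough,
// otherwise call $fetch and store its result; the file's mtime is the timestamp.
function cached_fetch(string $cacheFile, int $maxAgeSeconds, callable $fetch): string
{
    // PHP caches stat() results, so drop them before checking the age.
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) <= $maxAgeSeconds) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache is missing or too old: this is the only place the remote server is hit.
    $fresh = $fetch();

    // Write to a temp file and rename it, so readers never see a half-written cache.
    $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $fresh);
    rename($tmp, $cacheFile);

    return $fresh;
}
```

In connectioncounts.php the direct file_get_contents call could then become something like $xml_data = cached_fetch(sys_get_temp_dir() . '/connectioncounts.cache', 2, function () { return file_get_contents("http://server.com/connectioncounts"); }); so that however many clients poll, the other server is contacted at most about once every 2 seconds.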
Make yourself familiar with APC as a global storage. Once you have fetched the file, store it in the APC cache and set a timeout. You only need to connect to the remote server when a page is not in the cache or is outdated.
Mousemove: are you sure? That generates gazillions of parallel requests unless you set a semaphore client-side so that no further AJAX queries are issued while one is still pending.

Downloading pages in parallel using PHP

I have to scrape a web site where I need to fetch multiple URLs and then process them one by one. The current process goes somewhat like this.
I fetch a base URL and get all secondary URLs from this page, then for each secondary URL I fetch that URL, process the page found, download some photos (which takes quite a long time) and store this data in a database, then fetch the next URL and repeat the process.
In this process, I think I am wasting some time in fetching secondary URL at the start of each iteration. So I am trying to fetch next URLs in parallel while processing first iteration.
The solution in my mind is, from the main process, to call a PHP script, say downloader, which will download all the URLs (with curl_multi or wget) and store them in some database.
My questions are:
How do I call such a downloader asynchronously? I don't want my main script to wait until the downloader completes.
Is there any place to store the downloaded data, such as shared memory? Other than a database, of course.
Is there any chance that the data gets corrupted while storing and retrieving it, and how do I avoid this?
Also, please let me know if anyone has a better plan.
When I hear someone uses curl_multi_exec it usually turns out they just load it with, say, 100 URLs, then wait until all complete, then process them all, and then start over with the next 100 URLs... Blame me, I was doing so too, but then I found out that it is possible to remove/add handles to curl_multi while something is still in progress, and it really saves a lot of time, especially if you reuse already open connections. I wrote a small library to handle a queue of requests with callbacks; I'm not posting the full version here of course ("small" is still quite a bit of code), but here's a simplified version of the main thing to give you the general idea:
public function launch() {
$channels = $freeChannels = array_fill(0, $this->maxConnections, NULL);
$activeJobs = array();
$running = 0;
do {
// pick jobs for free channels:
while ( !(empty($freeChannels) || empty($this->jobQueue)) ) {
// take free channel, (re)init curl handle and let
// queued object set options
$chId = key($freeChannels);
if (empty($channels[$chId])) {
$channels[$chId] = curl_init();
}
$job = array_pop($this->jobQueue);
$job->init($channels[$chId]);
curl_multi_add_handle($this->master, $channels[$chId]);
$activeJobs[$chId] = $job;
unset($freeChannels[$chId]);
}
$pending = count($activeJobs);
// launch them:
if ($pending > 0) {
while(($mrc = curl_multi_exec($this->master, $running)) == CURLM_CALL_MULTI_PERFORM);
// poke it while it wants
curl_multi_select($this->master);
// wait for some activity, don't eat CPU
while ($running < $pending && ($info = curl_multi_info_read($this->master))) {
// some connection(s) finished, locate that job and run response handler:
$pending--;
$chId = array_search($info['handle'], $channels);
$content = curl_multi_getcontent($channels[$chId]);
curl_multi_remove_handle($this->master, $channels[$chId]);
$freeChannels[$chId] = NULL;
// free up this channel
if ( !array_key_exists($chId, $activeJobs) ) {
// impossible, but...
continue;
}
$activeJobs[$chId]->onComplete($content);
unset($activeJobs[$chId]);
}
}
} while ( ($running > 0 && $mrc == CURLM_OK) || !empty($this->jobQueue) );
}
In my version $jobs are actually of separate class, not instances of controllers or models. They just handle setting cURL options, parsing response and call a given callback onComplete.
With this structure new requests will start as soon as something out of the pool finishes.
Of course it doesn't really save you if not just retrieving takes time but processing as well... And it isn't a true parallel handling. But I still hope it helps. :)
P.S. It did the trick for me. :) A once 8-hour job now completes in 3-4 minutes using a pool of 50 connections. Can't describe that feeling. :) I didn't really expect it to work as planned, because with PHP it rarely works exactly as supposed... That was like "ok, hope it finishes in at least an hour... Wha... Wait... Already?! 8-O"
You can use curl_multi: http://www.somacon.com/p537.php
You may also want to consider doing this client side and using Javascript.
Another solution is to write a hunter/gatherer that you submit an array of URLs to, then it does the parallel work and returns a JSON array after it's completed.
Put another way: if you had 100 URLs you could POST that array (probably as JSON as well) to mysite.tld/huntergatherer - it does whatever it wants in whatever language you want and just returns JSON.
Aside from the curl multi solution, another one is just having a batch of gearman workers. If you go this route, I've found supervisord a nice way to start a load of daemon workers.
Things you should look at in addition to CURL multi:
Non-blocking streams (example: PHP-MIO)
ZeroMQ for spawning off many workers that do requests asynchronously
While node.js, ruby EventMachine or similar tools are quite great for doing this stuff, the things I mentioned make it fairly easy in PHP too.
Try executing python-pycurl scripts from PHP. Easier and faster than PHP curl.

PHP spreading a script into multiple parts to avoid server timeout

I have a script that takes very long to execute, so when I run it, it hits the max execution time on my webserver and ends up timing out.
To illustrate, imagine I have a for loop that does some pretty intensive manipulation one million times. How could I spread this loop's execution over several parts so that I don't hit the max execution time of my webserver?
Many thanks,
If you have an application that is going to loop a known number of times (i.e. you are sure that it's going to finish some time) you can increase time limit inside the loop:
foreach ($data as $row) {
set_time_limit(10);
// do your stuff here
}
This solution will protect you from having one run-away iteration, but will let your whole script run undisturbed as long as you need.
Best solution is to use http://php.net/manual/en/function.set-time-limit.php to change the timeout. Otherwise, you can use HTTP redirects to send the browser to an updated URL before the timeout hits.
$threshold = 10; // seconds to run before handing over to a new request
$t = microtime(true);
$i = isset($_GET['i']) ? (int) $_GET['i'] : 0;
for (; $i < 10000000; $i++)
{
if (microtime(true) - $t > $threshold)
{
// resume from the current position in a fresh request
header('Location: http://www.example.com/?i=' . $i);
exit;
}
// Your code
}
The browser will only follow a few redirects before it stops, so you're better off using JavaScript to force a page reload.
I once used a technique where I split the work from one file into three parts. It was just an array of 120,000 elements with an intensive operation. I created a splitter script which stored the array in a database in chunks of 40,000 elements each. Then I created an HTML file with a redirect to the first PHP file to compute the first 40,000 elements. After computing the first 40,000 elements I had another HTML forward to the next PHP file, and so on.
Not very elegant, but it worked :-)
If you have the right permissions on your hosting server, you could use the php interpreter to execute a php script and have it run in the background.
See Asynchronous shell exec in PHP.
If you are running a script that needs to execute for an unknown time, you can use:
set_time_limit(0);
If possible, you can make the script handle only a portion of the wanted operations. Once it completes, say, 10%, you call the script again via AJAX to execute the next 10%. But there are circumstances where this is not an ideal solution; it really depends on what you are doing.
I used this method to create a web-based crawler which only ran on my computer for instance. If it had to do the operations at once it would time out as well. So it was split into 200 "tasks", each called via Ajax once the previous completes. Works perfectly, and it's been over a year since it started running (crawling?)
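A rough sketch of the server side of such a split, assuming the work can be expressed as a list of items with consecutive integer keys. process_chunk, the chunk size and the offset parameter are hypothetical names, not taken from the crawler described above:

```php
<?php
// Hypothetical worker: processes at most $chunkSize items starting at $offset
// and reports where the next AJAX call should resume.
function process_chunk(array $items, int $offset, int $chunkSize, callable $work): array
{
    $total = count($items);
    $end = min($offset + $chunkSize, $total);

    for ($i = $offset; $i < $end; $i++) {
        $work($items[$i]);
    }

    return [
        'next'    => $end,                 // offset to send with the next request
        'total'   => $total,
        'done'    => $end >= $total,       // true once every item was handled
        // integer arithmetic avoids floating-point rounding surprises
        'percent' => $total > 0 ? intdiv($end * 100, $total) : 100,
    ];
}

// In the endpoint the browser calls, something like:
// $offset = isset($_GET['offset']) ? max(0, (int) $_GET['offset']) : 0;
// echo json_encode(process_chunk($items, $offset, 100, 'handle_item'));
```

The client keeps requesting the endpoint with the returned next value until done is true. Each request stays well under the time limit, and the percent field can drive a progress bar.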

How to design an AJAX interface to show progress bar based on a running script backend

OK, here is my problem.
I have a file which outputs XML based on an input X.
I have another file which calls the above file 10000 times (I mean many) with different numbers for X.
When a user clicks "Go", it should go through all those 10000 Xs and simultaneously show them how many are done (maybe updated once every 10 sec).
How do I do it? I need ideas. I know how to do AJAX and stuff, but what structure should my program take?
EDIT
So, following the answer given below, I stored my output in a session variable. It then outputs the answer. What is happening is:
When I execute a long script, it finishes within, say, 1 min. But in the meantime, if I open (in a new window) just the file which outputs my SESSION variable, it doesn't output anything until the first script has finished. Which is completely the opposite of what I want. What's the problem here? Is it my system/server which doesn't handle multiple requests, or what?
EDIT 2
I use the files approach:
To read what i want
<?php include_once '../includeTop.php'; echo util::readFromLog("../../Files/progressData.tmp"); ?>
and in another script
$processed ++;
util::writeToLog($dir.'/progressData.tmp', "Files processed: $processed");
where the functions are:
public static function writeToLog($file,$data) {
$f = fopen($file,"w");
fwrite($f, $data);
fclose($f);
}
public static function readFromLog($file) {
return file_get_contents($file);
}
But still the same problem persists :(. I can manually see the file getting updated like 1, 2, 3 etc. But when I request it through the PHP script, it just waits until my original script has finished.
EDIT 3
OK, I finally found the solution. Instead of fetching the output through the PHP file, I now go directly to the log and read it.
Put the progress (i.e. how far you are into the 2nd file) into memcached directly from the background job, then deliver that value when requested by the JavaScript application (triggered by a timer, as long as you have not reached 100%). The only thing you need to figure out is how to pass some sort of "transaction ID" to both the background job and the JavaScript side, so they access the same key in memcached.
Edit: I was wrong about $_SESSION. It doesn't update asynchronously, i.e. the values you store in it are not accessible until the script has finished. Whoops.
So the progress needs to be stored in something that does update asynchronously: Memory (like pyroscope suggests, and which is still the best solution), a file, or the database.
In other words, instead of using $_SESSION to store the value, it should be stored by memcached, in a file or in the database.
E.g. using the database
$progress = 0;
mysql_query("INSERT INTO `progress` (`id`, `progress`) VALUES ($uid, $progress)");
# loop starts
# processing...
$progress += $some_increment;
mysql_query("UPDATE `progress` SET `progress`=$progress WHERE `id`=$uid");
# loop ends
Or using a file
$progress = 0;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop starts
# processing...
$progress += $some_increment;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop ends
And then read the file/select from the database, when requesting progress via ajax. But it's not a pretty solution compared to memcached.
Also, remember to remove the file/database row once it's all done.
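The file variant, including the cleanup step, could be wrapped up roughly like this. It is only a sketch: the function names and the .progress suffix are invented, and $uid is assumed to be the shared "transaction ID" mentioned above:

```php
<?php
// Hypothetical helpers for the file approach.
function progress_path(string $dir, string $uid): string
{
    // Keep only safe characters so a crafted $uid cannot point outside $dir.
    return rtrim($dir, '/') . '/' . preg_replace('/[^A-Za-z0-9_-]/', '', $uid) . '.progress';
}

function write_progress(string $dir, string $uid, int $percent): void
{
    // LOCK_EX serializes concurrent writers to the same file.
    file_put_contents(progress_path($dir, $uid), (string) $percent, LOCK_EX);
}

function read_progress(string $dir, string $uid): int
{
    $path = progress_path($dir, $uid);
    if (!is_file($path)) {
        return 0; // not started yet, or already cleaned up
    }
    return (int) file_get_contents($path);
}

function clear_progress(string $dir, string $uid): void
{
    $path = progress_path($dir, $uid);
    if (is_file($path)) {
        unlink($path);
    }
}
```

The long-running script calls write_progress() inside its loop and clear_progress() at the end. The URL polled via AJAX only calls read_progress() and never touches the session, so it is not held up by the running job.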
You could put the progress in a $_SESSION variable (you'll need a unique name for it), and update it while the process runs. Meanwhile your ajax request simply gets that variable at a specific interval
function heavy_process($input, $uid) {
$_SESSION[$uid] = 0;
# loop begins
# processing...
$_SESSION[$uid] += $some_increment;
# loop ends
}
Then have a url that simply spits out the $_SESSION[$uid] value when it's requested via ajax. Then use the returned value to update the progress bar. Use something like sha1(microtime()) to create the $uid
Edit: pyroscope's solution is technically better, but if you don't have a server with memcached or the ability to run background processes, you can use $_SESSION instead

best way to measure (and refine) performance with PHP?

A site I am working with is starting to get a little sluggish, and I would like to refine it. I think the problem is with the PHP, but I can't be sure. How can I see how long functions are taking to perform?
If you want to test the execution time :
<?php
$startTime = microtime(true);
// Your content to test
$endTime = microtime(true);
$elapsed = $endTime - $startTime;
echo "Execution time : $elapsed seconds";
?>
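If several sections need timing, the same microtime(true) idea can be wrapped in a small helper so the calls don't have to be paired up by hand. This is just a sketch, and the Stopwatch class and its method names are made up for illustration (it needs PHP 7.4+ for the typed property):

```php
<?php
// Hypothetical stopwatch built on microtime(true).
final class Stopwatch
{
    /** @var array<string, float> checkpoint label => timestamp */
    private array $marks = [];

    public function mark(string $label): void
    {
        $this->marks[$label] = microtime(true);
    }

    /** Seconds elapsed between each checkpoint and the one before it. */
    public function deltas(): array
    {
        $deltas = [];
        $prev = null;
        foreach ($this->marks as $label => $time) {
            if ($prev !== null) {
                $deltas[$label] = $time - $prev;
            }
            $prev = $time;
        }
        return $deltas;
    }
}

$sw = new Stopwatch();
$sw->mark('start');
usleep(20000);            // stand-in for the first piece of code under test
$sw->mark('step one');
usleep(10000);            // stand-in for the second piece
$sw->mark('step two');

foreach ($sw->deltas() as $label => $seconds) {
    printf("%s: %.4f s\n", $label, $seconds);
}
```

Each label reports the time spent since the previous mark, so the slowest section shows up directly in the output.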
Try the profiler feature in XDebug or Zend Debugger?
Two things you can do.
Place microtime calls everywhere, although that's not convenient if you want to test more than one function. There is a simpler and better way if you want to test many functions, which I assume you would like to do:
just use a class (follow the link for the tutorial) that measures how long all your functions take, rather than placing microtime calls everywhere. It is very convenient.
http://codeaid.net/php/calculate-script-execution-time-%28php-class%29
The second thing you can do to optimize your script is to take a look at the memory usage.
By observing the memory usage of your scripts, you may be able to optimize your code better.
PHP has a garbage collector and a pretty complex memory manager. The amount of memory being used by your script can go up and down during its execution. To get the current memory usage, we can use the memory_get_usage() function, and to get the highest amount of memory used at any point, we can use the memory_get_peak_usage() function.
echo "Initial: ".memory_get_usage()." bytes \n";
/* prints
Initial: 361400 bytes
*/
// let's use up some memory
for ($i = 0; $i < 100000; $i++) {
$array []= md5($i);
}
// let's remove the elements again (this loop unsets all of them, not just half)
for ($i = 0; $i < 100000; $i++) {
unset($array[$i]);
}
echo "Final: ".memory_get_usage()." bytes \n";
/* prints
Final: 885912 bytes
*/
echo "Peak: ".memory_get_peak_usage()." bytes \n";
/* prints
Peak: 13687072 bytes
*/
http://net.tutsplus.com/tutorials/php/9-useful-php-functions-and-features-you-need-to-know/
PK
You can also do it manually, by recording the microtime() value in various places, like this:
<?php
$TIMER['start']=microtime(TRUE);
// some code
$query="SELECT ...";
$TIMER['before q']=microtime(TRUE);
$res=mysql_query($query);
$TIMER['after q']=microtime(TRUE);
while ($row = mysql_fetch_array($res)) {
// some code
}
$TIMER['array filled']=microtime(TRUE);
// some code
$TIMER['pagination']=microtime(TRUE);
// and so on
?>
and then visualize it:
<?php
if ('127.0.0.1' === $_SERVER['REMOTE_ADDR']) {
echo "<table border=1><tr><td>name</td><td>so far</td><td>delta</td><td>per cent</td></tr>";
reset($TIMER);
$start=$prev=current($TIMER);
$total=end($TIMER)-$start;
foreach($TIMER as $name => $value) {
$sofar=round($value-$start,3);
$delta=round($value-$prev,3);
$percent=round($delta/$total*100);
echo "<tr><td>$name</td><td>$sofar</td><td>$delta</td><td>$percent</td></tr>";
$prev=$value;
}
echo "</table>";
}
?>
The IP address check is there because we are doing this profiling on the live site, so only we get to see the output.
Though I doubt it's PHP itself. Most likely it's the database. So, pay most attention to query execution timing.
However, the term "site" is very broad. It also includes JS, CSS, images and stuff. So, I'd suggest starting with Firebug's Net panel to see which part of the whole page takes the most time.
Of course, refining can be done only after analysis of profiling results, and cannot be advised here without it.
Your best bet is Xdebug. I'm happy as it comes bundled in my PHPed IDE. I can get profiler data at the click of a button.
So maybe you could consider that.
I had similar issues and so I created 2 new tables on the database and two new functions. One was audit_sql and the other was audit_code. Because I used an SQL abstraction class it was easy to time every single SQL call (I used php microtime as some others have suggested). So, I called microtime before and after the SQL call and stored the results on the database.
Similarly with pages. I called microtime at the start and end of each page and, if necessary, at the start and end of functions, divs, whatever I thought might be a culprit.
The general results were:
SQL calls to MySQL were almost instantaneous and were not a problem at all. The only thing I would say is that even I was surprised at the number being executed! The site is generated from the database, even the menus, permissions etc. To produce the home page the SQL calls were measured in the 100s.
PHP was not the culprit. This was even more instantaneous than MySQL.
The culprit was.... (big build up!) calls to YouTube and Picasa and other sites like that. I host videos and photo albums on the site (well, I don't actually store them, they are stored on YT etc.) and on the home page are thumbnails that are extracted from YouTube and the like via the YouTube PHP API/Zend Framework. Because this is all HTTP-based to the other sites, each one was taking 1, 2 or 3 seconds. This was causing the divs containing these to take between 6 and 12 seconds and the home page up to 17 seconds.
The solution: store all thumbnails on my server. The first time, one has to be served from the remote site (YT, Picasa etc.), so do that and then store it on your own site. After that, you check if you have it and, if so, always serve it from your server. This cuts the page load time down to 2-3 seconds tops. Granted, the first home page load after someone has added more videos/images will take some time, but not thereafter. People will put a long one-off page load time down to their connection/the internet in general. Too many slow loads of your site and they will stop visiting!
I hope that helps somewhat.
