I'm building a WordPress theme and trying to determine what is causing it to run so slowly. My functions.php is about 500 lines and has functions that interact with the database, such as:
add_action( 'wp_ajax_nopriv_get_N_more', 'get_N_more' );
add_action( 'wp_ajax_get_N_more', 'get_N_more' );
$N = 9;
function get_N_more ( )
{
    global $wpdb, $N;
    $thisTable = ($_POST['workType'] === 'projs') ? 'wp_nas_projs' : 'wp_nas_cases';
    $q = $wpdb->prepare( "SELECT id,compname,descsmall,sumsmall,imageurl FROM $thisTable ORDER BY postdate DESC LIMIT %d, %d", intval($_POST['numItemsLoaded']), $N);
    $nextNRows = $wpdb->get_results($q);
    removeEscapes($nextNRows);
    $stillSomeLeft = ($_POST['numItemsLoaded'] + $N) < count($wpdb->get_results($wpdb->prepare("SELECT * FROM $thisTable")));
    // ^ would like more efficient way of doing above
    die(json_encode(array('stillSomeLeft' => $stillSomeLeft, 'workItems' => $nextNRows)));
}
Such functions are not called every time the script is read; however, I'm wondering if simply having them there could be creating a performance slowdown, since PHP has to read through them on every request.
The short answer is maybe.
If you are calling do_action(...) several times on those actions you have added, then it could be the case that multiple AJAX requests are being made and are either timing out or being cancelled as new requests are issued.
The main reason you could have a very slow page is that you are working directly with the database. If these are custom tables, there is a good chance that your indexes are not set correctly and the query is taking a long time to return, or is timing out.
You should consider using get_posts() or WP_Query for posts or pages, as you can then use the native WordPress functions to extract data. If you have made a custom post type, store your data in post meta and use WordPress core functions rather than a secondary table.
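For illustration, here is a minimal sketch of the same "load N more" handler built on WP_Query instead of a custom table. The 'project' post type and the meta key are hypothetical stand-ins for your existing columns, and wp_send_json() assumes WordPress 3.5+:

function get_N_more() {
    $per_page = 9;
    $offset   = intval( $_POST['numItemsLoaded'] );

    // Hypothetical custom post type standing in for the wp_nas_projs table.
    $query = new WP_Query( array(
        'post_type'      => 'project',
        'posts_per_page' => $per_page,
        'offset'         => $offset,
        'orderby'        => 'date',
        'order'          => 'DESC',
    ) );

    $items = array();
    foreach ( $query->posts as $post ) {
        $items[] = array(
            'id'      => $post->ID,
            'title'   => $post->post_title,
            // Column data would live in post meta instead of custom columns.
            'summary' => get_post_meta( $post->ID, 'sumsmall', true ),
        );
    }

    wp_send_json( array(
        'stillSomeLeft' => ( $offset + $per_page ) < $query->found_posts,
        'workItems'     => $items,
    ) );
}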
In general, if your queries return the same information on nearly every page load, you might want to consider storing the results in a cache so you are not recalculating them on every page load.
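As a concrete example (a sketch, assuming you keep your original custom tables), the total row count from the question could be computed with COUNT(*) and cached in a transient rather than fetching every row:

// Cache the total row count so it isn't recomputed on every AJAX call.
function nas_get_total_count( $table ) {
    global $wpdb;

    // $table is one of the two whitelisted names from the question, so interpolation is safe here.
    $cache_key = 'nas_total_' . $table; // hypothetical cache key
    $total     = get_transient( $cache_key );

    if ( false === $total ) {
        // COUNT(*) lets MySQL do the counting instead of fetching every row.
        $total = (int) $wpdb->get_var( "SELECT COUNT(*) FROM $table" );
        set_transient( $cache_key, $total, 5 * MINUTE_IN_SECONDS );
    }

    return $total;
}

// In get_N_more():
// $stillSomeLeft = ( intval( $_POST['numItemsLoaded'] ) + $N ) < nas_get_total_count( $thisTable );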
What does all of this mean?
Check your database table indexes. Run the query as it is built by your code in something like MySQL Workbench.
Open up your browser console or Firebug and look at the AJAX requests being made when you load your pages.
Check all the places you are calling do_action and adjust accordingly.
When debugging, print things to a file or to the screen. This will help tremendously.
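For that last point, error_log() is a simple way to write debug output to a file without touching the page output; the log path here is just an example:

// Append debug output to a file of your choosing (path is an example).
error_log(
    date( 'c' ) . ' get_N_more query: ' . $q . "\n",
    3,                       // message type 3 = append to the given file
    '/tmp/theme-debug.log'
);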
PHP only parses your function definitions once per request, so simply defining them is not causing a slowdown.
Look at the definition of your function:
function get_N_more ( )
{
}
It's not wrapped in a function_exists() check. This means that if this file were parsed more than once, execution would fail with an error that your get_N_more function has already been declared in this scope.
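A guarded version would look something like this:

if ( ! function_exists( 'get_N_more' ) ) {
    function get_N_more() {
        // ...
    }
}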
The complexity and runtime efficiency of the code is never taken into account during parsing of the script, and 500 lines is a relatively trivial amount of code.
tl;dr: This isn't the cause of your slowdown.
Related
I have a foreach in CakePHP that processes products from a distributor, but the thing is the lists have up to 200 products, and each product can have 3 big pictures with 2 resizes.
So I have, in total, 1200 big actions, too much for one request.
I broke the foreach after every 10 products, removing them from the array, and redirected to the same page. But after a while I get a redirect loop.
Any ideas on how to avoid this?
If I add another page to this redirect frenzy, will it work?
Does the redirect loop only appear when redirecting to the same page?
The thing is, the loop will end, but the browser doesn't know that.
$this->data = $this->Session->read('Parser.data');
$limit = 0;
foreach ($this->data as $key => $data):
    $limit++;
    if ($limit == 4)
        $this->redirect($this->here);
    // ...
    $this->Session->delete('Parser.data.' . $key);
endforeach;
$this->redirect(array('controller' => 'parser', 'action' => 'index')); // if $this->data is empty it redirects to upload page
The server works with any number of records from what I have tested, but I have this action along the lines of:
$this->getImage(WWW_ROOT . $folder . DS, $new_path, $image['path']);
which looks like this:
protected function getImage($folder = null, $path = null, $from = null) {
    if (isset($from) && !empty($from))
        file_put_contents($folder . $path, file_get_contents($from));
}
This loads up the server's memory and crashes.
This is why I have to break the foreach a couple of times.
I also tried other functions to get the images, such as cURL, but with the same results.
Let me copy my answer from another very similar question:
Never use URLs to do this kind of task; it is simply wrong, insecure, and can cause your script to die or the server to stop responding.
Let's say you have 10,000 users and a script runtime of 30 seconds: it is very likely that the script times out before it finishes and you end up with only part of your users processed. The other scenario, with a high or infinite script runtime, can lock up your server. Depending on the script or DB actions, it might put the server under high load, and users who visit the site while the script is running will encounter a horribly slow or non-responding site.
Also, you can't really run a loop on a single URL. You could redirect from one to another that does the limit and offset thing to simulate a loop over the 100,000 users, but if you don't loop over the records in batches and instead fetch all 100,000 at the same time, it's likely your script dies from running out of memory.
You should create a shell that processes the users in a loop and always just processes batches of, for example, 10, 50 or 100 users.
When executing your shell, I recommend running it together with the "nice" command to limit the amount of CPU time the shell is allowed to use, so it doesn't take 100% of the CPU and your site stays responsive.
Look at creating a shell and setting up a cron in Cake.
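A minimal sketch of such a shell, assuming CakePHP 2.x conventions; the Product model, the processed flag column, and the shell name are placeholders for your own code:

// app/Console/Command/ImageImportShell.php (hypothetical name)
class ImageImportShell extends AppShell {

    public $uses = array('Product'); // hypothetical model

    public function main() {
        $batchSize = 10;

        while (true) {
            // Fetch the next small batch of unprocessed products.
            $products = $this->Product->find('all', array(
                'conditions' => array('Product.processed' => 0), // hypothetical flag column
                'limit'      => $batchSize,
            ));

            if (empty($products)) {
                $this->out('All products processed.');
                break;
            }

            foreach ($products as $product) {
                // Download/resize the images here, then mark the row as done.
                $this->Product->id = $product['Product']['id'];
                $this->Product->saveField('processed', 1);
            }
        }
    }
}

You would then run it from the command line with something like "nice -n 19 app/Console/cake image_import", and a cron entry can invoke the same command periodically.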
My platform is PHP 5.2, Apache, Magento EE 1.9 and CentOS.
I have a pretty basic script which is fetching about 60,000 rows of data from an MS SQL database using PHP's mssql_*() functions. The data is then processed a bit via data from Magento and finally written to a text file.
Really simple stuff...
$result = mssql_query($query);

while ($row = mssql_fetch_assoc($result)) {
    $member = $row; // Copied so I can modify it

    // Do some stuff with each row... e.g.:
    $customer = Mage::getModel("customer/customer");
    $customer->loadByEmail($member["email"]);
    $customerId = $customer->getId();

    // Some more stuff like that...
    $ordersCollection = Mage::getResourceModel('sales/order_collection');
    // ...........

    // Some more stuff like that...
    $wishList = Mage::getModel('wishlist/wishlist')->loadByCustomer($customer);
    // ...........

    // Write straight to a file
    fwrite($fp, implode("\t", $member) . "\r\n");

    // Probably not even necessary
    unset($member);
}
The problem is, the memory usage of my script increases with each iteration of the loop (about 10MB for every 300 rows), with a theoretical peak of about 2GB (though it hasn't got there yet).
I've taken great pains to ensure that I'm not leaving any data in memory. No huge arrays are building up, no variables are being added to, everything is either unset() or directly overwritten with each iteration of the loop.
So my question is: could the Magento functions be causing memory leaks?
And if so, how do I stop them from doing so?
Ideally this script should be totally "passive": just grab the query results, modify them a bit (very temporary memory needed for this) then dump them straight to a file and destroy the memory. But this is not happening!
Thanks
Exclude all Mage:: calls from your code and just dump the data to the file without processing, and watch what happens to the memory while doing this. Then start adding the Mage:: functions back one by one and see when it breaks.
This way you'll find the culprit. Then you need to start digging into its implementation and see what could go wrong. You could also consider doing the processing without relying on your Mage:: calls: just write plain code to deal with the data in self-contained functions/classes and compare how things turn out if you exclude Mage:: entirely from the process.
Yes. PHP has a long history of non-ideal behavior when it comes to memory management and code that pushes the edges of its object-oriented model.
You can try an alternate method of querying for your data that wastes less memory, or you can read up on how the Magento core team deals with this same issue.
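One commonly cited approach in Magento 1.x, sketched here from memory, is to stream rows through the core/resource_iterator singleton instead of instantiating a fresh model per customer; treat the exact callback plumbing as an assumption to verify against your Magento version:

// Sketch: walk a customer collection row by row instead of loading full model objects.
$collection = Mage::getModel('customer/customer')->getCollection()
    ->addAttributeToSelect('email');

Mage::getSingleton('core/resource_iterator')->walk(
    $collection->getSelect(),
    array('exportCustomerRow')            // callback invoked once per row
);

function exportCustomerRow($args) {
    $row = $args['row'];                  // plain associative array, no Varien object
    // Write what you need straight to the file, then let $row go out of scope.
}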
I wrote a web spider to spider pages concurrently. For each link that the spider finds, I want to fork off a new child that starts the process all over again.
I don't want to overload the target server, so I created a static array that all objects can access. Each child can add its PID to the array, and either the parent or a child should check the array to see if $maxChildren has been met, and if so, patiently wait until any child finishes.
As you see, I have $maxChildren set to 3. I am expecting to see 3 simultaneous processes at any given time. However, that's not the case. The Linux top command shows 12 to 30 processes at any given time. In concurrent programming, how can I regulate the number of simultaneous processes? My logic is currently inspired by how Apache handles its max children, but I'm not exactly sure how that works.
As pointed out in one of the answers, globally accessing the static variable brings up issues with race conditions. To deal with this, the $children array takes the unique $PID of the process as both the key and its value, thereby creating a unique value. My thinking is that since any object can only deal with one $children[$pid] value, locking is not necessary. Is this not true? Is there a chance that two processes could try to unset or add the same value at some point?
private static $children = array();
private $maxChildren = 3;

public function concurrentSpider($url) {
    // STEP 1:
    // Download the $url
    $pageData = http_get($url, $ref = '');
    if (!$this->checkIfSaved($url)) {
        $this->save_link_to_db($url, $pageData);
    }

    // STEP 2:
    // extract all hyperlinks from this url's page data
    $linksOnThisPage = $this->harvest_links($url, $pageData);

    // STEP 3:
    // Check the links array from STEP 2 to see if this page has
    // already been saved or is excluded because of any other
    // logic from the excluded_link() function
    $filteredLinks = $this->filterLinks($linksOnThisPage);
    shuffle($filteredLinks);

    // STEP 4: loop through each of the links and
    // repeat the process
    foreach ($filteredLinks as $filteredLink) {
        $pid = pcntl_fork();
        switch ($pid) {
            case -1:
                print "Could not fork!\n";
                exit(1);
            case 0:
                if ($this->checkIfSaved($filteredLink)) {
                    exit();
                }
                //$pid = getmypid();
                print "In child with PID: " . getmypid() . " processing $filteredLink \n";
                $var[$pid]->concurrentSpider($filteredLink);
                sleep(2);
                exit(1);
            default:
                // Add an element to the children array
                self::$children[$pid] = $pid;
                // If the maximum number of children has been
                // achieved, wait until one or more return
                // before continuing.
                while (count(self::$children) >= $this->maxChildren) {
                    //print count(self::$children) . " children \n";
                    $pid = pcntl_waitpid(-1, $status);
                    unset(self::$children[$pid]);
                }
        }
    }
}
This is written in PHP. I know that the pcntl_waitpid function with argument of -1 waits for any child to complete regardless of the parent (http://php.net/manual/en/function.pcntl-waitpid.php).
What's wrong with my logic and how can I correct it so that only $maxChildren processes are running simultaneously? I'm also open to improving the logic in general if you have suggestions.
First thing to note: if this is truly a global being shared among multiple threads, it's possible that multiple threads are adding to it at once and you're running afoul of a race condition. You need some sort of concurrency control to ensure that only one process is accessing your global array at once.
Also, try the simple debugging trick of having each process write out (to the console or to a file) its PID and the full contents of the global array each time a new spider is forked. It will help you to check your assumptions (which are plainly wrong at some point) and figure out what's going wrong.
EDIT: (In response to the comments)
I'm not a PHP developer, but based on the fact that you're using an OS tool that counts OS-level processes, I'd guess that your fork is spawning multiple processes, while your static array is only global within the current process. Implementing system-wide shared memory is a lot more complicated!
If you just want to count something and ensure that instances of a shared resource don't grow out of control, look into semaphores, and see if you can find a way in PHP to create a named semaphore object that can be shared between multiple instances of your spider.
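As an illustration of that idea, PHP's System V semaphore functions (the sysvsem extension) can provide a cross-process counter. This is a sketch, not your spider's actual code, and the key and limit are placeholders:

// Sketch: limit how many children do work at once with a SysV semaphore.
$key = ftok(__FILE__, 's');          // derive an IPC key from this script's path
$sem = sem_get($key, 3);             // at most 3 holders at a time (placeholder limit)

$pid = pcntl_fork();
if ($pid === 0) {
    sem_acquire($sem);               // blocks until one of the 3 slots is free
    // ... do the child's crawling work here ...
    sem_release($sem);               // free the slot for the next child
    exit(0);
}

Note that the parent can still fork more processes than the limit; the semaphore only guarantees that no more than 3 of them are actively working at the same time.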
Use a real programming language ;)
Step 1 is kind of bad: why are you downloading the page if it might already be in the DB? Put the download inside the if, and see if you can put a mutex around it, or maybe do something in SQL to imitate one.
I hope harvest_links uses a proper HTML parser with CSS selector support (I like Fizzler for .NET). I guess a regular expression would be fine if it's just to get links, but it is possible to mess that up.
I see step 4 and I don't think it's bad, but personally I'd do it a different way.
I'd have something like step 1 insert the url, page and a flag into a DB. Then I'd have another process (or the same one) ask the DB for unprocessed pages, setting the flag to one value if a page errors and another if it succeeds. That way, if something fails or the process exits (shutdown, crash, power outage, etc.), it can pick up where it left off and doesn't need to re-scan every page to find its place: it just asks the database for the next link and redoes whatever it didn't finish. See the sketch below.
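A minimal sketch of that DB-as-queue idea, assuming a hypothetical crawl_queue table with url and status columns (0 = pending, 1 = done, 2 = failed) and a placeholder DSN:

// Sketch: use a table as a work queue so a crashed crawler can resume where it stopped.
$db = new PDO('mysql:host=localhost;dbname=spider', 'user', 'pass'); // placeholder credentials

while (true) {
    // Grab the next unprocessed URL.
    $row = $db->query("SELECT id, url FROM crawl_queue WHERE status = 0 LIMIT 1")
              ->fetch(PDO::FETCH_ASSOC);
    if (!$row) {
        break; // nothing left to do
    }

    $pageData = @file_get_contents($row['url']);
    if ($pageData !== false) {
        // ... harvest links, insert them as new status-0 rows, save the page ...
        $status = 1; // done
    } else {
        $status = 2; // failed, can be retried later
    }

    $stmt = $db->prepare("UPDATE crawl_queue SET status = ? WHERE id = ?");
    $stmt->execute(array($status, $row['id']));
}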
PHP doesn't support multithreading, therefore it doesn't support mutexes or any other synchronization methods. As others have said in their answers, this will lead to a race condition.
You'll have to write a wrapper in C or bash. That way, the PHP script can submit targets to the wrapper, and the wrapper will handle scheduling.
Another approach is to rewrite your spider in Python or Ruby, both of which support multithreading. That will eliminate the need for interprocess communication.
Edit: On second thought, the best way is to write the wrapper in Python or Ruby and reuse your existing PHP code as a black box. That's a compromise of the solutions above.
If the spider is for practical purposes, you might want to google "curl multithread"
cURL Multi Threading with PHP
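For reference, here is a small sketch of what "curl multithreading" usually means in PHP: the curl_multi_* functions fetching several URLs concurrently within a single process (the URL list is just an example):

// Sketch: fetch several pages concurrently with curl_multi.
$urls = array('http://example.com/a', 'http://example.com/b', 'http://example.com/c');

$mh      = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all transfers until none are still active.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $pageData = curl_multi_getcontent($ch);
    // ... hand $pageData to your existing save/harvest logic ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);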
I tried to run a massive update of field values through an API and I ran into maximum execution time for my PHP script.
I divided my job into smaller tasks to run them asynchronously as smaller jobs...
Asynchronous PHP calls?
I found this post and it looks about right, but the comments are a little off-putting... Will using curl to run external script files prevent the caller file from triggering the maximum execution time, or will curl still wait for a response from the server and kill my page?
The question really is: How do you do asynchronous jobs in PHP? Something like Ajax.
EDIT:
There is a project management tool which has lots of rows of data.
I am using this tools API to access the rows of data and display them on my page.
The user using my tool will select multiple rows of data with a checkbox, and type a new value into a box.
The user will then press an "update row values" button which runs an update script.
This update script divides the hundreds or thousands of items possibly selected into groups of 100.
At this point I was going to use some asynchronous method to contact the project management tool and update all 100 items.
Because updating those items could take that server a long time, I need to make sure that my original page, which splits up those jobs, is no longer waiting for a response from that operation, so that I can fire off more requests to update items and have my page tell the user: "Okay, the update is currently happening; it may take a while and we'll send an email once it's complete."
$step = 100;

$itemCount = GetItemCountByAppId( $appId );
$loopsRequired = $itemCount / $step;
$loopsRequired = ceil( $loopsRequired );

$process = array();

for( $a = 0; $a < $loopsRequired; $a++ )
{
    $items = GetItemsByAppId( $appId, array(
        "amount" => $step,
        "offset" => ( $step * $a )
    ) );

    foreach( $items[ "items" ] as $key => $item )
    {
        foreach( $fieldsGroup as $fieldId => $fieldValues )
        {
            $itemId = $item->__attributes[ "item_id" ];

            /*array_push( $process, array(
                "itemId" => $itemId,
                "fieldId" => $fieldId,
            ) );*/

            UpdateFieldValue( $itemId, $fieldId, $fieldValues );
            // This Update function is actually calling the server and I assume it must be
            // waiting for a response... thus my code times out after 30 secs of execution
        }
    }

    //curl_post_async($url, $params);
}
If you are using PHP-CLI, try the pthreads extension (which requires a thread-safe build), or pcntl_fork() for a non-thread-safe build.
Depending on how you implement it, asynchronous PHP might be used to decouple the web request from the processing and therefore isolate the web request from any timeout in the processing (but you could do the same thing within a single thread).
Will breaking the task into smaller concurrent parts make it run faster? Probably not; usually this will extend the total time it takes for the job to complete. About the only time this is not the case is when you've got a very large processing capacity and can distribute the task effectively (e.g. map-reduce).
Are HTTP calls (curl) an efficient way to distribute work like this? No. There are other methods, including synchronous and asynchronous messaging, batch processing, process forking, threads... each with their own benefits and complications, and we don't know what problem you are actually trying to solve.
So even before we get to your specific questions, this does not look like a good strategy.
Will using curl to run external script files prevent the caller file triggering maximum execution time
It will be constrained by whatever timeouts are configured on the target server - if that's the same server as the invoking script, then it will be the same timeouts.
will the curl still wait for a response from the server and kill my page?
I don't know what you're asking here; it rather implies that there are functional dependencies you've not told us about.
It sounds like you've picked a solution and are now trying to make it fit your problem.
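For what it's worth, a common way to decouple the web request from long-running work, sketched here with a hypothetical worker script and helper functions, is to hand the job off to a CLI process and return immediately; the user can then be emailed when the worker finishes:

// In the web request: record the job, kick off a background worker, return at once.
$jobId = saveUpdateJob($appId, $fieldsGroup);   // hypothetical: persist the work to a table

// Launch the CLI worker detached from this request (assumes a Linux host).
exec('php /path/to/update_worker.php ' . escapeshellarg($jobId) . ' > /dev/null 2>&1 &');

echo "Okay, the update is currently happening; we'll email you when it's done.";

// update_worker.php (runs without the web server's 30 second limit):
//   set_time_limit(0);
//   $job = loadUpdateJob($argv[1]);            // hypothetical
//   foreach ($job['items'] as $item) {
//       UpdateFieldValue($item['itemId'], $item['fieldId'], $item['fieldValues']);
//   }
//   sendCompletionEmail($job);                 // hypothetical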
A site I am working with is starting to get a little sluggish, and I would like to refine it. I think the problem is with the PHP, but I can't be sure. How can I see how long functions are taking to perform?
If you want to test the execution time :
<?php
$startTime = microtime(true);
// Your content to test
$endTime = microtime(true);
$elapsed = $endTime - $startTime;
echo "Execution time : $elapsed seconds";
?>
Try the profiler feature in XDebug or Zend Debugger?
There are two things you can do.
The first is to place microtime() calls everywhere, although that's not convenient if you want to test more than one function. So there is a simpler way to do it, a better solution if you want to test many functions, which I assume you would like to do.
Just have a class (follow the link for the tutorial) with which you can test how long all your functions take, rather than placing microtime() everywhere. You just use this class, which is very convenient:
http://codeaid.net/php/calculate-script-execution-time-%28php-class%29
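The linked class isn't reproduced here, but the idea is roughly a stopwatch you start and stop around the code you care about; a minimal sketch (not the same code as the tutorial):

// Minimal stopwatch class in the spirit of the linked tutorial.
class Timer
{
    private $timers = array();

    public function start($name)
    {
        $this->timers[$name] = microtime(true);
    }

    public function stop($name)
    {
        $elapsed = microtime(true) - $this->timers[$name];
        printf("%s took %.4f seconds\n", $name, $elapsed);
        return $elapsed;
    }
}

// Usage:
$timer = new Timer();
$timer->start('build_menu');
// ... the function call you want to measure ...
$timer->stop('build_menu');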
The second thing you can do to optimize your script is to take a look at the memory usage.
By observing the memory usage of your scripts, you may be able to optimize your code better.
PHP has a garbage collector and a pretty complex memory manager, so the amount of memory used by your script can go up and down during execution. To get the current memory usage, use the memory_get_usage() function, and to get the highest amount of memory used at any point, use the memory_get_peak_usage() function.
echo "Initial: ".memory_get_usage()." bytes \n";
/* prints
Initial: 361400 bytes
*/
$array = array();

// let's use up some memory
for ($i = 0; $i < 100000; $i++) {
    $array[] = md5($i);
}

// let's remove half of the array
for ($i = 0; $i < 100000; $i++) {
    unset($array[$i]);
}
echo "Final: ".memory_get_usage()." bytes \n";
/* prints
Final: 885912 bytes
*/
echo "Peak: ".memory_get_peak_usage()." bytes \n";
/* prints
Peak: 13687072 bytes
*/
http://net.tutsplus.com/tutorials/php/9-useful-php-functions-and-features-you-need-to-know/
PK
You can also do it manually, by recording microtime() values in various places, like this:
<?
$TIMER['start'] = microtime(TRUE);

// some code

$query = "SELECT ...";
$TIMER['before q'] = microtime(TRUE);
$res = mysql_query($query);
$TIMER['after q'] = microtime(TRUE);

while ($row = mysql_fetch_array($res)) {
    // some code
}
$TIMER['array filled'] = microtime(TRUE);

// some code

$TIMER['pagination'] = microtime(TRUE);
// and so on
?>
and then visualize it
<?
if ('127.0.0.1' === $_SERVER['REMOTE_ADDR']) {
    echo "<table border=1><tr><td>name</td><td>so far</td><td>delta</td><td>per cent</td></tr>";
    reset($TIMER);
    $start = $prev = current($TIMER);
    $total = end($TIMER) - $start;
    foreach ($TIMER as $name => $value) {
        $sofar   = round($value - $start, 3);
        $delta   = round($value - $prev, 3);
        $percent = round($delta / $total * 100);
        echo "<tr><td>$name</td><td>$sofar</td><td>$delta</td><td>$percent</td></tr>";
        $prev = $value;
    }
    echo "</table>";
}
?>
The IP address check means we can do this profiling on the live site (only requests from 127.0.0.1 see the table).
Though I doubt it's PHP itself; most likely it's the database. So, pay most attention to query execution timing.
However, the term "site" is very broad: it also includes JS, CSS, images and so on. So I'd suggest starting from Firebug's Net panel to see which part of the whole page takes the most time.
Of course, refining can only be done after analyzing the profiling results, and cannot be advised here without them.
Your best bet is Xdebug. I'm happy with it as it comes bundled in my PhpED IDE; I can get profiler data at the click of a button.
So maybe you could consider that.
I had similar issues, so I created two new tables in the database and two new functions: one was audit_sql and the other was audit_code. Because I used an SQL abstraction class, it was easy to time every single SQL call (I used PHP's microtime(), as others have suggested). So I called microtime() before and after each SQL call and stored the results in the database.
Similarly with pages: I called microtime() at the start and end of each page and, if necessary, at the start and end of functions, divs, whatever I thought might be a culprit.
The general results were:
SQL calls to MySQL were almost instantaneous and were not a problem at all. The only thing I would say is that even I was surprised at the number being executed! The site is generated from the database, even the menus, permissions, etc. To produce the home page, the SQL calls were measured in the hundreds.
PHP was not the culprit. This was even more instantaneous than MySQL.
The culprit was.... (big build up!) calls to YouTube and Picasa and other sites like that. I host videos and photo albums on the site (well, I don't actually store them; they are stored on YouTube etc.), and on the home page are thumbnails that are extracted from YouTube and the like via the YouTube PHP API/Zend Framework. Because these are all HTTP calls to other sites, each one was taking 1, 2 or 3 seconds. This was causing the divs containing them to take between 6 and 12 seconds, and the home page up to 17 seconds.
The solution: store all thumbnails on my server. The first time, a thumbnail has to be served from the remote site (YouTube, Picasa etc.), so do that and then store it on your own site. From then on, check whether you already have it and, if so, always serve it from your server. That cuts the page load time down to 2-3 seconds tops. Granted, the first visitor to load the home page after someone has added more videos/images will see a longer load, but not thereafter. People will put a long one-off page load down to their connection or the internet in general; too many slow loads of your site and they will stop visiting!
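A sketch of that caching idea (the cache directory and naming scheme are placeholders):

// Serve a remote thumbnail from a local cache, fetching it only on the first request.
function getCachedThumbnail($remoteUrl) {
    $cacheDir  = dirname(__FILE__) . '/cache/thumbs/';    // placeholder location
    $localFile = $cacheDir . md5($remoteUrl) . '.jpg';

    if (!file_exists($localFile)) {
        // First request: fetch once from YouTube/Picasa and store it locally.
        $data = @file_get_contents($remoteUrl);
        if ($data === false) {
            return $remoteUrl;   // fall back to the remote URL if the fetch fails
        }
        file_put_contents($localFile, $data);
    }

    // Later requests are served straight from our own server
    // (map $localFile to a public URL however your docroot is laid out).
    return '/cache/thumbs/' . md5($remoteUrl) . '.jpg';
}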
I hope that helps somewhat.