Breaking Up Massive MySQL Update - php

Right now I have something like this in my CodeIgniter model:
<?php
$array = array(...over 29k IDs...);
$update = array();
foreach ($array as $line) {
    // $this->SpinTax parses the spintax from a string I have. It has to be generated for each row.
    $update[] = array('id' => $line, 'spintax' => $this->SpinTax($string));
}
$this->db->update_batch('table', $update, 'id');
?>
The first 20k records get updated just fine, but I get a 504 Gateway Time-out before it completes.
I have tried increasing the nginx server timeout to something ridiculous (like 10 minutes), and I still get the error.
What can I do to keep this from timing out? I've read many answers and how-tos on segmenting the update, but I still get the server timeout. A PHP or CodeIgniter solution would be ideal, since I need to deploy this code to multiple servers that might not be using nginx (I get a similar error in Apache).
Thanks in advance.

You'll likely need to run this from the command line and call set_time_limit(0). If you're on CodeIgniter, the user guide explains how to run a controller from the command line: http://codeigniter.com/user_guide/general/cli.html
Before you do that: you mentioned you are using array_chunk. If you're getting all the values from the database, there is no need for array_chunk. Just pass the position as a GET variable, for instance
/your/url?offset=1000, and when that request finishes, redirect to the same URL with offset=2000, and so on until everything is processed.
It's not the nicest or cleanest approach, but it will likely get the job done.
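For the command-line route, a minimal sketch of splitting the work into smaller batches might look like the following. The helper name updateInChunks, the chunk size of 500, and the stand-in callbacks are assumptions for illustration; in a CodeIgniter model the batch callback would wrap $this->db->update_batch('table', $rows, 'id'), and the row callback would call $this->SpinTax($string).

```php
<?php
// Hypothetical helper: splits the ID list into chunks and hands each chunk
// to a caller-supplied batch updater, so no single query carries all rows.
function updateInChunks(array $ids, callable $makeRow, callable $updateBatch, int $chunkSize = 500): int
{
    $updated = 0;
    foreach (array_chunk($ids, $chunkSize) as $chunk) {
        // Build the rows for this chunk only, so memory use stays bounded.
        $rows = array_map($makeRow, $chunk);
        $updateBatch($rows);
        $updated += count($rows);
    }
    return $updated;
}

// Stand-in callbacks so the sketch runs on its own (not the real model code).
$batchSizes = [];
$total = updateInChunks(
    range(1, 1200),
    function ($id) { return ['id' => $id, 'spintax' => 'text for ' . $id]; },
    function (array $rows) use (&$batchSizes) { $batchSizes[] = count($rows); }
);
```

Run from the CLI, there is no gateway in front of the script, so the 504 cannot occur; smaller batches also keep each individual query short.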

Related

2006 MySQL server has gone away while saving object

I'm getting the error "General error: 2006 MySQL server has gone away" when saving an object.
I'm not going to paste the real code since it is way too complicated, and I can explain with the example below. First, a bit of context:
I'm executing a function via the command line using Phalcon tasks. The task creates an object from a model class, and that object calls a casperjs script that performs some actions on a web page. When it finishes, it saves some data, and that is where I sometimes get "MySQL server has gone away", but only when casperjs takes a bit longer.
Task.php
function doSomeAction(){
    $object = Class::findFirstByName("test");
    $object->performActionOnWebPage();
}
In Class.php
function performActionOnWebPage(){
    $result = exec ("timeout 30s casperjs somescript.js");
    if($result){
        $anotherObject = new AnotherClass();
        $anotherObject->value = $result->value;
        $anotherObject->save();
    }
}
It seems like $anotherObject->save() is affected by how long exec("timeout 30s casperjs somescript.js") takes to return, when it shouldn't be.
It's not a matter of the data being saved, since it both fails and saves successfully with the same input; the only difference I see is the time casperjs takes to return a value.
It looks as if Phalcon keeps the MySQL connection open during the whole execution of the Class.php function, which causes the timeout when casperjs takes too long. Does this make sense? Could you help me fix it or find a workaround?
The problem is most likely one of two things: either you are sending or fetching more data in a single packet than your MySQL configuration allows (max_allowed_packet), or your wait_timeout value is too low for how long your code leaves the connection idle.
Check your wait_timeout and max_allowed_packet values with the commands below:
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';
Then increase these values as needed in your my.cnf (Linux) or my.ini (Windows) config file and restart the MySQL service.
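If raising wait_timeout is not an option, another possible workaround is to check the connection after the long-running step and reconnect when it has gone stale. The sketch below uses plain PDO rather than Phalcon's own database service; the helper name ensureConnection and the $connect factory are hypothetical, and how the fresh connection is handed back to the ORM depends on the framework and is not shown.

```php
<?php
// Hypothetical helper: returns a usable PDO handle, reconnecting when the
// existing one no longer answers a trivial query.
function ensureConnection(?PDO $pdo, callable $connect): PDO
{
    if ($pdo !== null) {
        try {
            // A cheap round trip; returns false or throws if the link is dead.
            if ($pdo->query('SELECT 1') !== false) {
                return $pdo; // connection is still alive
            }
        } catch (PDOException $e) {
            // Stale connection (for example error 2006): fall through and reconnect.
        }
    }
    // $connect is expected to build and return a new PDO instance.
    return $connect();
}
```

The idea would be to call this right after exec() returns and before save(), so the idle time spent waiting for casperjs no longer matters.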

Long PHP script runs multiple times

I have a products database that synchronizes with product data every morning.
The process is very clear:
Get all products from the database with a query
Loop through all products, and get an XML from the other server by product_id
Update the data from the XML
Log the changes to a file.
If I query a small number of items, for example by limiting it to 500 random products, everything goes fine. But when I query all products, my script SOMETIMES goes on the fritz and starts looping multiple times. Hours later I still see my log file growing and products being added.
I checked everything I could think of, for example:
Are any variables used twice and overwriting each other?
Does the function call itself?
Does it happen with a small number of products too? (No.)
The script is called by a cronjob; are the settings OK? (Yes.)
What makes it especially weird is that it sometimes goes right and sometimes doesn't. Could this be a memory problem?
EDIT
wget -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync
It's set up in Webmin and called at a specific hour and minute.
Code is hundreds of lines long...
Thanks
You have:
max_execution_time disabled. Your script won't end until the process is complete, however long that takes.
memory_limit disabled. There is no limit on how much data is stored in memory.
500 records were completed without issues. This indicates that the script completes its work before the next cronjob iteration. For example, if your cron runs every hour, then the 500 records are processed in less than an hour.
If you have a cronjob that is going to process a large number of records, consider adding a lock mechanism to the process. Only allow the script to run once, and start again only when the previous run is complete.
You can create a script lock as part of a shell script before executing your PHP script. Or, if you don't have access to your server, you can use a database lock within the PHP script, something like this:
class ProductCronJob
{
    protected $lockValue;

    public function run()
    {
        // Obtain a lock
        if ($this->obtainLock()) {
            // Run your script if you have a valid lock
            $this->syncProducts();
            // Release the lock on completion
            $this->releaseLock();
        }
    }

    protected function syncProducts()
    {
        // your long-running script
    }

    protected function obtainLock()
    {
        $time = new \DateTime;
        $timestamp = $time->getTimestamp();
        $this->lockValue = $timestamp . '_syncProducts';
        $db = JFactory::getDbo();
        $lock = [
            'lock' => $this->lockValue,
            'timemodified' => $timestamp
        ];
        // lock = '0' indicates that the cronjob is not active.
        // Update #__cronlock set lock = '', timemodified = '' where name = 'syncProducts' and lock = '0'
        // $result = $db->updateObject('#__cronlock', $lock, 'id');
        // $lock = SELECT * FROM #__cronlock where name = 'syncProducts';
        // Compare the stored lock value with ours (assumes $lock is the row as an associative array).
        if ($lock !== false && (string)$lock['lock'] !== (string)$this->lockValue) {
            // Currently there is an active process - can't start a new one
            return false;
            // You can return false as above or add extra logic as below
            // Check the current lock age - how long it has been running for
            // $diff = $timestamp - $lock['timemodified'];
            // if ($diff >= 25200) {
            //     // The current script has been active for 7 hours.
            //     // You can change 25200 to any number of seconds you want.
            //     // Here you can send a notification email to the site administrator.
            //     // ...
            // }
        }
        return true;
    }

    protected function releaseLock()
    {
        // Update #__cronlock set lock = '0' where name = 'syncProducts'
    }
}
Your script runs for quite some time (~45 min), and wget thinks the request has timed out because you don't return any data. By default wget has a 900 s read timeout and a retry count of 20, and every retry starts the script again. So first you should probably change your wget command to prevent this (--timeout=0 disables the timeout; --tries=1 stops wget from retrying, whereas --tries=0 would mean unlimited retries):
wget --tries=1 --timeout=0 -q -O /dev/null "http://example.eu/xxxxx/cron.php?operation=sync"
Removing the timeout could lead to other issues, so instead you could send data from your script (and flush it to force the web server to pass it on) so that wget doesn't think the script has timed out, say once every 1000 iterations. Think of it as a progress bar.
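A minimal sketch of that progress output, assuming the sync work can be passed in as a callback (the function name processWithHeartbeat and the interval of 1000 are made up for illustration):

```php
<?php
// Hypothetical helper: runs $work for every item and prints a short progress
// line every $every items so the HTTP client keeps receiving data.
function processWithHeartbeat(iterable $items, callable $work, int $every = 1000): int
{
    $done = 0;
    $beats = 0;
    foreach ($items as $item) {
        $work($item);
        $done++;
        if ($done % $every === 0) {
            echo "processed $done\n";
            // Push PHP's own output buffer first (if any), then the SAPI buffer.
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
            $beats++;
        }
    }
    return $beats; // number of progress lines sent
}

$beats = processWithHeartbeat(range(1, 3000), function ($product) {
    // sync a single product here
});
```

Whether the bytes actually reach wget right away also depends on buffering or compression in the web server, so this may need matching server settings.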
Just keep in mind that you will hit a problem when the run time gets close to your cron period, because two runs will then execute in parallel. You should optimize your process and/or add a lock mechanism.
I see two possibilities:
- cron calls the script much more often than expected
- the script somehow takes too long.
You can try to estimate the time a single iteration of the loop takes.
This can be done with time(). Perhaps the result is surprising, perhaps not. You can probably get the number of results too. Multiply the two, and you will have an estimate of how long the process should take.
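As a rough sketch of that estimate: time a small sample of iterations and multiply the average by the total row count. microtime(true) is used here instead of time() because it has sub-second resolution; the function name, the sample size of 20, and the row count of 50000 are assumptions, and usleep() merely stands in for the real per-product work.

```php
<?php
// Hypothetical helper: measures $sampleSize iterations and extrapolates
// the average duration to $totalRows iterations.
function estimateTotalSeconds(callable $oneIteration, int $sampleSize, int $totalRows): float
{
    $start = microtime(true);
    for ($i = 0; $i < $sampleSize; $i++) {
        $oneIteration($i);
    }
    $perIteration = (microtime(true) - $start) / $sampleSize;
    return $perIteration * $totalRows;
}

// usleep(1000) simulates one iteration taking about a millisecond.
$estimate = estimateTotalSeconds(function ($i) { usleep(1000); }, 20, 50000);
echo 'Estimated run time: ' . round($estimate) . " s\n";
```

If the estimate comes out well below the observed run time, the extra time is probably being spent somewhere outside the loop body.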
$productsToSync = $db->loadObjectList();
and
foreach ($productsToSync AS $product) {
It seems you load every result into an array. This won't work for huge databases, because a million rows obviously won't fit in memory. You should fetch one result at a time. With MySQL there are methods that fetch a single row at a time from the result resource; I hope your database layer allows the same.
I also see you execute another query in each iteration of the loop. This is something I try to avoid. Perhaps you can move this to after the first query has finished and do all of those in one big query? On the other hand, this may conflict with my first suggestion.
Also, if something goes wrong, be paranoid when debugging. Measure as much as you can, and time as much as you can when it's a performance issue. Put the timings in your log file. Usually you will find the bottleneck.
I solved the problem myself. Thanks for all the replies!
My MySQL timed out, that was the problem. As soon as I added:
ini_set('mysql.connect_timeout', 14400);
ini_set('default_socket_timeout', 14400);
to my script, the problem stopped. I really hope this helps someone. I'll upvote all the locking answers, because those were very helpful!

Process too big for one request; when split into multiple requests to the same page I get a redirect loop

I have a foreach in CakePHP that processes products from a distributor. The lists have up to 200 products, and each product can have 3 big pictures with 2 resizes each.
So in total I have 1200 big actions, which is too much for one request.
I broke the foreach after every 10 products, removed them from the array, and redirected to the same page. But after a while I get a redirect loop.
Any ideas on how to avoid this?
If I add another page to this redirect frenzy, will it work?
Does the redirect loop appear only when redirecting to the same page?
The thing is, the loop will end, but the browser doesn't know that.
$this->data = $this->Session->read('Parser.data');
$limit = 0;
foreach ($this->data as $key => $data):
    $limit++;
    if ($limit == 4)
        $this->redirect($this->here);
    ...
    $this->Session->delete('Parser.data.' . $key);
endforeach;
$this->redirect(array('controller' => 'parser', 'action' => 'index')); // if $this->data is empty it redirects to the upload page
The server works with any number of records from what I have tested, but I have this call along the way:
$this->getImage(WWW_ROOT . $folder . DS, $new_path, $image['path']);
which looks like this:
protected function getImage($folder = null, $path = null, $from = null) {
    if (isset($from) && !empty($from))
        file_put_contents($folder . $path, file_get_contents($from));
}
This loads up the server's memory and crashes it.
This is why I have to break the foreach a couple of times.
I also tried other functions to get the images, such as cURL, but with the same results!
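One thing that might reduce the memory pressure in getImage() is copying the image as a stream instead of reading it fully into a string with file_get_contents(). This is an alternative technique, not what the code above does; the function name streamCopy is hypothetical, and reading remote URLs this way assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: copies $from to $to in chunks via PHP streams,
// so the whole image never has to sit in a PHP string.
function streamCopy(string $from, string $to): int
{
    $in = fopen($from, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open source: $from");
    }
    $out = fopen($to, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open destination: $to");
    }
    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    if ($bytes === false) {
        throw new RuntimeException("Copy failed: $from");
    }
    return $bytes; // number of bytes written
}
```

If the crash comes from the resizing step rather than the download, this alone will not fix it.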
Let me copy my answer from another, very similar question:
Never use URLs for this kind of task. It is simply wrong and insecure, and it can cause your script to die or the server to stop responding.
Let's say you have 10000 users and a script runtime limit of 30 seconds. It is very likely that the script times out before it has finished, and you end up with only part of your users processed. The other scenario, with a high or unlimited script runtime, can lock up your server. Depending on the script or DB actions, it might put a high load on the server, and users who visit the site while the script is running will encounter a horribly slow or unresponsive site.
Also, you can't really run a loop on a single URL. You could redirect from one request to the next, using limit and offset to simulate a loop over the 100000 users. If you don't loop over the records but fetch all 100000 at once, your script will likely die from running out of memory.
You should create a shell that processes the users in a loop and always handles just one batch at a time, for example 10, 50 or 100 users.
When executing your shell, I recommend running it with the "nice" command to limit the amount of CPU time the shell is allowed to use. This prevents the shell from taking 100% CPU and keeps your site responsive.
Look at creating a shell and setting up a cron job in Cake.

PHP resets variables

I'm trying to create a script that generates unique codes and writes them to a text file.
I've managed to generate the codes and write them to the file.
Now my problem is that my loop keeps running, resulting in over 92 000 codes being written to the file before the server times out.
I've done some logging, and it seems that everything works fine, it's just that after a certain amount of seconds, all my variables are reset and everything starts from scratch. The time interval after which this happens varies from time to time.
I've already set ini_set('memory_limit', '200M'); ini_set('max_execution_time',0); at the top of my script. Maybe there's a php time-out setting I'm missing?
The script is a function in a controller. I set the ini_set at the beginning of this function. This is the loop I'm going through:
public function generateAction() {
    ini_set('memory_limit', '200M');
    ini_set('max_execution_time',0);
    $codeArray = array();
    $numberOfCodes = 78000;
    $codeLength = 8;
    $totaalAantal = 0;
    $file = fopen("codes.txt","a+");
    while(count($codeArray)<$numberOfCodes){
        $code = self::newCode($codeLength);
        if(!in_array($code,$codeArray))
        {
            $totaalAantal++;
            $codeArray[] = $code;
            fwrite($file,'total: '.$totaalAantal."\r\n");
        }
    }
    fclose($file);
}
In the file this would give something like this:
total: 1
total: 2
total: ...
total: 41999
total: 42000
total: 1
total: 2
total: ...
total: 41999
total: 42000
Thanks.
Edit: so far we've established that the generateAction() is called 2 or 3 times, before the end of the script, when it should only be called once.
I already found the solution for this problem.
The host's script limit was set to 90 seconds, and because this script had to run for longer, I had to run it via the command line.
Taking the test with uniqid() into account, we can say that the variables are not reset; rather, the method generateAction() is called several times.
Since your code is probably synchronous, generateAction() is most likely called several times because the main script is called several times.
What happens in detail?
Because of the nature of your algorithm, each pass through the loop is slower than the previous one (in_array() has to scan an ever-growing array). So executing generateAction() may take quite a long time.
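One way to keep each pass from getting slower is to track the codes already issued as array keys and test them with isset(), which does not scan the whole array. The sketch below is only an illustration: newCode() from the question is not shown, so the alphabet and the use of random_int() are assumptions.

```php
<?php
// Hypothetical replacement for the in_array() loop: uniqueness is checked
// through array keys, so the cost per code stays roughly constant.
function generateUniqueCodes(int $count, int $length): array
{
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789'; // assumed character set
    $max = strlen($alphabet) - 1;
    $seen = [];   // code => true, used only for fast lookups
    $codes = [];  // codes in the order they were generated
    while (count($codes) < $count) {
        $code = '';
        for ($i = 0; $i < $length; $i++) {
            $code .= $alphabet[random_int(0, $max)];
        }
        if (!isset($seen[$code])) {
            $seen[$code] = true;
            $codes[] = $code;
        }
    }
    return $codes;
}

$codes = generateUniqueCodes(78000, 8);
```

With the loop finishing quickly, the host's script time limit is also much less likely to be hit in the first place.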
You probably don't wait for it to finish, and you stop the request or even start the process again from a new page. Nevertheless, the process doesn't actually stop that soon; it keeps running in the background. I've observed this behavior on my local WAMP/LAMP installation: the script is not actually stopped even if I stop the page, close the page, close the browser, or restart Apache.
So what happens is that several script processes are writing to the codes.txt file simultaneously.
To avoid this, you can, for example, lock the file during the loop using the flock() function.
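A minimal sketch of such a lock, assuming the writing loop is passed in as a callback (the helper name withExclusiveLock is made up): a second process that finds the file already locked gives up immediately instead of writing alongside the first one.

```php
<?php
// Hypothetical helper: runs $work with an exclusive lock on $path and
// returns false right away if another process already holds the lock.
function withExclusiveLock(string $path, callable $work): bool
{
    $handle = fopen($path, 'a+');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // another process is already writing
    }
    try {
        $work($handle);
        fflush($handle);
    } finally {
        // Always release the lock, even if $work throws.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}
```

In generateAction() the while loop would go inside the callback and write through the handle it receives; a false return value means another run is still in progress.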

check cron job has run script properly - proper way to log errors in batch processing

I have set up a cronjob to run a script daily. This script pulls a list of IDs from a database, loops through each one to get more data from the database, and generates an XML file based on the data retrieved.
This seems to have run fine for the first few days. However, the list of IDs is getting bigger, and today I noticed that not all of the XML files have been generated. The IDs that were skipped seem to be random. I have manually run the script to generate the XML for some of the missing IDs individually, and they ran without any issues.
I am not sure how to locate the problem, as the cron job is definitely running but not always generating all of the XML files. Any ideas on how I can pinpoint this problem and quickly find out which files have not been generated?
I thought of adding timestart and timeend fields to the database and filling them in at the start and end of each XML generator run, so that I could see what had run and what hadn't, but I wondered if there was a better way.
set_time_limit(0);
//connect to database
$db = new msSqlConnect('dbconnect');
$select = "SELECT id FROM ProductFeeds WHERE enabled = 'True' ";
$run = mssql_query($select);
while($row = mssql_fetch_array($run)){
    $arg = $row['id'];
    //echo $arg . '<br />';
    exec("php index.php \"$arg\"", $output);
    //print_r($output);
}
My suggestion would be to add some logging to the script. A simple
error_log("Passing ID:".$arg."\n",3,"log.txt");
can give you some info on whether each ID is being passed. If you find that it is, you can add logging to index.php to investigate the problem further.
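Building on that, a sketch that logs the start and end of every run together with the exit code of the child process could look like this. The helper name runAndLog and the log line format are assumptions; exec()'s third argument receives the exit status, and escapeshellarg() keeps the ID safe on the command line.

```php
<?php
// Hypothetical helper: runs "$command <id>", logging start, end and exit code,
// so IDs with a start line but no end line (or a non-zero exit) stand out.
function runAndLog(string $command, string $id, string $logFile): int
{
    $output = [];
    $exitCode = 0;
    error_log(date('c') . " start id=$id\n", 3, $logFile);
    exec($command . ' ' . escapeshellarg($id), $output, $exitCode);
    error_log(date('c') . " end id=$id exit=$exitCode\n", 3, $logFile);
    return $exitCode;
}
```

In the loop above this would be called as runAndLog('php index.php', $row['id'], 'log.txt'). A start line with no matching end line points at the run that was cut off.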
By the way, can you explain why you are using exec() to run a PHP script? Why not execute a function in the loop? This could well be the source of the problem.
With exec() I think the process may run in the background while the loop continues, so you could really choke your server that way; maybe that's worth testing as well. I think this also depends on how the output is handled. From the PHP manual:
Note: If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
Maybe some other users can comment on this.
It turned out that Apache was timing out, so it had nothing to do with using a function versus exec().
