How to split php cronjob into parts


There is a PHP script that fetches data from an external API and imports it (including updates and deletes) into a WordPress database as WooCommerce products. There are a lot of products, so importing all of them takes about 2-3 hours.
The problem is that memory is not released while the script runs, so usage keeps growing until it overflows. After that, the script silently dies without any error.
In short, the script looks like this:
$products = getProductsFromApi();
foreach ($products as $key => $product) {
    $this->import($product);
}
The idea is to split the cron job into parts: if $currentMemory > 100 MB, stop the script and run it again, not from the beginning but from the point where it stopped.
How can this be done, given that the server allows only one cron job run every 2 hours?
Any other ideas?

You can use a tool such as Gearman to create a queue and workers for the import. You can program each worker to process a batch of products small enough to finish within the server's maximum execution time.
Gearman also lets you control how many workers run simultaneously. The import therefore runs faster, and you can make sure the workers do not consume all of the server's resources.
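
A rough sketch of what that could look like, assuming the PECL gearman extension is installed and a gearmand server is listening on 127.0.0.1:4730. The job name import_products and the helper functions are made up for illustration; $importOne stands in for the question's import() method.

```php
<?php
// Sketch only: assumes the PECL gearman extension and a gearmand server on
// 127.0.0.1:4730. 'import_products' is a hypothetical job name.

/** Split the product list into batches small enough for one job. */
function makeBatches(array $products, int $batchSize): array
{
    return array_chunk($products, $batchSize);
}

/** Client side (the cron job): queue one background job per batch. */
function enqueueBatches(array $products, int $batchSize): int
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $queued = 0;
    foreach (makeBatches($products, $batchSize) as $batch) {
        $client->doBackground('import_products', json_encode($batch));
        $queued++;
    }
    return $queued;
}

/** Worker side (a separate process): import one batch per job. */
function runWorker(callable $importOne): void
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('import_products', function (GearmanJob $job) use ($importOne) {
        foreach (json_decode($job->workload(), true) as $product) {
            $importOne($product);
        }
    });
    while ($worker->work()) {
        // Each call to work() handles one job, then returns.
    }
}
```

The cron job would only call enqueueBatches(), which returns quickly. How many copies of the worker to start is a separate decision that depends on the server.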

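You can serialize the $products array to a file when $currentMemory > 100 MB and then execute the script again:
$limit = 100 * 1000 * 1000; // about 100 MB
$store = 'products.bin';
if (!file_exists($store)) {
    $products = getProductsFromApi();
} else {
    // Resume with the products left over from the previous run
    $products = unserialize(file_get_contents($store));
}
foreach ($products as $key => $product) {
    $this->import($product);
    unset($products[$key]);
    if (memory_get_usage() > $limit) {
        // Save what is left, then start a fresh process and stop this one.
        // The output redirect and trailing & are needed; without them exec()
        // blocks until the new process has finished.
        file_put_contents($store, serialize($products));
        exec('nohup /usr/bin/php -f myscript.php > /dev/null 2>&1 &');
        exit(1);
    }
}
// Everything was imported; remove the state file if one was written
if (file_exists($store)) {
    unlink($store);
}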
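
A variation on the same idea, as a sketch: instead of serializing the remaining products, store only the offset of the next product to import. This assumes the API returns the products in the same order on every run, which the question does not confirm. importFromCheckpoint() and the state file are hypothetical names; $fetchAll and $importOne stand in for getProductsFromApi() and import().

```php
<?php
// Sketch only: assumes the API returns products in a stable order.

/**
 * Import products starting at the saved offset. Returns false when it stops
 * early because memory use passed $memoryLimit, true when all are imported.
 */
function importFromCheckpoint(callable $fetchAll, callable $importOne, string $stateFile, int $memoryLimit): bool
{
    $products = array_values($fetchAll());
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    for ($i = $offset, $n = count($products); $i < $n; $i++) {
        $importOne($products[$i]);
        // Record progress after every product, so a crash repeats at most one.
        file_put_contents($stateFile, (string) ($i + 1));
        if ($i + 1 < $n && memory_get_usage() > $memoryLimit) {
            return false; // the next run continues at offset $i + 1
        }
    }
    if (is_file($stateFile)) {
        unlink($stateFile);
    }
    return true;
}
```

When the function returns false, the script can relaunch itself as in the answer above, or simply exit and let the next scheduled cron run continue from the saved offset.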

You can use the sleep() function to pause between batches.
For example:
$products = getProductsFromApi();
$i = 0;
foreach ($products as $key => $product) {
    // you can use your own condition here instead of this one
    if ($i > 0 && $i % 10 == 0) { // after every ten imports, sleep for 100 seconds
        sleep(100);
    }
    $this->import($product);
    $i++;
}
https://php.net/manual/en/function.sleep.php

Related

execute all web service calls in loop php in parallel

foreach ($items as $k => $value) {
    $valuesChunk = array_chunk($value, 1000);
    foreach ($valuesChunk as $chunk) {
        // call a web service for this chunk
        $ws = WS::getItems($chunk);
        // response time: 20 seconds
    }
}
For example, if I run this code for 10,000 items, it takes 10 x 20 seconds.
Can I execute all the web service calls in parallel to reduce the response time?

dynamically get results from exec

I have a PHP script that calls a Go program. The program produces results every 1-2 seconds and prints them. Using PHP's exec() and its output argument, I only get the results when the program finishes. Is there a way to check the output as it changes and print it while the program is still running?
Something like this, but pausing the execution?
$return_status = 0;
$output = [];
$old_output = ["SOMETHING ELSE"];
while ($return_status == 0) {
    exec($my_program, $output, $return_status); # somehow pause this?
    if ($output != $old_output) {
        echo implode("\n", $output);
        $old_output = $output;
    }
}

Yes. Use the popen() function to get a file handle for the command's output, then read from it a line at a time.
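
A minimal sketch of that approach. streamCommand() is a made-up helper name, and it assumes the called program flushes its output as it goes; if the program buffers its output, lines will still arrive late.

```php
<?php
// Sketch only: streamCommand() is a hypothetical helper, not part of PHP.

/**
 * Run $command and pass each output line to $onLine as soon as it arrives.
 * Returns the termination status reported by pclose().
 */
function streamCommand(string $command, callable $onLine): int
{
    $handle = popen($command, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not start: $command");
    }
    // fgets() blocks until the next full line (or end of output) is available.
    while (($line = fgets($handle)) !== false) {
        $onLine(rtrim($line, "\r\n"));
    }
    return pclose($handle);
}
```

With this in place, the loop from the question reduces to something like streamCommand($my_program, function ($line) { echo $line, PHP_EOL; flush(); });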

php exec() never ended

I'm using PHP's exec() to run an executable file, and the call never seems to return.
Running the same executable from a shell works fine.
Here are the main things the executable does:
fork();
The child process does some time-consuming work,
and I call setrlimit() to limit its CPU time.
The parent process listens for signals and kills the child process when the calculated used_time exceeds the limit.
How can I make PHP's exec() work?
Update:
Because the code is too long, I have selected only parts of it.
main function
child_pid = fork();
if(child_pid == 0)
{
compile();
exit(0);
}
else
{
int res = watch();
if(res)
puts("YES");
else
puts("NO");
}
child process
LIM.rlim_cur = LIM.rlim_max = COMPILE_TIME;
setrlimit(RLIMIT_CPU,&LIM);
alarm(0);
alarm(LIM.rlim_cur * 10);
switch(language)
{
//..... here is execl() to call compiler like gcc,g++,javac
}
parent process
int status = 0;
int used_time = 0;
struct timeval case_startv, case_nowv;
struct timezone case_startz, case_nowz;
gettimeofday(&case_startv, &case_startz);
while(1)
{
usleep(50000);
kill(child_pid,SIGKILL);
gettimeofday(&case_nowv, &case_nowz);
used_time = case_nowv.tv_sec - case_startv.tv_sec;
if(waitpid(child_pid,&status,WNOHANG) == 0) //still running
{
if(used_time > COMPILE_TIME)
{
report_log("Compile time limit exceed");
kill(child_pid,SIGKILL);
return 0;
}
}
else
{
//handle signals
}
}
For testing, the PHP file contains just the exec() call.
The situation I described only occurs when
PHP's exec() runs the executable to compile user code like this:
#include "/dev/random"
//....

A PHP script on a server has a limited time to execute. It is generally not a good idea to run long-running scripts this way; it is recommended to run them as background jobs.
This limit is defined in php.ini, which can differ between Apache and the shell (CLI).

At last, I found out why this happened.
I only killed child_pid, not the other processes started by child_pid.
So PHP's exec() kept waiting, presumably because those leftover processes were still running with its output pipe open.

waiting for all pids to exit in php

My issue is this. I am forking processes so that I can speed up access to files on disk. I store the data from these files in a tmp file on local disk. Ideally, after all processes have finished, I need to read that tmp file and load the data into an array. I then unlink the tmp file, as it is no longer needed. My problem is that pcntl_wait() does not seem to actually wait until all child processes are done before moving on to the final set of operations, so I end up unlinking the file before some of the processes have finished.
I can't seem to find a solid way to wait for all processes to exit cleanly and then access my data.
$numChild = 0;
$maxChild = 20; // max number of forked processes.
// get a list of "availableCabs"
foreach ($availableCabs as $cab) {
// fork the process
$pids[$numChild] = pcntl_fork();
if (!$pids[$numChild]) {
// do some work
exit(0);
} else {
$numChild++;
if ($numChild == $maxChild) {
pcntl_wait($status);
$numChild--;
}
} // end fork
}
// Below is where things fall apart. I need to be able to print the complete serialized data, but several child processes don't actually exit before I unlink the file.
$dataFile = fopen($pid, 'r');
while(($values = fgetcsv($dataFile,',')) !== FALSE) {
$fvalues[] = $values;
}
print serialize($fvalues);
fclose($dataFile);
unlink($file);
Please note that I'm leaving out a lot of code regarding what I'm actually doing; if it needs to be posted, that's not an issue.

Try restructuring your code so that you have two loops: one that spawns processes and one that waits for them to finish. You should also use pcntl_waitpid() to check for specific process IDs, rather than the simple child-counting approach you are currently using.
Something like this:
<?php
$maxChildren = 20; // Max number of forked processes
$pids = array();   // Child process tracking array
// Get a list of "availableCabs"
foreach ($availableCabs as $cab) {
    // Limit the number of child processes
    // If $maxChildren or more processes exist, wait until one exits
    if (count($pids) >= $maxChildren) {
        $pid = pcntl_waitpid(-1, $status);
        unset($pids[$pid]); // Remove PID that exited from the list
    }
    // Fork the process
    $pid = pcntl_fork();
    if ($pid) { // Parent
        if ($pid < 0) {
            // Unable to fork process, handle error here
            continue;
        } else {
            // Add child PID to tracker array
            // Use PID as key for easy use of unset()
            $pids[$pid] = $pid;
        }
    } else { // Child
        // If you aren't doing this already, consider using include() here - it
        // will keep the code in the parent script more readable and separate
        // the logic for the parent and children
        exit(0);
    }
}
// Now wait for the child processes to exit. This approach may seem overly
// simple, but because of the way it works it will have the effect of
// waiting until the last process exits and pretty much no longer
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
    unset($pids[$pid]);
}
// Now the parent process can do its cleanup of the results

How can I enforce a maximum amount of forked children?

EDIT: I've tagged this C in the hope of getting more responses. I'm more interested in the theory than in a specific language implementation, so if you're a C coder, please treat the following PHP as pseudo-code and feel free to respond with an answer written in C.
I am trying to speed up a PHP CLI script by having it execute its tasks in parallel instead of serial. The tasks are completely independent of each other so it doesn't matter which order they start/finish in.
Here's the original script (note all these examples are stripped-back for clarity):
<?php
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
foreach ($items as $item) {
do_stuff_with($item);
}
I've managed to make it work on the $items in parallel with pcntl_fork() as shown below:
<?php
ini_set('max_execution_time', 0);
ini_set('max_input_time', 0);
set_time_limit(0);
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
$pids = array();
foreach ($items as $item) {
$pid = pcntl_fork();
if ($pid == -1) {
die("couldn't fork()");
} elseif ($pid > 0) {
// parent
$pids[] = $pid;
} else {
// child
do_stuff_with($item);
exit(0);
}
}
foreach ($pids as $pid) {
pcntl_waitpid($pid, $status);
}
Now I want to extend this so there's a maximum of, say, 10 children active at once. What's the best way of handling this? I've tried a few things but haven't had much luck.

There is no syscall that returns a list of child PIDs, but ps can do it for you.
The --ppid switch lists all children of your process, so you just need to count the number of lines that ps outputs.
Alternatively, you can maintain your own counter that you increment on fork() and decrement on the SIGCHLD signal, assuming the ppid stays unchanged for the forked processes.
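
The counter approach might look roughly like this in PHP. It is only a sketch: it assumes the pcntl extension (CLI) and PHP 7.1+ for pcntl_async_signals(), and runLimited() is a made-up helper name.

```php
<?php
// Sketch only: assumes the pcntl extension and PHP 7.1+.

/**
 * Fork one child per item, keeping at most $maxChildren alive at once.
 * Returns the number of children that were reaped.
 */
function runLimited(array $items, callable $task, int $maxChildren): int
{
    $active = 0; // children currently running
    $reaped = 0; // children that have exited

    pcntl_async_signals(true);
    pcntl_signal(SIGCHLD, function () use (&$active, &$reaped) {
        // One SIGCHLD can stand for several exits, so reap in a loop.
        while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
            $active--; // decrement on SIGCHLD
            $reaped++;
        }
    });

    foreach ($items as $item) {
        while ($active >= $maxChildren) {
            usleep(10000); // wait for the handler to free a slot
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("couldn't fork()");
        }
        if ($pid === 0) {
            // child: drop the inherited handler, do the work, exit
            pcntl_signal(SIGCHLD, SIG_DFL);
            $task($item);
            exit(0);
        }
        $active++; // increment on fork()
    }
    while ($active > 0) {
        usleep(10000); // wait for the remaining children
    }
    pcntl_signal(SIGCHLD, SIG_DFL);
    return $reaped;
}
```

For the question's example this would be called as runLimited($items, 'do_stuff_with', 10). Polling with usleep() keeps the sketch short; a blocking pcntl_waitpid() in the main loop, as in the other answers, avoids the polling.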

The best thing I can come up with is to add all the tasks to a queue, launch the maximum number of threads you want, and then have each thread request a task from the queue, execute it, and request the next one. Don't forget to have the threads terminate when there are no more tasks to do.

Forking is an expensive operation. From the looks of it, what you really want is multithreading, not multiprocessing. The difference is that threads are much lighter weight than processes, since threads share a virtual address space but processes have separate virtual address spaces.
I'm not a PHP developer, but a quick Google search reveals that PHP does not support multithreading natively, but there are libraries to do the job.
Anyways, once you figure out how to spawn threads, you should figure out how many threads to spawn. In order to do this, you need to know what the bottleneck of your application is. Is the bottleneck CPU, memory, or I/O? You've indicated in your comments that you are network-bound, and network is a type of I/O.
If you were CPU bound, you're only going to get as much parallelism as you have CPU cores; any more threads and you're just wasting time doing context switches. Assuming you can figure out how many total threads to spawn, you should divide your work into that many units, and have each thread process one unit independently.
If you were memory bound, then multithreading would not help.
Since you're I/O bound, figuring out how many threads to spawn is a little trickier. If all work items take approximately the same time to process with very low variance, you can estimate how many threads to spawn by measuring how long one work item takes. However, since network packets tend to have highly variable latencies, this is unlikely to be the case.
One option is to use thread pools: you create a whole bunch of threads, and then for each item to process, you see if there is a free thread in the pool. If there is, you have that thread perform the work, and you move on to the next item. Otherwise, you wait for a thread to become available. Choosing the size of the thread pool is important: too big, and you're wasting time doing unnecessary context switches; too small, and you're waiting for threads too often.
Yet another option is to abandon multithreading/multiprocessing and just do asynchronous I/O instead. Since you mentioned you're working on a single-core processor, this will probably be the fastest option. You can use functions like socket_select() to test if a socket has data available. If it does, you can read the data, otherwise you move onto a different socket. This requires doing a lot more bookkeeping, but you avoid waiting for data to come in on one socket when data is available on a different socket.
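
As a rough illustration of the asynchronous option, here is a sketch that uses stream_select(), the stream counterpart of socket_select(), since it works on any PHP stream without the sockets extension. readWhenReady() is a made-up helper name, and the streams are assumed to be already connected.

```php
<?php
// Sketch only: readWhenReady() is a hypothetical helper, not part of PHP.

/**
 * Read from several streams at once, taking data from whichever is ready.
 * Returns the collected data keyed like the input array.
 */
function readWhenReady(array $streams, int $timeoutSeconds = 5): array
{
    $data = array_fill_keys(array_keys($streams), '');
    $open = $streams;
    while ($open) {
        $read = array_values($open);
        $write = null;
        $except = null;
        // Blocks until at least one stream is readable or the timeout expires.
        $ready = stream_select($read, $write, $except, $timeoutSeconds);
        if ($ready === false || $ready === 0) {
            break;
        }
        foreach ($read as $stream) {
            $key = array_search($stream, $open, true);
            $chunk = fread($stream, 8192);
            if ($chunk === false || $chunk === '') {
                unset($open[$key]); // closed by the other side
                continue;
            }
            $data[$key] .= $chunk;
        }
    }
    return $data;
}
```

The bookkeeping mentioned above shows up even in this small version: the function has to track which streams are still open and keep a separate buffer for each one.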
If you want to eschew threads and asynchronous I/O and stick with multiprocessing, it can still be worthwhile if the per-item processing is expensive enough. You might then do the work division like so:
$is_child = false;
$my_process_index = 0;
$pids = array();
// Fork off $max_procs - 1 children; with the parent that makes $max_procs processes
for ($i = 0; $i < $max_procs - 1; $i++)
{
    $pid = pcntl_fork();
    if ($pid == -1)
    {
        die("couldn't fork()");
    }
    elseif ($pid > 0)
    {
        // parent
        $my_process_index++;
        $pids[] = $pid;
    }
    else
    {
        // child: keeps the index the parent had at the time of the fork
        $is_child = true;
        break;
    }
}
// $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
// Each process will now process 1/$max_procs of the items
for ($i = $my_process_index; $i < count($items); $i += $max_procs)
{
    do_stuff_with($items[$i]);
}
if ($is_child)
{
    exit(0);
}
// Parent: wait for all the children to finish
foreach ($pids as $pid)
{
    pcntl_waitpid($pid, $status);
}

man 2 setrlimit
That's going to be per-user which may be what you want anyway.

<?php
ini_set('max_execution_time', 0);
ini_set('max_input_time', 0);
set_time_limit(0);
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
$pids = array();
while (count($items)) {
    $item = array_pop($items);
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("couldn't fork()");
    } elseif ($pid > 0) {
        // parent
        $pids[] = $pid;
    } else {
        // child
        do_stuff_with($item);
        exit(0);
    }
    while (count($pids) >= 10) { // limit
        while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
            $status = pcntl_wexitstatus($status);
            array_pop($pids);
            echo "$wait_pid $status".PHP_EOL;
            break;
        }
    }
}
while (count($pids)) {
    while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
        $status = pcntl_wexitstatus($status);
        array_pop($pids);
        echo "CHILD: child $status completed $wait_pid".PHP_EOL;
        break;
    }
}
