foreach ($items as $k=>$value) {
$valuesChunk = array_chunk($value, 1000);
foreach ( $valuesChunk as $chunk){
//call a web service for this chunk
$ws = WS::getItems($chunk);
//time response 20 seconds
}
}
For example, if I want to run this code for 10,000 items, it takes 10 × 20 seconds.
Can I execute all the web service calls in parallel to reduce the response time?
My code already works: it parses the files and inserts the records. What is stumping me, as I have never had to do this before, is how to tell my code to parse files 1-300, wait, then parse the next "batch" 301-500, and so on until it has parsed all the files. I need to parse over 50 thousand files, so I'm hitting PHP's memory limit and execution time. Both have already been increased, but I don't think I can set them high enough to process 50 thousand files in one run.
How do I tell my code to run 1-x, then rerun and run x-y?
My code is below (note: I am gathering more information than what's in this snippet):
$xml_files = glob(storage_path('path/to/*.xml'));
foreach ($xml_files as $file) {
$data = simplexml_load_file($file);
... Parse XML and get certain nodes ...
$name = $data->record->memberRole->member->name;
... SQL to insert record into DB ...
Members::firstOrCreate(
['name' => $name]
);
}
The simplest, if inelegant, solution is to call the script multiple times with an offset and use a for loop instead of foreach.
$xml_files = glob(storage_path('path/to/*.xml'));
$offset = (int) $_GET['offset'];
// Or if calling the script via command line:
// $offset = (int) $argv[1];
// Stop at the end of the file list so the last batch doesn't read past it
$limit = min($offset + 300, count($xml_files));
for ($i = $offset; $i < $limit; $i++) {
$data = simplexml_load_file($xml_files[$i]);
// process and whatever
}
If you're calling the script as a web page, just add a query param like my-xml-parser.php?offset=300 and get offset like this: $offset = $_GET['offset'].
If you're calling this as a command line script, call it like this: php my-xml-parser.php 300, and get the offset from argv: $offset = $argv[1]
EDIT
If it's a web script, you can try and add a curl call that would call itself with the next offset without waiting for an answer.
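A minimal sketch of that self-call, assuming the curl extension is available. The helper names nextBatchUrl and triggerNextBatch, the batch size and the timeout are all made up for illustration. The request is given a short timeout so the caller doesn't wait for the next batch to finish; the script being called would need ignore_user_abort(true) near the top so it keeps running after the caller hangs up.

```php
<?php
// Hypothetical helper: URL for the next batch, or null when no files remain.
function nextBatchUrl(string $scriptUrl, int $offset, int $batchSize, int $totalFiles): ?string
{
    $next = $offset + $batchSize;
    if ($next >= $totalFiles) {
        return null; // nothing left to parse
    }
    return $scriptUrl . '?' . http_build_query(['offset' => $next]);
}

// Hypothetical helper: fire the request and give up on the response quickly.
function triggerNextBatch(string $url): void
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // don't print whatever comes back
        CURLOPT_NOSIGNAL       => true, // needed for sub-second timeouts on some builds
        CURLOPT_TIMEOUT_MS     => 500,  // assumed value; long enough to send the request
    ]);
    curl_exec($ch); // expected to time out; the result is deliberately ignored
}
```

At the end of a batch you would call nextBatchUrl() with the current offset and, if it returns a URL, pass it to triggerNextBatch().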
There is a PHP script. It gets data from an external API and imports (updates/deletes) data into the WordPress database (products for WooCommerce). There are a lot of products; importing all of them takes the script about 2-3 hours.
The problem is that memory is not freed while the script runs, so it eventually runs out. After that, the script just silently dies without any error.
In short, the script looks like this:
$products = getProductsFromApi();
foreach ($products as $key => $product) {
$this->import($product);
}
The idea is to split the cron job script into parts: if $currentMemory > 100 MB, stop the script and run it again, not from the beginning but from the point where it stopped.
How can this be done, given a restriction on the server of only 1 cron job script per 2 hours?
Any other ideas?
You can use a tool such as Gearman to create a queue and workers for the import. You can program each worker to process a batch of products small enough to finish within the server's maximum execution time.
Gearman will also allow you to control how many workers can run simultaneously. Therefore, the importing process would be faster and you'll make sure the server resources aren't being totally consumed by workers.
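A rough sketch of that setup, assuming the PECL gearman extension and a gearmand server on 127.0.0.1:4730. The job name import_products, the batch size of 50 and the function names here are made up for illustration; the $import callable stands in for your own $this->import().

```php
<?php
// Split the product list into one JSON payload per job (no Gearman needed here).
function buildJobPayloads(array $products, int $perJob): array
{
    return array_map('json_encode', array_chunk($products, $perJob));
}

// Producer (e.g. the cron script): queue one background job per chunk.
function queueImportJobs(array $products, int $perJob = 50): int
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730); // assumed server address
    $payloads = buildJobPayloads($products, $perJob);
    foreach ($payloads as $payload) {
        $client->doBackground('import_products', $payload); // returns immediately
    }
    return count($payloads);
}

// Worker (a separate long-running CLI process): import one chunk per job.
function runImportWorker(callable $import): void
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('import_products', function (GearmanJob $job) use ($import) {
        foreach (json_decode($job->workload(), true) as $product) {
            $import($product);
        }
    });
    while ($worker->work()) {
        // keep taking jobs until work() reports a failure
    }
}
```

The number of worker processes you start is what caps how many imports run at the same time.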
You can serialize the $products array to a file when $currentMemory > 100 MB and then execute the script again:
$limit = 100*1000*1000;
$store = 'products.bin';
$products = [];
if ( !file_exists($store)) {
$products = getProductsFromApi();
} else {
$products = unserialize(file_get_contents($store));
}
foreach ($products as $key => $product) {
$this->import($product);
unset($products[$key]);
if (memory_get_usage() > $limit) {
file_put_contents($store,serialize($products));
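// Redirect output and background the process; otherwise exec() waits for it to finish
exec('nohup /usr/bin/php -f myscript.php > /dev/null 2>&1 &');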
exit(1);
}
}
unlink ($store);
You can use the sleep function.
For example:
$products = getProductsFromApi();
$i=0;
foreach ($products as $key => $product) {
// you can use your condition here instead of this
if($i%10==0){// every ten iterations, sleep for 100 seconds
sleep(100);
}
$this->import($product);
$i++;
}
https://php.net/manual/en/function.sleep.php
As far as I know, the solution for this is:
ini_set('memory_limit','-1');
What if even this is not enough?
The problem is that I am using a loop, creating and destroying the variables used in it, but memory utilization still increases after every iteration and I have not found the exact reason. My loop is going to run roughly 2,000 to 10,000 times, so even 4 GB of RAM is not going to be enough.
Using the top command, I observed that the script uses about 50 MB at the beginning of the loop and grows by 10 to 15 MB after every iteration, so my code never runs to completion.
ini_set('memory_limit', '-1');
ini_set('xdebug.max_nesting_level', 1000);
$ex_data = some data;
$config = some data;
$docConf = some data;
$codeNameIndex = some data;
$originalName = some data;
CONST LIMIT = 3000;
CONST START = 1000;
//till here it is using 55 to 60 MB of memory
for ($i = self::START; $i < (self::START + self::LIMIT); $i++) {
$start_memory = memory_get_usage();
$object = new ImportProjectController();
$object->ex_data = $ex_data;
$object->config = $config;
$object->docConf = $docConf;
$StratProInsertDateTime = microtime(true);
try {
DB::connection()->getPdo()->beginTransaction();
$object->ex_data[$codeNameIndex[2]][$codeNameIndex[1]] = $originalName . '_' . $i;
$object->ex_data[$codeCodeIndex[2]][$codeCodeIndex[1]] = $originalCode . '_' . $i;
if (!$object->insert_project()) {
throw new Exception('error while inserting project');
}
if (!$object->insert_documents()) {
throw new Exception('error while inserting documents');
}
App::make('AccessController')->rebuildCache();
DB::connection()->getPdo()->commit();
} catch (Exception $ex) {
DB::connection()->getPdo()->rollBack();
echo $ex;
}
//it is increasing memory utilization every iteration.
echo "Memory used for inserting a ".$i."th project :- ";
echo memory_get_usage() - $start_memory.PHP_EOL;
unset($object->ex_data);
unset($object->config);
unset($object->docConf);
$object = null;
echo "Memory utilization before inserting project :- ";
echo memory_get_usage() - $start_memory.PHP_EOL;
}
$object->insert_project()
$object->insert_documents()
App::make('AccessController')->rebuildCache()
These methods do some database inserts.
I am unsetting the $object variable at the end of the loop, but it still does not release the memory. And I am sure nothing in the above methods is holding on to memory.
Swap: 0k total, 0k used, 0k free, 241560k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27671 ec2-user 20 0 1489m 1.1g 9908 R 66.0 30.4 8:15.00 php
4307 mysql 20 0 852m 140m 5576 S 18.2 3.7 14:21.50 mysqld
Above is the top command output. As you can clearly see, memory utilization goes to 1.1 GB, and it keeps increasing.
Please let me know if you need more description.
I got an answer from my colleague for this problem.
Laravel logs queries and keeps every query in memory, which is why I was getting this issue. With the following code my script runs well using only 250 MB of memory. I hope this helps others.
DB::disableQueryLog();
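To show why the log matters in a long loop, here is a simplified stand-in for a connection that records every query it runs. This is not Laravel's code, and the class and method names other than disableQueryLog are invented for the sketch. It only shows the growth pattern: one log entry per query, kept until the script ends.

```php
<?php
// Simplified stand-in, NOT Laravel's implementation.
final class LoggingConnection
{
    private bool $logging = true;
    private array $queryLog = [];

    public function disableQueryLog(): void
    {
        $this->logging = false;
    }

    // Pretend to run an insert; with logging on, the query and its bindings are kept.
    public function insert(string $sql, array $bindings): void
    {
        if ($this->logging) {
            $this->queryLog[] = ['query' => $sql, 'bindings' => $bindings];
        }
    }

    public function logSize(): int
    {
        return count($this->queryLog);
    }
}

// With logging on, 3000 inserts leave 3000 entries in memory.
$logged = new LoggingConnection();
for ($i = 0; $i < 3000; $i++) {
    $logged->insert('insert into projects (name) values (?)', ['name_' . $i]);
}

// With logging off, the same loop keeps nothing.
$quiet = new LoggingConnection();
$quiet->disableQueryLog();
for ($i = 0; $i < 3000; $i++) {
    $quiet->insert('insert into projects (name) values (?)', ['name_' . $i]);
}
```

In the real script each iteration runs many queries with large bindings, which would be consistent with the 10 to 15 MB growth per iteration reported in the question.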
I'm trying to create a counter that uses a shared memory block. Here is the code:
$i=0; $counter = new counter('g');
while($i<3){
$pid = pcntl_fork();
echo $counter->get()."\t".$i."\t".$pid."\n";
$i++;
}
class counter {
protected static $projID = array();
protected $t_key;
protected $length;
function __construct($projID){
!in_array( $projID, self::$projID) or die('Using duplicate project identifer "'.$projID.'" for creating counter');
self::$projID[] = $projID;
$this->t_key = ftok(__FILE__, $projID);
$this->shmid = shmop_open($t_key, 'c', 0755, 64);
$this->length = shmop_write($this->shmid, 0, 0);
shmop_close($this->shmid);
}
function get(){
$sem = sem_get($this->t_key, 1);
sem_acquire($sem);
$shmid = shmop_open($this->t_key, 'c', 0755, 64);
$inc = shmop_read($shmid, 0, $this->length);
$this->length = shmop_write($shmid, $inc+1, 0);
shmop_close($shmid);
sem_release($sem);
return $inc;
}
}
But I get strange results:
7 0 2567
8 1 2568
9 0 0
1 1 0
2 2 2569
40 1 2570
4 2 2572
3 2 0
51 2 2571
52 1 0
63 2 0
5 2 0
64 2 2573
65 2 0
I want to use this class for reading and writing strings in a file with multithreading.
You're not ending the child processes at all, so they never finish. You're also not checking whether the process forked correctly, and there's no control over what has finished processing and in what order. Forking a process isn't really the multithreading that other languages provide: the current process is copied, and each copy carries on with its own copy of the variables. Your $i won't end at 3, nor is there a guarantee which process finishes first or last.
Try with:
while($i < 3)
{
$pid = pcntl_fork();
if($pid == -1)
{
// some sort of message that the process wasn't forked
exit(1);
}
else
{
if($pid)
{
pcntl_wait($status); // refer to PHP manual to check what this function does
}
else
{
// enter your code here, for whatever you want to be done in parallel
// bear in mind that some processes can finish sooner, some can finish later
// good use is when you have tasks dependent on network latency and you want
// them executed asynchronously (such as uploading multiple files to an ftp or
// synchronizing of something that's being done over network
// after you're done, end the child so it doesn't carry on with the loop and fork again
exit(0);
}
}
}
You aren't dealing with the PID after your call to pcntl_fork. Your forks are forking because the loop continues to execute and fork.
Unless you're trying to create a localized fork bomb, you probably don't want your forks to fork.
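To put numbers on it, here is a small simulation (no real forking) of the loop from the question, assuming every fork succeeds. The function name is made up. Each iteration doubles the number of live processes, and both the parent and the child reach the echo.

```php
<?php
// Count processes and echo lines when every process keeps looping after the fork.
function simulateForkLoop(int $iterations): array
{
    $processes = 1;
    $lines = 0;
    for ($i = 0; $i < $iterations; $i++) {
        $processes *= 2;      // every live process forks once
        $lines += $processes; // parent and child both run the echo
    }
    return ['processes' => $processes, 'lines' => $lines];
}

$result = simulateForkLoop(3); // ['processes' => 8, 'lines' => 14]
```

For the 3 iterations in the question that gives 8 processes and 14 lines of output, which matches the 14 lines posted above.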
I did some work locally to try and figure out if that alone would solve the problem, but it didn't. It almost looks like the shared memory segment isn't being written to correctly, as if one of the digits on either side of the string is being repeated, which corrupts all of it and forces things to start over.
Complete speculation.
You might want to consider a different way of performing parallel processing with PHP. Using Gearman as a multi-process work queue is a favorite solution of mine.
EDIT: I've tagged this C in a hope to get more response. It's more the theory I'm interested in than a specific language implementation. So if you're a C coder please treat the following PHP as pseudo-code and feel free to respond with an answer written in C.
I am trying to speed up a PHP CLI script by having it execute its tasks in parallel instead of serial. The tasks are completely independent of each other so it doesn't matter which order they start/finish in.
Here's the original script (note all these examples are stripped-back for clarity):
<?php
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
foreach ($items as $item) {
do_stuff_with($item);
}
I've managed to make it work on the $items in parallel with pcntl_fork() as shown below:
<?php
ini_set('max_execution_time', 0);
ini_set('max_input_time', 0);
set_time_limit(0);
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
$pids = array();
foreach ($items as $item) {
$pid = pcntl_fork();
if ($pid == -1) {
die("couldn't fork()");
} elseif ($pid > 0) {
// parent
$pids[] = $pid;
} else {
// child
do_stuff_with($item);
exit(0);
}
}
foreach ($pids as $pid) {
pcntl_waitpid($pid, $status);
}
Now I want to extend this so there's a maximum of, say, 10 children active at once. What's the best way of handling this? I've tried a few things but haven't had much luck.
There is no syscall to get a list of child PIDs, but ps can do it for you.
The --ppid switch lists all children of your process, so you just need to count the number of lines output by ps.
Alternatively, you can maintain your own counter that you increment on fork() and decrement on the SIGCHLD signal, assuming the ppid stays unchanged for forked processes.
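A sketch of the counter approach, assuming PHP 7.1+ on the CLI with the pcntl extension. The function name runWithChildLimit and the 10 ms polling interval are made up for the example. Several child exits can collapse into a single SIGCHLD, so the handler reaps in a loop instead of decrementing once per signal.

```php
<?php
// Run $task for each item in a forked child, keeping at most $maxChildren alive.
// Returns the number of children reaped.
function runWithChildLimit(array $items, callable $task, int $maxChildren): int
{
    $active = 0; // children currently running
    $reaped = 0; // children that have exited and been collected

    pcntl_async_signals(true);
    pcntl_signal(SIGCHLD, function () use (&$active, &$reaped) {
        // Reap every child that has exited; WNOHANG keeps this from blocking.
        while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
            $active--;
            $reaped++;
        }
    });

    foreach ($items as $item) {
        while ($active >= $maxChildren) {
            usleep(10000); // wait for the handler to free a slot
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("couldn't fork()");
        }
        if ($pid === 0) {
            // child: drop the inherited handler, do the work, and leave
            pcntl_signal(SIGCHLD, SIG_DFL);
            $task($item);
            exit(0);
        }
        $active++; // parent
    }

    while ($active > 0) {
        usleep(10000); // wait for the remaining children
    }
    pcntl_signal(SIGCHLD, SIG_DFL);
    return $reaped;
}
```

With the script from the question this would be called as runWithChildLimit($items, 'do_stuff_with', 10).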
The best thing I can come up with is to add all the tasks to a queue, launch the maximum number of threads you want, and then have each thread request a task from the queue, execute it, and request the next one. Don't forget to have the threads terminate when there are no more tasks to do.
Forking is an expensive operation. From the looks of it, what you really want is multithreading, not multiprocessing. The difference is that threads are much lighter weight than processes, since threads share a virtual address space but processes have separate virtual address spaces.
I'm not a PHP developer, but a quick Google search reveals that PHP does not support multithreading natively, but there are libraries to do the job.
Anyways, once you figure out how to spawn threads, you should figure out how many threads to spawn. In order to do this, you need to know what the bottleneck of your application is. Is the bottleneck CPU, memory, or I/O? You've indicated in your comments that you are network-bound, and network is a type of I/O.
If you were CPU bound, you're only going to get as much parallelism as you have CPU cores; any more threads and you're just wasting time doing context switches. Assuming you can figure out how many total threads to spawn, you should divide your work into that many units, and have each thread process one unit independently.
If you were memory bound, then multithreading would not help.
Since you're I/O bound, figuring out how many threads to spawn is a little trickier. If all work items take approximately the same time to process with very low variance, you can estimate how many threads to spawn by measuring how long one work item takes. However, since network packets tend to have highly variable latencies, this is unlikely to be the case.
One option is to use thread pools: you create a whole bunch of threads, and then for each item to process, you see if there is a free thread in the pool. If there is, you have that thread perform the work, and you move on to the next item. Otherwise, you wait for a thread to become available. Choosing the size of the thread pool is important: too big, and you're wasting time doing unnecessary context switches; too small, and you're waiting for threads too often.
Yet another option is to abandon multithreading/multiprocessing and just do asynchronous I/O instead. Since you mentioned you're working on a single-core processor, this will probably be the fastest option. You can use functions like socket_select() to test if a socket has data available. If it does, you can read the data, otherwise you move onto a different socket. This requires doing a lot more bookkeeping, but you avoid waiting for data to come in on one socket when data is available on a different socket.
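A small sketch of the select-based approach. It uses stream_select(), the streams counterpart of socket_select(), and local socket pairs in place of real network connections so it can run on its own (Unix-like systems). The function name readAllReady is made up for the example.

```php
<?php
// Read one line from each stream as soon as it has data, in whatever order data arrives.
function readAllReady(array $streams, int $timeoutSeconds = 1): array
{
    $results = [];
    $pending = $streams; // key => stream still waiting for data
    while ($pending) {
        $read = $pending; // stream_select() overwrites this with the ready streams
        $write = null;
        $except = null;
        $ready = stream_select($read, $write, $except, $timeoutSeconds);
        if ($ready === false || $ready === 0) {
            break; // error or timeout: give up on whatever is still pending
        }
        foreach ($read as $stream) {
            $key = array_search($stream, $pending, true);
            $results[$key] = fgets($stream);
            unset($pending[$key]);
        }
    }
    return $results;
}

// Two local socket pairs stand in for two network connections.
$a = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$b = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
fwrite($b[1], "reply from b\n");
fwrite($a[1], "reply from a\n");

$replies = readAllReady(['a' => $a[0], 'b' => $b[0]]);
```

For HTTP web service calls specifically, the curl_multi_* functions give you the same kind of non-blocking behaviour without managing the sockets yourself.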
If you want to eschew threads and asynchronous I/O and stick with multiprocessing, it can still be worthwhile if the per-item processing is expensive enough. You might then do the work division like so:
$my_process_index = 0;
$pids = array();
// Fork off $max_procs processes
for($i = 0; $i < $max_procs - 1; $i++)
{
$pid = pcntl_fork();
if($pid == -1)
{
die("couldn't fork()");
}
elseif($pid > 0)
{
// parent
$my_process_index++;
$pids[] = $pid;
}
else
{
// child
break;
}
}
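// $my_process_index is now an integer in the range [0, $max_procs), unique among all the processes
// Each process will now process 1/$max_procs of the items
for($i = $my_process_index; $i < count($items); $i += $max_procs)
{
do_stuff_with($items[$i]);
}
// The parent leaves the fork loop with index $max_procs - 1; every other index is a child
if($my_process_index != $max_procs - 1)
{
exit(0);
}
// parent: wait for all the children to finish
foreach($pids as $pid)
{
pcntl_waitpid($pid, $status);
}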
man 2 setrlimit
That's going to be per-user which may be what you want anyway.
<?php
ini_set('max_execution_time', 0);
ini_set('max_input_time', 0);
set_time_limit(0);
$items = range(0, 100);
function do_stuff_with($item) { echo "$item\n"; }
$pids = array();
while (count($items)){
$item = array_pop($items);
$pid = pcntl_fork();
if ($pid == -1) {
die("couldn't fork()");
} elseif ($pid > 0) {
// parent
$pids[] = $pid;
} else {
// child
do_stuff_with($item);
exit(0);
}
while (count($pids) >= 10){ // limit
while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
$status = pcntl_wexitstatus($status);
array_pop($pids);
echo "$wait_pid $status".PHP_EOL;
break;
}
}
}
while (count($pids)){
while (($wait_pid = pcntl_waitpid(0, $status)) != -1) {
$status = pcntl_wexitstatus($status);
array_pop($pids);
echo "CHILD: child $status completed $wait_pid".PHP_EOL;
break;
}
}