Interesting thing happened while writing data into redis within a php loop - php

i wrote a php script to pull data from one server (lets call it Server A) to the other (Server B). data in server A is a redis list stores all the operating commands need to be written in server B, such as :
["setex",["session:xxxx",604800,"xxxx"]]
["set",["uid:xxx","xxxxx"]]
["pipeline",[]]
["set",["uid:xxx","xxxxx"]]
["hIncrBy",["Signin:xxxx","totalTimes",1]]
["pipeline",[]]
....
my php codes are :
while($i < 1000){
$line = $redis['server_a']->rpop('sync:op');
list($op,$params) = json_decode($line,1);
$r = call_user_func_array(array($redis['server_b'], $op), $params);
$i++;
}
The wired thing is, when the call_user_func_array method executes the redis command uncorrectly, all the rest commands in the queue cannot be written correctly into server B.
i stuck in this problem almost one week for seeking answers. after thousands of tests i found if i remove the "bad commands" that cannot be executed correctly, such as the ["pipeline",[]] row. all the other commands can be inserted properly. so it reminds me of some redis transaction issue. maybe there are some machanisms that when a command executed unproperly in redis , all the other commands afterwards will be treated as a transaction. so i add a exec() command into the while loop :
while($i < 1000){
$line = $redis['server_a']->rpop('sync:op');
list($op,$params) = json_decode($line,1);
$r = call_user_func_array(array($redis['server_b'], $op), $params);
$redis['server_b']->exec(); //this is the significant update
$i++;
}
then, my problem solved !!!
My question is , can anybody help me explain the redis machanism? is my assumption correct ?

Your library is probably using transactions for pipelining for whatever reason. pipeline is no actual Redis command, see http://redis.io/commands
Just strip all pipeline commands with empty arguments or just use ->exec when you issued a pipeline before.

Related

Start and terminate a background python script with PHP and a timer

What I've to do it's a bit complicated.
I've a python script and I want to run it from PHP, but in background. I had seen somewhere that for run a python script in background I have to use the PHP command exec(script.py) to run without wait for a return: and thus far no problem.
First question:
I have to stop this loop script with another PHP command, how to do this?
Second question:
I have to implement a server-side timer, who stops the script at the end of the time.
I found this code:
<?php
$timer = 60*5; // seconds
$timestamp_file = 'end_timestamp.txt';
if(!file_exists($timestamp_file))
{
file_put_contents($timestamp_file, time()+$timer);
}
$end_timestamp = file_get_contents($timestamp_file);
$current_timestamp = time();
$difference = $end_timestamp - $current_timestamp;
if($difference <= 0)
{
echo 'time is up, BOOOOOOM';
// execute your function here
// reset timer by writing new timestamp into file
file_put_contents($timestamp_file, time()+$timer);
}
else
{
echo $difference.'s left...';
}
?>
From this answer.
Then, there is a way to implement it in a MySQL database? (The integration with the script stop is not a problem)
That's actually pretty simple. You can use a memory object caching system. I would recommend memcached. Memory objects from memcached can be accessed literally from anywhere in your system. The only requirement is that a connection to the memcached backend server is supported. (PHP does, Python does, etc.)
Answer to your first question:
Create a variable called stopme with the value 0 in the memcached database.
Connect from your python script to the memcached database and read the variable stopme permanently. Let's say the python script is running when the variable stopme has the value 0.
In order to stop your script from PHP, make a connection from your PHP script to the memcached server and set stopme to 1.
The python script receives the updated value instantly and exits.
Answer to your second question:
It could be done like explained in my answer before through reading shared variables, but additionally I would like to mention that you also could use a cronjob to kill a running script.

Forking processes while mysql locked

I am working on a shell script in cakephp that processes a queue of items in my mysql database. To speed up the process I am using pcntlfork like so:
$pids = array();
$i = 0;
for($i = 0; $i < count($queue); $i++)
{
$pids[$i] = pcntl_fork();
if(!$pids[$i]) {
# do code
exit();
}
}
While this code is executing the shell script may run before the current script has time to delete the items from the queue. I was using mysql and locking the table like so:
$this->Queue->query("SELECT GET_LOCK('".$this->mysqlLock."', ".$this->mysqlLockTime.") AS 'GetLock'");
This implementation does not work giving me the error "General error: mysql server has gone away". This is because the connection is lost in the children. It appears to be a flaw within fork itself in php.
My question is there a better solution to locking this processes until it finishes and then releasing it for the other shell scripts to execute?
I have found the best solution to assure that the program does not call and execute a instance of itself. I will be using semaphores in php to lock the program.
Here is a short tutorial on semaphores:
http://www.re-cycledair.com/php-dark-arts-semaphores

PHP multithread to call several soap servers

I have to call a php function wich takes one second to response, in a "for" loop :
for ($i=0; $i<count($wsdlTab); $i++)
{
$serverLoadTab[$i] = $this->getServerLoad($wsdlTab[$i]);
}
My problem is that I would like to call my getServerLoad($wsdlTab[$i]) function simultaneous for each row of my $wsdlTab[$i], to not have to wait one second on each loop.
That is the reason why I need to call that function in a thread.
I have seen various ways to "emulate" threads, but I have not found any way with my limitations :
I have to get the return value of my getServerLoad($wsdlTab[$i]), and put in in an array
The Apache server is on Windows
Thanks in advance for your responses.
You should check out Gearman for parallel processing: http://gearman.org/
PHP doesn't really have asynchronous or threading built-in, as you've discovered.
What I might do in a case like this is separate the script I need to execute in a parallel, putting it into its own, small and self-contained PHP file. Then I'd execute that in a separate thread, storing the result somewhere I could monitor in the original thread. Once all the scripts have returned and filled the results, or with some given timeout, I would then continue with processing.
So, for instance,
prepareResults(); // something like clearing a db row or zeroing out a file or whatever
for ($i=0; $i<count($wsdlTab); $i++) {
exec('./doServerLoad.php ' . $wsdlTab[$i] . ' &');
}
while (!waitingForResults()) { // checking the results table/row/file
}
$serverLoadTab = parseTheResults();
May be request WSDLs in multi threads/forks using pcntl_fork() or curl_multi, and save them to the local drive. Then just parse them:
$soapClient = new SoapClient('WSDLSTemp/wsdl1.wsdl');
// .. do whatever you want...

check cron job has run script properly - proper way to log errors in batch processing

I have set up a cronjob to run a script daily. This script pulls out a list of Ids from a database, loops through each to get more data from the database and geneates an XML file based on the data retrieved.
This seems to have run fine for the first few days, however, the list of Ids is getting bigger and today I have noticed that not all of the XML files have been generated. It seems to be random IDs that have not run. I have manually run the script to generate the XML for some of the missing IDs individually and they ran without any issues.
I am not sure how to locate the problem as the cron job is definately running, but not always generating all of the XML files. Any ideas on how I can pin point this problem and quickly find out which files have not been run.
I thought perhaps add timestart and timeend fields to the database and enter these values at the start and end of each XML generator being run, this way I could see what had run and what hadn't, but wondered if there was a better way.
set_time_limit(0);
//connect to database
$db = new msSqlConnect('dbconnect');
$select = "SELECT id FROM ProductFeeds WHERE enabled = 'True' ";
$run = mssql_query($select);
while($row = mssql_fetch_array($run)){
$arg = $row['id'];
//echo $arg . '<br />';
exec("php index.php \"$arg\"", $output);
//print_r($output);
}
My suggestion would be to add some logging to the script. A simple
error_log("Passing ID:".$arg."\n",3,"log.txt");
Can give you some info on whether the ID is being passed. If you find that that is the case, you can introduce logging to index.php to further evaluate the problem.
Btw, can you explain why you are using exec() to run a php script? Why not excute a function in the loop. This could well be the source of the problem.
Because with exec I think the process will run in the background and the loop will continue, so you could really choke you server that way, maybe that's worth trying out as well. (I think this also depends on the way of outputting:
Note: If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
Maybe some other users can comment on this.
Turned out the apache was timing out. Therefore nothing to do with using a function or the exec() function.

Speeding up a PHP App

I have a list of data that needs to be processed. The way it works right now is this:
A user clicks a process button.
The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.
This takes way too long. What I'd like instead is that:
The user clicks the process button.
A PHP script takes the first item and starts to process it.
Simultaneously another instance of the script takes the next item and processes it.
And so on, so around 5-6 of the items are being process simultaneously and we get 6 items processed in 15-25 secs instead of just one.
Is something like this possible?
I was thinking that I use CRON to launch an instance of the script every second. All items that need to be processed will be flagged as such in the MySQL database, so whenever an instance is launched through CRON, it will simply take the next item flagged to be processed and remove the flag.
Thoughts?
Edit: To clarify something, each 'item' is stored in a mysql database table as seperate rows. Whenever processing starts on an item, it is flagged as being processed in the db, hence each new instance will simply grab the next row which is not being processed and process it. Hence I don't have to supply the items as command line arguments.
Here's one solution, not the greatest, but will work fine on Linux:
Split the processing PHP into a separate CLI scripts in which:
The command line inputs include `$id` and `$item`
The script writes its PID to a file in `/tmp/$id.$item.pid`
The script echos results as XML or something that can be read into PHP to stdout
When finished the script deletes the `/tmp/$id.$item.pid` file
Your master script (presumably on your webserver) would do:
`exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml");` for each item
Poll the `/tmp/$id.$item.pid` files until all are deleted (sleep/check poll is enough)
If they are never deleted kill all the processing scripts and report failure
If successful read the from `/tmp/$id.$item.xml` for format/output to user
Delete the XML files if you don't want to cache for later use
A backgrounded nohup started application will run independent of the script that started it.
This interested me sufficiently that I decided to write a POC.
test.php
<?php
$dir = realpath(dirname(__FILE__));
$start = time();
// Time in seconds after which we give up and kill everything
$timeout = 25;
// The unique identifier for the request
$id = uniqid();
// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");
// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}
echo "<pre>";
// Run until timeout or all processing has finished
while(time() - $start < $timeout)
{
echo (time() - $start), " seconds\n";
clearstatcache(); // Required since PHP will cache for file_exists
$running = array();
foreach($items as $item)
{
// If the pid file still exists the process is still running
if (file_exists("$dir/proc.$id.$item.pid")) {
$running[] = $item;
}
}
if (empty($running)) break;
echo implode($running, ','), " running\n";
flush();
sleep(1);
}
// Clean up if we timeout out
if (!empty($running)) {
clearstatcache();
foreach ($items as $item) {
// Kill process of anything still running (i.e. that has a pid file)
if(file_exists("$dir/proc.$id.$item.pid")
&& $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
posix_kill($pid, 9);
unlink("$dir/proc.$id.$item.pid");
// Would want to log this in the real world
echo "Failed to process: ", $item, " pid ", $pid, "\n";
}
// delete the useless data
unlink("$dir/proc.$id.$item.out");
}
} else {
echo "Successfully processed all items in ", time() - $start, " seconds.\n";
foreach ($items as $item) {
// Grab the processed data and delete the file
echo(file_get_contents("$dir/proc.$id.$item.out"));
unlink("$dir/proc.$id.$item.out");
}
}
echo "</pre>";
?>
proc.php
<?php
$dir = realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];
// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());
for($i=0;$i<80;++$i)
{
echo $item,':', $i, "\n";
usleep(250000);
}
// Remove our pid file to say we're done processing
unlink("proc.$id.$item.pid");
?>
Put test.php and proc.php in the same folder of your server, load test.php and enjoy.
You will of course need nohup (unix) and PHP cli to get this to work.
Lots of fun, I may find a use for it later.
Use an external workqueue like Beanstalkd which your PHP script writes a bunch of jobs too. You have as many worker processes pulling jobs from beanstalkd and processing them as fast as possible. You can spin up as many workers as you have memory / CPU. Your job body should contain as little information as possible, maybe just some IDs which you hit the DB with. beanstalkd has a slew of client APIs and itself has a very basic API, think memcached.
We use beanstalkd to process all of our background jobs, I love it. Easy to use, its very fast.
There is no multithreading in PHP, however you can use fork.
php.net:pcntl-fork
Or you could execute a system() command and start another process which is multithreaded.
can you implementing threading in javascript on the client side? seems to me i've seen a javascript library (from google perhaps?) that implements it. google it and i'm sure you'll find something. i've never done it, but i know its possible. anyway, your client-side javascript could activate (ajax) a php script once for each item in separate threads. that might be easier than trying to do it all on the server side.
-don
If you are running a high traffic PHP server you are INSANE if you do not use Alternative PHP Cache: http://php.net/manual/en/book.apc.php . You do not have to make code modifications to run APC.
Another useful technique that can work along with APC is using the Smarty template system which allows you to cache output so that pages do not have to be rebuilt.
To solve this problem, I've used two different products; Gearman and RabbitMQ.
The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that you have multiple machines, they can all participate in processing items off the queue(s).
Gearman is easier to setup, so I'd suggest poking around with it a bit first. If you find you need something more heavy duty with queue robustness; Look into RabbitMQ
http://www.danga.com/gearman/
http://pear.php.net/package/Net_Gearman (PEAR library)
You can use pcntl_fork() and family to fork a process - however you may need something like IPC to communicate back to the parent process that the child process (the one you fork'd) is finished.
You could have them write to shared memory, like via memcache or a DB.
You could also have the child process write the completed data to a file, that the parent process keeps checking - as each child process completes the file is created/written to/updated, and parent process can grab it, one at a time, and them throw them back to the callee/client.
The parent's job is to control the queue, to make sure the same data isn't processed twice and also to sanity check the children (better kill that runaway process and start over...etc)
Something else to keep in mind - on windows platforms you are going to be severely limited - I dont even think you have access to pcntl_ unless you compiled PHP with support for it.
Also, can you cache the data once its been processed, or is it unique data every time? that would surely speed things up..?

Categories