How can I make functions in PHP synchronized, so that the same function won't be executed concurrently? The 2nd user must wait until the 1st user is done with the function; then the 2nd user can execute it.
Thanks
This basically comes down to setting a flag somewhere that marks the function as locked, so it cannot be executed until the first caller returns from it.
This can be done in a number of ways:
use a lock file (the first call locks a file named "f.lok"; the second call checks whether the lock file exists and executes or doesn't based on that check)
set a flag in the database (not recommended)
use semaphores, as JvdBerg suggested (the fastest)
When writing concurrent applications, always beware of race conditions and deadlocks!
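A minimal sketch of the lock-file option. Checking whether "f.lok" exists and then creating it is itself a race, so this version asks the OS for a non-blocking exclusive lock with flock() instead, which decides "execute or don't" atomically. The helper name runIfNotLocked() and the temp-directory location are assumptions, not part of the answer:

```php
<?php
// Hypothetical helper: run $work only if no other process holds the lock.
// Returns true if $work ran, false if another process was already inside.
function runIfNotLocked(string $lockPath, callable $work): bool
{
    $fp = fopen($lockPath, 'c'); // create if missing, never truncate
    if ($fp === false) {
        throw new RuntimeException("Cannot open lock file $lockPath");
    }
    try {
        // LOCK_NB: give up immediately instead of waiting for the lock
        if (!flock($fp, LOCK_EX | LOCK_NB)) {
            return false;
        }
        try {
            $work();
            return true;
        } finally {
            flock($fp, LOCK_UN); // release even if $work throws
        }
    } finally {
        fclose($fp);
    }
}

$ran = runIfNotLocked(sys_get_temp_dir() . '/f.lok', function () {
    // do the work
});
```

Because the lock belongs to the open file handle, the OS drops it if the process dies, so a crashed request cannot leave the function locked forever.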
UPDATE
using semaphores (not tested):
<?php
define('SEM_KEY', 1000);
function noconcurrency() {
    $semRes = sem_get(SEM_KEY, 1, 0666, false); // get (or create) the resource for the semaphore
    if ($semRes === false) {
        return; // the semaphore could not be created
    }
    if (sem_acquire($semRes)) { // blocks until the semaphore is available
        try {
            // do the work
        } finally {
            sem_release($semRes); // release the semaphore so other processes can use it
        }
    }
}
PHP needs to be compiled with sysvsem support (--enable-sysvsem) in order to use the sem_* functions.
Here's a more in-depth tutorial on using semaphores in PHP:
http://www.re-cycledair.com/php-dark-arts-semaphores
You are looking for a semaphore.
Bear in mind that using a semaphore (or any other blocking mechanism) can cause serious performance issues, as waiting requests cannot be handled while the semaphore is held.
Off the top of my head:
The function checks whether a database field called isFunctionRunning equals 1. If not, it starts executing.
You update the database field isFunctionRunning to 1.
The function does its magic here.
You update the database field isFunctionRunning to 0.
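As written, two requests can both read 0 in the first step before either one writes 1. A minimal sketch that folds the check and the update into one conditional UPDATE and inspects the affected row count; it assumes PDO with the SQLite driver and a hypothetical locks table, and the function names are illustrative:

```php
<?php
// Hypothetical schema: one row per function that must not run concurrently.
function createLockTable(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS locks (
        name TEXT PRIMARY KEY,
        isFunctionRunning INTEGER NOT NULL DEFAULT 0)');
    $db->exec("INSERT OR IGNORE INTO locks (name) VALUES ('myFunction')");
}

// Check and set in one statement: only one caller can flip 0 to 1.
function tryAcquireLock(PDO $db, string $name): bool
{
    $stmt = $db->prepare(
        'UPDATE locks SET isFunctionRunning = 1
         WHERE name = ? AND isFunctionRunning = 0');
    $stmt->execute([$name]);
    return $stmt->rowCount() === 1; // 0 rows changed means someone else holds it
}

// Clear the flag so the next caller can run.
function releaseLock(PDO $db, string $name): void
{
    $db->prepare('UPDATE locks SET isFunctionRunning = 0 WHERE name = ?')
       ->execute([$name]);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createLockTable($db);

if (tryAcquireLock($db, 'myFunction')) {
    try {
        // function does magic here
    } finally {
        releaseLock($db, 'myFunction');
    }
}
```

If the process dies between setting and clearing the flag, the row stays at 1 until someone resets it by hand, which is one practical drawback of a database flag.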
But somehow I think what you are trying to do is "wrong" and can be achieved another way. It would help if you gave more details.
Edit: I wasn't aware of PHP semaphores; the answer above will be way faster.
You can use flock() (file locking) with the LOCK_EX (exclusive lock) flag to create a custom "synchronized" function that accepts a handler to be synchronized.
You can find the code here.
I hope this helps.
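The linked code isn't reproduced here, so the following is only a minimal sketch of the idea, assuming a writable lock file path; the synchronized() name and signature are illustrative, not the linked implementation. flock() with LOCK_EX blocks until any other process holding the lock releases it, which gives the "2nd user waits for the 1st" behavior:

```php
<?php
// Illustrative helper: run $handler while holding an exclusive lock on $lockFile.
// Other processes calling synchronized() with the same file wait their turn.
function synchronized(string $lockFile, callable $handler, ...$args)
{
    $fp = fopen($lockFile, 'c');
    if ($fp === false) {
        throw new RuntimeException("Cannot open lock file $lockFile");
    }
    try {
        if (!flock($fp, LOCK_EX)) { // blocks until the lock is free
            throw new RuntimeException("Cannot lock $lockFile");
        }
        try {
            return $handler(...$args);
        } finally {
            flock($fp, LOCK_UN); // release even if $handler throws
        }
    } finally {
        fclose($fp);
    }
}

$total = synchronized(sys_get_temp_dir() . '/sync.lock', function (int $a, int $b) {
    return $a + $b;
}, 2, 3);
```

Every caller that should be serialized must use the same lock file path; different paths give independent locks.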
EDIT: My problem came from the "intelligent" behaviour of Firefox. If you call the same page in two different tabs, it automatically starts the second request after the first is done. If you want parallel execution, you must add a different parameter to each URL.
I was trying to create a mutex using a directory. For example:
$dir = 'test';
echo is_dir($dir);
mkdir($dir);
wait(30);
rmdir($dir);
In a browser, I call the script; in another tab, a few seconds later, I call the same script.
is_dir returns false and there is no error on mkdir on the second call.
On disk, the directory is created by the first script and remains until the second one ends.
If I run the two scripts one after the other on the command line, I get the expected result: is_dir is true and mkdir fails with a "directory already exists" error.
The web server is Apache 2.
I can't explain this behavior.
When you use stat(), lstat(), or any of the other functions listed in the affected functions list (below), PHP caches the information those functions return in order to provide faster performance. However, in certain cases, you may want to clear the cached information. For instance, if the same file is being checked multiple times within a single script, and that file is in danger of being removed or changed during that script's operation, you may elect to clear the status cache. In these cases, you can use the clearstatcache() function to clear the information that PHP caches about a file.
This function caches information about specific filenames, so you only need to call clearstatcache() if you are performing multiple operations on the same filename and require the information about that particular file to not be cached.
Affected functions include stat(), lstat(), file_exists(), is_writable(), is_readable(), is_executable(), is_file(), is_dir(), is_link(), filectime(), fileatime(), filemtime(), fileinode(), filegroup(), fileowner(), filesize(), filetype(), and fileperms().
TL;DR: call clearstatcache(); before any checks.
source : http://php.net/manual/en/function.clearstatcache.php
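clearstatcache() removes stale is_dir() results, but checking and then creating is still two separate steps. A sketch of the same directory mutex that uses mkdir()'s return value as the test instead, since mkdir() fails when the directory already exists; the helper names and lock path are hypothetical:

```php
<?php
// Hypothetical helpers for a directory-based mutex.
// mkdir() either creates the directory or fails, so "check" and "create"
// happen in one step and two processes cannot both succeed.
function acquireDirLock(string $dir): bool
{
    // @ suppresses the "File exists" warning when another process holds the lock
    return @mkdir($dir, 0700);
}

function releaseDirLock(string $dir): void
{
    rmdir($dir);
    clearstatcache(true, $dir); // forget cached is_dir() results for this path
}

$lock = sys_get_temp_dir() . '/test.lockdir';
if (acquireDirLock($lock)) {
    try {
        // critical section (the question waited 30 seconds here)
    } finally {
        releaseDirLock($lock);
    }
} else {
    echo "Another request holds the lock\n";
}
```

A directory left behind by a crashed script keeps the lock held until it is removed by hand.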
You might want to explain a bit better and paste a better code example...
Meanwhile, here is a better way to handle your mkdir/rmdir:
$myDir = 'my/dir/';
if (!is_dir($myDir)) {
    mkdir($myDir, 0755, true);
    sleep(30);
    rmdir($myDir);
}
You might need to find out how to recursively delete directories and files; it might help... ;)
Also, is wait() a PHP function you made?!
I do know sleep() but not wait()...
The code could be prettier and more realistic; I was just trying to be concise. I had also thought of an APC or XCache caching problem...
Wandering the web for a hint, I read that when you call the same script in two tabs, Firefox is so "intelligent" (f... him) that it waits for the first to be done before executing the second.
Adding a different parameter to each call (?t=1 and ?t=2), or using Chrome for one call and Firefox for the other, makes it work flawlessly... What a waste of time...
I wrote a web spider to spider pages concurrently. For each link that the spider finds, I want to fork off a new child that starts the process all over again.
I don't want to overload the target server so I created a static array that all objects can access. Each child can add their PID to the array, and either parent or child should check the array to see if $maxChildren have been met, and if so, patiently wait until any child finishes.
As you can see, I have $maxChildren set to 3, so I expect to see 3 simultaneous processes at any given time. However, that's not the case: the Linux top command shows 12 to 30 processes at any given time. In concurrent programming, how can I regulate the number of simultaneous processes? My logic is currently inspired by how Apache handles its max children, but I'm not exactly sure how that works.
As pointed out in one of the answers, globally accessing the static variable brings up issues with race conditions. To deal with this, the $children array uses the unique $pid of each process as both the key and its value, thereby creating a unique entry. My thinking is that since any object can only deal with one $children[$pid] value, locking is not necessary. Is this not true? Is there a chance that two processes could try to unset or add the same value at some point?
private static $children = array();
private $maxChildren = 3;
public function concurrentSpider($url) {
// STEP 1:
// Download the $url
$pageData = http_get($url, $ref = '');
if (!$this->checkIfSaved($url)) {
$this->save_link_to_db($url, $pageData);
}
// STEP 2:
// extract all hyperlinks from this url's page data
$linksOnThisPage = $this->harvest_links($url, $pageData);
// STEP 3:
// Check the links array from STEP 2 to see if this page has
// already been saved or is excluded because of any other
// logic from the excluded_link() function
$filteredLinks = $this->filterLinks($linksOnThisPage);
shuffle($filteredLinks);
// STEP 4: loop through each of the links and
// repeat the process
foreach ($filteredLinks as $filteredLink) {
$pid = pcntl_fork();
switch ($pid) {
case -1:
print "Could not fork!\n";
exit(1);
case 0:
if ($this->checkIfSaved($filteredLink)) {
exit();
}
//$pid = getmypid();
print "In child with PID: " . getmypid() . " processing $filteredLink \n";
$this->concurrentSpider($filteredLink);
sleep(2);
exit(1);
default:
// Add an element to the children array
self::$children[$pid] = $pid;
// If the maximum number of children has been
// achieved, wait until one or more return
// before continuing.
while (count(self::$children) >= $this->maxChildren) {
//print count(self::$children) . " children \n";
$pid = pcntl_waitpid(-1, $status);
unset(self::$children[$pid]);
}
}
}
}
This is written in PHP. I know that the pcntl_waitpid function with argument of -1 waits for any child to complete regardless of the parent (http://php.net/manual/en/function.pcntl-waitpid.php).
What's wrong with my logic and how can I correct it so that only $maxChildren processes are running simultaneously? I'm also open to improving the logic in general if you have suggestions.
First thing to note: if this is truly a global being shared among multiple threads, it's possible that multiple threads are adding to it at once and you're running afoul of a race condition. You need some sort of concurrency control to ensure that only one process is accessing your global array at once.
Also, try the simple debugging trick of having each process write out (to the console or to a file) its PID and the full contents of the global array each time a new spider is forked. It will help you to check your assumptions (which are plainly wrong at some point) and figure out what's going wrong.
EDIT: (In response to the comments)
I'm not a PHP developer, but if I had to guess, based on the fact that you're using an OS tool that counts OS-level processes, I'd guess that your fork is spawning multiple processes, but your static array is global within the current process. Implementing system-wide shared memory is a lot more complicated!
If you just want to count something and ensure that instances of a shared resource don't grow out of control, look into semaphores, and see if you can find a way in PHP to create a named semaphore object that can be shared between multiple instances of your spider.
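A minimal sketch of that suggestion using PHP's System V semaphores, assuming the sysvsem extension is available; withSlot() is a hypothetical helper. A semaphore created with sem_get($key, 3) admits three holders at once, and because it lives in the kernel rather than in a static array, every forked process sees the same count:

```php
<?php
// Sketch: a system-wide counting semaphore that lets at most $max processes
// run $work at the same time. Requires the sysvsem extension (not on Windows).
// The helper name and the way the key is chosen are illustrative assumptions.
function withSlot(int $key, int $max, callable $work)
{
    // Every process that passes the same $key gets the same semaphore.
    $sem = sem_get($key, $max);
    if ($sem === false) {
        throw new RuntimeException('sem_get failed');
    }
    // Blocks while $max holders are already inside.
    if (!sem_acquire($sem)) {
        throw new RuntimeException('sem_acquire failed');
    }
    try {
        return $work();
    } finally {
        sem_release($sem); // hand the slot to a waiting process
    }
}
```

In the spider, each child could wrap its download, for example `withSlot(ftok(__FILE__, 's'), 3, fn () => http_get($url, ''))`, so that at most three downloads run at once. Note that this caps concurrent downloads, not the number of forked processes, which still grows with every link unless the forking itself is restructured.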
Use a real programming language ;)
Step 1 is kind of bad: why are you downloading the page if it might already be in the DB? Put the download inside the if, and see if you can put a mutex around it. Maybe do something in SQL to imitate one.
I hope harvest_links uses a proper HTML processor with CSS selector support (I like Fizzler for .NET). I guess a regular expression would be fine if it's just to get links, but it's easy to mess up.
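A sketch of a harvest_links()-style helper built on PHP's bundled DOM extension rather than regular expressions. It uses XPath, since DOMXPath is what ships with PHP, rather than CSS selectors; the function name is hypothetical, and relative URLs are returned unresolved:

```php
<?php
// Hypothetical helper: return the unique href values of all <a> elements.
function harvestLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate real-world broken HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return array_values(array_unique($links));
}

print_r(harvestLinks('<p><a href="/a">A</a> <a href="/b">B</a> <a href="/a">again</a></p>'));
```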
I see step 4, and I don't think it's bad, but personally I'd do it a different way.
I'd have something like step 1 insert url, page and flag into a DB. Then I'd have another process (or the same one) ask the DB for unprocessed pages and set the flag to one value if processing fails and another if it succeeds. That way, if the process exits for any reason (shutdown, crash, power outage, etc.), it can pick up easily without rescanning every page to find where it left off: it just asks the database for the next link and redoes whatever it didn't finish.
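A minimal sketch of that resumable queue, assuming PDO with the SQLite driver and a hypothetical pages table where status 0 means pending, 1 done and 2 failed; all names are illustrative:

```php
<?php
// Open (or create) the queue table.
function openQueue(string $dsn): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS pages (
        url TEXT PRIMARY KEY, page TEXT, status INTEGER NOT NULL DEFAULT 0)');
    return $db;
}

// INSERT OR IGNORE: a URL already in the table is never queued twice.
function enqueue(PDO $db, string $url): void
{
    $db->prepare('INSERT OR IGNORE INTO pages (url) VALUES (?)')->execute([$url]);
}

// After a crash, calling this again simply resumes with the unfinished URLs.
function nextPending(PDO $db): ?string
{
    $url = $db->query('SELECT url FROM pages WHERE status = 0 LIMIT 1')->fetchColumn();
    return $url === false ? null : $url;
}

// Record the result: status 1 on success, 2 on failure.
function markDone(PDO $db, string $url, ?string $page, bool $ok): void
{
    $db->prepare('UPDATE pages SET page = ?, status = ? WHERE url = ?')
       ->execute([$page, $ok ? 1 : 2, $url]);
}

$db = openQueue('sqlite::memory:');
enqueue($db, 'http://example.com/');
while (($url = nextPending($db)) !== null) {
    $page = '<html>...</html>'; // fetch $url here, e.g. with the question's http_get()
    markDone($db, $url, $page, true);
    // call enqueue() for each harvested link here
}
```

This loop assumes a single worker. Several workers sharing the table would also need to claim a row (for example with a conditional UPDATE that only succeeds while status is still 0) before working on it.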
PHP doesn't support multithreading, therefore it doesn't support mutexes or any other synchronization methods. As others have said in their answers, this will lead to a race condition.
You'll have to write a wrapper in C or bash. That way, the PHP script can submit targets to the wrapper, and the wrapper will handle scheduling.
Another approach is to rewrite your spider in Python or Ruby, both of which support multithreading. That will eliminate the need for interprocess communication.
Edit: On second thought, the best way is to write the wrapper in Python or Ruby and reuse your existing PHP code as a black box. That's a compromise of the solutions above.
If the spider is for practical purposes, you might want to google "curl multithread"
cURL Multi Threading with PHP
Is there a way to prevent a code-block or a function within a code from running more than once even if I re-execute (or reload) the PHP file?
I mean, can I restrict someone from executing a php script more than once? I can't seem to find the way to do this.
Yes, you can use a $_SESSION variable to determine if the code has been executed. The session variable will be set until the user closes their browser. If you want to extend it further than that, you can set a cookie. Please see the following links for more details.
Session Variables
Cookies
If you are using sessions, then you can set a flag in the user's session array after the code has executed:
function doSomething(){
if (empty($_SESSION['completed'])){
//Do stuff here if it has not been executed.
}
$_SESSION['completed'] = TRUE;
}
You should also check the session variable to see if the task has been executed previously. This assumes that the user accepts a session cookie.
I have an app that does that.
What we did was create a table in the DB called version and store a version number in it. When the script runs, it compares the version number in the database with the one in the PHP script, performs whatever it needs to "upgrade" to the new version, and then updates the version number in the database.
Of course, if the version table does not exist, the code creates it and marks it as storing version zero.
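A sketch of that version-table approach, assuming PDO with the SQLite driver; the upgrade() helper, the table layout and the example steps are hypothetical. Each numbered step runs once, and the stored number records how far this database has been upgraded:

```php
<?php
// $steps maps version numbers 1, 2, 3, ... to one-time upgrade closures.
function upgrade(PDO $db, array $steps): int
{
    // If the version table does not exist, create it and mark it as version zero.
    $db->exec('CREATE TABLE IF NOT EXISTS version (v INTEGER NOT NULL)');
    $current = $db->query('SELECT v FROM version')->fetchColumn();
    if ($current === false) {
        $db->exec('INSERT INTO version (v) VALUES (0)');
        $current = 0;
    }
    $current = (int) $current;
    $target = $steps ? max(array_keys($steps)) : 0;

    for ($v = $current + 1; $v <= $target; $v++) {
        $db->beginTransaction();
        try {
            $steps[$v]($db);                                         // one-time code for version $v
            $db->prepare('UPDATE version SET v = ?')->execute([$v]); // record progress
            $db->commit();
        } catch (Throwable $e) {
            $db->rollBack(); // leave the stored version unchanged on failure
            throw $e;
        }
    }
    return max($current, $target);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$steps = [
    1 => function (PDO $db) { $db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY)'); },
    2 => function (PDO $db) { $db->exec('ALTER TABLE users ADD COLUMN name TEXT'); },
];
upgrade($db, $steps); // runs steps 1 and 2
upgrade($db, $steps); // version is already 2, so nothing runs again
```

Two requests that start upgrading at the same moment could still both run a step; the transaction only protects each step, not the whole sequence.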
Just put a counter in the function. If the counter is greater than 0, don't do anything. The counter variable should be static so that it is "remembered" across multiple calls (within the same request; static variables do not survive between requests).
function sample() {
static $call_counter = 0;
if ( $call_counter>0 ) {
return;
}
...
$call_counter++;
}
As for making sure a file is only executed once, just use "include_once()" instead of "include()".
Is there any function / global variable in PHP that returns the current state of the script (something like runnning, terminating)?
Or is the only way to set this state by making use of register_shutdown_function()?
That function looks inflexible to me, as already registered shutdown functions can be overridden with it. And the shutdown function gets executed when a user aborts the connection, which is not what I'm looking for, and I don't want to introduce too many constraints.
Are there any alternatives to register_shutdown_function() available? Or if not, how to deal with the shortcomings of that function?
UPDATE
Just to clarify: I'm not looking for connection state (e.g. connection_aborted()) but for the run state of the PHP script (running, terminating). Functions to find out more about the connection state I already know of, but how about the current state of the script? Has the script already been terminated and are objects (going to be) destroyed because of that?
UPDATE2
To clarify even more, I'm still not looking for connection state but for something comparable regarding the run-state. It should work in CLI as well which does not have any connection state as there is no TCP connection related to executing the code - to better illustrate what I'm looking for.
After reading a larger part of the PHP sourcecode I came to the conclusion that even if such state(s) exist on the level of experience, they do not really exist within the interpreter in form of a flag or variable.
The code about throwing Exceptions for example decides on various variables if that is possible or not.
The answer to the question is therefore no.
The best workaround I could find so far is to have a global variable for this which is set in a registered shutdown function. But a flag from PHP seems to be not really available.
<?php
register_shutdown_function(function() {$GLOBALS['shutdown_flag']=1;});
class Test {
public function __destruct() {
isset($GLOBALS['shutdown_flag'])
&& var_dump($GLOBALS['shutdown_flag'])
;
}
}
$test = new Test;
#EOF; Script ends here.
You are looking for:
connection_aborted()
http://it.php.net/manual/en/function.connection-aborted.php
or
connection_status()
http://it.php.net/manual/en/function.connection-status.php
Addendum
There can't be any Terminated status, because if it's terminated you can't check its status lol
I have never made (practical) use of it myself yet, but you might be able to make use of:
http://www.php.net/manual/en/function.register-tick-function.php
Using this, you could write to a file or update a DB while the script is running, i.e. write a record with the session (or some ID) and a timestamp to a file and check the time between updates: if it has been updated within the last X seconds, you could say it's still running.
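An untested sketch of that idea, assuming a heartbeat file in the temp directory and a 5-second threshold (both arbitrary). Instead of writing a timestamp record, it touches the file and uses its modification time; heartbeat() and isStillRunning() are hypothetical names:

```php
<?php
// Sketch of a register_tick_function() heartbeat.
// The file location and the 5-second threshold are arbitrary assumptions.
declare(ticks=1);

define('HEARTBEAT_FILE', sys_get_temp_dir() . '/myscript.heartbeat');

// Runs on every tick; throttled to at most one file write per second.
function heartbeat(): void
{
    static $last = 0;
    $now = time();
    if ($now !== $last) {
        $last = $now;
        touch(HEARTBEAT_FILE);
    }
}

// Meant to be called from a different process, e.g. a monitoring script.
function isStillRunning(string $file, int $maxAgeSeconds = 5): bool
{
    clearstatcache(true, $file); // is_file()/filemtime() results are cached
    return is_file($file) && time() - filemtime($file) <= $maxAgeSeconds;
}

register_tick_function('heartbeat');

// ... long-running work goes here; heartbeat() fires as statements execute ...
```

declare(ticks=1) adds a call after every tickable statement, which is why the heartbeat throttles itself to one write per second.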
But as stated PHP is stateless so it's not a notion that PHP will be aware of.
Failing this, you could set a DB field when a script starts and just before it 'ends', but that would add a lot of overhead.
"Is there any function / global variable in PHP that returns the current state of the script (something like running, terminating)?"
No, PHP is stateless.