My PHP script uses PHP Simple HTML DOM Parser to parse HTML and collect all the links and images I want, and it can run for quite a while depending on the number of images to download.
I thought it would be a good idea to allow cancelling in this case. Currently I call my PHP script via a jQuery AJAX request. The closest thing I could find is PHP's register_shutdown_function, but I'm not sure it can work for my case. Any ideas?
So once the PHP script is launched, it can't be disturbed? Like firing AJAX again to call exit on the same PHP file?
This is only worth doing if you are processing really massive data loads through AJAX. For other cases, just handle it in JS and don't display the result if the request was cancelled.
But as I said, if you are processing huge loads of data, then you can add an interrupt check at every nth step of the running script and trigger that condition from another script. For example, you can store the interrupt flag in a file, or in a MySQL MEMORY table.
Example.
1. process.php (AJAX script processing loads of data)
// clean up previous potential interrupt flag
$fileHandler = fopen('interrupt_condition.txt', 'w+');
fwrite($fileHandler, '0');
fclose($fileHandler);
function interrupt_check() {
$interruptfile = file('interrupt_condition.txt');
if (trim($interruptfile[0]) == "1") { // read first line, trim it and parse value - if value == 1 interrupt script
echo json_encode(array("interrupted" => 1));
die();
}
}
$i = 0;
foreach ($huge_load_of_data as $object) {
$i++;
if ($i % 10 == 0) { // check for interrupt condition every 10th record
interrupt_check();
}
// your processing code
}
interrupt_check(); // check for last time (if something changed while processing the last 10 entries)
2. interrupt_process.php (AJAX script to propagate the cancel event to the file)
$fileHandler = fopen('interrupt_condition.txt', 'w+');
fwrite($fileHandler, '1');
fclose($fileHandler);
This will definitely affect the performance of your script, but it gives you a backdoor to stop execution. This is a very simple example - you would need to make it more robust to support multiple users simultaneously, etc.
You can also use a MySQL MEMORY table, Memcached (a non-persistent caching server), or whatever non-persistent storage you can find.
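For illustration, here is a minimal sketch of the same flag kept in a MySQL MEMORY table instead of a file. The interrupt_flags table, the 'my_job' identifier and the mysqli connection details are assumptions, not part of the example above:
// One-time setup: a non-persistent table that lives only while the MySQL server runs
// CREATE TABLE interrupt_flags (job_id VARCHAR(64) PRIMARY KEY, flag TINYINT) ENGINE=MEMORY;
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
// process.php - reset the flag before starting, then poll it every nth iteration
$db->query("REPLACE INTO interrupt_flags (job_id, flag) VALUES ('my_job', 0)");
function interrupt_check_db(mysqli $db) {
    $res = $db->query("SELECT flag FROM interrupt_flags WHERE job_id = 'my_job'");
    $row = $res->fetch_assoc();
    if ($row && (int)$row['flag'] === 1) {
        echo json_encode(array("interrupted" => 1));
        die();
    }
}
// interrupt_process.php - the cancel request just flips the flag
$db->query("UPDATE interrupt_flags SET flag = 1 WHERE job_id = 'my_job'");
In a multi-user setup, 'my_job' would be replaced by a per-request identifier passed in by the AJAX calls.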
Related
I've got a script that is called from an API. The problem I'm having is that the script is being called too many times within a short period of time, sometimes within 1/1000 of a second. In my script, I have SQL queries that do some deletions and updates depending on the times at which someone schedules a playlist.
The process is as follows:
Add as many playlists as desired.
Click the 'save playlist schedule' button.
Use API call for each playlist created.
Seems simple. However, if 2 or more playlists are created (so 2 or more API calls are made), the queries don't run sequentially, so the database doesn't end up with the desired values.
I've edited the jQuery to sleep for a second between calls just to confirm that this was the problem, and it was. However, I don't want to limit the frequency at which this API call can be made.
I tried to create a lock file in PHP, like so, at the very beginning of the script that runs:
# -----ENSURES USERS DON'T CALL THE API TOO MANY TIMES---------
# Check lock file so someone can't overwrite database before edits are done.
do
{
$file = fopen('playlist_lock.txt', 'w+');
if (fgets($file) == 0) // 0 being unlocked, 1 being locked
{
fwrite($file, '1');
fclose($file);
break;
}
usleep(10000);
}
while ($is_locked);
and then at the very end of the file, I add:
fwrite($file, '0');
fclose($file);
However, this didn't seem to work. It reduced the problems when 4 or more calls were made, but it is still problematic for the initial 3 or 4. Meaning, I think the file isn't being written fast enough to stop the other calls from running.
Can anyone guide me in the right direction?
My suggestion is to use two file names for checking/changing the lock status, instead of reading from and writing to the same file. Here is the sample code:
$locked = 'playlist_lock.txt';
$unlocked = 'playlist_unlock.txt';
if (!file_exists($unlocked) && !file_exists($locked)){
file_put_contents($unlocked, '1');
}
do {
if (file_exists($unlocked)){
rename($unlocked, $locked); // lock if it is unlocked
break;
}
usleep(10000);
} while (true); // keep polling until the lock is acquired
and then at the very end of the file:
rename($locked, $unlocked); // unlock
Update
Perhaps it's better to create playlist_unlock.txt manually than to check its existence every time the script runs. Then this block of code can be removed:
if (!file_exists($unlocked) && !file_exists($locked)){
file_put_contents($unlocked, '1');
}
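As a usage sketch, the rename-based lock could be wrapped into helpers like these (the function names and the timeout are my own additions, not part of the answer above). On most filesystems only one concurrent rename() of the same source file succeeds, which is what makes this usable as a mutex between overlapping API calls:
$locked   = 'playlist_lock.txt';
$unlocked = 'playlist_unlock.txt';
// Try to take the lock by renaming the "unlocked" marker to "locked"
function acquire_lock($unlocked, $locked, $timeoutSeconds = 10) {
    $start = microtime(true);
    while (microtime(true) - $start < $timeoutSeconds) {
        if (file_exists($unlocked) && @rename($unlocked, $locked)) {
            return true;
        }
        usleep(10000); // wait 10 ms before retrying
    }
    return false; // give up instead of spinning forever
}
function release_lock($unlocked, $locked) {
    @rename($locked, $unlocked);
}
if (acquire_lock($unlocked, $locked)) {
    // ... run the playlist deletions/updates here ...
    release_lock($unlocked, $locked);
}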
OK, here is my problem.
I have a file which outputs XML based on an input X.
I have another file which calls the above file (1) 10000 times (i.e. many times) with different values for X.
When a user clicks "Go", it should go through all those 10000 Xs and simultaneously show a progress indicator of how many are done (maybe updated once every 10 seconds).
How do I do it? I need ideas. I know how to use AJAX and such, but what structure should my program take?
EDIT
So, according to the answer given below, I stored my output in a session variable. It then outputs the answer. What is happening is:
When I execute a long script, it finishes within, say, 1 minute. But in the meantime, if I open (in a new window) just the file which outputs my SESSION variable, it doesn't output anything until the first script has finished, which is completely the opposite of what I want. What's the problem here? Is it my system/server which doesn't handle multiple requests, or what?
EDIT 2
I use the files approach:
To read what I want:
<?php
include_once '../includeTop.php';
echo util::readFromLog("../../Files/progressData.tmp");
?>
and in another script
$processed ++;
util::writeToLog($dir.'/progressData.tmp', "Files processed: $processed");
where the functions are:
public static function writeToLog($file,$data) {
$f = fopen($file,"w");
fwrite($f, $data);
fclose($f);
}
public static function readFromLog($file) {
return file_get_contents($file);
}
But still the same problem persists :(. I can manually see the file getting updated (1, 2, 3, etc.), but when I read it from PHP the request just waits until my original script has finished.
EDIT 3
OK, I finally found the solution. Instead of getting the output from the PHP file, I now go directly to the log file and read it.
Put the progress (i.e. how far you are into the 2nd file) into memcached directly from the background job, then deliver that value when requested by the JavaScript application (triggered by a timer, as long as you have not reached 100%). The only thing you need to figure out is how to pass some sort of "transaction ID" to both the background job and the JavaScript side, so they access the same key in memcached.
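A rough sketch of that idea, assuming the PECL Memcached extension and a transaction ID that both sides already share (neither detail comes from the answer above):
// Background job: write progress under a key derived from the shared transaction ID
$transactionId = 'abc123';   // assumed: generated up front and passed to both sides
$items = range(1, 10000);    // stand-in for the 10000 X values
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
$key = 'progress_' . $transactionId;
foreach ($items as $i => $x) {
    // ... call the XML-producing file for this X ...
    $memcached->set($key, (int)((($i + 1) / count($items)) * 100), 3600);
}
// progress.php: polled by the JavaScript timer until it returns 100
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
echo (int)$memcached->get('progress_' . $_GET['tid']);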
Edit: I was wrong about $_SESSION. It doesn't update asynchronously, i.e. the values you store in it are not accessible from another request until the script has finished. Whoops.
So the progress needs to be stored in something that does update asynchronously: Memory (like pyroscope suggests, and which is still the best solution), a file, or the database.
In other words, instead of using $_SESSION to store the value, it should be stored by memcached, in a file or in the database.
E.g. using the database:
$progress = 0;
mysql_query("INSERT INTO `progress` (`id`, `progress`) VALUES ($uid, $progress)");
# loop starts
# processing...
$progress += $some_increment;
mysql_query("UPDATE `progress` SET `progress`=$progress WHERE `id`=$uid");
# loop ends
Or using a file
$progress = 0;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop starts
# processing...
$progress += $some_increment;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop ends
Then read the file / select from the database when requesting progress via AJAX. It's not as pretty a solution as memcached, though.
Also, remember to remove the file/database row once it's all done.
You could put the progress in a $_SESSION variable (you'll need a unique name for it) and update it while the process runs. Meanwhile, your AJAX request simply reads that variable at a set interval:
function heavy_process($input, $uid) {
$_SESSION[$uid] = 0;
# loop begins
# processing...
$_SESSION[$uid] += $some_increment;
# loop ends
}
Then have a URL that simply spits out the $_SESSION[$uid] value when it's requested via AJAX, and use the returned value to update the progress bar. Use something like sha1(microtime()) to create the $uid.
Edit: pyroscope's solution is technically better, but if you don't have a server with memcached or the ability to run background processes, you can use $_SESSION instead.
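For completeness, a minimal sketch of such a progress URL (the uid request parameter is an assumption). Note that for another request to see the updated value before heavy_process() finishes, the long-running script has to release the session lock after each update, e.g. by calling session_write_close() and session_start() again, otherwise the progress request will block, exactly as described in the question's edit above:
<?php
// progress.php - polled by the AJAX timer
session_start();
$uid = isset($_GET['uid']) ? $_GET['uid'] : '';
// Echo the current progress for this job, or 0 if nothing is stored yet
echo isset($_SESSION[$uid]) ? (int)$_SESSION[$uid] : 0;
// Release the session lock right away so this request never blocks others
session_write_close();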
I have a PHP web crawler that just checks out websites. A few days ago I decided to make the crawler's progress show in real time using AJAX. The PHP script writes JSON to a file and AJAX reads that tiny file.
I double- and triple-checked my PHP script, wondering what the hell was going on, because after I finished the simple AJAX script the data appearing in my browser leaped up and down in strange directions.
The PHP script executed perfectly and very quickly, but my AJAX would slowly increase the values, every 2 seconds as set, then drop. The numbers only increase in PHP; they never go down. However, the numbers showing up on my webpage go up and down as if the buffer is working on multiple sessions or reading from something that is still being updated, even though the PHP script stopped about an hour ago.
Is there something I'm missing or need to keep clear like a buffer or a reset button?
This is the most I can show; I just slapped it together a really long time ago. If you know of better code then please share, I'd love any help. But I'm sort of new, so please explain anything beyond basic functions.
AJAX
//open our json file
ajaxRequest.onreadystatechange = function(){
if(ajaxRequest.readyState == 4){
//display json file contents
document.form.total_emails.value = ajaxRequest.responseText;
}
}
ajaxRequest.open("GET", "test_results.php", true);
ajaxRequest.send(null);
PHP
//get addresses and links
for($x=(int)0; $x<=$limit; $x++){
$input = get_link_contents($link_list[0]);
array_shift($link_list);
$link_list = ($x%100==0 || $x==5)?filter_urls($link_list,$blacklist):$link_list;
//add the links to the link list and remove duplicates
if(count($link_list) <= 1000) {
preg_match_all($link_reg, $input, $new_links);
$link_list = array_merge($link_list, $new_links);
$link_list = array_unique(array_flatten($link_list));
}
//check the addresses against the blacklist before adding to a file in JSON
$res = preg_match_all($regex, $input, $matches);
if ($res) {
foreach(array_unique($matches[0]) as $address) {
if(!strpos_arr($address,$blacklist)){
$enum++;
json_file($results_file,$link_list[0],$enum,$x);
write_addresses_to_file($address, $address_file);
}
}
}
unset($input, $res, $efile);
}
The symptoms might indicate that the PHP script is not closing the file properly after writing, and/or a race condition where the AJAX routine fetches the JSON data between PHP's fopen() and the new data being written.
A possible solution would be for the PHP script to write to a temp file, then rename to the desired filename after the data is written and the file is properly closed.
Also, it's a good idea to check ajaxRequest.status == 200 as well as ajaxRequest.readyState == 4.
Tools like ngrep and tcpdump can help debugging this type of problem.
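A minimal sketch of that write-then-rename idea (the file name and array keys are assumptions); because rename() to a target on the same filesystem replaces it in a single step, the AJAX poller never reads a half-written file:
// Called from inside the crawler loop in place of writing the results file directly
function json_file_atomic($final, array $data) {
    $tmp = $final . '.tmp';    // temp file on the same filesystem
    file_put_contents($tmp, json_encode($data));
    rename($tmp, $final);      // atomic swap into place
}
json_file_atomic('test_results.json', array(
    'current_link' => $link_list[0],   // values taken from the loop above
    'emails'       => $enum,
    'checked'      => $x,
));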
I know this is a bit generic, but I'm sure you'll understand my explanation. Here is the situation:
The following code is executed every 10 minutes. The variable "var_x" is always read from / written to an external text file whenever it is referred to.
if ( var_x != 1 )
{
var_x = 1;
//
// here is where the main body of the script is.
// it can take hours to completely execute.
//
var_x = 0;
}
else
{
// exit script as it's already running.
}
The problem is: if I simulate a hardware failure (do a hard reset when the script is executing) then the main script logic will never execute again because "var_x" will always be "1". (I already have logic to work out the restore point).
Thanks.
You should lock and unlock files with flock:
$fp = fopen($your_file, 'c'); // 'c' opens for writing without truncating the file
if (flock($fp, LOCK_EX | LOCK_NB)) // non-blocking, so we can bail out if another instance holds the lock
{
//
// here is where the main body of the script is.
// it can take hours to completely execute.
//
flock($fp, LOCK_UN);
}
else
{
// exit script as it's already running.
}
Edit:
As flock does not seem to work correctly on Windows machines, you have to resort to other solutions. Off the top of my head, here is an idea for a possible solution:
Instead of writing 1 to var_x, write the process ID retrieved via getmypid(). When a new instance of the script reads the file, it should then look for a running process with this ID and check whether that process is a PHP script. Of course, this can still go wrong, as another PHP script could obtain the same PID after a hardware failure, so the solution is far from optimal.
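A rough sketch of that PID-file idea on a POSIX system (the lock file name is an assumption, and posix_kill() with signal 0 only tests whether some process with that PID exists; it does not verify that it is this PHP script):
$lockFile = 'var_x.pid'; // assumed file name
// If a previous instance left a PID behind, check whether that process is still alive
if (file_exists($lockFile)) {
    $oldPid = (int)trim(file_get_contents($lockFile));
    if ($oldPid > 0 && posix_kill($oldPid, 0)) {
        exit; // another instance appears to be running
    }
    // Stale PID (e.g. after a hard reset) - fall through and take over
}
file_put_contents($lockFile, getmypid());
// ... main body of the script, can take hours ...
unlink($lockFile);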
Don't you think this would be better solved using file locks? (When the reset occurs, file locks are released as well.)
http://php.net/flock
It sounds like you're doing some kind of manual semaphore for process management.
Rather than writing to a file, perhaps you should use an environment variable instead. That way, in the event of failure, your script will not have a closed semaphore when you restore.
I have a list of data that needs to be processed. The way it works right now is this:
A user clicks a process button.
The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.
This takes way too long. What I'd like instead is that:
The user clicks the process button.
A PHP script takes the first item and starts to process it.
Simultaneously another instance of the script takes the next item and processes it.
And so on, so around 5-6 of the items are being process simultaneously and we get 6 items processed in 15-25 secs instead of just one.
Is something like this possible?
I was thinking of using cron to launch an instance of the script every second. All items that need to be processed will be flagged as such in the MySQL database, so whenever an instance is launched through cron, it will simply take the next flagged item and remove the flag.
Thoughts?
Edit: To clarify, each 'item' is stored as a separate row in a MySQL database table. Whenever processing starts on an item, it is flagged as being processed in the DB, so each new instance will simply grab the next row which is not yet being processed and process it. Hence I don't have to supply the items as command line arguments.
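For illustration only (the items table and its columns are assumptions, not from the question): each instance launched by cron could claim the next unprocessed row with a single UPDATE, so two instances never pick the same item:
// worker.php - launched by cron; claims and processes at most one pending item
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$pid = getmypid();
// Atomically claim one pending row by tagging it with this worker's PID
$db->query("UPDATE items SET status = 'processing', worker_pid = $pid
            WHERE status = 'pending' ORDER BY id LIMIT 1");
$res = $db->query("SELECT * FROM items WHERE status = 'processing' AND worker_pid = $pid LIMIT 1");
if ($row = $res->fetch_assoc()) {
    // ... the 15-25 second processing for this item ...
    $db->query("UPDATE items SET status = 'done' WHERE id = " . (int)$row['id']);
}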
Here's one solution, not the greatest, but will work fine on Linux:
Split the processing PHP into a separate CLI script which:
Takes `$id` and `$item` as command line inputs
Writes its PID to a file in `/tmp/$id.$item.pid`
Echoes its results to stdout as XML or something else that can be read back into PHP
Deletes the `/tmp/$id.$item.pid` file when finished
Your master script (presumably on your webserver) would do:
`exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml");` for each item
Poll the `/tmp/$id.$item.pid` files until all are deleted (sleep/check poll is enough)
If they are never deleted kill all the processing scripts and report failure
If successful, read the results from `/tmp/$id.$item.xml` to format/output to the user
Delete the XML files if you don't want to cache for later use
A backgrounded nohup started application will run independent of the script that started it.
This interested me sufficiently that I decided to write a POC.
test.php
<?php
$dir = realpath(dirname(__FILE__));
$start = time();
// Time in seconds after which we give up and kill everything
$timeout = 25;
// The unique identifier for the request
$id = uniqid();
// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");
// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}
echo "<pre>";
// Run until timeout or all processing has finished
while(time() - $start < $timeout)
{
echo (time() - $start), " seconds\n";
clearstatcache(); // Required since PHP will cache for file_exists
$running = array();
foreach($items as $item)
{
// If the pid file still exists the process is still running
if (file_exists("$dir/proc.$id.$item.pid")) {
$running[] = $item;
}
}
if (empty($running)) break;
echo implode(',', $running), " running\n";
flush();
sleep(1);
}
// Clean up if we timed out
if (!empty($running)) {
clearstatcache();
foreach ($items as $item) {
// Kill process of anything still running (i.e. that has a pid file)
if(file_exists("$dir/proc.$id.$item.pid")
&& $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
posix_kill($pid, 9);
unlink("$dir/proc.$id.$item.pid");
// Would want to log this in the real world
echo "Failed to process: ", $item, " pid ", $pid, "\n";
}
// delete the useless data
unlink("$dir/proc.$id.$item.out");
}
} else {
echo "Successfully processed all items in ", time() - $start, " seconds.\n";
foreach ($items as $item) {
// Grab the processed data and delete the file
echo(file_get_contents("$dir/proc.$id.$item.out"));
unlink("$dir/proc.$id.$item.out");
}
}
echo "</pre>";
?>
proc.php
<?php
$dir = realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];
// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());
for($i=0;$i<80;++$i)
{
echo $item,':', $i, "\n";
usleep(250000);
}
// Remove our pid file to say we're done processing
unlink("proc.$id.$item.pid");
?>
Put test.php and proc.php in the same folder on your server, load test.php, and enjoy.
You will of course need nohup (Unix) and the PHP CLI to get this to work.
Lots of fun, I may find a use for it later.
Use an external work queue like beanstalkd, to which your PHP script writes a bunch of jobs. You then have as many worker processes as you like pulling jobs from beanstalkd and processing them as fast as possible. You can spin up as many workers as you have memory/CPU for. Your job body should contain as little information as possible, maybe just some IDs which you hit the DB with. beanstalkd has a slew of client APIs, and its own protocol is very basic, similar to memcached.
We use beanstalkd to process all of our background jobs, and I love it. It's easy to use and very fast.
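For illustration, a sketch using the Pheanstalk client library (method names differ between Pheanstalk versions, so treat this as a rough outline rather than exact API; $itemIds and the 'items' tube name are assumptions):
// producer.php - queue one tiny job per item
$itemIds = array(1, 2, 3); // assumed: IDs of the rows to process
$pheanstalk = new Pheanstalk\Pheanstalk('127.0.0.1');
foreach ($itemIds as $id) {
    // keep the job body small: just the ID, the worker hits the DB itself
    $pheanstalk->useTube('items')->put(json_encode(array('id' => $id)));
}
// worker.php - run several copies of this in parallel
$pheanstalk = new Pheanstalk\Pheanstalk('127.0.0.1');
$pheanstalk->watch('items')->ignore('default');
while (true) {
    $job = $pheanstalk->reserve();         // blocks until a job is available
    $data = json_decode($job->getData(), true);
    // ... the 15-25 second processing for item $data['id'] ...
    $pheanstalk->delete($job);             // remove the job once it is done
}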
There is no multithreading in PHP; however, you can fork processes.
php.net:pcntl-fork
Or you could execute a system() command and start another process which is multithreaded.
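A minimal forking sketch, assuming the pcntl extension is available (it normally is only under the CLI SAPI, not under a web server); the item list and the sleep() stand-in are placeholders for your own data and processing:
$items = array('item1', 'item2', 'item3'); // assumed: whatever needs processing
$children = array();
foreach ($items as $item) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("could not fork\n");
    } elseif ($pid === 0) {
        // Child process: do the long job for this item, then exit
        sleep(20); // stand-in for the real 15-25 second processing of $item
        exit(0);
    }
    // Parent process: remember the child's PID and keep forking
    $children[] = $pid;
}
// Parent waits for every child so no zombies are left behind
foreach ($children as $childPid) {
    pcntl_waitpid($childPid, $status);
}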
Can you implement threading in JavaScript on the client side? It seems to me I've seen a JavaScript library (from Google, perhaps?) that implements it. Google it and I'm sure you'll find something. I've never done it, but I know it's possible. Anyway, your client-side JavaScript could fire an AJAX call to a PHP script once for each item, in parallel. That might be easier than trying to do it all on the server side.
-don
If you are running a high traffic PHP server you are INSANE if you do not use Alternative PHP Cache: http://php.net/manual/en/book.apc.php . You do not have to make code modifications to run APC.
Another useful technique that can work along with APC is using the Smarty template system which allows you to cache output so that pages do not have to be rebuilt.
To solve this problem, I've used two different products: Gearman and RabbitMQ.
The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that if you have multiple machines, they can all participate in processing items off the queue(s).
Gearman is easier to set up, so I'd suggest poking around with it a bit first. If you find you need something more heavy-duty in terms of queue robustness, look into RabbitMQ.
http://www.danga.com/gearman/
http://pear.php.net/package/Net_Gearman (PEAR library)
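A minimal sketch of that split using the PECL gearman extension (the process_item function name, $itemIds and the job payload are assumptions):
// client.php - the web request queues one background job per item and returns immediately
$itemIds = array(1, 2, 3); // assumed: IDs of the rows flagged for processing
$client = new GearmanClient();
$client->addServer(); // defaults to 127.0.0.1:4730
foreach ($itemIds as $id) {
    $client->doBackground('process_item', (string)$id);
}
// worker.php - run several copies (on one or many machines) to process jobs in parallel
$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction('process_item', function (GearmanJob $job) {
    $id = $job->workload(); // the item ID we queued above
    // ... the 15-25 second processing for this item, hitting the DB with $id ...
});
while ($worker->work());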
You can use pcntl_fork() and family to fork a process - however, you may need something like IPC to communicate back to the parent process that the child process (the one you forked) is finished.
You could have them write to shared memory, like via memcache or a DB.
You could also have each child process write its completed data to a file that the parent process keeps checking - as each child completes, its file is created/written/updated, and the parent can grab them one at a time and pass them back to the caller/client.
The parent's job is to control the queue, to make sure the same data isn't processed twice, and to sanity-check the children (better to kill a runaway process and start over, etc.).
Something else to keep in mind: on Windows platforms you are going to be severely limited - I don't even think you have access to the pcntl_ functions unless you compiled PHP with support for them.
Also, can you cache the data once it's been processed, or is it unique data every time? Caching would surely speed things up.