I have a PHP web crawler that just checks out websites. A few days ago I decided to show the crawler's progress in real time using AJAX: the PHP script writes its progress to a tiny JSON file, and AJAX reads that file.
I double- and triple-checked my PHP script, wondering what on earth was going on, because after I finished the simple AJAX script the data appearing in my browser leaped up and down in strange directions.
The PHP script executes perfectly and very quickly, but the values shown by AJAX slowly increase (every 2 seconds, as set), then drop. The numbers only ever increase in PHP; they never go down. Yet the numbers showing up on my webpage go up and down as if the buffer were working on multiple sessions or reading from something that is still being updated, even though the PHP script stopped about an hour ago.
Is there something I'm missing or need to clear, like a buffer or a reset button?
This is the most I can show; I just slapped it together a really long time ago. If you know of better code then please share, I'd love any help possible. But I'm sort of new, so please explain anything beyond basic functions.
AJAX
//create the request object (this part was omitted from the original snippet)
var ajaxRequest = new XMLHttpRequest();
//open our json file
ajaxRequest.onreadystatechange = function(){
    if(ajaxRequest.readyState == 4){
        //display json file contents
        document.form.total_emails.value = ajaxRequest.responseText;
    }
}
ajaxRequest.open("GET", "test_results.php", true);
ajaxRequest.send(null);
PHP
//get addresses and links
for($x = 0; $x <= $limit; $x++){
    $input = get_link_contents($link_list[0]);
    array_shift($link_list);
    //periodically filter the queue against the blacklist
    $link_list = ($x % 100 == 0 || $x == 5) ? filter_urls($link_list, $blacklist) : $link_list;

    //add the links to the link list and remove duplicates
    if(count($link_list) <= 1000) {
        preg_match_all($link_reg, $input, $new_links);
        $link_list = array_merge($link_list, $new_links);
        $link_list = array_unique(array_flatten($link_list));
    }

    //check the addresses against the blacklist before adding to a file in JSON
    $res = preg_match_all($regex, $input, $matches);
    if ($res) {
        foreach(array_unique($matches[0]) as $address) {
            if(!strpos_arr($address, $blacklist)){
                $enum++;
                json_file($results_file, $link_list[0], $enum, $x);
                write_addresses_to_file($address, $address_file);
            }
        }
    }
    unset($input, $res, $efile);
}
The symptoms might indicate the PHP script not closing the file properly after writing, and/or a race condition where the AJAX routine is fetching the JSON data in between the PHP's fopen() and the new data being written.
A possible solution would be for the PHP script to write to a temp file, then rename to the desired filename after the data is written and the file is properly closed.
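For example, a minimal sketch of that write-then-rename idea (the function and file names here are illustrative, not taken from the crawler):
<?php
// Sketch: write the JSON to a temp file first, then atomically swap it
// into place. rename() within one filesystem is atomic on POSIX systems,
// so a reader sees either the old file or the new one, never a partial write.
function json_file_atomic($results_file, $link, $enum, $x) {
    $payload = json_encode(array('link' => $link, 'emails' => $enum, 'page' => $x));
    $tmp = $results_file . '.tmp';
    $fh = fopen($tmp, 'w');
    fwrite($fh, $payload);
    fclose($fh);                 // make sure everything is flushed and closed
    rename($tmp, $results_file); // publish the finished file in one step
}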
Also, it's a good idea to check ajaxRequest.status == 200 as well as ajaxRequest.readyState == 4.
Tools like ngrep and tcpdump can help debugging this type of problem.
Related
I have a PHP webpage that makes multiple queries to the database and displays the results on charts.
The logic is: there is index.php, where the query is made. After submitting the data, 6 different PHP pages are called. The PHP pages log the query, run the appropriate Python script, and make charts with JavaScript. Each of those 6 PHP pages is displayed in index.php in a div. All of the Python scripts take the same input and query the same database; the difference comes from the data pulled from the database and the subsequent JavaScript used to make the charts.
Example of calling one of the PHP pages:
$("#chartFOO").load("http://example/test/get_foo.php? bar=".concat(bar)+"&start=".concat(start)+"&end=".concat(end), function(responseTxt, statusTxt, xhr){
if(statusTxt == "error")
alert("Error: " + xhr.status + ": " + xhr.statusText);
});
Example of calling the Python script:
if ($msisdn) {
    $command = escapeshellcmd("/home/example/scripts/graph_foo.py $bar $start $end");
    $output = shell_exec($command);
}
And the output is then used in the PHP file to make charts. All of the PHP files are displayed in divs with different styling on index.php.
The problem is, it doesn't run them on multiple threads and locks up the system, which makes the response time for the query quite slow. Is it right that only one shell command can be run at a time?
I have tried putting all the Python scripts as functions and the 6 PHP files as strings in one file, trying to call it all with one command, but so far I have problems formatting the PHP files: I can't use '{}' to format, because the PHP files already contain those. I also had the idea to use the threading module to run the functions, and to use one connection to the database, to save the time of connecting 6 times, because each connection takes time.
Is there any reasonable solution to have the scripts run threaded, without having to rework the whole webpage? How can PHP, Javascript and Python be mixed?
A lot to read and a lot to ask, but thanks in advance for your time.
EDIT:
I created a new file, which basically has all 6 files in it, but calling the Python scripts is a bit different now. From index.php I now only call this one file, like I did before with the 6 files.
Example of new way:
$part->handles = [
popen("/home/example/scripts/graph_foo.py {$bar} {$start} {$end}", 'r'),
popen("/home/example/scripts/graph_foo2.py {$bar} {$start} {$end}", 'r')
];
And the way I solved the memory issue:
$output0 = '';
while (!feof($part->handles[0])) {
$output0 .= fread($part->handles[0], 32768);
}
$output1 = '';
while (!feof($part->handles[1])) {
$output1 .= fread($part->handles[1], 32768);
}
I don't know if it's the best way, but it works. I don't know PHP well, but it did take half a minute off the request time, which helps.
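If you want to go a step further, here is a sketch (untested, with illustrative values for $bar, $start, and $end) of the same idea using proc_open plus stream_select, so both children are drained as data arrives instead of reading one pipe to EOF before starting the next:
<?php
// Sketch: run both graph scripts concurrently and interleave the reads.
$bar = '123'; $start = '2015-01-01'; $end = '2015-01-31'; // illustrative values

$spec  = array(1 => array('pipe', 'w'));   // capture each child's stdout
$procs = array();
$pipes = array();
foreach (array('graph_foo.py', 'graph_foo2.py') as $i => $script) {
    $cmd = escapeshellcmd("/home/example/scripts/$script $bar $start $end");
    $procs[$i] = proc_open($cmd, $spec, $p);
    $pipes[$i] = $p[1];
}

$output = array_fill(0, count($pipes), '');
while ($pipes) {
    $read = $pipes; $write = null; $except = null;
    if (stream_select($read, $write, $except, 5) === false) {
        break;                             // select error; bail out
    }
    foreach ($read as $stream) {
        $i = array_search($stream, $pipes, true);
        $output[$i] .= fread($stream, 32768);
        if (feof($stream)) {               // this child is done
            fclose($stream);
            unset($pipes[$i]);
        }
    }
}
foreach ($procs as $proc) {
    proc_close($proc);
}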
I have a PHP script that has to reload a page on the client (server push) when something specific happens on the server. So I have to listen for changes. My idea is to have a text file that contains the number of page loads for the current page. So I would like to monitor the file and as soon as it is modified, to use server push in order to update the content on the client. The question is how to track the file for changes in PHP?
You could do something like:
<?php
$last_mtime = 0;
while (true) {
    clearstatcache();              // PHP caches stat() results between calls
    $file = stat('/file');
    if ($file['mtime'] != $last_mtime) {
        $last_mtime = $file['mtime'];
        //... Do Something Here ..//
    }
    sleep(1);
}
This will continuously look for a change in the modified time of a file every second. If you don't constrain it you could kill your disk IO and may need to adjust your ulimit.
This will check your file for a change:
<?php
$current_contents = "";
function checkForChange($filepath) {
    global $current_contents;
    $new_contents = file_get_contents($filepath);
    if (strcmp($new_contents, $current_contents) !== 0) {
        $current_contents = $new_contents;
        return true;
    }
    return false;
}
But that will not solve your problem. The PHP file that serves the client finishes executing before the rendered HTML is sent to the client. The client will need to call back to some PHP file to check for a change... and since that is also an HTTP request, that file will likewise finish executing and forget anything held in memory.
In order to properly solve this, you'll probably have to back off the idea of checking a file. Either the server needs to know when and how to contact currently connected clients, or those clients need to poll a lightweight service at a regular interval.
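For instance, that lightweight service could be as small as this (file and path names are made up for the sketch):
<?php
// check.php - sketch of a tiny polling endpoint: the client asks
// "has the counter file changed since revision X?" and gets back the
// current revision (here, the file's mtime) plus the counter itself.
clearstatcache();
$path  = '/path/to/pageload_counter.txt';
$mtime = filemtime($path);
header('Content-Type: application/json');
echo json_encode(array(
    'mtime'   => $mtime,
    'changed' => (isset($_GET['since']) && (int)$_GET['since'] < $mtime),
    'count'   => trim(file_get_contents($path)),
));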
This is sort of hacky but what about creating a cron job that sucks in the page, stores it in a scope or table, and then simply compares it every 30 seconds?
I want to write a small management tool to oversee my server processes. My problem is: how can I wait for user input and at the same time update the screen with current stats? Is this even possible with PHP-CLI, or are there any tricks for doing this that I'm currently missing?
I have looked into the newt and ncurses PECL extensions, but neither seems to fit my needs.
Go for libevent http://www.php.net/manual/en/book.libevent.php
You can run your main loop while listening to console with a code roughly like this one:
<?php
// you need the libevent extension, installable via PECL
$forever = true;
$base = event_base_new();
// note: event_buffer_new() also expects write/error callbacks; NULL here
$console = event_buffer_new(STDIN, "process_console", NULL, NULL);
event_buffer_base_set($console, $base);
event_buffer_enable($console, EV_READ);

while ($forever) {
    event_base_loop($base, EVLOOP_NONBLOCK); // Non blocking poll to console listener
    //Do your video update process
}
event_base_free($base); //Cleanup

function process_console($buffer, $id) {
    global $base;
    global $forever;
    $message = '';
    while ($read = event_buffer_read($buffer, 256)) {
        $message .= $read;
    }
    $message = trim($message);
    print("[$message]\n");
    if ($message == "quit") {
        event_base_loopexit($base);
        $forever = false;
    }
    else {
        //whatever.....
    }
}
I don't think you can do it with PHP CLI. As far as I know, when PHP interprets the script you only see the final output.
I think you do want ncurses. If you can convert the simple example C code here, which you should be able to with the PHP wrapper, you'd have your "bootstrap" for solving your problem.
Make sure to blog your code somewhere! :)
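For reference, a rough sketch of that loop with the PECL ncurses functions (untested; the halfdelay timeout and key handling are just one way to do it):
<?php
// Sketch: redraw stats roughly once a second while polling the keyboard.
ncurses_init();
ncurses_halfdelay(10);                 // getch() gives up after ~1 second

while (true) {
    $key = ncurses_getch();            // key code, or -1 when the wait times out
    if ($key === ord('q')) {
        break;                         // quit on 'q'
    }
    ncurses_mvaddstr(0, 0, "stats as of " . date('H:i:s'));
    ncurses_refresh();                 // push the update to the terminal
}
ncurses_end();                         // restore normal terminal mode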
My advice would be to try to avoid any solutions that talk about leaving processes running whilst exiting PHP. Here is a really simple example of how to do it with a bit of jQuery:
window.setInterval(checkstat, 10000); //10 second interval
function checkstat() {
//Change a div with id stat to show updating (don't need this but it's nice)
$('#stat').html('Updating...');
$.get('/getmystats.php?option=blah', function(data) {
//Update the results when the data is returned.
$('#stat').html(data);
});
}
If you need to update more than one area on your page, you can make one call but return JSON or XML and then populate the bits as required.
My PHP script uses PHP simplehtmldom to parse HTML and get all the links and images that I want, and this can run for quite a while depending on the number of images to download.
I thought it would be a good idea to allow cancelling in this case. Currently I call my PHP using jQuery AJAX; the closest thing I could find is PHP's register_shutdown_function, but I'm not sure it can work for my case. Any ideas?
So once PHP is launched, it can't be disturbed? Like firing AJAX again to call an exit in the same PHP file?
This is only worth doing if you are processing really massive data loads through AJAX. For other cases, just handle it in JS by not displaying the result if cancelled.
But as I said, if you are processing huge loads of data, you can add an interrupt condition at every nth step of the running script and fulfil that condition using another script. For example, you can use a file to store the interrupt flag, or a MySQL MEMORY table.
Example.
1, process.php (ajax script processing loads of data)
// clean up previous potential interrupt flag
$fileHandler = fopen('interrupt_condition.txt', 'w+');
fwrite($fileHandler, '0');
fclose($fileHandler);

function interrupt_check() {
    $interruptfile = file('interrupt_condition.txt');
    // read the first line, trim it and parse the value - if it is 1, interrupt the script
    if (trim($interruptfile[0]) == "1") {
        echo json_encode(array("interrupted" => 1));
        die();
    }
}

$i = 0;
foreach ($huge_load_of_data as $object) {
    $i++;
    if ($i % 10 == 0) { // check for interrupt condition every 10th record
        interrupt_check();
    }
    // your processing code
}
interrupt_check(); // check one last time (in case something changed while processing the last 10 entries)
2, interrupt_process.php (ajax script to propagate cancel event to file)
$fileHandler = fopen('interrupt_condition.txt', 'w+');
fwrite($fileHandler, '1');
fclose($fileHandler);
This will definitely affect the performance of your script, but it gives you a backdoor to stop execution. This is a very simple example - you would need to make it more complex to work for multiple users simultaneously, etc.
You can also use a MySQL MEMORY table, Memcache (a non-persistent caching server), or whatever non-persistent storage you can find.
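For example, the same flag kept in a MySQL MEMORY table might look roughly like this (the table, columns, and $job_id are made up for the sketch, reusing the old mysql_* API to match the code above):
<?php
// Sketch: interrupt flag in a MEMORY table instead of a file.
$job_id = mysql_real_escape_string($_GET['job']); // shared id for this run

// one-time setup: an in-memory table holding one flag row per job
mysql_query("CREATE TABLE IF NOT EXISTS `interrupt_flags` (
    `job_id` VARCHAR(40) NOT NULL PRIMARY KEY,
    `cancel` TINYINT NOT NULL DEFAULT 0
) ENGINE=MEMORY");

// interrupt_process.php equivalent: raise the flag
mysql_query("UPDATE `interrupt_flags` SET `cancel`=1 WHERE `job_id`='$job_id'");

// inside interrupt_check(): read the flag instead of the file
$res = mysql_query("SELECT `cancel` FROM `interrupt_flags` WHERE `job_id`='$job_id'");
$row = mysql_fetch_assoc($res);
if ($row && $row['cancel'] == 1) {
    echo json_encode(array("interrupted" => 1));
    die();
}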
OK, here is my problem.
I have a file which outputs XML based on an input X.
I have another file which calls the above (1) file 10000 times (I mean many times) with different numbers for X.
When a user clicks "Go", it should go through all those 10000 Xs and simultaneously show a progress report of how many are done (hmm, maybe updated once every 10 sec).
How do I do it? I need ideas. I know how to use AJAX and stuff, but what structure should my program take?
EDIT
So, according to the answer given below, I stored my output in a session variable. The script then outputs the answer. What is happening is:
When I execute a long script, it gets executed, say, within 1 min. But in the meantime, if I open (in a new window) just the file which outputs my SESSION variable, it doesn't output anything until the first script has finished, which is the complete opposite of what I want. What's the problem here? Is it my system/server that doesn't handle multiple requests, or what?
EDIT 2
I use the files approach:
To read what I want:
<?php include_once '../includeTop.php';
echo util::readFromLog("../../Files/progressData.tmp"); ?>
and in another script
$processed++;
util::writeToLog($dir.'/progressData.tmp', "Files processed: $processed");
where the functions are:
public static function writeToLog($file, $data) {
    $f = fopen($file, "w");
    fwrite($f, $data);
    fclose($f);
}

public static function readFromLog($file) {
    return file_get_contents($file);
}
But the same problem still persists :(. I can manually see the file getting updated (1, 2, 3, etc.), but when I request it from PHP it just waits until my original script has finished.
EDIT 3
OK, I finally found the solution. Instead of fetching the output via the PHP file, I now go directly to the log and read it.
Put the progress (i.e. how far you are into the 2nd file) into memcached directly from the background job, then deliver that value when requested by the JavaScript application (triggered by a timer, as long as you have not reached 100%). The only thing you need to figure out is how to pass some sort of "transaction ID" to both the background job and the JavaScript side, so they access the same key in memcached.
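A minimal sketch of that idea, assuming the PECL memcached extension and a memcached server on localhost (the key scheme and the loop are made up to show the shape):
<?php
// Sketch: background job publishing progress under a shared key.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$txid = sha1(microtime()); // the shared "transaction ID" passed to both sides

// --- background job side: update the counter as work proceeds ---
$total = 10000;
for ($done = 0; $done < $total; $done++) {
    // ... generate one XML output here ...
    if ($done % 100 == 0) {
        $mc->set("progress_$txid", (int)(100 * $done / $total), 3600);
    }
}
$mc->set("progress_$txid", 100, 3600);

// --- progress endpoint side (a separate request): report the counter ---
echo (int)$mc->get("progress_$txid");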
Edit: I was wrong about $_SESSION. It doesn't update asynchronously, i.e. the values you store in it are not accessible until the script has finished. Whoops.
So the progress needs to be stored in something that does update asynchronously: Memory (like pyroscope suggests, and which is still the best solution), a file, or the database.
In other words, instead of using $_SESSION to store the value, it should be stored by memcached, in a file or in the database.
I.e. using the database
$progress = 0;
mysql_query("INSERT INTO `progress` (`id`, `progress`) VALUES ($uid, $progress)");
# loop starts
# processing...
$progress += $some_increment;
mysql_query("UPDATE `progress` SET `progress`=$progress WHERE `id`=$uid");
# loop ends
Or using a file
$progress = 0;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop starts
# processing...
$progress += $some_increment;
file_put_contents("/path/to/progress_files/$uid", $progress);
# loop ends
And then read the file/select from the database, when requesting progress via ajax. But it's not a pretty solution compared to memcached.
Also, remember to remove the file/database row once it's all done.
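For instance, once the loop ends (matching the snippets above):
# cleanup after the loop ends
mysql_query("DELETE FROM `progress` WHERE `id`=$uid");
# or, for the file variant
unlink("/path/to/progress_files/$uid");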
You could put the progress in a $_SESSION variable (you'll need a unique name for it) and update it while the process runs. Meanwhile, your ajax request simply reads that variable at a specific interval.
function heavy_process($input, $uid) {
    $_SESSION[$uid] = 0;
    # loop begins
    # processing...
    $_SESSION[$uid] += $some_increment;
    # note: the session lock means a polling request won't see these
    # updates until this script releases the session (see the edit below)
    # loop ends
}
Then have a URL that simply spits out the $_SESSION[$uid] value when it's requested via ajax, and use the returned value to update the progress bar. Use something like sha1(microtime()) to create the $uid.
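That URL could be as simple as this (file name is illustrative):
<?php
// progress.php - sketch of the endpoint the ajax timer polls.
session_start();
$uid = isset($_GET['uid']) ? $_GET['uid'] : '';
echo isset($_SESSION[$uid]) ? (int)$_SESSION[$uid] : 0;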
Edit: pyroscope's solution is technically better, but if you don't have a server with memcached or the ability to run background processes, you can use $_SESSION instead