How to split Long Running PHP Scripts - php

I am using simple_html_dom to scrape pages from a website. The problem is that when I scrape many pages, for example 500 URLs, the script takes a long time (5-30 minutes) to complete, and my server returns a 500 error.
Things I have already tried:
calling set_time_limit
setting ini_set('max_execution_time')
adding a delay between requests
I have read on Stack Overflow that a cron job can be used to split long-running PHP scripts. My question is: how do I split a long-running PHP script, and what is the best way to do it? A step-by-step script would help, because I am a beginner.
About my program: I have two files.
File 1 holds an array of more than 500 URLs.
File 2 contains the function that does the scraping.
This is file 1:
set_time_limit(0);
ini_set('max_execution_time', 3000); // 3000 seconds = 50 minutes
$start = microtime(true); // start timing the page render
error_reporting(E_ALL);
ini_set('display_errors', 1);
include ("simple_html_dom.php");
include ("scrape.php");
$link=array('url1','url2','url3'...);
array_chunk($link, 25); // my attempt to split into chunks of 25, which does not work
$hasilScrape = array();
for ( $i=1; $i<=count($link); $i++){
    // this is where I call get_data() to scrape each link
    $hasilScrape[$i-1] = json_decode(get_data($link[$i-1]), true);
}
$filename='File_Hasil_Scrape';
$fp = fopen($filename . ".csv", 'w');
foreach ($hasilScrape as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
My idea is to split the link array into chunks of 25, then pause or temporarily stop the process after each chunk (not a delay, which I have tried and which did not help) and run it again. Can you tell me how to do this? Thank you very much.
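One possible way to do this, as a minimal sketch rather than a definitive answer: let each run handle only the next 25 links and remember the position in a small state file, so that a cron job can call the script repeatedly until the list is exhausted. Note that array_chunk() returns the chunks instead of modifying $link, which is why the call in the question has no visible effect. The function name process_next_batch, the state file and the $scrape callback (a stand-in for get_data()) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: processes the next $batchSize links and remembers where it stopped.
// $scrape stands in for the question's get_data(); it is expected to return a JSON string.
function process_next_batch(array $links, string $stateFile, string $csvFile, int $batchSize, callable $scrape): int
{
    // Offset of the first unprocessed link; 0 when no state file exists yet.
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
    $batch = array_slice($links, $offset, $batchSize);
    if ($batch === []) {
        return 0; // nothing left to do
    }

    // Open in append mode so rows written by earlier runs are kept.
    $fp = fopen($csvFile, 'a');
    foreach ($batch as $url) {
        $fields = json_decode($scrape($url), true);
        if (is_array($fields)) {
            // Separator, enclosure and escape are passed explicitly (the historical defaults).
            fputcsv($fp, $fields, ',', '"', '\\');
        }
    }
    fclose($fp);

    // Save progress only after the whole batch has been written.
    file_put_contents($stateFile, (string) ($offset + count($batch)), LOCK_EX);
    return count($batch);
}
```

A cron entry that runs the calling script every few minutes would then work through the list 25 links at a time, and each run stays well below the execution time limit. Deleting the state file restarts the job from the first link.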

Related

PHP - Saving array to file - Not stacking up

Really stumped on this one and feel like an idiot! I have a small PHP cron job that does its thing every few minutes. The client has requested that the app email them a daily overview of issues raised.
To do this, I decided to dump an array to a file for storage purposes. I decided against a SQL DB to keep this standalone and lightweight.
What I want to do is open said file, add to a set of numbers and save again.
I have tried this with SimpleXML and serialize/file_put_contents.
The issue I have is that what is written to the file does not correspond with the array echoed on the line before. Say I'm adding 2 to the total: the physical file shows 4 added.
The following is ugly and just a snippet:
echo "count = ".count($result)."<br/>";
$arr = loadLog();
dumpArray($arr, "Pre Load");
$arr0['count'] = $arr['count']+(count($result));
echo "test ".$arr0['count'];
dumpArray($arr0, "Pre Save");
saveLog($arr0);
sleep(3);
$arr1 = loadLog();
dumpArray($arr1, "Post Save");
function saveLog($arr){
    $content = serialize($arr);
    var_dump($content);
    file_put_contents(STATUS_SOURCE, $content);
}
function loadLog(){
    $content = unserialize(file_get_contents(STATUS_SOURCE));
    return $content;
}
function dumpArray($array, $title = false){
    echo "<p><h1>".$title."</h1><pre>";
    var_dump($array);
    echo "</pre></p>";
}
Output File: a:1:{s:5:"count";i:96;}
I really appreciate any heads up - Have had someone else look who also scratched his head.
Check that .htaccess isn't routing 404 errors to the same script. In my case Chrome was requesting favicon.ico, which did not exist, and that request caused the script to execute a second time.

PHP: Cancel running script using POST/AJAX?

My PHP script uses simple_html_dom to parse HTML and collect all the links and images that I want. How long it runs depends on the number of images to download.
I thought it would be a good idea to allow cancelling in this case. Currently I call my PHP script using jQuery Ajax. The closest thing I could find is PHP's register_shutdown_function, but I am not sure it works for my case. Any ideas?
So once PHP has started, it can't be interrupted? For example, by firing another Ajax request that calls exit in the same PHP file?
Cancelling on the server is worthwhile only if you are processing really massive data loads through AJAX. In other cases, just handle it in JS and do not display the result if the request was cancelled.
If you are processing huge loads of data, you can add an interrupt check at every nth step of the running script and trigger that condition from another script. For example, you can store the interrupt flag in a file or in a MySQL MEMORY table.
Example:
1, process.php (ajax script processing loads of data)
// clean up previous potential interrupt flag
$fileHandler = fopen('interrupt_condition.txt', 'w+');
fwrite($fileHandler, '0');
fclose($fileHandler);
function interrupt_check() {
    $interruptfile = file('interrupt_condition.txt');
    // read the first line and trim it; a value of 1 means the script should stop
    if (trim($interruptfile[0]) == "1") {
        echo json_encode(array("interrupted" => 1));
        die();
    }
}
$i = 0;
foreach ($huge_load_of_data as $object) {
    $i++;
    if ($i % 10 == 0) { // check for interrupt condition every 10th record
        interrupt_check();
    }
    // your processing code
}
interrupt_check(); // check one last time (in case something changed while processing the last entries)
2, interrupt_process.php (ajax script to propagate cancel event to file)
$fileHandler = fopen('interrupt_condition.txt', 'w+');
fwrite($fileHandler, '1');
fclose($fileHandler);
This will definitely affect the performance of your script, but it gives you a way to stop execution from outside. This is a very simple example: you need to make it more elaborate for it to work with several users simultaneously, and so on.
Instead of a file you can also use a MySQL MEMORY table, Memcache (a non-persistent caching server) or any other non-persistent storage you can find.
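As a rough sketch of how the file-based flag could be extended to several simultaneous users, each job can get its own flag file. The helper names and the idea of a client-supplied job id are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helpers: one flag file per job, so cancelling one job does not stop the others.
function interrupt_flag_path(string $jobId): string
{
    // Hash the id so arbitrary input can never point outside the temp directory.
    return sys_get_temp_dir() . '/interrupt_' . sha1($jobId) . '.flag';
}

// Called by the cancel script (the counterpart of interrupt_process.php).
function request_interrupt(string $jobId): void
{
    touch(interrupt_flag_path($jobId));
}

// Called by the worker before it starts, to remove a stale flag.
function clear_interrupt(string $jobId): void
{
    $path = interrupt_flag_path($jobId);
    if (is_file($path)) {
        unlink($path);
    }
}

// Called by the worker every nth record.
function is_interrupted(string $jobId): bool
{
    $path = interrupt_flag_path($jobId);
    clearstatcache(true, $path); // is_file() results are cached within a request
    return is_file($path);
}
```

The worker would call clear_interrupt($jobId) once at the start and is_interrupted($jobId) inside its loop, while the cancel request only calls request_interrupt($jobId) with the same id.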

Crazy buffer from Ajax and php script

I have a PHP web crawler that just checks out websites. A few days ago I decided to show the crawler's progress in real time using AJAX. The PHP script writes JSON to a file, and AJAX reads that tiny file.
I double and triple checked my PHP script wondering what the hell was going on because after I finished the simple AJAX script the data appearing on my browser leaped up and down in strange directions.
The php script executed perfectly and very quickly but my AJAX would slowly increase the values, every 2 seconds as set, then drop. The numbers only increase in PHP they do not go down. However, the numbers showing up on my webpage go up and down as if the buffer is working on multiple sessions or reading from something that is being updated even though the PHP stopped about an hour ago.
Is there something I'm missing or need to keep clear like a buffer or a reset button?
This is the most I can show, I just slapped it together a really long time ago. If you know of better code then please share, I love any help possible. But, I'm sort of new so please explain things outside of basic functions.
AJAX
//open our json file
ajaxRequest.onreadystatechange = function(){
if(ajaxRequest.readyState == 4){
//display json file contents
document.form.total_emails.value = ajaxRequest.responseText;
}
}
ajaxRequest.open("GET", "test_results.php", true);
ajaxRequest.send(null);
PHP
//get addresses and links
for($x=(int)0; $x<=$limit; $x++){
    $input = get_link_contents($link_list[0]);
    array_shift($link_list);
    $link_list = ($x%100==0 || $x==5)?filter_urls($link_list,$blacklist):$link_list;
    //add the links to the link list and remove duplicates
    if(count($link_list) <= 1000) {
        preg_match_all($link_reg, $input, $new_links);
        $link_list = array_merge($link_list, $new_links);
        $link_list = array_unique(array_flatten($link_list));
    }
    //check the addresses against the blacklist before adding to a file in JSON
    $res = preg_match_all($regex, $input, $matches);
    if ($res) {
        foreach(array_unique($matches[0]) as $address) {
            if(!strpos_arr($address,$blacklist)){
                $enum++;
                json_file($results_file,$link_list[0],$enum,$x);
                write_addresses_to_file($address, $address_file);
            }
        }
    }
    unset($input, $res, $efile);
}
The symptoms might indicate the PHP script not closing the file properly after writing, and/or a race condition where the AJAX routine is fetching the JSON data in between the PHP's fopen() and the new data being written.
A possible solution would be for the PHP script to write to a temp file, then rename to the desired filename after the data is written and the file is properly closed.
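The write-then-rename idea can be sketched as follows. The function name write_json_atomically is hypothetical, and the sketch assumes the temp file and the target live on the same filesystem, because rename() only replaces a file atomically in that case.

```php
<?php
// Hypothetical helper: replaces $path in one step so a reader never sees a half-written file.
function write_json_atomically(string $path, array $data): bool
{
    // Create the temp file next to the target so both are on the same filesystem.
    $tmp = tempnam(dirname($path), 'tmp');
    if ($tmp === false) {
        return false;
    }
    // file_put_contents() opens, writes and closes the temp file before we rename it.
    if (file_put_contents($tmp, json_encode($data)) === false) {
        unlink($tmp);
        return false;
    }
    // tempnam() creates files readable by the owner only; chmod() here if another user must read it.
    return rename($tmp, $path);
}
```

The crawler would call this in place of its direct write to the results file, and the AJAX poller keeps reading the same filename as before.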
Also, it's a good idea to check that ajaxRequest.status == 200 as well as ajaxRequest.readyState == 4.
Tools like ngrep and tcpdump can help debugging this type of problem.

How can I restrict the amount of time a PHP include will wait for a result?

I'm including a local class that requests a file from a remote server. This process is rather unreliable, because the remote server is often overloaded, and I sometimes have to wait 20 or so seconds before the include gives up and continues.
I would like to have a limit on the execution time of the included script; say, five seconds.
Current code:
include('siteclass.class.php');
Update:
My code inside the class:
$movie = str_replace(" ","+",$movie);
$string = join('',file($siteurl.$l.'/moviename-'.$movie));
if(!$i) { static $i = 1;}
if($file_array = $string)
{
$result = Return_Substrings($file_array, '<item>', '</item>');
foreach($result as $res) {
That's basically it, as far as the loading goes. The internal processing takes about 0.1 s. I guess that's pretty doable.
Note that I didn't test this code; take it as a suggestion:
$fp = fopen('siteclass.class.php', 'r');
stream_set_timeout($fp, 2); // timeout in seconds
$info = stream_get_meta_data($fp);
if ($info['timed_out']) {
    echo "Connection Timed Out!";
} else {
    $file = '';
    while (!feof($fp)) {
        $file .= fgets($fp);
    }
    fclose($fp);
    // eval() expects plain PHP code, so leave PHP mode first in case the file starts with an opening tag
    eval('?>' . $file);
}
The timeout is set in seconds, so the example sets it to two seconds.
This isn't an exact fit to what you're looking for, but this will set the time limit for the include and execution to a total of 25 seconds. If the time limit is reached, it throws a fatal error.
set_time_limit(25);
It sounds like set_time_limit() might do what you want; see the PHP manual entry for that function.
Fix the included code to have a timeout on the HTTP Request and then recover nicely, instead of just aborting by setting a time limit on the script itself.
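A minimal sketch of what such a fix could look like, assuming the class fetches the remote file through PHP's http:// stream wrapper (as the file() call in the question suggests). The function name fetch_with_timeout is hypothetical. Keep in mind that, according to the PHP manual, on non-Windows systems set_time_limit() does not count time spent in stream operations, so a timeout on the request itself is likely the more reliable control here.

```php
<?php
// Hypothetical helper: returns the response body, or null if the request fails or times out.
function fetch_with_timeout(string $url, float $seconds): ?string
{
    $context = stream_context_create([
        'http' => ['timeout' => $seconds], // read timeout for the http:// wrapper, in seconds
    ]);
    // @ suppresses the warning file_get_contents() raises on failure; the result is checked instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Possible use inside the class from the question (variable names taken from there):
// $string = fetch_with_timeout($siteurl.$l.'/moviename-'.$movie, 5.0);
// if ($string === null) { /* remote server too slow or down: skip or show cached data */ }
```

Returning null instead of aborting lets the including page recover nicely, which is the behaviour the answer above asks for.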
My advice would be to get to the root of the problem instead of looking for a workaround.

AJAX - Progress bar for a shell command that is executed

I am making use of AJAX on my site and I would like to show users progress of a file that is being downloaded by my server.
The download is done by script that outputs a percentage to the shell. I would like to pass this info back to the user using AJAX. How can I do this?
Thank you for any help and direction.
I hope your solutions do not involve writing to a text file and retrieving the percentage from that file! Too much overhead, I think.
EDIT - More Info
It is a Linux Shell command - Fedora Core 10.
Currently this is how the shell output looks like:
[download] 9.9% of 10.09M at 10.62M/s ETA 00:00
The percentage changes and I wish to capture that and send it back to the user as it changes.
To execute this, I make use of PHP's exec() function.
Instead of exec, you could use popen. This will give you a handle you use with fread to grab the output your command generates as it happens.
You'll need to parse out the updates it makes to the percentage indicator. Once you have that data, there are a few ways you could get it to a client, e.g. with a "comet" style push, or have an Ajax request poll for updates.
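A small sketch of the popen approach, assuming output shaped like the sample line in the question ("[download] 9.9% of ..."). The function names are made up for illustration, and how the value reaches the browser is left to the $onPercent callback.

```php
<?php
// Hypothetical helper: pulls every "NN.N%" value out of a chunk of command output.
function extract_percentages(string $chunk): array
{
    preg_match_all('/(\d+(?:\.\d+)?)%/', $chunk, $matches);
    return array_map('floatval', $matches[1]);
}

// Hypothetical helper: runs $command and reports each percentage as it appears.
function watch_progress(string $command, callable $onPercent): void
{
    $handle = popen($command . ' 2>&1', 'r'); // merge stderr into the stream being read
    if ($handle === false) {
        return;
    }
    while (!feof($handle)) {
        $chunk = fread($handle, 4096);
        if ($chunk === false) {
            break;
        }
        // Limitation: a number split across two reads can be missed or misread.
        foreach (extract_percentages($chunk) as $percent) {
            $onPercent($percent);
        }
    }
    pclose($handle);
}
```

Because the regular expression does not depend on line endings, it also copes with tools that redraw the progress line using a carriage return instead of a newline.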
I haven't tried this, but I think this approach would work.
You need three pieces:
Have shell script output its stream to netcat connected to a port
Have a php script listening to stream coming from said port for incoming data, updating a record in memcache or some database w/ the percentage finished.
Have your web page periodically make ajax calls to a server script that reads this value from your backend store.
I'm working on a similar problem. I have to parse the output of my video conversion shell script. I use popen and parse the output of the returned resource. At first I used fgets, but that didn't recognize the updated values as new lines. So I created a simple function that takes an optional $arg_delimiter, so you can check for other line terminators such as the chr(13) carriage return. The example code is slightly modified and therefore untested, because in my case these functions were methods on my parser object.
function get_line ($arg_handle, $arg_delimiter = NULL)
{
    $delimiter = (NULL !== $arg_delimiter) ? $arg_delimiter : chr(10);
    $result = array();
    while ( ! feof($arg_handle))
    {
        $currentCharacter = fgetc($arg_handle);
        if ($delimiter === $currentCharacter)
        {
            return implode('', $result);
        }
        $result[] = $currentCharacter;
    }
    return implode('', $result);
}
I simply loop over the results from the popen() resource like this:
$command = '/usr/bin/yourcommand';
$handle = popen($command . ' 2>&1', 'r');
while ( ! feof($handle))
{
    $line = get_line($handle, chr(13));
    preg_match($yourParserRegex, $line, $data);
    if (count($data) > 0)
    {
        printf("<script type='text/javascript'>\n //<![CDATA[\n window.alert('Result: %s');\n // ]]>\n</script>"
            ,$data[1]
        );
        flush();
    }
}
Now all you need to do is figure out the comet stuff.
