I have a hefty PHP script.
So much so that I have had to do
ini_set('memory_limit', '3000M');
set_time_limit (0);
It runs fine on one server, but on another I get: Out of memory (allocated 1653342208) (tried to allocate 71 bytes) in /home/writeabo/public_html/propturk/feedgenerator/simple_html_dom.php on line 848
Both are on the same package from the same host, but different servers.
Above problem solved; new problem below for bounty.
Update: The script is so big because it crawls a site and parses data from 252 pages, including over 60,000 images, of which it makes two copies. I have since broken it down into parts.
I have another problem now, though. When I am writing an image from the outside site to my server like this:
try {
$imgcont = file_get_contents($va); // $va is an img src from an array of thousands of srcs
$h = fopen($writeTo,'w');
fwrite($h,$imgcont);
fclose($h);
} catch(Exception $e) {
$error .= (!isset($error)) ? "error with <img src='" . $va . "' />" : "<br/>And <img src='" . $va . "' />";
}
All of a sudden it goes to a 500 internal server error page and I have to run it again, at which point it works, because files are only copied if they don't already exist. Is there any way I can catch the 500 response code and send the request back to the URL to make it go again? This is all meant to be an automated process.
If this is memory related, I would personally use copy() rather than file_get_contents(). It supports the file wrappers the same way, and I don't see any advantage in loading the whole file in memory just to write it back on the filesystem.
Otherwise, your error_log might give you more information as to why the 500 happens.
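For illustration, a minimal sketch of the copy() approach using the $va and $writeTo variables from the question (this assumes allow_url_fopen is enabled, since the source is a remote URL):
// copy() streams the source to the destination rather than loading
// the whole image into memory first.
if (!copy($va, $writeTo)) {
    echo "Failed to copy $va<br/>";
}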
There are three parties involved here:
Remote - The server(s) that contain the images you're after
Server - The computer that is running your php script
Client - Your home computer if you are running the script from a web browser, or the same computer as the server if you are running it from Cron.
Is the 500 error you are seeing being generated by 'Remote' and seen by 'Server' (i.e. the images are temporarily unavailable), or is it being generated by 'Server' and seen by 'Client' (i.e. there is a problem with your script)?
If it is being generated by 'Remote', then see Ali's answer for how to retry.
If it is being generated by your script on 'Server', then you need to identify exactly what the error is - the php error logs should give you more information. I can think of two likely causes:
Reaching PHP's time limit. PHP will only spend a certain amount of time working before returning a 500 error. You can set this to a higher value, or regularly re-set the timer with a call to set_time_limit(), but that won't work if your server is configured in safe mode.
Reaching PHP's memory limit. You seem to have encountered this already, but it's worth making sure your script still isn't eating lots of memory. Consider outputting debug data (possibly only if you set $config['debug_mode'] = true or something). I'd suggest:
try {
echo 'Getting '.$va.'...';
$imgcont = file_get_contents($va); // $va is an img src from an array of thousands of srcs
$h = fopen($writeTo,'w');
fwrite($h,$imgcont);
fclose($h);
echo 'saved. Memory usage: '.(memory_get_usage() / (1024 * 1024)).' <br />';
unset($imgcont);
} catch(Exception $e) {
$error .= (!isset($error)) ? "error with <img src='" . $va . "' />" : "<br/>And <img src='" . $va . "' />";
}
I've also added a line to remove the image from memory, in case PHP isn't doing this correctly itself (in theory that line shouldn't be necessary).
You can avoid both problems by making your script process fewer images at a time and calling it regularly - either using Cron on the server (the ideal solution, although not all shared webhosts allow this), or some software on your desktop computer. If you do this, make sure you consider what will happen if there are two copies of the script running at the same time - will they both fetch the same image at the same time?
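If you do go the cron route, here is a minimal sketch of guarding against two overlapping runs using a lock file (the lock file path is just an example):
// Take an exclusive, non-blocking lock so a second cron run exits early
// instead of fetching the same images at the same time.
$lock = fopen('/tmp/image_fetch.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit('Another copy of the script is already running.');
}
// ... fetch the next batch of images here ...
flock($lock, LOCK_UN);
fclose($lock);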
So it sounds like you're running this process via a web browser. I'm guessing that you may be getting the 500 error from Apache timing out after a certain period of time, or the process dying, or something funky. I would suggest you do one of the following:
A) Move the image downloading to a background process: run the crawl script in the browser, have it write the URLs of the images to be downloaded to the DB or something, and have another script fire up via cron and fetch all the images. You could also have this script work in batches of 100 or so at a time to keep memory consumption down.
B) Call the script directly from the command line (this is really the preferred method for something like this anyway, and you should still probably separate the image fetching to another script)
C) If the command line is not an option for some reason, have your browser-loaded script touch a file, and have a cron that runs every minute and checks whether the file exists. If it does, it fires up your script; you can have the output written to a file for you to check later, or send an email when it's completed (a rough sketch follows).
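A rough sketch of option C, assuming placeholder paths for the trigger file and the log:
// In the browser-loaded script: leave a marker asking for a fetch run.
touch('/tmp/run_image_fetch');

// In the cron script (run every minute): only do work if the marker exists.
if (file_exists('/tmp/run_image_fetch')) {
    unlink('/tmp/run_image_fetch');
    // ... fetch the images here ...
    file_put_contents('/tmp/image_fetch.log', 'Finished at ' . date('c') . "\n", FILE_APPEND);
}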
Is there any way I can catch the 500 response code and send the request back to the URL to make it go again? This is all meant to be an automated process.
Here's the simple version of how I would do it:
function getImage($va, $writeTo, $retries = 3)
{
while ($retries > 0) {
if ($imgcont = file_get_contents($va)) {
file_put_contents($writeTo, $imgcont);
return true;
}
$retries--;
}
return false;
}
This doesn't create the file unless we successfully get our image file, and will retry three times by default. You will of course need to add any required exception handling, error checking, etc.
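For example, a hypothetical call inside the existing image loop could look like this:
// $va and $writeTo are the same variables used in the question's loop.
if (!getImage($va, $writeTo)) {
    echo "Failed to fetch $va after 3 attempts<br/>";
}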
I would definitely stop using file_get_contents() and write the files in chunks, like this:
$read = fopen($url, 'rb');
$write = fopen($local, 'wb');
$chunk = 8096;
while (!feof($read)) {
    // Copy one chunk at a time so the whole file never sits in memory.
    fwrite($write, fread($read, $chunk));
}
fclose($read);
fclose($write);
This will be nicer to your server, and should hopefully solve your 500 problems. As for "catching" a 500 error, this is simply not possible. It is an unrecoverable error thrown by your script and written to the client by the web server.
I'm with Swish, this is not really the kind of task that PHP is intended for - you'd be much better off using some sort of server-side scripting.
Is there any way I can catch the 500 response code and send the request back to the URL to make it go again?
Have you considered using another library? Fetching files from an external server seems to me more like a job for curl or ftp than file_get_contents() etc. If the error is external, and you're using curl, you can detect the 500 return code and handle it appropriately without crashing. If not, then maybe you should split your program into two files - one of which fetches a single file/image, and the other that uses curl to repeatedly call the first one. Unless the 500 error means that all PHP execution crashes, you would be able to detect the failure and handle it.
Something like this pseudocode:
file1.php:
foreach(list_of_files as filename){
do {
x = call_curl('file2.php', filename);
}
while(x == 500);
}
file2.php:
filename=$_GET['filename'];
results = use_curl_to_get_page(filename);
echo results;
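A minimal sketch of the same idea in real PHP with curl, where fetchWithCurl() is an illustrative helper standing in for use_curl_to_get_page():
// Fetch a URL with curl and return both the HTTP status code and the body.
function fetchWithCurl($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return array($status, $body);
}

// Retry while the remote side keeps answering with a 500,
// capped so the loop cannot run forever.
$attempts = 0;
do {
    list($status, $body) = fetchWithCurl($va);
    $attempts++;
} while ($status == 500 && $attempts < 5);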
Thanks for all your input. I had separated everything by the time I wrote this question, so the crawler fires the image grabber, etc.
I took on board the solution to split the number of images, and that also helped.
I also added a try/catch around the file read.
This was only being called from the browser during testing, but now that it is all up and running it is going to be a cron job.
Thanks Swish and Benubird for your particularly detailed and educational answers. Unfortunately I had no cooperation with the developers on the backend where the images are coming from (long and complicated story).
Anyway, all good now, so thanks. (Swish, how do you call a script from the command line? My knowledge of this field is severely lacking.)
Related
I have a PHP script that has to reload a page on the client (server push) when something specific happens on the server. So I have to listen for changes. My idea is to have a text file that contains the number of page loads for the current page. So I would like to monitor the file and as soon as it is modified, to use server push in order to update the content on the client. The question is how to track the file for changes in PHP?
You could do something like:
<?php
$lastMtime = 0;
while (true) {
    clearstatcache(); // stat() results are cached between calls, so clear first
    $file = stat('/file');
    if ($file['mtime'] != $lastMtime) {
        $lastMtime = $file['mtime'];
        //... Do Something Here ..//
    }
    sleep(1);
}
This will continuously look for a change in the modified time of a file every second. If you don't constrain it you could kill your disk IO and may need to adjust your ulimit.
This will check your file for a change:
<?php
$current_contents = "";
function checkForChange($filepath) {
global $current_contents;
$new_contents = file_get_contents($filepath);
if (strcmp($new_contents, $current_contents) !== 0) {
$current_contents = $new_contents;
return true;
}
return false;
}
But that will not solve your problem. The PHP file that serves the client finishes executing before the rendered HTML is sent to the client. That client will need to call back to some PHP file to check for a change... and since that is also an HTTP request, the file will finish executing and forget anything in memory.
In order to properly solve this, you'll probably have to back off the idea of checking a file. Either the server needs to know when and how to contact currently connected clients, or those clients need to poll a lightweight service at a regular interval.
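A minimal sketch of such a lightweight endpoint, assuming the client polls it via AJAX and that the counter file path is a placeholder:
<?php
// check.php - returns the current page-load counter so the client can
// compare it with the value it last saw and refresh when it changes.
$counterFile = '/tmp/pageloads.txt';
$count = file_exists($counterFile) ? (int) file_get_contents($counterFile) : 0;
header('Content-Type: application/json');
echo json_encode(array('count' => $count));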
This is sort of hacky but what about creating a cron job that sucks in the page, stores it in a scope or table, and then simply compares it every 30 seconds?
I have noticed a few websites such as hypem.com show a "You didn't get served" error message when the site is busy, rather than just letting people wait, time out or refresh, which would only aggravate what is probably a server load issue.
We are too loaded to process your request. Please click "back" in your
browser and try what you were doing again.
How is this achieved before the server becomes overloaded? It sounds like a really neat way to manage user expectations if a site happens to get overloaded, whilst also giving the site time to recover.
Another option is this:
$load = sys_getloadavg();
if ($load[0] > 80) {
header('HTTP/1.1 503 Too busy, try again later');
die('Server too busy. Please try again later.');
}
I got it from PHP's site, http://php.net/sys_getloadavg. The three values sys_getloadavg() returns are the system load averages over the last 1, 5 and 15 minutes.
You could simply create a 500.html file and have your webserver use that whenever a 50x error is thrown.
E.g. in your Apache config:
ErrorDocument 500 /errors/500.html
Or use a PHP shutdown function to check if the request timeout (which defaults to 30s) has been reached and, if so, redirect/render something static (so that rendering the error itself cannot cause problems).
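A rough sketch of that shutdown-function idea (the threshold logic and message here are illustrative, not a drop-in solution):
<?php
$start = microtime(true);
register_shutdown_function(function () use ($start) {
    // If we are shutting down because execution ran right up against the
    // configured time limit, emit a small static apology instead of a blank 500.
    $limit = (int) ini_get('max_execution_time');
    if ($limit > 0 && (microtime(true) - $start) >= $limit - 1) {
        echo 'We are too loaded to process your request. Please try again later.';
    }
});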
Note that most sites where you'll see a "This site is taking too long to respond" message are effectively generating that message with javascript.
This may be to do with the database connection timing out, but that assumes that your server has a bigger DB load than CPU load when times get tough. If this is the case, you can make your DB connector show the message if no connection happens for 1 second.
You could also use a quick query to the logs table to find out how many hits/second there are and automatically not respond to any more after a certain point in order to preserve QOS for the others. In this case, you would have to set that level manually, based on server logs. An alternative method can be seen here in the Drupal throttle module.
Another alternative would be to use the Apache status page to get information on how many child processes are free, and to throttle if there are none left, as per @giltotherescue's answer to this question.
You can restrict the maximum number of connections in the Apache configuration too...
Refer
http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxclients
http://www.howtoforge.com/configuring_apache_for_maximum_performance
This is not a strictly PHP solution, but you could do like Twitter, i.e.:
serve a mostly static HTML and Javascript app from a CDN or another server of yours
the calls to the actual heavy work server-side (PHP in your case) functions/APIs are actually done in AJAX from one of your static JS files
so you can set a timeout on your AJAX calls and return a "Seems like loading tweets may take longer than expected"-like notice.
You can use the PHP tick function to detect when a server isn't loading for a specified amount of time, then display an error message. Basic usage:
<?php
$connection = false;
function checkConnection( $connectionWaitingTime = 3 )
{
// check connection & time
global $time,$connection;
if( ($t = (time() - $time)) >= $connectionWaitingTime && !$connection){
echo ("<p> Server not responding for <strong>$t</strong> seconds !! </p>");
die("Connection aborted");
}
}
register_tick_function("checkConnection");
$time = time();
declare (ticks=1)
{
require 'yourapp.php'; // load your main app logic
$connection = true;
}
In the original example a while(true) loop was used to simulate a loaded server; in the code above it has been replaced with a require of your main app logic (e.g. dispatch an event, a front controller action, etc.).
The $connectionWaitingTime in the checkConnection function is set to time out after 3 seconds, but you can change that to whatever you want.
Given simple code like:
$file = 'hugefile.jpg';
$bckp_file = 'hugeimage-backup.jpg';
copy($file, $bckp_file);
// here comes some manipulation on $bckp_file.
The assumed problem is that if the file is big or huge - let's say a jpg - one would think that it will take the server some time to copy it (by time I mean even a few milliseconds), but one would also assume that the execution of the next line would be much faster.
So in theory I could end up with a "no such file or directory" error when trying to manipulate a file that has not yet been created, or worse, start to manipulate a TRUNCATED file.
My question is: how can I ensure that $bckp_file was created (or in this case, copied) successfully before the NEXT line, which manipulates it?
What are my options to "pause" or "delay" the execution of the next line until the file creation / copy has completed?
Right now I can only think of something like:
if (!copy($file, $bckp_file)) {
echo "failed to copy $file...\n";
}
which will only alert me but will not resolve anything (same as just having the PHP error)
or
if (copy($file, $bckp_file)) {
// move the manipulation to here ..
}
But this is also not so valid, because let's say the copy was not executed: I will just fall out of the if block without achieving my goal and without errors.
Is that even a problem or am I over-thinking it?
Or does PHP have a built-in mechanism to ensure that?
Any recommended practices?
Any thoughts on the issue?
What are my options to "pause" or "delay" the execution of the next line until the file creation / copy has completed?
copy() is a synchronous function meaning that code will not continue after the call to copy() until copy() either completely finishes or fails.
In other words, there's no magic involved.
if (copy(...)) { echo 'success!'; } else { echo 'failure!'; }
Along with synchronous IO, there is also asynchronous IO. It's a bit complicated to explain in technical detail, but the general idea of it is that it runs in the background and your code hooks into it. Then, whenever a significant event happens, the background execution alerts your code. For example, if you were going to async copy a file, you would register a listener to the copying that would be notified when progress was made. That way, your code could do other things in the mean time, but you could also know what progress was being made on the file.
PHP handles file uploads by saving the whole file in a temporary directory on the server before executing any of your script (so you can use $_FILES from the beginning), and it's safe to assume all functions are synchronous -- that is, PHP will wait for each line to execute before moving to the next line.
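As an illustration of that synchronous behaviour, a minimal upload handler sketch (the field name 'photo' and the destination path are assumptions):
<?php
// By the time this code runs, PHP has already written the upload to a temp file.
if (isset($_FILES['photo']) && $_FILES['photo']['error'] === UPLOAD_ERR_OK) {
    // move_uploaded_file() is synchronous too: the file is fully in place
    // before the next line runs, so it is safe to manipulate it right away.
    move_uploaded_file($_FILES['photo']['tmp_name'], '/path/to/uploads/photo.jpg');
}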
I have a simple file upload service, written out in PHP, which also includes a script that controls download speeds by sending limited-sized packets when a user requests a download from this site.
I want to implement a system to limit parallel/simultaneous downloads to 1 per user if they are not premium members. In the download script above, I can use a MySQL database to store a record that has: (1) the user ID; (2) the file ID; (3) when the download was initiated; and (4) when the last packet was sent, which is updated each time this is done (if DL speed is limited to 150 kB/sec, then after every 150 kB, this record is updated, etc.).
However, thus far, the database record will only be deleted once the download has successfully completed — at the end of the script, after the download has been fully served, the record is deleted from the table:
insert DB record;
while (download is being served) {
serve packet of data;
update DB record with current date/time;
}
// Download is now complete
delete DB record;
How will I be able to detect when a download has been cancelled? Would I just have to have a Cron job (or something similar) detect if an existing download record is more than X minutes/hours old? Or is there something else I can do that I'm missing?
I hope I've explained this well enough. I don't think posting specific code is required; I'm interested more in the logistics of how/whether this can be done. If specific is needed, I will gladly provide it.
NOTE: I know how to detect if a file was successfully downloaded; I need to know how to detect if it was cancelled, aborted, or otherwise stopped (and not just paused). This will be useful in stopping parallel downloads, as well as preventing a situation where the user cancels Download #1 and tries to initiate Download #2, only to find that the site claims he is still downloading file #1.
EDIT: You can find my download script here: http://codetidy.com/1319/ — it already supports multi-part downloads and download resuming.
<?php
class DownloadObserver
{
protected $file;
public function __construct($file) {
$this->file = $file;
}
public function send() {
// -> note in DB you've started
readfile($this->file);
}
public function __destruct() {
// download is done, either completed or aborted
$aborted = connection_aborted();
// -> note in DB
}
}
$dl = new DownloadObserver("/tmp/whatever");
$dl->send();
should work just fine. No need for a shutdown_function or any funky self-built connection observation.
You will want to check out the following functions: connection_status(), connection_aborted() and ignore_user_abort() (see the connection handling section of the PHP manual for more info).
Although I can't guarantee the reliability (it's been a while since I've played around with it), with the right combination you should be able to accomplish what you want. There are a few caveats when working with these though, the big one being that if something goes wrong you could end up with stranded PHP scripts running on the server requiring you to kill Apache to stop them.
The following should give you a good idea of how to do it (adapted from the PHP code examples and a couple of the comments):
<?php
//Set PHP not to cancel execution if the connection is aborted
//and drop the time limit to allow for big file downloads
ignore_user_abort(true);
set_time_limit(0);
while(true){
//See the ignore_user_abort() docs re having to send data
echo chr(0);
//Make sure the data gets flushed properly or the connection check won't work
flush();
ob_flush();
//Check then connection status and exit loop if aborted
if(connection_status() != CONNECTION_NORMAL || connection_aborted()) break;
//Just to provide some spacing in this example
sleep(1);
}
file_put_contents("abort.txt", "aborted\n", FILE_APPEND);
//Never hurts to ensure that the script halts execution
die();
Obviously for how you would be using it the data being sent would simply be the download data chunk (just make sure you flush the buffer properly to ensure the data is actually sent). As far as I'm aware, there is no way of making a distinction between pausing and aborting/stopping. Pause/resume functionality (and multi-part downloading - i.e. how download managers accelerate downloads) relies on the "Range" header, basically requesting byte x to byte y of the file. So if you want to allow resumable downloads you'll have to deal with that too.
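For reference, a simplified sketch of honouring a Range request (your linked script already handles this more completely; the path and chunk size here are placeholders):
<?php
$path = '/path/to/file';      // the file being served
$size = filesize($path);
$start = 0;
$end = $size - 1;

// If the client sent a Range header (e.g. "bytes=1000-"), honour it.
if (isset($_SERVER['HTTP_RANGE']) &&
    preg_match('/bytes=(\d+)-(\d*)/', $_SERVER['HTTP_RANGE'], $m)) {
    $start = (int) $m[1];
    if ($m[2] !== '') {
        $end = (int) $m[2];
    }
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-$end/$size");
}
header('Content-Length: ' . ($end - $start + 1));

// Stream the requested slice in small chunks (rate limiting omitted here).
$fp = fopen($path, 'rb');
fseek($fp, $start);
$remaining = $end - $start + 1;
while ($remaining > 0 && !feof($fp)) {
    echo fread($fp, min(8192, $remaining));
    flush();
    $remaining -= 8192;
}
fclose($fp);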
There is no HTTP "cancel" signal that is sent by default. So, it looks like you will need to decide on a timeout, the length of time a connection can sit without sending/receiving another packet. If you are sending rather small packets (as I presume you are) keep the timeout short for best effect.
In your while condition you will need to check the age of the last timestamp update; if it's too old, stop sending the file.
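A hypothetical cleanup pass based on that timeout, run from cron; the DSN, credentials, table and column names are placeholders matching the record described in the question:
<?php
// Treat downloads whose last packet was sent more than 5 minutes ago as
// cancelled and remove their records, freeing up the user's download slot.
$pdo = new PDO('mysql:host=localhost;dbname=downloads', 'user', 'pass');
$pdo->exec("DELETE FROM active_downloads
            WHERE last_packet_sent < NOW() - INTERVAL 5 MINUTE");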
Is there a way for me to check to see if a file is copied before continuing to execute a PHP loop?
I have a for loop, and within the loop it is going to copy a file. Now, I want it to wait until the current file is copied before continuing the loop.
Example:
for ($i = 1; $i <= 10; $i++)
{
$temp = $_FILES['tmp_name'];
$extension = '.jpg';
copy("$temp_$i_$extension", "$local_$i_$extension");
// not sure what to do here
if (FILE_DONE_COPYING())
{
CONTINUE_LOOP();
}
else
{
PAUSE_LOOP();
}
}
That's just an example. I have no clue how to do this... can anyone chime in?
That's what copy() does in PHP - it blocks until the file is copied. There's nothing you need to do, except checking the return value to see if the operation was successful.
PHP takes it line by line, step by step, so it waits until copy() has completed:
for ($i = 1; $i <= 10; $i++)
{
$temp = $_FILES['tmp_name'];
$extension = '.jpg';
$result = copy("{$temp}_{$i}_{$extension}", "{$local}_{$i}_{$extension}");
if($result){
//done
}
else{
//failed
}
}
copy returns true on success and false on failure. Check for that.
Unless you go through the trouble of using threading and have copy fired asynchronously, PHP will not move to the line after copy until after it has completed.
copy does wait for completion before continuing execution. It is a synchronous call. But, it can return false if it didn't work, and your copy won't work since $temp_ and $i_ are not defined variables. So maybe you are thinking the copy isn't finishing, when it actually just isn't working at all.
You should use:
copy("{$temp}_{$i}_$extension", "{$local}_{$i}_$extension");
OR
copy($temp.'_'.$i.'_'.$extension, $local.'_'.$i.'_'.$extension);
What makes you think that copy() will return before it has finished?
You could of course compare the filesize of the original file and the copy to be sure the process is complete.
You could use a while loop with sleep calls to delay checking, and just exit the while loop once the file exists under the new name.
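A minimal sketch of that wait loop, with a retry cap so it cannot spin forever (the cap and variable name are arbitrary):
// Wait for the destination file to appear before manipulating it.
// $destination is whatever path was passed to copy() as the target.
$attempts = 0;
while (!file_exists($destination) && $attempts < 10) {
    clearstatcache(); // file_exists() results are cached between calls
    sleep(1);
    $attempts++;
}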
I know this is an ancient question but I feel I really need to talk about this problem. COPY is a great command - BUT - it does not work all of the time. I can honestly tell you this. Why? Why does it not always work? Simple - the Operating System is at fault. Here are two examples. One is using a standard disk drive and the second one deals with a RAM disk. The COPY command reads CHUNKS of a file and writes these chunks out to the destination. This means it really does NOT just do a file_get_contents() but instead does the fopen(IN), fopen(OUT), while( !EOF(IN) ){ fread(IN), fwrite(OUT) } and then fclose(IN) and fclose(OUT). It should be noted that these commands try to make sure everything goes OK, but if the disk drive buffers what it does, then the file might take a second or two to finish. This can be seen by doing a file_exists() on the output file's name. It can come back as FALSE (it IS NOT there). This is because the disk drive's hardware has not caught up with the commands.
I even installed the AMD RamDisk software and ran a program using the above commands (both file_get_contents->file_put_contents and the fopen-fread/fwrite-fclose commands). The same thing happened then also. Every now and then (not always) the file_exists() function returned FALSE because the test got there before the file had finished being created. Don't ask me why - it just would do this.
So what do I suggest? Use the SLEEP() command. Maybe use three(3) seconds (so SLEEP(3);) -after- the COPY() command. I also determined that a CHMOD(, 0777); was a good idea also. With a SLEEP() command after it so it has time to apply the changes. (Which is probably closer to one second).
Now, remember - everyone's hardware is different. So some hardware might work better or faster than the one I am using. So - this is one of those - try it if you are having problems. Or don't - if you are not having problems. It is that simple. So - this is happening to me - I'm using it - it works now that the system gets three seconds to breath - but it might not do anything for you - who has an atomic powered Willy-Wonka mobile which does the impossible before breakfast.
Got it? Good. :-)