Real-world problem: I'm generating a page dynamically. This page is an XML document which is retrieved by the user (with curl, file_get_contents, or whatever can be done with server-side scripting).
Once the user makes the request, he starts waiting while I retrieve a large set of data from the db and build an XML document from it (using the PHP DOM objects). Once I'm done I call print $document->saveXML(). It takes about 8 minutes to create this 40-megabyte document, and only when it is ready do I serve the page/document. Now I have a user with a 60-second connection timeout: he says I need to send at least one octet every 60 seconds. How can I achieve such a thing?
Since it's useless to post 23987452 lines of code because nobody is going to read them, I'll explain the script that serves this page in real-very-pseudo-pseudo-code:
grab all the data from the db: an enormous set of rows
create a domdocument element
loop through each row and add a node element to the domdocument to contain a piece of data
call the $dom->saveXML() to get the document as a string
print the string so the user retrieves an XML document
1) I can't send filler data up front, since it is an XML document and it has to begin with "<?xml..." to not mess up the parser.
2) The user can't deal with firewall/serverconfig
3) I can't deal with "buy a more powerful server"
4) I tried using ob_start() at the top of the script and then header("Transfer-Encoding: chunked"); ob_flush(); at the beginning of each loop iteration,
but to no avail: nothing is sent before the 8 minutes are up.
Help me guys!!
I would
Generate a random value
Start the XML generating script as a background process (see e.g. here)
Make the generating script write the XML into a file with the random value as the name when the script is done
Frequently poll for the existence of that file, e.g. using Ajax requests every 10 seconds, until it's there. Then fetch the XML from the file (a rough sketch follows).
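A minimal PHP sketch of those steps; the file names, the start.php/generate.php/poll.php split, and the use of nohup are all assumptions:

// start.php -- generates the random value and spawns the worker
$token = md5(uniqid(mt_rand(), true));                    // the random value
exec("nohup php generate.php $token > /dev/null 2>&1 &"); // background process (Linux)
echo $token;                                              // client keeps this for polling

// generate.php -- does the 8-minute job, then signals completion
$token = preg_replace('/[^a-f0-9]/', '', $argv[1]);
$xml = build_document();                                  // placeholder for the real DOM work
file_put_contents("/tmp/$token.tmp", $xml);
rename("/tmp/$token.tmp", "/tmp/$token.xml");             // rename is atomic: file only appears when complete

// poll.php -- what the 10-second polling requests hit
$token = preg_replace('/[^a-f0-9]/', '', $_GET['token']);
echo is_file("/tmp/$token.xml") ? 'ready' : 'pending';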
You can send padding and still have it be valid XML. Trivial examples include whitespace in a lot of places, or comments. Once you've sent the XML declaration, you could start a comment and keep sending padding:
<?xml version="1.0"?>
<!-- this comment to prevent timeouts:
30
60
90
⋮
or whatever, the exact data doesn't matter of course.
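In PHP the whole trick might look like this sketch, where build_rows() and add_row() are stand-ins for the real generation code:

header('Content-Type: text/xml');
echo "<?xml version=\"1.0\"?>\n<!-- keepalive\n";
flush();
$dom = new DOMDocument('1.0');
$root = $dom->appendChild($dom->createElement('data'));
foreach (build_rows() as $i => $row) {
    add_row($root, $row);                         // hypothetical node builder
    if ($i % 1000 == 0) { echo ".\n"; flush(); }  // padding inside the comment
}
echo "-->\n";
echo $dom->saveXML($dom->documentElement);        // node form omits the duplicate <?xml ...?> declaration
flush();

One caveat: the padding must never contain the sequence "--", or the comment (and the document) becomes invalid.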
That's the easy solution. The better solution is to make the generation run in the background and, e.g., use AJAX to poll the server every 10s to check if it's done. Or implement an alternate notification method (e.g., email a URL when the document is ready).
If this isn't a browser accessing it, you may want a trivially simple API: one request to start generating the document, and another to fetch it. The fetch request can return "not ready yet" as, e.g., an HTTP status code 500, 503, or 504, and the requesting script should retry later. (For example, with curl, the --retry option will do this.)
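A sketch of the "fetch" half of that API, reusing the token and /tmp file layout assumed in the background-process sketch above:

// fetch.php -- returns the document, or 503 while it's still being built
$token = preg_replace('/[^a-f0-9]/', '', $_GET['token']);
$path = "/tmp/$token.xml";
if (!is_file($path)) {
    header('HTTP/1.1 503 Service Unavailable');   // "not ready yet"
    header('Retry-After: 30');
    exit;
}
header('Content-Type: text/xml');
readfile($path);

On the client side, curl --retry treats a 503 as a transient error and retries with backoff.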
Related
I have a PHP script that checks whether an email address is still valid with SMTP HELO; we mark that address as valid in a separate CSV and then send the newest offer we have (if the user requested it, of course). Addresses are taken from a txt file where they are stored line after line.
So the flow of the script is: open the txt file, grab all lines and place them in an array, iterate through each record in the array & send SMTP HELO, mark it as valid/invalid in the separate CSV, and send the email to valid addresses.
We often have 2,000+ records in each source txt file. Unfortunately, I have never gotten past the 400th record, as CloudFlare or nginx gives me a timeout.
I have tried the following setup inside my PHP script:
set_time_limit(0);                      // ignore PHP timeout
ignore_user_abort(true);                // keep going even if the user pulls the plug
while (ob_get_level()) ob_end_clean();  // remove output buffers
ob_implicit_flush(true);                // output stuff directly
and various "hacky" async approaches, but the result is always the same.
I thought about "slicing" my input data into safe-size chunks and processing one file after another, with a page reload in between, but I have no idea whether that approach is worth pursuing or whether I should look for something else.
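The slicing I have in mind would look roughly like this (the file name and the offset parameter are made up):

// process.php?offset=N -- handle one safe-size batch, then reload
$batch = 200;
$offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;
$lines = file('addresses.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach (array_slice($lines, $offset, $batch) as $address) {
    // SMTP HELO check, CSV write, and mail() would go here
}
if ($offset + $batch < count($lines)) {
    header('Location: process.php?offset=' . ($offset + $batch));  // reload before the proxy timeout fires
    exit;
}
echo 'done';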
I am using PHP and AJAX requests to get the output of a program that is always running and print it on a webpage at 5 second intervals. Sometimes this log file can get up to 2mb in size. It doesn't seem practical to me for the AJAX request to fetch the whole contents of this file every 5 seconds if the user has already gotten the full contents at least once. The request just needs to get whatever contents the user hasn't gotten in a previous request.
Problem is, I have no clue on where to begin to find what contents the user hasn't received. Any hints, tips, or suggestions?
Edit: The output from the program starts off with a time (HH:MM:SS AM/PM); everything after has no pattern. The log file may span days, so there might not be just one "02:00:00 PM" in the file, for example. I didn't write the program that is being logged, so there isn't a way for me to modify the format in which it prints its output.
I think using a HEAD request might get you started along the right path.
Check this thread:
HTTP HEAD Request in Javascript/Ajax?
If you're using jQuery, it's a simple type change in the AJAX call:
$.ajax({url: "some url", type: "HEAD".....});
Personally, I would check the file size & date modified against the previous response, and fetch the new data only if it has been updated. I'm not sure if you can fetch only parts of a file via AJAX, but I'm sure this can be accomplished via PHP pretty easily; this thread may help:
How to read only 5 last line of the text file in PHP?
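If it helps, here is a server-side sketch of the partial-fetch idea, assuming the client echoes back the byte offset it reached last time as ?offset=N (parameter and file names are made up):

// tail.php -- returns only the bytes the client hasn't seen yet
$file = 'program.log';                       // assumed log location
$size = filesize($file);
$offset = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;
header('X-Next-Offset: ' . $size);           // client stores this for the next poll
if ($offset < $size) {
    $fp = fopen($file, 'rb');
    fseek($fp, $offset);
    echo stream_get_contents($fp);           // just the new bytes
    fclose($fp);
}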
It depends on how your program is made and how it prints your data, but you can use timestamps to reduce the amount of data. If you have some kind of IDs, you should probably use them instead of timestamps.
I want to create a very simple site, but unfortunately my php skills are weak. Basically, when a user shows up, I want to have a page with text and a blinking cursor (I can probably figure the cursor part out myself, but feel free to suggest). When a user types, I want it to show the text as they type, and when they hit enter (or click something/whatever), the text just typed will be sent to a database and then the page will update with that new text, for anybody else to see. The cursor will then be blinking on the next line down. So basically it's like a really simple wiki, where anyone can add anything, but nobody can ever remove what has been typed before. No logging in or anything. Can someone suggest the best way to go about this? I assume it will require a php call to the database to display the initial page, then another php request to send data, then another php request to display the new page. I just don't know the details. Thanks so much!
Bonus question 1: How can the page be updated dynamically, so if A sends text while B is typing, B sees the text A sent on B's page immediately?
Bonus question 2: What sorts of issues might arise if this database grows extremely large (say, millions of words), and how might I address these up front? If necessary, I could show only a small chunk of the (text-only) database on any given page, then have pagination.
If you only have one page, you don't need a database. All you need to do is save a text file on the server (use fopen() and related functions) that only gets appended to. If you have multiple pages, a simple table with id (INTEGER) and filetext (LONGBLOB) will do. (Note LONGBLOB has a limit of 2^32 bytes.)
For the user's browser part, you'll need to use JavaScript and AJAX to inform the server of any updates. Just have it call a PHP script that (1) accepts the input and (2) appends it to the file.
Bonus question 1: How can the page be updated dynamically, so if A sends text while B is typing, B sees the text A sent on B's page immediately?
Also use the AJAX call to fetch new content (e.g. if you assign line numbers, then the browser just tells the script the last line it read, and the script returns all new lines past that point).
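A minimal sketch of that line-number scheme (the file name and parameter name are assumptions):

// fetch.php -- the browser sends the last line number it has seen as ?last=N
$last = isset($_GET['last']) ? (int)$_GET['last'] : 0;
$lines = file('board.txt', FILE_IGNORE_NEW_LINES);   // the append-only text file
echo implode("\n", array_slice($lines, $last));      // only the unseen lines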
I assume it will require a php call to the database to display the initial page, then another php request to send data, then another php request to display the new page.
Pretty much. But only send the last 50 lines or so of the file when the browser visits it. You don't want to crash the browser.
Bonus question 2: What sorts of issues might arise if this database grows extremely large (say, millions of words), and how might I address these up front? If necessary, I could show only a small chunk of the (text-only) database on any given page, then have pagination.
Think in terms of bytes, not words: millions of words means megabytes of text, and you'll likely run into performance issues. You could cap file sizes, or split the storage into multiple files at a certain size so you don't have to scan past content that will rarely be fetched.
I've got a script in PHP that continually grows an array as its results are updated. It executes for a very long time on purpose, as it needs to filter a few million strings.
As it loops through results it prints out strings and fills up the page until the scroll bar is super tiny. Instead of printing out the strings, I want to just show the number of successful results dynamically as the php script continues. I did echo(count($array)); and found the number at 1,232,907... 1,233,192 ... 1,234,874 and so forth printed out on many lines.
So, how do I display this increasing php variable as a single growing number on my webpage with Javascript?
Have your PHP script store that number somewhere, then use AJAX to retrieve it every so often.
You need to find a way to interface with the process, to get the current state out of it. Your script needs to export the status periodically, e.g. by writing it to a database.
The easiest way is to write the status to a text file every so often and poll this text file periodically using AJAX.
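For example (file names assumed; the JSON shape is arbitrary):

// in the long-running script: export the status every 1000 items
if ($count % 1000 == 0) {
    file_put_contents('/tmp/progress.txt', $count);
}

// status.php -- what the periodic AJAX call reads
header('Content-Type: application/json');
echo json_encode(array('count' => (int)@file_get_contents('/tmp/progress.txt')));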
You can use the Forever Frame technique. Basically, you have a main page containing an iframe. The iframe loads gradually, intermittently adding an additional script tag. Each script tag modifies the content of the parent page.
There is a complete guide available.
That said, there are many good reasons to consider doing more pre-computation (e.g. in a cron job) to avoid doing the actual work during the request.
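For reference, a bare-bones sketch of the forever-frame variant; all names are made up, and the parent page is assumed to embed <iframe src="forever.php"> and define a global updateCount() that rewrites a counter element:

// forever.php -- streams <script> tags into the hidden iframe
while (ob_get_level()) ob_end_clean();    // disable PHP output buffering
$count = 0;
foreach (load_strings() as $s) {          // load_strings() stands in for the real source
    if (keep($s)) $count++;               // keep() is the hypothetical filter
    if ($count % 5000 == 0) {
        echo "<script>parent.updateCount($count);</script>\n";
        flush();                          // each flush makes the parent page update
    }
}
echo "<script>parent.updateCount($count);</script>\n";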
This isn't what you're looking for (I'm just as interested in an answer to this...), but a solution that I've found works is to keep track of the count server-side and only print every 1000/5000/whatever number works best, rather than one by one.
I'd suggest having a PHP script that returns the value in JSON format. Then, in another PHP page, you can make an AJAX call to it, fetch the JSON value, and display it. Your AJAX call can be programmed to run every 5 seconds or so, depending on how fast your numbers change. The iframe approach, though easier, is a bit outdated.
Some operations take too much time, which leads the AJAX request to time out.
How do I finish responding to the request first, then continue that operation?
The ignore_user_abort directive, and the ignore_user_abort() function, are probably what you are looking for: they should allow you to send the response to the browser and, after that, still run some calculations on your server.
This article about it might interest you: How to Use ignore_user_abort() to Do Processing Out of Band; quoting:
EDIT 2010-03-22: removed the link (it was pointing to http:// ! waynepan.com/2007/10/11/ ! how-to-use-ignore_user_abort-to-do-process-out-of-band/ -- remove the spaces and ! if you want to try), after seeing the comment of @Joel.
Basically, when you use ignore_user_abort(true) in your php script, the script will continue running even if the user pressed the esc or stop on his browser. How do you use this? One use would be to return content to the user and allow the connection to be closed while processing things that don't require user interaction.
The following example sends out $response to the user, closing the connection (making the browser's spinner/loading bar stop), and then executes do_function_that_takes_five_mins();
And the given example:
ignore_user_abort(true);                         // keep running even if the client disconnects
header("Connection: close");
header("Content-Length: " . strlen($response));  // must be a byte count, so strlen(), not mb_strlen()
echo $response;
flush();                                         // the browser now has its complete response
do_function_that_takes_five_mins();              // continues server-side after the connection closes
(There's more I didn't copy-paste)
Note that your PHP script still has to fit in the max_execution_time and memory_limit constraints -- which means you shouldn't use this for manipulations that take too much time.
This will also use one Apache process -- which means you should not have dozens of pages that do that at the same time.
Still, a nice trick to enhance user experience, I suppose ;-)
Spawn a background process and return background process id so user can check on it later via some secret URL.
Sort of depends on what you're trying to accomplish.
In my case, I needed to do a bunch of server-side processing, with only a minimal amount of data being sent back to the browser - summary info really.
I was trying to create a message sender - sends out an email to over 250 people, but possibly many more (depends on how many have registered with the system).
The PHP mail handler is quick, but for large numbers not quick enough, so it was bound to time out. To get around that, I needed to delay the timeout on the server/PHP side and keep the browser hanging on till all data was summarized and displayed.
My solution - a teaser.
Essentially, I gave the user a message stating some initial stats (attempting to send this many emails), created 2 DIV boxes (one for current status info, the second for final summary info), displayed the page footer, started the processing, and when finished, updated the summary info. It goes as follows:
Start by collecting the data you're going to process and get some summary info. In my case, I pulled the list of email addresses, validated them, and counted them.
Then display the "attempting" info:
echo "Message contents:";
echo "<blockquote>$msgsubject<p>$msgbody</blockquote><p> </p>";
echo "Attempting <strong>" . $num_rows . "</strong> email addresses.";
Now create some DIVs for status/final info:
<div id="content">
  <div id="progress" style="border: black 1px solid; width: <?php echo $boxwidth; ?>px; height: 20px;"></div>
  <br>
</div>
where $boxwidth is the width you'd like your progress bar to be (explained later)
Notice that they are essentially empty - we'll fill them later.
Finally, fill out the rest of the page by displaying the footer of the page (in my case I just "included" the appropriate file).
Now, all of that is still sitting in the page buffer, either on the server (if PHP is buffering) or in the browser (because it hasn't been told we're done yet), so let's force it to be pushed and/or displayed using the "ignore" and "flush" calls from above:
ignore_user_abort(true);
flush();
Now that the browser is starting to display stuff, we need to give the user something to see, so here's the tease - we'll create a status bar that we'll display in the inner DIV, and a final message for the outer DIV.
So, as you loop through your data, periodically (I'll leave "how often" up to you to figure out), output the following:
set_time_limit(3);   // restart PHP's timeout clock (see note below)
...
(process the next slice of your data here)
...
echo "<script>document.getElementById('progress').innerHTML += \"<img src='images/progress_red.gif' width=$width height=20 border=0>\";</script>";
flush();
This will essentially stack the little 4x20 progress_red images next to each other, making it appear that a red bar is moving across the screen. In my case, I did this 100 times, based on the percentage of the total that had been processed so far. The set_time_limit(3) call restarts PHP's timeout counter, granting another 3 seconds each time it's called, so you know your script won't run into the PHP time-limit wall. Adjust the 3 seconds for your application as needed.
Even though the page is technically "complete", the JavaScript code will update the DIV HTML and our progress bar will "move".
When all done, I report how many items were actually processed, add that to the outer DIV's HTML, and the page is complete:
echo "<script>document.getElementById('content').innerHTML += 'Message was sent to $sentcnt people.';</script>";
The browser is happy because all its requirements are met (the page ends syntactically), the user sees something happening right up to the end, and the server gets to finish its work without the connection being dropped.
I tend not to use JavaScript if I can help it, but in this case there was no fancy CSS stuff being done through the JavaScript, so it's pretty straightforward, easy to see what's going on, and low overhead on the browser side. And it functions pretty smoothly.
I don't claim this is perfect, but for a simple, low-overhead solution it works for me, without having to create cron jobs, extra queuing mechanisms, or secondary processes.
You need the flush() call to have the response sent to the browser immediately.
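For instance, a minimal sketch (some setups also need PHP's output buffers cleared first, and some servers hold back very small first chunks):

while (ob_get_level()) ob_end_clean();   // make sure PHP isn't buffering
echo str_pad("Working...\n", 4096);      // pad past any server-side buffer
flush();                                 // the browser sees this immediately
sleep(5);                                // stands in for the slow work
echo "Done.\n";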