If a REST request can take 10 minutes - php

I'm about to implement a REST server (in ASP.NET, although I think that's irrelevant here). A request is made and it returns the result. However, that result is an .XLSX file that could be a million rows.
If I'm generating a million-row spreadsheet, it's going to take about 10 minutes, and an HTTP request will time out. So what's the best way to handle this delay in the result?
Second, what's the best way to return a very large file as the REST result?
Update: The most common use case is that the REST server is an Azure cloud service web worker (basically IIS on Azure). The client is a PHP web app running on a different server in a different location. The PHP web app needs to send up a report template (generally 25K) and the data, which can be a connection string to a SQL database or a 500 MB XML file. So the request is an XML file containing the template and datasource(s).
The response is a file: PDF, DOCX, XLSX, PPTX, or HTML. That can be a BLOB inside an XML file, or it can be the file itself. In the case of an error it must return XML with the error information. The big issue is that it can take 10 minutes to generate this file even when everything goes right: when it's a 1-million-row spreadsheet, it takes time to pull down all that data and populate the created XLSX file. The second issue is that the result is then a really large file.
So even if everything is perfect, there's a big delay and a large response.

I see two options:
Write the file to the response stream during its generation (from the client side this looks like downloading a large file);
Start the file-generation task on the server side and return a task id immediately. Add API methods that allow the client to retrieve the task status, cancel it, or get the results (once the task completes).
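A minimal sketch of the second option, using a file-backed task store. The function names (createTask, getTaskStatus, completeTask) and the file layout are illustrative assumptions, not part of any framework; a real service would use a database or queue.

```php
<?php
// Illustrative file-backed task store for the "return a task id" pattern.
// Each task is a small JSON status file named after an opaque random id.

function createTask(string $dir): string {
    $id = bin2hex(random_bytes(16));           // 32-char opaque task id for the client
    file_put_contents("$dir/$id.json", json_encode(['status' => 'pending']));
    return $id;
}

function getTaskStatus(string $dir, string $id): ?array {
    $path = "$dir/$id.json";
    if (!preg_match('/^[0-9a-f]{32}$/', $id) || !is_file($path)) {
        return null;                           // malformed or unknown id
    }
    return json_decode(file_get_contents($path), true);
}

function completeTask(string $dir, string $id, string $resultFile): void {
    file_put_contents("$dir/$id.json",
        json_encode(['status' => 'done', 'result' => $resultFile]));
}
```

The client would POST the template, receive the id, then poll a status endpoint until the status is done, and only then download the (large) result file in a plain GET.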

Interesting question.
I sure hope you have a stable connection. In any case, on the client side (in this case PHP), set the timeouts to very high values:
set_time_limit(3600 * 10);                        // allow the PHP script to run up to 10 hours
curl_setopt($curlh, CURLOPT_TIMEOUT, 3600 * 10);  // and let the cURL transfer run just as long

Related

[only server-side] How to get the echoed HTML div result of a PHP script saved as a PNG file on the server?

Just as a log file is written by a PHP script via fwrite($fp, ---HTML---), I need to save an HTML div as a PNG file on the server.
The client browser only starts the PHP script; the PNG file should be saved on the server without any client interaction.
Is there a way to do this?
All the posts (thousands of them) I have been reading are about html2canvas, which (as I understand it) operates client-side.
I know the browser normally does the rendering of HTML (client-side), but is there a way to do it in PHP on the server side?
Reason:
Until now the procedure is to print the div via the browser on paper twice:
one copy for the customer,
one to scan back in, save on the server as a picture, and throw in the wastepaper basket.
More than 500 times a day...
For security reasons it needs to be a saved picture on the server.
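One server-side route (an assumption on my part, not something the thread confirms) is to shell out to a headless HTML renderer such as wkhtmltoimage, which must be installed on the server separately. The helper below builds the command with escapeshellarg() so file names can't inject shell syntax:

```php
<?php
// Sketch: render an HTML fragment to a PNG on the server by shelling out to
// wkhtmltoimage. The binary name/path is an assumption for illustration.

function buildRenderCommand(string $htmlFile, string $pngFile): string {
    // escapeshellarg() quotes each path so it is passed as a single safe argument
    return 'wkhtmltoimage ' . escapeshellarg($htmlFile) . ' ' . escapeshellarg($pngFile);
}

function renderDivToPng(string $divHtml, string $pngFile): void {
    $tmp = tempnam(sys_get_temp_dir(), 'div') . '.html';
    file_put_contents($tmp, '<html><body>' . $divHtml . '</body></html>');
    shell_exec(buildRenderCommand($tmp, $pngFile));   // requires wkhtmltoimage installed
    unlink($tmp);
}
```

Headless Chromium (via a wrapper library) would be an alternative renderer; either way the browser is replaced by a server-side process, so no client interaction is needed.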

Echo script progress and download CSV

I'm having problems sending an array to another PHP page. We send an array from one page to another to generate a CSV file that has been transformed from XML. So we take an 800 MB XML file and transform it down to a 20 MB CSV file. There is a lot of information in it that we are removing, and the transformation runs for 30 minutes.
Anyway, we are periodically using a function to output the progress of the transformation in the browser with messages:
function outputResults($message) {
    echo $message . "<br>";
    if (ob_get_level() > 0) {
        ob_flush();   // flush PHP's output buffer, if one is active
    }
    flush();          // push the output through the web server to the browser
}
$masterArray contains, as an associative array, all the information we have parsed from the XML.
At the end we send the array ($masterArray) from index.php to another PHP file called create_CSV_file.php.
Originally we used include('create_CSV_file.php') within index.php, but due to the headers used for the CSV download after output had already been sent, it was giving us the message
Warning: Cannot modify header information - headers already sent
So we started looking at a solution of passing the array as below:
echo "<a href='create_CSV_file.php?data=$masterArray'>**** Download CSV file ***</a>";
With the above echo I keep getting the error message:
Notice: Array to string conversion
What is the best method to show echo statements from the server while it is running, and then be able to download the resulting CSV at the end?
Ok, so first of all, using data in a URL (GET) has some severe limitations. Older versions of IE only supported 4096-byte URLs, and some proxies and other software impose their own limits.
I'm sure you've heard this before, but if not: you should not be running a process that takes more than a couple of seconds (at most!) from a web server. Web servers are not optimised for it. You definitely don't want to be passing megabytes of data to the client just so they can send it back to the server!
How about something like this...
User makes a web request (And uploads original data?) to the server
Server allocates an ID for the request (random? database?) and creates a file on disk using the ID as a name (tmp directory, or at least outside web root)
Server launches a new process (PHP?) to transform the data. As it runs, it can update the database with progress information
During this time, the user can check progress by making a sequence of AJAX requests (or just refreshing a page which shows latest status). Lots more control over appearance now
When the processing is complete, server-side process writes results to file, updates database to indicate completion.
Next time user checks status, redirect them to a PHP file that takes the ID and will read the file from disk / stream it to the user.
Benefits:
No long-running http requests
No data being passed back/forth to client in intermediate stage
Much more control over how users see progress
Depending on the transformation you're applying and the detail stored in the database, you may be able to recover interrupted jobs (server failure)
It does have one downside which is that you need to clean up after yourself - the files you created on disk need to be deleted, however, you've got a complete audit of all files in the database and deleting anything over x days old would be trivial.
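The final download step above can be sketched like this; the job directory and id-to-filename convention are illustrative assumptions:

```php
<?php
// Sketch of the download endpoint: stream the finished CSV to the browser
// once the background job has completed. $jobDir layout is an assumption.

function streamResult(string $jobDir, string $id): bool {
    $path = $jobDir . '/' . basename($id) . '.csv';  // basename() blocks ../ traversal
    if (!is_file($path)) {
        return false;                                // job unknown or not finished yet
    }
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="result.csv"');
    header('Content-Length: ' . filesize($path));
    readfile($path);                                 // streams the file without loading it into memory
    return true;
}
```

readfile() is the key detail: it copies the file straight to the output buffer, so even a very large result never has to fit in PHP's memory limit.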

Code Logic to display .csv file on a browser

My project has me needing to read a CSV file and display it in a browser automatically. Before I post the code, I want to confirm I have the logic correct and am not confusing myself with more development than necessary. From my research there are two ways this can be done at a bare-basic level.
Cross domain: A program (R) on server 1 has outputted a csv file on some set time interval. I then need to use a server side language (php) on server 1 to parse the data and put into an array. I then use a php proxy or JSONP format on server 2 for a cross domain GET to call it via AJAX and load into the client side script on server 2.
Same domain: A program (R) on the server has outputted a csv file on some set time interval. I would still need to use a php script to parse the data and put data into an array, which then I do an AJAX call to load the data into the client side script in JS.
I cannot use the jquery-csv plugin and the HTML5 FileReader to do this automatically in either case, because those are for a client user manually uploading a file?
Also, to have a two-way connection where data is both pushed and pulled, I would need to implement WebSockets or long polling/HTTP streaming.
Please confirm my logic above.
Thanks.
Do you need to parse the CSV on the first server and send the parsed data to server 2 (or download it from server 1 to server 2)? If so, you just need fgetcsv on server 1 and a simple curl/file_get_contents call on server 2.
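A minimal sketch of that split; the server-1 URL in the comment is a placeholder:

```php
<?php
// Server 1: parse the CSV into rows with fgetcsv() so it can be exposed
// (e.g. as JSON) to the other server or to the browser.

function csvToRows(string $csvFile): array {
    $rows = [];
    $fh = fopen($csvFile, 'r');
    while (($row = fgetcsv($fh)) !== false) {
        $rows[] = $row;          // each $row is an array of the line's fields
    }
    fclose($fh);
    return $rows;
}

// Server 2 (or the browser via AJAX) then just fetches the parsed output:
// $json = file_get_contents('https://server1.example/data.php');  // placeholder URL
// $rows = json_decode($json, true);
```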

Dealing with large amounts of data via XML API

So, I searched some here, but couldn't find anything good, apologies if my search-fu is insufficient...
So, what I have today is that my users upload a CSV text file using a form to my PHP script, and then I import that file into a database after validating every line in it. The text file can be up to about 70,000 lines long, and each line contains 24 fields of values. Dealing with that amount of data is not a problem in itself. Every line needs to be validated, plus I check the DB for duplicates (according to a dynamic key generated from the data) to determine whether the data should be inserted or updated.
Right, but my clients are now requesting an automatic API for this, so they don't have to manually create and upload a text file. Sure, but how would I do it?
If I were to use a REST server, memory would run out pretty quickly if one request contained XML for 70k posts to be inserted, so that's pretty much out of the question.
So, how should I do it? I have thought about three options; please help me decide, or add more options to the list.
One post per request. Not all clients have 70k posts, but an update to the DB could result in the API handling 70k requests in a short period, and it would probably happen daily either way.
X posts per request. Set a limit on the number of posts the API deals with per request, say 100 at a time. That means 700 requests.
The API requires the client script to upload a CSV file ready to import using the current routine. This seems "fragile" and not very modern.
Any other ideas?
If you read up on SAX processing http://en.wikipedia.org/wiki/Simple_API_for_XML and HTTP Chunk Encoding http://en.wikipedia.org/wiki/Chunked_transfer_encoding you will see that it should be feasible to parse the XML document whilst it is being sent.
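A streaming sketch along those lines, using PHP's XMLReader so the posts are handled one at a time and memory use stays flat regardless of document size. The `<post>` element name and the `<posts>` wrapper are assumptions for illustration:

```php
<?php
// Stream-parse a large XML payload one <post> at a time with XMLReader.
// Only the current <post> is ever materialised, never the whole document.

function processPosts(string $xmlFile, callable $handler): int {
    $reader = new XMLReader();
    $reader->open($xmlFile);
    // advance to the first <post> element
    while ($reader->read() && $reader->name !== 'post');
    $count = 0;
    while ($reader->name === 'post') {
        // materialise just this one element for convenient access
        $post = new SimpleXMLElement($reader->readOuterXml());
        $handler($post);                // validate / insert / update here
        $count++;
        $reader->next('post');          // jump to the next <post> sibling
    }
    $reader->close();
    return $count;
}
```

Combined with chunked transfer encoding on the HTTP side, the server can start validating posts while the client is still uploading.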
I have now solved this by imposing a limit of 100 posts per request, and I am using REST through PHP to handle the data. Uploading 36,000 posts takes about two minutes with all the validation.
First of all, don't use XML for this! Use JSON; it is faster than XML.
On my project I import from XLS. The files are very large, but the script works fine; the client just has to create files with the same structure for import.

Can you get a specific xml value without loading the full file?

I recently wrote a PHP plugin to interface with my phpBB installation. It takes my users' Steam IDs, converts them into the community IDs that Steam uses on its website, grabs the XML file for that community ID, gets the value of avatarFull (which contains the link to the full avatar), downloads it via cURL, resizes it, and sets it as the user's new avatar.
In effect it is syncing my forum's avatars with Steam's avatars (Steam is a gaming community/platform and I run a gaming clan). My issue is that whenever I am reading the value from the xml file it takes around a second for each user as it loads the entire xml file before searching for the variable and this causes the entire script to take a very long time to complete.
Ideally I want to have my script run several times a day to check each avatarFull value from Steam and check to see if it has changed (and download the file if it has), but it currently takes just too long for me to tie up everything to wait on it.
Is there any way to have the server serve up just the xml value that I am looking for without loading the entire thing?
Here is how I am calling the value currently:
$xml = @simplexml_load_file("http://steamcommunity.com/profiles/".$steamid."?xml=1");
$avatarlink = $xml->avatarFull;
And here is an example xml file: XML file
The file isn't big. Parsing it doesn't take much time. Your second is wasted mostly for network communication.
Since there is no way around this, you must implement a cache. Schedule a script that will run on your server every hour or so, looking for changes. This script will take a lot of time - at least a second for every user; several seconds if the picture has to be downloaded.
When it has the latest picture, it will store it in some predefined location on your server. The scripts that serve your webpage will use this location instead of communicating with Steam. That way they will work instantly, and the pictures will be at most 1 hour out-of-date.
Added: Here's an idea to complement this: Have your visitors perform AJAX requests to Steam and check if the picture has changed via JavaScript. Do this only for pictures that they're actually viewing. If it has, then you can immediately replace the outdated picture in their browser. Also you can notify your server who can then download the updated picture immediately. Perhaps you won't even need to schedule anything yourself.
You have to read the whole stream to get to the data you need, but it doesn't have to be kept in memory.
If I were doing this with Java, I'd use a SAX parser instead of a DOM parser. I could handle the few values I was interested in and not keep a large DOM in memory. See if there's something equivalent for you with PHP.
SimpleXml is a DOM parser. It will load and parse the entire document into memory before you can work with it. If you do not want that, use XMLReader which will allow you to process the XML while you are reading it from a stream, e.g. you could exit processing once the avatar was fetched.
But like other people already pointed out elsewhere on this page, with a file as small as shown, this is likely rather a network latency issue than an XML issue.
Also see Best XML Parser for PHP
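A sketch of that XMLReader approach, stopping as soon as the wanted element has been read instead of parsing the rest of the document:

```php
<?php
// Pull a single element's text out of an XML stream with XMLReader and
// bail out early, instead of building a full DOM with SimpleXML.

function readFirstElement(string $xmlSource, string $element): ?string {
    $reader = new XMLReader();
    if (!@$reader->open($xmlSource)) {     // accepts a file path or URL
        return null;
    }
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $element) {
            $value = $reader->readString(); // text content, CDATA included
            $reader->close();
            return $value;                  // stop parsing here
        }
    }
    $reader->close();
    return null;
}
```

For the Steam case this would be called as readFirstElement($profileUrl, 'avatarFull'), though as noted above the network round-trip, not parsing, is the likely bottleneck.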
That file looks small enough; it shouldn't take that long to parse. It probably takes that long because of some sort of network problem, not the slowness of parsing.
If the network is your issue then no amount of trickery will help you :(.
If it isn't the network, then you could try a regex match on the input. That will probably be marginally faster.
Try this expression:
/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/
and read the link from the first group match.
You could try the SAX way of parsing (http://php.net/manual/en/book.xml.php), but as I said, since the file is small I doubt it will really make a difference.
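Used with preg_match, the regex above looks like this. Note that the CDATA brackets must be escaped, or the regex engine reads them as a character class:

```php
<?php
// Grab the avatarFull URL with a regex instead of an XML parser.

function extractAvatarFull(string $xml): ?string {
    if (preg_match('/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/', $xml, $m)) {
        return $m[1];    // first capture group: the URL inside the CDATA block
    }
    return null;
}
```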
You can take advantage of caching the results of simplexml_load_file() somewhere like memcached or the filesystem. Here is a typical workflow:
check if XML file was processed during last N seconds
return processing results on success
on failure get results from simplexml
process them
resize images
store results in cache
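A file-based sketch of that workflow; the cache path, TTL, and fetch callback are illustrative:

```php
<?php
// Cache a fetched document on disk for $ttl seconds so repeated page loads
// don't hit the remote server at all. Paths and TTL are illustrative.

function cachedFetch(string $url, string $cacheFile, int $ttl, callable $fetch): string {
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);   // fresh enough: serve from cache
    }
    $data = $fetch($url);                       // e.g. 'file_get_contents' or a cURL wrapper
    file_put_contents($cacheFile, $data);       // refresh the cache
    return $data;
}
```

For the avatar case: $xmlString = cachedFetch($profileUrl, "/tmp/avatar_$steamid.xml", 3600, 'file_get_contents'); followed by simplexml_load_string($xmlString) gives at most one Steam request per hour per user, matching the workflow above.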
