On my web service users are allowed to create forms and let their friends or co-workers enter data with them. The collected data can be downloaded as a zip file stream. Sometimes users have huge amounts of data (up to 2 GB) and the server simply kills the PHP process for obvious reasons. Is it somehow possible to create such a file on the client side without Flash, Java (which doesn't work anyway for most of my users), etc.?
Increase your script timeout and memory usage.
Use the set_time_limit function.
And use ini_set for the memory_limit setting.
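For example, at the top of the download script (just a sketch; the values are placeholders and a shared host may still cap them):

    <?php
    // Allow the export script to run longer and use more memory.
    // Both values are placeholders and a shared host may still cap them.
    set_time_limit(0);                // 0 = no time limit
    ini_set('memory_limit', '512M');  // raise the memory ceiling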
Another solution is to give the clients the file in parts, i.e. limit the number of records per download: 1-1000, 1001-2000, etc.
If you have control over the web server process I suggest you explore x-send-file as a solution to this.
See this SO question
In essence it will end the PHP process and send the file via the HTTP server. This way time limits aren't an issue and you don't have a PHP instance hanging around.
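As a rough sketch with Apache's mod_xsendfile (the module has to be installed and enabled; on nginx the equivalent header is X-Accel-Redirect, and the path below is only an example):

    <?php
    // After authorizing the user, hand the actual transfer over to Apache.
    // The path is just an example for the generated zip.
    $path = '/data/exports/archive.zip';
    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename="export.zip"');
    header('X-Sendfile: ' . $path);
    exit; // PHP is done; the web server streams the file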
Create a worker shell that keeps running in the background in a loop and checks for new data. If it finds new, unprocessed data, have it prepare the download in the background. When the data is ready for download, flag it as "ready" and inform the user (by email, by polling via AJAX for a status update, however you like) that their data has been processed and is ready for download.
You can use nice to limit the CPU share of that shell so that it doesn't consume all the available processing power and slow your site down.
That's exactly how I handle audio and video processing in one of my projects and it works fine.
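A minimal sketch of such a worker loop; fetch_unprocessed(), build_zip() and mark_ready() are placeholders for your own database/archive code, and you could start it with something like nice -n 19 php worker.php &:

    <?php
    // worker.php - background loop that prepares downloads (sketch).
    // fetch_unprocessed(), build_zip() and mark_ready() stand in for your
    // own database/archive code.
    while (true) {
        foreach (fetch_unprocessed() as $job) {
            $zipPath = build_zip($job);  // the heavy work happens here
            mark_ready($job, $zipPath);  // flag as "ready" and notify the user
        }
        sleep(30); // wait a bit before checking for new data again
    }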
I have an old application that uses files instead of a DB to store user data as XML. All totally legacy. Until the migration to a DB is finished, I need to support the current state.
The problem is that sometimes (not reproducibly) the file contents were not written completely. This left the XML files incomplete and therefore invalid.
My assumption is that the server can't handle the amount of parallel file writes.
Note that parallel does not mean many scripts writing to one file; each call to the script writes the content of a single user. There are up to 100 calls to the script at peak times.
The script uses the standard fwrite functionality to write an incoming XML string to the file. No appending, just replacing the whole content, which works in the tests. The problem only occurs when a larger number of users use the application.
Is there a way to queue the file writes or delay the script calls?
Please comment if the description is insufficient.
So here's the lowdown:
The client I'm developing for is on HostGator, which has limited their max_execution_time to 30 seconds, and it cannot be overridden (I've tried, and confirmed via their support and wiki that it can't be).
What I have the code doing is take an uploaded file and...
loop through the XML
get all feed download links within the file
download each XML file
individually loop through the XML array of each file and insert the information of each item into the database based on where it is from (i.e. the filename)
Now is there any way I can queue this somehow or split the workload into multiple files possibly? I know the code works flawlessly and checks to see if each item exists before inserting it but I'm stuck getting around the execution_limit.
Any suggestions are appreciated, let me know if you have any questions!
The time limit is in effect only when executing PHP scripts through a web server; if you execute the script from the CLI or as a background process, it should work fine.
Note that executing an external script is somewhat dangerous if you are not careful enough, but it's a valid option.
Check the following resources:
Process Control Extensions
And specifically:
pcntl-exec
pcntl-fork
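For illustration, a minimal pcntl_fork sketch (CLI only; process_feeds() is a placeholder for the actual import work):

    <?php
    // Fork a child so the heavy work continues after the parent exits.
    // process_feeds() is a placeholder for the actual import work.
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('could not fork');
    } elseif ($pid === 0) {
        // child: do the long-running import here
        process_feeds();
        exit(0);
    }
    // parent: return immediately, e.g. respond to the browser
    echo "Import started in background (child pid $pid)\n";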
Did you know you can trick the max_execution_time by registering a shutdown handler? Within that code you can run for another 30 seconds ;-)
Okay, now for something more useful.
You can add a small queue table in your database to keep track of where you are in case the script dies mid-way.
After getting all the download links, you add those to the table
Then you download one file and process it; when you're done, you check it off (delete it) from the queue
Upon each run you check if there's still work left in the queue
For this to work you need to request that URL a few times; perhaps use JavaScript to keep reloading until the work is done?
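A rough sketch of one such run; the feed_queue table, the $pdo connection and insert_items() are assumptions, and each request processes a single queued URL so it stays well under the 30-second limit:

    <?php
    // Pull one pending feed URL from the queue, process it, then remove it.
    // $pdo is an existing PDO connection (assumption).
    $row = $pdo->query('SELECT id, url FROM feed_queue LIMIT 1')->fetch();
    if ($row === false) {
        echo 'done';   // queue is empty, nothing left to do
        exit;
    }
    $xml = simplexml_load_string(file_get_contents($row['url']));
    insert_items($xml, $row['url']);   // placeholder for your insert logic
    $stmt = $pdo->prepare('DELETE FROM feed_queue WHERE id = ?');
    $stmt->execute(array($row['id']));
    echo 'more';       // tell the caller to request this URL again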
I am in such a situation. My approach is similar to Jack's:
accept that the execution time limit will simply be there
design the application to cope with sudden exit (look into register_shutdown_function)
identify all time-demanding parts of the process
continuously save progress of the process
modify your components so that they are able to start from an arbitrary point, e.g. a position in an XML file, or continue downloading your to-be-fetched list of XML links (see the sketch after this list)
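A minimal sketch of the "save progress, resume on next run" idea; progress.json and import_chunk() are placeholders, and the shutdown handler persists the position if the script is cut off:

    <?php
    // Load the last saved position, or start from the beginning.
    $state = is_file('progress.json')
        ? json_decode(file_get_contents('progress.json'), true)
        : array('offset' => 0);

    // If the script dies mid-way, make sure the current position is saved.
    register_shutdown_function(function () use (&$state) {
        file_put_contents('progress.json', json_encode($state));
    });

    // Process in small chunks and advance the saved position as we go.
    // import_chunk() is assumed to return true once everything is imported.
    while (!import_chunk($state['offset'])) {
        $state['offset'] += 100;   // advance by one chunk of records
    }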
For the task I made two modules: Import for the actual processing and TaskManagement for dealing with these tasks.
For invoking TaskManager I use cron; whether that's enough depends on what your web hosting offers. There's also WebCron.
Jack's JavaScript method has the advantage that it only adds requests when needed. If there are no tasks to be executed, the script runtime will be very short and the overhead perhaps overstated*, but still. The downsides are that it requires the user to wait the whole time, not close the tab/browser, have JS support, etc.
*) Likely much less demanding than one click of one user at such a moment
Then of course look into performance improvements, caching, skipping what's not needed/hasn't changed etc.
I am making a Warehouse management system.
The orders come in a CSV in the morning that my script then processes.
It places a PHP-generated barcode at the top of each order. The sample CSV I am using has around 100 unique orders on it, so when I load the page that will then print the orders, the server gets 100+ requests and (I'm guessing) some of the images time out.
When I view the source and open the link to one of the images that doesn't work, it loads fine, leading me to think I need to somehow disable the timeout in the browser.
My only other idea is to load the barcodes through JavaScript.
Any suggestions?
I think what enygma may be getting at is the limited processing time php scripts have. Sometimes they get cut off after 30 seconds. Generating all of those images at one time might run over, causing your script to be killed on the server and stop sending data. Your idea of loading them in javascript is probably your best bet, as long as you only do a few at a time or do them serially.
If you start a session in php, the session is locked and cannot be accessed by another php script until released.
Based on your generating images with PHP, that's quite likely the cause of what you see.
There are other questions that go into a bit more detail about how PHP and sessions work, but most likely that's the direct cause of some of your images not being received: the requests sit in a single, serial queue, processed in turn, because each script reads the session and doesn't release it until it's finished. The requests at the end of the queue hit a time limit one way or another and return nothing.
Therefore, ensure that you call:
session_write_close();
as soon as you can in all scripts that need access to the session, to prevent them from blocking all other PHP requests; or better still, don't use the session at all (e.g. if you're using the session for authorization, just include a hash in the URL and compare against that for image requests).
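For example, in the image/barcode script, something along these lines (only a sketch; read whatever you need from the session first, then release the lock before the slow work):

    <?php
    session_start();
    // Read whatever the image script needs from the session...
    $userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null;
    // ...then release the session lock immediately so other requests
    // from the same user are no longer queued behind this one.
    session_write_close();

    // Now generate and output the barcode image as before.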
I am working on a tool in PHP that processes a lot of data and takes a while to finish. I would like to keep the user updated on what is going on and the current task being processed.
What is in your opinion the best way to do it? I've got some ideas but can't decide for the most effective one:
The old way: execute a small part of the script and display a page to the user with a Meta Redirect or a JavaScript timer to send a request to continue the script (like /script.php?step=2).
Sending AJAX requests constantly to read a server file that PHP keeps updating through fwrite().
Same as above but PHP updates a field in the database instead of saving a file.
Does any of those sound good? Any ideas?
Thanks!
Rather than writing to a static file you fetch with AJAX or to an extra database field, why not have another PHP script that simply returns a completion percentage for the specified task. Your page can then update the progress via a very lightweight AJAX request to said PHP script.
As for implementing this "progress" script, I could offer more advice if I had more insight as to what you mean by "processes a lot of data". If you are writing to a file, your "progress" script could simply check the file size and return the percentage complete. For more complex tasks, you might assign benchmarks to particular processes and return an estimated percentage complete based on which process has completed last or is currently running.
UPDATE
This is one suggested method to "check the progress" of an active script which is simply waiting for a response from a request. I have a data mining application that I use a similar method for.
In the script that makes the request you're waiting for (the script you want to check the progress of), you can store a progress variable for the process, either in a file or a database. (I use a database, as I have hundreds of processes running at any time which all need to track their progress, plus another script that allows me to monitor them.) When the process begins, set this to 1. You can easily pick an arbitrary number of 'checkpoints' the script will pass and calculate the percentage given the current checkpoint.

For a large request, however, you might be more interested in knowing the approximate percentage completed. One possible solution is to know the size of the returned content and set your status variable according to the percentage received at any moment. E.g. if you receive the request data in a loop, you could update the status on each iteration; or if you are downloading to a flat file, you could poll the size of the file. This could be done less accurately with time rather than file size, if you know roughly how long the request should take and simply compare against the script's current execution time. Obviously neither of these is a perfect solution, but I hope they'll give you some insight into your options.
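As a concrete illustration of the file-size variant, a progress endpoint could look roughly like this (the path and expected total size are placeholders):

    <?php
    // progress.php - return the completion percentage as plain text.
    $file     = '/tmp/download_12345.dat';   // file the worker is writing to
    $expected = 50 * 1024 * 1024;            // expected final size in bytes
    clearstatcache();                        // don't use a cached filesize()
    $current  = is_file($file) ? filesize($file) : 0;
    echo min(100, round($current / $expected * 100));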
I suggest using the AJAX method, but not using a file or a database. You could probably use session values or something like that, that way you don't have to create a connection or open a file to do anything.
In the past, I've just written messages out to the page and used flush() to flush the output buffer. Very simple, but it may not work correctly on every web server or with every web browser (as they may do their own internal buffering).
Personally, I like your second option the best. Should be reliable and fairly simple to implement.
I like option 2 - using AJAX to read a status file that PHP writes to periodically. This opens up a lot of different presentation options. If you write a JSON object to the file, you can easily parse it and display things like a progress bar, status messages, etc...
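On the PHP side that can be as simple as rewriting a small JSON file at each step, which the AJAX call then fetches and parses; the file name and fields here are just an example:

    <?php
    // Called from the long-running script after each step.
    function update_status($step, $total, $message) {
        $status = array(
            'percent' => round($step / $total * 100),
            'message' => $message,
        );
        file_put_contents('status.json', json_encode($status));
    }

    update_status(3, 10, 'Importing records...');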
A 'dirty' but quick-and-easy approach is to just echo out the status as the script runs along. As long as you don't have output buffering on, the browser will render the HTML as it receives it from the server (I know WordPress uses this technique for its auto-upgrade).
But yes, a 'better' approach would be AJAX, though I wouldn't say there's anything wrong with 'breaking it up' using redirects.
Why not incorporate 1 & 2, where AJAX sends a request to script.php?step=1, checks response, writes to the browser, then goes back for more at script.php?step=2 and so on?
If you can do away with IE, then use server-sent events. It's the ideal solution.
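A bare-bones server-sent events endpoint in PHP might look like this sketch, where get_progress() stands in for however you track progress; on the browser side you'd read it with an EventSource:

    <?php
    // events.php - stream progress updates to the browser via SSE.
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');

    while (true) {
        $percent = get_progress();                 // placeholder
        echo "data: {\"percent\": $percent}\n\n";  // SSE message format
        @ob_flush();
        flush();
        if ($percent >= 100) {
            break;
        }
        sleep(1);
    }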
My iPhone client app uploads data to the server, which runs on PHP. There is code in PHP to invoke a .exe program on the server side. The .exe program takes the uploaded data and runs in a process of its own. That means the PHP execution ends without waiting for the .exe program to finish. After the .exe program has finished processing the uploaded data and has an output, I want this output to be sent back to the iPhone.
Normally, if we call the .exe program to run inside PHP without making it a separate process, we have to wait for the program to finish and can then send the output back to the iPhone client.
By running the .exe program as a separate process, it is impossible to send the data back via the PHP script that invokes the .exe program. The question is: if the .exe program runs in a separate process rather than in the PHP script, what are the possible methods of sending the output back to the iPhone client?
That's a neat problem you've outlined. Let me explain a couple of ideas.
First of all, if you terminate the initial upload request, the only reasonable way to check whether the processing is done is to poll every few seconds from the iPhone. Send a request to "get-update.php" every 5 seconds to see if you have data.
By using $_SESSION, you should be able to store a token that will identify the data when it has finished processing.
Regarding the actual processing, you may be able to accomplish it in a number of ways. One is to do a fairly standard double fork, detaching the child process from the parent so it will continue after the parent exits.
Another (recommended) option would be to author a backend server process that watches your database for requests, fetches them, processes them, and updates the database. So when the initial upload script actually uploads the data, have PHP put it in the database, store the record ID in $_SESSION, and return to the user.
The back end process will notice that there is a record to process, read the data, call the executable, and update the database with the result.
The get-update.php script will read $_SESSION for the record id, and check the database if the data has been processed (or what the status is).
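A sketch of what get-update.php could look like under that scheme; the uploads table, its columns and the $pdo connection are assumptions:

    <?php
    // get-update.php - the iPhone app polls this until the result is ready.
    session_start();
    $id = isset($_SESSION['record_id']) ? $_SESSION['record_id'] : null;
    if ($id === null) {
        echo json_encode(array('status' => 'unknown'));
        exit;
    }
    // $pdo is an existing PDO connection; table/columns are assumptions.
    $stmt = $pdo->prepare('SELECT status, result FROM uploads WHERE id = ?');
    $stmt->execute(array($id));
    echo json_encode($stmt->fetch(PDO::FETCH_ASSOC));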
If you do not have the ability to run a background process, and you are constrained to using PHP, you could do the double-fork magic and fork off another PHP process to do the database read / exe / database update.
Feel free to comment with questions.
You need (a) a good way to pass the data to the program, and (b) a good way to get the data back.
I would say this is a perfect case for an AJAX snippet frequently polling data from, say, a text file the .exe writes its status in.
The upload script you call could return a unique identifier of some sort to the uploading client. Using that identifier, the client would poll the exe's status (e.g. "does the output file xyz already exist?") until it gets positive feedback.
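That status check can be a tiny script keyed on the identifier; the directory layout below is only an example:

    <?php
    // status.php?id=... - has the .exe finished writing its output yet?
    $id   = basename($_GET['id']);         // crude sanitization of the id
    $file = "/var/data/results/$id.out";   // example output location
    if (is_file($file)) {
        header('Content-Type: application/octet-stream');
        readfile($file);                   // send the result back
    } else {
        http_response_code(202);           // accepted but not ready yet
        echo 'pending';
    }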
You're going to have a hard time reconnecting with the iPhone once you've severed the connection. It may be out of coverage, it may have changed IP address, and so on.
Your best bet is to have the iPhone reconnect back to the server and poll for its information.
You could do this by using Apple's Push Notification service, but that's probably overkill, unless you think the data processing is going to take a long time, and/or you want to update the app icon when the processing is done, even if the app isn't running.
Do you expect the user to just be patiently waiting for the result, or are they going to fire off the data, and check back later? If it's only going to take a couple of seconds, you could just have the iPhone app poll for the result after waiting a little while (while displaying a progress indicator).