Issue summary: I've managed to speed up the thumbing of images upon upload dramatically, at the cost of using concurrency. Now I need to secure that concurrency against a race condition. I was going to have the dependent script poll normal files for the status of the independent one, but then decided named pipes would be better. Pipes, to avoid polling; and named, because I can't get a PID from the script that opens them (which is the one I need the pipes to talk to).
So when an image is uploaded, the client sends a POST via AJAX to a script which 1) saves the image 2) spawns a parallel script (the independent) to thumb the image and 3) returns JSON about the image to the client. The client then immediately requests the thumbed version, which we hopefully had enough time to prepare while the response was being sent. But if it's not ready, Apache mod_rewrites the path to point at a second script (the dependent), which waits for the thumbing to complete and then returns the image data.
I expected this to be fairly straightforward, but, while testing the independent script alone via terminal, I get this:
$ php -f thumb.php -- img=3g1pad.jpg
successSegmentation fault
The source is here: http://codepad.org/JP9wkuba I suspect that I get a segfault because the fifo I made is still open and now orphaned. But I need it there for the dependent script to see, right? And isn't it supposed to be non-blocking? I suppose it is, since the rest of the script can run... but it can't finish? This would be a job for a normal file, as I had thought at the start, except that if both are open I don't want to be polling. I want to poll once at most and be done with it. Do I just need to poll and ignore the ugliness?
You need to delete the FIFO files you created and then let the scripts finish.
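To make that concrete, here is a minimal sketch of how the independent (thumbing) script could handle its FIFO, assuming it creates the pipe itself; the path, the message and the thumbnailing step are placeholders rather than code from the linked source:

<?php
// Hypothetical FIFO handling for the independent script; the path and the
// "done" message are placeholders, not taken from the original source.
$fifoPath = '/tmp/thumb_done.fifo';

if (!file_exists($fifoPath)) {
    posix_mkfifo($fifoPath, 0600);   // requires the posix extension
}

// ... generate the thumbnail here ...

// Opening with 'r+' instead of 'w' avoids blocking when no reader has
// attached yet (Linux behaviour); plain 'w' would wait for the dependent
// script to open the read end.
$fifo = fopen($fifoPath, 'r+');
if ($fifo !== false) {
    fwrite($fifo, "done\n");
    fclose($fifo);
}

// The point of the answer: remove the FIFO before the script finishes so it
// is not left open and orphaned.
unlink($fifoPath);
echo 'success';

The dependent script would simply block on fopen($fifoPath, 'r') until the write arrives, so no polling is needed on that side.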
Related
This time I come with a question that I hope you can guide me on.
I have created a PHP script that loads a CSV file with a large amount of data (I upload it with an AJAX request). The script extracts the data from the file, checks that the data is not already stored in the database, uses another script to obtain information about each record extracted from the file, and finally saves the records that pass all that validation into a DB table.
The process can last from a few seconds to many minutes, because some of the files I upload contain more than 100 thousand records, so I would rather not leave the browser open for the whole time the process runs.
What I want to know is how I could leave this process running internally on the server when I close the browser. Something like putting it in a queue and letting it continue running after I close my browser.
Then, when I reopen the browser, I would open a page that shows me how the process is currently going. The idea is that the data processing is not interrupted when I close my browser.
Any suggestions or examples you could give me to achieve this?
Based on your description, I think you'd be better off running a dedicated daemon (either a third-party one or one you write yourself) to do the background work.
The rationale for why I don't think it's right to do that in your PHP code is:
If you fork it from your server code, you have to install something extra, and since it is a fork, the process you spawn will inherit data from the parent process that is of no use to it at all.
With a dedicated daemon, it's easier for you to track the status of each job and, more importantly, you won't end up with a bunch of spawned processes the way you would if you forked a new process for each job in the server code.
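As a rough illustration of that daemon idea, here is a hedged sketch of a worker loop that picks queued CSV imports out of a jobs table; the table layout, DSN and column names are assumptions made for the example, not something from the question:

<?php
// Hypothetical daemon loop: the jobs table, its columns and the DSN are
// assumptions for this sketch.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    // Pick up one queued import at a time.
    $job = $pdo->query(
        "SELECT id, csv_path FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);

    if ($job === false) {
        sleep(5);                       // nothing queued, wait and look again
        continue;
    }

    $update = $pdo->prepare("UPDATE jobs SET status = ? WHERE id = ?");
    $update->execute(array('running', $job['id']));

    // ... read the CSV, validate each row, insert the rows that pass ...

    $update->execute(array('done', $job['id']));
}

The upload page then only inserts a row into the jobs table, and the progress page just reads the status column back, so closing the browser never interrupts the import.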
Well, I have a web application with multiple tools.
One of these tools sends a simple Ajax request to the same PHP script, which in turn sends an HTTP request via cURL, but the problem is that this request takes a long time.
Because this process takes so long, I cannot perform other tasks within the application; I have to wait for the process to complete before I can use the other tools.
How can I get PHP to use multiple child processes?
In this particular case, I don't need and don't want to use the Thread class, or "exec" to execute things via the command line.
Explanation of the problem:
I have a script for uploading files, but when I upload a large file the script takes a long time, so while the file is uploading I would like to look at my history of uploaded files.
To do this I open another tab in the browser with the upload history URL.
The problem is that when I open the history page, it is left "waiting" until the other tab finishes loading (when the file upload finishes).
(I think) the problem is that PHP handles everything in the same process/thread, and this prevents you from using multiple scripts at once (with multiple tabs in the browser).
So my problem is that I need to run multiple processes at the same time, without waiting for any of them to finish running.
I am currently working with Linux CentOS 7 servers, with Apache + PHP 5.4 and 4 GB of RAM allocated to PHP.
Thanks
I want to have my own variable (most likely an array) storing what my PHP application is up to right now.
The application can trigger a few background processes (like downloading files) and I want to have a list of what is currently being processed.
For example
if PHP calls exec() that will be downloading for 15 minutes
and then another download starts
and another download starts
then if I access my application I want to be able to see that 3 downloads are in progress, if none of them has finished yet.
Can I do that? Only in memory, without storing anything on disk?
I thought the solution would be some kind of server variable.
PHP doesn't have knowledge of previous processes. As soon as a PHP process is finished, everything it knows about itself goes with it.
I can think of two options. Write knowledge about the spawned processes to a file or database and use it to sync all your PHP requests (store the PID of each spawned process); a sketch of this option appears after this answer.
Or
Create a daemon. The people behind PHP have worked hard to clean up PHP's memory handling and such to make this more feasible. Take a look at the PEAR package System_Daemon: http://pear.php.net/package/System_Daemon
Off the top of my head, a quick architecture would be composed of three pieces:
Part A) The web app that takes in requests for downloads and reports back the progress of all requests
Part B) Your daemon, which accepts requests for downloads, spawns processes, and reports back the status of all spawned requests
Part C) The spawned process that performs the download you need.
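A hedged sketch of the first option above (file/database plus PIDs); the wget command, the JSON file path and the helper names are made up for illustration:

<?php
// Hypothetical sketch: spawn a background download and record its PID so
// later requests can list what is still running.
function startDownload($url)
{
    // '& echo $!' backgrounds the command and prints the child's PID.
    $cmd = sprintf('wget %s > /dev/null 2>&1 & echo $!', escapeshellarg($url));
    $pid = (int) trim(shell_exec($cmd));

    $list = json_decode(@file_get_contents('/tmp/downloads.json'), true) ?: array();
    $list[$pid] = $url;
    file_put_contents('/tmp/downloads.json', json_encode($list));

    return $pid;
}

function activeDownloads()
{
    $list = json_decode(@file_get_contents('/tmp/downloads.json'), true) ?: array();
    $active = array();
    foreach ($list as $pid => $url) {
        // posix_kill with signal 0 only checks whether the process still exists.
        if (posix_kill((int) $pid, 0)) {
            $active[$pid] = $url;
        }
    }
    return $active;
}

Any page request can then call activeDownloads() to show how many downloads are still running, which is exactly the "store the PID of each spawned process" idea.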
Anyone for shared memory?
Obviously you would have to have some sort of daemon, but you could use the built-in semaphore functions to easily communicate between the scripts. You need to be careful, though, because if you don't close the memory blocks properly you risk ending up with no blocks left.
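For what it's worth, a minimal sketch of that shared-memory approach using PHP's System V sem_*/shm_* functions; the key, the block size and the stored structure are arbitrary choices for this example:

<?php
// Hypothetical shared-memory bookkeeping; key and size are placeholders.
$key = 0xCAFE;
$sem = sem_get($key);               // semaphore to serialise access
$shm = shm_attach($key, 16384);     // 16 KB shared memory block

sem_acquire($sem);

// Read the current list of active downloads (variable index 1 is arbitrary).
$downloads = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : array();
$downloads[] = 'http://example.com/file.iso';
shm_put_var($shm, 1, $downloads);

sem_release($sem);

// Detach without destroying, so other requests can still read the block.
// Only call shm_remove($shm) / sem_remove($sem) when shutting everything
// down, otherwise you leak segments as the answer warns.
shm_detach($shm);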
You can't store your own variables in $_SERVER. The best method would be to store your data in a database and query/update it as required.
I created a script that gets data from some web services and our database, formats a report, then zips it and makes it available for download. When I first started I made it a command line script to see the output as it came out and to get around the script timeout limit you get when viewing in a browser. But because I don't want my user to have to use it from the command line or have to run php on their computer, I want to make this run from our webserver instead.
Because this script could take minutes to run, I need a way to let it process in the background and then start the download once the file has been created successfully. What's the best way to let this script run without triggering the timeout? I've attempted this before (using backticks to run the script separately and such) but gave up, so I'm asking here. Ideally, the user would click the submit button on the form to start the request, then be returned to the page instead of being made to stare at a blank browser window. When the zip file exists (meaning the process has finished), it should notify them (via AJAX? a reloaded page? I don't know yet).
This is on Windows Server 2007.
You should run it in a different process. Make a daemon that runs continuously, hits a database and looks for a flag, like "ShouldProcessData". Then when you hit that website, switch the flag to true. Your daemon process will see the flag on its next iteration and begin the processing. Stick the results into the database. Use the database as the communication mechanism between the website and the long-running process.
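A minimal sketch of that flag mechanism, assuming a settings table with name/value columns; the table, the DSN and everything other than the "ShouldProcessData" flag are placeholders:

<?php
// Daemon side (runs continuously): check the flag, do the work, reset it.
$pdo = new PDO('mysql:host=localhost;dbname=reports', 'user', 'pass');

while (true) {
    $flag = $pdo->query(
        "SELECT value FROM settings WHERE name = 'ShouldProcessData'"
    )->fetchColumn();

    if ($flag === '1') {
        $pdo->exec("UPDATE settings SET value = '0' WHERE name = 'ShouldProcessData'");
        // ... pull the web-service data, build the report, zip it,
        //     and record the zip's location in the database ...
    }
    sleep(10);
}

The website side is then just a single UPDATE that sets the flag to '1' when the form is submitted, plus a query for the finished zip's location.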
In PHP you have to specify what timeout you want for your process.
See the PHP manual for set_time_limit().
You may have another problem: the timeout of the browser itself (which could be around 1~2 minutes). While that timeout should be changeable within the browser (for each browser), you can usually prevent the user-side timeout from being triggered by sending some data to the browser every 20 seconds or so (for instance the download header; you can then send other headers, like encoding, etc.).
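A hedged sketch of that keep-alive trick; the completion check against a marker file and the 20-second interval are assumptions for the example:

<?php
set_time_limit(0);                       // remove PHP's own execution limit

// Placeholder completion check: in practice, test for the generated zip.
while (!file_exists('/tmp/report.zip')) {
    echo ' ';                            // a byte every 20 s keeps the browser busy
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
    sleep(20);
}

// Output has already been sent, so link to the file rather than trying to
// send download headers from this same response.
echo '<a href="/downloads/report.zip">Your report is ready</a>';

Note that once those padding bytes have gone out you can no longer change the headers, which is why the sketch links to the finished file instead of streaming it.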
Gearman is very handy for this (create a background task, let JavaScript poll for progress). It does of course require having Gearman installed and workers created. See: http://www.php.net/gearman
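A rough sketch of that Gearman flow; the 'make_report' task name and the payload are invented for the example, while the GearmanClient calls are from the standard PECL extension:

<?php
// Script hit by the initial AJAX call: queue the job and hand back a handle.
$client = new GearmanClient();
$client->addServer();                                   // default 127.0.0.1:4730
$handle = $client->doBackground('make_report', json_encode(array('user' => 42)));
echo json_encode(array('handle' => $handle));

// A separate script, polled by JavaScript, can then ask for progress:
//   $status = $client->jobStatus($_GET['handle']);
//   // $status is array(known, running, numerator, denominator)
//   echo json_encode(array('running'  => $status[1],
//                          'progress' => $status[3] ? $status[2] / $status[3] : 0));
//
// The worker registered elsewhere does the actual work and reports progress
// with $job->sendStatus($done, $total) inside its callback.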
Why don't you make an AJAX call from the page where you want to offer the download, then just wait for the AJAX call to return, and call set_time_limit(0) on the other page?
I am working on a process to allow people to upload PDF files and manage the document (page order) via a web-based interface.
The pages of the PDF file need to be cropped to a particular size for printing and currently we run them through a Photoshop action that takes care of this.
What I want to do is upload the PDF files to a dedicated server for performing the desired process (Photoshop action, convert, send images back to the web server).
What are some good ways to perform these functions while sending updates back to the web server, to allow for process tracking/progress bars that keep the user informed of how long their files are taking to process?
Additionally, what are some good techniques for queueing/tracking jobs/processes in general (with an emphasis on web-based technologies)?
Derek, I'm sure you have your reasons for using Photoshop, but seriously, did ImageMagick prove insufficient for you? I once worked with a fax utility that converted fax .g3 files to TIFF, increased contrast and brightness by 15% using ImageMagick, and converted them back to PDF. IM worked as a standalone Linux program invoked by a system() call, and I know there is now an ImageMagick PECL extension.
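In case it helps, a hedged sketch of the ImageMagick route using the Imagick PECL extension mentioned above (PDF reading needs Ghostscript installed); the resolution, crop box and file names are placeholders, not the questioner's real print size:

<?php
// Crop the first page of an uploaded PDF to a hypothetical print size.
$im = new Imagick();
$im->setResolution(300, 300);            // set before reading so the PDF
$im->readImage('upload.pdf[0]');         // is rasterised at print resolution
$im->cropImage(2480, 3508, 100, 100);    // width, height, x offset, y offset
$im->setImageFormat('pdf');
$im->writeImage('upload-cropped.pdf');
$im->clear();

The same thing can be done by invoking the convert command through a system()/exec() call, as in the fax example, which keeps the heavy lifting on the dedicated server.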
Create a queue, and push jobs to it. Have a cron job or daemon running that takes jobs from the queue and processes them. Make sure that you use some sort of locking, so you can safely stop/start the daemon/job.
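For the locking part, a small sketch using an flock()-based lock file so two overlapping cron runs of the queue worker never process the same jobs twice (the lock path is a placeholder):

<?php
$lock = fopen('/tmp/queue-worker.lock', 'c');    // 'c' creates the file if missing

if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0);        // another worker instance already holds the lock
}

// ... take jobs from the queue and process them here ...

flock($lock, LOCK_UN);
fclose($lock);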
If you expect the job to finish quickly, you can use a technique known as "comet". Basically, you establish a connection from JavaScript (using XMLHttpRequest) to your server-side script. In this script, you check whether the job is completed. If not, you sleep for a second or two, then check again. You keep doing this until the job finishes; then you give a response back. The result is that the request may take a while to complete, but it returns as soon as the job is done. You can then take appropriate action in JavaScript (reload the page or whatever).
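A minimal sketch of that comet-style endpoint, assuming job completion is recorded somewhere the script can check (a marker file here, purely as a placeholder):

<?php
$deadline = time() + 30;                  // give up after roughly 30 seconds

while (!file_exists('/tmp/job-42.done')) {
    if (time() > $deadline) {
        echo json_encode(array('done' => false));   // client can retry
        exit;
    }
    sleep(2);                             // check again in a couple of seconds
}

echo json_encode(array('done' => true));  // JavaScript reloads or updates the page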