How to handle queueing of video encoding during multiple video uploads? - php

I am working on developing a video streaming site where users can upload videos to the site (multiple videos at once using the uploadify jquery plugin).
Now, I am faced with the question of encoding the videos to FLV for streaming them online.
When should the video encoding take place? Should it happen immediately after the uploads have finished (i.e. redirect the user to the upload success page, then start encoding in the background by calling ffmpeg through exec)? With this approach, how do I determine whether the encoding finished successfully? What if a user uploads a corrupt video and ffmpeg fails to encode it? How do I handle this in PHP?
How do I queue the encoding of videos, since multiple users can upload videos at the same time? Does FFmpeg have its own encoding queue?
I also read about Gearman and message queueing options such as Redis and AMQP in another related SO thread. Are these potential solutions?
I would really appreciate it if someone could answer these questions.

You should use a tool called Gearman. It is a job server and you can call its functions from PHP. You can push the work into the background and it handles queueing automatically. Many jobs can also run in parallel.
I've used it and it is very easy to install and operate.
For your use case:
Wrap the ffmpeg call in a PHP file that runs it via exec.
Save the uploaded file's details to the database and give it an id.
Call the encode function after a user uploads a file.
As soon as the encode function starts, update the database to mark the file as "picked".
First run ffmpeg's info pass on the file to check that it is fine.
Then encode the file and, once that is done, update the database flag to "done".
After the process is done, you can call another function to send an email, tweet, etc. to the user saying that the encoding is complete.
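A minimal sketch of that flow, assuming the PECL gearman extension, a gearmand server on localhost, and a hypothetical `videos` table with a `status` column (paths and column names are placeholders, not anything prescribed by Gearman):

<?php
// upload.php (sketch) -- runs after the file is saved and a row is created in the
// hypothetical `videos` table; hands the job to Gearman and returns immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('encode_video', json_encode(array('video_id' => $videoId)));

// worker.php (sketch) -- started from the CLI, e.g. `php worker.php`; run several
// copies in parallel if you want concurrent encodes.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('encode_video', function (GearmanJob $job) use ($pdo) {
    $data = json_decode($job->workload(), true);
    $id   = $data['video_id'];
    $src  = "/uploads/$id.orig";   // assumed path convention
    $dst  = "/streams/$id.flv";

    // Mark the file as "picked" so other workers and the UI know it is in progress
    $pdo->prepare("UPDATE videos SET status = 'picked' WHERE id = ?")->execute(array($id));

    // Encode; a non-zero exit code (corrupt upload, unsupported codec, ...) means failure
    exec('ffmpeg -y -i ' . escapeshellarg($src) . ' ' . escapeshellarg($dst) . ' 2>&1', $out, $code);

    $status = ($code === 0 && is_file($dst)) ? 'done' : 'failed';
    $pdo->prepare("UPDATE videos SET status = ? WHERE id = ?")->execute(array($status, $id));
    // Here you could notify the user (email, tweet, ...) that encoding finished
});
while ($worker->work());

Run several copies of worker.php if you want encodes to happen in parallel; Gearman spreads the queued jobs across them.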

Since encoding may take some time, you may want to drop the input files in a folder and add an entry to a database. Then a script that runs either constantly or every x minutes converts the pending videos from the database to FLV.
To queue them you will need a small custom script that re-runs FFmpeg for each file.
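A rough sketch of such a polling script, assuming a hypothetical `videos` table with `path` and `status` columns and a cron entry like `*/5 * * * * php /path/to/convert_pending.php`:

<?php
// convert_pending.php (sketch) -- converts every video still marked 'pending'
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$pending = $pdo->query("SELECT id, path FROM videos WHERE status = 'pending'")
               ->fetchAll(PDO::FETCH_ASSOC);

foreach ($pending as $video) {
    $out = preg_replace('/\.\w+$/', '.flv', $video['path']);   // same name, .flv extension
    exec('ffmpeg -y -i ' . escapeshellarg($video['path']) . ' ' . escapeshellarg($out) . ' 2>&1',
         $lines, $code);

    // ffmpeg's exit code tells you whether the encode worked (corrupt uploads fail here)
    $status = ($code === 0) ? 'done' : 'failed';
    $pdo->prepare("UPDATE videos SET status = ? WHERE id = ?")
        ->execute(array($status, $video['id']));
}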

You could use a cron job on the server, or use something like beanstalkd (or gearman as mentioned in other answers to this question).

FFmpeg is just a command-line utility. It doesn't have any queue of its own, so you will need to build your own queueing system to perform the work asynchronously.
For PHP queues, I had a good experience with the beanstalkd server, using the pheanstalk PHP client. Your upload script should insert items into the queue for encoding and return a response telling the user the video will be processed shortly. Your workers fetch items from the queue and run FFmpeg on them. You can read FFmpeg's exit code to figure out whether the encoding completed successfully.
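A rough sketch of that setup with pheanstalk (the constructor differs between pheanstalk versions, and the tube name and paths here are just placeholders):

<?php
require 'vendor/autoload.php'; // composer require pda/pheanstalk

use Pheanstalk\Pheanstalk;

$queue = new Pheanstalk('127.0.0.1'); // v3-style constructor; v4 uses Pheanstalk::create()

// --- in the upload script: enqueue the job and respond to the user right away ---
$queue->useTube('encode')->put(json_encode(array('video_id' => $videoId)));

// --- in a worker (CLI, run one or more of these) ---
while (true) {
    $job  = $queue->watch('encode')->ignore('default')->reserve();
    $data = json_decode($job->getData(), true);
    $id   = $data['video_id'];

    exec('ffmpeg -y -i ' . escapeshellarg("/uploads/$id") . ' '
         . escapeshellarg("/flv/$id.flv") . ' 2>&1', $out, $code);

    if ($code === 0) {
        $queue->delete($job);   // encoded fine, drop the job
    } else {
        $queue->bury($job);     // keep the failed job around for inspection (corrupt upload etc.)
    }
}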

Related

PHP Scrape and download files from urls with interval

I've built a scraper to get some data from another website. The scraper currently runs at the command line in a screen session, so the process never stops. Between each request I've set an interval to keep things calm. In one scrape it's possible that around 100 files come along that need to be downloaded. This process also has an interval after every download.
Now I want to add the functionality to scrape on the fly from the back-end. Everything works fine; I get the first data set, which only takes 2 requests. Within the returned data there is an array of files that need to be downloaded (can be 10, can be 100+). I would like to build something so the user can see in real time how far along the download process is.
The thing I'm facing: when the scraper has 2 jobs to do in a browser window, with 20+ downloads plus the intervals that keep things calm, it takes too much time. I am thinking about saving the files that need to be downloaded into a database table and handling that part of the process with another shell script (screen) or cron job.
I am wondering whether my thinking is on the right track, whether it is overkill, or whether there are better examples of handling these kinds of processes.
Thanks for any advice.
p.s. I am developing in PHP
If you think that is overkill, you can just run the script and wait for the task to finish before running it again.
Basically you need to implement a message queue where the HTTP request handler (front controller?) emits a message to fetch a page, and one or more workers do the job, optionally emitting more messages to the queue to download files.
There are plenty of MQ brokers, but you can also implement your own with a database as the queue storage.
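If you go the database-as-queue route, the main thing to get right is claiming a row atomically so two parallel workers never grab the same download. A rough MySQL-flavoured sketch, with a hypothetical `downloads` table (id, url, status, worker):

<?php
// download_worker.php (sketch) -- run in a screen session or via cron
$pdo      = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'pass');
$workerId = getmypid();

while (true) {
    // Atomically claim one pending row; UPDATE ... ORDER BY ... LIMIT 1 is MySQL-specific
    $claim = $pdo->prepare(
        "UPDATE downloads SET status = 'working', worker = ?
         WHERE status = 'pending' ORDER BY id LIMIT 1"
    );
    $claim->execute(array($workerId));
    if ($claim->rowCount() === 0) { sleep(10); continue; }   // nothing to do yet

    $stmt = $pdo->prepare("SELECT id, url FROM downloads WHERE worker = ? AND status = 'working' LIMIT 1");
    $stmt->execute(array($workerId));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    file_put_contents('/data/' . $row['id'], file_get_contents($row['url']));
    $pdo->prepare("UPDATE downloads SET status = 'done' WHERE id = ?")->execute(array($row['id']));

    sleep(2); // the interval that keeps things calm between downloads
}

The front-end can then simply count 'pending' versus 'done' rows to show the user how far along the downloads are.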

How to use ImageMagick via another server using PHP's system() function?

We do a lot of image processing, and a lot of the time this processing maxes out our CPU and causes our site to crash. What we want to do is move the image processing to another server, so that we can scale that server as necessary and not have our current server crash.
I'm wondering how to go about this though. Our current process is:
1) Users make an AJAX request to our Image Processing Script.
2) We construct a string based on the user's input. This string contains the commands to perform an ImageMagick process.
3) We run the string through PHP's system() command.
4) We then send headers to the page and use PHP's imagecreatefrompng() function on the file to output the image to the user.
So what I'd like to know is: what's the best way to hand off the ImageMagick processing? I thought of connecting to the other server via SSH, but I'm sure there is a limit on the number of SSH connections that can be made. We have hundreds of users online at a time, so we need to be able to handle that many connections at once.
Anyone with any ideas on how best to transfer our image processing to another server would be greatly welcomed.
Thanks!
SSH would not be an appropriate protocol for distributing work requests to another server. A popular approach is to use a messaging queue to dispatch tasks to "worker" nodes. The implementation can vary greatly depending on design, needs, and resource constraints. Here's a quick bare-bones outline...
A web server receives a new image item.
Writes the image to a CDN, network mount, etc.
Publishes a task to a messaging queue, like RabbitMQ
A worker node listens for new tasks.
Consumes and performs request.
Writes result output next to source on CDN
Notifies that the task is complete, by either updating a record in the DB or publishing back to the MQ.
Check out the RabbitMQ/PHP "Hello World" and "Work Queues" tutorials for detailed examples.
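A rough sketch of both sides with php-amqplib (the queue name, paths and ImageMagick arguments are placeholders; in a real setup the ops string must never come straight from user input without validation):

<?php
require 'vendor/autoload.php'; // composer require php-amqplib/php-amqplib

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$conn    = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $conn->channel();
$channel->queue_declare('image_tasks', false, true, false, false); // durable queue

// --- on the web server: publish a task once the image sits on shared storage ---
$task = json_encode(array('src' => '/mnt/cdn/incoming/123.png', 'ops' => '-resize 800x600'));
$channel->basic_publish(
    new AMQPMessage($task, array('delivery_mode' => 2)), // 2 = persistent
    '',
    'image_tasks'
);

// --- on a worker node: consume tasks and run ImageMagick locally ---
$channel->basic_qos(null, 1, null); // hand each worker one task at a time
$callback = function ($msg) {
    $task = json_decode($msg->body, true);
    $dst  = str_replace('incoming', 'processed', $task['src']);
    exec('convert ' . escapeshellarg($task['src']) . ' ' . $task['ops'] . ' ' . escapeshellarg($dst));
    // acknowledge so RabbitMQ removes the task from the queue
    $msg->delivery_info['channel']->basic_ack($msg->delivery_info['delivery_tag']);
};
$channel->basic_consume('image_tasks', '', false, false, false, false, $callback);
while (count($channel->callbacks)) {
    $channel->wait();
}

Because the worker acknowledges only after the convert finishes, a crashed worker's task is simply redelivered to another node.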

Calculate MD5 of 90,000+ files and store to a database

I am working on a script that downloads all of my images, calculates the MD5 hash, and then stores that hash in a new column in the database. I have a script that selects the images from the database and saves them locally. The image's unique id becomes the filename.
My problem is that, while cURLQueue works great for quickly downloading many files, calculating the MD5 hash of each file in a callback slows the downloading down. That was my first attempt. For my next attempt, I would like to separate the downloading and hashing parts of my code. What is the best way to do this? I would prefer to use PHP, as that is what I am most familiar with and what our servers run, but PHP's thread support is lacking to say the least.
Thoughts are to have a parent process that establishes a SQLite connection, then spawn many children that choose an image, calculate the hash of it, store it in the database, and then delete the image. Am I going down the right path?
There are a number of ways to approach this, but which you choose really depends on the particulars of your project.
A simple way would be to download the images with one PHP script, place them on the file system, and add an entry to a queue database. A second PHP program would then read the queue and process the waiting entries.
For the second PHP program, you could set up a cron job to check regularly and process everything that is waiting. Another option is to spawn the PHP program in the background every time a download finishes. The second method is more efficient, but a little more involved. Check out the post below for info on how to run a PHP script in the background.
Is there a way to use shell_exec without waiting for the command to complete?
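The short version of that trick, applied here (Linux/Unix; hash_image.php is a hypothetical worker script that hashes one image and updates the database):

<?php
// Spawn a hash worker in the background and return immediately:
// redirecting output and appending & keeps exec()/shell_exec() from blocking.
function spawnHashWorker($imageId) {
    $cmd = 'php ' . escapeshellarg('/var/www/jobs/hash_image.php')
         . ' ' . escapeshellarg($imageId)
         . ' > /dev/null 2>&1 &';
    exec($cmd);
}

// e.g. in the cURL download callback, after the file has been written to disk:
spawnHashWorker($imageId);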
I've dealt with a similar issue at work, but it needs an AMQP server like RabbitMQ.
Imagine having three PHP scripts:
first: adds the URLs to the queue
second: gets a URL from the queue, downloads the file and adds the downloaded filename to the queue
third: gets a filename from the queue and writes the MD5 hash to the database
We use this approach to handle multiple image downloads/processing with Python scripts (PHP is not that far off).
You can check some PHP libraries here and some basic examples here.
This way you can scale each worker depending on the length of its queue. If you have tons of URLs to download you just start another script #2, if you have lots of unprocessed files you just start another script #3, and so on.
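Just to make the division of work concrete, here is a rough sketch of what script #3 could look like, using a plain SQLite table as the queue instead of AMQP (table and path names are made up; a single consumer is assumed, a real setup would claim rows atomically or use the queue server described above):

<?php
// script3_hash.php (sketch) -- take a downloaded filename off the queue, hash it, store the hash
$db = new PDO('sqlite:/var/data/images.sqlite');

while (true) {
    $row = $db->query("SELECT id, image_id FROM hash_queue LIMIT 1")->fetch(PDO::FETCH_ASSOC);
    if (!$row) { sleep(5); continue; }          // queue is empty, wait a bit

    $path = '/tmp/images/' . $row['image_id'];  // the image's unique id is the filename
    $hash = md5_file($path);                    // streams the file, so large images are fine

    $db->prepare("UPDATE images SET md5 = ? WHERE id = ?")->execute(array($hash, $row['image_id']));
    $db->prepare("DELETE FROM hash_queue WHERE id = ?")->execute(array($row['id']));
    unlink($path);                              // remove the local copy once it has been hashed
}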

Upload > Convert > Show With Flash

I want to convert and show videos that users upload. I have a dedicated server and I use PHP for programming. Where should I start? Thank you.
This is probably the way I would do it:
have a PHP webpage that adds a record in the database to indicate "this file has to be processed" -- this page is the one that receives the uploaded file
and displays a message to the user; something like "your file will be processed soon"
In CLI (as you have a dedicated server, you can use the command line, install programs, ...), have a batch script that processes the newly inserted files:
first, mark a record as "processing"
do the conversion work; ffmpeg would probably be the right tool for that -- I've seen quite a few posts on SO about it, so you should find some information about that part :-)
mark the file as "processed"
And, on some (other?) webpage, you can show the user which state their file is in:
if it has not been processed yet
if it's being processed
or if it's been processed -- you can then give him the link to the new video file -- or do whatever you want/need with it.
Here are a couple of other notes:
The day your application becomes bigger, you can have:
one "web server"
many "processing servers"; in your application, it's the ffmpeg part that will require lots of CPU, not serving web pages, so being able to scale that part is nice (that's another reason to "lock" files by marking them as "processing" in the DB: that way, several processing servers will not try to process the same file)
You only use PHP on the web server to generate web pages, which is the job of a web server
Heavy / long processing is not the job of a web server!
The day you want to switch to something other than PHP for the "processing" part, it will be easier.
Your "processing script" will have to be launched every couple of minutes; you can use cron for that, if you are on a Linux-like machine.
Of course, you could also call ffmpeg directly from the PHP page the file is uploaded to... but, considering this might require quite some CPU time, it might not always be a suitable solution...
... even if it is a bit easier and lets users get their converted video more quickly (they won't have to wait until the cron job has run).
(disclaimer: this answer is adapted from another one I made there)
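As a sketch of the last part, the status page could be as simple as this (assuming the hypothetical `videos` table with the pending / processing / processed flags described above; names and paths are placeholders):

<?php
// status.php (sketch) -- tells the user what state their uploaded video is in
$pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$stmt = $pdo->prepare("SELECT status FROM videos WHERE id = ?");
$stmt->execute(array($_GET['id']));
$status = $stmt->fetchColumn();

switch ($status) {
    case 'pending':
        echo 'Your video is waiting to be converted.';
        break;
    case 'processing':
        echo 'Your video is being converted right now.';
        break;
    case 'processed':
        echo '<a href="/videos/' . (int) $_GET['id'] . '.flv">Watch your video</a>';
        break;
    default:
        echo 'Unknown video.';
}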
It's pretty easy. Once uploaded, you'd want to use exec() to call a video converter - ffmpeg is a popular, free, open-source choice.
In its simplest form:
exec('ffmpeg -i /path/to/video.avi /path/to/flash/video.flv');

Using Javascript to perform a process and send updates/callbacks to a webserver

I am working on a process to allow people to upload PDF files and manage the document (page order) via a web based interface.
The pages of the PDF file need to be cropped to a particular size for printing and currently we run them through a Photoshop action that takes care of this.
What I want to do is upload the PDF files to a dedicated server for performing the desired process (photoshop action, convert, send images back to web server).
What are some good ways to perform these functions while sending updates to the web server, so that process tracking / progress bars can keep the user informed of how long their files are taking to process?
Additionally what are some good techniques for queueing/tracking jobs/processes in general (with an emphasis on web based technologies)?
Derek, I'm sure you have your reasons for using Photoshop, but seriously, did ImageMagick prove insufficient for you? I once worked with a fax utility that converted Fax.g3 files to TIFF, increased contrast and brightness by 15% using ImageMagick, and converted the result back to PDF. IM worked as a standalone Linux program invoked by a system() call, and I know there is now an ImageMagick PECL extension.
Create a queue, and push jobs to it. Have a cron job or daemon running that takes jobs from the queue and processes them. Make sure that you use some sort of locking, so you can safely stop/start the daemon/job.
If you expect the job to finish quickly, you can use a technique known as "comet". Basically, you establish a connection from JavaScript (using XmlHttpRequest) to your server-side script. In this script, you check whether the job has completed. If not, you sleep for a second or two, then check again. You keep doing this until the job finishes, and then you send the response back. The result is that the request takes a while to return, but when it does, the job is done, and you can take appropriate action in JavaScript immediately (reload the page or whatever).
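A bare-bones sketch of the server-side half of that comet / long-polling approach, assuming a hypothetical `jobs` table that the processing server updates when it finishes cropping:

<?php
// poll.php (sketch) -- the script your XmlHttpRequest hits
set_time_limit(0);                 // allow the request to hang around for a while
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$timeout = 60;                     // give up after a minute and let the client re-poll
while ($timeout-- > 0) {
    $stmt = $pdo->prepare("SELECT status FROM jobs WHERE id = ?");
    $stmt->execute(array($_GET['job']));
    if ($stmt->fetchColumn() === 'done') {
        echo json_encode(array('done' => true));
        exit;
    }
    sleep(1);                      // check again in a second, as described above
}
echo json_encode(array('done' => false));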
