I created a video transcoder using ffmpeg. Users upload RAW videos - very big, about 20 GB - via FTP.
Currently, a PHP script is monitoring the local paths every 5 seconds with the strategy below:
Look up the local filesystem.
If a 'new' file appears, add it to the database with its modified time and size.
After 5 seconds, check the modified time and size again:
Not changed: set the status to [DONE] and encode the video into the './output' directory ('output' is explicitly excluded from monitoring).
Changed: wait another 5 seconds.
It works very well, but it burns some CPU power finding 'new' files. Is there any way to get the exact moment when a file upload has completed?
If you can install inotify, then it's super easy via a bash script. Otherwise a bash script may still be more efficient.
Update: PHP supports inotify: php.net/manual/en/book.inotify.php
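For reference, a rough sketch with the PECL inotify extension might look like this (the watch path is a placeholder; IN_CLOSE_WRITE fires when a file opened for writing is closed, which for most FTP servers means the upload is finished, and IN_MOVED_TO covers servers that upload to a temporary name and then rename):

<?php
// Block until the kernel reports activity in the upload directory, instead
// of polling every 5 seconds (this daemon runs forever; stop it with a signal).
$fd = inotify_init();
inotify_add_watch($fd, '/srv/ftp/incoming', IN_CLOSE_WRITE | IN_MOVED_TO);

while (true) {
    $events = inotify_read($fd);            // blocks until something happens
    foreach ($events as $event) {
        $file = '/srv/ftp/incoming/' . $event['name'];
        // mark the file as [DONE] in the database and start the ffmpeg encode here
    }
}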
Try making a Perl daemon that checks for new files; I think it would be less resource-intensive.
Also, another more Unix-like alternative, and I think better overall:
http://en.wikipedia.org/wiki/File_Alteration_Monitor
At the moment I'm syncing some files at night from Server A to Server B with a cron job (PHP CLI) that uses LFTP and writes its log into a special MySQL table. The images I sync are TIFF files.
I'm using ImageMagick to convert these pictures into preview PNGs with some extras (watermarking, resizing, clipping paths and embedding a color profile).
Full commands here.
So what would be the best way to convert more than 100 images (sometimes there are 10, sometimes 250+)?
My script should be safe, so every TIFF file always has its web previews. I'm checking all images to see whether each one has its preview PNGs, and if not, I generate them.
I don't want to overload this post with code, so here is a gist.
The script will be running as a simple cron with php-fcgi.
This script is currently very slow. One ImageMagick command takes about 1-2 seconds, sometimes more than 15 seconds (big file, complex paths), and each shell_exec() blocks the script until the command has finished.
Is there some way to make this more efficient?
Note: I can't install extra software on the server.
If you do not need the images right away then you could put the image conversions in a queue (something like Beanstalk) and let that handle the long and intensive operations.
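If Beanstalk is an option, a very rough sketch with the Pheanstalk client could look like the following (the client's API differs between versions -- this is roughly the 3.x style -- and the tube name, payload fields and paths are made up for the example):

<?php
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$pheanstalk = new Pheanstalk('127.0.0.1');

// Producer (your cron script): only enqueue the TIFFs that are missing previews.
$pheanstalk->useTube('previews')->put(json_encode(['tiff' => '/path/to/image.tif']));

// Worker (a separate long-running CLI script): pull jobs and run ImageMagick.
while ($job = $pheanstalk->watch('previews')->ignore('default')->reserve()) {
    $data = json_decode($job->getData(), true);
    shell_exec('convert ' . escapeshellarg($data['tiff']) . ' -resize 800x800 preview.png');
    $pheanstalk->delete($job);
}

The cron script then returns almost immediately, and you can run several workers in parallel if the backlog grows.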
How can I create a process on the server, like ftp_get(), without waiting for its result before continuing the PHP script?
My issue is that I'm working on a synchronization script and some files are really too huge to download using PHP, since that conflicts with the max execution time.
Is there any way to initiate the download process and leave it running while I process another?
You need threading in PHP.
See http://php.net/manual/en/class.thread.php. If you don't have experience with threading, you should look up some tutorials and examples, and then some more. Once you think you understand them, research them some more.
And maybe a bit more...
Creating a multi-threaded application that is stable is a hard task.
Otherwise you could always increase the max execution time, or set up a cron job that downloads the FTP files in advance (say, 30 minutes earlier) with other Linux utilities.
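If you do go the pthreads route mentioned above, a minimal sketch might look like this (it requires a thread-safe (ZTS) PHP build with the pthreads extension; the host, credentials and paths are placeholders):

<?php
class FtpDownload extends Thread
{
    private $remote;
    private $local;

    public function __construct($remote, $local)
    {
        $this->remote = $remote;
        $this->local  = $local;
    }

    public function run()
    {
        // Each thread opens its own FTP connection and fetches one file.
        $conn = ftp_connect('ftp.example.com');
        ftp_login($conn, 'user', 'pass');
        ftp_pasv($conn, true);
        ftp_get($conn, $this->local, $this->remote, FTP_BINARY);
        ftp_close($conn);
    }
}

$thread = new FtpDownload('/remote/huge-file.zip', '/tmp/huge-file.zip');
$thread->start();   // returns immediately; the download runs in the background thread
// ... the main script can continue with other work here ...
$thread->join();    // wait for the download to finish before exiting, if needed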
I am working on a script that downloads all of my images, calculates the MD5 hash, and then stores that hash in a new column in the database. I have a script that selects the images from the database and saves them locally. The image's unique id becomes the filename.
My problem is that, while cURLQueue works great for quickly downloading many files, calculating the MD5 hash of each file in a callback slows the downloading down. That was my first attempt. For my next attempt, I would like to separate the downloading and hashing parts of my code. What is the best way to do this? I would prefer to use PHP, as that is what I am most familiar with and what our servers run, but PHP's thread support is lacking to say the least.
My thought is to have a parent process that establishes a SQLite connection, then spawns many children that each choose an image, calculate its hash, store it in the database, and then delete the image. Am I going down the right path?
There are a number of ways to approach this, but which you choose really depends on the particulars of your project.
A simple way would be to download the images with one PHP program, then place them on the file system and add an entry to the queue database. A second PHP program would then read the queue and process whatever is waiting.
For the second PHP program, you could set up a cron job to check regularly and process everything that is waiting. Another way would be to spawn the PHP program in the background every time a download finishes. The second method is more efficient, but a little more involved. Check out the post below for info on how to run a PHP script in the background.
Is there a way to use shell_exec without waiting for the command to complete?
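The usual trick from that post, as a sketch (hash_worker.php is a hypothetical name for your second program):

// Redirecting output and appending & makes shell_exec() return immediately
// instead of blocking until the worker has finished.
shell_exec('php /path/to/hash_worker.php > /dev/null 2>&1 &');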
I've handled a similar issue at work, but it needs an AMQP server like RabbitMQ.
Imagine having three PHP scripts:
first: adds the URLs to the queue
second: gets a URL from the queue, downloads the file, and adds the downloaded filename to the queue
third: gets a filename from the queue and stores the MD5 in the database
We use this approach to handle multiple image downloads/processing with Python scripts (PHP is not that different).
You can check some php libraries here and some basic examples here.
This way we can scale each worker depending on its queue length. So if you have tons of URLs to download, you just start another script #2; if you have a lot of unprocessed files, you just start a new script #3; and so on.
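A rough sketch of scripts #1 and #2 with the php-amqplib client (queue names, paths and the $urls array are illustrative; ack() and is_consuming() assume a reasonably recent library version):

<?php
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel    = $connection->channel();
$channel->queue_declare('urls', false, true, false, false);
$channel->queue_declare('files', false, true, false, false);

// Script #1: publish the URLs to be downloaded.
foreach ($urls as $url) {
    $channel->basic_publish(new AMQPMessage($url), '', 'urls');
}

// Script #2: consume URLs, download each file, then publish the local filename
// so script #3 can pick it up and compute the MD5.
$callback = function (AMQPMessage $msg) use ($channel) {
    $url   = $msg->getBody();
    $local = '/tmp/images/' . basename($url);
    file_put_contents($local, file_get_contents($url));
    $channel->basic_publish(new AMQPMessage($local), '', 'files');
    $msg->ack();
};
$channel->basic_consume('urls', '', false, false, false, false, $callback);
while ($channel->is_consuming()) {
    $channel->wait();
}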
I am trying to read large files (let's say Illustrator or Photoshop files) using a cron job on my system.
File sizes vary from 20 MB to 300 MB.
I have been using some functions, but they break in the middle of reading, so I wanted a fresh opinion.
Among these functions:
file_get_contents
readfile
curl
which is the most effective in terms of:
consistency (should not break while reading a file)
speed
resource usage
If there are more than two cron jobs, do they impact overall server performance?
Please share best practice code.
Thanks in advance
Use cURL. The file functions have been deprecated in favor of cURL to open remote files. It's not only faster, but also more reliable [1] (you are less likely to experience timeouts).
If your script times out or runs out of memory anyway, you'll want to increase the execution time and memory limits (max_execution_time and memory_limit).
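As a sketch, streaming a large remote file straight to disk with cURL keeps memory use flat regardless of file size (the URL and destination path are placeholders):

$src  = 'ftp://example.com/artwork/huge-file.psd';
$dest = '/tmp/huge-file.psd';

$fp = fopen($dest, 'wb');
$ch = curl_init($src);
curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body directly to $fp
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);   // fail fast if the server is unreachable
curl_setopt($ch, CURLOPT_TIMEOUT, 0);           // no overall time limit for the transfer
if (curl_exec($ch) === false) {
    error_log('Download failed: ' . curl_error($ch));
}
curl_close($ch);
fclose($fp);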
Other notes:
readfile() reads a file and prints it to the output buffer; it's not the same thing as file_get_contents().
If you compile curl with --with-curlwrappers then when you do file_get_contents() it will use cURL instead of the fopen() functions.
[1] Citation needed.
You need to split the two tasks if the files are that big.
First you download the file with wget, and once you have the file you process it with PHP.
This way you are less likely to run into timeout problems.
If you don't know which file to download because it comes from a PHP variable of some sort, you can write the name of the required file to a file as the first step of your job,
then pass it to wget via --input-file=file as the second step,
and then process it as the third and final step with your PHP program.
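As a rough sketch, those three steps driven from PHP (the paths and the $remoteUrl variable are placeholders, and processFile() stands in for your own processing code):

// Step 1: write the name of the required file for wget to pick up.
file_put_contents('/tmp/to_download.txt', $remoteUrl . PHP_EOL);

// Step 2: let wget fetch it (this could also be its own cron step).
shell_exec('wget --input-file=/tmp/to_download.txt -P /tmp/downloads/');

// Step 3: process the local copy with PHP once it is on disk.
processFile('/tmp/downloads/' . basename($remoteUrl));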
DirectIO is a low-level extension that bypasses PHP's stream layer and works on the file descriptor directly; as a result it is probably the most efficient.
http://php.net/manual/en/ref.dio.php
Note that as of PHP 5.1.0 it is no longer bundled with PHP (it is available via PECL). Also, if your script is breaking in the middle of the operation, check your max_execution_time and memory_limit.
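A minimal sketch with the dio functions (assuming the PECL extension is installed; the path is a placeholder):

// Read the file in fixed-size chunks so a 300 MB file never has to fit
// in memory at once; the O_RDONLY constant comes from the dio extension.
$fd = dio_open('/path/to/artwork.psd', O_RDONLY);
while (($chunk = dio_read($fd, 8192)) !== false && $chunk !== '') {
    // process $chunk here, e.g. feed it to hash_update() on a hash context
}
dio_close($fd);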
I want to convert and show videos that users upload. I have a dedicated server and I use PHP for programming. Where should I start? Thank you.
This is probably the way I would do it :
have a PHP webpage that adds a record in the database to indicate "this file has to be processed" -- this page is the one that receives the uploaded file
and displays a message to the user ; something like "your file will be processed soon"
In CLI (as you have a dedicated server, you can use the command line, install programs, ...), have a batch that processes the newly inserted files (see the sketch after these steps) :
first, mark a record as "processing"
do the conversion things ; ffmpeg would probably be the right tool for that -- I've seen quite a few posts on SO about it, so you might find some information about that part :-)
mark the file as "processed"
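A rough sketch of that batch (the table and column names are made up for the example; adapt them to your own schema):

<?php
$db = new PDO('mysql:host=localhost;dbname=videos', 'user', 'pass');

// Grab one file that has not been processed yet and mark it as "processing".
$row = $db->query("SELECT id, path FROM uploads WHERE status = 'new' LIMIT 1")->fetch();
if ($row) {
    $db->prepare("UPDATE uploads SET status = 'processing' WHERE id = ?")
       ->execute([$row['id']]);

    // Do the conversion with ffmpeg.
    $out = './output/' . $row['id'] . '.flv';
    exec('ffmpeg -i ' . escapeshellarg($row['path']) . ' ' . escapeshellarg($out), $output, $rc);

    // Mark the file as processed (or failed) so the status page can show it.
    $db->prepare("UPDATE uploads SET status = ? WHERE id = ?")
       ->execute([$rc === 0 ? 'processed' : 'failed', $row['id']]);
}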
And, on some (other ?) webpage, you can show the user which state his file is in :
if it has not been processed yet
if it's being processed
or if it's been processed -- you can then give him the link to the new video file -- or do whatever you want/need with it.
Here are a couple of other notes :
The day your application becomes bigger, you can have :
one "web server"
many "processing servers" ; in your application, it's the ffmpeg thing that will require lots of CPU, not serving web pages ; so, being able to scale that part is nice (that's another reason to "lock" files, indicating them as "processing" in DB : that way, you will not have several processing servers trying to process the same file)
You only use PHP from the web server to generate web pages, which is the job of a web server
Heavy / long processing is not the job of a web server !
The day you'll want to switch to something else than PHP for the "processing" part, it'll be easier.
Your "processing script" would have to be launch every couple of minutes ; you can use cron for that, if you are on a Linux-like machine.
Of course, you could also call ffmpeg directly from the PHP page the file is uploaded to... But, considering this might require quite some CPU time, it might not always be a suitable solution...
... Even if it is a bit easier, and allows users to get their converted video quicker (they won't have to wait until the cron job has been executed).
(disclaimer : this answer is adapted from another one I made there)
It's pretty easy. Once uploaded, you'd want to use exec() to call a video converter - ffmpeg is a popular, free, open-source choice.
In its simplest form:
exec('ffmpeg -i /path/to/video.avi /path/to/flash/video.flv');
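If the paths come from user uploads, it's safer to escape them when building the command (a sketch; $inputPath and $outputPath are placeholder variables):

exec('ffmpeg -i ' . escapeshellarg($inputPath) . ' ' . escapeshellarg($outputPath));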