I am trying to read large files (for example, Illustrator or Photoshop files) using a cron job on my system.
File sizes vary from 20 MB to 300 MB.
I have been using some of these functions, but the read breaks off partway through, so I wanted a fresh opinion.
Among these functions:
file_get_contents
readfile
curl
which is the most effective in terms of:
consistency (it should not break while reading the file)
speed
resource usage
Also, if more than two cron jobs are running, does that impact overall server performance?
Please share best practice code.
Thanks in advance
Use cURL. The file functions have been deprecated in favor of cURL for opening remote files. It's not only faster, but also more reliable[1] (you are less likely to experience timeouts).
If your script times out or runs out of memory anyway, you'll want to increase the execution time and memory limits (max_execution_time and memory_limit).
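For example, a minimal sketch of fetching a large remote file with cURL while writing it straight to disk (the URL and destination path here are placeholders, and the timeout values are just reasonable guesses for very large files):

<?php
// Hypothetical source URL and destination path -- adjust for your setup.
$url  = 'https://example.com/artwork.psd';
$dest = '/tmp/artwork.psd';

$fp = fopen($dest, 'wb');                        // write the download straight to disk
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_FILE, $fp);             // stream the body into $fp instead of memory
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);    // give up if the connection never opens
curl_setopt($ch, CURLOPT_TIMEOUT, 0);            // no overall limit, since the files are huge

if (curl_exec($ch) === false) {
    error_log('cURL error: ' . curl_error($ch));
}

curl_close($ch);
fclose($fp);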
Other notes:
readfile() reads a file and prints it to the output buffer; it's not the same thing as file_get_contents().
If you compile cURL with --with-curlwrappers, then file_get_contents() will use cURL instead of the fopen() functions.
[1] Citation needed.
You need to split the job into two tasks if the files are that big.
First download the file with wget; once you have the file locally, process it with PHP.
This way you are less likely to run into timeout problems.
If you don't know which file to download because it comes from a PHP variable of some sort, write the name of the required file to a text file as the first step of your job,
then pass it to wget via --input-file=file as the second step,
and then process it as the third and final step with your PHP program.
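A sketch of that three-step flow driven from a single PHP cron script (the paths are placeholders, process_file() stands in for your own logic, and it assumes wget is on the PATH):

<?php
// Step 1 happened earlier: the required file names/URLs were written to this list.
$listFile    = '/var/jobs/files-to-fetch.txt';
$downloadDir = '/var/jobs/incoming';

// Step 2: let wget do the download (it handles retries and resuming on its own).
$cmd = sprintf(
    'wget --input-file=%s --directory-prefix=%s --tries=3 --continue',
    escapeshellarg($listFile),
    escapeshellarg($downloadDir)
);
exec($cmd, $output, $exitCode);
if ($exitCode !== 0) {
    error_log("wget exited with code $exitCode");
    exit(1);
}

// Step 3: the files are now local, so process them with PHP.
foreach (glob($downloadDir . '/*') as $path) {
    process_file($path);   // placeholder for your own processing routine
}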
Direct IO is a low-level extension that bypasses PHP's stream layer and works on the file descriptor directly, so it is probably the most efficient option.
http://php.net/manual/en/ref.dio.php
Note that as of PHP 5.1.0 it is no longer bundled with PHP. Also, if your script is breaking in the middle of the operation, check your max_execution_time and memory_limit settings.
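If you do try it, usage looks roughly like this (a sketch assuming the PECL dio extension is installed; the path and chunk size are arbitrary):

<?php
// Requires the PECL dio extension (no longer bundled since PHP 5.1.0).
$fd = dio_open('/path/to/large.psd', O_RDONLY);

while (($chunk = dio_read($fd, 8192)) !== false && $chunk !== '') {
    // ... work on $chunk here ...
}

dio_close($fd);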
Related
I am in the early stages of building a PHP application, part of which involves using file_get_contents() to fetch large files from a remote server and transfer them to a user. Let's say, for example, the file being fetched is 200 MB.
Will this process time out if downloading to the server takes too long?
If so, is there a way to extend this timeout?
Can the file that is being downloaded also be transferred to the user simultaneously, or does it have to be saved on the server and then fetched by the user once the download has completed?
I am just trying to make sure I know what my options and limitations are before I go much further.
Thank you for your time.
Yes, you can use set_time_limit(0) and the max_execution_time directive to cancel the time limit imposed by PHP.
You can open the file as a stream and transfer it to the user seamlessly.
Read about fopen()
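For example, a minimal sketch of streaming a file to the client with fopen()/fread() so the whole file never has to sit in memory (the path, content type, and chunk size are placeholders; add whatever validation you need first):

<?php
$path = '/data/downloads/big-archive.zip';   // placeholder path

set_time_limit(0);                           // lift PHP's execution time limit

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
header('Content-Disposition: attachment; filename="' . basename($path) . '"');

$fp = fopen($path, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192);                   // send the file 8 KB at a time
    flush();                                 // push each chunk out to the client
}
fclose($fp);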
Even if you don't hit a timeout, you may well run into memory issues depending on how your PHP is configured. You can adjust a lot of these settings manually through code without much difficulty.
http://php.net/manual/en/function.ini-set.php
ini_set('memory_limit', '256M');
I'm generating about 700 PDFs with dompdf and saving them to disk. Each one involves a database query to fetch the information for the PDF. When I run it, it gets about 2 minutes in and then runs out of memory.
Why am I running out of memory? Shouldn't each PDF generation consume x memory, then release it when finished?
What's good practice in PHP for doing large operations like this to avoid strangling your server?
Some more info:
It generates many PDFs before running out of memory (several hundred).
The PDF it fails on is not abnormally large or special in any way; they're all between 4 KB and ~500 KB.
Watching the memory usage as it processes, it just slowly climbs in fits and starts until it runs out.
I can't get the wysiwyg to code-format properly, so here's a pastie for the snippets in question: http://pastie.org/3800751
Your problem is probably that you are running your code asynchronously. Try running it synchronously; it might take a really long time, but it will work. Also, make sure to dispose of your objects at the end of each loop iteration.
Ideas:
Increase the memory limit with ini_set()
Use a cron job, or generally try to run each PDF generation asynchronously with a queue.
Batch (split) the processing into chunks that can be handled within one page load; redirect to the same page after each batch using header(); keep track of where you are on each script load with the session; and beware of redirect limits in browsers (see the sketch below).
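A rough sketch of that batch-and-redirect idea (the batch size is arbitrary, and fetch_rows() / generate_pdf() are placeholders for your own query and dompdf code):

<?php
session_start();

$batchSize = 25;
$offset    = isset($_SESSION['pdf_offset']) ? $_SESSION['pdf_offset'] : 0;

$rows = fetch_rows($offset, $batchSize);     // placeholder: your DB query with LIMIT/OFFSET

if (empty($rows)) {
    unset($_SESSION['pdf_offset']);
    exit('All PDFs generated.');
}

foreach ($rows as $row) {
    generate_pdf($row);                      // placeholder: dompdf render + save to disk
}

$_SESSION['pdf_offset'] = $offset + $batchSize;

// Redirect to the same script so the next batch starts with fresh memory.
header('Location: ' . $_SERVER['PHP_SELF']);
exit;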
In general, large operations like this should be forked into child processes that each handle generating a single PDF. This should avoid the out of memory situation in case your PDF or DB libraries leak memory.
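If you go the child-process route, a minimal sketch with pcntl_fork() might look like this (requires the pcntl extension and a CLI context; $ids and generate_pdf() are placeholders):

<?php
// Fork one child per PDF so any memory leaked by the PDF/DB libraries
// is released when the child exits.
$ids = range(1, 700);                        // placeholder list of records to render

foreach ($ids as $id) {
    $pid = pcntl_fork();

    if ($pid === -1) {
        die('Could not fork');
    } elseif ($pid === 0) {
        generate_pdf($id);                   // placeholder: render one PDF in the child
        exit(0);                             // child exits, freeing all of its memory
    }

    pcntl_waitpid($pid, $status);            // parent waits before starting the next one
}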
Try changing the memory_limit setting in php.ini.
I created a video transcoder using ffmpeg. Users upload RAW videos (very big, about 20 GB) via FTP.
Currently, a PHP script monitors the local paths every 5 seconds with the strategy below:
Look at the local filesystem.
If a 'new' file appears, add it to the database with its modified time and size.
After 5 seconds, check the modified time and size again:
Not changed: set the status to [DONE] and encode the video into the './output' directory ('output' is explicitly excluded from monitoring).
Changed: wait another 5 seconds.
It works very well, but it burns some CPU power finding 'new' files. Is there any way to get the exact moment when a file upload has completed?
If you can, install inotify; then it's super easy via a bash script. Otherwise a bash script may still be more efficient.
Update: PHP supports inotify: php.net/manual/en/book.inotify.php
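A minimal sketch using the PECL inotify extension (the watch path and enqueue_for_transcoding() are placeholders); IN_CLOSE_WRITE fires when a file that was open for writing is closed, which for an FTP upload normally means the transfer has finished:

<?php
$inotify = inotify_init();
$watch   = inotify_add_watch($inotify, '/var/ftp/uploads', IN_CLOSE_WRITE);

while (true) {
    $events = inotify_read($inotify);        // blocks until something happens -- no polling
    foreach ($events as $event) {
        if ($event['mask'] & IN_CLOSE_WRITE) {
            // The file was closed after writing, so the upload should be complete.
            enqueue_for_transcoding('/var/ftp/uploads/' . $event['name']);  // your own handler
        }
    }
}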
Try making a Perl daemon that checks for new files; I think it would be less resource intensive.
Also, another more Unix-like alternative, which I think is better overall:
http://en.wikipedia.org/wiki/File_Alteration_Monitor
Is it going to be a strain on my server if I use PHP scripts to download a very large file that takes around 4 minutes to download via PHP? Currently I'm executing the script in the browser, but when I switch to my Linux server it will be executed via the shell. Right now I have put this in my script:
ini_set('max_execution_time', 5000);
Are there any negative factors to using ini_set() I should be aware of when a PHP script takes quite a bit of time to execute because it is downloading a large .ZIP file? Should I be worried about memory leaks?
Instead of sending the file in one large chunk, why not split it up into lots of little chunks? The benefit is that it stops PHP from timing out and lets you download larger files.
Take a look at this tutorial for more info: http://teddy.fr/blog/how-serve-big-files-through-php
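A common version of that pattern (not necessarily the exact code from the tutorial; the path, content type, and 1 MB chunk size are arbitrary choices):

<?php
// Serve a large file in fixed-size chunks instead of one readfile() call.
function serve_file_chunked($path, $chunkSize = 1048576)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    while (!feof($fp)) {
        echo fread($fp, $chunkSize);
        flush();                             // hand each chunk to the client before reading the next
    }
    return fclose($fp);
}

header('Content-Type: application/zip');
header('Content-Length: ' . filesize('/data/backup.zip'));
serve_file_chunked('/data/backup.zip');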
I have a question about how PHP handles filesystem operations. I'm running code that depends on a file being created before it gets used, and when I run it it feels like a race condition: sometimes it works (the file is created and the PHP code uses it), and sometimes it fails.
So I was wondering how PHP handles filesystem operations: does it send them off in the background, or does it wait until the operation completes?
file_put_contents is equivalent to fopen, fwrite, fclose. fclose should ensure the file is fully flushed to disk.
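In other words, roughly this (ignoring flags and error handling):

<?php
// Roughly what file_put_contents($file, $data) does internally.
$fp = fopen($file, 'wb');
fwrite($fp, $data);
fclose($fp);   // flushes PHP's buffers; the OS may still cache the write briefly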
Yes, unless you open a file handle and then set it to non-blocking mode: stream_set_blocking()
Year 2013: on my common garden-variety Linux VPS with cPanel, with default settings, running PHP 5.2.17, file_put_contents always takes ~5 ms for short string lengths.
Incidentally, 5 ms is about the full committed write time of a high-quality HDD.
file_put_contents($filename,'abcdefghi...~100chars',FILE_APPEND);
This takes ~5 ms consistently, which seems to include the 'blocking' and 'flushing'. So for those wondering about the speed of file_put_contents: at least 5 ms per operation on common servers as of April 2013.
If you need speed, for example for some logging, @Matthew Flaschen said:
file_put_contents is equivalent to fopen, fwrite, fclose.
fclose should ensure the file is fully flushed to disk.
Then one would need something like:
function file_put_contents_fast() {...no fclose...}
But it will take some research to find out what happens if file handles are left open. PHP closes them at exit, but does it really do so every time? Even if it crashes? What happens if a file is left open by PHP after a crash? And so on. After 30 minutes of reading the PHP manual and googling, I found no mention of this or its consequences.
PHP should wait until the operation is completed, but without knowing how you are implementing the operations it is hard to say. If you can post example code of what you are doing, that would be helpful so we can help you figure out why it is not working properly.