Output of php is limited - php

I have a problem here, maybe someone has already gone through this before.
A controller in the system serves downloads through PHP: it reads data from files and sends it to the client as a download. The system works perfectly, but the speed is always low: less than 300 kB/s, and at times less than 100 kB/s for the user.
The server has a 100 Mbps link and the customer has 6 Mbps free, so the download should reach about 600 kB/s. Something is throttling the output of PHP. I've searched around Apache's buffers but found nothing on this issue.
Does anyone have any idea what might be happening?

PHP really isn't built for serving large files: the script has to read the file and push every byte out itself. It sounds like you're pushing a fair amount of traffic through PHP, probably via something like readfile(), and that is a bad idea if 100 kB/s - 300 kB/s per user is already too slow. Instead, I suggest taking a look at mod_xsendfile (if you're using Apache) or its equivalent for your web server of choice (I prefer nginx, for example, which offers the same idea via X-Accel-Redirect).
In PHP, then, you can just do this: header('X-Sendfile: ' . $file);. The server intercepts the header and sends the file itself. You get the benefits of going through PHP together with the speed of the web server reading the file directly.
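For example, a minimal sketch of that approach (the path and download name are placeholders, and it assumes mod_xsendfile is installed and allowed for that directory):
<?php
// A hypothetical controller: PHP decides whether to allow the download,
// then lets the web server stream the actual bytes.
$file = '/var/files/private/report.pdf'; // placeholder path outside the web root

// ... authentication / logging / lookups go here ...

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="report.pdf"');
header('X-Sendfile: ' . $file); // picked up by mod_xsendfile, which serves the file itself
exit;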


Handling Very Large Uploads [duplicate]

I want to allow uploads of very large files into our PHP application (hundreds of megabytes up to 8 gigabytes). There are a couple of problems with this, however.
Browser:
HTML uploads give crappy feedback; we need to either poll for progress (which is a bit silly) or show no feedback at all
Flash uploader puts entire file into memory before starting the upload
Server:
PHP forces us to set post_max_size, which could result in an easily exploitable DoS attack. I'd prefer not to set this globally.
The server also requires some other variables to be present in the POST vars, such as a secret key. We'd like to be able to refuse the request right away, instead of only after the entire file has been uploaded.
Requirements:
HTTP is a must.
I'm flexible with client-side technology, as long as it works in a browser.
PHP is not a requirement, if there's some other technology that will work well on a linux environment, that's perfectly cool.
upload_max_filesize can be set on a per-directory basis; the same goes for post_max_size
e.g.:
<Directory /uploadpath/>
php_value upload_max_filesize 10G
php_value post_max_size 10G
</Directory>
Python Handler?
Use a Python POST handler instead of PHP: generate a unique identifier from your PHP app that the client can put in the HTTP headers, and use mod_python to reject or accept the large upload before the entire POST body is transmitted.
I think http://www.modpython.org/live/current/doc-html/dir-handlers-hph.html allows you to check the headers and decline the rest of the POST input. I haven't tried it, but it might be the right path?
Looking at the source of mod_python, the buffering of the input via read() seems to allow bit-at-a-time evaluation of the HTTP input. Headers are first.
https://svn.apache.org/repos/asf/quetzalcoatl/mod_python/trunk/src/filterobject.c
It's old, I know, but maybe someone has this problem nowadays, too.
Now you can do this with only JavaScript and, say, PHP. No Flash or Java is required on the client side.
demo: http://dnduploader.filkor.org/
The idea is to slice the files with JavaScript's Blob slice() method...
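On the server side, the PHP part of such a chunked upload can stay small; a rough sketch, assuming the client posts each slice as a regular file upload together with a chunk index and a target name (all parameter names here are invented):
<?php
// Hypothetical receiver for Blob-slice chunks.
$name  = isset($_POST['name'])  ? basename($_POST['name']) : 'upload.bin'; // sanitised target name
$index = isset($_POST['chunk']) ? (int) $_POST['chunk'] : 0;               // which slice this is
$dest  = '/tmp/uploads/' . $name;                                          // placeholder directory

// Truncate on the first chunk, append on every later one.
$out = fopen($dest, $index === 0 ? 'wb' : 'ab');
$in  = fopen($_FILES['slice']['tmp_name'], 'rb');
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);
echo 'ok';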
How about a Java applet? That's how we had to do it at a company I previously worked for. I know applets suck, especially in this day and age with all our options available, but they really are the most versatile solution to desktop-like problems encountered in web development. Just something to consider.
You can set post_max_size for scripts in just one directory. Place your upload script there and allow only that script to handle large sizes. It's still possible for that script to be attacked with large/useless files, but it avoids raising the limit globally.
Use that together with APC and you might be able to work out something good:
IBM developerWorks article on APC
Tried all of this... this is by far the best I have used yet...
http://www.uploadify.com/
Take a look at jumploader.com
A good Java applet for uploading.
I've used it for uploading images and it works fine. I haven't tried files bigger than 10 MB, but it should work for really big files too.
Have you looked into using APC to check the progress and total file size? Here is a good blog post about it. It might help.
Maybe you could use WebDAV and JavaScript in the browser
AJAX Big file upload, with progress, to WebDAV
http://www.webdavsystem.com/ajax/programming/upload_progress
A simple library
http://debris.demon.nl/projects/davclient.js/doc/README.html
You can then have the JS redirect the user to a success page. Secret keys and what-not can be handled in a PHP prelude before handing off to the JS client -> WebDAV.
I would look into FTP, SSH, or SCP; these let you upload a large file and still keep access control over it. This might take a little longer to implement, but it's probably the most secure way I can think of.
I know it sucks to add another dependency, but in my experience most websites doing something like this use Flash on the client side and upload the large file in chunks
Adobe has a how-to on Flash file uploads
I also found this tutorial on codeproject:
Multiple File Upload With Progress Bar Using Flash and ASP.NET
PS - I know you're using PHP and not .NET; I figured the important part was the Flash ;)
I've had success with Uploadify, and I would recommend it. It's a jQuery/Flash script that handles large uploads, and you can pass extra parameters to it (like the secret key). To solve the server-side issues, use the following code. The changes take effect just for the script they're called in:
//Check that the key is there and valid before doing anything else.
if (!isset($_POST['secret_key']) || !isValid($_POST['secret_key'])) {
    exit("Invalid request");
}

function isValid($key)
{
    //Put your validation code here and return true or false.
}

//This line changes the timeout.
//Give it a value in seconds (3600 = 1 hour).
set_time_limit(3600);

//Note: post_max_size and upload_max_filesize are per-directory settings and
//cannot be changed with ini_set() at runtime; set them to whatever you need
//(e.g. 8192M) in php.ini or a per-directory php_value block as shown above.

//Generally speaking, the memory_limit should be higher than your post size,
//and it can be raised at runtime:
ini_set("memory_limit", "8200M");
EDIT In response to your comment:
Given what you've said, I'm afraid you may not be able to meet your requirements over HTTP. All of the solutions out there are code that adds features to HTTP that it was never designed for.
Like you said yourself, it's a simple protocol. Short of writing your own client software that runs outside the browser, using a Java applet, or using a different protocol (like FTP, which was designed for this), you might not get what you want.
I've done the best I could within the given constraints. Sorry I couldn't do better.
Try this: http://www.simple2ftp.com uses a Java based FTP applet from within a clever PHP application wrapper.

Is downloading file through PHP or through direct link faster?

I need to let a user download a file (for example, a PDF). Which will take longer:
sending the file via PHP (with the appropriate headers),
or putting it in a public HTTP folder and giving the user the public link to download it (without PHP's help)?
In the first case the original file can stay in a private zone.
But I suspect it will take some time for PHP to send the file.
So how can I measure the time PHP spends sending the file, and how much memory it consumes?
P.S. In the first case, when PHP sends the headers and the browser (if a PDF plugin is installed) tries to open the file inside the browser, is PHP still working, or does it push out the whole file immediately after the headers are sent? And if the plugin is not installed and the browser shows a "save as" dialog, is PHP still working?
There will be very little in it if you are worried about download speeds.
I guess it comes down to how big your files are, how many downloads you expect, whether your documents should be publicly accessible, and the download speed of the client.
Your main issue with PHP is the memory it consumes: each download ties up a PHP process, which might use 8 MB - 20 MB depending on what your script does, whether you use a framework, etc.
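To answer the measurement part of the question: you can time the send and check peak memory from inside the script itself; a minimal sketch (the path is a placeholder):
<?php
$file  = '/path/to/document.pdf'; // hypothetical file
$start = microtime(true);

header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($file));
readfile($file); // streams the file to the client; memory stays low because it is not loaded whole

// Log how long the send took and the peak memory the script used.
error_log(sprintf('sent %s in %.2f s, peak memory %d bytes',
    $file, microtime(true) - $start, memory_get_peak_usage(true)));
For large files the elapsed time will largely reflect the client's download speed, since output buffers fill up and readfile() then has to wait for the client.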
Out of interest, I wrote a symfony application to offer downloads, and to do things like concurrency limiting, bandwidth limiting etc. It's here if you're interested in taking a look at the code. (I've not licensed it per se, but I'm happy to make it GPL3 if you like).

Redirect body of HTTP response to file

I am programming a service that has to force the download of a file.
I know I can set the HTTP headers in PHP and then send the file with the readfile() function. But I don't think this is a good way to send larger files, because it would put a lot of load on the server and the maximum execution time of the PHP script would be exceeded.
So is it possible to send the HTTP headers using PHP (I have to modify them depending on entries in a MySQL database) and then let Apache send the file body?
I should add that I could also use Perl scripts, but I don't see a way to do this in a CGI script either.
Thanks.
You can do this strictly with Apache if the location and/or file type of the download is known ahead of time:
<Location /downloads>
SetEnvIf Request_URI "([^/]+\.attachment-extension)$" FILENAME=$1
Header set "Content-Disposition" "attachment; filename=%{FILENAME}e"
</Location>
because it would put a lot of load on the server
I don't believe this is the case. As long as sending the file to the client is the last thing the script does before it terminates, the difference in CPU/RAM usage between sending the file from PHP and letting Apache handle it directly should be minimal, if there is one at all.
Unless your server has an enormous amount of bandwidth (many Gbps) and an incredibly fast disk setup, you would run into a bandwidth problem long before you ran into a server resource problem.
Admittedly this discussion is based largely on conjecture (since I know nothing about your hosting setup), so YMMV.
the maximum execution time of the PHP script would be exceeded
So just call set_time_limit(0);. That's what it's for.
I found this link: http://jasny.net/articles/how-i-php-x-sendfile/. X-Sendfile is a module that seems to allow exactly what I want to do: it hands the body of the HTTP response over to Apache and lets Apache handle everything. I just installed it, tried it, and it works fine.
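For reference, a rough sketch of what that ends up looking like when the headers depend on a database row (the table, columns, and credentials here are invented):
<?php
// Look up the file belonging to the requested download id.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder credentials
$stmt = $pdo->prepare('SELECT path, download_name FROM downloads WHERE id = ?');
$stmt->execute(array($_GET['id']));
$row  = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$row) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

// PHP only decides what to send; mod_xsendfile / Apache does the actual transfer.
header('Content-Disposition: attachment; filename="' . $row['download_name'] . '"');
header('X-Sendfile: ' . $row['path']);
exit;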

Read files via php

You all know about the restrictions that exist in a shared environment, so with that in mind, please suggest a PHP function or something with which I could stream my videos and other files. I have a lot of videos on the server, unlimited bandwidth and disk space, but I am limited in RAM and CPU.
Don't use PHP to stream the data. Use a header redirect to point to the URL of the actual file. This offloads the work onto the web server, which might run under a different user id and is better optimized for this task.
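A minimal sketch of that redirect approach (the URL is a placeholder); note that it only works if the file is publicly reachable:
<?php
// PHP only issues the redirect; the web server (or a CDN) serves the file.
$url = 'http://static.example.com/videos/clip.mp4'; // hypothetical public URL
header('Location: ' . $url);
exit;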
Hmm, there is XMoov, which acts as a "streaming server" but does not do much more than serve a file byte by byte, with a few additional options and settings. It promises random access (i.e. arbitrary seeking within a video), but I haven't used it myself yet.
As a server administrator, though, I would frown on anybody using PHP to serve huge files like that because of the strain it puts on the server. I would generally not regard this as a good idea, and I would rent a streaming server instead if at all possible. Use at your own risk.
You can use a while loop that reads a chunk of the file, outputs it, sleeps for some time, then outputs some more and sleeps again (that is about the only way to limit the CPU usage); see the sketch below.
RAM shouldn't be a problem, as you only pass through small parts of the file at a time, so you don't need to load the whole thing into RAM.
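A rough sketch of that throttled loop (chunk size and sleep interval are arbitrary values to tune):
<?php
$file  = '/path/to/video.mp4'; // hypothetical file
$chunk = 256 * 1024;           // 256 KB per iteration
$fp    = fopen($file, 'rb');

header('Content-Type: video/mp4');
header('Content-Length: ' . filesize($file));

while (!feof($fp)) {
    echo fread($fp, $chunk); // send one chunk
    flush();                 // push it to the client (add ob_flush() if output buffering is on)
    usleep(100000);          // pause 0.1 s to keep CPU and bandwidth usage down
}
fclose($fp);
Only one chunk is in memory at a time, which is why RAM stays low.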

Sending large files via HTTP

I have a PHP client that requests an XML file over HTTP (i.e. loads an XML file via URL). As of now, the XML file is only several KB in size. A problem I can foresee is that the XML becomes several MBs or GBs in size. I know that this is a huge question and that there are probably a myriad of solutions, but what ideas do you have to transport this data to the client?
Thanks!
Based on your use case I'd definitely suggest zipping up the data first. In addition, you may want to MD5-hash the file and compare the hash before initiating the download (no need to update if the file has no changes); this will help with point #2. A rough sketch of that check follows below.
Also, would it be possible to just send the segment of XML that has changed instead of the whole file?
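The hash check mentioned above could be as simple as this sketch; the URLs and paths are placeholders, and it assumes the server also publishes the hash so the client can fetch it cheaply:
<?php
// Client side of the "only download if changed" idea.
$localFile  = '/data/feed.xml.gz'; // hypothetical local copy
$remoteHash = trim(file_get_contents('http://example.com/feed.xml.gz.md5')); // hypothetical hash URL

if (!file_exists($localFile) || md5_file($localFile) !== $remoteHash) {
    // Only transfer the (zipped) file when the hash differs.
    file_put_contents($localFile, file_get_contents('http://example.com/feed.xml.gz'));
}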
Ignoring how well a browser may or may not handle a GB-sized XML file, the only real concern I can think of off the top of my head is whether the execution time to generate all the XML exceeds any of the execution time thresholds set in your environment:
PHP's max_execution_time setting
PHP's set_time_limit() function
Apache's TimeOut Directive
Given that the XML is created dynamically by your PHP, the simplest thing I can think of is to ensure that the file is gzipped automatically by the web server, as described here; that article offers a general PHP approach and an Apache httpd-specific solution.
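The general PHP approach amounts to compressing the output buffer; a minimal sketch (it assumes $xml already holds the generated document):
<?php
// Compress the response transparently when the client sends Accept-Encoding: gzip.
ob_start('ob_gzhandler');
header('Content-Type: application/xml');
echo $xml; // the dynamically generated XML, assumed to exist already
ob_end_flush();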
Besides that, having a browser (what else can a PHP client be?) do such a job every night for some data synchronizing sounds like there must be a far simpler solution somewhere else.
And, of course, at some point, transferring "a lot" of data is going to take "a lot" of time...
The problem is that he's syncing up two datasets, and the question as posed misstates that.
You need to either a) keep a differential log of changes to dataset A so that you can send that log to dataset B, or b) keep two copies of the dataset (last night's and the current one) and compare them so you can then generate and send the differential log from A to B.
Welcome to the world of replication.
The problem with (a) is that it's potentially invasive to all of your code, though if you're using an RDBMS you could perhaps do some logging via database triggers to keep track of inserts/updates/deletes, write that information into a table, and then export the relevant rows as your differential log. But that can be nasty too.
The problem with (b) is the whole "comparing the database" all at once. Fine for 100 rows. Bad for 10^9 rows. Nasty nasty.
In fact, it can all be nasty. Replication is nasty.
A better plan is to look into a "real" replication system designed for the particular databases that you're running (assuming you're running a database). Something that perhaps sends database log records over for synchronization rather than trying to roll your own.
Most of the modern DBMS systems have replication systems.
Gallery2, which allows you to upload photos over http, makes you set up a couple of php parameters, post_max_size and upload_max_filesize, to allow larger uploads. You might want to look into that.
It seems to me that posting large files has problems with browser time-outs and the like, but on the plus side it works with proxy servers and firewalls better than trying a different file upload protocol.
Thanks for the responses. I failed to mention that transferring the file should be relatively fast (a few minutes max; is this even possible?). The XML that is requested will be parsed and inserted into a database every night. The XML may be the same as the night before, or it may be different. So there are basically two requirements: 1. it has to be relatively fast, 2. it should minimize the number of writes to the database.
One solution that was proposed is to zip the XML file and then transfer it, but that only satisfies (1).
Any other ideas?
Are there any algorithms that I could apply to compress the XML? How are large files such as MP3s downloaded in a matter of seconds?
PHP receiving GBs of data will take a long time and adds overhead.
It is also more susceptible to failures.
I would dispatch the job to a shell script (wget with simple error catching) that is not bothered by execution time and on failure could perhaps even retry on its own.
I'm not experienced with this, but although one could use exec() or the like, these sadly run synchronously (blocking).
Calling a script with **./test.sh &** makes it run in the background and solves that problem, I guess; see the sketch below. The script could easily let your PHP pick the work back up via a wget of `http://yoursite.com/continue-xml-stuff.php?id=1049381023&status=0`. The id could be a filename, if you don't need to backtrack lost requests. The status would indicate how the script ended up handling the request.
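A rough sketch of dispatching such a script without blocking the PHP request (the script path and job id are placeholders):
<?php
// Fire-and-forget: redirect output and background the shell script so exec() returns immediately.
$id = 1049381023; // hypothetical job id, e.g. a filename
exec(sprintf('/path/to/test.sh %d > /dev/null 2>&1 &', $id));
echo 'transfer job started';
The script would then call continue-xml-stuff.php with the id and a status once its wget finishes, as described above.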
Have you thought about using some sort of version control system to handle this? You could leverage its ability to calculate and send just the differences in the files, plus you get the added benefits of maintaining a version history of your file.
Since I don't know the details of your situation, I'll throw a question out there: just for the sake of argument, does it have to be HTTP? FTP is much better suited for large data transfers and can be automated easily via PHP or Perl.
If you are using Apache, you might also consider Apache mod_gzip. This should allow you to compress the file automatically and the decompression should also happen automatically, as long as both sides accept gzip compression.
