We have files that are hosted on RapidShare which we would like to serve through our own website. Basically, when a user requests http://site.com/download.php?file=whatever.txt, the script should stream the file from RapidShare to the user.
The only thing I'm having trouble getting my head around is how to properly stream it. I'd like to use cURL, but I'm not sure if I can read the download from RapidShare in chunks and then echo them to the user. The best way I've thought of so far is a combination of fopen and fread: echo each chunk of the file to the user, flush, and repeat that process until the entire file has been transferred.
I'm aware of PHP's readfile() function as well, but would that be the best option? Bear in mind that these files can be several GB in size, and although we have servers with 16GB of RAM, I want to keep memory usage as low as possible.
Thank you for any advice.
HTTP has a header called "Range" which basically lets you fetch any chunk of a file (provided you already know the file size), but since PHP isn't multi-thread aware, I don't see much benefit in using it here.
Afaik, if you don't want to consume all your RAM, the only way to go is a two-step approach.
First, stream the remote file using fopen()/fread() (or any PHP functions that work on streams), reading it in small chunks (2048 bytes may be enough) and writing/appending each chunk to a tempfile(); then echo it back to your user by reading the temporary file.
That way, even a 2 TB file would only consume about 2048 bytes of memory at a time, since only the current chunk and the file handle are in memory.
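For illustration, here is a minimal sketch of that idea, echoing each chunk straight through rather than going via a temp file first (the remote URL, filename and chunk size are placeholder assumptions; allow_url_fopen must be enabled):

<?php
// Minimal sketch: stream a remote file to the client chunk by chunk.
// The remote URL, filename and chunk size are assumptions, not real values.
$remote    = 'http://rapidshare.example/files/whatever.txt';
$chunkSize = 8192; // bytes held in memory at any one time

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="whatever.txt"');

$in = fopen($remote, 'rb'); // requires allow_url_fopen
if ($in === false) {
    header('HTTP/1.1 502 Bad Gateway');
    exit('Unable to open remote file');
}

while (!feof($in)) {
    echo fread($in, $chunkSize); // send this chunk to the browser
    if (ob_get_level() > 0) {
        ob_flush();              // flush PHP's output buffer if one is active
    }
    flush();                     // push the data out instead of buffering the whole file
}
fclose($in);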
You could also write some kind of proxy manager that caches already-downloaded files locally (for a given time), so that heavily requested files don't have to be fetched from the remote host every time.
I'm trying to convert a website to use S3 storage instead of local (expensive) disk storage. I solved the download problem using a stream wrapper interface on the S3Client. The upload problem is harder.
It seems to me that when I post to a PHP endpoint, the $_FILES object is already populated and copied to /tmp/ before I can even intercept it!
On top of that, the S3Client->upload() expects a file on the disk already!
Seems like a double-whammy against what I'm trying to do, and most advice I've found uses NodeJS or Java streaming so I don't know how to translate.
It would be better if I could intercept the code that populates $_FILES and then send up 5MB chunks from memory with the S3\ObjectUploader, but how do you crack open the PHP multipart handler?
Thoughts?
EDIT: It is a very low quantity of files, 0-20 per day, mostly 1-5MB, sometimes hitting 40-70MB. Periodically (once every few weeks) a 1-2GB file will be uploaded. Hence the desire to move off an EC2 instance and onto a Heroku/Beanstalk-type PaaS where I won't have much /tmp/ space.
It's hard to comment on your specific situation without knowing the performance requirements of the application and the volume of users that need to access it, so I'll answer assuming a basic web app uploading profile avatars.
There are good reasons for this: the file is streamed to disk for several purposes, one of which is to conserve memory. If your file is not on disk, then it is in memory (think disk usage is expensive? bump up your memory usage and see how expensive that gets), which is fine for a single user uploading a small file, but not so great for a bunch of users uploading small files, or worse, large files. You'll likely see the best performance if you use the defaults in these libraries and let them stream to and from the disk.
But again I don't know your use case and you may actually need to avoid the disk at all costs for some unknown reason.
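For what it's worth, a rough sketch of that default path with the AWS SDK for PHP: let PHP write the upload to its temp file as usual, then hand S3 a stream over that file rather than the whole blob in memory (the bucket, region and 'userfile' field name below are assumptions):

<?php
// Rough sketch only: stream the already-on-disk upload to S3.
// Bucket, region and the 'userfile' field name are assumptions.
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\ObjectUploader;

$s3 = new S3Client([
    'region'  => 'us-east-1',
    'version' => 'latest',
]);

// PHP has already written the upload to a temp file; open it as a stream
// instead of reading it into memory.
$source = fopen($_FILES['userfile']['tmp_name'], 'rb');
$key    = basename($_FILES['userfile']['name']);

$uploader = new ObjectUploader($s3, 'my-bucket', $key, $source);
$result   = $uploader->upload(); // switches to multipart upload for large files

// $result holds the S3 response; PHP cleans up the temp file when the request ends.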
I have PHP code for a file download for a specific user.
I'm storing the content of the file in a database (using blob type).
<?php
//do stuff to validate the user
//do stuff to get the content from the database, e.g.:
//$r = mysql_fetch_object($result); // $result from mysql_query()
header("Content-Type: $r->type");
header("Content-Disposition: attachment; filename=\"$r->name\"");
echo $r->content;
?>
For large files, the download takes a long time.
How can I improve the code?
Does the download speed increase with multiple connections?
Assuming there are no artificial limits placed on the connection, an HTTP transfer will take up as much of the network pipe as it can.
Once the connection starts getting throttled (e.g. on a file download site like Rapidshare, 'free' users get limited bandwidth), using parallel connections MAY increase speed. For example, if a single stream is limited to 50k/s, opening two streams would give an effective 100k/s.
But then you're going to have to support ranged downloads. Your script as it stands sends out the entire file, from beginning to end, so with two connections the user would simply download the whole file twice.
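For illustration, a very stripped-down sketch of honouring a Range request for the blob above (validation and suffix ranges are omitted; $r is the database row from the question):

<?php
// Stripped-down sketch of Range support for the blob download above.
// $r->content, $r->type and $r->name come from the database row, as in the question.
$data  = $r->content;
$size  = strlen($data);
$start = 0;
$end   = $size - 1;

if (isset($_SERVER['HTTP_RANGE']) &&
    preg_match('/bytes=(\d+)-(\d*)/', $_SERVER['HTTP_RANGE'], $m)) {
    $start = (int) $m[1];
    if ($m[2] !== '') {
        $end = min((int) $m[2], $size - 1);
    }
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes $start-$end/$size");
}

header('Accept-Ranges: bytes');
header("Content-Type: $r->type");
header("Content-Disposition: attachment; filename=\"$r->name\"");
header('Content-Length: ' . ($end - $start + 1));

echo substr($data, $start, $end - $start + 1);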
There's probably not that much you can do to speed up this specific process.
Server and client bandwidth are hard limits. Streaming the file through PHP will cause some additional overhead, but seeing as the data comes from a database, there is no straightforward way to improve that, either.
Moving to a faster server with more bandwidth may help things, but then also it might not. If the client's connection is slow, there is nothing you can do.
I created a simple web interface to allow various users to upload files. I set the upload limit to 100MB, but now it turns out that the client occasionally wants to upload files of 500MB+.
I know how to alter the PHP configuration to change the upload limit, but I was wondering if there are any serious disadvantages to uploading files of this size via PHP?
Obviously FTP would be preferable, but if possible I'd rather not have two different methods of uploading files.
Thanks
Firstly FTP is never preferable. To anything.
I assume you mean that you're transferring the files via HTTP. While not quite as bad as FTP, it's not a good idea if you can find another way of solving the problem. HTTP (and hence its component programs) is optimized around transferring relatively small files around the internet.
While the protocol supports server-to-client range requests, it does not allow for the reverse operation. Even if the software at either end were unaffected by the volume, the more data you push across, the longer the window during which you could lose the connection. And the biggest problem is the caveat in the first sentence: since ranges don't work for uploads, a dropped connection means starting the transfer over.
Regardless of the server technology you use (PHP or something else), it's never a good idea to push such a big file in one sweep in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage: file uploading consumes web server memory. You don't normally want that.
PHP will handle as many files, and as large ones, as you allow it to. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. Also, without extra work with APC, or using something like Uploadify, there's no progress report, and users are left staring at a browser showing no visible sign of actual work except the throbber chugging away.
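As a hedged illustration of the APC route: it needs apc.rfc1867 = 1 in php.ini, and the upload form must include a hidden APC_UPLOAD_PROGRESS field whose value is the key polled below (newer PHP versions offer session.upload_progress, but the idea is the same; the field and parameter names here are assumptions):

<?php
// progress.php - sketch of polling APC's upload progress data.
// Requires apc.rfc1867 = 1; the 'key' parameter must match the hidden
// APC_UPLOAD_PROGRESS field that was submitted with the upload form.
$key    = $_GET['key'];
$status = apc_fetch('upload_' . $key);

header('Content-Type: application/json');
if ($status === false) {
    // No data yet (upload not started, or APC progress not enabled)
    echo json_encode(array('done' => false, 'percent' => 0));
} else {
    $percent = $status['total'] > 0
        ? (int) round($status['current'] / $status['total'] * 100)
        : 0;
    echo json_encode(array('done' => !empty($status['done']), 'percent' => $percent));
}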
You all know about the restrictions that exist in a shared hosting environment, so with that in mind, please suggest a PHP function or something similar with which I could stream my videos and other files. I have a lot of videos on the server and unlimited bandwidth and disk space, but I am limited in RAM and CPU.
Don't use PHP to stream the data. Use a header redirect to point to the URL of the actual file. This offloads the work onto the web server, which may run under a different user ID and is better optimized for this task.
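Something as small as this, assuming the file lives at a URL the web server (or remote host) can serve directly (the base URL and the crude filename check are placeholders):

<?php
// Sketch of the redirect approach: hand the request straight to the web server.
// The base URL and the way the filename is validated are assumptions.
$file    = basename($_GET['file']);                // crude sanitisation only
$fileUrl = 'http://media.example.com/videos/' . rawurlencode($file);

header('Location: ' . $fileUrl, true, 302);
exit;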
Hmm, there is XMoov, which acts as a "streaming server" but does little more than serve a file byte by byte, with a few additional options and settings. It promises random access (i.e. arbitrary seeking within a video), but I haven't used it myself yet.
As a server administrator, though, I would frown on anybody using PHP to serve huge files like that because of the strain it puts on the server. I would generally not regard this as a good idea, and would rent a streaming server instead if at all possible. Use at your own risk.
You can use a while loop to load bits of the file, and then sleep for some time, and then output more, and sleep... (that would be the only way to limit the CPU usage).
RAM shouldn't be a problem, as you only ever hold a small part of the file at a time, so you never need to load the whole thing into memory.
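Something along these lines, with the numbers being rough assumptions (64 KB every 0.1 s works out to roughly 640 KB/s per client):

<?php
// Sketch of the read-send-sleep loop described above.
// Path, chunk size and pause are assumptions; tune them to taste.
$path      = '/var/media/video.mp4';
$chunkSize = 64 * 1024;   // 64 KB per iteration stays tiny in RAM
$pauseUs   = 100000;      // 0.1 s pause keeps CPU usage low

header('Content-Type: video/mp4');
header('Content-Length: ' . filesize($path));

$fh = fopen($path, 'rb');
while (!feof($fh)) {
    echo fread($fh, $chunkSize);
    flush();              // send the chunk now rather than buffering everything
    usleep($pauseUs);     // throttle: this is what limits CPU (and bandwidth)
}
fclose($fh);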
I need to upload potentially big (as in, tens to hundreds of megabytes) files from a desktop application to a server. The server code is written in PHP, the desktop application in C++/MFC. I want to be able to resume file uploads when the upload fails halfway through, because this software will be used over unreliable connections. What are my options?
I've found a number of HTTP upload components for C++, such as http://www.chilkatsoft.com/refdoc/vcCkUploadRef.html which looks excellent, but it doesn't seem to handle resuming half-done uploads (I assume this is because HTTP 1.1 doesn't support it). I've also looked at the BITS service, but for uploads it requires an IIS server.
So far my only option seems to be to cut the file I want to upload into smaller pieces (say 1 MB each), upload them all to the server, reassemble them with PHP and run a checksum to see if everything went OK. To resume, I'd need some form of 'handshake' at the beginning of the upload to find out which pieces are already on the server.
Will I have to code this by hand, or does anyone know of a library that does all this for me, or maybe even a completely different solution? I'd rather not switch to another protocol that supports resume natively, for maintenance reasons (potential problems with firewalls etc.).
I'm eight months late, but I just stumbled upon this question and was surprised that webDAV wasn't mentioned. You could use the HTTP PUT method to upload, and include a Content-Range header to handle resuming and such. A HEAD request would tell you if the file already exists and how big it is. So perhaps something like this:
1) HEAD the remote file
2) If it exists and size == local size, upload is already done
3) If size < local size, add a Content-Range header to the request and seek to the appropriate location in the local file.
4) Make PUT request to upload the file (or portion of the file, if resuming)
5) If connection fails during PUT request, start over with step 1
You can also list (PROPFIND) and rename (MOVE) files, and create directories (MKCOL) with DAV.
I believe both Apache and Lighttpd have DAV extensions.
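To make the flow concrete, here it is sketched with PHP's cURL bindings (your real client would be C++/MFC, but the HTTP exchange is the same; the local path and WebDAV URL are made up):

<?php
// Sketch of the HEAD-then-PUT resume flow above, using PHP's cURL bindings.
// The local path and the WebDAV URL are placeholder assumptions.
$local  = '/data/bigfile.bin';
$remote = 'http://server.example.com/dav/bigfile.bin';
$size   = filesize($local);

// 1) HEAD the remote file to see how much of it already arrived.
$ch = curl_init($remote);
curl_setopt_array($ch, array(CURLOPT_NOBODY => true, CURLOPT_RETURNTRANSFER => true));
curl_exec($ch);
$remoteSize = (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if absent
curl_close($ch);

if ($remoteSize >= $size) {
    exit("Upload already complete\n");                       // step 2
}
$offset = max(0, $remoteSize);                               // step 3

// 3)/4) PUT the remaining bytes with a Content-Range header.
$fp = fopen($local, 'rb');
fseek($fp, $offset);

$ch = curl_init($remote);
curl_setopt_array($ch, array(
    CURLOPT_UPLOAD         => true,                          // HTTP PUT
    CURLOPT_INFILE         => $fp,
    CURLOPT_INFILESIZE     => $size - $offset,
    CURLOPT_HTTPHEADER     => array(
        'Content-Range: bytes ' . $offset . '-' . ($size - 1) . '/' . $size,
    ),
    CURLOPT_RETURNTRANSFER => true,
));
curl_exec($ch);                        // step 5: if this fails, start again at the HEAD
curl_close($ch);
fclose($fp);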
You need a standard chunk size (say 256 KB). If the file "abc.txt", uploaded by user X, is 78.3 MB, that works out to 313 full chunks and one smaller chunk. Then:
1) You send a request to upload, stating the filename and size, as well as the number of initial threads.
2) Your PHP code creates a temp folder named after the client's IP address and the filename.
3) Your app can then use MULTIPLE connections to send the data in different threads, so you could be sending chunks 1, 111, 212 and 313 at the same time (with separate checksums).
4) Your PHP code saves them to different files and confirms reception after validating the checksum, giving the number of a new chunk to send, or telling that thread to stop.
5) After all threads are finished, you ask the PHP side to join all the files; if something is missing, go back to 3).
You could increase or decrease the number of threads at will, since the app is controlling the sending.
You can easily show a progress indicator: either a simple progress bar, or something close to DownThemAll's detailed view of chunks.
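On the PHP side, the per-chunk handler could be as simple as something like this (field names, the temp-folder naming and the MD5 checksum are all assumptions, not a prescribed protocol):

<?php
// receive_chunk.php - sketch of the per-chunk endpoint for the scheme above.
// Field names, the temp-folder naming and the MD5 checksum are assumptions.
$name  = basename($_POST['filename']);
$index = (int) $_POST['chunk'];
$md5   = $_POST['md5'];

// One temp folder per (client IP, filename) pair, as in step 2.
$dir = sys_get_temp_dir() . '/upload_' . md5($_SERVER['REMOTE_ADDR'] . '|' . $name);
if (!is_dir($dir)) {
    mkdir($dir, 0700, true);
}

$data = file_get_contents($_FILES['data']['tmp_name']);
if (md5($data) !== $md5) {
    header('HTTP/1.1 400 Bad Request');   // checksum mismatch: app should resend this chunk
    exit('resend');
}

file_put_contents($dir . '/part_' . $index, $data);
echo 'ok';                                 // app can move on to its next chunk

// A separate "join" request would later concatenate part_1 .. part_N in order
// and verify a checksum of the whole file (step 5).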
libcurl (C API) could be a viable option:
-C/--continue-at
Continue/Resume a previous file transfer at the given offset. The given offset is the exact number of bytes that will be skipped, counting from the beginning of the source file before it is transferred to the destination. If used with uploads, the FTP server command SIZE will not be used by curl.
Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.
If this option is used several times, the last one will be used
Google have created a Resumable HTTP Upload protocol. See https://developers.google.com/gdata/docs/resumable_upload
Is reversing the whole process an option? I mean, instead of pushing the file over to the server, make the server pull the file using a standard HTTP GET with all the bells and whistles (like Accept-Ranges, etc.).
Maybe the easiest method would be to create an upload page that accepts the filename and range as parameters, such as http://yourpage/.../upload.php?file=myfile&from=123456, and handle resumes in the client (maybe you could add a function to ask the server which ranges it has already received).
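A hedged sketch of what such an upload.php could look like on the server (the upload directory, status codes and the use of the raw request body are assumptions):

<?php
// upload.php - sketch of the resume-by-offset endpoint suggested above.
// The upload directory and the raw-body transport are assumptions.
$file = basename($_GET['file']);     // never trust raw filenames
$from = isset($_GET['from']) ? (int) $_GET['from'] : 0;
$path = '/var/uploads/' . $file;

$have = file_exists($path) ? filesize($path) : 0;
if ($from !== $have) {
    // Tell the client how much we really have, so it can resume from there.
    header('HTTP/1.1 409 Conflict');
    exit((string) $have);
}

$in  = fopen('php://input', 'rb');   // raw request body = the next slice of the file
$out = fopen($path, 'ab');           // append mode continues where we left off
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

echo filesize($path);                // the new size doubles as the next "from" value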
# Anton Gogolev
Lol, I was just thinking about the same thing - reversing the whole thing, making the server a client and the client a server. Thanks to Roel, it's clearer to me now why it wouldn't work.
# Roel
I would suggest implementing a Java uploader [JumpLoader is good, with its JScript interface and even sample PHP server-side code]. Flash uploaders suffer badly when it comes to BIIIGGG files :) , on a gigabyte scale that is.
F*EX can upload files up to TB range via HTTP and is able to resume after link failures.
It does not exactly meet your needs, because it is written in Perl and needs a UNIX-based server, but clients can be on any operating system. Maybe it is helpful for you nevertheless:
http://fex.rus.uni-stuttgart.de/
There is a protocol called TUS for resumable uploads, with implementations in PHP and C++.