You all know about the restrictions that exist in a shared hosting environment, so with that in mind, please suggest a PHP function or technique with which I could stream my videos and other files. I have a lot of videos on the server, with unlimited bandwidth and disk space, but I am limited in RAM and CPU.
Don't use PHP to stream the data. Use a header redirect to point to the URL of the actual file. This offloads the work onto the webserver, which may run under a different user ID and is better optimized for this task.
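For example, something along these lines (the base URL and query parameter are placeholders, so adjust for your setup):

```php
<?php
// Minimal sketch of the redirect approach. The base URL and query
// parameter are hypothetical; validate the filename before using it.
$file = basename($_GET['file']); // drop any directory components
header('Location: http://example.com/videos/' . rawurlencode($file));
exit;
```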
Hmm, there is XMoov, which acts as a "streaming server" but does little more than serve a file byte by byte, with a few additional options and settings. It promises random access (i.e. arbitrary skipping within a video) but I haven't used it myself yet.
As a server administrator, though, I would frown on anybody using PHP to serve huge files like that because of the strain it puts on the server. I would generally not regard this as a good idea, and would rent a streaming server instead if at all possible. Use at your own risk.
You can use a while loop to load small chunks of the file, output them, sleep for a bit, and repeat (that is about the only way to limit the CPU usage).
RAM shouldn't be a problem: you only ever hold one chunk in memory, never the whole file.
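A rough sketch of that loop (path, chunk size and delay are only illustrative values):

```php
<?php
// Sketch of the loop described above: read a small chunk, send it,
// pause, repeat. Path, chunk size and delay are illustrative.
while (ob_get_level()) {
    ob_end_clean(); // make sure flush() actually reaches the client
}
$handle = fopen('/path/to/video.mp4', 'rb');
if ($handle === false) {
    exit('Cannot open file');
}
header('Content-Type: video/mp4');
while (!feof($handle)) {
    echo fread($handle, 8192); // only 8 KB in memory at any moment
    flush();
    usleep(10000); // 10 ms nap per chunk to keep CPU usage down
}
fclose($handle);
```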
Related
We have files that are hosted on RapidShare which we would like to serve through our own website. Basically, when a user requests http://site.com/download.php?file=whatever.txt, the script should stream the file from RapidShare to the user.
The only thing I'm having trouble getting my head around is how to properly stream it. I'd like to use cURL, but I'm not sure if I can read the download from RapidShare in chunks and then echo them to the user. The best way I've thought of so far is to use a combination of fopen, fread, echo'ing the chunk of the file to the user, flushing, and repeating that process until the entire file is transferred.
I'm aware of the PHP readfile() function as well, but would that be the best option? Bear in mind that these files can be several GB in size, and although we have servers with 16 GB of RAM, I want to keep the memory usage as low as possible.
Thank you for any advice.
HTTP has a header called "Range" which basically allows you to fetch an arbitrary chunk of a file (provided you already know the file size), but since PHP isn't multi-threaded, I don't see any benefit in using it here.
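For reference, this is roughly what such a ranged fetch looks like with PHP's cURL bindings (the URL is a placeholder):

```php
<?php
// For illustration only: a ranged fetch with cURL. This grabs just the
// first kilobyte of the remote file.
$ch = curl_init('http://example.com/somefile.bin');
curl_setopt($ch, CURLOPT_RANGE, '0-1023');      // bytes 0-1023
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return instead of printing
$chunk = curl_exec($ch);
curl_close($ch);
```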
AFAIK, if you don't want to consume all your RAM, the only way to go is a two-step approach.
First, stream the remote file using fopen()/fread() (or any PHP function that works with streams), reading it in small chunks (2048 bytes may be enough), and write/append each chunk to a temporary file via tmpfile(); then echo the temporary file back to your user.
That way, even a 2 TB file would basically consume only about 2048 bytes of memory at a time, since only the current chunk and the file handles are ever held in memory.
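A minimal sketch of that two-step approach, assuming allow_url_fopen is enabled (the URL is a placeholder):

```php
<?php
// Sketch of the two-step approach, assuming allow_url_fopen is enabled.
// The remote URL is a placeholder.
$remote = fopen('http://example.com/whatever.txt', 'rb');
$temp   = tmpfile();

// Step 1: pull the remote file down in small chunks.
while (!feof($remote)) {
    fwrite($temp, fread($remote, 2048));
}
fclose($remote);

// Step 2: rewind and echo the temporary file back to the user.
rewind($temp);
while (!feof($temp)) {
    echo fread($temp, 2048);
    flush();
}
fclose($temp); // the tmpfile() is removed automatically on close
```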
You may also write some kind of proxy manager that caches heavily downloaded files locally (for a given time) so the remote read can be skipped entirely.
I have a page that uploads a file to my server, where it then gets copied to a permanent directory via move_uploaded_file. This all seems to work great, with the exception that in a real-life scenario I will be expecting much larger files than I have successfully sent up.
I have already tackled the timeout for the file upload by changing the connection timeout in my site settings in IIS - so the file continues to upload for up to six hours ( -_- ) - but this is where I run into my current problem: it might just take six hours!
After getting the upload process past 10% or so (on a 300 MB file), I noticed that the file continues to push up, but my upload rate seems to be 'falling off': I observed faster speeds when I started the transfer than I am seeing halfway through it. The numbers here aren't necessarily relevant, as I know that my upload connection (still 2 Mbps while uploading) is capable of pushing faster than it is, and the server on the other end is on fiber.
I wonder if anyone has encountered this before, and if so, whether you have determined a workaround. Any help appreciated. Thanks.
You should not be using HTTP for this task. You may have observed that all the "file locker" services (and others which involve uploading files, such as Apple's online-music service) provide you with an "uploader" program rather than making use of the browser. There are reasons for this.
First off, the overhead of the transfer encoding can be large. If your (presumably binary) data ends up Base64-encoded, that alone is 33% overhead, so a transfer that takes four hours over HTTP would take only three with a binary protocol - and that's disregarding the chunked-transfer overhead, so the reality is probably more severe.
Second, there's no way to "resume" an upload in HTTP. So if your connection is broken, you'll either have to write application-specific code to handle the resumption, or start all over.
Third, HTTP servers are not designed for super-long-lived connections: they usually have a finite or small pool of workers to service the (usually seconds-long at the outset) client requests, and occasionally they have smallish limits on the size of request data (2GB is common, and PHP by default has only a few MB).
I strongly recommend using a file transfer protocol to transfer files (such as FTP). You don't have to give out a single username/password pair to everyone: you can have a gatekeeper which integrates with whatever authentication system you already have in place. FTP-over-TLS also exists and is relatively mature.
There is a fairly good summary of the differences between the two protocols here. Note that you gain nothing from any of the advantages of HTTP listed, due to your circumstances.
Don't feel limited to FTP - rsync is a great protocol for transferring files as well, especially if you only change part of the file (it even does binary deltas!). git can also efficiently transport large blobs over secure connections or even HTTP, if you insist on using that.
I created a simple web interface to allow various users to upload files. I set the upload limit to 100 MB, but now it turns out that the client occasionally wants to upload files of 500 MB+.
I know how to alter the PHP configuration to change the upload limit, but I was wondering if there are any serious disadvantages to uploading files of this size via PHP?
Obviously FTP would be preferable, but if possible I'd rather not have two different methods of uploading files.
Thanks
Firstly FTP is never preferable. To anything.
I assume you mean that you are transferring the files via HTTP. While not quite as bad as FTP, it's not a good idea if you can find another way of solving the problem. HTTP (and hence the programs that implement it) is optimized around transferring relatively small files around the internet.
While the protocol supports server-to-client range requests, it does not allow the reverse operation, so a broken upload cannot be resumed. Even if the software at either end were unaffected by the volume, the more data you push across, the longer the window during which you could lose the connection - and that inability to resume is the biggest problem.
Regardless of the server technology you use (PHP or something else), it's never a good idea to push that big a file in one sweep in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage: file uploading consumes web server memory. You don't normally want that.
PHP will handle as many and as large a file as you'll allow it. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file gets, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. As well, without extra work with APC, or using something like uploadify, there's no progress report and users are left staring at a browser showing no visible signs of actual work except the throbber chugging away.
I have an image hosting site. All image requests are redirected to a PHP script via mod_rewrite. The PHP script uses fread() to output the picture from another file. I want to know: does this use a lot of processor time or not?
It depends on how much you think "a lot of processor time" is, but from what you're describing, the processing time required by mod_rewrite and PHP is trivial compared to the I/O time to read the image from disk and send it over the network.
If you're concerned about speed, caching the images in memory will probably have the most benefit.
Yes, this is putting considerable strain on the web server, because the PHP interpreter has to be initialized for every small resource request, and passes through the data. The consensus is that this is not a good thing to do on a high-traffic website.
Why are you doing this, are you resizing images?
You will run out of memory before hitting the CPU limit :-)
Reading/writing a file is not a CPU-intensive task, but each Apache process created for it can eat up to 50 MB of RAM.
If you want to send images fast and securely, you should look into X-SendFile. It lets your PHP scripts tell the webserver to send files that are not directly accessible by URL, using something like header('X-Sendfile: /path/to/the/file');
For Apache there is mod_xsendfile (http://tn123.ath.cx/mod_xsendfile/). Though labeled beta, it has proven very stable in production, and its source code is small enough to be audited easily.
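A typical usage sketch might look like this (path and filename are illustrative):

```php
<?php
// Sketch: PHP performs the access check, then hands the actual transfer
// to Apache via mod_xsendfile. Path and filename are illustrative.
$path = '/var/files/protected/image.jpg'; // not reachable by URL
header('Content-Type: image/jpeg');
header('Content-Disposition: inline; filename="image.jpg"');
header('X-Sendfile: ' . $path); // Apache serves the file from here on
exit;
```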
I have a PHP client that requests an XML file over HTTP (i.e. loads an XML file via URL). As of now, the XML file is only several KB in size. A problem I can foresee is that the XML becomes several MB or GB in size. I know that this is a huge question and that there are probably a myriad of solutions, but what ideas do you have for transporting this data to the client?
Thanks!
Based on your use case, I'd definitely suggest zipping up the data first. In addition, you may want to MD5-hash the file and compare it before initiating the download (no need to update if the file has no changes); this will help with point #2.
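A rough sketch of that hash check, assuming the server publishes a companion ".md5" file next to the data (both URLs are hypothetical):

```php
<?php
// Sketch of the hash check. It assumes the server publishes a companion
// ".md5" file next to the data; both URLs are hypothetical.
$remoteHash = trim(file_get_contents('http://example.com/data.xml.md5'));
$localFile  = '/var/data/data.xml';

if (file_exists($localFile) && md5_file($localFile) === $remoteHash) {
    echo "Unchanged since last night - skipping the download.\n";
} else {
    // file_put_contents() accepts a stream, so this copies in chunks.
    file_put_contents($localFile, fopen('http://example.com/data.xml', 'rb'));
}
```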
Also, would it be possible to just send the segment of XML that has changed, instead of the whole file?
Ignoring how well a browser may or may not handle a GB-sized XML file, the only real concern I can think of off the top of my head is whether the execution time to generate all the XML is greater than any execution time thresholds that are set in your environment.
PHP's max_execution_time setting
PHP's set_time_limit() function
Apache's TimeOut Directive
Given that the XML is created dynamically by your PHP, the simplest thing I can think of is to ensure that the file is gzipped automatically by the webserver, as described here; that reference offers both a general PHP approach and an Apache httpd-specific solution.
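A minimal sketch of the general PHP approach (ob_gzhandler only compresses when the client sends Accept-Encoding: gzip):

```php
<?php
// Minimal sketch: let ob_gzhandler compress the response when the
// client advertises gzip support; it falls back to plain output otherwise.
ob_start('ob_gzhandler');
header('Content-Type: text/xml');
echo '<?xml version="1.0"?><data>...</data>'; // the dynamically built XML
```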
Besides that, having a browser (what else could a PHP client be?) do such a job every night for some data synchronizing suggests there must be a far simpler solution somewhere else.
And, of course, at some point, transferring "a lot" of data is going to take "a lot" of time...
The problem is that he's syncing up two datasets; the question is completely misstated.
You need to either a) keep a differential log of changes to dataset A so that you can send that log to dataset B, or b) keep two copies of the dataset (last night's and the current one), and then compare them so you can send the differential log from A to B.
Welcome to the world of replication.
The problem with (a) is that it's potentially invasive to all of your code, though if you're using an RDBMS you could perhaps do some logging via database triggers to keep track of inserts/updates/deletes, write the information into a table, then export the relevant rows as your differential log. But that can be nasty too.
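If you go the application-code route instead of triggers, a rough sketch might look like this (table and column names are hypothetical, and the timestamp uses MySQL-style NOW()):

```php
<?php
// Rough sketch of option (a) done in application code rather than with
// triggers. Table and column names are hypothetical.
function logChange(PDO $db, string $table, string $op, int $rowId): void
{
    $stmt = $db->prepare(
        'INSERT INTO changelog (tbl, op, row_id, changed_at)
         VALUES (?, ?, ?, NOW())'
    );
    $stmt->execute([$table, $op, $rowId]);
}

// Called next to every write, e.g. after updating product row 42:
// logChange($db, 'products', 'UPDATE', 42);
```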
The problem with (b) is the whole "compare the databases all at once" step. Fine for 100 rows. Bad for 10^9 rows. Nasty nasty.
In fact, it can all be nasty. Replication is nasty.
A better plan is to look into a "real" replication system designed for the particular databases that you're running (assuming you're running a database). Something that perhaps sends database log records over for synchronization rather than trying to roll your own.
Most modern DBMSes have replication systems.
Gallery2, which allows you to upload photos over HTTP, makes you set a couple of PHP parameters, post_max_size and upload_max_filesize, to allow larger uploads. You might want to look into that.
It seems to me that posting large files has problems with browser time-outs and the like, but on the plus side it works with proxy servers and firewalls better than trying a different file upload protocol.
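For reference, those two directives live in php.ini; the values below are only illustrative:

```ini
; Illustrative php.ini values for large uploads. post_max_size must be
; at least as large as upload_max_filesize, since the whole POST body
; (file plus form fields) counts against it.
upload_max_filesize = 600M
post_max_size = 650M
```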
Thanks for the responses. I failed to mention that transferring the file should be relatively fast (a few minutes max - is this even possible?). The XML that is requested will be parsed and inserted into a database every night. The XML may be the same as the night before, or it may be different. So there are basically two requirements: 1. it has to be relatively fast, and 2. it should minimize the number of writes to the database.
One solution that was proposed is to zip the XML file and then transfer it, but that only satisfies (1).
Any other ideas?
Are there any algorithms that I could apply to compress the XML? How are large files such as MP3s downloaded in a matter of seconds?
PHP receiving GBs of data will take a long time and adds overhead. It is also more susceptible to failures.
I would dispatch the job to a shell script (wget with simple error handling) that is not bound by PHP's execution time limits and, on failure, could perhaps even retry on its own.
I'm not very experienced with this, but while one could use exec() or the like, these sadly run synchronously (they block until the command finishes).
Calling a script with `./test.sh &` makes it run in the background and solves that problem, I guess. The script could easily let your PHP pick it back up via a wget to `http://yoursite.com/continue-xml-stuff.php?id=1049381023&status=0`. The id could be a filename, if you don't need to backtrack lost requests. The status would indicate how the script ended up handling the request.
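A sketch of kicking that off from PHP (the script name and URL are hypothetical):

```php
<?php
// Sketch of starting a background fetch from PHP. Script name and URL
// are hypothetical; the output redirects plus the trailing & let exec()
// return immediately instead of waiting for the download to finish.
$url = escapeshellarg('http://example.com/big.xml');
exec("./fetch-xml.sh {$url} > /dev/null 2>&1 &");
echo 'Fetch started in the background.';
```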
Have you thought about using some sort of version control system to handle this? You could leverage its ability to calculate and send just the differences in the files, plus you get the added benefits of maintaining a version history of your file.
Since I don't know the details of your situation, I'll throw a question out there. Just for the sake of argument: does it have to be HTTP? FTP is much better suited for large data transfers and can be automated easily via PHP or Perl.
If you are using Apache, you might also consider Apache mod_gzip. This should allow you to compress the file automatically and the decompression should also happen automatically, as long as both sides accept gzip compression.