PHP Download script with multiple connections - php

I have PHP code that lets a specific user download a file.
I'm storing the file content in a database (using a BLOB column).
<?php
// Validate the user here.
// Fetch the file row from the database here, e.g.:
// $r = mysql_fetch_object($result);
header("Content-Type: {$r->type}");
header("Content-Disposition: attachment; filename=\"{$r->name}\"");
echo $r->content;
?>
For large files, the download takes a long time.
How can I improve the code?
Does the download speed increase with multiple connections?

Assuming there are no artificial limits placed on the connection, an HTTP transfer will take up as much of the network pipe as it can.
Once the connection starts getting throttled (e.g. on a file download site like Rapidshare, 'free' users get limited bandwidth), then using parallel connections MAY increase speed. e.g. a single stream is limited to 50k/s, so opening 2 streams would make for an effective 100k/s.
But then you're going to have to support ranged downloads. Your script as it stands sends out the entire file, from beginning to end, so a user opening two connections would download the whole file twice.
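A minimal sketch of what ranged-download support could look like for the script above. The function names parseByteRange and buildDownloadResponse are hypothetical, only a single "bytes=first-last" range is handled, and an unusable Range header simply falls back to sending the whole file (a stricter server would answer 416).

```php
<?php
// Hypothetical helper: resolve a single "bytes=first-last" Range header
// against a resource of $size bytes. Returns [start, end] (inclusive),
// or null when the header is missing, malformed, or unsatisfiable.
function parseByteRange(?string $header, int $size): ?array
{
    if ($header === null || !preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null;
    }
    $first = $m[1];
    $last  = $m[2] ?? '';
    if ($first === '' && $last === '') {
        return null;
    }
    if ($first === '') {
        // Suffix form "bytes=-N": the final N bytes of the file.
        $start = max(0, $size - (int) $last);
        $end   = $size - 1;
    } else {
        $start = (int) $first;
        $end   = $last === '' ? $size - 1 : min((int) $last, $size - 1);
    }
    return ($start > $end) ? null : [$start, $end];
}

// Hypothetical helper: decide status, headers and body for one request.
// Kept free of header()/echo calls so it can be exercised in isolation.
function buildDownloadResponse(string $content, ?string $rangeHeader): array
{
    $size  = strlen($content);
    $range = parseByteRange($rangeHeader, $size);
    if ($range === null) {
        // No usable range: send everything.
        return [
            'status'  => 200,
            'headers' => ['Accept-Ranges: bytes', "Content-Length: $size"],
            'body'    => $content,
        ];
    }
    [$start, $end] = $range;
    $length = $end - $start + 1;
    return [
        'status'  => 206,
        'headers' => [
            'Accept-Ranges: bytes',
            "Content-Range: bytes $start-$end/$size",
            "Content-Length: $length",
        ],
        'body'    => substr($content, $start, $length),
    ];
}

// Possible use inside the download script (assumes $r was loaded as above):
// $resp = buildDownloadResponse($r->content, $_SERVER['HTTP_RANGE'] ?? null);
// http_response_code($resp['status']);
// foreach ($resp['headers'] as $h) { header($h); }
// echo $resp['body'];
```

Note that the whole BLOB is still loaded from the database for every range request, so this only helps when the bottleneck is a per-connection bandwidth limit.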

There's probably not that much you can do to speed up this specific process.
Server and client bandwidth are hard limits. Streaming the file through PHP will cause some additional overhead, but seeing as the data comes from a database, there is no straightforward way to improve that, either.
Moving to a faster server with more bandwidth may help things, but then also it might not. If the client's connection is slow, there is nothing you can do.

Related

By closing a socket after only reading the useful data, am I really saving bandwidth?

I mean, I have an app that, as a first step, only needs to get the size of some images on a web server. To do that, I'm using fsockopen. After reading the Content-Length header, I close the socket.
The question may be silly, but I know little to nothing about the TCP protocol, the whole data transmission process over the internet, or how the file gets to my PHP app through this socket. What I want to know is: am I saving bandwidth by closing the socket before reading the whole file, or is it still transferred to my local machine in its entirety anyway? What about the server that hosts the image: does it know the socket is closed and stop sending the data?
It depends on a bunch of stuff. If the image is 10 Terabytes, then yes. Absolutely. If it's 100k, then probably not.
It all has to do with buffers: buffers all over the place, on each computer and on each device in the network between them, as well as latency and available bandwidth.
But basically if the file is big, yes, you're saving bandwidth. If it's small, you're not. Figuring out exactly would be difficult and the number of variables involved would be large. And the break-even point would likely change over time unless you controlled the full end-to-end system (and even then lots of things you would have a hard time controlling would still impact the answer).
Basically yes, for large enough files, but you'd save a lot more if you used the HTTP HEAD request for this, not a full GET request. Then you would save for all files.
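A rough sketch of the HEAD approach with fsockopen, as in the question. The function names are hypothetical, plain HTTP on port 80 is assumed, and redirects and servers that omit Content-Length are not handled.

```php
<?php
// Build a HEAD request: the server replies with headers only, no body.
function buildHeadRequest(string $host, string $path): string
{
    return "HEAD {$path} HTTP/1.1\r\nHost: {$host}\r\nConnection: close\r\n\r\n";
}

// Pull the Content-Length value out of a raw header block, or null if absent.
function parseContentLength(string $rawHeaders): ?int
{
    if (preg_match('/^Content-Length:\s*(\d+)\s*$/mi', $rawHeaders, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical wrapper: ask the server for the size of one resource.
function fetchRemoteSize(string $host, string $path, int $port = 80, float $timeout = 5.0): ?int
{
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return null; // connection failed
    }
    fwrite($socket, buildHeadRequest($host, $path));
    $response = stream_get_contents($socket); // headers only, so this stays small
    fclose($socket);
    return $response === false ? null : parseContentLength($response);
}

// Example call (host and path are placeholders):
// $size = fetchRemoteSize('www.example.com', '/images/photo.jpg');
```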

Serving large file downloads from remote server

We have files that are hosted on RapidShare which we would like to serve through our own website. Basically, when a user requests http://site.com/download.php?file=whatever.txt, the script should stream the file from RapidShare to the user.
The only thing I'm having trouble getting my head around is how to properly stream it. I'd like to use cURL, but I'm not sure if I can read the download from RapidShare in chunks and then echo them to the user. The best way I've thought of so far is to use a combination of fopen, fread, echo'ing the chunk of the file to the user, flushing, and repeating that process until the entire file is transferred.
I'm aware of the PHP readfile() function as well, but would that be the best option? Bear in mind that these files can be several GB in size, and although we have servers with 16 GB of RAM, I want to keep memory usage as low as possible.
Thank you for any advice.
HTTP has a header called "Range" which allows you to fetch any chunk of a file (provided you already know the file size), but since PHP isn't multi-threaded, I don't see any benefit in using it.
AFAIK, if you don't want to consume all your RAM, the only way to go is a two-step approach.
First, stream the remote file using fopen()/fread() (or any PHP functions that work with streams), split the read into small chunks (2048 bytes may be enough), write/append each chunk to a tmpfile(), then echo it back to your user by reading the temporary file.
That way, even a 2 TB file would consume only about 2048 bytes of memory, since only the current chunk and the file handle are held in memory.
You may also write some kind of proxy manager to cache and keep already downloaded files to avoid the remote reading process if a file is heavily downloaded (and keep it locally for a given time).
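The chunked read loop described above might look roughly like this. As a simplification, this sketch relays each chunk straight to the destination stream instead of going through the temporary file; relayStream is a hypothetical name and the 8192-byte chunk size is an arbitrary choice.

```php
<?php
// Copy $source to $dest one chunk at a time, so that only a single chunk
// is held in memory regardless of the total size. Returns bytes written.
function relayStream($source, $dest, int $chunkSize = 8192): int
{
    $total = 0;
    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left to read
        }
        $written = fwrite($dest, $chunk);
        if ($written === false) {
            break; // destination went away (e.g. client disconnected)
        }
        $total += $written;
        fflush($dest); // push the chunk out instead of letting it accumulate
    }
    return $total;
}

// Possible use in download.php (the URL is a placeholder):
// $in  = fopen('http://remote.example/whatever.txt', 'rb');
// $out = fopen('php://output', 'wb');
// relayStream($in, $out);
```

Pointing $dest at a tmpfile() handle first, and then relaying that handle to php://output, gives the two-step variant.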

Is uploading very large files (eg 500mb) via php advisable?

I created a simple web interface to allow various users to upload files. I set the upload limit to 100 MB, but now it turns out that the client occasionally wants to upload files of 500 MB or more.
I know how to alter the PHP configuration to change the upload limit, but I was wondering if there are any serious disadvantages to uploading files of this size via PHP.
Obviously FTP would be preferable, but if possible I'd rather not have two different methods of uploading files.
Thanks
Firstly FTP is never preferable. To anything.
I assume you mean that you are transferring the files via HTTP. While not quite as bad as FTP, it's not a good idea if you can find another way of solving the problem. HTTP (and hence the component programs) is optimized around transferring relatively small files around the internet.
While the protocol supports server to client range requests, it does not allow for the reverse operation. Even if the software at either end were unaffected by the volume, the more data you are pushing across the greater the interval during which you could lose the connection. But the biggest problem is that caveat in the last sentence.
Regardless of the server technology you use (PHP or something else), it's never a good idea to push a file that big in one go in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage in that file uploading consumes the web server memory. You don't normally want that.
PHP will handle as many and as large a file as you'll allow it. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file gets, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. As well, without extra work with APC, or using something like uploadify, there's no progress report and users are left staring at a browser showing no visible signs of actual work except the throbber chugging away.

Use PHP to sync large amounts of text

I have several laptops in the field that need to daily get information from our server. Each laptop has a server2go installation (basically Apache, PHP, MySQL running as an executable) that launches a local webpage. The webpage calls a URL on our server using the following code:
$handle = fopen( $downloadURL , "rb");
$contents = stream_get_contents( $handle );
fclose( $handle );
The $downloadURL fetches a ton of information from a MySQL database on our server and returns the results as output to the device. I am currently returning the results as their own SQL statements (i.e. if I query the database with "SELECT name FROM names", I might return to the device the text string "INSERT INTO names SET names='JOHN SMITH'"). This takes the info from the online database and returns it to the device in a SQL statement ready for insertion into the laptop's database.
The problem I am running into is that the amount of data is too large. The laptop webpage keeps timing out when retrieving info from the server. I have set the PHP timeout limits very high, but still run into problems. Can anyone think of a better way to do this? Will stream_get_contents stay connected to the server if I flush the data to the device in smaller chunks?
Thanks for any input.
What if you just send over the data and generate the sql on the receiving side? This will save you a lot of bytes to transmit.
Is the data update incremental? I.e. can you just send over the changes since the last update?
If you do have to send over a huge chunk of data, you might want to look at ways to compress or zip and then unzip on the other side. (Haven't looked at how to do that but I think it's achievable in php)
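For the compression idea, a minimal sketch using PHP's zlib functions gzencode() and gzdecode(), assuming the zlib extension is enabled on both the server and the laptops. The wrapper names packPayload and unpackPayload are hypothetical.

```php
<?php
// Server side: compress the text before sending it to the device.
function packPayload(string $data): string
{
    $gz = gzencode($data, 6); // level 6 is a middle-of-the-road setting
    if ($gz === false) {
        throw new RuntimeException('Compression failed');
    }
    return $gz;
}

// Laptop side: restore the original text after download.
function unpackPayload(string $gz): string
{
    $data = gzdecode($gz);
    if ($data === false) {
        throw new RuntimeException('Payload is not valid gzip data');
    }
    return $data;
}

// Repetitive text such as generated SQL tends to compress well.
$sample = str_repeat("INSERT INTO names SET names='JOHN SMITH';\n", 1000);
$packed = packPayload($sample);
printf("%d bytes -> %d bytes\n", strlen($sample), strlen($packed));
```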
Write a script that compiles a text file from the database on the server, and download that file.
You might want to consider using third-party file synchronization services, like Windows Live Sync or Dropbox to get the latest file synchronized across all the machines. Then, just have a daemon that loads up the file into the database whenever the file is changed. This way, you avoid having to deal with the synchronization piece altogether.
You are using stream_get_contents (or you could even use file_get_contents, without the extra line to open the stream), but if your amount of text is really as large as the title says, you'll fill up your memory.
I ran into this problem when writing a script for a remote server where memory is limited, so that wouldn't work. The solution I found was to use stream_copy_to_stream instead and copy the data directly to disk rather than into memory.
Here is the complete code for that piece of functionality.
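A minimal sketch of the stream_copy_to_stream approach, not the answerer's original code. The function name saveStreamToFile and the file paths are hypothetical; $source can be a URL (when allow_url_fopen is enabled) or a local path.

```php
<?php
// Copy a readable source (URL or path) directly into a file on disk,
// without holding the whole payload in a PHP string. Returns bytes copied.
function saveStreamToFile(string $source, string $destPath): int
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open source: $source");
    }
    $out = fopen($destPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open destination: $destPath");
    }
    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    if ($bytes === false) {
        throw new RuntimeException('Copy failed');
    }
    return $bytes;
}

// Possible use on the laptop (the destination path is a placeholder):
// saveStreamToFile($downloadURL, '/tmp/sync.sql');
```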

Sending large files via HTTP

I have a PHP client that requests an XML file over HTTP (i.e. loads an XML file via URL). As of now, the XML file is only several KB in size. A problem I can foresee is that the XML grows to several MB or GB in size. I know that this is a huge question and that there are probably a myriad of solutions, but what ideas do you have for transporting this data to the client?
Thanks!
Based on your use case, I'd definitely suggest zipping up the data first. In addition, you may want to MD5-hash the file and compare the hash before initiating the download (no need to update if the file has not changed); this will help with point #2.
Also, would it be possible to send just the segment of XML that has changed instead of the whole file?
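The hash check could be sketched as follows, assuming the server exposes the MD5 of the current XML somewhere cheap to fetch (for example, a small text file next to it). That endpoint, and the function name needsDownload, are hypothetical.

```php
<?php
// Decide whether the nightly download is needed by comparing the hash
// published by the server with the hash of the local copy.
function needsDownload(string $remoteMd5, string $localPath): bool
{
    if (!is_file($localPath)) {
        return true; // nothing stored locally yet
    }
    $localMd5 = md5_file($localPath);
    if ($localMd5 === false) {
        return true; // local copy unreadable, fetch again
    }
    return !hash_equals(strtolower(trim($remoteMd5)), $localMd5);
}

// Possible use (both URLs/paths are placeholders):
// $remoteMd5 = file_get_contents('http://server.example/data.xml.md5');
// if (needsDownload($remoteMd5, '/var/data/data.xml')) { /* download and import */ }
```

When the hashes match, both the transfer and the database writes can be skipped for that night.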
Ignoring how well a browser may or may-not handle a GB-sized XML file, the only real concern I can think of off the top of my head is if the execution time to generate all the XML is greater than any execution time thresholds that are set in your environment.
PHP's max_execution_time setting
PHP's set_time_limit() function
Apache's TimeOut Directive
Given that the XML is created dynamically with your PHP, the simplest thing I can think of is to ensure that the file is gzipped automatically by the web server, as described here; it offers a general PHP approach and an Apache httpd-specific solution.
Besides that, having a browser (what else can be a PHP client?) do such a job every night for some data synchronizing sounds like there must be a far simpler solution somewhere else.
And, of course, at some point, transferring "a lot" of data is going to take "a lot" of time...
The problem is that he's syncing up two datasets. The problem is completely misstated.
You need to either a) keep a differential log of changes to dataset A so that you can send that log to dataset B, or b) keep two copies of the dataset (last night's and the current dataset), and then compare them so you can then send the differential log from A to B.
Welcome to the world of replication.
The problem with (a) is that it's potentially invasive to all of your code, though if you're using an RDBMS you could do some logging perchance via database triggers to keep track of inserts/updates/deletes, and write the information in to a table, then export the relevant rows as your differential log. But, that can be nasty too.
The problem with (b) is the whole "comparing the database" all at once. Fine for 100 rows. Bad for 10^9 rows. Nasty nasty.
In fact, it can all be nasty. Replication is nasty.
A better plan is to look into a "real" replication system designed for the particular databases that you're running (assuming you're running a database). Something that perhaps sends database log records over for synchronization rather than trying to roll your own.
Most of the modern DBMS systems have replication systems.
Gallery2, which allows you to upload photos over http, makes you set up a couple of php parameters, post_max_size and upload_max_filesize, to allow larger uploads. You might want to look into that.
It seems to me that posting large files has problems with browser time-outs and the like, but on the plus side it works with proxy servers and firewalls better than trying a different file upload protocol.
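To check whether a given upload would fit under those two settings, something like the following could be used. This is a sketch: the helper names are hypothetical, and only the K, M and G shorthand suffixes used in php.ini are handled.

```php
<?php
// Convert a php.ini size such as "500M" or "2G" into a number of bytes.
function iniSizeToBytes(string $value): int
{
    $value  = trim($value);
    $number = (int) $value; // the cast keeps only the leading digits
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number; // plain byte count
    }
}

// An upload must fit under both limits; the POST body carries the file
// plus some form overhead, so post_max_size should be the larger of the two.
function fitsUploadLimits(int $fileBytes, string $uploadMax, string $postMax): bool
{
    return $fileBytes <= iniSizeToBytes($uploadMax)
        && $fileBytes <= iniSizeToBytes($postMax);
}

// Checking against the running configuration:
// $ok = fitsUploadLimits($bytes, ini_get('upload_max_filesize'), ini_get('post_max_size'));
```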
Thanks for the responses. I failed to mention that transferring the file should be relatively fast (a few minutes at most; is this even possible?). The XML that is requested will be parsed and inserted into a database every night. The XML may be the same as the night before, or it may be different. So there are basically two requirements: 1. it has to be relatively fast; 2. it should minimize the number of writes to the database.
One solution that was proposed is to zip the XML file and then transfer it, but that only satisfies (1).
Any other ideas?
Are there any algorithms that I could apply to compress the XML? How are large files such as MP3s being downloaded in a matter of seconds?
Having PHP receive gigabytes of data will take a long time and adds overhead.
It is also more susceptible to failures.
I would dispatch the job to a shell script (wget with simple error catching) that is not bothered by execution time and could even retry on its own on failure.
I'm not experienced with this, but although one could use exec() or the like, these sadly block until the command finishes.
Calling the script as ./test.sh > /dev/null 2>&1 & (the output has to be redirected, otherwise exec() still waits) makes it run in the background and solves that problem, I guess. The script could easily let your PHP pick things back up via a wget `http://yoursite.com/continue-xml-stuff.php?id=1049381023&status=0`. The id could be a filename, if you don't need to backtrack lost requests. The status would indicate how the script ended up handling the request.
Have you thought about using some sort of version control system to handle this? You could leverage its ability to calculate and send just the differences in the files, plus you get the added benefits of maintaining a version history of your file.
Since I don't know the details of your situation, I'll throw a question out there: just for the sake of argument, does it have to be HTTP? FTP is much better suited for large data transfers and can be automated easily via PHP or Perl.
If you are using Apache, you might also consider Apache mod_gzip (mod_deflate on Apache 2.x). This should allow you to compress the file automatically, and the decompression should also happen automatically, as long as both sides accept gzip compression.
