Download multiple files simultaneously with PHP - Forking, Sockets - php

I'm using the following code to manage downloads from my site (the files are behind a captcha): http://www.richnetapps.com/php-download-script-with-resume-option/
Trouble is, when a file is being downloaded, it locks the rest of the site, and it's not possible to download another file simultaneously. ('Locks' as in trying to go to, say, the homepage when a download is in progress results in a long wait. The homepage appears only when the download is finished or cancelled. This is a problem because some of the files are several hundred MB).
I'd like two things to happen: 1- To be able to browse the site while a file is being downloaded, and 2- to be able to download another file (or two, or three, or ten...) simultaneously.
My gut feeling is I need to fork the process, create a new one, or open another socket. But I'm way out of my depth, and even if this was the right approach, I don't know how to do it. Any ideas guys?
Many thanks in advance....
EDIT----
I found it! I added session_write_close() right before setting the headers in the download script. Apparently this behaviour is due to PHP session handling - further info here: php simultaneous file downloads from the same browser and same php script (I searched and searched before asking, but obviously missed this post).
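For anyone hitting the same issue, here is a minimal sketch of where the call goes, assuming a download script along the lines of the linked one (the file path is a placeholder):

    <?php
    // download.php - minimal sketch; the file path below is a placeholder
    // for the real, captcha-protected file location.
    session_start();

    // ... captcha / permission checks that need the session go here ...

    // Release the session lock BEFORE streaming the file; otherwise every
    // other request from the same browser session blocks until this
    // download finishes.
    session_write_close();

    $filepath = '/protected/files/example.zip'; // placeholder
    $filename = basename($filepath);

    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    header('Content-Length: ' . filesize($filepath));

    // Stream in chunks so large files don't exhaust memory.
    $fp = fopen($filepath, 'rb');
    while (!feof($fp)) {
        echo fread($fp, 8192);
        flush();
    }
    fclose($fp);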
Many thanks....

A Content Delivery Network (CDN) will both offload downloads from your server, leaving it free to serve the homepage (and other) page requests, and allow many, many simultaneous downloads. It should be cheaper for bandwidth and probably faster for most users as well.
The key will be configuring the CDN so that the files are only reachable after your captcha, instead of being freely available as in most CDN setups.
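One common way to do that (the details vary by CDN) is to hand out a short-lived signed URL only after the captcha check passes. A rough PHP sketch; the signing scheme, hostname and query parameter names here are purely illustrative, so check your CDN's documentation for its actual URL-signing format:

    <?php
    // Rough sketch: issue a time-limited signed URL after captcha validation.
    // The signing scheme and parameter names are illustrative only.
    function signedDownloadUrl(string $file, string $secret, int $ttl = 300): string
    {
        $expires = time() + $ttl;
        $token   = hash_hmac('sha256', $file . $expires, $secret);
        return "https://cdn.example.com/{$file}?expires={$expires}&token={$token}";
    }

    // After the captcha check succeeds:
    // header('Location: ' . signedDownloadUrl('big-file.zip', $cdnSecret));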

Related

Track hits over many domains

A quick question.
I'm looking at building a multi-domain hit counter across many different domains, preferably in PHP.
What would the best way to track each hit be?
I was thinking of storing the count in a central database and updating the number every time a page on any domain is loaded - but wouldn't that have major performance issues?
I was also thinking about the 'basic number stored in a text file' option - but is it even possible to edit a file from different servers/domains?
Any advice would be great!
If I understand you right, you have different websites that sit on different servers?
In that case I'm not sure about editing a file on a different server, and I wouldn't go there.
Instead of editing a remote file, just update a remote DB (for example).
The best solution is to use a non-blocking server (like Node.js) which updates a DB on every page load (you can easily access remote DBs on other servers, or send a curl call to a designated file on a master server). By using a non-blocking web server you will not slow down the page's load time.
Google Analytics works a bit differently - it loads a script from google-analytics.com and that script gathers all the info. The problem is that this only happens after the DOM has loaded.
If you are going for a solution like this, just put an AJAX call at the top of every page that you want to monitor.
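A bare-bones PHP version of the 'update a remote DB' idea, roughly what each monitored page (or the AJAX call above) could hit; the host, credentials and table name are placeholders:

    <?php
    // track.php - minimal central hit counter, called from every domain.
    // Host, credentials and table name are placeholders.
    $pdo = new PDO(
        'mysql:host=central-db.example.com;dbname=stats',
        'user',
        'pass',
        [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
    );

    $domain = $_SERVER['HTTP_HOST'] ?? 'unknown';

    // One row per domain; increment atomically so concurrent hits don't race.
    $stmt = $pdo->prepare(
        'INSERT INTO hits (domain, count) VALUES (:domain, 1)
         ON DUPLICATE KEY UPDATE count = count + 1'
    );
    $stmt->execute([':domain' => $domain]);

    // Return a 1x1 transparent GIF so the same endpoint works as a tracking pixel.
    header('Content-Type: image/gif');
    echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');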

Is downloading file through PHP or through direct link faster?

I need to let the user download a file (for example, a PDF). Which will take longer:
sending the file via PHP (with specific headers),
or putting it in a public HTTP folder and giving the user the public link to download it (without PHP's help)?
In the 1st case the original file can stay in a private zone.
But I'm thinking it will take some time for PHP to send the file.
So how can I measure the time PHP spends sending the file, and how much memory it consumes?
P.S. In the 1st case, when PHP sends the headers and the browser (if a PDF plugin is installed) tries to open the file inside the browser, is PHP still working, or does it push out the whole file immediately after the headers are sent? And if the plugin is not installed and the browser shows a "save as" dialog, is PHP still working?
There will be very little in it if you are worried about download speeds.
I guess it comes down to how big your files are, how many downloads you expect to have, whether your documents need to be publicly accessible, and the download speed of the client.
Your main issue with PHP is the memory it consumes - each download will create a new PHP process, which could be maybe 8M - 20M depending on what your script does, whether you use a framework, etc.
Out of interest, I wrote a symfony application to offer downloads, and to do things like concurrency limiting, bandwidth limiting etc. It's here if you're interested in taking a look at the code. (I've not licensed it per se, but I'm happy to make it GPL3 if you like).
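To get a rough feel for the time and memory the question asks about, you could wrap the send in simple measurements. A sketch, with the private path as a placeholder:

    <?php
    // Sketch: stream a file from a private directory and log time/memory used.
    // The path below is a placeholder outside the public web root.
    $file  = '/var/private/docs/report.pdf';
    $start = microtime(true);

    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="' . basename($file) . '"');
    header('Content-Length: ' . filesize($file));

    // readfile() streams straight to the output buffer without loading the
    // whole file into memory, so peak memory stays small even for big files.
    readfile($file);

    error_log(sprintf(
        'Sent %s in %.3fs, peak memory %.1f MB',
        basename($file),
        microtime(true) - $start,
        memory_get_peak_usage(true) / 1048576
    ));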

Dynamically blocking IPs in high-traffic site: best strategy?

I've got some bad bots targeting my website and I need to dynamically handle the IP addresses those bots come from. It's a pretty high-traffic site; we get a couple of million pageviews per day, which is why we're using 4 servers (loadbalanced). We don't use any caching (besides assets) because most of our responses are unique.
Code-wise it's a pretty small PHP website, which does no database queries and one XML request per pageview. The XML request gets a pretty fast response.
I've developed a script to (very frequently) analyse which IP addresses are making abusive requests, and I want to handle requests from those IPs differently for a certain amount of time. The abusive IPs change a lot, so I need to block different IPs every couple of minutes.
So: I see IP xx.xx.xx.xx being abusive, I record this somewhere and then I want to give that IP a special treatment for the next x minutes it does requests. I need to do this in a fast way, because I don't want to slow down the server and have the legitimate users suffer for this.
Solution 1: file
Writing the abusive IPs down in a file and then reading that file for every request seems too slow. Would you agree?
Solution 2: PHP include
I could let my analysis script write a PHP include file which the PHP engine would then include on every request. But I can imagine that, while the PHP file is being written, a lot of users who make a request at that moment would get an error because the file is in use.
I could solve that potential problem by writing the file and then switching a symlink to it (which might be faster).
Solution 3: htaccess
Another way to separate the abusers out would be to write an htaccess file that blocks or redirects them. This might be the most efficient way, but then I'd need to write a new htaccess file every x minutes.
I'd love to hear some thoughts/reactions on my proposed solutions, especially concerning speed.
What about dynamically configuring iptables to block the bad IPs? I don't see any reason to do the "firewalling" in PHP...
For the record I've finally decided to go for (my own proposed) solution number 2, generating a PHP file that is included on every page request.
The complete solution is as follows:
A Python script analyses the accesslog file every x minutes and doles out "punishments" to certain IP addresses. All currently running punishments are written into a fairly small (<1Kb) PHP file. This PHP file is included for every page request. Directly after generation of the PHP file an rsync job is started to push the new PHP file out to the other 3 servers behind the loadbalancer.
In the Python script that generates the PHP file, I first build the complete contents of the file in memory. I then open, write and close the file in one quick sequence to keep it locked for the shortest possible period.
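For illustration, the generated include could look something like this (the IPs and the 'punishment' are made up); the per-request cost is then just one isset() on a small array:

    <?php
    // blocked_ips.php - regenerated every few minutes by the analysis script.
    // The IPs and the response below are illustrative only.
    $blockedIps = [
        '203.0.113.45'  => true,
        '198.51.100.23' => true,
    ];

    if (isset($blockedIps[$_SERVER['REMOTE_ADDR']])) {
        header('HTTP/1.1 429 Too Many Requests');
        exit;
    }

On the generating side, writing to a temporary file and then rename()-ing it over the old one would also make the swap atomic on the same filesystem, which avoids the half-written-file worry mentioned above.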
I would seriously consider putting up another server that holds the (constantly changing) block list in-memory and serves the front-end servers.
I implemented such a solution using Node.JS and found the implementation easy and performance very good.
memcached could also be used, but I never tried it.
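If you did try the memcached route, the per-request check could stay very small; a sketch, with the host and key names purely hypothetical:

    <?php
    // Sketch: per-request blocklist lookup against a shared memcached instance.
    $mc = new Memcached();
    $mc->addServer('blocklist.internal', 11211); // hypothetical host

    if ($mc->get('blocked:' . $_SERVER['REMOTE_ADDR'])) {
        header('HTTP/1.1 429 Too Many Requests');
        exit;
    }

    // The analysis job would add entries with a TTL matching the punishment
    // window, e.g.:
    // $mc->set('blocked:203.0.113.45', 1, 600); // blocked for 10 minutes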

Is uploading very large files (eg 500mb) via php advisable?

I created a simple web interface to allow various users to upload files. I set the upload limit to 100MB, but now it turns out that the client occasionally wants to upload files of 500MB+.
I know how to alter the PHP configuration to change the upload limit, but I was wondering if there are any serious disadvantages to uploading files of this size via PHP?
Obviously FTP would be preferable, but if possible I'd rather not have two different methods of uploading files.
Thanks
Firstly FTP is never preferable. To anything.
I assume you mean that you are transferring the files via HTTP. While not quite as bad as FTP, it's not a good idea if you can find another way of solving the problem. HTTP (and hence the component programs) is optimized around transferring relatively small files around the internet.
While the protocol supports server-to-client range requests, it does not allow for the reverse operation. Even if the software at either end were unaffected by the volume, the more data you push across, the longer the window in which you could lose the connection - and, given that caveat about the reverse operation, a lost connection means starting the upload over. That is the biggest problem.
Regardless of the server technology you use (PHP or something else), it's never a good idea to push a file that big in one sweep in synchronous mode.
There are lots of plugins for any technology/framework that will do asynchronous upload for you.
Besides the connection timing out, there is one more disadvantage: file uploads consume web server memory, and you don't normally want that.
PHP will handle as many files, and as large, as you'll allow it to. But consider that it's basically impossible to resume an aborted upload in PHP, as scripts are not fired up until AFTER the upload is completed. The larger the file gets, the larger the chance of a network glitch killing the upload and wasting a good chunk of time and bandwidth. As well, without extra work with APC, or using something like Uploadify, there's no progress report, and users are left staring at a browser showing no visible sign of actual work except the throbber chugging away.
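If you do raise the limits and accept the upload through PHP anyway, the handling side stays simple; a sketch with the relevant php.ini directives noted in comments (the values and target directory are illustrative):

    <?php
    // Sketch of a plain PHP handler for large uploads.
    // Relevant php.ini directives (values illustrative):
    //   upload_max_filesize = 600M
    //   post_max_size       = 600M
    //   max_execution_time  = 300
    // memory_limit does not need to cover the file size: PHP buffers the
    // upload to a temp file on disk, not in memory.

    if (!isset($_FILES['file']) || $_FILES['file']['error'] !== UPLOAD_ERR_OK) {
        http_response_code(400);
        exit('Upload failed');
    }

    $target = '/var/uploads/' . basename($_FILES['file']['name']); // placeholder dir
    if (!move_uploaded_file($_FILES['file']['tmp_name'], $target)) {
        http_response_code(500);
        exit('Could not move uploaded file');
    }

    echo 'OK';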

Best Practice for Uploading Many (2000+) Images to A Server

I have a general question about this.
When you have a gallery, sometimes people need to upload thousands of images at once. Most likely this would be done through a .zip file. What is the best way to go about uploading this sort of thing to a server? Many times servers have timeouts etc. that need to be accounted for. I am wondering what kinds of things I should be looking out for, and what the best way is to handle a large number of images being uploaded.
I'm guessing that you would allow a user to upload a zip file (assuming the timeout does not affect you), and this zip file is uploaded to a specific directory - let's assume in this case that a directory is created for each user in the system. You would then unzip the archive on the server and scan the user's folder for any directories containing .jpg, .png or .gif files (etc.), and then import them into a table accordingly, I'm guessing labelled by folder name.
What kind of server side troubles could I run into?
I'm aware that there may be many issues. Even general ideas would be good so I can research further. Thanks!
Also, I would be programming in Ruby on Rails, but I think this question applies across any language.
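The asker is on Rails, but since this thread is PHP-centric, here is roughly what the unzip-and-scan step described above could look like in PHP; the directory layout and user id are hypothetical:

    <?php
    // Sketch: extract an uploaded zip into a per-user directory and collect
    // image paths. Paths and the user id are hypothetical.
    $userId  = 42;
    $userDir = '/var/galleries/user_' . $userId;
    $zipPath = $userDir . '/upload.zip';

    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        exit('Could not open archive');
    }
    $zip->extractTo($userDir);
    $zip->close();

    // Walk the extracted tree and keep only image files.
    $images   = [];
    $iterator = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($userDir));
    foreach ($iterator as $file) {
        $ext = strtolower($file->getExtension());
        if ($file->isFile() && in_array($ext, ['jpg', 'jpeg', 'png', 'gif'])) {
            $images[] = $file->getPathname();
        }
    }

    // $images would then be imported into a table, e.g. labelled by folder name.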
There's no reason why you couldn't handle this kind of thing with a web application. There are a couple of excellent components that would be useful for this:
Uploadify (based on jquery/flash)
plupload (from moxiecode, the tinymce people)
The reason they're useful is that, in the first instance, Uploadify uses a Flash component to handle uploads, so you can select groups of files from the file browser window (assuming no one is going to individually select thousands of images..!), and with plupload drag and drop is supported too, along with more platforms.
Once you've got your interface working, the server side stuff just needs to be able to handle individual uploads, associating them with some kind of user account, and from there it should be pretty straightforward.
With regard to server-side issues, that's really a big question: it depends on how many people will be using the application at the same time, the size of the images, and any processing that takes place afterwards. Remember, the files are kept in a temporary location while the script is processing them, and are either deleted upon completion or copied to a final storage location by your script, so disk space, memory overheads and timeouts could all be an issue.
If the images are massive in size, say RAW or TIFF, then this kind of thing could still work with chunked uploads, but implementing some kind of FTP upload might be easier. It's a bit of a vague question, but there should be plenty here to get you going ;)
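For the chunked-upload case mentioned above, the server side can just append chunks to a partial file. A sketch; the 'name'/'chunk'/'chunks' field names follow plupload's defaults as I recall, so adjust them to whatever your uploader actually sends, and the directory is a placeholder:

    <?php
    // Sketch: reassemble a chunked upload (plupload-style field names).
    $uploadDir = '/var/uploads/incoming'; // placeholder
    $name   = basename($_REQUEST['name'] ?? $_FILES['file']['name']);
    $chunk  = (int) ($_REQUEST['chunk']  ?? 0);
    $chunks = (int) ($_REQUEST['chunks'] ?? 1);

    $partial = $uploadDir . '/' . $name . '.part';

    // Append this chunk to the partial file (the first chunk truncates it).
    $out = fopen($partial, $chunk === 0 ? 'wb' : 'ab');
    $in  = fopen($_FILES['file']['tmp_name'], 'rb');
    stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    // Once the last chunk has arrived, rename to the final name.
    if ($chunk === $chunks - 1) {
        rename($partial, $uploadDir . '/' . $name);
    }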
For that many images it has to be a serious app, which gives you the liberty to suggest a piece of software running on the client (something like Yahoo Mail or Picasa does) that will take care of 'managing' the upload of images (network interruptions, resume support, etc.).
For the server side, you could process the images one at a time (assuming your client is sending them that way), which keeps it simple.
Take a peek at http://gallery.menalto.com - they have a dozen methods for uploading pictures into galleries, and you can choose the one that suits you.
Either have a client app, or some AJAX code that sends the images one by one, preventing timeouts. Alternatively, if this is not available to the public, FTP still works...
I'd suggest a client application (maybe written in AIR or Titanium) or telling your users what FTP is.
deviantArt.com, for example, offers FTP as an upload method for paying subscribers and it works really well.
Flickr instead has its own app for this: the "Flickr Uploadr".
