We use multiple servers to handle incoming web requests, load-balanced in a round-robin fashion. I've run into an issue that I'm not sure how to solve.
Using AJAX (qqFileUploader), I am uploading a file. By default it goes into the /tmp folder which is fine. The problem is when I try to retrieve that file, that retrieval request gets handled by the next server in line which does not have the file I uploaded. If I keep repeating the request over and over again, it will eventually reach the original server (via round robin load balancing) where the file was stored and then I can open it. Obviously that is not a good solution.
Here is essentially the code: http://jsfiddle.net/Ap27Z/. I removed some of it for brevity. You will see that the uploader object makes a call to a PHP file to do the file upload and then after the file upload is complete, another AJAX call is made to a script to process the .csv file. This is where the process is getting lost in the round-robin.
I read a few questions here on SO about uploading files to memory, and it seems that this is essentially not feasible at the moment. Is there another option I can use to upload a file and handle it all within the same request?
The classic solution to this type of problem is to use sticky sessions on your load balancer. That may not be a good solution for you as it would be modifying your whole setup to fix a small problem.
I would suggest giving each machine its own sub-domain: uploads still go to www.example.com, but each server is also allocated a subdomain (www1.example.com, www2.example.com, ...) that always resolves directly to that server rather than going through the round-robin DNS.
As part of the success result, you could pass back the server name that points to the exact server rather than the load-balanced name, and then all subsequent AJAX calls that reference the uploaded data use that server-specific domain name rather than the generic load-balanced one.
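For example, the upload handler could include its own host name in the JSON it returns, and the client would then use that host for the follow-up call. A rough sketch, assuming $_SERVER['SERVER_NAME'] (or a per-server config value) resolves directly to the machine that handled the upload; the field names are only illustrative:

<?php
// upload.php - runs on whichever server the load balancer happened to pick.
// The returned host name must bypass the round-robin name, e.g. "www2.example.com".
$serverName = $_SERVER['SERVER_NAME']; // or a hard-coded per-server value

// Store the upload in /tmp as before ("qqfile" is qqFileUploader's default form field name).
$storedFileName = uniqid('upload_') . '.csv';
move_uploaded_file($_FILES['qqfile']['tmp_name'], '/tmp/' . $storedFileName);

header('Content-Type: application/json');
echo json_encode(array(
    'success' => true,
    'server'  => $serverName,     // client uses this host for the next AJAX call
    'file'    => $storedFileName  // and passes this back so that server can find the CSV
));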
Is there another option I can use to upload a file and handle it all within the same request?
Sure, why not? The code that handles the POSTing of the data can do whatever you want it to do.
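For instance, the script that receives the upload could parse the CSV before it responds, so no second request ever has to locate the file on a particular server. A rough sketch, assuming the file arrives as a regular multipart upload under the field name "qqfile" (qqFileUploader also has an XHR mode that sends the raw request body instead):

<?php
// upload_and_process.php - receive the CSV and process it in the same request.
$tmpPath = $_FILES['qqfile']['tmp_name'];

$rowCount = 0;
if (($handle = fopen($tmpPath, 'r')) !== false) {
    while (($row = fgetcsv($handle)) !== false) {
        // ... process or insert each row into the database right here ...
        $rowCount++;
    }
    fclose($handle);
}

header('Content-Type: application/json');
echo json_encode(array('success' => true, 'rows' => $rowCount));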
There are (at least) 2 solutions to your problem:
You change the load-balancing.
There are several load balancing proxies out there which support session affinity a.k.a. "sticky sessions". That means that a user always gets the same server within a session.
Two programs that can act in this way are HAProxy (see the related question here on SO) and nginx with a custom module (tutorial here).
You change the files' location.
The other choice would be to change the location of your stored files to some place that all of your servers can access via the same path. This could be, for example, an NFS mount or a database (with the files stored as BLOBs). This way, it doesn't matter which server processes the request, as all of them have access to the file.
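A rough sketch of the database variant, with invented table and column names and a PDO connection; the NFS variant is even simpler, since you would just move_uploaded_file() into the shared directory instead:

<?php
// Store the uploaded file as a BLOB so that every load-balanced server can read it.
$pdo = new PDO('mysql:host=db.internal;dbname=uploads', 'user', 'pass');

$stmt = $pdo->prepare('INSERT INTO files (name, content) VALUES (?, ?)');
$stmt->bindValue(1, $_FILES['qqfile']['name']);
$stmt->bindValue(2, file_get_contents($_FILES['qqfile']['tmp_name']), PDO::PARAM_LOB);
$stmt->execute();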
Related
I am designing a web-based file-management system that can be conceptualised as 3 different servers:
The server that hosts the system interface (built in PHP) where users 'upload' and manage files (no actual files are stored here, it's all meta).
A separate staging server where files are placed to be worked on.
A file-store where the files are stored when they are not being worked on.
All 3 servers will be *nix-based on the same internal network. Users, based in Windows, will use a web interface to create an initial entry for a file on Server 1. This file will be 'uploaded' to Server 3 either from the user's local drive (if the file doesn't currently exist anywhere on the network) or another network drive on the internal network.
My question relates to the best programmatic approach to achieve what I want to do, namely:
When a user uploads a file (selecting the source via a web form) from the network, the file is transferred to Server 3 as an inter-network transfer, rather than passing through the user (which I believe is what would happen if it was sent as a standard HTTP form upload). I know I could set up FTP servers on each machine and attempt to FXP files between locations, but is this preferable to PHP executing a command on Server 1 (which will have global network access), to perform a cross-network transfer that way?
The second problem is that these are very large files we're talking about, at least a gigabyte or two each, and so transfers will not be instant. I need some method of polling the status of the transfer, and returning this to the web interface so that the user knows what is going on.
Alternatively, this upload could be left to run asynchronously to the user's current view, but I would still need a method to check the status of the transfer to ensure it completes.
So, if using an FXP solution, how could polling be achieved? If using a file move/copy command from the shell, is any form of polling possible? PHP/JQuery solutions would be very acceptable.
My final part to this question relates to Windows network drive mapping. A user may select a file from an arbitrarily mapped network drive. Their G:\ may relate to \\server4\some\location\therein, but presumably any drive path given to the server via a web form will only send the G:\ file path. Is there a way to determine the 'real path' of mapped network drives?
Any solution would be used to stage files from Server 3 to Server 2 when the files are being worked on - the emphasis being on these giant files not having to pass through the user's local machine first.
Please let me know if you have comments and I will try to make this question more coherent if it is unclear.
As far as I’m aware (and I could be wrong) there is no standard way to determine the UNC path of a mapped drive from a browser.
The only way to do this would be to have some kind of control within the web page. It could be ActiveX or maybe Flash. I've seen ActiveX doing this, but not Flash.
In the past when designing web based systems that need to know the UNC path of a user’s mapped drive I’ve had to have a translation of drive to UNC path stored server side. I did have a luxury though of knowing which drive would map to what UNC path. If the user can set arbitrary paths then this obviously won’t work.
Ok, as I’m procrastinating and avoiding real work I’ve given this some thought.
I’ll preface this by saying that I’m in no way a Linux expert and the system I’m about to describe has just been thought up off the top of my head and is not something you’d want to put into any kind of production. However, it might help you down the right path.
So, you have 3 servers: the Interface Server (LAMP stack, I'm assuming?), your Staging Server and your File Store Server. You will also have client machines and network shares. For the purposes of this design, your network shares are hosted on *nix boxes that your File Store can scp from.
You’d create your frontend website that tracks and stores information about files etc. This will also hold the details about which files are being copied, which are in Staging and so on.
You'll also need some kind of service running on the File Store Server. I'll call this the File Copy Service. This will be responsible for copying the files from the servers hosting the network shares.
Now, you’ve still got an issue with how you figure out what path the users file is actually on. If you can stop users from mapping their own drives and force them to use consistent drive letters then you could keep a translation of drive letter to UNC path on the server. If you can’t, well I’ll let you figure that out. If you’re in a windows domain you can force the drive mappings using Group Policies.
Anyway, the process for the system would work something like this.
The user goes to the system and selects a file
The Interface Server takes the file path and calls the File Copy Service on the File Store Server
The File Copy Service connects to the server that hosts the file and initiates the copy. If they're all *nix boxes you could easily use something like scp. I haven't actually looked it up, but I'd be very surprised if you can't get a running percentage-complete figure out of scp as it copies. With that figure, the File Copy Service updates the database on the Interface Server with how the copy is doing, so the user can see the progress from the Interface Server (a rough sketch of such a service follows this list).
The File Copy Service can also be used to move files from the File Store to the staging server.
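Very roughly, the File Copy Service on the File Store Server could look something like the sketch below. The copy_jobs table, its columns and the host names are all invented for the example, and this version only records start and finish rather than a live percentage (rsync --progress would be easier to parse than scp if you need that):

<?php
// file_copy_service.php - run periodically (or as a worker) on the File Store Server.
$pdo = new PDO('mysql:host=interface.internal;dbname=filestore', 'user', 'pass');

// Pick up the next pending copy job queued by the Interface Server.
$job = $pdo->query("SELECT id, source_host, source_path, dest_path
                    FROM copy_jobs WHERE status = 'pending' LIMIT 1")->fetch(PDO::FETCH_ASSOC);

if ($job) {
    $pdo->exec("UPDATE copy_jobs SET status = 'copying' WHERE id = " . (int)$job['id']);

    // Pull the file from the box hosting the network share (key-based SSH auth assumed).
    $cmd = sprintf('scp -q %s:%s %s',
        escapeshellarg($job['source_host']),
        escapeshellarg($job['source_path']),
        escapeshellarg($job['dest_path']));
    exec($cmd, $output, $exitCode);

    $status = ($exitCode === 0) ? 'done' : 'failed';
    $pdo->exec("UPDATE copy_jobs SET status = '$status' WHERE id = " . (int)$job['id']);
}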
As I said, this is very roughly thought out. The above would work, but it all depends a lot on how your systems are set up.
Having said all that though, there must be software that would do this out there. Have you looked?
If I have understood your architecture correctly, here is how I would approach it:
1.)
First, let's solve the issue of the inter-server transfer.
I would solve this by mounting the file systems of Servers 2 and 3 on Server 1 via NFS.
https://help.ubuntu.com/8.04/serverguide/network-file-system.html
That way PHP can store files directly on the file system and does not need to know which server the files really live on (see the short PHP sketch after the mount commands below).
/etc/exports
on Servers 2 and 3:
/directory/with/files 192.168.IPofServer.1(rw,sync)
exportfs -ra
/etc/fstab
on Server 1:
192.168.IPofServer.2:/directory/with/files /mnt/server2 nfs rsize=8192,wsize=8192,timeo=14,intr 0 0
192.168.IPofServer.3:/directory/with/files /mnt/server3 nfs rsize=8192,wsize=8192,timeo=14,intr 0 0
mount -a
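With the mounts in place, the upload handler on Server 1 just writes into the mounted directory; here is a minimal sketch, assuming a plain file-upload field named "userfile" and the /mnt/server2 mount point from the fstab above:

<?php
// Runs on Server 1; /mnt/server2 is the NFS mount of Server 2 from /etc/fstab.
$target = '/mnt/server2/' . basename($_FILES['userfile']['name']);

if (move_uploaded_file($_FILES['userfile']['tmp_name'], $target)) {
    echo 'Stored on the file server as ' . $target;
} else {
    header('HTTP/1.1 500 Internal Server Error');
    echo 'Upload failed';
}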
2.)
Getting upload progress for really large files: here are some possibilities for showing a progress bar for HTTP uploads. For a resume function, however, you would have to use a Flash plugin.
http://fineuploader.com/#demo
https://github.com/valums/file-uploader
or you can build it yourself using the APC extension:
http://www.amwsites.com/blog/2011/01/use-a-combination-of-jquery-php-apc-uploadprogress-to-show-progress-bar-during-an-upload/
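A rough sketch of the APC variant: it assumes apc.rfc1867 = 1 in php.ini and that the upload form contains a hidden APC_UPLOAD_PROGRESS field (placed before the file field) holding the same key the JavaScript polls with:

<?php
// progress.php - polled via AJAX while the upload is running.
$status = apc_fetch('upload_' . $_GET['key']);

header('Content-Type: application/json');
if ($status === false || $status['total'] == 0) {
    echo json_encode(array('percent' => 0));
} else {
    echo json_encode(array('percent' => round($status['current'] / $status['total'] * 100)));
}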
3.)
Letting the server load files from a network drive.
I would try this with a Java applet that figures out the real network path and sends it to the server, so the server can fetch the file in the background.
But I have never done anything like this before and have no further information.
I have a webapp where people are allowed to upload files. The webapp and upload form run on VPS1 (24 GB); I have another server called VPS2 (1 TB). I want users to use the webapp to upload files and for the files to be stored on VPS2. However, I'm not sure of the best way to do this: should I upload the file to VPS1 and then transfer it to VPS2 via FTP (or some other method), or should I upload it directly to VPS2 using a POST to a web server running on VPS2? This has to be scalable; I will be adding more web servers in the future.
I have thought about putting all the storage VPS servers in a PHP array and randomly selecting which one to post files to, but I'm not sure; I'm really lost and would appreciate some advice.
1. You can post your files to a PHP script on VPS2 and store them there. That's a good option, and for scalability you can choose which server to use based on which is nearest to the client, or pick one at random (a rough sketch follows below). This is the best option I see here; the rest of the work is in your database.
2. You can also back up a certain number of files to your VPS2 server with a Linux script when the disk is full, using the servers' local IPs if you have a private network shared between them.
Still, the first option is better. You can have different subdomains for the different web servers, like vps1.domain.com/file01 and vps2.domain.com/file02 and so on, and obviously the scripts on the different servers will depend on sessions, cookies and the database.
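A rough sketch of the first option: the web server picks one of the storage VPSes (here simply at random) and forwards the upload to a receiving script on it with cURL. The host names and the receive.php endpoint are made up for the example:

<?php
// Runs on VPS1 (the web server): forward the uploaded file to one storage server.
$storageServers = array('vps2.example.com', 'vps3.example.com'); // add more as you scale
$target = $storageServers[array_rand($storageServers)];

$ch = curl_init('http://' . $target . '/receive.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    // CURLFile requires PHP 5.5+; older versions used the "@/path/to/file" syntax instead.
    'file' => new CURLFile($_FILES['file']['tmp_name'], null, $_FILES['file']['name']),
));
$response = curl_exec($ch);
curl_close($ch);

// Then record in your database which server ended up holding the file.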
I'm writing an API in PHP that wraps a website's functionality and returns everything as JSON/XML. I've been using cURL and so far it's working great.
The website has a standard file-upload POST that accepts file(s) up to 1 GB.
So the problem is: how do I redirect the file upload stream to the corresponding website?
I could download the file and then upload it, but my server limits me to just 20 MB, and that seems like a poor solution.
Is it even possible to control the stream and redirect it directly to the website?
I preserved the original answer at the bottom for posterity, but as it turns out, there is a way to do this.
What you need is a combination of the HTTP PUT method (which unfortunately isn't available in native browser forms), the PHP php://input wrapper, and a streaming PHP socket. This gets around several limitations - PHP doesn't expose php://input for multipart POST data, but it does nothing to stop you reading PUT file data from it - clever!
If you're going to attempt this with Apache, you're going to need mod_actions installed and activated. You're also going to need to specify a PUT script with the Script directive in your virtualhost/.htaccess.
http://blog.haraldkraft.de/2009/08/invalid-command-script-in-apache-configuration/
This allows PUT requests for only one URL endpoint: the file that will open the socket and forward the data elsewhere. In the example below, this is just index.php.
I've prepared a boilerplate example of this using the Python requests module as the client sending the PUT request with an image file. If you run remote_server.py it will open a service that just listens on a port and awaits the forwarded message from PHP. put.py sends the actual PUT request to PHP. You'll need to set the hosts in put.py and index.php to the ones you define in your virtual host, depending on your setup.
Running put.py will open the included image file, send it to your php virtual host, which will, in turn, open a socket and stream the received data to the python pseudo-service and print it to stdout in the terminal. Streaming PHP forwarder!
There's nothing stopping you from using any remote service that listens on a TCP port in the same way, in another language entirely. The client could be rewritten the same way, so long as it can send a PUT request.
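To give a flavour of the PHP side, here is a stripped-down sketch of such an index.php; the host and port of the receiving service are placeholders, and the full working version lives in the repository linked below:

<?php
// index.php - Apache hands PUT requests to this script via the Script directive.
// Stream the request body straight to the remote service instead of buffering the whole file.
$in  = fopen('php://input', 'rb');
$out = fsockopen('remote.example.com', 8080, $errno, $errstr, 30); // placeholder host/port

if ($in && $out) {
    while (!feof($in)) {
        fwrite($out, fread($in, 8192)); // forward 8 KB chunks as they arrive
    }
    fclose($out);
    fclose($in);
    echo "Streamed OK\n";
} else {
    header('HTTP/1.1 502 Bad Gateway');
    echo "Could not open stream: $errstr\n";
}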
The complete example is here:
https://github.com/DeaconDesperado/php-streamer
I actually had a lot of fun with this problem. Please let me know how it works and we can patch it together.
Begin original answer
There is no native way in PHP to pass a file asynchronously as it comes in with the request body without saving its state down to disk in some manner. This means you are hard-bound by the memory limit on your server (20 MB). The manner in which the $_FILES superglobal is initialized after the request is received depends upon this, as it will attempt to migrate that multipart data to a tmp directory.
Something similar can be achieved with the use of sockets, as this will circumvent the HTTP protocol at least, but if the file is passed in the HTTP request, PHP is still going to attempt to save it statefully in memory before it does anything at all with it. You'd have the tail end of the process set up with no practical way of getting that far.
The Stream library comes close, but it still relies on reading the file out of memory on the server side - it has to already be there.
What you are describing is a little bit outside of the HTTP protocol, especially since the request body is so large. HTTP is a request/response based mechanism, and one depends upon the other... it's very difficult to accomplish an in-place, streaming upload at an intermediary point, since this would imply some protocol that uploads while the bits are streamed in.
One might argue this is more a limitation of HTTP than of PHP, and since PHP is designed expressly with the HTTP protocol in mind, you are moving about outside its comfort zone.
Deployments like this are regularly attempted, with great success, using other scripting languages (Twisted in Python, for example; a lot of people are getting on board with Node.js for its concurrent design pattern; and there are alternatives in Ruby and Java that I know much less about).
I have a file-uploading site which currently sits on a single server, i.e. the same server is used both for users to upload files to and for content delivery.
What I want to implement is a CDN (content delivery network). I would like to buy a server farm; if I had some mechanism to spread the files out across the different servers, that would balance my load a whole lot better.
However, I have a few questions regarding this:
Assuming my server farm consists of 10 servers for content delivery,
Since, at the user's end, the upload script points to a single location, i.e. <form action=upload.php>, it has to reside on a single server, correct? How can I duplicate the script across multiple servers and direct the user's file upload data to the server with the least load?
How should I determine which files are sent to which server? During the upload process, should I send all files to random servers? If the user sends 10 files, should I send each to a random server? Is there a mechanism to send them to the server with the least load? Is there any other algorithm that can help determine which server the files should be sent to?
How will the files be sent from the upload server to the CDN? Using FTP? Wouldn't that introduce additional overhead and require error-checking capability to detect a broken FTP connection, check whether the file was transferred successfully, and so on?
Assuming you're using an Apache server, there is a module called mod_proxy_balancer. It handles all of the load-balancing work behind the scenes. The user will never know the difference -- except when their downloads and uploads are 10 times faster.
If you use this, you can have a complete copy on each server.
mod_proxy_balancer will handle this for you.
Each server can have its own sub-domain. You will have a database on your 'main' server, which matches up all of your download pages to the physical servers they are located on. Then an on-the-fly URL is generated, based on some hashing scheme, which prevents hard-linking to the download and increases your page hits. It could be a mix of personal and miscellaneous information, e.g. the user's IP and the time of day. The download server then checks the hash, and either accepts or denies the request.
If everything checks out, the download starts, your load is balanced, and the users don't have to worry about any of this behind-the-scenes stuff.
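As a rough sketch of that hand-off, here is one way to build and verify such a time-limited URL with an HMAC; the shared secret, field names and URL layout are all invented for the example:

<?php
// On the main server: build a time-limited download URL for a file stored on "dl3".
$secret  = 'shared-secret-known-to-all-servers'; // assumed to be configured on every box
$fileId  = 42;                                   // example file id from the database
$expires = time() + 600;                         // link valid for 10 minutes
$token   = hash_hmac('sha256', $fileId . '|' . $_SERVER['REMOTE_ADDR'] . '|' . $expires, $secret);
$url     = "http://dl3.example.com/get.php?file=$fileId&expires=$expires&token=$token";

// On the download server (get.php): recompute the token and compare before serving the file.
$check = hash_hmac('sha256', $_GET['file'] . '|' . $_SERVER['REMOTE_ADDR'] . '|' . $_GET['expires'], $secret);
if ($_GET['expires'] > time() && hash_equals($check, $_GET['token'])) {
    // ... stream the file to the client ...
} else {
    header('HTTP/1.1 403 Forbidden');
}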
note: I have done Apache administration and web development. I have never managed a large CDN, so this is based on what I have seen in other sites and other knowledge. Anyone who has something to add here, or corrections to make, please do.
Update
There are also companies that manage it for you. A simple Google search will get you a list.
I need to send a file from one PHP page (on which clients upload their files) to another PHP page on another server where the files will finally be stored.
To communicate I currently use the JSON-RPC protocol; is it wise to send the file this way?
$string = file_get_contents("uploaded_file_path");
send the string to the remote server, and then
file_put_contents("file_name", $received_string_from_remote);
I understand that this approach takes twice the time than uploading directly to the second server.
Thanks
[edit]
Details: I need to write a service that allows a PHP (maybe Joomla) user to use a simple API to upload files and send some other data to my server, which analyses them, puts them in a DB and sends back a response.
[re edit]
I need to create a simple method that lets the end user do this from the interface on server 1 (the uploading side) using PHP alone and nothing more, so no remote SSH mounts or other strange tricks.
If I were you, I'd send the file directly to the second server and store its file name and/or some hash of the file name (for easier retrieval) in a database on the first server.
Using this approach, you could query the second server from the first one for the status of the operation. This way, you can leave the file processing to the second machine, and assign user interaction to the first machine.
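For example, the first server could poll the second one for progress with a small status script; the URL, parameter and response format below are invented for the sketch:

<?php
// Runs on server 1: ask server 2 whether it has finished processing a given file.
$hash = $_GET['hash']; // the hash stored on server 1 when the upload was first registered

$ch = curl_init('http://server2.example.com/status.php?hash=' . urlencode($hash));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$status = json_decode(curl_exec($ch), true);
curl_close($ch);

header('Content-Type: application/json');
echo json_encode(array('state' => $status['state'])); // e.g. "processing" or "done"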
As I said in my comment, THIS IS NOT RECOMMENDED, but anyway...
You can use sockets reading byte by byte:
http://php.net/manual/en/book.sockets.php
or you can use FTP:
http://php.net/manual/en/book.ftp.php
Anyway, the question with your approach is whether to run the process asynchronously or synchronously with the user's navigation. I really suggest you pass the file via SQL or FTP and give the user a response based on another event (like a file watcher, then an email, etc.), or store it in SQL (binary, BLOB, etc.).
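For completeness, here is a minimal sketch of the FTP variant using PHP's built-in FTP functions; the host, credentials and paths are placeholders:

<?php
// Push the received file from server 1 to server 2 over FTP.
$conn = ftp_connect('server2.example.com');
ftp_login($conn, 'ftpuser', 'ftppass');
ftp_pasv($conn, true); // passive mode usually plays nicer with firewalls

if (ftp_put($conn, 'incoming/file_name', 'uploaded_file_path', FTP_BINARY)) {
    echo "Transferred\n";
} else {
    echo "Transfer failed\n";
}
ftp_close($conn);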
Use SSHFS on machine 1 to map a file path to machine 2 (using SSH) and save the uploaded file to machine 2. After the file is uploaded, trigger machine 2 to do the processing and report back as normal.
This would allow you to upload to machine 1, but actually stream it to machine 2's HD so it can be processed faster on that machine.
This will be faster than any SQL or manual file copy solution, because the file transfer happens while the user is uploading the file.
If you don't need the files immediately after receiving them (for processing, etc.), then you can save them all in one folder on Server 1 and set up a cron job to scp the contents of the folder to Server 2. Assuming you are using Linux servers, this is one of the most secure and efficient ways to do it.
For more info please take a look at http://en.wikipedia.org/wiki/Secure_copy or google scp.