How to implement a distributed file upload solution? - php

I have a file-uploading site that currently runs on a single server, i.e. the same server handles both user uploads and content delivery.
What I want to implement is a CDN (content delivery network). I would like to buy a server farm, and if I had a mechanism to spread files out across the different servers, that would balance my load a whole lot better.
However, I have a few questions regarding this:
Assuming my server farm consists of 10 servers for content delivery,
Since, at the user end, the upload script will live at a single location, i.e. <form action=upload.php>, it has to reside on a single server, correct? How can I duplicate the script across multiple servers and direct the user's file upload data to the server with the least load?
How should I determine which files get sent to which server? During the upload process, should I scatter all files across random servers? If a user sends 10 files, should I send them to a random server? Is there a mechanism to send them to the server with the least load? Is there any other algorithm that can help determine which server the files should be sent to?
How will the files be sent from the upload server to the CDN? Using FTP? Wouldn't that introduce additional overhead and the need for error checking, e.g. to detect a broken FTP connection and to verify that each file transferred successfully?

Assuming you're using an Apache server, there is a module called mod_proxy_balancer. It handles all of the load-balancing work behind the scenes. The user will never know the difference -- except when their downloads and uploads are 10 times faster.
If you use this, you can have a complete copy on each server.
mod_proxy_balancer will handle this for you.
Each server can have its own sub-domain. You will have a database on your 'main' server which maps each of your download pages to the physical server it is located on. Then an on-the-fly URL is generated using some hashing scheme, which prevents hard-linking to the download and increases your page hits. The hash could be built from a mix of personal and miscellaneous information, e.g. the user's IP and the time of day. The download server then checks the hash and either accepts or denies the request.
If everything checks out, the download starts, your load is balanced, and the users don't have to worry about any of this behind-the-scenes stuff.
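As a rough PHP sketch of that hashed-URL idea (the shared secret, the 10-minute expiry and the get.php script name are my own assumptions, not part of any particular CDN product):

<?php
// Rough sketch of the hashed-URL idea above. The shared secret, the
// 10-minute expiry and the script name get.php are assumptions.
$secret = 'secret-shared-with-all-download-servers';

// On the main server: build an expiring, signed URL for a file that the
// database says lives on a particular download server.
function buildDownloadUrl($host, $fileId, $userIp, $secret)
{
    $expires = time() + 600; // link valid for 10 minutes (arbitrary)
    $token   = hash_hmac('sha256', $fileId . '|' . $userIp . '|' . $expires, $secret);
    return 'http://' . $host . '/get.php?file=' . urlencode($fileId)
         . '&expires=' . $expires . '&token=' . $token;
}

// On the download server (get.php): recompute the hash and accept or deny.
function verifyDownloadRequest($query, $userIp, $secret)
{
    if (time() > (int) $query['expires']) {
        return false; // link has expired
    }
    $expected = hash_hmac('sha256', $query['file'] . '|' . $userIp . '|' . $query['expires'], $secret);
    return $expected === $query['token'];
}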
Note: I have done Apache administration and web development, but I have never managed a large CDN, so this is based on what I have seen on other sites and general knowledge. Anyone who has something to add here, or corrections to make, please do.
Update
There are also companies that manage it for you. A simple Google search will get you a list.

Related

How to balance web server bandwidth usage?

I have a drupal commerce website in which users upload a lot of images all the time. Each commerce order has n images.
I would like to balance network traffic in order to save bandwidth (bandwidth is limited on each server). I cannot use a conventional load-balancing solution because the balancer server will also have limited bandwidth. My database will be on a separate server.
I would like to find a solution that handles requests directly on each server and persists connections by session, so that all of a user's uploads land on the same server. I don't think DNS round-robin balancing is a good solution, because requests would be received by any server and the files would not all end up in the same place.
I had thought that I could create one subdomain per server and redirect from my main Drupal instance to another server, so that all subsequent requests would be received by that server... but I'm not sure it is a good solution, and I don't know whether it is possible and practical.
Can anyone suggest an alternative?
My site runs on PHP 5.x
Excuse me for my weak English. To give you a better picture:
Creating a sub-domain is not a good solution, because it still uses the bandwidth of the same domain.
So, the solution with the least bandwidth consumption on the main site is this:
You can use Ajax to upload to one or more other servers (servers with unlimited bandwidth).
On those servers, after storing the image, use an API (REST or SOAP) to store the image's URL back on the original server, or to get a registration number back from that web server.
This method puts only a very small load on the original server, and your images will be served from the other server when they are displayed on the website.
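A rough sketch of the upload handler on one of those image servers, assuming a hypothetical img1.example.com host, an order_id field and a REST endpoint on the main site (all names are made up):

<?php
// upload.php on the image server. Field names, the callback URL and the
// API key are assumptions, not part of any real Drupal module.
if (!isset($_FILES['image']) || $_FILES['image']['error'] !== UPLOAD_ERR_OK) {
    http_response_code(400);
    exit('upload failed');
}

$name   = uniqid('img_', true) . '.jpg';          // hypothetical naming scheme
$target = '/var/www/images/' . $name;
move_uploaded_file($_FILES['image']['tmp_name'], $target);

// Tell the main server where the image ended up, via a simple REST call.
$ch = curl_init('http://www.example.com/api/register-image');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => json_encode([
        'order_id' => isset($_POST['order_id']) ? $_POST['order_id'] : null,
        'url'      => 'http://img1.example.com/images/' . $name,
    ]),
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json', 'X-Api-Key: secret'],
    CURLOPT_RETURNTRANSFER => true,
]);
curl_exec($ch);
curl_close($ch);

// Reply to the Ajax caller so the browser can show the stored image.
echo json_encode(['url' => 'http://img1.example.com/images/' . $name]);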

Inter-network File Transfers using PHP with polling

I am designing a web-based file-management system that can be conceptualised as 3 different servers:
The server that hosts the system interface (built in PHP) where users 'upload' and manage files (no actual files are stored here, it's all meta).
A separate staging server where files are placed to be worked on.
A file-store where the files are stored when they are not being worked on.
All 3 servers will be *nix-based on the same internal network. Users, based in Windows, will use a web interface to create an initial entry for a file on Server 1. This file will be 'uploaded' to Server 3 either from the user's local drive (if the file doesn't currently exist anywhere on the network) or another network drive on the internal network.
My question relates to the best programmatic approach to achieve what I want to do, namely:
When a user uploads a file (selecting the source via a web form) from the network, the file is transferred to Server 3 as an inter-network transfer, rather than passing through the user (which I believe is what would happen if it was sent as a standard HTTP form upload). I know I could set up FTP servers on each machine and attempt to FXP files between locations, but is this preferable to PHP executing a command on Server 1 (which will have global network access), to perform a cross-network transfer that way?
The second problem is that these are very large files we're talking about, at least a gigabyte or two each, and so transfers will not be instant. I need some method of polling the status of the transfer, and returning this to the web interface so that the user knows what is going on.
Alternatively this upload could be left to run asynchronously to the user's current view, but I would still need a method to check the status of the transfer to ensure it completes.
So, if using an FXP solution, how could polling be achieved? If using a file move/copy command from the shell, is any form of polling possible? PHP/JQuery solutions would be very acceptable.
The final part of this question relates to Windows network drive mapping. A user may map a drive, and select a file from, an arbitrarily specified mapped drive. Their G:\ may map to \\server4\some\location\therein, but presumably any drive path given to the server via a web form will only send the G:\ file path. Is there a way to determine the 'real path' of mapped network drives?
Any solution would be used to stage files from Server 3 to Server 2 when the files are being worked on - the emphasis being on these giant files not having to pass through the user's local machine first.
Please let me know if you have comments and I will try to make this question more coherent if it is unclear.
As far as I’m aware (and I could be wrong) there is no standard way to determine the UNC path of a mapped drive from a browser.
The only way to do this would be to have some kind of control within the web page. It could be ActiveX or maybe Flash. I've seen ActiveX do this, but not Flash.
In the past, when designing web-based systems that need to know the UNC path of a user's mapped drive, I've had to keep a translation of drive letter to UNC path stored server-side. I did have the luxury, though, of knowing which drive would map to which UNC path. If the user can set arbitrary paths then this obviously won't work.
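For what it's worth, the translation table itself is trivial; a rough PHP sketch (the drive letters and UNC paths are only examples and must match whatever mappings your users actually have):

<?php
// Server-side drive-letter-to-UNC translation table (example values only).
$driveMap = [
    'G:' => '\\\\server4\\some\\location\\therein',
    'H:' => '\\\\server5\\projects',
];

// Turn a path submitted from the web form (e.g. "G:\clientA\big_file.mov")
// into a UNC path the servers can reach, or null if the drive is unknown.
function toUncPath($clientPath, array $driveMap)
{
    $drive = strtoupper(substr($clientPath, 0, 2));
    if (!isset($driveMap[$drive])) {
        return null;
    }
    return $driveMap[$drive] . substr($clientPath, 2);
}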
Ok, as I’m procrastinating and avoiding real work I’ve given this some thought.
I’ll preface this by saying that I’m in no way a Linux expert and the system I’m about to describe has just been thought up off the top of my head and is not something you’d want to put into any kind of production. However, it might help you down the right path.
So, you have 3 servers: the Interface Server (LAMP stack I'm assuming?), your Staging Server and your File Store Server. You will also have Client Machines and Network Shares. For the purpose of this design, your Network Shares are hosted on *nix boxes that your File Store can scp from.
You’d create your frontend website that tracks and stores information about files etc. This will also hold the details about which files are being copied, which are in Staging and so on.
You'll also need some kind of service running on the File Store Server. I'll call this the File Copy Service. This will be responsible for copying the files from the servers hosting the network shares.
Now, you've still got an issue with how you figure out which path the user's file is actually on. If you can stop users from mapping their own drives and force them to use consistent drive letters, then you could keep a translation of drive letter to UNC path on the server. If you can't, well, I'll let you figure that out. If you're in a Windows domain you can force the drive mappings using Group Policies.
Anyway, the process for the system would work something like this.
User goes to system and selects a file
The Interface Server takes the file path and calls the File Copy Service on the File Store Server
The File Copy Service connects to the server that hosts the file and initiates the copy. If they're all *nix boxes you could easily use something like SCP. Now, I haven't actually looked it up, but I'd be very surprised if you can't get a running total of percentage complete as it's copying (see the sketch after this list). With this running total, the File Copy Service updates the database on the Interface Server with how the copy is doing, so the user can see it from the Interface Server.
The File Copy Service can also be used to move files from the File Store to the staging server.
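Something like this rough PHP sketch could sit behind the File Copy Service. I've swapped scp for rsync over ssh here, because rsync still prints a parseable percentage with --progress when it isn't attached to a terminal; the host names, paths and progress callback are all assumptions.

<?php
// Very rough sketch of the File Copy Service copy loop.
function copyWithProgress($source, $dest, $onProgress)
{
    $cmd  = 'rsync --progress -e ssh ' . escapeshellarg($source) . ' ' . escapeshellarg($dest);
    $proc = proc_open($cmd, [1 => ['pipe', 'w'], 2 => ['pipe', 'w']], $pipes);

    while (!feof($pipes[1])) {
        $chunk = fread($pipes[1], 4096);
        // rsync rewrites its progress line in place, e.g. " 1,234,567  42%  ..."
        if ($chunk !== false && preg_match_all('/(\d+)%/', $chunk, $m)) {
            $onProgress((int) end($m[1])); // e.g. UPDATE transfers SET percent = ...
        }
    }
    fclose($pipes[1]);
    fclose($pipes[2]);
    return proc_close($proc);
}

// Hypothetical usage: pull a file from the share host onto the File Store.
copyWithProgress(
    'user@server4:/exports/clientA/big_file.mov',
    '/filestore/incoming/big_file.mov',
    function ($percent) { /* write $percent to the Interface Server's DB */ }
);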
As I said, this is very roughly thought out. The above would work, but it all depends a lot on how your systems are set up, etc.
Having said all that though, there must be software that would do this out there. Have you looked?
If I am right, this is the architecture:
1.)
First, let's solve the issue of inter-server transfer.
I would solve this by mounting the file systems of Servers 2 and 3 on Server 1 via NFS.
https://help.ubuntu.com/8.04/serverguide/network-file-system.html
That way PHP can store files directly on the file system and does not need to know which server the files are really on.
/etc/exports on Servers 2 and 3:
/directory/with/files 192.168.IPofServer.1(rw,sync)
then run: exportfs -ra
/etc/fstab on Server 1 (note the distinct mount points, one per remote server):
192.168.IPofServer.2:/var/lib/data/server2/ /directory/with/files/server2 nfs rsize=8192,wsize=8192,timeo=14,intr
192.168.IPofServer.3:/var/lib/data/server3/ /directory/with/files/server3 nfs rsize=8192,wsize=8192,timeo=14,intr
then run: mount -a
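With the mounts in place, the PHP side really is just a local write; a minimal sketch (the mount point and form field name are assumptions):

<?php
// Storing an upload is a plain local file operation once NFS is mounted.
$target = '/directory/with/files/server2/' . basename($_FILES['upload']['name']);

if (move_uploaded_file($_FILES['upload']['tmp_name'], $target)) {
    // Record $target in the database; PHP never needs to care that the
    // bytes actually live on Server 2 behind NFS.
}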
2.)
Getting upload progress for really large files:
here are some possibilities for showing a progress bar for HTTP uploads.
But for a resume function you would have to use a Flash plugin.
http://fineuploader.com/#demo
https://github.com/valums/file-uploader
or you can build it yourself using the APC extension:
http://www.amwsites.com/blog/2011/01/use-a-combination-of-jquery-php-apc-uploadprogress-to-show-progress-bar-during-an-upload/
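A rough sketch of the APC variant, assuming apc.rfc1867 = 1 is enabled in php.ini and the upload form contains a hidden APC_UPLOAD_PROGRESS field (placed before the file input) holding a unique key:

<?php
// progress.php -- polled by JavaScript every second or so with the same
// key that was put in the form's hidden APC_UPLOAD_PROGRESS field.
$key    = isset($_GET['key']) ? $_GET['key'] : '';
$status = apc_fetch('upload_' . $key);

if ($status === false || $status['total'] == 0) {
    echo json_encode(['percent' => 0, 'done' => false]);
} else {
    echo json_encode([
        'percent' => (int) round($status['current'] / $status['total'] * 100),
        'done'    => (bool) $status['done'],
    ]);
}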
3.)
Letting the server load files from a network drive:
I would try this with a Java applet that figures out the real network path and sends it to the server, so the server can fetch the file in the background.
But I have never done anything like this before and have no further information.

Uploading files to a storage server from a web app

I have a webapp where people are allowed to upload files. The webapp and upload form run on VPS1 (24 GB); I have another server, VPS2 (1 TB). I want users to use the webapp to upload files and for the files to be stored on VPS2. However, I'm not sure of the best way to do this: would I upload the file to VPS1 and then transfer it to VPS2 via FTP (or some other method)? Or should I upload it directly to VPS2 using a POST to a web server running on VPS2? This has to be scalable; I will be adding more web servers in the future.
I had thought about putting all the storage VPS servers in a PHP array and randomly selecting which one to post files to. But I'm not sure; I'm really lost and would like some advanced help.
1. You can post your files to a PHP script on VPS2 and store the files there. That's a good option, and for scalability you can choose which server to use depending on which is nearest to the client, or pick one at random. This is the best option I see here; the rest of the work is in your database.
2. You can also back up a certain number of files to VPS2 with a Linux script when the disk is full, using the servers' local IPs if you have a local network shared between them.
Still, the first option is better. You can have different subdomains for the different web servers, like vps1.domain.com/file01 and vps2.domain.com/file02 and so on, and obviously the scripts on the different servers depend on sessions, cookies and the database.
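A minimal sketch of that first option, with the storage servers kept in a PHP array as you suggested (host names and script names are made up):

<?php
// On the web server: pick a storage VPS from a list and point the upload
// form straight at it, so the file bytes never pass through VPS1.
$storageServers = ['vps2.domain.com', 'vps3.domain.com'];
$target = $storageServers[array_rand($storageServers)];
?>
<form action="http://<?php echo $target; ?>/receive.php" method="post" enctype="multipart/form-data">
    <input type="file" name="upload">
    <input type="submit" value="Upload">
</form>

receive.php on the storage VPS would then just move_uploaded_file() the upload into place and record the resulting URL in the shared database.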

Splitting form submissions to speed up transfer time

I have a simple CRM system that allows sales to put in customer info and upload appropriate files to create a project.
The system is already being hosted in the cloud. But the office internet upload speed is horrendous. One file may take up to 15 minutes or more to finish, causing a bottleneck in the sales process.
Upgrading our office internet is not an option; what other good solutions are out there?
I propose splitting the project submission form into 2 parts. Project info fields are posted directly to our cloud-hosted webapp and stored in the appropriate DB table, while the file submission is actually submitted to a LAN server with a simple DB and API that the cloud-hosted webapp can communicate with to retrieve the file, via a download link, if it is ever needed again. Details need to be worked out for this set-up, but this is what I want to do in general.
Is this a good approach to solving this slow-upload problem? I've never done this before, so are there any obstacles to this implementation (cross-domain restrictions come to mind, but I believe that can be fixed by using an iFrame)?
If bandwidth is the bottleneck, then you need a solution that doesn't chew up all your bandwidth. You mentioned that you can't upgrade your bandwidth - what about putting in a second connection?
If not, the files need to stay on the LAN a little longer. It sounds like your plan would be to keep the files on the LAN forever, but you can store them locally initially and then push them later.
When you do copy the files out to the cloud, be sure to compress them and also set up rate limiting (so they take up maybe 10% of your available bandwidth during business hours).
Also put some monitoring in place to make sure the files are being sent in a timely manner.
I hope nobody needs to download those files! :(
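As a rough sketch of that store-locally-then-push approach, a cron job on the LAN server might look something like this (the paths, the upload URL and the ~100 KB/s cap are assumptions):

<?php
// Runs from cron. Compresses each pending file and pushes it to the cloud
// with cURL's upload-speed cap so it never saturates the office connection.
foreach (glob('/srv/pending-uploads/*') as $file) {
    $gz = $file . '.gz';
    file_put_contents($gz, gzencode(file_get_contents($file), 9));

    $ch = curl_init('https://cloud.example.com/api/upload');
    curl_setopt_array($ch, [
        CURLOPT_POST                 => true,
        CURLOPT_POSTFIELDS           => ['file' => new CURLFile($gz)],
        CURLOPT_MAX_SEND_SPEED_LARGE => 100 * 1024, // cap the upload at ~100 KB/s
        CURLOPT_RETURNTRANSFER       => true,
    ]);
    $ok = curl_exec($ch) !== false && curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
    curl_close($ch);

    if ($ok) {
        unlink($file);
        unlink($gz);
    } else {
        unlink($gz);
        error_log('push failed for ' . $file); // feed this into the monitoring
    }
}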

Getting data without scraping

I've got a directory where people submit data. It's stored and kept pending while it's moderated to make sure it's OK.
Once approved, I'd like a couple of other sites that I control, and a few that I won't (on different servers), to be able to grab that data. This would be on a cron or something, so there wouldn't be any human interaction. The other sites are fully dependent on that first moderation.
How do I go about doing this securely?
I've thought about grabbing it as RSS, parsing and storing. I've thought about doing SOAP requests, grabbing XML files, etc.
What would YOU do?
A logical means of securely distributing the data would be to use (S)FTP, ideally behind a firewall that only permits access from the various permitted machines by IP, etc.
To enable this, once you have the file on the "source" machine, you could simply:
Move the file into a local FTP folder. (You'll quite possibly have to FTP it in, even though it's on the same machine, depending on user rights, etc.) As a tip, FTP it into a temp directory in the FTP folder and then move (rename, in FTP parlance) it into a "for collection" folder once the FTP has completed. By doing this, you'll ensure that no partial files are collected.
Periodically check (via cron) the "for collection" folder from the various permitted machines.
Grab the file(s) if there are any new files awaiting collection.
There are a variety of PHP functions to assist with this, including ftp_ssl_connect, which uses FTP over SSL.
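A quick sketch of that temp-directory-then-rename pattern using those functions (the host, credentials and folder names are placeholders):

<?php
// Upload into a temp directory first, then rename into the collection
// folder so a consumer can never pick up a half-written file.
$conn = ftp_ssl_connect('ftp.example.com');
ftp_login($conn, 'username', 'password');
ftp_pasv($conn, true);

$local  = '/data/approved/item123.xml';
$remote = basename($local);

if (ftp_put($conn, '/incoming_tmp/' . $remote, $local, FTP_BINARY)) {
    ftp_rename($conn, '/incoming_tmp/' . $remote, '/for_collection/' . $remote);
}
ftp_close($conn);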
However, all that aside, it might be a lot less hassle to use something like rsync over ssh.
Why not have the storage site shoot a request over to the "subscribing" sites indicating new information is available (push notification)?
I.e., just make a page request to "newinfo.php?newinfo=true" or whatever on each of the sites. Then each of those sites can do whatever it likes, knowing there's more information available.
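In PHP, that ping could be as simple as the following (the URLs are examples):

<?php
// After moderation, ping each subscribing site so it knows to come and pull.
$subscribers = [
    'http://site-a.example.com/newinfo.php?newinfo=true',
    'http://site-b.example.com/newinfo.php?newinfo=true',
];
foreach ($subscribers as $url) {
    file_get_contents($url); // each site then fetches the new data however it likes
}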
