We have two Apache/PHP web servers load balanced behind a Squid page-caching proxy. Squid caching is not active on the image submission pages. We have a form where users can submit images.
It's a two-step process: first they upload the images, then in the second step they enter details about the images, and the images are moved to the correct folders once the details are submitted.
The problem is that under high traffic the second step might be served from a different server than the one holding the uploaded images. The second step then can't find the uploaded images and the upload fails to complete.
We have thousands of image files on these servers, so syncing between them is slow. Is there any way to force a specific page to always be served from a specific server? Basically, to bypass the load balancing for that page.
There are a few solutions to this.
1. Switch to nginx as a reverse proxy, which can stick clients to a host.
2. Make the upload directory an NFS share mounted on both hosts.
3. Upload the file into a MySQL table (probably best to use a hash table) so both servers can access it.
Personally I would go with option 1, as you still get round-robin load balancing but each client is stuck to the host it initially connected to (a minimal nginx sketch is below).
Option 2 has the benefit of still balancing requests equally, but the downside is that the NFS share is a single point of failure.
Option 3 can cause issues if you use a hash table and there is not enough RAM on the DB server.
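For option 1, a minimal sketch of what the nginx side could look like; the addresses, port and domain are placeholders, not your actual setup. ip_hash keeps a given client IP on the same backend, so both upload steps hit the same host:

    # Hypothetical /etc/nginx/conf.d/upload-balancer.conf
    upstream php_backends {
        # ip_hash pins each client IP to the same backend server,
        # so both upload steps land on the same host.
        ip_hash;
        server 10.0.0.11:80;   # web server 1 (placeholder address)
        server 10.0.0.12:80;   # web server 2 (placeholder address)
    }

    server {
        listen 80;
        server_name example.com;   # placeholder domain

        location / {
            proxy_pass http://php_backends;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }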
I can see two options close to Geoffrey's answer.
1. Upload images to an upload directory synchronized through rsync.
Then the number of images to sync would be much smaller and they would sync much faster.
After you go through the whole process you can move the image to the right folder.
2. DB: store not the image itself but the URL of the image, so you always know which server is holding it and can access it (a rough PHP sketch combining both ideas follows below).
Just two options that came to mind while reading Geoffrey's answer and looking for info related to this topic.
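A rough PHP sketch of the idea, just to illustrate; the paths, table name, column names and credentials are all made up:

    <?php
    // Hypothetical step-2 handler: the file was uploaded earlier into a small,
    // rsync-synchronized holding directory; once the user submits the details
    // we move it to its final folder and record a full URL (including the host
    // serving it) in the database.
    $uploadDir = '/var/www/uploads_pending/';      // small dir kept in sync via rsync (assumption)
    $finalDir  = '/var/www/images/';               // final destination folder (assumption)
    $fileName  = basename($_POST['file_name']);    // name chosen in step 1

    if (!is_file($uploadDir . $fileName)) {
        die('Upload not found on this server yet - please try again shortly.');
    }

    rename($uploadDir . $fileName, $finalDir . $fileName);

    // Store the absolute URL so any server knows where the image lives.
    $imageUrl = 'http://' . $_SERVER['HTTP_HOST'] . '/images/' . $fileName;

    $pdo  = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');  // placeholder credentials
    $stmt = $pdo->prepare('INSERT INTO images (url, details) VALUES (?, ?)');
    $stmt->execute([$imageUrl, $_POST['details']]);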
I have a Drupal Commerce website where users upload a lot of images all the time. Each commerce order has n images.
I would like to balance network traffic in order to save bandwidth (bandwidth is limited on each server). I cannot use a conventional load-balancing solution because the balancer server would also have limited bandwidth. My database will be on a separate server.
I would like to find a solution that handles requests directly on each server and persists connections by session, so that all of a user's uploads land on the same server. I don't think DNS round-robin balancing is a good solution, because requests can be received by any server and the files will not all end up in the same place.
I had thought I could give each server its own subdomain and redirect from my main Drupal instance to another server, so that all subsequent requests are received by that server... but I'm not sure it is a good solution, and I don't know whether it is possible and practical.
Can anyone suggest an alternative?
My site runs on PHP 5.x
Excuse me for my weak English. To give you a better understanding of the picture:
Making a subdomain is not a good solution, because it uses the bandwidth of the same domain.
So this solution has the least bandwidth consumption on the main site:
You can use Ajax to upload directly to another server or servers (with unlimited bandwidth).
On those servers, after storing the image, use an API (REST or SOAP) to store the URL on the original server, or to get a record ID back from the web server.
This method creates very little load on the original server, and your images will be served from another server for display on the website.
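A minimal sketch of what the receiving script on that image server might look like; the endpoint URL, hostnames and field names are invented for illustration:

    <?php
    // Hypothetical upload.php on the image/storage server: save the file sent
    // via Ajax, then report the resulting URL back to the main site over a
    // simple REST call.
    $targetDir = '/var/www/images/';                               // assumption
    $fileName  = basename($_FILES['image']['name']);
    move_uploaded_file($_FILES['image']['tmp_name'], $targetDir . $fileName);

    $imageUrl = 'http://images.example.com/images/' . $fileName;   // placeholder host

    // Tell the main server where the image ended up (endpoint is hypothetical).
    $ch = curl_init('http://www.example.com/api/register-image');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POSTFIELDS     => http_build_query([
            'order_id' => $_POST['order_id'],   // sent along by the Ajax form (assumption)
            'url'      => $imageUrl,
        ]),
    ]);
    $response = curl_exec($ch);
    curl_close($ch);

    echo $response;  // hand the main server's answer back to the Ajax caller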
With only a few more steps left before my portal website is complete, I'm now considering how to host the images.
For most of the first 1-2 years, the images + HTTP daemon (nginx) + MySQL database will be hosted on one VPS. But after that, as traffic increases, I will need to move to another solution, including scaling (MySQL as well as balancing nginx).
My first thought, which I'm implementing right now on the website, is to add a variable like $global_server_pictures_address in front of "/folder/1/123.jpg" (one of the uploaded images), which will change from $global_server_pictures_address = ""; to
$global_server_pictures_address = "http://195.22.31.14";.
This means nginx will be balanced across a few more VPSes serving local content, and on each nginx VPS, when a request comes in for an image, it will be loaded from $global_server_pictures_address.
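In PHP terms, a sketch of that idea could look like this; the IP and the image path are the ones from the post, everything else is illustrative:

    <?php
    // While everything lives on one VPS, the prefix is empty and images are
    // served locally; later it can be switched to the dedicated image VPS.
    $global_server_pictures_address = "";                       // phase 1: same VPS
    // $global_server_pictures_address = "http://195.22.31.14"; // phase 2: image VPS

    function picture_url($relativePath) {
        global $global_server_pictures_address;
        return $global_server_pictures_address . $relativePath;
    }

    // Usage in a template:
    echo '<img src="' . picture_url('/folder/1/123.jpg') . '" alt="">';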
Another idea that came to me: in the case of multiple VPSes serving the website (nginx balanced), each time a user uploads an image it would be copied via PHP's curl functions (FTP upload) to each server I have, reducing some of the bandwidth stress on the main 50 Mbps VPS image server. If we then have, say, 3 VPSes at 50 Mbps each, all holding the same images, the balancing would be good not only for nginx but also for bandwidth.
In this case, my $global_server_pictures_address would go away; we wouldn't need it anymore.
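A hedged sketch of pushing an uploaded image to each VPS with PHP's curl FTP upload; the IPs, credentials and paths are placeholders:

    <?php
    // After a user upload, copy the file to every image VPS over FTP so each
    // server holds the same content. Hosts and credentials are made up.
    $servers    = ['195.22.31.14', '195.22.31.15', '195.22.31.16'];  // placeholder IPs
    $localFile  = '/var/www/images/folder/1/123.jpg';
    $remotePath = '/images/folder/1/123.jpg';

    foreach ($servers as $host) {
        $fp = fopen($localFile, 'rb');
        $ch = curl_init("ftp://ftpuser:ftppass@{$host}{$remotePath}");  // placeholder login
        curl_setopt($ch, CURLOPT_UPLOAD, true);                         // FTP upload mode
        curl_setopt($ch, CURLOPT_INFILE, $fp);
        curl_setopt($ch, CURLOPT_INFILESIZE, filesize($localFile));
        curl_setopt($ch, CURLOPT_FTP_CREATE_MISSING_DIRS, true);
        if (curl_exec($ch) === false) {
            error_log("FTP upload to {$host} failed: " . curl_error($ch));
        }
        curl_close($ch);
        fclose($fp);
    }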
I'm waiting for other ideas (if you have any) and also for comments on my ideas; what do you think of them?
You could also use AWS S3 to store the images; that way all of your front-end servers would have access to them. There is a cost associated with storage and bandwidth.
You also have CloudFront (the AWS CDN) if you want better performance.
http://aws.amazon.com/s3/
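For example, with the AWS SDK for PHP, uploading a user file to S3 could look roughly like this; the bucket name, region and key prefix are placeholders:

    <?php
    // Rough sketch using the AWS SDK for PHP; credentials are assumed to come
    // from the environment or an instance role.
    require 'vendor/autoload.php';

    use Aws\S3\S3Client;

    $s3 = new S3Client([
        'version' => 'latest',
        'region'  => 'us-east-1',          // placeholder region
    ]);

    $s3->putObject([
        'Bucket'     => 'my-user-uploads',                        // placeholder bucket
        'Key'        => 'images/' . basename($_FILES['image']['name']),
        'SourceFile' => $_FILES['image']['tmp_name'],
        'ACL'        => 'public-read',
    ]);

    // Every front-end server can now build the same URL for the object,
    // e.g. https://my-user-uploads.s3.amazonaws.com/images/<name>.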
I have a webapp where people are allowed to upload files. The webapp and upload form run on VPS1 (24 GB), and I have another server, VPS2 (1 TB). I want users to upload files through the webapp and have the files stored on VPS2. However, I'm not sure of the best way to do this: should I upload the file to VPS1 and then transfer it to VPS2 via FTP (or another method), or should I upload it directly to VPS2 using a POST to a web server running on VPS2? This has to be scalable; I will be adding more web servers in the future.
I had thought about putting all the storage VPS servers in a PHP array and randomly selecting which one to post files to. But I'm not sure; I'm really lost and would like some more advanced help.
1. You can POST your files to a PHP script on VPS2 and store the files there. That's a good option, and for scalability you can pick which server to use based on which one is nearest to the client, or choose one randomly. This is the best option I see here; the rest of the work is in your database.
2. You can also move a certain number of files over to your VPS2 server with a Linux script when the disk fills up, using its local IP if you have a local network shared with the other server.
Still, the first option is better. You can have different subdomains for the different web servers, like vps1.domain.com/file01 and vps2.domain.com/file02 and so on, and obviously the scripts on the different servers depend on sessions, cookies, and the database; a rough sketch of forwarding an upload to VPS2 is below.
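A sketch of that first option under stated assumptions: the VPS2 endpoint, form field names, table name and credentials are all invented for illustration.

    <?php
    // On VPS1: receive the browser upload, then hand the file to VPS2 over HTTP.
    $tmpPath  = $_FILES['upload']['tmp_name'];
    $origName = $_FILES['upload']['name'];

    $ch = curl_init('http://vps2.domain.com/store.php');  // placeholder endpoint
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POSTFIELDS     => [
            'file' => new CURLFile($tmpPath, $_FILES['upload']['type'], $origName),
        ],
    ]);
    $storedUrl = curl_exec($ch);   // assume VPS2 replies with the final file URL
    curl_close($ch);

    // Record which server holds the file so later requests can find it.
    $pdo  = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');  // placeholder
    $stmt = $pdo->prepare('INSERT INTO files (url) VALUES (?)');
    $stmt->execute([$storedUrl]);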
I have a website that is currently utilizing two servers, an application server and a database server. However, the load on the application server is increasing, so we are going to add a second application server.
The problem I have is that the website has users upload files to the server. How do I get the uploaded files onto both of the servers?
I do not want to store images directly in the database, as our application is database-intensive already.
Is there a way to sync the servers with each other, or is there something else I can do?
Any help would be appreciated.
Thanks
EDIT: I am adding the following links for people that helped me understand this question more:
Synchronize Files on Multiple Servers
and
Keep Uploaded Files in Sync Across Multiple Servers - LAMP
For everyone reading this post: NFS seems to be the better of the two.
NFS will keep the files in sync, but you could also use FTP to upload the files to all servers; still, NFS looks like the way to go.
This is a question for serverfault.
Anyway, I think you should definitely consider moving to the "cloud".
Syncing uploads from one server to another is simply unreliable: you have no idea what kinds of errors you can get or why. The syncing process will also load both servers. For me, the proper solution is going to the cloud.
Should you choose the syncing method, you have a couple of options:
Use rsync to sync the files you need between the servers.
Use crontab to run the sync every X minutes/hours/days (see the sketch below).
Copy the files upon some event (user login, etc.).
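For example, a crontab entry along these lines would push new uploads across every five minutes; the user, hostname and paths are placeholders:

    # Hypothetical crontab entry on server A: push new uploads to server B
    # every 5 minutes (paths, user and hostname are placeholders).
    */5 * * * * rsync -az /var/www/uploads/ deploy@server-b:/var/www/uploads/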
I got this answer from server fault:
The most appropriate course of action in a situation like this is to break the file share into a separate service of its own. Don't duplicate files if you have a network that can let the files be "everywhere (almost) at once." You can do this through NFS/CIFS or through a proper storage protocol like iSCSI. Mount as local storage in the appropriate directory. Depending on the performance of your network and your storage needs, this could add a couple of undetectable milliseconds to page load time.
So using NFS to share the files between servers would work, OR,
as stated by @kgb, you could designate one single server to hold all uploaded files and have the other servers pull from it (just make sure you run a cron job or something to back up the files).
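A minimal sketch of the NFS approach, assuming a dedicated storage box; the hostname, network range and paths are placeholders:

    # On the storage server (placeholder addresses): /etc/exports
    /srv/uploads  10.0.0.0/24(rw,sync,no_subtree_check)

    # On each web server: mount the share where the app expects its uploads
    sudo mount -t nfs storage.internal:/srv/uploads /var/www/uploads

    # Or make it permanent via /etc/fstab
    storage.internal:/srv/uploads  /var/www/uploads  nfs  defaults  0  0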
Most sites solve this problem by using a third-party dedicated file store like Amazon S3 for user uploads.
Another option would be a piece of software called BTSync. It is very easy to install and use, and lets you easily keep files in sync across as many servers as you need. It takes only three terminal commands to install and is very efficient.
You could use the DB server for storage... not in the database, I mean: run a web server there too. It is not going to increase CPU load much, but it will require a better network link.
You could do it with rsync. People have suggested using NFS, but that way you create a single point of failure: if the NFS server goes down, both of your servers are stuck. Correct me if I'm wrong.
I have a file-uploading site which currently sits on a single server, i.e. the same server is used both for user uploads and for content delivery.
What I want to implement is a CDN (content delivery network). I would like to buy a server farm, and if I had a mechanism to spread the files out across the different servers, that would balance my load a whole lot better.
However, I have a few questions regarding this:
Assuming my server farm consists of 10 servers for content delivery,
Since, on the user's end, the upload script points to one location only, i.e. <form action=upload.php>, it has to reside on a single server, correct? How can I duplicate the script across multiple servers and direct the user's file upload data to the server with the least load?
How should I determine which files get sent to which server? During the upload process, should I randomize which server each file goes to? If the user sends 10 files, should I send them to a random server? Is there a mechanism to send them to the server with the least load? Is there any other algorithm that can help determine which server the files should be sent to?
How will the files be sent from the upload server to the CDN? Over FTP? Wouldn't that introduce additional overhead and the need for error checking, to detect FTP connection breaks and to verify that each file was transferred successfully, etc.?
Assuming you're using an Apache server, there is a module called mod_proxy_balancer. It handles all of the load-balancing work behind the scenes. The user will never know the difference -- except when their downloads and uploads are 10 times faster.
If you use this, you can have a complete copy on each server.
mod_proxy_balancer will handle this for you.
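A bare-bones sketch of what that could look like in the front-end Apache config; the backend addresses and the /files/ path are placeholders:

    # Hypothetical snippet for the front-end Apache (backend IPs are placeholders).
    # Requires mod_proxy, mod_proxy_http and mod_proxy_balancer to be enabled.
    <Proxy "balancer://cdn">
        BalancerMember http://10.0.0.21:80
        BalancerMember http://10.0.0.22:80
        # ... up to your 10 content servers
        ProxySet lbmethod=byrequests
    </Proxy>

    ProxyPass        "/files/" "balancer://cdn/files/"
    ProxyPassReverse "/files/" "balancer://cdn/files/"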
Each server can have its own subdomain. You will have a database on your 'main' server which matches each of your download pages to the physical server it is located on. Then an on-the-fly URL is generated based on some hashing algorithm, which prevents hard-linking to the download and increases your page hits. The hash could mix in personal and miscellaneous information, e.g. the user's IP and the time of day. The download server then checks the hash and either accepts or denies the request.
If everything checks out, the download starts; your load is balanced; and the users don't have to worry about any of this behind-the-scenes stuff.
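One hedged way to sketch that hash check in PHP; the shared secret, query parameters and one-hour expiry window are invented for illustration:

    <?php
    // Shared secret known to the main server and every download server (placeholder).
    const DOWNLOAD_SECRET = 'change-me';

    // On the main server: build an expiring, signed download URL.
    function signed_url($server, $file, $userIp) {
        $expires = time() + 3600;   // valid for one hour (assumption)
        $sig = hash_hmac('sha256', $file . '|' . $userIp . '|' . $expires, DOWNLOAD_SECRET);
        return "http://{$server}/get.php?f=" . urlencode($file)
             . "&e={$expires}&s={$sig}";
    }

    // On the download server (hypothetical get.php): re-compute the hash
    // and either accept or deny the request.
    function verify_request($file, $expires, $sig, $userIp) {
        if ($expires < time()) {
            return false;   // link has expired
        }
        $expected = hash_hmac('sha256', $file . '|' . $userIp . '|' . $expires, DOWNLOAD_SECRET);
        return hash_equals($expected, $sig);
    }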
Note: I have done Apache administration and web development. I have never managed a large CDN, so this is based on what I have seen on other sites and on general knowledge. Anyone who has something to add here, or corrections to make, please do.
Update
There are also companies that manage it for you. A simple Google search will get you a list.