I have a Drupal Commerce website in which users upload a lot of images all the time. Each commerce order has n images.
I would like to balance the network traffic in order to save bandwidth (bandwidth is limited for each server). I cannot use a conventional load-balancing solution because the balancer server would also have limited bandwidth. My database will be on a separate server.
I would like to find a solution that handles requests directly on each server and persists connections by session, so that all of a user's uploads end up on the same server. I don't think DNS round-robin balancing is a good solution, because requests would land on any server and the files would not all be in the same place.
I had thought that I could give each server its own subdomain and redirect from my main Drupal instance to one of the other servers, so that all subsequent requests are received by that server... but I'm not sure it is a good solution, and I don't know whether it is possible and practical.
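Roughly what I had in mind, as a minimal sketch (the subdomains and the random choice are only placeholders for illustration):

<?php
// On the main Drupal instance: pin each session to one upload server
// and send the user there (hypothetical subdomains).
session_start();

$upload_servers = array('http://upload1.example.com', 'http://upload2.example.com');

if (empty($_SESSION['upload_server'])) {
  // Naive choice; this could instead pick the server reporting the least traffic.
  $_SESSION['upload_server'] = $upload_servers[array_rand($upload_servers)];
}

header('Location: ' . $_SESSION['upload_server'] . '/image-upload');
exit;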
Can anyone suggest an alternative?
My site runs on PHP 5.x
Excuse my weak English. To give you a better understanding of the picture:
Making a sub-domain is not a good solution, because it still uses the bandwidth of the same domain.
So:
The following solution has the least bandwidth consumption on the main site.
You can use Ajax to upload directly to one or more other servers (with unlimited bandwidth).
On those servers, after storing the image, use an API (REST or SOAP) to store the image URL on the original server, or to get the registered ID back from the web server.
This method creates very little load on the original server, and your images will be served from the other servers when they are displayed on the website.
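For example, the receiving script on one of those image servers could look roughly like this (all URLs, field names and the /api/register-image endpoint are made up for illustration):

<?php
// upload.php on the image server: store the file, then report its URL
// back to the main site over a simple REST call.
$target = '/var/www/images/' . uniqid() . '_' . basename($_FILES['image']['name']);

if (move_uploaded_file($_FILES['image']['tmp_name'], $target)) {
  $image_url = 'http://images1.example.com/images/' . basename($target);

  $ch = curl_init('http://www.example.com/api/register-image');
  curl_setopt($ch, CURLOPT_POST, TRUE);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'order_id' => $_POST['order_id'],
    'url' => $image_url,
  ));
  curl_exec($ch);
  curl_close($ch);

  // The Ajax caller only needs to know where the image ended up.
  echo json_encode(array('url' => $image_url));
}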
So, for a simple test game, I'm working on generating user images based on their current in-game avatar. I got this idea from Club Penguin and GTA V. They both generate images of the current in-game avatar.
I created a script to simply put a few images together and print out the final image to the client. It's similar to how Club Penguin does it, I believe: http://cdn.avatar.clubpenguin.com/%7B13bcb2a5-2e21-442c-b8e4-10516be6abc6%7D/cp?size=300
As you can see, the penguin is wearing multiple clothing items. The items are each different images located at http://mobcdn.clubpenguin.com/game/items/images/paper/image/300/ (ex: http://mobcdn.clubpenguin.com/game/items/images/paper/image/300/210.png)
Anyway, I've already made the script and all, but I have a few questions.
When going to Club Penguin's or Grand Theft Auto's avatar generator, you'll notice it finishes the request very fast. Even for a new user (so before it has had a chance to cache the image, since it hasn't been generated yet), it finishes in under a second.
How could I possibly speed up the image generation process? Right now I'm just using PHP, but I could definitely switch over to another language. I know a few others too and I'm willing to learn. Which language can provide the fastest web-image generator (it has to connect to a database first to grab the user avatar info)?
For server specs, how much RAM and all that fun stuff would be an okay amount? Right now I'm using an OVH cloud server (VPS Cloud 2) to test it and it's fine. But if someone with experience could help: what might happen if I started getting a lot more traffic, with 100+ image requests per client when they first log in (a relationship system that shows their friends' avatars)? I'll probably use Cloudflare and other caching tools so that most of them get cached for a maximum of 24 hours, but I can't completely rely on that.
tl;dr:
Two main questions:
What's the fastest way to generate avatars on the web (right now I'm using PHP)?
What are some good server specs for around 100+ daily unique clients (at minimum) using this server for generating these avatars?
Edit: Another question, which webserver could process more requests for this? Right now I'm using Apache for this server, but my other servers are using nginx for other API things (like logging users in, getting info, etc).
IMHO, language is not the bottleneck. PHP is fast enough for real-time processing of small images; you just need the right algorithm. Also, check out bytecode caching engines such as APC, XCache, or even HHVM. They can significantly improve PHP performance.
I think any VPS can do the job until you have more than ~20 concurrent requests. The more clients use the service at the same time, the more RAM you need. You can easily determine your script's memory needs and other performance info by using a profiler such as XHProf.
Nginx or Lighttpd in FastCGI mode use less RAM than the Apache HTTP server and can handle more concurrent connections, but that doesn't matter until you actually have many concurrent connections.
Yes, PHP can do this job fast and flexibly (for example, generate.php?size=32).
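As a rough illustration of how little code that takes with GD (the layer file names and paths are made up; in reality they would come from the avatar lookup in the database):

<?php
// generate.php?size=300 -- stack a few transparent PNG layers onto one canvas.
$size = isset($_GET['size']) ? (int) $_GET['size'] : 300;
$layers = array('body.png', 'shirt.png', 'hat.png'); // would come from the DB

$canvas = imagecreatetruecolor($size, $size);
imagealphablending($canvas, false);
imagefill($canvas, 0, 0, imagecolorallocatealpha($canvas, 0, 0, 0, 127));
imagealphablending($canvas, true);   // blend the layers over the transparent base
imagesavealpha($canvas, true);

foreach ($layers as $file) {
    $layer = imagecreatefrompng('/var/avatars/items/' . $file);
    imagecopyresampled($canvas, $layer, 0, 0, 0, 0,
                       $size, $size, imagesx($layer), imagesy($layer));
    imagedestroy($layer);
}

header('Content-Type: image/png');
header('Cache-Control: public, max-age=86400'); // let Cloudflare/browsers cache it
imagepng($canvas);
imagedestroy($canvas);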
I only know German hosting providers, but they also have an English interface: www.nitrado.net
As I'm only a few steps away from completing my portal website, I'm now considering how to host the images.
For most of the first 1-2 years, the images + HTTP daemon (nginx) + MySQL database will be hosted on one VPS. After that, as traffic increases, I will need to move to another solution, including scaling (MySQL as well as balancing nginx).
My first thought, which I'm implementing right now on the website, is to add a variable like $global_server_pictures_address in front of "/folder/1/123.jpg" (one of the uploaded images), which will change from $global_server_pictures_address = ""; to
$global_server_pictures_address = "http://195.22.31.14";
This means nginx will be balanced across a few more VPSes which will serve local content, and on each nginx VPS, when a request for an image comes in, it will be loaded from $global_server_pictures_address.
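A minimal sketch of what I mean (the IP is the example above; everything else is only illustrative):

<?php
// Phase 1: everything on one VPS, so the prefix stays empty.
$global_server_pictures_address = "";
// Phase 2: point it at the dedicated image VPS.
// $global_server_pictures_address = "http://195.22.31.14";

// Wherever a picture is printed:
echo '<img src="' . $global_server_pictures_address . '/folder/1/123.jpg" alt="">';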
Another idea that came to me: in the case of multiple VPSes serving the website (nginx balanced), each time a user uploads an image it would be copied via the PHP curl functions (FTP upload) to each server I have, this way reducing some of the bandwidth stress on the main 50 Mbps VPS image server. If we have, say, 3 VPSes with 50 Mbps each, all holding the same images, the balancing would be good not only for nginx but also for bandwidth.
In this case, my $global_server_pictures_address would go away; we wouldn't need it anymore.
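A sketch of how that fan-out could look with the PHP curl FTP upload (hosts, credentials and paths are placeholders):

<?php
// Push a freshly uploaded image to every mirror over FTP.
$mirrors   = array('195.22.31.14', '195.22.31.15', '195.22.31.16');
$local_file = '/var/www/folder/1/123.jpg';

foreach ($mirrors as $host) {
  $fp = fopen($local_file, 'r');
  $ch = curl_init('ftp://' . $host . '/folder/1/123.jpg');
  curl_setopt($ch, CURLOPT_USERPWD, 'ftpuser:ftppass');
  curl_setopt($ch, CURLOPT_UPLOAD, TRUE);
  curl_setopt($ch, CURLOPT_INFILE, $fp);
  curl_setopt($ch, CURLOPT_INFILESIZE, filesize($local_file));
  curl_exec($ch);
  curl_close($ch);
  fclose($fp);
}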
I'm waiting for other ideas (if you have any) and also for comments on my ideas. What do you think of them?
You could also use AWS S3 to store the images; that way all of your front-end servers would have access to them. There would be a cost associated with image storage and bandwidth.
You also have CloudFront (AWS CDN) if you wanted better performance.
http://aws.amazon.com/s3/
I have a simple CRM system that allows sales to put in customer info and upload appropriate files to create a project.
The system is already being hosted in the cloud. But the office internet upload speed is horrendous. One file may take up to 15 minutes or more to finish, causing a bottleneck in the sales process.
Upgrading our office internet is not an option; what other good solutions are out there?
I propose splitting the project submission form into two parts: the project info fields are posted directly to our cloud-hosted webapp and stored in the appropriate DB table, while the file itself is submitted to a LAN server with a simple DB and API that the cloud-hosted webapp can talk to in order to retrieve the file via a download link if it's ever needed again. Details still need to be worked out for this setup, but this is what I want to do in general.
Is this a good approach to solving the slow-upload problem? I've never done this before, so are there any obstacles to this implementation (cross-domain restrictions come to mind, but I believe that can be handled by using an iframe)?
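To make the idea concrete, the LAN side could be as small as this sketch (the table, field and host names are invented; the cloud webapp would only store the returned link):

<?php
// receive.php on the LAN server: store the file, record it, and return a
// download link the cloud webapp can save next to the project row.
$dir = '/srv/crm-files/';
$name = uniqid() . '_' . basename($_FILES['file']['name']);

if (move_uploaded_file($_FILES['file']['tmp_name'], $dir . $name)) {
  $db = new PDO('mysql:host=localhost;dbname=files', 'crm', 'secret');
  $stmt = $db->prepare('INSERT INTO files (stored_name, original_name) VALUES (?, ?)');
  $stmt->execute(array($name, $_FILES['file']['name']));

  // download.php would look the id up again and stream the file back.
  echo json_encode(array(
    'download_link' => 'http://fileserver.lan/download.php?id=' . $db->lastInsertId(),
  ));
}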
If bandwidth is the bottleneck, then you need a solution that doesn't chew up all your bandwidth. You mentioned that you can't upgrade your bandwidth - what about putting in a second connection?
If not, the files need to stay on the LAN a little longer. It sounds like your plan would be to keep the files on the LAN forever, but you can store them locally initially and then push them later.
When you do copy the files out to the cloud, be sure to compress them and also set up rate limiting (so they take up maybe 10% of your available bandwidth during business hours).
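For example, a nightly sync job in PHP could push the pending files compressed and capped at roughly 10% of the uplink (the endpoint, paths and the 125 kB/s figure are placeholders):

<?php
// Gzip each pending file and PUT it to the cloud at a capped upload speed.
$pending = glob('/srv/pending-uploads/*');

foreach ($pending as $file) {
  $gz = $file . '.gz';
  // Fine for modest file sizes; stream the compression for huge ones.
  file_put_contents($gz, gzencode(file_get_contents($file), 9));

  $fp = fopen($gz, 'r');
  $ch = curl_init('https://crm.example.com/api/files/' . basename($gz));
  curl_setopt($ch, CURLOPT_UPLOAD, TRUE);          // HTTP PUT
  curl_setopt($ch, CURLOPT_INFILE, $fp);
  curl_setopt($ch, CURLOPT_INFILESIZE, filesize($gz));
  // ~10% of a 10 Mbit/s uplink, expressed in bytes per second.
  curl_setopt($ch, CURLOPT_MAX_SEND_SPEED_LARGE, 125000);
  curl_exec($ch);
  curl_close($ch);
  fclose($fp);
}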
Also put some monitoring in place to make sure the files are being sent in a timely manner.
I hope nobody needs to download those files! :(
I have a file-uploading site which currently sits on a single server, i.e., the same server is used both for users to upload files to and for content delivery.
What I want to implement is a CDN (content delivery network). I would like to buy a server farm, and if I had a mechanism to spread the files out across the different servers, that would balance my load a whole lot better.
However, I have a few questions regarding this:
Assuming my server farm consists of 10 servers for content delivery,
Since, at the user end, the upload script points to one location only, i.e. <form action=upload.php>, it has to reside on a single server, correct? How can I duplicate the script across multiple servers and direct the user's file upload data to the server with the least load?
How should I determine which files are sent to which server? During the upload process, should I send all files to random servers? If the user sends 10 files, should I send each of them to a random server? Is there a mechanism to send them to the server with the least load? Is there any other algorithm that can help determine which server the files should be sent to?
How will the files be sent from the upload server to the CDN? Using FTP? Wouldn't that introduce additional overhead and the need for error checking to detect FTP connection breaks and verify that files were transferred successfully?
Assuming you're using an Apache server, there is a module called mod_proxy_balancer. It handles all of the load-balancing work behind the scenes. The user will never know the difference -- except when their downloads and uploads are 10 times faster.
If you use this, you can have a complete copy on each server.
mod_proxy_balancer will handle this for you.
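A minimal sketch of what that configuration might look like (the hostnames are placeholders; bybusyness is just one of the available lbmethod options and needs mod_lbmethod_bybusyness on Apache 2.4):

# Reverse-proxy upload.php to whichever backend is currently least busy.
<Proxy "balancer://uploads">
    BalancerMember "http://server1.example.com"
    BalancerMember "http://server2.example.com"
    ProxySet lbmethod=bybusyness
</Proxy>
ProxyPass        "/upload.php" "balancer://uploads/upload.php"
ProxyPassReverse "/upload.php" "balancer://uploads/upload.php"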
Each server can have its own sub-domain. You will have a database on your 'main' server which matches each of your download pages to the physical server it is located on. Then an on-the-fly URL is generated based on some hashing scheme, which prevents hard-linking to the download and increases your page hits. It could be a mix of personal and miscellaneous information, e.g., the user's IP and the time of day. The download server then checks the hash and either accepts or denies the request.
If everything checks out, the download starts, your load is balanced, and the users don't have to worry about any of this behind-the-scenes stuff.
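A bare-bones version of that hash scheme, using a shared secret and an expiry time instead of just the IP and the time of day (all names here are illustrative):

<?php
$secret = 'shared-secret-between-main-and-download-servers';

// On the main server: build the temporary link for this visitor.
$path    = '/files/archive.zip';
$expires = time() + 600;
$sig  = hash_hmac('sha256', $path . '|' . $expires . '|' . $_SERVER['REMOTE_ADDR'], $secret);
$link = 'http://dl3.example.com' . $path . '?e=' . $expires . '&s=' . $sig;

// On the download server: recompute and compare before serving.
$req_path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$expected = hash_hmac('sha256', $req_path . '|' . $_GET['e'] . '|' . $_SERVER['REMOTE_ADDR'], $secret);

if ($_GET['e'] > time() && $expected === $_GET['s']) {
  // Hand the request over to the real download (readfile(), X-Sendfile, etc.).
} else {
  header('HTTP/1.1 403 Forbidden');
}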
note: I have done Apache administration and web development. I have never managed a large CDN, so this is based on what I have seen in other sites and other knowledge. Anyone who has something to add here, or corrections to make, please do.
Update
There are also companies that manage it for you. A simple Google search will get you a list.
I am designing a file download network.
The ultimate goal is to have an API that lets you upload a file directly to a storage server (no gateway or anything in between). The file is then stored and referenced in a database.
When the file is requested, a server that currently holds the file is selected from the database and an HTTP redirect is done (or the API returns the currently valid direct URL).
Background jobs take care of desired replication of the file for durability/scaling purposes.
Background jobs also move files around to ensure even workload on the servers regarding disk and bandwidth usage.
There is no RAID or anything similar at any point. Every drive is just attached to the server as JBOD. All the replication happens at the application level. If one server breaks down, it is just marked as broken in the database, and the background jobs take care of replication from healthy sources until the desired redundancy is reached again.
The system also needs accurate stats for monitoring/balancing and maybe, later, billing.
So I thought about the following setup.
The environment is a classic Ubuntu, Apache2, PHP, MySQL LAMP stack.
A URL that hits the current storage server is generated by the API (that's no problem so far; just a classic PHP website and MySQL database).
Now it gets interesting...
The storage server runs Apache2, and a PHP script catches the request. The URL parameters (a secure token hash) are validated: the IP, timestamp and filename are checked so the request is authorized. (No database connection required, just a PHP script that knows a secret token.)
The PHP script sets the file headers to use Apache2 mod_xsendfile.
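A stripped-down version of that script might look like this (the parameter names, the one-hour window and the storage path are assumptions on my part):

<?php
// Delivery script on the storage server: no database, just a shared secret.
$secret = 'per-cluster-secret-token';
$file   = $_GET['f'];              // e.g. "bucketname/some-large-file.bin"
$ts     = (int) $_GET['ts'];
$token  = $_GET['token'];

$expected = hash_hmac('sha256', $_SERVER['REMOTE_ADDR'] . '|' . $ts . '|' . $file, $secret);

if ($ts < time() - 3600 || $expected !== $token || strpos($file, '..') !== false) {
  header('HTTP/1.1 403 Forbidden');
  exit;
}

// Hand the actual transfer over to Apache via mod_xsendfile.
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header('X-Sendfile: /srv/storage/' . $file);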
Apache delivers the file passed via mod_xsendfile and is configured to pipe the access log to another PHP script.
Apache runs mod_logio, and the access log is in combined I/O log format, additionally extended with the %D variable (the time taken to serve the request, in microseconds) so I can calculate transfer speeds, spot bottlenecks in the network, and so on.
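In the Apache configuration that would be roughly the following (the parser path is a placeholder; %I/%O come from mod_logio, %D is built in):

# Combined I/O format plus time-to-serve, piped straight into the parser.
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %I %O %D" combinedio_time
CustomLog "|/usr/bin/php /opt/stats/parse_access_log.php" combinedio_time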
The piped access log then goes to a PHP script that parses the URL (the first folder is a "bucket", just as in Google Storage or Amazon S3, and is assigned to one client, so the client is known), counts input/output traffic and increments database fields. For performance reasons I thought about having daily fields and updating them like traffic = traffic + X, creating the row if no row was updated.
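A sketch of that piped-log consumer, assuming a hypothetical traffic_daily table with a unique key on (bucket, day); the regex and column names are mine, and INSERT ... ON DUPLICATE KEY UPDATE would collapse the two statements into one:

<?php
// Read log lines from STDIN and add bytes in/out to a per-bucket daily counter.
$pdo = new PDO('mysql:host=db.example.com;dbname=stats', 'stats_user', 'secret');

while (($line = fgets(STDIN)) !== false) {
  // Rough match for: ... "GET /bucket/file ..." status %O "ref" "ua" %I %O %D
  if (!preg_match('/"[A-Z]+ \/([^\/ ]+)[^"]*" \d+ \d+ "[^"]*" "[^"]*" (\d+) (\d+) (\d+)$/', $line, $m)) {
    continue;
  }
  list(, $bucket, $bytes_in, $bytes_out, $micros) = $m; // $micros could feed a speed metric

  $upd = $pdo->prepare(
    'UPDATE traffic_daily
        SET bytes_in = bytes_in + ?, bytes_out = bytes_out + ?
      WHERE bucket = ? AND day = CURDATE()');
  $upd->execute(array($bytes_in, $bytes_out, $bucket));

  if ($upd->rowCount() === 0) {
    // No row for this bucket/day yet, so create it.
    $ins = $pdo->prepare(
      'INSERT INTO traffic_daily (bucket, day, bytes_in, bytes_out)
       VALUES (?, CURDATE(), ?, ?)');
    $ins->execute(array($bucket, $bytes_in, $bytes_out));
  }
}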
I have to mention that the servers will be low-budget machines with massive storage.
You can have a closer look at the intended setup in this thread on Server Fault.
The key data is that the systems will have gigabit throughput (maxed out 24/7) and the file requests will be rather large (so no images or loads of small files that produce high load through lots of log lines and requests). Maybe 500 MB on average, or something like that.
The currently planned setup runs on a cheap consumer mainboard (Asus), 2 GB of DDR3 RAM and an AMD Athlon II X2 220 (2x 2.80 GHz, tray) CPU.
Of course download managers and range requests will be an issue, but I think the average size of an access will be at least 50 MB or so.
So my questions are:
Do I have any severe bottleneck in this flow? Can you spot any problems?
Am I right in assuming that mysql_affected_rows() can be directly read from the last request and does not do another request to the mysql server?
Do you think the system with the specs given above can handle this? If not, how could I improve it? I think the first bottleneck would be the CPU, wouldn't it?
What do you think about it? Do you have any suggestions for improvement? Maybe something completely different? I thought about using Lighttpd and the mod_secdownload module. Unfortunately it can't check the IP address, and I am not as flexible with it. It would have the advantage that the download validation would not need a PHP process to fire; but as the PHP process only runs briefly and doesn't read and output the data itself, I think that is OK. Do you? I once did downloads using Lighttpd on old throwaway PCs and the performance was awesome. I also thought about using nginx, but I have no experience with it.
What do you think about the piped logging going to a script that directly updates the database? Should I rather write the requests to a job queue and update the database in a second process that can handle delays? Or not do it at all and parse the log files at night? My thought is that I would like to have it as close to real time as possible and not have accumulated data anywhere other than in the central database. I also don't want to have to keep track of jobs running on all the servers; that could be a mess to maintain. There should be a simple unit test that generates a secured link, downloads it and checks whether everything worked and the logging has taken place.
Any further suggestions? I am happy for any input you may have!
I am also planning to open source all of this. I just think there needs to be an open source alternative to the expensive storage services such as Amazon S3 that is oriented toward file downloads.
I really searched a lot but didn't find anything like this out there. Of course I would rather reuse an existing solution, preferably open source. Do you know of anything like that?
MogileFS, http://code.google.com/p/mogilefs/ -- this is almost exactly the thing that you want.