Load Balancing, Purely Redirect - PHP

I was reading up on load balancing today. From what I know, Rackspace load balancing handles all the data. The question I have is: why can't the load balancer just redirect connections? Instead, it handles every connection itself, which feels like a bottleneck: one is trying to scale, but at the same time all the data still has to go through the load balancer.
I read that the session would be lost. Is there any way for it to just redirect? Each server has a different IP, but the public has access to only one... Hmm, I know the session is stored on the server, not in the DB.
So does all the data just have to go through the load balancer? It's like paying outgoing charges for the server plus the load balancer; the data roughly doubles in size.

Your question is a theoretical one, so it doesn't really fit SO.
But to answer your question: the load balancer has to have the user reach the same server all the time, otherwise the session information may not be maintained. Think of it this way: your PHP application on Server A issues a session; the same PHP application on Server B doesn't know about this session unless you are syncing the session information between the two servers.
If you are just serving static content, then you aren't looking for a load balancer but rather a CDN (content delivery network), and a CDN doesn't need to do anything like what you described; it can redirect you to any available server.
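One common way around the "session is stored on the server" problem from the question is to move sessions into a shared store, so the balancer is free to send each request to any backend. A minimal configuration sketch, assuming a memcached instance reachable by all web servers (the host/port is a placeholder, and the memcached PECL extension must be installed on every box):

```php
// Point PHP's native session machinery at a shared memcached server so
// every web server behind the balancer reads and writes the same
// session data. 10.0.0.5:11211 is a placeholder for your cache host.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '10.0.0.5:11211');

// From here on, session_start() on ANY balanced server sees the same
// $_SESSION contents, so sticky routing is no longer required.
```

This doesn't change the fact that request/response bytes still flow through the balancer; it only removes the session-related reason for pinning users to one server.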

Related

Using Laravel behind a load balancer

I have been working on a Laravel 4 site for a while now and the company just put it behind a load balancer. Now when I try to log in, it basically just refreshes the page. I tried using fideloper's proxy package at https://github.com/fideloper/proxy but see no change. I even opened it up to allow all IP addresses by setting proxies => '*'. I need some help figuring out what needs to be done to get Laravel to work behind a load balancer, especially with sessions. Please note that I am using Laravel's database session driver.
The load balancer is a KEMP LM-3600.
Thank you to everyone for the useful information you provided. After further testing I found that the reason this wasn't working is because we are forcing https through the load balancer, but allowing http when not going through the load balancer. The login form was actually posting to http instead of https. This allowed the form to post but the session data never made it back to the client. Changing the form to post to https fixed this issue.
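For anyone hitting the same thing: when TLS terminates at the balancer, PHP's $_SERVER['HTTPS'] is empty even though the client used https, so generated form actions fall back to http. Below is a small sketch of detecting the original scheme. The X-Forwarded-Proto header name is the common convention, but verify it against what your balancer actually sends, and only trust it for requests that really came from the balancer's IP:

```php
// Sketch: decide whether the original client request was HTTPS when TLS
// is terminated at the load balancer. The forwarded-header name below is
// the de-facto convention (an assumption); confirm what your balancer
// sends, and only honor it for requests from the balancer's address.
function request_is_https(array $server)
{
    if (!empty($server['HTTPS']) && $server['HTTPS'] !== 'off') {
        return true; // TLS all the way to this web server
    }
    // Header typically set by the balancer on behalf of the client.
    return isset($server['HTTP_X_FORWARDED_PROTO'])
        && strtolower($server['HTTP_X_FORWARDED_PROTO']) === 'https';
}
```

With this, a login form can post to the scheme the client actually used, instead of hard-coding http.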
We use a load balancer where I work, and I ran into similar problems accessing cPanel dashboards: the page would just reload every time I tried accessing a section and log me off, because my IP address kept changing from its point of view. The solution was to find which port cPanel was using and configure the load balancer to bind that port to one WAN. Sorry, I am not familiar with Laravel; if it is just using port 80 then this might not be a solution.
Note that the session handling in Laravel 4 uses Symfony 2 code, which lacks proper session locking in all self-coded handlers that do not use the PHP provided session save handlers like "files", "memcached" etc.
This will create errors when used in a web application with parallel requests like Ajax, but this should occur unrelated to any load balancer.
You really should do some more investigation. HTTP load balancers do have some impact on the information flow, but the only effect on a PHP application would be that a single user surfing the site will randomly send the requests to any one of the connected servers, and not always to the same.
Do you also use any fancy database setup, like master-slave replication? That would be more likely to affect sessions: if writes go only to the master, reads come only from a slave, and the slave lags behind the master's last write, you'll read stale session data. Such a configuration is not recommended as session storage. I'd rather use Memcached instead - the PHP memcached session save handler implements proper locking as well...
Using fideloper's proxy does not make sense. A load balancer should be transparent to the web server, i.e. it should not act as a reverse proxy unless configured to do so.
Use a shared resource to store the session data. Local files and a per-server memcache will surely not work; a DB should be OK. That's what I'm using on a load-balanced setup with a common database.
I have been using TrustedProxy for a while now and it's working fine.
The main issue with load balancers is proxy routing. The following is from the README file, and it's what I was looking for:
If your site sits behind a load balancer, gateway cache or other "reverse proxy", each web request has the potential to appear to always come from that proxy, rather than the client actually making requests on your site.

To fix that, this package allows you to take advantage of Symfony's knowledge of proxies. See below for more explanation on the topic of "trusted proxies".

Load Balancing - How to set it up correctly?

Here it gets a little complicated. I'm in the last few months of finishing a larger web-based project, and since I'm trying to keep the budget low (and learn some stuff myself), I'm now touching an issue that I've never touched before: load balancing with NGINX, and scalability for the future.
The setup is the following:
1 Web server
1 Database server
1 File server (also used to store backups)
Using PHP 5.4+ over FastCGI
Now, all those servers should be 'scalable' - in the sense that I can add a new file server if free disk space gets low, or a new web server if I need to handle more requests than expected.
Another thing: I would like to do everything over one domain, so that access to the different backend servers isn't really noticed in the frontend (some backend servers are basically called via subdomain - for example the file server, over 'http://file.myserver.com/...', where load balancing happens only between the file servers).
Do I need an additional, separate Server for load balancing? Or can I just use one of the web servers? If yes:
How much power (CPU/RAM) do I need for such a load-balancing server? Does it have to be the same as the web server, or is a 'lighter' server enough?
Does the 'load balancing' server have to be scalable too? Will I need more than one if there are too many requests?
How exactly does the whole load balancing work anyway? What I mean:
I've seen many entries stating that there are problems like session handling/synchronisation on load-balanced systems. I could find two solutions that might fit my needs: either the user is always directed to the same machine, or the data is stored in a database. But with the second, I would basically have to rebuild parts of the $_SESSION functionality PHP already has, right? (How do I know which user gets which session - are cookies really enough?)
What problems do I have to expect, except the unsynchronized sessions?
Write scalable code - that's a sentence I read a lot. But in terms of PHP, for example, what does it really mean? Usually, all the calculations for one user happen on one server only (the one NGINX directed the user to) - so how can PHP itself be scalable, if it's not actually distributed by NGINX?
Are different 'load balancing' pools possible? What I mean is that all file servers are in one pool and all web servers are in another, and if you request an image from a file server that has too much to do, it redirects to a less busy file server.
SSL - I'll only need one certificate, on the load-balancing server, right? Since the data always goes back through the load-balancing server - or how exactly does that work?
I know it's a huge question - basically, I'm just searching for some advice and a bit of a helping hand; I'm a bit lost in the whole thing. I can read snippets that partially answer the above questions, but really 'doing' it is completely another thing. So I already know that there won't be a clear, definitive answer, but maybe some experiences.
The end target is to be easily scalable in the future, and already plan for it ahead (and even buy stuff like the load balancer server) in time.
You can use one of the web servers for load balancing, but it's more reliable to set up the balancing on a separate machine. If your web servers don't respond very quickly and you're getting many requests, the load balancer will put the requests in a queue. For a big queue you need a sufficient amount of RAM.
You don't generally need to scale a load balancer.
Alternatively, you can create two or more A (address) records for your domain, each pointing to a different web server's address. That gives you 'DNS load balancing' without a balancing server. Consider this option.
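On the session question above: storing sessions in a database does not mean rebuilding $_SESSION. Since PHP 5.4 (which the setup above uses) you can swap only the storage layer via SessionHandlerInterface, while PHP keeps handling the cookie and the serialization itself. A minimal sketch - the table name and schema are assumptions:

```php
// Minimal database-backed session storage. Assumed schema:
//   CREATE TABLE sessions (id TEXT PRIMARY KEY, data TEXT, updated_at INTEGER)
class DbSessionHandler implements SessionHandlerInterface
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function open($savePath, $sessionName) { return true; }
    public function close() { return true; }

    public function read($id)
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row ? $row['data'] : ''; // empty string = no session yet
    }

    public function write($id, $data)
    {
        // REPLACE works in SQLite and MySQL; adjust for other databases.
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)');
        return $stmt->execute(array($id, $data, time()));
    }

    public function destroy($id)
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')
                         ->execute(array($id));
    }

    public function gc($maxLifetime)
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?')
                         ->execute(array(time() - $maxLifetime));
    }
}
```

Register it with session_set_save_handler(new DbSessionHandler($pdo), true) before session_start(); every web server pointing at the same database then shares sessions, and the balancer can route freely.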

Server Load Balancer Algorithm

What all I know about a load balancer is that:-
When we have high traffic on our site, or we are using multiple servers, a load balancer is placed in front of all the servers. Any HTTP request first hits the load balancer, and from there it reaches the respective server, depending on the servers' load.
Q1: Can someone explain exactly the algorithm that the load balancer uses to balance the load across multiple servers?
Q2: Can we create our own load balancer, or do we have to get one from a vendor like Barracuda Networks?
1) There are various ways to do this (round-robin, least-connection, weighted, ...).
A good overview: http://www.centos.org/docs/5/html/Virtual_Server_Administration/s2-lvs-sched-VSA.html
2) "Create our own" - you probably don't want to reinvent the wheel; there are lots of existing products around, both commercial and open-source/freeware. Some of them are specialized in HTTP requests; others support all sorts of protocols.
Q1: The simplest algorithm is round-robin. It just goes through every existing server and takes the next one for the next request.
Q2: Of course you can create your own, or you can install one of the available open-source or commercial products on one of your servers.
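To make Q1 concrete, here is a tiny sketch of round-robin in PHP. The server names are placeholders; real balancers (nginx, HAProxy, LVS) layer health checks and weighting on top of this basic rotation:

```php
// Hand out backend servers in a fixed rotation, one per request.
class RoundRobinBalancer
{
    private $servers;
    private $next = 0;

    public function __construct(array $servers)
    {
        $this->servers = array_values($servers);
    }

    // Return the next server in the rotation, wrapping around at the end.
    public function pick()
    {
        $server = $this->servers[$this->next];
        $this->next = ($this->next + 1) % count($this->servers);
        return $server;
    }
}
```

With servers web1, web2, web3, successive pick() calls return web1, web2, web3, then wrap back to web1.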

Caching, CDN - are they the same for PHP Yii site? How to use it for a dynamic php site?

I have a dynamic php (Yii framework based) site. User has to login to do anything on the site. I am trying to understand how caching and CDN work; and I am a bit confused.
Caching (memcache):
My site has a good amount of CSS, JS, and images. I've been given to understand that enabling caching ("memcache"?) will GREATLY speed up my site. But this has me confused. How does caching help? I mean, how can you cache something that comes out of the DB differently for each user? For instance, user 1 logs in and sees his control panel; user 2 logs in and sees their own control panel.
How do I determine what to cache? Plus, how do I enable caching (memcaching)?
CDN:
I have been told to use a content delivery network like CloudFlare. It is supposed to automatically cache my site. So, when my user 1 logs in, what will it cache? Will it cache only the homepage CSS, JS, and homepage images, because everything else requires login? What happens when a user logs out? I mean, do "sessions" interfere with the working of a CDN?
Does serving up images via a CDN significantly reduce the load on my server? I don't have much cash for a clustered-server configuration, so I just want my (shared) server to be able to devote all its resources to processing PHP code. How much load can I save by using "caching" (something like memcache) and/or a "CDN" (something like CloudFlare)?
Finally,
What would be the general strategy to implement in this scenario for caching, CDN, and basic performance optimization? Do I need to make any changes to my PHP code to enable a CDN like CloudFlare and to enable/implement/configure caching? What can I do that would take the least amount of developer/coding time and make my site run much, much faster?
Oh wait, some of my pages like "about us" page etc. are going to be static html too. But they won't get as many hits. Except for maybe the iFrame page that will be used for my Facebook Page.
I actually work for CloudFlare & thought I would hop in to address some of the concerns.
"do I need to make any changes to my php-code to enable CDN like
CloudFlare and to enable/implement/configure caching? What can I do
that would take least amount of developer/coding time and will make my
site run much much faster?"
No, nothing like needing to re-write URLs, etc. We automatically cache static content by file extension. This does require changing your DNS to point to us, however.
Does serving up images via CDN reduce significant load on my server?
Yes, and it should also help most visitors access the site faster and save you a fair amount on bandwidth.
"Oh wait, some of my pages like "about us" page etc. are going to be
static html too."
CloudFlare doesn't cache HTML by default. You use Page Rules to set up more advanced caching options for things like static HTML.
Caching helps because instead of performing disk I/O for each request, the data is read from memory, i.e. memcached. This provides a SIGNIFICANT increase in performance.
Memcache is generally used for caching data, i.e. query results.
http://pureform.wordpress.com/2008/05/21/using-memcache-with-mysql-and-php/
There are lots of tutorials.
I have only ever used Amazon S3, which is not quite a CDN. It is more of a storage platform, but it still helps take the load off my own servers when serving media.
I would put all of your static resources on a CDN so your own server does not have to serve them. It would not require any modification to your PHP code. This includes JS and CSS.
For your static pages (your about page) I'd make sure that PHP isn't processing them, since there is no reason for it. Your web server should serve them directly.
Caching will require changes to your code. A normal caching flow is:
1) user makes a request
2) check if data is in cache
3) if it is not in cache do the DB query and put it in cache
4) if it is in cache retrieve it
5) return data.
You can cache anything that requires disk I/O, and you should see a speed-up.
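The five-step flow above can be sketched as a small helper. To keep the example self-contained it uses an in-memory stand-in for the cache, with the same get/set call shape as the Memcached class (where get() returns false on a miss); in a real setup you would pass a Memcached instance instead:

```php
// In-memory stand-in for a cache server, used so the example runs
// without the memcached extension installed.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return isset($this->items[$key]) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

// Steps 2-5 of the flow: check the cache, fall back to the (slow)
// loader on a miss, store the result, return the data.
function cache_remember($cache, $key, callable $loadFromDb)
{
    $value = $cache->get($key);
    if ($value === false) {          // step 2: not in cache
        $value = $loadFromDb();      // step 3: the expensive DB query
        $cache->set($key, $value);   // ...and put it in cache
    }
    return $value;                   // steps 4-5: return the data
}
```

The second call with the same key never touches the database again, which is exactly where the speed-up comes from.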
Memcached works by storing data (usually query results fetched from a database, possibly on a remote server) in RAM on one or more cache servers. Reading a value straight out of memory in a regulated format is much, much faster than re-running the query each time. This is typically useful when you have data that can be safely stored for certain periods of time, as it is not subject to regular changes.
For example, if you want to speed up pages where a user is logged in, you can load that user's account information once and cache it. On any subsequent request for that data, it will load in a fraction of the time it would normally take to fetch it from the database. Obviously you will need to update/re-cache that information if the user changes it while logged in, but you will greatly reduce the time it takes to serve up pages if you implement a caching system that minimizes time spent waiting on the database.
I'm personally not familiar with CloudFlare so I can't offer any advice to that effect, but in terms of implementing caching in your application, you should check out:
http://code.google.com/p/memcached/wiki/NewOverview
And read the rest of the Wiki entries there which cover installation/implementation/etc. That should get you started on the right track.

Load balancing and APC

I am interested in a scenario where web servers serving a PHP application are set up behind a load balancer.
There will be multiple webservers with APC behind the load balancer. All requests will have to go through the load balancer, which then sends it to one of the web servers to process.
I understand that memcached should be used for distributed caching, but I think having the APC cache on each machine cache things like application configurations and other objects that will NOT be different across any of the servers would yield even better performance.
There is also an administrator area for this application. It is also accessed via the load balancer (for example, site.com/admin). In a case like this, how can I call apc_clear_cache to clear the APC object cache on ALL servers?
Externally, your network has one public IP that routes all requests to your load balancer, which distributes load round-robin. From outside, you cannot clear the cache on each server one at a time, because you don't know which one will serve any given request. Within your network, however, each machine has its own internal IP and can be called directly. Knowing this, you can do some funny/weird things that do work externally.
A solution I like is to be able to hit a single URL and get everything done, such as http://www.mywebsite/clearcache.php or something like that. If you like that as well, read on. Remember you can put authentication in front of this so only your admin can hit it, or protect it however you like.
You could create logic where one external request clears the cache on all servers: whichever server receives the request will use the same logic to tell all servers to clear their caches. This sounds weird and a bit Frankenstein, but here goes the logic, assuming we have 3 servers with internal IPs 10.232.12.1, 10.232.12.2 and 10.232.12.3:
1) All servers would have two files called "initiate_clear_cache.php" and "clear_cache.php", identical copies on every server.
2) "initiate_clear_cache.php" would do a file_get_contents call to "clear_cache.php" on each machine in the network, including itself,
for example:
file_get_contents('http://10.232.12.1/clear_cache.php');
file_get_contents('http://10.232.12.2/clear_cache.php');
file_get_contents('http://10.232.12.3/clear_cache.php');
3) The file called "clear_cache.php" is actually doing the cache clearing for its respective machine.
4) You only need to make a single request now, such as http://www.mywebsite/initiate_clear_cache.php, and you are done.
Let me know if this works for you. I've done something similar in .NET and Node.js but haven't tried it in PHP yet; I'm sure the concept is the same. :)
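A hedged sketch of what "clear_cache.php" from the steps above might contain: a shared-secret check so only your own servers (or an authenticated admin) can trigger it, plus a guard so the APC(u) call only runs when the extension is actually loaded. The token scheme and the function split are assumptions for illustration, not the original poster's code:

```php
// Validate a shared secret, then clear the local APC/APCu user cache
// if the extension is available. Returns a status string so callers
// (and tests) can see what happened on this machine.
function handle_clear_cache($givenToken, $expectedToken)
{
    if (!hash_equals($expectedToken, $givenToken)) {
        return 'forbidden';             // wrong or missing secret
    }
    if (function_exists('apcu_clear_cache')) {
        apcu_clear_cache();             // modern APCu
        return 'cleared';
    }
    if (function_exists('apc_clear_cache')) {
        apc_clear_cache('user');        // legacy APC user cache
        return 'cleared';
    }
    return 'apc-not-loaded';            // nothing to clear here
}

// In clear_cache.php itself, something like (token is a placeholder):
// echo handle_clear_cache(isset($_GET['token']) ? $_GET['token'] : '',
//                         'CHANGE-ME-SECRET');
```

"initiate_clear_cache.php" then appends the token to each file_get_contents URL and can aggregate the per-server status strings into one report.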
