I had reason this week to test how our WISDOM DMF product operates in a Load Balanced SharePoint environment.
The outcome was very successful and surprisingly easy to achieve, especially given my previous level of Load Balancing experience, which was VERY limited. There were a few things I learnt along the way, and I thought the story was interesting enough to share…
Because I didn't have two physical machines that I felt like re-purposing for a bit of testing, I created two fresh Windows 2003 Server virtual machines, called mvwssloadbalanced1 and mvwssloadbalanced2, on our Virtual Server machine.
I configured two Virtual Networks on each server, which I believe is required to use the Unicast Cluster Operation Mode, as opposed to Multicast, which can be used with one Network Interface.
It really amounts to the same outcome, just a different physical operation (which you can google if you've a mind), but I wanted to replicate a certain configuration. In the end I tested both Unicast and Multicast configurations, with the same results.
Once I had Windows installed, I installed SharePoint Services on one of the machines and created a new Farm. Then I extended a SharePoint web application onto the Default IIS web site (root) and created a root site collection on it. When extending the web application there is an option to set the load balanced URL, which you need to set (in our case http://mvcluster.macroview.com.au). By default it uses the machine URL, which I imagine will not work, because no alternate access mapping (see the previous Blog entry on AAMs) will have been created for the load balanced URL.
Next I installed WSS onto the second machine using the default settings, which was a mistake. The defaults do not install the Central Admin web site onto additional servers in the Farm. I fell for this one because, whenever I've come to the last screen of the SharePoint Products and Technologies Configuration Wizard, which has the Advanced button on it, I've ignored the button (because it didn't contain anything interesting)! So, click the button and check the box that installs the Central Admin site on the additional servers in the Farm.
This might not be an issue for some (perhaps your) situations, but for us, we have an admin operation that we have added to the Operations tab of Central Admin that needs access to the physical server. If you don't have Central Admin installed on all of the servers then you'll always end up physically on the first server in the farm.
In practice, adding the Central Admin Console to all the servers (or at least more than one) is not a bad idea, because if you don't put it on other servers then it won't be Load Balanced, and if the primary server dies you won't be able to get to it!
So then I installed our application, which consists of Features, Pages, Web Services etc. and is packaged in a SharePoint Solution. This was very cool, because installing it on the Farm was exactly the same as installing it on a single server:
stsadm -o addsolution -filename .
Then go to Manage Solutions in the Operations tab of Central Admin and click Deploy! Voilà: the solution is deployed to all servers in the Farm with a single click.
Our client application connects to a custom web service in _vti_bin to communicate with SharePoint and so pointing it at the Load Balanced URL worked fine.
Shutting down one of the servers in the cluster from the NLB console resulted in the other server processing the requests, as expected. Of course, if the server was shut down in the middle of a call (which you can avoid by using a shutdown option that drains all current requests), or just as you made the call, some errors were seen, but the general result was fine.
Note that this result is dependent to some degree on configuration in that it is possible to set Affinity on the cluster such that a client's requests are serviced by the same server each time and if that actual server goes down or is shut down using the NLB console, then future requests will fail. However, I believe that Affinity set to none is best for performance and fine when you don't need to manage state.
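To make the difference between the Affinity settings a bit more concrete, here is a toy model in Python. It is only a sketch under my own assumptions: the hash is not NLB's real algorithm, and pick_host, the client address 10.0.0.5 and the port range are made up for illustration. The idea it shows is simply that with affinity the client IP alone decides the host, while with no affinity each connection is distributed separately.

```python
import hashlib
from typing import List


def pick_host(hosts: List[str], client_ip: str, client_port: int, affinity: str) -> str:
    """Map a connection to a host. Toy model only, not NLB's real algorithm."""
    if affinity == "single":
        # Only the client IP is hashed, so one client always gets one host.
        key = client_ip
    elif affinity == "none":
        # IP and port are hashed, so each new connection is placed separately.
        key = f"{client_ip}:{client_port}"
    else:
        raise ValueError(f"unknown affinity: {affinity}")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


hosts = ["mvwssloadbalanced1", "mvwssloadbalanced2"]
ports = range(50000, 50064)  # hypothetical client source ports

single = {pick_host(hosts, "10.0.0.5", port, "single") for port in ports}
none = {pick_host(hosts, "10.0.0.5", port, "none") for port in ports}

print(sorted(single))  # always exactly one host for this client
print(sorted(none))    # the hosts this client's connections were spread over
```

With "single", every connection from the one client lands on the same machine, so that client depends on that machine. With "none", the same client's connections can be spread across the cluster.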
When I shut down a web site or application pool using the IIS console I did get intermittent connection and usage errors caused by failure to connect to the web service. Shutting down services and pools like this causes symptoms similar to a failure that might happen if the web application causes too many errors (say for example there is some faulty memory or something, but not catastrophic hardware failure) and is shut down by IIS. It should be noted that this behaviour is consistent with SharePoint and SharePoint Web Services in general, in that you get intermittent browser connection failures in this situation.
This is because NLB does not provide failover for applications or services, only for failure of the server itself. It will not poll a machine to check that a particular service or facility is available, and it will not drop the box from the NLB cluster if that service does not respond. Requests are therefore still sent to the stopped web site or application pool as they normally would be, unless a monitoring application is configured to remove servers from the cluster under certain circumstances. NLB provides the services that monitoring applications need to remove servers from the cluster remotely. For example, a monitor could be set up to check that a certain web site or application pool responds and to remove the server from the cluster if it does not. I think that Microsoft Operations Manager with the SharePoint MOM Pack might assist in this regard and I am about to find out…
So, I'm pretty happy with the ease with which SharePoint (and our application) was able to be configured by someone who has never configured an NLB SharePoint environment before.
For more detail on NLB architecture, configuration and features, there is some good information on TechNet.
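As a rough idea of what such a monitor might look like, here is a minimal Python sketch. It is not what MOM does, and the function names, the threshold of three consecutive failures and the probe URLs are all my own assumptions. Each node has to be probed by its own machine URL rather than the load balanced URL. The actual removal of a node from the cluster is deliberately left out, because that depends on the tooling you have.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # A 401 or 403 still proves the web site and application pool answer;
        # a 503 (for example a stopped application pool) counts as a failure.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or name resolution failure.
        return False


def hosts_to_remove(
    probe_urls: Dict[str, str],
    failures: Dict[str, int],
    threshold: int = 3,
    check: Callable[[str], bool] = is_healthy,
) -> List[str]:
    """Probe every host once and return those that have failed
    `threshold` times in a row. `failures` carries the running counts
    between calls and is updated in place."""
    to_remove = []
    for host, url in probe_urls.items():
        if check(url):
            failures[host] = 0
        else:
            failures[host] = failures.get(host, 0) + 1
            if failures[host] >= threshold:
                to_remove.append(host)
    return to_remove
```

You would call hosts_to_remove on a timer with something like {"mvwssloadbalanced1": "http://mvwssloadbalanced1/_vti_bin/lists.asmx", ...} (hypothetical probe URLs) and hand the returned names to whatever takes the node out of the cluster. Requiring several consecutive failures avoids dropping a server because of one slow response.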