Building a Sun Cluster using Solaris 10, on VMware Server
One of the things I’ve done on my week off is get Sun Cluster working on VMware Server. There are a few small tricks to it, but generally it hasn’t been as horrible as the first time I did it many months ago on VMware (a 4 node cluster, with panics galore – no fun).
NOTE: To run Sun Cluster on Solaris 10, you will need to be running your VMs as 64-bit guests (Sun Cluster on Solaris 10 x86/x64 DOES NOT run on 32-bit Solaris 10 – I found this out the hard way a long time ago!).
Only some CPUs support 64-bit VMs. In this case I am using a relatively new AMD 64 X2, and that’s perfect for this purpose. In this build I’m using Solaris 10 u3 because u4 seems to have a few issues on VMware (I’ve seen lots of kernel panics on boot), together with Sun Cluster 3.1 8/05 (u4). Even though Solaris Cluster 3.2 is out, Sun Cluster 3.1u4 is still what most things are certified against; I will build a 3.2 cluster at some point later on.
So, let’s start with configuring VMware:
-Configure at least 2 additional host-only networks. On Linux you will need to run
vmware-config.pl
I have configured several more (seeing as it is easy to do it all at once).
When it asks you about networking, you want to configure additional host-only networks. The scheme I have used is:
172.16.0.0/255.255.255.0
172.16.1.0/255.255.255.0
.
.
172.16.11.0/255.255.255.0
NOTE: Which subnets you specify here isn’t that important, as your host will never talk on these networks; the host-to-host cluster interconnects will use the interfaces and will likely use different subnets. Each subnet just needs to be different for each vmnet adapter. You need not use /24s either; you could go down to a very small subnet size (a /30, for example). If this makes no sense to you, don’t worry too much; just keep going, following what I’ve done above.
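Once vmware-config.pl has finished, you can sanity-check the new host-only networks from the Linux host; each one should show up as its own vmnetX interface (just a quick check, assuming the numbering above):

ifconfig vmnet1
ifconfig vmnet2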
Now create two VMs: Typical, Solaris 10 64-bit (it will not work if 64-bit is not selected); in my case I preallocated 32GB of disk.
For each VM:
-Add at least 2 additional ethernet interfaces (e1000gXs) – these will be used for the interconnects. Put each one on a different vmnet adapter; I used vmnet1 and vmnet2 in this case. Make sure you configure these interfaces identically on both VMs, as they will need to talk to each other across these interfaces, but nothing else (see the sketch after this list).
-Disable snapshots (for performance)
-Add a single disk, in a different location, for quorum, to one host only. I added the quorum disk as /virtuals/cluster-disks/mail-store-quorom.vmdk. The quorum disk should be as small as possible; I believe the smallest disk you can build is ~100MB, so do that (0.1GB).
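For reference, here is roughly what these extra pieces look like outside the GUI. The .vmx entries below are a sketch of what the two interconnect NICs end up as on a Linux host (the exact option names are an assumption from memory, so check them against what the GUI writes out for you):

ethernet1.present = "TRUE"
ethernet1.virtualDev = "e1000"
ethernet1.connectionType = "custom"
ethernet1.vnet = "/dev/vmnet1"
ethernet2.present = "TRUE"
ethernet2.virtualDev = "e1000"
ethernet2.connectionType = "custom"
ethernet2.vnet = "/dev/vmnet2"

And if you would rather pre-create the small quorum vmdk from the command line than through the GUI, vmware-vdiskmanager can do it (again a sketch; adjust the path and adapter type to your setup):

vmware-vdiskmanager -c -s 100MB -a lsilogic -t 2 /virtuals/cluster-disks/mail-store-quorom.vmdk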
Boot each node briefly, then shut it down straight away. This populates the additional information in the initial vmx config that isn’t filled in until you bring up a VM (the ethernet address for e1000g0 is what we really want populated here).
-With each VM shut down, edit the .vmx file for each VM and add in the lines:
Configure the shared quorum device:
scsi0:1.present = "TRUE"
scsi0:1.fileName = "/virtuals/cluster-disks/mail-store-quorom.vmdk"
scsi0.sharedBus = "virtual"
disk.locking = "false"
Obviously there’s no need to re-add the fileName line on the host where you configured the disk initially (it will already be there).
Next, for each VM, configure CPU (core) binding. If you don’t do this and you’re using a 64-bit dual-core AMD chip you’ll get some interesting behaviour, because the timestamp counter on each core of these CPUs differs, and that confuses Solaris, which expects to be running on a single CPU. The cluster will panic more often if you don’t do this 🙂
processor0.use = "TRUE"
processor1.use = "FALSE"
And I do the reverse of the above for the other host’s vmx file:
processor0.use = "FALSE"
processor1.use = "TRUE"
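A simple way to see whether the binding has helped (just a sanity check, not part of the official procedure) is to watch each guest’s messages log for the timer warnings once the node has been up for a while; no output means no pm_tick complaints so far:

grep -i pm_tick /var/adm/messages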
-Now kick off a Solaris build using the minimal profile from my last post, or SUNWCXall (all packages) will do if you don’t mind the extra build-time wait and you have disk space up your sleeve. You could of course just do a straight install off the CD; just make sure you use a custom partitioning scheme with a 512MB /globaldevices slice on your disk.
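If you are driving the install from a JumpStart profile rather than the interactive installer, the relevant bit is simply an extra filesys line for the global devices slice. A minimal fragment (slice s3 is just an assumption; use whatever is free in your layout) looks something like:

filesys rootdisk.s3 512 /globaldevices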
-Install Sun Cluster from any Java Enterprise System release from 2005Q4 onwards (JES 5 at the time of writing). You can obtain it for free from http://www.sun.com/software/javaenterprisesystem/getit.jsp. All JES releases from (and including) 2005Q4 ship Sun Cluster 3.1u4 (that is, 8/05). It’s worth noting that Sun Cluster 3.2 has been out for a while, but I’m not sure how much is certified against it. I will try it out later; for the moment I’ll go with 3.1u4.
Note: there is almost no initial configuration when you install Sun Cluster from the JES installer. I noticed that the most recent release of JES (and possibly some previous releases; I’ve missed a few) asks whether you want to allow Sun Cluster to be configured remotely. For simplicity, answer yes; it makes the cluster configuration very easy from there.
-Add /usr/cluster/bin to your PATH on both hosts for convenience.
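For example, you could append something like this to root’s profile on each node (assuming a Bourne-style shell reading /etc/profile or ~/.profile):

# make the Sun Cluster commands available in future shells
PATH=$PATH:/usr/cluster/bin
export PATH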
-run scinstall on ONE host (/usr/cluster/bin/scinstall if you did not follow the above step).
You’ll get a menu, and the first item is where we want to be.
* 1) Install a cluster or cluster node
So select 1, then..
1) Install all nodes of a new cluster
1 again, then yes to continue
Please select from one of the following options:
1) Typical
2) Custom
Select 1
Then select a cluster name; in this case I’ve gone with mail-store-clus, as this is to become a cluster of Sun Messaging Server 6.3 mail stores.
Next you are asked for the other nodes in the cluster; in this case the only other node for me is mail-store1, so I type that in.
Node name (Control-D to finish): mail-store1
Node name (Control-D to finish):
This is the complete list of nodes:
mail-store0
mail-store1
Is it correct (yes/no) [yes]?
and then ctrl-D, then yes it is correct
Attempting to contact "mail-store1" ... done
Searching for a remote install method ... done
The Sun Cluster framework software is already installed on each of the new nodes of this cluster. And, it is able to complete the configuration process without remote shell access.
Looking good so far! Press Enter to continue.
Select the first cluster transport adapter for "mail-store0":
1) e1000g1
2) e1000g2
3) Other
Go with 1, then 2 for the next transport adapter. NOTE: if you have plumbed these devices, they will not work. They will need to be unplumbed first.
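If you did plumb them at install time, unplumbing them is just the following on each node (for the interconnect adapters only):

ifconfig e1000g1 unplumb
ifconfig e1000g2 unplumb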
Searching for any unexpected network traffic on "e1000g1" ... done
Verification completed. No traffic was detected over a 10 second
sample period.
Next up, quorum. This is why we set up the shared disk earlier:
Do you want to disable automatic quorum device selection (yes/no) [no]?
(go with the default, no)
Is it okay to begin the installation (yes/no) [yes]?
yes, it sure is!
During the installation process, sccheck(1M) is run on each of the new cluster nodes. If sccheck(1M) detects problems, you can either interrupt the installation process or check the log files after installation has completed.

Interrupt the installation for sccheck errors (yes/no) [no]?
default is fine, no
and off we go:
Installation and Configuration

Log file - /var/cluster/logs/install/scinstall.log.630

Testing for "/globaldevices" on "mail-store0" ... done
Testing for "/globaldevices" on "mail-store1" ... done

Starting discovery of the cluster transport configuration.
The following connections were discovered:

mail-store0:e1000g1  switch1  mail-store1:e1000g1
mail-store0:e1000g2  switch2  mail-store1:e1000g2

Completed discovery of the cluster transport configuration.

Started sccheck on "mail-store0".
Started sccheck on "mail-store1".
sccheck completed with no errors or warnings for "mail-store0".
sccheck completed with no errors or warnings for "mail-store1".

Configuring "mail-store1" ... done
Rebooting "mail-store1" ...
And the second node reboots, then the first
Rebooting "mail-store1" ... done Configuring "mail-store0" ... done Rebooting "mail-store0" ... Log file - /var/cluster/logs/install/scinstall.log.630 Rebooting ... updating /platform/i86pc/boot_archive...this may take a minute Connection to mail-store0 closed by remote host. Connection to mail-store0 closed.
Let the first node boot, and you’ll see a bunch of stuff on the console. Don’t stress, it’s (probably) normal. It is normal to see a few errors at first boot.
Let the cluster sort its stuff out (give it a couple of minutes), then run scstat to check the status of the cluster. It should look something like:
-bash-3.00$ scstat
------------------------------------------------------------------
-- Cluster Nodes --

                    Node name           Status
                    ---------           ------
  Cluster node:     mail-store1         Online
  Cluster node:     mail-store0         Online

------------------------------------------------------------------
-- Cluster Transport Paths --

                    Endpoint               Endpoint               Status
                    --------               --------               ------
  Transport path:   mail-store1:e1000g2    mail-store0:e1000g2    Path online
  Transport path:   mail-store1:e1000g1    mail-store0:e1000g1    Path online

------------------------------------------------------------------
-- Quorum Summary --

  Quorum votes possible:      3
  Quorum votes needed:        2
  Quorum votes present:       3

-- Quorum Votes by Node --

                    Node Name           Present  Possible  Status
                    ---------           -------  --------  ------
  Node votes:       mail-store1         1        1         Online
  Node votes:       mail-store0         1        1         Online

-- Quorum Votes by Device --

                    Device Name         Present  Possible  Status
                    -----------         -------  --------  ------
  Device votes:     /dev/did/rdsk/d2s2  1        1         Online

------------------------------------------------------------------
-- Device Group Servers --

                    Device Group        Primary             Secondary
                    ------------        -------             ---------

-- Device Group Status --

                    Device Group        Status
                    ------------        ------

-- Multi-owner Device Groups --

                    Device Group        Online Status
                    ------------        -------------

------------------------------------------------------------------
------------------------------------------------------------------
-- IPMP Groups --

                    Node Name           Group     Status    Adapter   Status
                    ---------           -----     ------    -------   ------
  IPMP Group:       mail-store1         sc_ipmp0  Online    e1000g0   Online
  IPMP Group:       mail-store0         sc_ipmp0  Online    e1000g0   Online

------------------------------------------------------------------
And we have a basic, working cluster!
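A couple of other quick checks worth doing at this point: scstat -q shows just the quorum votes, and scdidadm -L lists the DID devices so you can confirm both nodes see the shared quorum disk:

scstat -q
scdidadm -L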
Discovered Problems
Interconnect (“Cluster Transport”) is marked faulted
For example, if you run scstat, or scstat -W, you see:
Transport path: mail-store1:e1000g2 mail-store0:e1000g2 faulted
Transport path: mail-store1:e1000g1 mail-store0:e1000g1 Path online
(at boot it might be “waiting” for quite some time)
In some cases you can disconnect and reconnect the adapter in VMware. However, in others you may have to be more drastic.
Check that you can ping the other node via this path; if you can, then you should be all good to run the following commands:
scconf -c -m endpoint=mail-store0:e1000g2,state=disabled
where mail-store0 is your current node, and e1000g2 is the failed adapter. After you’ve done this, you can re-enable it:
scconf -c -m endpoint=mail-store0:e1000g2,state=enabled
And you should now have an online path shortly afterwards:
bash-3.00# scstat -W
-- Cluster Transport Paths --
Endpoint Endpoint Status
-------- -------- ------
Transport path: mail-store1:e1000g2 mail-store0:e1000g2 Path online
Transport path: mail-store1:e1000g1 mail-store0:e1000g1 Path online
All good!
Cluster Panics with pm_tick delay [number] exceeds [another number]
Try the following:
- Stop VMs being paged to disk in VMware (only use physical memory for your VMs). From memory this is a VMware Server host setting (see the vmx sketch below).
- Ensure Memory Trimming is disabled for your VMware Server Sun Cluster Guests
- On each cluster node, in order, configure the heartbeats to be farther apart and to have a longer timeout:
scconf -c -w heartbeat_timeout=60000
scconf -c -w heartbeat_quantum=10000
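For the first two points, the vmx settings below are roughly what I mean; the option names are from memory of VMware Server 1.x, so treat them as an assumption and double-check against the VMware documentation. Add them to each cluster guest's .vmx while the VM is powered off:

MemTrimRate = "0"
mainMem.useNamedFile = "FALSE"

MemTrimRate = "0" disables memory trimming for the guest, and mainMem.useNamedFile = "FALSE" is the setting usually suggested to stop the guest's memory being backed by a file in the VM directory.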
Hopefully this will leave you with a much more stable cluster on VMware.
November 27th, 2007 at 3:25 am
It’s me again, Geoff. Could I also add this blog to the Sun Cluster Wiki? If I can’t add it, could I link to it from the Sun Cluster Wiki?
November 29th, 2007 at 9:39 pm
Absolutely! I’ve shot you an email regarding this.
February 3rd, 2008 at 11:17 am
Hello.
Great article. What version of VMWare was used for this? I am using VMWare Server B2 and I’m having a few issues with the /globaldevices filesystem being mounted properly on both nodes….
Thanks!
-BW
February 19th, 2008 at 8:47 pm
Hi Barth.
Sorry for the late reply.
I’m using VMware Server 1.0.4, mostly because the console works properly over X11 (to my Mac). I couldn’t get the VMware Server 2 beta to work, even using Firefox over X11 from the Linux host to my Mac, so I went back.
Nonetheless, what’s the trouble you’re having? I did have some trouble with global devices mounting at boot when I built a cluster earlier on. More info from the console or /var/adm/messages would be a start, and maybe we can figure out what’s going wrong.
Is it having trouble just at boot? If so, can you mount it manually directly afterwards?
April 17th, 2008 at 8:09 am
Geoff, this is a great write-up. I seem to be having a number of problems, with the quorum device I believe.
The nodes run for about 30 minutes and finally panic and die an awful death. Not sure what I’m doing wrong in the setup of the quorum device. Any help would be great, and I realize I’ve not provided much info.
Running VMware Server 1.04, Solaris 10 U4, Sun Cluster 3.2, booting from an IDE VM disk. The quorum device is a VMware SCSI device on the lsilogic bus. The device is on both hosts… but something is not quite right. What outputs can I send you?
April 17th, 2008 at 1:13 pm
Hi Jeff,
Thanks for your kind feedback. To help with your problem, let’s start here:
Ensure crash dumps are enabled, following the instructions at:
http://docs.sun.com/app/docs/doc/801-7039/6i1cgngff?a=view
When the boxes panic, they will write out 2 files into /var/crash/[hostname].
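As a quick sketch of what that document walks through, dumpadm on each node shows the current crash dump configuration and can enable savecore and point it at the usual directory if it isn’t already:

# show the current dump device and savecore settings
dumpadm
# enable savecore on reboot and write dumps under /var/crash/<hostname>
dumpadm -y -s /var/crash/`hostname`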
Get yourself a copy of Solaris Crashdump Analysis Tool (SUNWscat) http://wwws.sun.com/software/download/products/3fce7df0.html
Run /opt/SUNWscat/bin/scat /var/crash/[hostname]/vmcore.[whatever dump number]
Type in analyze when you get a scat prompt like:
SolarisCAT(/var/tmp/vmcore.0)> analyze
and send me the output of that to start; we may wish to do some more work using SUNWscat later.
Also provide me a copy of your vmx config files for each node.
And last but not least, a /var/adm/messages from both hosts might give us some info too.
The other thing is I haven’t tested this config with Sun Cluster 3.2 🙂 Maybe we’ve run into some other grief I have not seen as yet. There is certainly some interesting behaviour with cluster in VMWare.
Email these to geoff at unixsysadmin.net and I’ll see what I can find.
Alternatively, I can provide some ftp space so you can upload the crashdump(s) to me and I can look at them for you.
April 29th, 2008 at 10:52 pm
I found this stuff very good. I am going to do it.
May 29th, 2008 at 4:29 am
Great posting…. But I am having a problem when I try to configure an NFS resource. Here is what I get; any ideas?
***Error***
nfs1b – Failed to analyze the device special file associated with file system mountpoint /nfs1: No such file or directory.
(C189917) VALIDATE on resource nfs-stor, resource group nfs-rg, exited with non-zero exit status.
(C720144) Validation of resource nfs-stor in resource group nfs-rg on node nfs1b failed.
June 18th, 2008 at 5:20 am
I have a laptop running Windows Vista with an AMD 64-bit dual-core processor. If I use VMware to install two Solaris 10 guest nodes for Sun Cluster, will it work?
Please give me some advice.
June 18th, 2008 at 10:59 am
Mohammad – if you follow the instructions, I hope it will work for you 🙂 The AMD processor will need to be of a certain revision for it to work with 64-bit guests.
If it is a relatively new processor, you should be fine.
Let me know if you have any troubles.
August 13th, 2008 at 3:59 pm
How do you configure the additional interface for the VM?
I installed the solaris guest VM, but can only “see” one interface e1000g0.
Should I select “Host-only” or “Specific virtual network” for the VM Network Connection settings?
Appreciate your advice. Thanks.
August 14th, 2008 at 6:43 pm
You should have at least 3 NICs. If you have configured 3 NICs in VMware for your guest but can’t see them in the guest, they’re probably not plumbed.
The easy way to plumb them all is to type in (in Solaris):
ifconfig -a plumb
However, from memory, cluster would prefer them not to be plumbed; it will configure the interfaces during the installation of the cluster software.
You will need at least 2 NICs set up for the virtual network between hosts as interconnects (which is one of the first things you need to configure in the documentation above).
You should additionally configure a NIC for the public network.
The 2 NICs on each host that connect to the 2 NICs on the other host should not all be on the same network; they will need to be NIC 1 -> NIC 1, NIC 2 -> NIC 2 between the hosts. They are used for heartbeats, global file systems and other magic.
August 19th, 2008 at 5:05 pm
I realized that the problem was that there was no vmnet assigned to the ethernet interface. It is working now. Thanks for your help and advice.
August 21st, 2008 at 3:30 am
Hi,
I’m using VMware Server 2.0 RC1, and my nodes run Solaris 10 11/06 with patches from the EIS DVD dated 24/Jun/2008.
When node 1 starts, it displays these messages:
Aug 20 13:29:20 zeus genunix: [ID 313806 kern.notice] NOTICE: pm_tick delay of 5149 ms exceeds 2147 ms
Aug 20 13:29:21 zeus last message repeated 1 time
Aug 20 13:29:21 zeus genunix: [ID 313806 kern.notice] NOTICE: pm_tick delay of 5150 ms exceeds 2147 ms
Aug 20 13:29:21 zeus genunix: [ID 313806 kern.notice] NOTICE: pm_tick delay of 5149 ms exceeds 2147 ms
And it is very, very slow.
Do you know how to resolve this?
September 18th, 2008 at 2:59 am
Thanks.
I have been suffering from the pm_tick delay problem for several years too.
Today I tried your solution and it seems to work fine.
Eternal gratitude.