Installing Java Enterprise System 5 on Solaris Express or OpenSolaris

April 5th, 2008

This has annoyed me before but I never bothered figuring out how to fix the error:

PSPERR:++++++++++++++++++++++++++  {Missing Resource Exception in the loading of the
PkgRelativePaths{Can't find bundle for base name
com.sun.entsys.installer.common.resources.PkgRelativePaths5_11, locale en_AU}}

The result is that JES fails to install on Solaris Express Nevada builds. Tonight I *really* wanted to install JES on Solaris Express b81 (with Crossbow Integrated Bits), so I spent some time this evening figuring it out (thanks to truss + find) and found a way around the issue. You don’t need to rename directories or do anything beyond the instructions that follow.

NOTE: Sun probably don’t want you to do this and it is unsupported, so do it at your own peril.

Unzip a fresh copy of JES.
Run the installer (I’ve used text mode):

Unable to access a usable display on the remote system. Continue in command-line mode?(Y/N)
  Y

Once you say “Y”, I’ve found it goes off and creates a directory called /tmp/.entsys_CaChE

Inside this directory you will find the following seemingly relevant files:

./Solaris_x86/.install/config/PPXMLS/Clusters/EntsysCluster_SUNOS_SPARC_5_10.xml
./Solaris_x86/.install/config/PPXMLS/Clusters/EntsysCluster_SUNOS_X86_5_10.xml
./Solaris_x86/.install/config/PPXMLS/Clusters/OrionCluster_SUNOS_SPARC_5_10.xml
./Solaris_x86/.install/config/PPXMLS/Clusters/OrionCluster_SUNOS_X86_5_10.xml
./Solaris_x86/.install/config/com/sun/entsys/installer/common/resources/PkgRelativePaths5_10.properties

In another terminal window, copy the relevant files for your platform to new files with the same names, but with 5_11 in place of 5_10:

cd /tmp/.entsys_CaChE
cp ./Solaris_x86/.install/config/PPXMLS/Clusters/EntsysCluster_SUNOS_X86_5_10.xml \
./Solaris_x86/.install/config/PPXMLS/Clusters/EntsysCluster_SUNOS_X86_5_11.xml
cp ./Solaris_x86/.install/config/PPXMLS/Clusters/OrionCluster_SUNOS_X86_5_10.xml \
./Solaris_x86/.install/config/PPXMLS/Clusters/OrionCluster_SUNOS_X86_5_11.xml
cp ./Solaris_x86/.install/config/com/sun/entsys/installer/common/resources/PkgRelativePaths5_10.properties \
./Solaris_x86/.install/config/com/sun/entsys/installer/common/resources/PkgRelativePaths5_11.properties
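If you would rather not type each path by hand, the copies can be scripted. This is only a minimal sketch and not part of the procedure I actually ran: the function name clone_5_10_files is made up for illustration, and it assumes every file that needs duplicating has 5_10 immediately before its extension, as in the listing above. It also copies the SPARC cluster files, which should be harmless on x86.

```shell
#!/bin/sh
# Sketch: duplicate every *5_10.<ext> file under a directory as *5_11.<ext>.
# clone_5_10_files is a hypothetical helper name, not part of the JES installer.
clone_5_10_files() {
    find "$1" -type f -name '*5_10.*' | while read src; do
        # Swap the trailing 5_10 for 5_11, keeping the file extension.
        dst=`printf '%s\n' "$src" | sed 's/5_10\.\([^./]*\)$/5_11.\1/'`
        # Never overwrite a 5_11 file that already exists.
        [ -e "$dst" ] || cp "$src" "$dst"
    done
}

# Only act once the installer has unpacked its cache directory.
if [ -d /tmp/.entsys_CaChE ]; then
    clone_5_10_files /tmp/.entsys_CaChE
fi
```

As with the manual copies, run it from a second terminal while the installer is waiting at a prompt.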

Then continue the installer as normal.

Then you should get to here:

1. Install
2. Start Over
3. Exit Installation
What would you like to do [1] {"<" goes back, "!" exits}?

And if you continue now things are happy:

Java Enterprise System 5
-1%--------------25%-----------------50%-----------------75%--------------100%
Installation Complete
Software installation has completed successfully. You can view the installation
summary and log by using the choices below. Summary and log files are available
in /var/sadm/install/logs/.

Sorted!

Random bits about Solaris Express on the ASUS eeePC

April 2nd, 2008

Out of the box, the eeePC’s got a lot of stuff packed onto that 4GB disk, a custom Xandros (Linux) install, with a bunch of applications - and it works well for the most part. But I’m really not that into Linux (personal bias) and I wanted to try and get a working, useable Solaris install on the device.

Given how far ahead Solaris Express is on the desktop compared with Solaris 10u4, I was definitely going to use an OpenSolaris-derived build. I tried a couple of options, including SXCE b82, the Indiana preview 2 and Nexenta 1.0 (I also tried eeeXubuntu, but it wasn’t for me; I wanted Solaris on this puppy, not Linux).

Note: You will need to upgrade the memory in your eeePC to reliably install and run a newer release of Solaris Express (b72 was fine, but when I tried b79 and above, the installer was unreliable). I have upgraded to 2GB of memory and an 8GB SD card, which makes the eee significantly more useable and gives you some headroom on disk.

Also, please note you will require a USB keyboard to install the thing, but there is a dodgy fix below that gets around this problem once Solaris is installed.

Installing Solaris Express

My first goal was to get a working GUI, then to get the thing on the internet (ideally using 3G).

First attempt - Jumpstart - no good: Solaris doesn’t support the onboard Atheros 10/100Mb adapter in the Jumpstart environment

Second attempt - boot from USB DVD - works (make sure you attach a USB keyboard for the install and first boot!)

Install questions:

1. Use Console Session Jumpstart Interactive

2. When it gets to disk layout, do it manually and create your own partition layout, making / the entire disk (the only way you’ll fit all this on the 4GB drive) on s0 - unless you use additional storage, in which case you don’t have to be so harsh with the “all root” or nothing approach.

3. Use the “End User” Cluster (Claims to require over 4GB, it doesn’t), and preferably remove some packages (there’s only so much room on this thing!) I will post a stripped down profile to assist with this once I have one created, so maybe we can get some swap space even on the base 4GB drive.

Once you get through this, it will take its time and do the install.

Once it’s up, the first thing you’ll want to do is fix the keyboard, as it does not work reliably out of the box.

Fixing the built-in keyboard not working on the eeePC

Write a quick script to fix the keyboard issue at boot. After some mucking around myself I found that a keyboard driver was happily attached [grrr] even though the keyboard wasn’t working; it turns out a modunload followed by a modload fixes the problem. Thanks to timf for the hint that pointed me in the right direction: http://forum.eeeuser.com/viewtopic.php?pid=146353

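#!/bin/sh
# Unload the kb8042 keyboard driver, looking up its module id with modinfo
modunload -i `modinfo | awk '/kb8042/ {print $1}'`
# Rebuild the device nodes for the driver, then load it again
devfsadm -i kb8042
modload /kernel/drv/kb8042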

I saved this quick script as /etc/init.d/fixKB and linked it to /etc/rc2.d/S99fixKB, so the fix is applied on each boot.

Now on to making the Nokia 6120 modem work via USB with Virgin Mobile…

Using a USB cable to a Nokia 6120 Classic (and many other Nokia phones) for Internet Access (3G)

-Attach the DVD drive again, with Solaris install CD inside

-Install the SUNWpppd* packages:

pkgadd -d /media/SOL11_X86_1/Solaris_11/packages SUNWpppd SUNWpppdr SUNWpppdu

-Plug in the nokia and link the device in the dev tree:

ln -s /dev/term/0 /dev/nokia

-Edit /etc/ppp/peers/nokia:

modem
nokia
460800
noauth
nodetach
noipdefault
usepeerdns
defaultroute
hide-password
connect '/usr/bin/chat -V -t15 -f /etc/ppp/nokia-chat'

-Edit /etc/ppp/nokia-chat:

'' 'ATZ'
'OK' 'ATE0V1'
'OK' 'AT+CGDCONT=,,"virgininternet"'
'OK' 'ATD*99#'
CONNECT ''

-Reboot (SUNWpppd needs the ppp driver loaded in the kernel, you could probably load this manually)

-Edit /etc/resolv.conf:

nameserver 61.88.88.88

-Edit /etc/nsswitch.conf, adding “dns” to the “hosts” line:

hosts: files dns
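Before dialling, it can save some head scratching to confirm both name service edits are actually in place. A minimal sketch, assuming nothing beyond the two files above; check_dns_config is a made-up helper name, and it only looks for a nameserver line and for dns on the hosts line, nothing more thorough than that.

```shell
#!/bin/sh
# Sketch: sanity-check the resolver files edited above.
# check_dns_config is a hypothetical helper; pass it the two file paths.
check_dns_config() {
    # resolv.conf needs at least one nameserver entry.
    grep '^nameserver' "$1" >/dev/null 2>&1 || {
        echo "no nameserver line in $1"
        return 1
    }
    # The hosts line in nsswitch.conf must mention dns.
    grep '^hosts:.*dns' "$2" >/dev/null 2>&1 || {
        echo "hosts line in $2 does not include dns"
        return 1
    }
    echo "DNS configuration looks usable"
}
```

Call it as check_dns_config /etc/resolv.conf /etc/nsswitch.conf; a non-zero return means one of the two edits is missing.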

Now, to start using the internet:

pppd call nokia
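The peers file and chat script shown earlier can also be generated by a small script, which makes it easy to try a different APN. This is a minimal sketch rather than what I actually did: write_nokia_ppp_files and its staging directory argument are made up for illustration, and the file contents are simply the ones listed above with the APN substituted.

```shell
#!/bin/sh
# Sketch: write the pppd peers file and chat script into a staging directory.
# write_nokia_ppp_files is a hypothetical helper: $1 = output dir, $2 = APN.
write_nokia_ppp_files() {
    mkdir -p "$1/peers" || return 1

    # Peers file: identical to the /etc/ppp/peers/nokia listing above.
    cat > "$1/peers/nokia" <<'EOF'
modem
nokia
460800
noauth
nodetach
noipdefault
usepeerdns
defaultroute
hide-password
connect '/usr/bin/chat -V -t15 -f /etc/ppp/nokia-chat'
EOF

    # Chat script: only the APN inside AT+CGDCONT changes.
    cat > "$1/nokia-chat" <<EOF
'' 'ATZ'
'OK' 'ATE0V1'
'OK' 'AT+CGDCONT=,,"$2"'
'OK' 'ATD*99#'
CONNECT ''
EOF
}
```

For example, write_nokia_ppp_files /tmp/ppp-staging virgininternet produces the same two files as above; review them, then copy them into /etc/ppp as root.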

Compiz Fusion on the eeePC

You can install Compiz quite easily thanks to Erwann Chénedé at Sun http://blogs.sun.com/erwann/entry/new_easy_install_bundle_for

However, though it basically works, it is a bit buggy in my experience on both my laptop and the eeePC. Nevertheless, I have posted a quick YouTube video so you can see Compiz in action on the eeePC.


Building a Sun Cluster using Solaris 10, on VMware Server

November 22nd, 2007

One of the things I’ve done on my week off is get Sun Cluster working on VMware Server. There’s a few small tricks to it, but generally it hasn’t been as horrible as the first time I did it many months ago on VMware (a 4 node cluster, with panics galore - no fun).

NOTE: To run Sun Cluster on Solaris 10, you will need to be running your VMs as 64-bit guests (Sun Cluster, on Solaris 10, on x86/x64 DOES NOT run on Solaris 10 32-bit - found this out the hard way a long time ago!).

To do 64-bit VMs, only some CPUs are supported. In this case I am using a relatively new AMD 64 x2, and that’s perfect for this purpose. In this build I’m using Solaris 10 u3, because u4 seems to have a few issues on VMware (lots of kernel panics on boot, I’ve found), together with Sun Cluster 3.1 8/05 (u4). Even though Solaris Cluster 3.2 is out, Sun Cluster 3.1u4 is still what most things are certified against. I will build a 3.2 cluster at some point later on.

So, let’s start with configuring VMware:

-Configure at least 2 additional host based networks, on Linux you will need to run

vmware-config.pl

I have configured several more (seeing as it is easy to do it all at once).

When it asks you about networking, you want to configure additional host based networks. The scheme I have used is:

172.16.0.0/255.255.255.0

172.16.1.0/255.255.255.0

.

.

172.16.11.0/255.255.255.0

NOTE: Which subnets you specify here isn’t that relevant, as your host will never talk on these networks. The node-to-node cluster interconnects will use the interfaces, and will likely use different subnets anyway. Each vmnet adapter just needs a different subnet. You need not use /24s either; you could go down to a very small subnet size (/30, for example). If this makes no sense to you, don’t worry too much; just keep going and follow what I’ve done above.
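If you want the whole list of subnets in front of you while answering the vmware-config.pl prompts, a short loop will print the scheme. A minimal sketch; list_vmnet_subnets is a made-up name and the script only prints the addresses, it does not configure VMware.

```shell
#!/bin/sh
# Sketch: print the twelve host-only subnets in the 172.16.N.0/24 scheme.
# list_vmnet_subnets is a hypothetical helper; it configures nothing.
list_vmnet_subnets() {
    i=0
    while [ "$i" -le 11 ]; do
        echo "172.16.$i.0/255.255.255.0"
        i=`expr $i + 1`
    done
}

list_vmnet_subnets
```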

Now, create 2 VMs - Typical, Solaris 10 64-bit (will not work if 64-bit is not selected), and in my case I preallocated 32GB of disk.

For each VM:

-Add at least 2 additional ethernet interfaces (e1000gXs) - these will be used for interconnects. Put each one on a different vmnet adapter. I used vmnet1 and vmnet2 in this case. Make sure when you do this, you do the interfaces identically on both VMs as they will need to talk to each other across these interfaces, but nothing else.

-Disable snapshots (for performance)

-Add a single disk in a different location for quorum, to one host only. I added the quorum disk as /virtuals/cluster-disks/mail-store-quorom.vmdk. The quorum disk should be as small as possible; I believe the smallest disk you can build is ~100MB, so do that (0.1GB).

Boot each node for a tick, then shut it down straight away. This populates the .vmx config with the additional information that is not written until you first bring up a VM (the ethernet address for e1000g0 is what we really want populated here).

-With each VM shut down, edit the .vmx file for each VM and add in the lines:

Configure shared quorum device:

scsi0:1.present = "TRUE"
scsi0:1.fileName = "/virtuals/cluster-disks/mail-store-quorom.vmdk"
scsi0.sharedBus = "virtual"
disk.locking = "false"

Obviously there’s no need to re-add the filename line for the host you configured the disk on initially.

Next, configure CPU (core) binding for each VM. If you don’t do this and you’re using a 64-bit AMD chip, you’ll get some interesting behaviour: the timestamps on each core of these CPUs differ, which messes with Solaris, as it expects to be on one CPU. The cluster will panic more often if you don’t do this :)

processor0.use = "TRUE"
processor1.use = "FALSE"

And I do the reverse of the above for the other host’s vmx file:

processor0.use = "FALSE"
processor1.use = "TRUE"
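The .vmx edits can be applied with a small script on the VMware host instead of by hand. This is a minimal sketch under a few assumptions: set_vmx_key and add_cluster_settings are names I made up, the keys and values are exactly the ones shown above, and a key that already exists in the file is left alone (which also covers the host where the quorum disk was added through the UI). Only run it while the VM is shut down.

```shell
#!/bin/sh
# Sketch: append the cluster-related settings to a .vmx file if missing.
# Both function names are hypothetical helpers, not VMware tools.

# $1 = .vmx file, $2 = key, $3 = value. Existing keys are not touched.
set_vmx_key() {
    grep "^$2 = " "$1" >/dev/null 2>&1 || printf '%s = "%s"\n' "$2" "$3" >> "$1"
}

# $1 = .vmx file, $2 = host core to bind to (0 or 1), $3 = quorum vmdk path
add_cluster_settings() {
    vmx=$1
    cpu=$2
    quorum=$3

    # Shared quorum disk settings, as listed above.
    set_vmx_key "$vmx" scsi0:1.present TRUE
    set_vmx_key "$vmx" scsi0:1.fileName "$quorum"
    set_vmx_key "$vmx" scsi0.sharedBus virtual
    set_vmx_key "$vmx" disk.locking false

    # Bind the VM to one core; use the opposite core for the other node.
    if [ "$cpu" = 0 ]; then
        set_vmx_key "$vmx" processor0.use TRUE
        set_vmx_key "$vmx" processor1.use FALSE
    else
        set_vmx_key "$vmx" processor0.use FALSE
        set_vmx_key "$vmx" processor1.use TRUE
    fi
}
```

For example, call add_cluster_settings with core 0 for the first node’s .vmx and core 1 for the second, passing /virtuals/cluster-disks/mail-store-quorom.vmdk as the quorum path both times.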

-Now kick off a Solaris build using the minimal profile from my last post, or SUNWCXall (all packages) will do if you don’t mind the extra build time and you have disk space up your sleeve. You could of course just do a straight install off the CD; just make sure you use a custom partitioning scheme with a 512MB /globaldevices slice on your disk.

-Install Sun Cluster from anything Java Enterprise System 2005Q4 or above (JES 5 at time of writing). Obtain it for free from http://www.sun.com/software/javaenterprisesystem/getit.jsp. All the JES’ from (and including) 2005Q4 have Sun Cluster 3.1u4 (that is 8/05). It’s worth noting Sun Cluster 3.2 is out and has been for a while, but I’m not sure how much stuff is certified against it. I will try it out later, for the moment I’ll go with 3.1u4.

Note: there is nearly no initial config when you install Sun Cluster from the JES installer. I noticed in the most recent release of JES (and possibly some previous releases, I’ve missed a few) it asks if you want to allow Sun Cluster to be configured remotely. For simplicity, answer yes. It makes the cluster config very easy from there.

-Add /usr/cluster/bin to your path for convenience on both hosts
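One way to do this is a couple of lines in your shell profile on each node. A minimal sketch in plain Bourne shell syntax; the case guard is just there so the directory is not appended twice if the profile is sourced again.

```shell
# Append /usr/cluster/bin to PATH unless it is already there.
case ":$PATH:" in
    *:/usr/cluster/bin:*) ;;
    *) PATH=$PATH:/usr/cluster/bin ;;
esac
export PATH
```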

-Run scinstall on ONE host (/usr/cluster/bin/scinstall if you did not follow the above step).

You’ll get a menu… with the first item where we want to be.

* 1) Install a cluster or cluster node

So select 1, then..

1) Install all nodes of a new cluster

1 again, then yes to continue

Please select from one of the following options:
1) Typical
2) Custom

Select 1

Then select a cluster name, in this case I’ve gone with mail-store-clus as this is to become a cluster of Sun Messaging Server 6.3 Mail Stores

Next you are asked for other nodes in the cluster, in this case the only other node for me is mail-store1, so I type that in

Node name (Control-D to finish):  mail-store1
Node name (Control-D to finish):
This is the complete list of nodes:
mail-store0
mail-store1
Is it correct (yes/no) [yes]?

and then ctrl-D, then yes it is correct

Attempting to contact "mail-store1" ... done
Searching for a remote install method ... done
The Sun Cluster framework software is already installed on each of
the new nodes of this cluster. And, it is able to complete the
configuration process without remote shell access.

Looking good so far! Enter to continue.

Select the first cluster transport adapter for "mail-store0":
1) e1000g1
2) e1000g2
3) Other

Go with 1, then the next transport adapter, 2. NOTE: If you have plumb’d these devices, they will not work. These cards need to be unplumb’d in that case.

    Searching for any unexpected network traffic on "e1000g1" ... done
    Verification completed. No traffic was detected over a 10 second
    sample period.

Next up, quorum. This is why we set up the shared disk earlier:

Do you want to disable automatic quorum device selection (yes/no) [no]?

(go with the default, no)

    Is it okay to begin the installation (yes/no) [yes]?

yes, it sure is!

    During the installation process, sccheck(1M) is run on each of the
    new cluster nodes. If sccheck(1M) detects problems, you can either
    interrupt the installation process or check the log files after
    installation has completed.

    Interrupt the installation for sccheck errors (yes/no) [no]?

default is fine, no

and off we go:

  Installation and Configuration

    Log file - /var/cluster/logs/install/scinstall.log.630

    Testing for "/globaldevices" on "mail-store0" ... done
    Testing for "/globaldevices" on "mail-store1" ... done

    Starting discovery of the cluster transport configuration.

    The following connections were discovered:

        mail-store0:e1000g1  switch1  mail-store1:e1000g1
        mail-store0:e1000g2  switch2  mail-store1:e1000g2

    Completed discovery of the cluster transport configuration.

    Started sccheck on "mail-store0".
    Started sccheck on "mail-store1".
    sccheck completed with no errors or warnings for "mail-store0".
    sccheck completed with no errors or warnings for "mail-store1".

    Configuring "mail-store1" ... done
    Rebooting "mail-store1" ...

And the second node reboots, then the first

    Rebooting "mail-store1" ... done

    Configuring "mail-store0" ... done
    Rebooting "mail-store0" ... 

Log file - /var/cluster/logs/install/scinstall.log.630

Rebooting ... 

updating /platform/i86pc/boot_archive...this may take a minute
Connection to mail-store0 closed by remote host.
Connection to mail-store0 closed.

Let the first node boot, and you’ll see a bunch of stuff on the console. Don’t stress, it’s (probably) normal. It is normal to see a few errors at first boot.

Let the cluster sort its stuff out (give it a couple of minutes), then run scstat to check the status of the cluster. It should look something like:

-bash-3.00$ scstat
------------------------------------------------------------------

-- Cluster Nodes --

                    Node name           Status
                    ---------           ------
  Cluster node:     mail-store1         Online
  Cluster node:     mail-store0         Online

------------------------------------------------------------------

-- Cluster Transport Paths --

                    Endpoint               Endpoint               Status
                    --------               --------               ------
  Transport path:   mail-store1:e1000g2    mail-store0:e1000g2    Path online
  Transport path:   mail-store1:e1000g1    mail-store0:e1000g1    Path online

------------------------------------------------------------------

-- Quorum Summary --

  Quorum votes possible:      3
  Quorum votes needed:        2
  Quorum votes present:       3

-- Quorum Votes by Node --

                    Node Name           Present Possible Status
                    ---------           ------- -------- ------
  Node votes:       mail-store1         1        1       Online
  Node votes:       mail-store0         1        1       Online

-- Quorum Votes by Device --
                    Device Name         Present Possible Status
                    -----------         ------- -------- ------
  Device votes:     /dev/did/rdsk/d2s2  1        1       Online

------------------------------------------------------------------

-- Device Group Servers --

                         Device Group        Primary             Secondary
                         ------------        -------             ---------

-- Device Group Status --

                              Device Group        Status
                              ------------        ------              

-- Multi-owner Device Groups --

                              Device Group        Online Status
                              ------------        -------------

------------------------------------------------------------------
------------------------------------------------------------------

-- IPMP Groups --

              Node Name           Group   Status         Adapter   Status
              ---------           -----   ------         -------   ------
  IPMP Group: mail-store1         sc_ipmp0 Online         e1000g0   Online

  IPMP Group: mail-store0         sc_ipmp0 Online         e1000g0   Online

------------------------------------------------------------------

And we have a basic, working cluster!
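The quorum numbers in that output follow from simple majority arithmetic: two node votes plus one quorum device vote gives 3 possible, and a majority of 3 is 2. A tiny sketch of the sum, where quorum_votes_needed is a made-up helper name and not a cluster command:

```shell
#!/bin/sh
# Sketch: votes needed for quorum = more than half of the possible votes.
# quorum_votes_needed is a hypothetical helper, not part of Sun Cluster.
quorum_votes_needed() {
    expr "$1" / 2 + 1
}

quorum_votes_needed 3   # prints 2, matching the scstat output above
```

Run it with 2 (two nodes and no quorum device) and it also prints 2, meaning the loss of either node would cost the cluster its quorum. That is the reason for the small shared quorum disk.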

Discovered Problems

Interconnect (“Cluster Transport”) is marked faulted

For example, if you do an scstat, or an scstat -W you see:

  Transport path:   mail-store1:e1000g2    mail-store0:e1000g2    faulted
  Transport path:   mail-store1:e1000g1    mail-store0:e1000g1    Path online

(at boot it might be “waiting” for quite some time)

In some cases you can disconnect and reconnect the adapter in VMware. However, in others you may have to be more drastic.

Check you can ping the other node via this path - if you can, then you should be all good to run the following commands:

scconf -c -m endpoint=mail-store0:e1000g2,state=disabled

where mail-store0 is your current node, and e1000g2 is the failed adapter. After you’ve done this, you can re-enable it:

scconf -c -m endpoint=mail-store0:e1000g2,state=enabled

And you should now have an online path shortly afterwards:

bash-3.00# scstat -W
-- Cluster Transport Paths --
                    Endpoint               Endpoint               Status
                    --------               --------               ------
  Transport path:   mail-store1:e1000g2    mail-store0:e1000g2    Path online
  Transport path:   mail-store1:e1000g1    mail-store0:e1000g1    Path online

All good!
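If you want to keep an eye on this, the transport path lines are easy to filter. A minimal sketch, assuming the scstat -W output format shown above; faulted_paths is a made-up name, and it simply prints the two endpoints of any path whose status is not “Path online” (so paths still “waiting” at boot are listed as well).

```shell
#!/bin/sh
# Sketch: read scstat -W output on stdin and list paths that are not online.
# faulted_paths is a hypothetical helper, not a Sun Cluster command.
faulted_paths() {
    awk '/Transport path:/ && !/Path online/ { print $3, $4 }'
}
```

Use it as scstat -W | faulted_paths; no output means every path is online.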

Cluster Panics with pm_tick delay [number] exceeds [another number]

Try the following:

  1. Stop VMs being paged to disk in VMware (only use physical memory for your VMs). From memory, this is a host setting in VMware Server.
  2. Ensure Memory Trimming is disabled for your VMware Server Sun Cluster guests.
  3. On each cluster node, in order, configure the heartbeats to be farther apart and to have a longer timeout:
scconf -c -w heartbeat_timeout=60000 

scconf -c -w heartbeat_quantum=10000

Hopefully this will leave you with a much more stable cluster on VMware.