Wednesday, April 29, 2009

Ganglia 3.1.2 for Windows HPC2008

Recent tests of the Windows port of Ganglia (obtained from the APR Consulting web page) on Microsoft Windows HPC 2008 showed a problem.
After a few minutes of runtime, the Ganglia executable consumes more and more memory until the system starts to swap, becomes unstable, and finally crashes or is no longer reachable.
Unable to deploy Ganglia to the cluster, I tested different releases from APR. None of them had the problem on Win2003 x64, but on HPC2008 x64 all of them either showed the same memory leak or didn't work at all.
So finally we compiled our own Cygwin-based gmond.exe binary and ended up with a pretty stable version, with just one flaw:
So far, installation as a service doesn't work, neither with gmondservice.exe from APR Consulting nor with the native Windows tool sc.exe.
However, installing it with schtasks.exe as a scheduled task that runs once on startup and then daemonizes (a daemon is what Linux calls a service) works fine.
In addition, simply swapping the executables or the config file now results in an updated Ganglia once the node reboots or a task restart is triggered; there is no need to remove and reinstall a service.
All deployment steps can easily be done with the clusrun extension, which is essential for cluster administration.
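A minimal sketch of the scheduled-task installation described above, written as a small Python helper that only builds the command lines. The task name, the C:\ganglia paths, running the task as SYSTEM, and fanning out with clusrun /all are my assumptions for illustration, not something prescribed by Ganglia or HPC Pack, so adjust them to your cluster.

```python
import subprocess


def schtasks_create_command(task_name, gmond_exe, gmond_conf):
    """Build the schtasks.exe arguments that register gmond as a startup task."""
    # /tr takes the whole command line as one argument; the inner quotes
    # protect paths that contain spaces. -c points gmond at its config file.
    task_run = f'"{gmond_exe}" -c "{gmond_conf}"'
    return [
        "schtasks", "/create",
        "/tn", task_name,   # task name (hypothetical, pick your own)
        "/tr", task_run,    # what to run
        "/sc", "onstart",   # run once when the node boots
        "/ru", "SYSTEM",    # assumption: run under the SYSTEM account
    ]


def clusrun_command(inner_args):
    """Wrap a command so that clusrun runs it on all nodes (assumed /all)."""
    return ["clusrun", "/all"] + list(inner_args)


if __name__ == "__main__":
    create = schtasks_create_command(
        "gmond", r"C:\ganglia\gmond.exe", r"C:\ganglia\gmond.conf"
    )
    # Print the command line instead of executing it, so it can be reviewed
    # and pasted into a console on the head node.
    print(subprocess.list2cmdline(clusrun_command(create)))
```

Printing instead of executing keeps the sketch harmless; once the output looks right for your paths, it can be run from a command prompt on the head node.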

Small tutorial

(all links are below, drop a comment if something is missing/wrong)

  • Download a Ganglia version (3.1.2 Langley worked very well)

  • Download and install Cygwin with the gcc and g++ compilers and the additional packages mentioned in the README.WIN file of the Ganglia package

  • currently:
    libapr1, expat, diffutils, gcc, make, python, sharutils, sunrpc

  • And for libconfuse, do: ./configure, make, make install in the root directory of the confuse lib

  • You may have to exclude the examples from the build:
    replace the line: SUBDIRS = m4 po src examples tests doc with
    SUBDIRS = m4 po src tests doc
    They threw an error on my system.

  • Do: ./configure --with-libconfuse=/usr/local --enable-static-build and then make in the root of the Ganglia source tree

  • With some additional DLL files from Cygwin, your build is now runnable. Just start gmond.exe, look in the Event Viewer to see which DLLs are missing, and place them in the same folder or in a folder that is on the PATH.
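Two of the steps above can be scripted. The following is a rough sketch, not part of Ganglia or confuse: drop_examples_subdir removes "examples" from a SUBDIRS line in the makefile text you pass in, and missing_dlls reports which of the DLLs you noted from the Event Viewer are not yet present in the folders you plan to use. Both function names are hypothetical helpers of mine.

```python
import os
import re


def drop_examples_subdir(makefile_text):
    """Return makefile_text with 'examples' removed from every SUBDIRS line."""
    def strip_examples(match):
        dirs = [d for d in match.group(2).split() if d != "examples"]
        return match.group(1) + " ".join(dirs)

    # Only lines that start with "SUBDIRS =" are touched; the rest is kept.
    return re.sub(
        r"^(SUBDIRS[ \t]*=[ \t]*)(.*)$",
        strip_examples,
        makefile_text,
        flags=re.MULTILINE,
    )


def missing_dlls(required, search_dirs):
    """Return the names in `required` that exist in none of `search_dirs`."""
    missing = []
    for name in required:
        found = any(os.path.isfile(os.path.join(d, name)) for d in search_dirs)
        if not found:
            missing.append(name)
    return missing
```

For example, missing_dlls(["cygwin1.dll"], [r"C:\ganglia"]) returns ["cygwin1.dll"] until that file has been copied into C:\ganglia (an example path); add to the list whatever other DLLs the Event Viewer reports.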

Please note that this is an x86 (32-bit) binary and not x64, because Cygwin is not x64.
It should, however, be possible to build Ganglia as a native x64 binary with Windows Services for Unix.


Corresponding discussion in HPC2008 MS Forum
confuse library
APR Consulting web page