Wednesday, May 20, 2009

OpenMP Fortran

I'm currently investigating a problem where a scalar (4-byte integer) variable cannot be declared FIRSTPRIVATE inside an !$OMP PARALLEL section.
Doing so causes abnormal program termination, segmentation faults and undefined behavior.
Declaring the variable PRIVATE works, however, and so does SHARED, of course.
Hopefully the small code snippet below provides more insight.
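A minimal sketch of the construct in question (not the original code; the variable name is just a placeholder). With a conforming compiler each thread should simply start with its own copy of n initialized to 42:

program firstprivate_test
  use omp_lib
  implicit none
  integer :: n            ! scalar 4-byte integer
  n = 42
!$OMP PARALLEL FIRSTPRIVATE(n)
  ! every thread gets its own copy of n, initialized from the value before the region;
  ! in the problematic setup this clause triggers the crash, while PRIVATE(n)
  ! (uninitialized private copy) or SHARED(n) runs fine
  n = n + omp_get_thread_num()
  print *, 'thread', omp_get_thread_num(), 'n =', n
!$OMP END PARALLEL
end program firstprivate_test

Built with OpenMP enabled (e.g. -fopenmp with gfortran), this should print one line per thread.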

Wednesday, April 29, 2009

Ganglia 3.1.2 for Windows HPC2008


Recent tests of the Windows port of ganglia on Microsoft Windows HPC 2008, obtained from the APR Consulting web page, revealed a problem.
After a few minutes of runtime the ganglia executable consumes more and more memory until the system starts to swap, becomes unstable and finally crashes or is no longer reachable.
Unable to deploy ganglia to the cluster, I tested different releases from APR: none of them had the problem on Windows 2003 x64, but on HPC 2008 x64 all of them either showed the same memory leak or did not work at all.
So we finally compiled our own Cygwin-based gmond.exe binary and ended up with a pretty stable version, with just one flaw:
So far the installation as a service does not work, neither with gmondservice.exe from APR Consulting nor with the native Windows tool sc.exe.
However, installing it with schtasks.exe as a scheduled task that runs once at startup and then daemonizes (what Linux would call a service) works fine; see the sketch right after this paragraph.
In addition, simply swapping the executable or the config file now results in an updated ganglia once the node reboots or the task is restarted, instead of removing and reinstalling a service.
All deployment steps can easily be done with the clusrun extension, which is essential for cluster administration.
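For illustration, a sketch of the task registration (the task name and paths are placeholders; this assumes gmond.exe and gmond.conf have already been copied to C:\ganglia on every node):

rem register gmond on all nodes as a task that starts at boot and runs as SYSTEM
clusrun schtasks /create /tn gmond /tr "C:\ganglia\gmond.exe -c C:\ganglia\gmond.conf" /sc onstart /ru SYSTEM /f
rem start it right away without waiting for a reboot
clusrun schtasks /run /tn gmond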




Small tutorial

(all links are below, drop a comment if something is missing/wrong)

  • Download a ganglia version (3.1.2 "Langley" indeed worked very well)

  • Download and install Cygwin with the gcc and g++ compilers and the additional packages mentioned in the README.WIN file of the ganglia package

  • currently:
    libapr1, expat, diffutils, gcc, make, python, sharutils, sunrpc
    and for libconfuse:
    libiconv
  • In the root directory of the confuse library run:
    ./configure && make && make install

  • Perhaps you have to exclude the examples from the build: replace the line
    SUBDIRS = m4 po src examples tests doc
    with
    SUBDIRS = m4 po src tests doc
    The examples threw an error on my system.

  • In the root of the ganglia source tree run:
    ./configure --with-libconfuse=/usr/local --enable-static-build && make

  • With some additional DLL files from Cygwin, your build is now runnable. Just start gmond.exe, check the Event Viewer to see which DLL is missing, and place the missing DLLs in the same folder or in a folder that is in the PATH (a quicker way to list them is sketched below this list).
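Instead of hunting down the missing DLLs one by one via the Event Viewer, Cygwin's cygcheck tool can list the dependencies up front (run it in the directory containing the freshly built binary):

cygcheck ./gmond.exe

Every cyg*.dll that shows up in the output needs to sit next to gmond.exe or in a folder on the PATH.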



Please note that this is an x86_32 binary and not x64, because Cygwin is 32-bit only.
It should, however, be possible to build a native x64 ganglia with the Windows Services for UNIX.



Links:

Corresponding discussion in HPC2008 MS Forum
Cygwin
Ganglia
confuse library
APR Consulting web page

Thursday, January 22, 2009

Where is my "My Desktop" button

In order to get the "My Desktop" button back, e.g. on Windows Terminal Servers, just execute the following command:



regsvr32 /n /i:U shell32



With the next reboot, or upon restart of the Quick Launch bar, the icon should appear.

Tuesday, December 16, 2008

Windows CCS Cluster Upgrade

Recently the Windows CCS cluster of the RRZE got a small upgrade.

One of the initial nodes rejoined the cluster, so 28 Opteron cores are available again.
Due to the use of CFD codes for production runs, the user home was recently upgraded and the quota was extended to 10 GB per user.
Furthermore, for special purposes and a limited amount of time, an extra project home with up to 120 GB of space is available for extensive usage.

Monday, December 15, 2008

PCI express revisited

Test results with the new generation, i.e. GT200-based cards on PCIe Generation 2.0 with its doubled transfer rate, show that naively implemented copies do not get any speedup.
Blocked copies, however, climb up to 4.5 GB/s when writing data to GPU memory.
Copying data back to the host is still comparatively slow at 2 GB/s.

[Figure: PCIe bandwidth measurements, 8800 GTX vs. GTX 280]



Link to first article

Monday, December 8, 2008

Fast Network, Fast disconnects (Linksys WRT610N )

Looking forward to fast streaming of HD media over my new wireless router (WRT610N), I ran into serious trouble getting a stable connection at all.

Having my network set up with WPA2 and TKIP for compatibility reasons, I got random disconnects of the whole 5 GHz band, while 2.4 GHz performed flawlessly. Searching the internet I stumbled across some serious accusations that the WRT610N is a flawed design and overheats a lot.
Whether this is true or not I cannot say for sure; however, I expected much more from Linksys and a premium home line product.

Searching a little more I came across another user's experience that changing the encryption from TKIP to AES solved the problem of the recurring disconnects.

And voilà, the problem seems to be solved.

So for everyone who can live with AES-only encryption on the 5 GHz 11n band and TKIP or AES on the 2.4 GHz 11g band, the router is a perfect catch in both performance and appearance.

Monday, November 24, 2008

Yeehhaa: NVIDIA GT200 rocks

An exemplar of the new NVIDIA GT200-based GTX 280 graphics card arrived at our computing center last Friday. The card was installed and set up right away; the first benchmark started on Saturday, November 22nd, and finished today.

Some preliminary figures show the great improvement of this new generation, as I expected from the data sheets. Soon I will post some verified results here, along with a few notes about the changes from the G80 generation to the current GT200 chip.

Friday, November 7, 2008

Running MPI Jobs on Windows CCS

In order to run only one MPI process per allocated node on the Windows CCS cluster, you have to tweak the host list that the scheduler provides. The environment variable CCP_NODES lists each allocated node together with the number of processors allocated on it, so on our quad-core nodes mpiexec starts four MPI processes per node by default.

In order to reduce this to one process per node, you call your program the following way from inside the scheduler:
mpiexec.exe -hosts %CCP_NODES: 4= 1%

%CCP_NODES: 4= 1% uses cmd.exe's variable substitution syntax (%VAR:old=new%): every occurrence of " 4" in CCP_NODES is replaced by " 1", so each hostname ends up with a process count of one instead of four. A small example follows below.
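For illustration (hypothetical node names and program name, assuming four processors were allocated on each of two nodes):

rem CCP_NODES as set by the scheduler:   2 NODE01 4 NODE02 4
rem after the substitution " 4" -> " 1": 2 NODE01 1 NODE02 1
mpiexec.exe -hosts %CCP_NODES: 4= 1% myprogram.exe

With the substituted host list, mpiexec starts one process on NODE01 and one on NODE02 instead of four on each.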