Johannes Habich

Tuesday, December 1, 2009

Windows HPC2008 Cluster Operational

Today the Windows HPC2008 cluster at RRZE went into operation.

If you are interested in getting access to the system, contact hpc@rrze.uni-erlangen.de

Initial information on login and usage can be found here:

Windows HPC2008 Cluster Launch Slides

Thursday, November 19, 2009

Java: A quest with unattended installation

Some guidelines for unattended Java installation in a Win2008 HPC cluster environment:

Deactivate UAC on all nodes; otherwise the nodes will simply hang, waiting for a UAC confirmation that will never happen. You can avoid this by doing the first Java installation by hand via an RDesktop login. Afterwards, all subsequent unattended installations will succeed. We currently have no clue why. Perhaps some kind of adaptive UAC?
Best practice is to deactivate UAC via a registry key (EnableLUA set to 0):
%windir%\system32\reg.exe ADD HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System /v EnableLUA /t REG_DWORD /d 0 /f

Reboot the nodes, so that the registry change becomes effective.

Everything will now run smoothly, provided the user installing Java has logged in to the nodes at least once. With 20+ cluster nodes, however, this is a problem. The basic point is that without such a login no user directory has been created yet, and neither have the temp and AppData paths.
Java kindly ignores any variable defined by the OS, e.g. TEMP or TMP, and picks its own temp directories, which leads to C:\Users\Username\AppData\LocalLow\Temp and many more.
So the installation fails once more unless these directories exist.
You therefore have to create them yourself:
mkdir C:\Users\%USERNAME%\AppData\LocalLow\Temp\
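
When many nodes and users are involved, the directory creation can be scripted. The following is a minimal sketch in Python, assuming the path layout quoted above; the function name ensure_java_temp_dir and its users_root parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def ensure_java_temp_dir(users_root, username):
    """Create <users_root>/<username>/AppData/LocalLow/Temp if it is missing.

    On a cluster node users_root would be the Users folder on drive C.
    It is a parameter here so the sketch can be tried on any directory.
    """
    temp_dir = Path(users_root) / username / "AppData" / "LocalLow" / "Temp"
    # parents=True builds the whole chain, exist_ok=True makes reruns harmless.
    temp_dir.mkdir(parents=True, exist_ok=True)
    return temp_dir
```

Because the call is idempotent, it can be run before every deployment attempt without checking first whether the user has ever logged in.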

After that, the usual unattended JRE deployment should proceed.

Note that any earlier login to the nodes, or any earlier Java installation on them, can change the behavior described above.

Wednesday, November 18, 2009

Windows HPC 2008 Cluster Launch

RRZE recently extended its Windows high-performance computing resources.
Along with upgrading to the latest Windows HPC Server Release 2008, the hardware has been upgraded significantly:

16 dual-socket nodes with hexa-core AMD Istanbul Opteron processors (Dell Blade Center enclosure), equipped with 32 GB of RAM, provide a peak performance of 2 TFLOP/s.
AMD Istanbul Die
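
As a rough plausibility check of the 2 TFLOP/s figure, the theoretical peak can be estimated as nodes x sockets x cores x clock x floating-point operations per cycle. The sketch below assumes a 2.6 GHz clock and 4 double-precision operations per core and cycle. Neither value is stated in this post, so treat both as assumptions.

```python
def peak_gflops(nodes, sockets_per_node, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s: each core retires flops_per_cycle every cycle."""
    return nodes * sockets_per_node * cores_per_socket * clock_ghz * flops_per_cycle


# 16 dual-socket hexa-core nodes are taken from the announcement above.
# ASSUMPTION: 2.6 GHz clock and 4 double-precision flops per cycle per core.
peak = peak_gflops(nodes=16, sockets_per_node=2, cores_per_socket=6,
                   clock_ghz=2.6, flops_per_cycle=4)
print(f"{peak / 1000:.2f} TFLOP/s")  # prints "2.00 TFLOP/s"
```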

Interested users are invited to join the official launch on December 1st, 2009 in RRZE room 1.026.
After a quick tour of the new Job Scheduler, the main part is organized as a hands-on session where everyone can get comfortable with the new environment.

Registration via email to hpc@rrze.uni-erlangen.de is required to attend.

WindowsCluster

Designated trademarks and brands are the property of their respective owners

Monday, October 5, 2009

Ganglia 3.1.2 Running as a Service After All

With the help of srvany.exe from the Windows Resource Kit Tools 2003, you can run any executable as a service on Win2008 and Win2008 R2 as well.

First create a service that runs only srvany.exe:

sc create GMOND binpath= c:\programme\ganglia\srvany.exe

Edit the service settings in the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\GMOND

Add a subkey named Parameters.
Inside "Parameters", create a string value named Application.
Edit Application and put the call to Ganglia into the value data field,
e.g. c:\programme\ganglia\gmond.exe -c "c:\programme\ganglia\gmond-node.conf"

Start the service via the MMC or with sc start GMOND, and it should be running.
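
To repeat these steps on several nodes, it helps to generate the command lines instead of clicking through the registry editor. The following Python sketch only builds and prints the commands. It assumes that the Application value is written with reg add rather than by hand, and the helper name srvany_service_commands is hypothetical. Actually running the commands requires administrator rights on the node and is left out here.

```python
def srvany_service_commands(service, srvany_path, app_command):
    """Return the command lines (as argument lists) for the steps described above."""
    params_key = ("HKLM\\SYSTEM\\CurrentControlSet\\Services\\"
                  + service + "\\Parameters")
    return [
        # 1. Register a service whose binary is srvany.exe itself.
        ["sc", "create", service, "binPath=", srvany_path],
        # 2. Store the real command line in the Application value under Parameters.
        ["reg", "add", params_key, "/v", "Application",
         "/t", "REG_SZ", "/d", app_command, "/f"],
        # 3. Start the service.
        ["sc", "start", service],
    ]


commands = srvany_service_commands(
    "GMOND",
    r"c:\programme\ganglia\srvany.exe",
    r'c:\programme\ganglia\gmond.exe -c "c:\programme\ganglia\gmond-node.conf"',
)
for argv in commands:
    print(argv)
```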

(There should also be a way to do this with the cygwin service creation tool cygrunsrv. Thanks to Nigel for pointing that out.)

Thursday, September 24, 2009

PCI express pinned Host Memory

While retesting my benchmarks with the current release, CUDA 2.3, I finally incorporated new features such as pinned host memory allocation. According to the specs, this improves host-to-device transfers and vice versa.
Due to the special allocation, the arrays stay at the same location in memory, are not swapped out, and are available more quickly for DMA transfers. Without pinning, most data is first copied to a pinned memory buffer and only then to the ordinarily allocated memory space. With pinned memory, this detour is omitted.

The performance plot shows that pinned memory now delivers up to 5.9 GB/s on the fastest currently available PCIe x16 Gen 2 interface, which has a peak transfer rate of 8 GB/s. This corresponds to 73% of peak performance with almost no optimization applied. In contrast, optimizations such as blocked data transfers, which proved to increase performance some time ago [PCIe revisited], no longer have a positive effect on performance.

Using only the blocked optimizations without pinned memory is still better than an unblocked transfer from unpinned memory, but it transfers only about 4.5 GB/s to the device, which corresponds to 56% of peak.
Reading from the device is far worse, at only 2.3 GB/s.
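
The fractions of peak follow directly from dividing each measured rate by the 8 GB/s nominal rate. The short Python sketch below redoes that arithmetic from the rounded figures quoted in this post, so its results can differ from the percentages in the text by about a point. The labels are informal descriptions, not benchmark names.

```python
PEAK_GBS = 8.0  # nominal PCIe x16 Gen 2 rate quoted above

# Rounded measurements quoted in the text, in GB/s.
measured_gbs = {
    "pinned memory, to device": 5.9,
    "blocked, unpinned, to device": 4.5,
    "blocked, unpinned, from device": 2.3,
}


def fraction_of_peak(rate_gbs, peak_gbs=PEAK_GBS):
    """Measured rate as a fraction of the nominal peak rate."""
    return rate_gbs / peak_gbs


for label, rate in measured_gbs.items():
    print(f"{label}: {rate} GB/s = {100 * fraction_of_peak(rate):.1f}% of peak")
```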

PCIe Bandwidth Measurements GTX280 using pinned Memory

Thursday, August 20, 2009

Drag and Drop not Working in Vista and Windows7

You may find that your favorite media player (or any other program) does not accept files added by drag and drop.
This most likely happens right after installation, when the installer starts the program for the first time.
The installer of course runs with elevated rights for setup purposes, and so does the program it launches.
In this case, Windows Vista and Windows 7 block drag and drop for security reasons.

In most cases it is enough to close the program and start it again, now in non-elevated mode.

You can reproduce this behavior by simply starting a program with elevated rights.

Thursday, July 23, 2009

Cuda 2.3 released

NVIDIA just released Cuda Version 2.3 with the corresponding driver.
F22 @RRZE has already been updated to support this Version.

Thursday, July 16, 2009

Tracing of MPI Programs

Overview

STILL UNDER CONSTRUCTION

To trace MPI programs with the Intel MPI tracing capabilities, at least the following steps are necessary.
(Note that this guide does not claim to be the only way, nor to be complete or error-proof!)

Tutorial

module load itac

env LD_PRELOAD=/apps/intel/itac/7.2.0.011/itac/slib_impi3/libVT.so mpirun -pernode ./bin/solver ./examples/2X_2Y_2Z_200X200X200c_file.prm

e.g: env LD_PRELOAD=/apps/rrze/lib/ptoverride-ubuntu64.so:/apps/intel/itac/7.2.0.011/itac/slib_impi3/libVT.so mpirun -npernode 2 $MPIPINNING ./bin/solver ./examples/8X_8Y_4Z_800X800X400c_file.prm
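
The LD_PRELOAD value in these commands is simply a colon-separated list of shared libraries. When the command is launched from a script, the list can be assembled programmatically. The following Python sketch shows one way to do that. The helper name preload_env is hypothetical, and the library paths are the ones from the commands above.

```python
import os


def preload_env(libraries, base_env=None):
    """Return a copy of the environment with LD_PRELOAD set to the given libraries.

    The dynamic loader accepts a colon-separated list; the order is preserved.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = ":".join(libraries)
    return env


env = preload_env([
    "/apps/rrze/lib/ptoverride-ubuntu64.so",
    "/apps/intel/itac/7.2.0.011/itac/slib_impi3/libVT.so",
])
print(env["LD_PRELOAD"])
```

The resulting dictionary could then be passed as the env argument of subprocess.run together with the mpirun command line.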

Another way of doing this is to run mpiexec -trace ..... (remember this is true for intel MPI)

Official Intel documentation on the matter

Intel® Trace Analyzer and Collector for Linux* OS
Getting Started Guide
Overview
To simplify the use of the Intel® Trace Analyzer and Collector, a set of environmental scripts is provided to you. Source/execute the appropriate script (/bin/itacvars.[c]sh) in your shell before using the software. For example, if using the Bash shell:

$ source /bin/itacvars.sh # better added to $HOME/.profile or similar

The typical use of the Trace Analyzer and Collector is as follows:

* Let your application run together with the Trace Collector to generate one (or more) trace file(s).
* Start the Trace Analyzer and load the generated trace for analysis.

Generating a Trace File
Generating a trace file from an MPI application can be as simple as setting just one environment variable or adding an argument to mpiexec. Assume you start your application with the following command:

$ mpiexec -n 4 myApp

Then generating a trace can be accomplished by adding:

$ LD_PRELOAD=/slib/libVT.so mpiexec -n 4 myApp

or even simpler (for the Intel® MPI Library)

$ mpiexec -trace -n 4 myApp

This will create a set of trace files named myApp.stf* containing trace information for all MPI calls issued by the application.

If your application is statically linked against the Intel® MPI Library you have to re-link your binary like this:

$ mpiicc -trace -o myApp # when using the Intel® C++ Compiler

or

$ mpiifort -trace -o myApp # when using the Intel® Fortran Compiler

Normal execution of your application:

$ mpiexec -n 4 myApp

will then create the trace files named myApp.stf*.

Analyzing a Trace File
To analyze the generated trace, invoke the graphical user interface:

$ traceanalyzer myApp.stf

Read section For the Impatient in the Trace Analyzer Reference Guide to get guidance on the first steps with this tool.

Wednesday, July 8, 2009

Cuda Machines @ RRZE

This information is no longer updated. Please visit our official page, as we now provide GPU computing as a cluster resource:

RRZE HPC Services

Currently the available CUDA test systems @ RRZE are:

lightning: (available with upgraded hardware)
Ubuntu 8.04 x86_64
2x Quadcore Intel Clovertown (2.33 GHz), 4 MB L2 per 2 cores,
GeForce 8800 Ultra (768 MB) (G80 core)
Cuda Driver Version: 180.22
Cuda Toolkit: 2.0

f22: (Last Update 29.09.09)
Ubuntu 8.04 x86_64
2x Quadcore Intel Xeon L5420 (2.5 GHz)
GeForce GTX 280 SC (1 GB) (GT200 Core)
Current: Cuda Driver Version: 190.29(Cuda2.3) --> with OpenCL Support!
Before: Cuda Driver Version: 190.16 (Cuda2.3)
Cuda Toolkit: 2.3

Tuesday, July 7, 2009

Cuda Tutorial @ RRZE

Currently we have two test systems running different NVIDIA GPUs inside the test cluster environment.

Please apply for an HPC account at RRZE (ask your local administrator).

You get access to one of the machines either by submitting a job script or by requesting an interactive shell, e.g.:

qsub -I -lnodes=f22:ppn=8,walltime=01:00:00

Note that interactive sessions are limited to one hour, but they are the recommended way to try things out in the beginning.

The module system now provides various versions of compilers and of CUDA, e.g.

module load cuda/2.2

The next thing you will want to try is compiling the SDK examples.
- Download the SDK matching the CUDA version you want to use (please check whether that version is available, too!) and extract it to some directory by running it.
- The CUDA path you have to specify (not the install path!) is /usr/local/cudaXX, where XX is the version and the architecture (e.g. -32).
- Then enter the directory you extracted to and type make. It should compile; if it doesn't, please look in /usr/local/cudaXX/bin/linux/release/. If you find executables in there and you can actually run them, then there is a mistake somewhere in your settings. If you are trying to compile in 32-bit mode, please contact us at hpc@rrze.uni-erlangen.de, because you will need further assistance.

Assuming compilation went well (went well = no errors; we neglect the warnings here), you should have runnable SDK examples in /bin/release/linux/

Now your basic CUDA environment is set up and ready to go for your own codes.