Johannes Habich

Thursday, August 20, 2009

Drag and Drop not Working in Vista and Windows7

It might occur to you that your favorite Media Player (any other program might do the same) is not able to accept files added per Drag and Drop.
The most probable time for this behavior is right after install, when the installer starts the program the first time.
The installer of course has elevated rights for setup purposes and the program itself, too.
In this case Windows Vista + 7 forbid the drag and drop functionality for security reasons.

In most cases it is enough to just close and start the program again, now in non elevated mode.

You can reproduce this behavior by simply starting a program with elevated rights.

Thursday, July 23, 2009

Cuda 2.3 released

NVIDIA just released Cuda Version 2.3 with the corresponding driver.
F22 @RRZE has already been updated to support this Version.

Thursday, July 16, 2009

Tracing of MPI Programs

Overview

STILL UNDER CONSTRUCTION

To trace MPI programs with the intel mpi tracing capabilities the following steps are at least necessary.
(Note that his guide demands not to be the only way nor to be complete and error proof!)

Tutorial

module load itac

env LD_PRELOAD=/apps/intel/itac/7.2.0.011/itac/slib_impi3/libVT.so mpirun -pernode ./bin/solver ./examples/2X_2Y_2Z_200X200X200c_file.prm

e.g: env LD_PRELOAD=/apps/rrze/lib/ptoverride-ubuntu64.so:/apps/intel/itac/7.2.0.011/itac/slib_impi3/libVT.so mpirun -npernode 2 $MPIPINNING ./bin/solver ./examples/8X_8Y_4Z_800X800X400c_file.prm

Another way of doing this is to run mpiexec -trace ..... (remember this is true for intel MPI)

env LD_PRELOAD=/apps/rrze/lib/ptoverride-ubuntu64.so:/apps/intel/itac/7.2.0.011/itac/slib_impi3/libVT.so mpirun -npernode 2 $MPIPINNING ./bin/solver ./examples/8X_8Y_4Z_800X800X400c_file.prm

Official Intel Docu on that matter

Intel® Trace Analyzer and Collector for Linux* OS
Getting Started Guide
Overview
To simplify the use of the Intel® Trace Analyzer and Collector, a set of environmental scripts is provided to you. Source/execute the appropriate script (/bin/itacvars.[c]sh) in your shell before using the software. For example, if using the Bash shell:

$ source /bin/itacvars.sh # better added to $HOME/.profile or similar

The typical use of the Trace Analyzer and Collector is as follows:

* Let your application run together with the Trace Collector to generate one (or more) trace file(s).
* Start the Trace Analyzer and to load the generated trace for analysis.

Generating a Trace File
Generating a trace file from an MPI application can be as simple as setting just one environment variable or adding an argument to mpiexec. Assume you start your application with the following command:

$ mpiexec -n 4 myApp

Then generating a trace can be accomplished by adding:

$ LD_PRELOAD=/slib/libVT.so mpiexec -n 4 myApp

or even simpler (for the Intel® MPI Library)

$ mpiexec -trace -n 4 myApp

This will create a set of trace files named myApp.stf* containing trace information for all MPI calls issued by the application.

If your application is statically linked against the Intel® MPI Library you have to re-link your binary like this:

$ mpiicc -trace -o myApp # when using the Intel® C++ Compiler

or

$ mpiifort -trace -o myApp # when using the Intel® Fortran Compiler

Normal execution of your application:

$ mpiexec -n 4 myApp

will then create the trace files named myApp.stf*.
Analyzing a Trace File
To analyze the generated trace, invoke the graphical user interface:

$ traceanalyzer myApp.stf

Read section For the Impatient in the Trace Analyzer Reference Guide to get guidance on the first steps with this tool.

Wednesday, July 8, 2009

Cuda Machines @ RRZE

This information will not be updated any more. Please visit our official page as we provide GPU computing now as a cluster ressource:

RRZE HPC Services

Currently the available CUDA test systems @ RRZE are:

lightning: (available with upgraded hardware)
Ubuntu 8.04 x86_64
2x Quadcore Intel Clovertown (2,33 GHz), 4 MB L2 pro 2 Cores,
GeForce 8800 Ultra (768 MB) (G80 core)
Cuda Driver Version: 180.22
Cuda Toolkit: 2.0

f22: (Last Update 29.09.09)
Ubuntu 8.04 x86_64
2x Quadcore Intel Xeon L5420 (2.5 GHz)
GeForce GTX 280 SC (1 GB) (GT200 Core)
Current: Cuda Driver Version: 190.29(Cuda2.3) --> with OpenCL Support!
Before: Cuda Driver Version: 190.16 (Cuda2.3)
Cuda Toolkit: 2.3

Tuesday, July 7, 2009

Cuda Tutorial @ RRZE

Currently we have two test systems running different GPUs from NVIDIA inside the testcluster environment.

Please apply for a HPC account at RRZE (ask your local administrator) .

You get access to one of the machines by issuing either a job script or by requesting an interactive shell, e.g.:

qsub -I -lnodes=f22:ppn=8,walltime=01:00:00

Note, that interactive sessions are limited to one hour, but it is the recommended way to try things out in the beginning

The module system now supplies you with various versions of compilers and CUDA Versions, e.g.

module load cuda/2.2

Next thing you wanna try is compiling the SDK examples.
- Therefore, download the SDK matching the CUDA version you want to use (please chek wether it is available too!) and extract it to some directory by running it.
- The cuda path you have to specify (not the install path!) is /usr/local/cudaXX were XX is the version and the architecture (e.g. -32 ).
- Then enter the directory you extracted to and type make. It should compile, if it doesn't please look to /usr/local/cudaXX/bin/linux/release/. If you find executables in there and you can acutally run them, Then somewhere in your settings is a mistake. If you are trying to compile in 32bit mode, please contact us at hpc@rrze.uni-erlangen.de because then you would need further assistance.

Assuming compilation went well (went well = no errors; We neglect the warnings here), you should have runable SDK examples in /bin/release/linux/

Now your basic CUDA environment is set up and ready to go for your own codes.

Monday, July 6, 2009

Taming Literature, Zotero Firefox Plugin

Again and again I felt the urge to have a real tool organize my literature instead of crawling inside bibtex files with every new paper or report again.

So recently I heard something about SCOPUS from the publisher ELSEVIER, Citavi is free for members of our University but all lack some features.

The largest problem however is, how to import bibliographys from webpages and similar "free form" sources.

Finally I stumbled across the Firefox extension Zotero.
Once installed its active in the background and is only visible by clicking its logo in the lower window bar. If you now visit a site which has bibliography similar input, a small icon appears in the address bar which exhibits all literature found on this page. With a selection and by hitting ok, you now import one or more books, articles aso. Into the Zotero DB. Furthermore you can search in the DB by tags and other fields. Can organize categories and subcategories and output to RTF aso. and of course bibtex files.
You can as well import bibtex files to appreciate your hard work generating these files by hand in the past :-).

Zotero is of course not free from flaws and some webpages, even that ones which Zotero has optimized parsers for, provide information with errors till the next parser release comes out.

I've not figured out how to efficiently use this on multiple Computers at the same time, but I'm sure there is a solution as well.

Friday, July 3, 2009

Archive test

Wednesday, June 24, 2009

EIHECS roundup

Geballte HPC-Expertise in Erlangen
I myself got the nice little Lenovo S10e Ideapad for being second in the Call for Innovative Multi- and Many-Core Programming.

Currently I'm running Windows 7 RC1 with no problems at all but getting away from it.

Lectures

G.Wellein G. Hager, J. Habich: Lecture on Programming Techniques for Supercomputers PTfS, Summer Term 2011.

G. Hager, J. Habich: Tutorials for the lecture on Programming Techniques for Supercomputers PTfS, Summer Term 2011.

G.Wellein G. Hager, J. Habich: Lecture on Programming Techniques for Supercomputers PTfS, Summer Term 2010.

G. Hager, J. Habich: Tutorials for the lecture on Programming Techniques for Supercomputers PTfS, Summer Term 2010.

G. Hager, J. Habich: Tutorials for the lecture on Programming Techniques for Supercomputers PTfS, Summer Term 2009.

Friday, June 19, 2009

MPI/OpenMP Hybrid pinning or no pinning that is the question

Intro

Recently performed benchmarks of a hybrid-parallelized flow solver showed what one has to consider in order to get best performance.
On the theoretical side, hybrid implementations are thought to be most flexible and still maintain high performance. This is because, one thinks that OpenMP is perfect for intranode communication and faster than MPI there.
Between nodes, MPI anyway is the choice for portable distributed memory parallelization.

In reality however not few MPI implementations already use shared memory buffers when communicating with other ranks in the same shared memory system. So basically there is no advantage between a parallelization of MPI and OpenMP on the same level, when using MPI nevertheless for internode communication.

Quite contrary, it means apart from the additional implementation, a lot of more understanding of processor and memory hierarchy layout, thread and process affinity to cores than the pure MPI implementation.

Nevertheless there are scenarios, where hybrid really can pay off, as MPI lacks the OpenMP feature of accessing shared data in shared caches for example.

Finally if you want to run your hybrid code on RRZE systems, there are the following features available.

Pinning of MPI/OpenMP hybrids

I assume you use the mpirun wrapper provided

mpirun -pernode issues just one MPI process per node regardless of the nodefile content

mpirun -npernode issues just n MPI processes per node regardless of the nodefile content

mpirun -npernode 2 -pin 0,1_2,3 env LD_PRELOAD=/apps/rrze/lib/ptoverride-ubuntu64.so Issues 2 MPI processes per node and gives threads of the MPI process 0 just access to core 0 and 1 and threads of MPI process 1 access to cores 2 and 3 (of course the MPI processes themselves are also limited to that cores). Furthermore the , e.g. OpenMP, threads are pinned to one core only, so that migration is no longer an issue

mpirun -npernode 2 -pin 0_1_2_3 is your choice if you would like to test 1 OpenMP thread per MPI process and 4 MPI processes in total per node. Adding the LD_PRELOAD from above however decreases performance a lot. This is currently under investigation.

export PINOMP_MASK=2 changes the skip mask of the pinning tool

OpenMP spawns not only worker threads but also threads for administrative business as synchronization etc. Usually you would only pin the threads, contributing to the computation. The default skip mask, skipping the non computationally intensive threads, might not be correct in the case of hybrid programming, as MPI as well spawns non-worker threads. The PINOMP_MASK variable is hereby interpreted like a bitmask, e.g. 2 --> 10 and 6 --> 110. A zero means to pin the thread and a 1 means to skip the pinning of the thread. The least significant bit hereby corresponds to thread zero (bit 0 is 0 in the examples above ) .

6 was used in the algorithm under investigation as soon as one MPI process and 4 OpenMP worker threads were used per node, to have the correct thread pinning.

The usage of the rankfile for pinning hybrid jobs is described in Thomas Zeisers Blog

Thanks to Thomas Zeiser and Michael Meier for their help in resolving this issue.

Keyowords: Thread Pinning Hybrid Affinity

Incorporated Comments of Thomas Zeiser

Thomas Zeiser, Donnerstag, 23. Juli 2009, 18:15

PINOMP_MASK for hybrid codes using Open-MPI

If recent Intel compilers and Open-MPI are used for hybrid OpenMP-MPI programming, the correct PINOMP_MASK seems to be 7 (instead of 6 for hybrid codes using Intel-MPI).

Thomas Zeiser, Montag, 22. Februar 2010, 20:41
PIN OMP and recent mvapich2

Also recent mvapich2 requires special handling for pinning hybrid codes: PINOMP_SKIP=1,2 seems to be appropriate.